Add per_tensor_quantize to int8 quantize
* add per_tensor_quantize to dnn int8 module.
* change api flag from perTensor to perChannel, and recognize quantize type and onnx importer.
* change the default to hpp
* Added support for the ONNX "ReduceMean" Layer. (as this is the same as the GlobalAveragePool)
* Add ReduceMean test
* Fix ONNX importer
* Fix ReduceMean
* Add assert
* Split test
* Fix split test
Support asymmetric padding in pooling layer (#12519)
* Add Inception_V1 support in ONNX
* Add asymmetric padding in OpenCL and Inference engine
* Refactoring
* another round of dnn optimization:
* increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly
* improved SIMD optimization of pooling layer, optimized average pooling
* cleaned up convolution layer implementation
* made activation layer "attacheable" to all other layers, including fully connected and addition layer.
* fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it desctibes the topology.
* greatly optimized permutation layer, which improved SSD performance
* parallelized element-wise binary/ternary/... ops (sum, prod, max)
* also, added missing copyrights to many of the layer implementation files
* temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders