* use universal intrinsic instead of raw intrinsic
* add 2 channels de-interleave on x86 platform
* add v_int32x4 version of v_muladd
* add accumulate version of v_dotprod based on the commit from seiko2plus on bf1852d
* remove some verify check in performance test
* avoid the out of boundary access and keep the performance
- removed tr1 usage (dropped in C++17)
- moved includes of vector/map/iostream/limits into ts.hpp
- require opencv_test + anonymous namespace (added compile check)
- fixed norm() usage (must be from cvtest::norm for checks) and other conflict functions
- added missing license headers