* use universal intrinsics for the float/double accumulate series: accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
* add v_cvt_f64_high for both SSE and NEON
* add test for conversion v_cvt_f64_high in test_intrin.cpp
* improve some existing universal intrinsics by using new AArch64 instructions
* add workaround for Android build in intrin_neon.hpp
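
For reviewers unfamiliar with the conversion, here is a scalar sketch of why a "high" conversion helps when accumulating float input into a double buffer: a 4-lane float register widens into two 2-lane double registers, one from the lower lanes and one from the upper lanes. The names `F32x4`, `F64x2`, `cvt_f64_low`, `cvt_f64_high` and `accumulate_f32_to_f64` are hypothetical stand-ins written for this illustration; this is not the code in the patch, only a model of the lane behaviour assumed for v_cvt_f64 and v_cvt_f64_high.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar stand-ins for 128-bit registers (hypothetical names, not OpenCV types).
using F32x4 = std::array<float, 4>;
using F64x2 = std::array<double, 2>;

// Models the existing low-half conversion: widen lanes 0 and 1.
inline F64x2 cvt_f64_low(const F32x4& v)
{
    return {static_cast<double>(v[0]), static_cast<double>(v[1])};
}

// Models v_cvt_f64_high as assumed here: widen lanes 2 and 3.
inline F64x2 cvt_f64_high(const F32x4& v)
{
    return {static_cast<double>(v[2]), static_cast<double>(v[3])};
}

// dst[i] += src[i], handling four floats per step plus a scalar tail.
void accumulate_f32_to_f64(const std::vector<float>& src, std::vector<double>& dst)
{
    const std::size_t n = src.size() < dst.size() ? src.size() : dst.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const F32x4 v = {src[i], src[i + 1], src[i + 2], src[i + 3]};
        const F64x2 lo = cvt_f64_low(v);   // lanes 0..1 as double
        const F64x2 hi = cvt_f64_high(v);  // lanes 2..3 as double
        dst[i]     += lo[0];
        dst[i + 1] += lo[1];
        dst[i + 2] += hi[0];
        dst[i + 3] += hi[1];
    }
    for (; i < n; ++i)  // leftover elements
        dst[i] += static_cast<double>(src[i]);
}
```

Without the high-half conversion, the upper two lanes would have to be shuffled down before widening, which is the extra step this sketch assumes the new intrinsic avoids.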
* Added 2-channel ops, v_load_deinterleave() and v_store_interleave(), to match the existing 3-channel and 4-channel ops
* Implemented only for float32x4 on SSE (but for all types on NEON and CPP)
* Includes tests
* Will be used to vectorize 2D functions, such as estimateAffine2D()
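
As a rough illustration of what the 2-channel ops do, the scalar sketch below splits eight interleaved floats (for example four x,y points) into two 4-lane vectors and writes them back. `F32x4`, `load_deinterleave2` and `store_interleave2` are hypothetical names used only here; the sketch models the expected lane layout and is not the SSE or NEON implementation from this change.

```cpp
#include <array>

// Scalar stand-in for a 4-lane float register (hypothetical, not an OpenCV type).
using F32x4 = std::array<float, 4>;

// Reads 8 interleaved values: even positions go to a, odd positions go to b.
void load_deinterleave2(const float* ptr, F32x4& a, F32x4& b)
{
    for (int i = 0; i < 4; ++i) {
        a[i] = ptr[2 * i];
        b[i] = ptr[2 * i + 1];
    }
}

// Inverse operation: writes a and b back as a0 b0 a1 b1 a2 b2 a3 b3.
void store_interleave2(float* ptr, const F32x4& a, const F32x4& b)
{
    for (int i = 0; i < 4; ++i) {
        ptr[2 * i]     = a[i];
        ptr[2 * i + 1] = b[i];
    }
}
```

With point data stored as x0 y0 x1 y1 ..., one load yields all x coordinates in one vector and all y coordinates in another, which is the layout a vectorized 2D routine would likely want.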