opencv/modules/core/include/opencv2
Paul E. Murphy 33fb253a66 core: vectorize dotProd_32s
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.
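
A minimal scalar sketch of the "4x FMA chains" idea, assuming the int32 inputs are widened to double and summed in four independent accumulators. The function name `dot_i32_fma4` is hypothetical, and this is not the actual OpenCV code, which uses SIMD intrinsics on 128-bit FP64 vectors:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of OpenCV): dot product of two int32 arrays,
// accumulated in double with four independent fused multiply-add chains.
double dot_i32_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;

    // Each accumulator depends only on its own previous value, so the four
    // FMAs in one iteration can overlap in the pipeline.
    for (; i + 4 <= n; i += 4)
    {
        s0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     s0);
        s1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), s1);
        s2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), s2);
        s3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), s3);
    }

    // Remaining 0..3 elements go into the first chain.
    for (; i < n; ++i)
        s0 = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), s0);

    // Combine the partial sums once at the end.
    return (s0 + s1) + (s2 + s3);
}
```

Splitting the sum into several chains hides FMA latency. A single accumulator would force each FMA to wait for the previous one to finish.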

For PPC, do a full multiply (32x32->64b), convert to DP,
then accumulate. This may be slightly less precise for
some inputs, but it is about 1.5x faster than the FMA
approach above, which is itself about 1.5x faster than
before, for a ~2.5x overall speedup.
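
The PPC path described above can be sketched in scalar form as follows. This is an illustration under assumptions, not the actual VSX implementation: the name `dot_i32_widen` is hypothetical and a single accumulator is used for brevity. A likely reason for the precision note is that the 64-bit product is rounded when it is converted to double and the sum is rounded again on the add, whereas a fused multiply-add rounds only once:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of OpenCV): multiply in 64-bit integers,
// convert each product to double, then accumulate.
double dot_i32_widen(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        // 32x32 -> 64-bit multiply: exact, cannot overflow int64.
        const std::int64_t p =
            static_cast<std::int64_t>(a[i]) * static_cast<std::int64_t>(b[i]);

        // Conversion to double rounds when |p| exceeds 2^53; the addition
        // below may round a second time.
        sum += static_cast<double>(p);
    }
    return sum;
}
```

For products that fit in 53 bits, the conversion is exact and the only rounding comes from the additions.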
2019-08-20 15:28:36 -05:00
core core: vectorize dotProd_32s 2019-08-20 15:28:36 -05:00
core.hpp Merge pull request #14440 from alalek:async_array 2019-06-08 20:57:15 +00:00