opencv

mirror of https://github.com/opencv/opencv.git synced 2025-07-24 14:06:27 +08:00


core	core: vectorize dotProd_32s	2019-08-20 15:28:36 -05:00
core.hpp	Merge pull request #14440 from alalek:async_array	2019-06-08 20:57:15 +00:00

Paul E. Murphy 33fb253a66 core: vectorize dotProd_32s

Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.

For PPC, do a full multiply (32x32->64b), convert to DP
then accumulate. This may be slightly less precise for
some inputs, but it is about 1.5x faster than the FMA
approach above, which is itself about 1.5x faster than
the baseline, for a ~2.5x overall speedup.

2019-08-20 15:28:36 -05:00
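
The two strategies named in the commit message can be illustrated with a scalar sketch. This is not the code from the commit, which presumably uses OpenCV's SIMD intrinsics. The function names `dot_i32_fma4` and `dot_i32_mul64` are hypothetical, and the mapping of "4x FMA chains" onto four independent scalar accumulators is an assumption made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not OpenCV's dotProd_32s): int32 dot product
// accumulated in double using four independent FMA chains. Independent
// accumulators let the CPU overlap the latency of consecutive FMAs.
double dot_i32_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     s0);
        s1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), s1);
        s2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), s2);
        s3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), s3);
    }
    // Tail: fewer than four elements remain.
    for (; i < n; ++i)
        s0 = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), s0);
    return (s0 + s1) + (s2 + s3);
}

// Hypothetical helper: full 32x32->64 bit integer multiply, convert the
// product to double, then accumulate. The conversion and the addition
// each round, whereas an FMA rounds only once per step.
double dot_i32_mul64(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t p = static_cast<std::int64_t>(a[i]) * static_cast<std::int64_t>(b[i]);
        s += static_cast<double>(p);
    }
    return s;
}
```

For small inputs both functions return the same value. They can differ in the last bits when individual products exceed 2^53, since a 64-bit product is then rounded during conversion to double before it is added.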
