Mirror of https://github.com/opencv/opencv.git (synced 2025-06-11 11:45:30 +08:00)
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On x86 this showed about a 1.4x improvement. For PPC, do a full multiply (32x32->64b), convert to double precision (DP), then accumulate. This may be slightly less precise for some inputs, but it is 1.5x faster than the above, which is itself about 1.5x faster than the FMA above, for a ~2.5x speedup.
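The two summation strategies described in the commit message can be illustrated with a scalar sketch. This is not OpenCV's code: `dot_fma4` and `dot_mul64` are hypothetical names, and the real kernels use SIMD registers (a 128-bit FP64 vector holds two doubles, so four chains cover eight products per iteration). The sketch only shows the idea. Four independent accumulators let consecutive FMAs overlap instead of waiting on one another. The PPC-style variant forms the exact 64-bit integer product first and converts it to double afterwards. The conversion rounds products whose magnitude exceeds 2^53, and the accumulation then rounds again, whereas an FMA rounds once. That is one plausible reason the second approach can be slightly less precise for some inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of approach 1: four independent FMA accumulation chains.
// int32 values are exactly representable as double, and std::fma rounds
// a[i]*b[i] + acc only once.
double dot_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Each chain depends only on its own previous value, so the four
        // FMAs in one iteration can execute in parallel.
        acc0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     acc0);
        acc1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), acc1);
        acc2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), acc2);
        acc3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), acc3);
    }
    // Combine the chains, then handle the leftover elements.
    double sum = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; ++i)
        sum = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), sum);
    return sum;
}

// Hypothetical sketch of approach 2: full 32x32->64-bit integer multiply,
// then convert the exact product to double and accumulate.
double dot_mul64(const std::int32_t* a, const std::int32_t* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Exact: |a[i] * b[i]| <= 2^62 fits in int64.
        std::int64_t p = static_cast<std::int64_t>(a[i]) * static_cast<std::int64_t>(b[i]);
        // Rounds if |p| > 2^53, and the addition rounds again.
        sum += static_cast<double>(p);
    }
    return sum;
}
```

For example, with `a = {1, 2, 3, 4, 5}` and `b = {6, 7, 8, 9, 10}`, both functions return 130.0. They differ only in rounding behaviour on large inputs, not in the value being computed.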
Contents: `core/`, `core.hpp`