Mirror of https://github.com/opencv/opencv.git (synced 2025-06-07 09:25:45 +08:00)
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On x86, this showed about a 1.4x improvement. For PPC, do a full multiply (32x32->64b), convert to double precision (DP), then accumulate. This may be slightly less precise for some inputs, but it is about 1.5x faster than the FMA approach above, which itself gave about 1.5x, for a combined speedup of ~2.5x.
opencv2
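The commit message above contrasts two ways to sum products in double precision. The 32x32->64b multiply suggests the inputs are 32-bit signed integers (an assumption). The sketch below is a scalar analogue, not OpenCV's actual SIMD code. `dot_i32_fma4` and `dot_i32_mul64` are hypothetical names, and plain scalar accumulators stand in for 128-bit FP64 vector registers. The first function keeps four independent FMA chains, so successive FMAs do not wait on a single accumulator. The second does an exact 64-bit integer multiply, converts the product to double, and then accumulates.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (scalar sketch): int32 dot product accumulated in double
// using four independent FMA chains. Separate accumulators break the
// loop-carried dependency, so several FMAs can be in flight at once.
double dot_i32_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // int32 -> double is exact; std::fma computes a*b+s and rounds once.
        s0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     s0);
        s1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), s1);
        s2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), s2);
        s3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), s3);
    }
    for (; i < n; ++i)  // scalar tail for the leftover elements
        s0 = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), s0);
    return (s0 + s1) + (s2 + s3);  // combine the chains at the end
}

// Hypothetical helper (scalar sketch): full 32x32->64-bit integer multiply,
// convert the exact product to double, then accumulate. The conversion can
// round when |product| > 2^53 and the add rounds again, so results may
// differ slightly from the FMA version for some inputs.
double dot_i32_mul64(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        // Widening to int64_t first makes the product exact (|p| <= 2^62).
        const std::int64_t p0 = static_cast<std::int64_t>(a[i]) * b[i];
        const std::int64_t p1 = static_cast<std::int64_t>(a[i + 1]) * b[i + 1];
        s0 += static_cast<double>(p0);
        s1 += static_cast<double>(p1);
    }
    for (; i < n; ++i)  // scalar tail
        s0 += static_cast<double>(static_cast<std::int64_t>(a[i]) * b[i]);
    return s0 + s1;
}
```

For example, with `a = {1, 2, 3, 4, 5, 6, 7}` and `b = {7, 6, 5, 4, 3, 2, 1}`, both functions return 84.0. The two variants can only disagree when individual products exceed 2^53 in magnitude: there the integer route rounds twice per element (conversion, then add), while `std::fma` rounds once.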