opencv/modules/core/include
Francesco Petrogalli 6ee23c9b85
Merge pull request #19486 from fpetrogalli:dotprod_fast-3.4
* [hal][neon] Optimize the v_dotprod_fast intrinsics for aarch64.

On Armv8 in AArch64 execution mode, we can skip the sequence

   v<op>_<ty>(vget_high_<ty>(x), vget_high_<ty>(y))

in favour of

   v<op>_high_<ty>(x, y)

This has better changes for recent compilers to use less data movement
operations and better register allocation. See for example:

   https://godbolt.org/z/bPq7vd

* [hal][neon] Fix build failure on armv7.

* [hal][neon] Address review comments in PR.

PR: https://github.com/opencv/opencv/pull/19486

* [hal][neon] Define macro to check for the AArch64 execution state of Armv8.

* [hal][neon] Fix macro definition for AArch64.

The fix is needed to prevent warnings when building for Armv7.
2021-02-11 13:24:09 +00:00
..
opencv2 Merge pull request #19486 from fpetrogalli:dotprod_fast-3.4 2021-02-11 13:24:09 +00:00