* Fix compile against lapack-3.10.0
Fix compilation against lapack >= 3.9.1 and 3.10.0 while not breaking older versions
OpenCVFindLAPACK.cmake & CMakeLists.txt: determine OPENCV_USE_LAPACK_PREFIX from LAPACK_VERSION
hal_internal.cpp : Only apply LAPACK_FUNC to functions whose number of inputs depends on LAPACK_FORTRAN_STR_LEN in lapack >= 3.9.1
lapack_check.cpp : remove LAPACK_FUNC which is not OK as function are not used with input parameters (so lapack.h preprocessing of "LAPACK_xxxx(...)" is not applicable with lapack >= 3.9.1
If not removed lapack_check fails so LAPACK is deactivated in build (not want we want)
use OCV_ prefix and don't use Global, instead generate OCV_LAPACK_FUNC depending on CMake Conditions
Remove CONFIG from find_package(LAPACK) and use LAPACK_GLOBAL and LAPACK_NAME to figure out if using netlib's reference LAPACK implementation and how to #define OCV_LAPACK_FUNC(f)
* Fix typos and grammar in comments
Fixed threshold(THRESH_TOZERO) at imgproc(IPP)
* Fixed#16085: imgproc(IPP): wrong result from threshold(THRESH_TOZERO)
* 1. Added test cases with float where all bits of mantissa equal 1, min and max float as inputs
2. Used nextafterf instead of cast to hex
* Used float value in test instead of hex and casts
* Changed input value in test
When computing:
t1 = (bayer[1] + bayer[bayer_step] + bayer[bayer_step+2] + bayer[bayer_step*2+1])*G2Y;
there is a T (unsigned short or char) multiplied by an int which can overflow.
Then again, it is stored to t1 which is unsigned so the overflow disappears.
Keeping all unsigned is safer.
Fow now, it is possible to define valid rectangle for which some
functions overflow (e.g. br(), ares() ...).
This patch fixes the intersection operator so that it works with
any rectangle.
* Fix integer overflow in cv::Luv2RGBinteger::process.
For LL=49, uu=205, vv=23, we end up with x=7373056 and y=458
which overflows y*x.
* imgproc(test): adjust test parameters to cover SIMD code
* dnn: LSTM optimisation
This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
- Use unaligned access. I don't believe this involves any performance hit in on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
- Allow for weight matrices where the number of columns is not a multiple of 8.
I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available
In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
- Don't check hardware support for AVX(2) when dispatch is disabled for these
- Add braces
* Fix out-of-bounds access in fully connected layer
The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding(which makes more sense anyway).
This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
- Use static array for generating tail masks (as requested)
- Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load