opencv

mirror of https://github.com/opencv/opencv.git synced 2025-06-12 04:12:52 +08:00

History

Andrew Ryrie ea7d4be3f8 Merge pull request #20658 from smbz:lstm_optimisation		2021-11-29 21:43:00 +00:00

* dnn: LSTM optimisation

  This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm. fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
  - Use unaligned access. I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
  - Allow for weight matrices where the number of columns is not a multiple of 8.

  I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.

* Fix warning about initialisation order

* Remove C++11 syntax

* Fix build when AVX(2) is not available

  In this case the CV_TRY_X macros are defined to 0, rather than being undefined.

* Minor changes as requested:
  - Don't check hardware support for AVX(2) when dispatch is disabled for these
  - Add braces

* Fix out-of-bounds access in fully connected layer

  The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this, but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway). This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.

* Improve tail mask handling
  - Use static array for generating tail masks (as requested)
  - Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs

* Revert whitespace change

* Improve readability of conditions for using AVX

* dnn(lstm): minor coding style changes, replaced left aligned load
..
calib3d	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
core	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
cudaarithm	cuda: fix inplace condition in cv::cuda::flip	2021-04-01 02:26:59 +00:00
cudabgsegm	fix test failure on Jetson TX2	2020-04-15 23:25:12 +09:00
cudacodec	cudacodec(build): fix detection in CMake, cleanup duplicate includes	2020-06-17 09:09:40 +00:00
cudafeatures2d
cudafilters	suppress GaussianBlur to generate empty images	2021-10-01 23:17:02 +09:00
cudaimgproc	Remove compiler warnings	2020-08-21 23:52:30 +09:00
cudalegacy	Merge pull request #19390 from tomoaki0705:fixCudaLegacyCalib3d	2021-01-25 13:32:43 +00:00
cudaobjdetect	suppress noisy warning	2019-08-08 21:49:32 +09:00
cudaoptflow	tvl1 cuda optflow optimization	2021-10-27 12:01:53 -07:00
cudastereo
cudawarping
cudev	Merge pull request #16150 from alalek:cmake_avoid_deprecated_link_private	2019-12-13 17:52:40 +03:00
dnn	Merge pull request #20658 from smbz:lstm_optimisation	2021-11-29 21:43:00 +00:00
features2d	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
flann	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
highgui	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
imgcodecs	Add warning message to imread()	2021-11-18 21:19:05 +01:00
imgproc	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
java	Automatically set the correct OpenCV version in build.gradle	2021-10-02 16:06:33 +02:00
js	Fix typos discovered by codespell	2021-11-26 12:29:56 +01:00
ml	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
objdetect	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
photo	add !empty assertion in seamlessClone()	2021-11-18 21:19:05 +01:00
python	pre: OpenCV 3.4.16 (version++)	2021-10-04 20:47:07 +00:00
shape
stitching	fix loop boundary condition	2021-04-20 22:08:01 -04:00
superres	build: eliminate build warnings	2021-08-28 17:11:26 +00:00
ts	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
video	Update perf_bgfg_mog2.cpp, perf_bgfg_knn.cpp	2021-09-25 23:06:50 +03:00
videoio	Merge pull request #21107 from take1014:remove_assert_21038	2021-11-27 18:34:52 +00:00
videostab	backport: fixed warnings produced by clang-9.0.0	2019-09-23 18:36:18 +03:00
viz	Added to Camera constructor parameter description	2020-04-26 00:17:39 -06:00
world
CMakeLists.txt