opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-15 01:39:10 +08:00

Author	SHA1	Message	Date
Alexander Smorkalov	db3e5620cd	Merge branch 4.x	2024-04-16 17:28:18 +03:00
Kumataro	b14ea19466	Merge pull request #25351 from Kumataro:fix25073_format_g core: persistence: output reals as human-friendly expression. #25351 Close #25073 Related https://github.com/opencv/opencv/pull/25087 This patch is need to merge same time with https://github.com/opencv/opencv_contrib/pull/3714 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-04-10 15:17:15 +03:00
Alexander Smorkalov	282c762ead	Merge branch 4.x	2024-04-10 11:27:47 +03:00
Alexander Smorkalov	e1ed422bdb	HAL interface for transpose2d.	2024-04-05 14:12:36 +03:00
Dmitry Kurtaev	56d586aa3e	Lest debug checks	2024-04-03 21:55:27 +03:00
Dmitry Kurtaev	357203facd	Resolve out of bound write in RNG::fill	2024-04-03 18:20:45 +03:00
Alexander Smorkalov	cb6d295f15	Merge branch 4.x	2024-04-02 16:39:54 +03:00
Alexander Smorkalov	9f123f8d74	Merge pull request #25285 from johnteslade:cgroupsv2-support core: Add cgroupsv2 support to parallel.cpp	2024-03-30 11:26:23 +03:00
Alexander Smorkalov	7945f2cf40	Fixed HAL invocation for DCT.	2024-03-29 11:01:42 +03:00
John Slade	7f1140b48b	core: Add cgroupsv2 support to parallel.cpp The parallel code works out how many CPUs are on the system by checking the quota it has been assigned in the Linux cgroup. The existing code works under cgroups v1 but the file structure changed in cgroups v2. From [1]: "cpu.cfs_quota_us" and "cpu.cfs_period_us" are replaced by "cpu.max" which contains both quota and period. This commit adds support to parallel so it will read from the cgroups v2 location. v1 support is still retained. Resolves #25284 [1] `0d5936344f`	2024-03-28 11:52:47 +00:00
Pierre Chatelier	1a537ab98f	Merge pull request #24893 from chacha21:cart_polar_inplace Added in-place support for cartToPolar and polarToCart #24893 - a fused hal::cartToPolar[32\|64]f() is used instead of sequential hal::magnitude[32\|64]f/hal::fastAtan[32\|64]f - ipp_polarToCart is skipped for in-place processing (it seems not to support it correctly) relates to #24891 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [X] I agree to contribute to the project under Apache 2 License. - [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [X] There is a reference to the original bug report and related work - [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-03-26 15:38:17 +03:00
Abdul Rahman ArM	55426ee195	Merge pull request #25197 from invarrow:invbranch-cleanup Remove OpenVX #25197 resolves https://github.com/opencv/opencv/issues/24995 OpenCV cleanup https://github.com/opencv/opencv/issues/25007	2024-03-26 15:17:18 +03:00
Yuantao Feng	8e342f8857	5.x core: rename `cv::bfloat16_t` to `cv::bfloat` (#25232 ) * rename cv::bfloat16_t to cv::bfloat * clean class bfloat	2024-03-22 03:45:59 +03:00
Yuantao Feng	3afe8ddaf8	core: Rename `cv::float16_t` to `cv::hfloat` (#25217 ) * rename cv::float16_t to cv::fp16_t * add typedef fp16_t float16_t * remove zero(), bits() from fp16_t class * fp16_t -> hfloat * remove cv::float16_t::fromBits; add hfloatFromBits * undo changes in conv_winograd_f63.simd.hpp and conv_block.simd.hpp * undo some changes in dnn	2024-03-21 23:44:19 +03:00
alexlyulkov	85cc02f4de	Allowed int64 constants in ONNX parser (#25148 ) * Removed automatic int64 to int32 conversion in ONNX parser * Fixed wrong rebase code * added tests, minor fixes * fixed Cast layer * Fixed Cast layer for fp16 backend * Fixed Cast layer for fp16 backend * Fixed Cast layer for fp16 backend * Allowed uint32, int64, uint64 types in OpenCL * Fixed Cast layer for fp16 backend * Use randu in test_int --------- Co-authored-by: Alexander Lyulkov <alexander.lyulkov@opencv.ai>	2024-03-13 11:48:23 +03:00
Maksim Shabunin	8cbdd0c833	Merge pull request #25075 from mshabunin:cleanup-imgproc-1 C-API cleanup: apps, imgproc_c and some constants #25075 Merge with https://github.com/opencv/opencv_contrib/pull/3642 * Removed obsolete apps - traincascade and createsamples (please use older OpenCV versions if you need them). These apps relied heavily on C-API * removed all mentions of imgproc C-API headers (imgproc_c.h, types_c.h) - they were empty, included core C-API headers * replaced usage of several C constants with C++ ones (error codes, norm modes, RNG modes, PCA modes, ...) - most part of this PR (split into two parts - all modules and calib+3d - for easier backporting) * removed imgproc C-API headers (as separate commit, so that other changes could be backported to 4.x) Most of these changes can be backported to 4.x.	2024-03-05 12:18:31 +03:00
Alexander Smorkalov	daa8f7dfc6	Partially back-port #25075 to 4.x	2024-03-05 12:15:39 +03:00
Alexander Smorkalov	cb7d38b477	Merge branch 4.x	2024-02-26 18:05:36 +03:00
Vincent Rabaud	f8aa2896a1	Merge pull request #25024 from vrabaud:neon Replace legacy __ARM_NEON__ by __ARM_NEON #25024 Even ACLE 1.1 refers to __ARM_NEON https://developer.arm.com/documentation/ihi0053/b/?lang=en ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-02-20 11:29:23 +03:00
Alexander Smorkalov	52be0b64fb	Fixed possible out-of-bound access in cv::Mat output formatter.	2024-02-13 17:34:40 +03:00
Alexander Smorkalov	3a55f50133	Merge branch 4.x	2024-02-12 14:20:35 +03:00
Vadim Pisarevsky	1d18aba587	Extended several core functions to support new types (#24962 ) * started adding support for new types (16f, 16bf, 32u, 64u, 64s) to arithmetic functions * fixed several tests; refactored and extended sum(), extended inRange(). * extended countNonZero(), mean(), meanStdDev(), minMaxIdx(), norm() and sum() to support new types (F16, BF16, U32, U64, S64) * put missing CV_DEPTH_MAX to some function dispatcher tables * extended findnonzero, hasnonzero with the new types support * extended mixChannels() to support new types * minor fix * fixed a few compile errors on Linux and a few failures in core tests * fixed a few more warnings and test failures * trying to fix the remaining warnings and test failures. The test `MulTestGPU.MathOpTest` was disabled - not clear whether to set tolerance - it's not bit-exact operation, as possibly assumed by the test, due to the use of scale and possibly limited accuracy of the intermediate floating-point calculations. * found that in the current snapshot G-API produces incorrect results in Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.	2024-02-11 10:42:41 +03:00
ryanking13	422d519703	Enable file system on Emscripten	2024-01-31 11:28:59 -08:00
Alexander Alekhin	40533dbf69	Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918 relates #24909 relates #24917 relates #24892 Performance changes: - [x] 12700K (1 thread) + Intel iGPU \|Name of Test\|noOCL\|convertFp16\|convertTo BASE\|convertTo PATCH\| \|---\|:-:\|:-:\|:-:\|:-:\| \|ConvertFP16FP32MatMat::OCL_Core\|3.130\|3.152\|3.127\|3.136\| \|ConvertFP16FP32MatUMat::OCL_Core\|3.030\|3.996\|3.007\|2.671\| \|ConvertFP16FP32UMatMat::OCL_Core\|3.010\|3.101\|3.056\|2.854\| \|ConvertFP16FP32UMatUMat::OCL_Core\|3.016\|3.298\|2.072\|2.061\| \|ConvertFP32FP16MatMat::OCL_Core\|2.697\|2.652\|2.723\|2.721\| \|ConvertFP32FP16MatUMat::OCL_Core\|2.752\|4.268\|2.662\|2.947\| \|ConvertFP32FP16UMatMat::OCL_Core\|2.706\|2.601\|2.603\|2.528\| \|ConvertFP32FP16UMatUMat::OCL_Core\|2.704\|3.215\|1.999\|1.988\| Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map). There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization). - [x] 12700K + AMD dGPU \|Name of Test\|noOCL\|convertFp16 dGPU\|convertTo BASE dGPU\|convertTo PATCH dGPU\| \|---\|:-:\|:-:\|:-:\|:-:\| \|ConvertFP16FP32MatMat::OCL_Core\|3.130\|3.133\|3.172\|3.087\| \|ConvertFP16FP32MatUMat::OCL_Core\|3.030\|1.713\|9.559\|1.729\| \|ConvertFP16FP32UMatMat::OCL_Core\|3.010\|6.515\|6.309\|4.452\| \|ConvertFP16FP32UMatUMat::OCL_Core\|3.016\|0.242\|23.597\|0.170\| \|ConvertFP32FP16MatMat::OCL_Core\|2.697\|2.641\|2.713\|2.689\| \|ConvertFP32FP16MatUMat::OCL_Core\|2.752\|4.076\|6.483\|4.191\| \|ConvertFP32FP16UMatMat::OCL_Core\|2.706\|9.042\|16.481\|1.834\| \|ConvertFP32FP16UMatUMat::OCL_Core\|2.704\|0.229\|15.730\|0.176\| convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED. dGPU has much more power, so results are x16-17 better than single cpu core. 
Patched version is not worse than convertFp16 and convertTo baseline. There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers. Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>	2024-01-26 12:56:52 +03:00
Alexander Smorkalov	decf6538a2	Merge branch 4.x	2024-01-23 17:06:52 +03:00
Alexander Smorkalov	c739117a7c	Merge branch 4.x	2024-01-19 17:32:22 +03:00
Sean McBride	e64857c561	Merge pull request #23736 from seanm:c++11-simplifications Removed all pre-C++11 code, workarounds, and branches #23736 This removes a bunch of pre-C++11 workarounds that are no longer necessary as C++11 is now required. It is a nice clean up and simplification. * No longer unconditionally #include <array> in cvdef.h, include explicitly where needed * Removed deprecated CV_NODISCARD, already unused in the codebase * Removed some pre-C++11 workarounds, and simplified some backwards compat defines * Removed CV_CXX_STD_ARRAY * Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE * Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code. * Updated some documentation consequently. * Removed all tests of CV_CXX11, now assume it's always true * Fixed links. --------- Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com> Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2024-01-19 16:53:08 +03:00
Zhuo Zhang	37b02d170f	fix qnx-sdp-700 build based on https://github.com/opencv/opencv/pull/24864	2024-01-17 21:49:13 +08:00
Zhuo Zhang	b04de14fbb	Fix QNX build Based on https://github.com/opencv/opencv/issues/24567	2024-01-16 13:51:22 +08:00
Brad Smith	3b287770b9	Corrections for FreeBSD ARM support FreeBSD does not have the /proc file system. FreeBSD was added to the code path for aarch64 before the use of the /proc file system with `f7b4b750d8` but then /proc usage was added not long after with `b3269b08a1`	2024-01-06 20:09:36 -05:00
Brad Smith	34a871c855	Fix building on OpenBSD X86	2024-01-06 01:41:02 -05:00
Maksim Shabunin	adde942e34	OCL: fix incompatibility with Mali runtime	2023-12-21 00:30:44 +03:00
Alexander Smorkalov	408730b7ab	Merge pull request #24618 from vrabaud:compilation Fix compilation on some 32-bit windows	2023-12-01 09:10:30 +03:00
Vincent Rabaud	0812659e92	Fix compilation on some 32-bit windows I do not have more info on the platform as it is internal. Without this fix, the error is: core/src/arithm.simd.hpp:868:1: error: too few arguments provided to function-like macro invocation 868 \| DEFINE_SIMD_ALL(cmp) \| ^ ./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:93:5: note: expanded from macro 'DEFINE_SIMD_ALL' 93 \| DEFINE_SIMD_NSAT(fun, __VA_ARGS__) \| ^ ./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:89:5: note: expanded from macro 'DEFINE_SIMD_NSAT' 89 \| DEFINE_SIMD_F64(fun, __VA_ARGS__) \| ^ ./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:77:9: note: expanded from macro 'DEFINE_SIMD_F64' 77 \| DEFINE_NOSIMD(__CV_CAT(fun, 64f), double, __VA_ARGS__) \| ^ ./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:47:56: note: expanded from macro 'DEFINE_NOSIMD' 47 \| DEFINE_NOSIMD_FUN(fun_name, c_type, __VA_ARGS__) \| ^ ./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:860:9: note: macro 'DEFINE_NOSIMD_FUN' defined here 860 \| #define DEFINE_NOSIMD_FUN(fun, _T1, _Tvec, ...) \	2023-11-29 16:27:11 +01:00
Kumataro	bae435a5a7	Merge pull request #24578 from Kumataro:fix_verify_unsupported_new_mat_depth Fix verify unsupported new mat depth for nonzero/minmax/lut #24578 `cv::LUT()`, `cv::minMaxLoc()`, `cv::minMaxIdx()`, `cv::countNonZero()`, `cv::findNonZero()` and `cv::hasNonZero()` use a depth-based function table. However, the table is too short for `CV_16BF`, `CV_Bool`, `CV_64U`, `CV_64S` and `CV_32U`, which may cause out-of-bounds access. This patch fixes it. If someone later extends these functions to support the new depths, please relax this test. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-23 12:15:58 +03:00
Hao Chen	c19adb4953	Change the lsx to baseline features. This patch change lsx to baseline feature, and lasx to dispatch feature. Additionally, the runtime detection methods for lasx and lsx have been modified.	2023-11-21 11:51:22 +08:00
Rostislav Vasilikhin	53aad98a1a	Merge pull request #23098 from savuor:nanMask finiteMask() and doubles for patchNaNs() #23098 Related to #22826 Connected PR in extra: [#1037@extra](https://github.com/opencv/opencv_extra/pull/1037) ### TODOs: - [ ] Vectorize `finiteMask()` for 64FC3 and 64FC4 ### Changes This PR: * adds a new function `finiteMask()` * extends `patchNaNs()` by CV_64F support * moves `patchNaNs()` and `finiteMask()` to a separate file NOTE: now the function is called `finiteMask()` as discussed with the OpenCV core team ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-09 10:32:47 +03:00
Alexander Smorkalov	34f34f6227	Merge branch 4.x	2023-11-08 14:39:48 +03:00
Rostislav Vasilikhin	ea47cb3ffe	Merge pull request #24480 from savuor:backport_patch_nans Backport to 4.x: patchNaNs() SIMD acceleration #24480 backport from #23098 connected PR in extra: [#1118@extra](https://github.com/opencv/opencv_extra/pull/1118) ### This PR contains: * new SIMD code for `patchNaNs()` * CPU perf test <details> <summary>Performance comparison</summary> Geometric mean (ms) \|Name of Test\|noopt\|sse2\|avx2\|sse2 vs noopt (x-factor)\|avx2 vs noopt (x-factor)\| \|---\|:-:\|:-:\|:-:\|:-:\|:-:\| \|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)\|0.019\|0.017\|0.018\|1.11\|1.07\| \|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)\|0.037\|0.037\|0.033\|1.00\|1.10\| \|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)\|0.032\|0.032\|0.033\|0.99\|0.98\| \|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)\|0.072\|0.072\|0.070\|1.00\|1.03\| \|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)\|0.051\|0.051\|0.050\|1.00\|1.01\| \|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)\|0.137\|0.138\|0.128\|0.99\|1.06\| \|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)\|0.137\|0.128\|0.129\|1.07\|1.06\| \|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)\|0.450\|0.450\|0.448\|1.00\|1.01\| \|PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)\|0.149\|0.029\|0.020\|5.13\|7.44\| \|PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)\|0.304\|0.058\|0.040\|5.25\|7.65\| \|PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)\|0.448\|0.086\|0.059\|5.22\|7.55\| \|PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)\|0.601\|0.133\|0.083\|4.51\|7.23\| \|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)\|0.451\|0.093\|0.060\|4.83\|7.52\| \|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)\|0.892\|0.184\|0.126\|4.85\|7.06\| \|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)\|1.345\|0.311\|0.230\|4.32\|5.84\| \|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)\|1.831\|0.546\|0.436\|3.35\|4.20\| \|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)\|1.017\|0.250\|0.160\|4.06\|6.35\| 
\|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)\|2.077\|0.646\|0.605\|3.21\|3.43\| \|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)\|3.134\|1.053\|0.961\|2.97\|3.26\| \|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)\|4.222\|1.436\|1.288\|2.94\|3.28\| \|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)\|4.225\|1.401\|1.277\|3.01\|3.31\| \|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)\|8.310\|2.953\|2.635\|2.81\|3.15\| \|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)\|12.396\|4.455\|4.252\|2.78\|2.92\| \|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)\|17.174\|5.831\|5.824\|2.95\|2.95\| </details> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-03 08:58:07 +03:00
Alexander Smorkalov	97620c053f	Merge branch 4.x	2023-10-23 11:53:04 +03:00
CNClareChen	d142a796d8	Merge pull request #23929 from CNClareChen:4.x * Optimize some function with lasx. Optimize some function with lasx. #23929 This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-20 14:20:09 +03:00
Alexander Smorkalov	1c0ca41b6e	Merge pull request #24371 from hanliutong:clean-up Clean up the obsolete API of Universal Intrinsic	2023-10-20 12:50:26 +03:00
Vadim Pisarevsky	ba4d6c859d	added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) (#24420 ) * added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary. * hopefully fixed compile errors * continue to fix CI * another attempt to fix build on Linux aarch64 * * reverted to the original method to detect special arm neon instructions without signal() * renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively * removed extra whitespaces	2023-10-18 22:06:20 +03:00
Liutong HAN	a287605c3e	Clean up the Universal Intrinsic API.	2023-10-13 19:23:30 +08:00
Sean McBride	5fb3869775	Merge pull request #23109 from seanm:misc-warnings * Fixed clang -Wnewline-eof warnings * Fixed all trivial clang -Wextra-semi and -Wc++98-compat-extra-semi warnings * Removed trailing semi from various macros * Fixed various -Wunused-macros warnings * Fixed some trivial -Wdocumentation warnings * Fixed some -Wdocumentation-deprecated-sync warnings * Fixed incorrect indentation * Suppressed some clang warnings in 3rd party code * Fixed QRCodeEncoder::Params documentation. --------- Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2023-10-06 13:33:21 +03:00
jvuillaumier	24fd39538e	Merge pull request #24233 from jvuillaumier:rotate_flip_hal_hooks Add HAL implementation hooks to cv::flip() and cv::rotate() functions from core module #24233 Hello, This change proposes the addition of HAL hooks for cv::flip() and cv::rotate() functions from OpenCV core module. Flip and rotation are functions commonly available from 2D hardware accelerators. This is convenient provision to enable custom optimized implementation of image flip/rotation on systems embedding such accelerator. Thank you ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-10-06 12:31:53 +03:00
HAN Liutong	07bf9cb013	Merge pull request #24325 from hanliutong:rewrite Rewrite Universal Intrinsic code: float related part #24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by the CV_SIMD macro: rewrite them using the new Universal Intrinsic API. The series of PRs is listed below: #23885 First patch, an example #23980 Core module #24058 ImgProc module, part 1 #24132 ImgProc module, part 2 #24166 ImgProc module, part 3 #24301 Features2d and calib3d module #24324 Gapi module This patch (hopefully) is the last one in the series. This patch mainly involves 3 parts 1. Add some modifications related to float (CV_SIMD_64F) 2. Use `#if (CV_SIMD \|\| CV_SIMD_SCALABLE)` instead of `#if CV_SIMD \|\| CV_SIMD_SCALABLE`, so that the `CV_SIMD` blocks not yet enabled for `CV_SIMD_SCALABLE` can be found by searching for `if CV_SIMD` 3. Summary of `CV_SIMD` blocks that remain unmodified: Updated comments - Some blocks cause test failures when enabled for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks cannot be rewritten directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly related to `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. 
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-10-05 17:57:25 +03:00
Alexander Smorkalov	163d544ecf	Merge branch 4.x	2023-10-02 10:17:23 +03:00
casualwinds	7b399c4248	Merge pull request #24280 from casualwind:parallel_opt Optimization for parallelization when large core number #24280 Problem description: When the number of cores is large, OpenCV’s thread library may reduce performance when processing parallel jobs. The reason for this problem: When the number of cores is large (the thread pool initializes as many threads as there are cores), the main thread spends too much time waking up unnecessary threads. When a parallel job needs to be executed, the main thread wakes up all threads in sequence, and then waits for the job-completion signal after waking up all threads. When the number of threads is larger than the number of parallel slices of a job, the threads woken first may already have completed the job while the main thread is still waking up the others. The threads woken up after this point have nothing to do, and the broadcasts made by the waking threads take a lot of time, which reduces performance. Solution: Reduce the time the main thread spends waking up the worker threads through the following two methods: • The number of threads awakened by the main thread should be adjusted according to the number of parallel slices of a job. If the number of threads is greater than the number of parallel job slices, the total number of threads awakened should be reduced. • In the process of waking up threads in sequence, if the main thread finds that all parallel job slices have been allocated, it will jump out of the loop in time and wait for the job-completion signal. Performance Test: The tests were run in the manner described by https://github.com/opencv/opencv/wiki/HowToUsePerfTests. At core number = 160, there are big performance gains in some cases. 
Take the following cases in the video module as examples: OpticalFlowPyrLK_self::Path_Idx_Cn_NPoints_WSize_Deriv::("cv/optflow/frames/VGA_%02d.png", 2, 1, (9, 9), 11, true) Performance improves 191%:0.185405ms ->0.0636496ms perf::DenseOpticalFlow_VariationalRefinement::(320x240, 10, 10) Performance improves 112%:23.88938ms -> 11.2562ms Among all the modules, the performance improvement is greatest on module video, and there are also certain improvements on other modules. At core number = 160, the times labeled below are the geometric mean of the average time of all cases for one module. The optimization is available on each module. overall \| time(ms) \| \| \| \| \| \| \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- module name \| gapi \| dnn \| features2d \| objdetect \| core \| imgproc \| stitching \| video original \| 0.185 \| 1.586 \| 9.998 \| 11.846 \| 0.205 \| 0.215 \| 164.409 \| 0.803 optimized \| 0.174 \| 1.353 \| 9.535 \| 11.105 \| 0.199 \| 0.185 \| 153.972 \| 0.489 Performance improves \| 6% \| 17% \| 5% \| 7% \| 3% \| 16% \| 7% \| 64% Meanwhile, It is found that adjusting the order of test cases will have an impact on some test cases. For example, we used option --gtest-shuffle to run opencv_perf_gapi, the performance of TestPerformance::CmpWithScalarPerfTestFluid/CmpWithScalarPerfTest::(compare_f, CMP_GE, 1920x1080, 32FC1, { gapi.kernel_package }) case had 30% changes compared to the case without shuffle. I would like to ask if you have also encountered such a situation and could you share your experience? ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-09-27 16:21:20 +03:00
Maksim Shabunin	c3a37d0fcb	RISC-V: fix compilation in RVV scalable mode	2023-09-22 21:08:33 +03:00
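
Note on 7f1140b48b (cgroups v2 support): the commit message says that under cgroups v2 the quota and period are both stored in a single "cpu.max" file. The following is a minimal Python sketch of how a CPU count could be derived from that file's contents. It is not the code in parallel.cpp. The function name `cpus_from_cpu_max`, the `fallback` parameter, and the rounding rule (integer division, never below 1) are assumptions made for illustration.

```python
def cpus_from_cpu_max(text, fallback):
    """Derive a CPU count from the contents of a cgroup v2 cpu.max file.

    The file holds two fields, "<quota> <period>", in microseconds.
    A quota of "max" means no limit is set, so the caller-supplied
    fallback (for example, the hardware CPU count) is returned.
    Hypothetical helper; the rounding rule is an assumption.
    """
    fields = text.split()
    if len(fields) != 2:
        return fallback              # unexpected layout: do not guess
    quota, period = fields
    if quota == "max":
        return fallback              # no quota assigned to this cgroup
    try:
        quota_us = int(quota)
        period_us = int(period)
    except ValueError:
        return fallback              # non-numeric content
    if quota_us <= 0 or period_us <= 0:
        return fallback
    # Assumed policy: whole CPUs only, and at least one.
    return max(1, quota_us // period_us)
```

For example, a file containing "200000 100000" yields 2, while "max 100000" leaves the fallback value untouched.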

3079 Commits
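
Note on 7b399c4248 (parallel_opt): the two methods described in that commit message, capping the number of woken threads at the number of job slices and leaving the wake-up loop once every slice has been allocated, can be sketched roughly as below. This is a simplified single-threaded Python model, not OpenCV's thread pool. The names `wake_workers` and `slices_taken_after_wake` are hypothetical, and the callback stands in for whatever shared counter a real pool would consult.

```python
def wake_workers(pool_size, job_slices, slices_taken_after_wake):
    """Return the indices of the workers the main thread would wake.

    pool_size: number of worker threads in the pool.
    job_slices: number of parallel slices in the current job.
    slices_taken_after_wake: callable taking the index of the worker
        just woken and returning how many slices have been claimed so
        far (an assumed stand-in for a shared progress counter).
    """
    # Method 1: never wake more workers than there are slices.
    limit = min(pool_size, job_slices)
    woken = []
    for worker in range(limit):
        woken.append(worker)
        # Method 2: stop early once every slice is already allocated.
        if slices_taken_after_wake(worker) >= job_slices:
            break
    return woken
```

With a pool of 160 workers and a job of 4 slices, at most 4 workers are woken. If the workers woken first claim slices quickly, the loop ends even sooner.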