Handle warnings in loongson-related code #24925
See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16
Warnings needs to be handled before we add the loongson server to our CI.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918
relates #24909
relates #24917
relates #24892
Performance changes:
- [x] 12700K (1 thread) + Intel iGPU
|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|
Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map).
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).
- [x] 12700K + AMD dGPU
|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|
convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED.
dGPU has much more power, so results are x16-17 better than single cpu core.
Patched version is not worse than convertFp16 and convertTo baseline.
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
Removed all pre-C++11 code, workarounds, and branches #23736
This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.
* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.
---------
Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
- intrinsics implementation (071) reworked to use modern RVV intrinsics syntax
- cmake toolchain file (071) now allows selecting from predefined configurations
Co-authored-by: Fang Sun <fangsun@linux.alibaba.com>
python: accept path-like objects wherever file names are expected #24773
Merry Christmas, all 🎄
Implements #15731
Support is enabled for all arguments named `filename` or `filepath` (case-insensitive), or annotated with `CV_WRAP_FILE_PATH`.
Support is based on `PyOS_FSPath`, which is available in Python 3.6+. When running on older Python versions the arguments must have a `str` value as before.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
FreeBSD does not have the /proc file system. FreeBSD was added to the code path
for aarch64 before the use of the /proc file system with f7b4b750d8
but then /proc usage was added not long after with b3269b08a1
Fix to convert float32 to int32/uint32 with rounding to nearest (ties to even). #24271
Fix https://github.com/opencv/opencv/issues/24163
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
(carotene is BSD)
Add experimental support for Apple VisionOS platform #24136
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
This is dependent on cmake support for VisionOs which is currently in progress.
Creating PR now to test that there are no regressions in iOS and macOS builds
I do not have more info on the platform as it is internal.
Without this fix, the error is:
core/src/arithm.simd.hpp:868:1: error: too few arguments provided to function-like macro invocation
868 | DEFINE_SIMD_ALL(cmp)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:93:5: note: expanded from macro 'DEFINE_SIMD_ALL'
93 | DEFINE_SIMD_NSAT(fun, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:89:5: note: expanded from macro 'DEFINE_SIMD_NSAT'
89 | DEFINE_SIMD_F64(fun, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:77:9: note: expanded from macro 'DEFINE_SIMD_F64'
77 | DEFINE_NOSIMD(__CV_CAT(fun, 64f), double, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:47:56: note: expanded from macro 'DEFINE_NOSIMD'
47 | DEFINE_NOSIMD_FUN(fun_name, c_type, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:860:9: note: macro 'DEFINE_NOSIMD_FUN' defined here
860 | #define DEFINE_NOSIMD_FUN(fun, _T1, _Tvec, ...) \
This patch change lsx to baseline feature, and lasx to dispatch
feature. Additionally, the runtime detection methods for lasx and
lsx have been modified.
* add Winograd FP16 implementation
* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime
* fixed some warnings
* hopefully fixed winograd on x64 (and maybe other platforms)
---------
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
Get the SSE2 condition match the emmintrin.h inclusion condition. #24495
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Backport to 4.x: patchNaNs() SIMD acceleration #24480
backport from #23098
connected PR in extra: [#1118@extra](https://github.com/opencv/opencv_extra/pull/1118)
### This PR contains:
* new SIMD code for `patchNaNs()`
* CPU perf test
<details>
<summary>Performance comparison</summary>
Geometric mean (ms)
|Name of Test|noopt|sse2|avx2|sse2 vs noopt (x-factor)|avx2 vs noopt (x-factor)|
|---|:-:|:-:|:-:|:-:|:-:|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)|0.019|0.017|0.018|1.11|1.07|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)|0.037|0.037|0.033|1.00|1.10|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)|0.032|0.032|0.033|0.99|0.98|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)|0.072|0.072|0.070|1.00|1.03|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)|0.051|0.051|0.050|1.00|1.01|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)|0.137|0.138|0.128|0.99|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)|0.137|0.128|0.129|1.07|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)|0.450|0.450|0.448|1.00|1.01|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)|0.149|0.029|0.020|5.13|7.44|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)|0.304|0.058|0.040|5.25|7.65|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)|0.448|0.086|0.059|5.22|7.55|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)|0.601|0.133|0.083|4.51|7.23|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)|0.451|0.093|0.060|4.83|7.52|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)|0.892|0.184|0.126|4.85|7.06|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)|1.345|0.311|0.230|4.32|5.84|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)|1.831|0.546|0.436|3.35|4.20|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)|1.017|0.250|0.160|4.06|6.35|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)|2.077|0.646|0.605|3.21|3.43|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)|3.134|1.053|0.961|2.97|3.26|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)|4.222|1.436|1.288|2.94|3.28|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)|4.225|1.401|1.277|3.01|3.31|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)|8.310|2.953|2.635|2.81|3.15|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)|12.396|4.455|4.252|2.78|2.92|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)|17.174|5.831|5.824|2.95|2.95|
</details>
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Refactor ObjectiveC Range class #24454
### Pull Request Readiness Checklist
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
Fix for build issue in #24405
* Optimize some function with lasx.
Optimize some function with lasx. #23929
This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.
* hopefully fixed compile errors
* continue to fix CI
* another attempt to fix build on Linux aarch64
* * reverted to the original method to detect special arm neon instructions without signal()
* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
* removed extra whitespaces