Commit Graph

54 Commits

Author SHA1 Message Date
Alexander Smorkalov
daa8f7dfc6 Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
Maksim Shabunin
adde942e34 OCL: fix incompatibility with Mali ruintime 2023-12-21 00:30:44 +03:00
Alexander Smorkalov
1c0ca41b6e
Merge pull request #24371 from hanliutong:clean-up
Clean up the obsolete API of Universal Intrinsic
2023-10-20 12:50:26 +03:00
Vincent Rabaud
c96f48e7c9
Merge pull request #24412 from vrabaud:inter_area1
Speed up line merging in INTER_AREA #24412

This provides a 10 to 20% speed-up.

Related perf test fix: https://github.com/opencv/opencv/pull/24417
This is a split of https://github.com/opencv/opencv/pull/23525 that will be updated to only deal with column merging.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-19 14:06:50 +03:00
Liutong HAN
a287605c3e Clean up the Universal Intrinsic API. 2023-10-13 19:23:30 +08:00
HAN Liutong
5e9191558d
Merge pull request #24058 from hanliutong:rewrite-imgporc
Rewrite Universal Intrinsic code by using new API: ImgProc module. #24058

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the `opencv/modules/imgproc` folder: rewrite them by using the new Universal Intrinsic API. 

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR https://github.com/opencv/opencv/pull/23885 and https://github.com/opencv/opencv/pull/23980.



### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-09-14 20:37:46 +03:00
Dmitry Kurtaev
c92135bdd1
Merge pull request #23634 from dkurt:fix_nearest_exact
Fix even input dimensions for INTER_NEAREST_EXACT #23634

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/22204
related: https://github.com/opencv/opencv/issues/9096#issuecomment-1551306017

/cc @Yosshi999

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-05-19 20:32:04 +03:00
Sean McBride
58e4a880a2 Deprecated convertTypeStr and made new variant that also takes the buffer size
This allows removing the unsafe sprintf.
2023-04-26 09:48:15 -04:00
Sean McBride
47bea69322
Merge pull request #23055 from seanm:sprintf2
* Replaced most remaining sprintf with snprintf
* Deprecated encodeFormat and introduced new method that takes the buffer length
* Also increased buffer size at call sites to be a little bigger, in case int is 64 bit
2023-04-18 09:22:59 +03:00
Sean McBride
1829eba584 Fixed most clang -Wextra-semi warnings 2022-09-27 18:06:46 -04:00
wxsheng
4154bd0667
Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX
* Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX
* Add resize.lasx.cpp for Loongson SIMD acceleration
* Add imgwarp.lasx.cpp for Loongson SIMD acceleration
* Add LASX acceleration support for dnn/conv
* Add CV_PAUSE(v) for Loongarch
* Set LASX by default on Loongarch64
* LoongArch: tune test threshold for Core/HAL.mat_decomp/15

Co-authored-by: shengwenxue <shengwenxue@loongson.cn>
2022-09-10 09:39:43 +03:00
Alexander Alekhin
659cf7249e imgproc(ocl): fix resizeLN, avoid integer overflow 2021-12-09 20:30:26 +00:00
yuki takehara
a6277370ca
Merge pull request #21107 from take1014:remove_assert_21038
resolves #21038

* remove C assert

* revert C header

* fix several points in review

* fix test_ds.cpp
2021-11-27 18:34:52 +00:00
Yosshi999
7495a4722f
Merge pull request #18053 from Yosshi999:bit-exact-resizeNN
Bit-exact Nearest Neighbor Resizing

* bit exact resizeNN

* change the value of method enum

* add bitexact-nn to ResizeExactTest

* test to compare with non-exact version

* add perf for bit-exact resizenn

* use cvFloor-equivalent

* 1/3 scaling is not stable for floating calculation

* stricter test

* bugfix: broken data in case of 6 or 12bytes elements

* bugfix: broken data in default pix_size

* stricter threshold

* use raw() for floor

* use double instead of int

* follow code reviews

* fewer cases in perf test

* center pixel convention
2020-08-28 21:20:05 +03:00
Alexander Alekhin
61c4cfd896 imgproc(resize): drop unused 'pix_size4' 2020-03-29 02:41:50 +00:00
Alexander Alekhin
be17f532e1 imgproc(resize): fix resizeNNInvoker handling of generic pixel size 2020-03-29 02:41:41 +00:00
Alexander Alekhin
3a546aa380 imgproc: revert resize changes from PR 16497 2020-02-17 15:23:59 +03:00
keeper121
d84360e7f3
Merge pull request #16497 from keeper121:master
* Fix NN resize with dimentions > 4

* add test check for nn resize with channels > 4

* Change types from float to double

* Del unnecessary test file. Move nn test to test_imgwarp. Add 5 channels test only.
2020-02-16 19:33:25 +03:00
Alexander Alekhin
f67c8e37d6 imgproc(resize): drop optimization for channels>4 2020-02-04 17:14:52 +03:00
Paul E. Murphy
c1cdb2416a imgproc(resize): improve 8u3 HResize vector exit calc
Actually, we can do this in constant time. xofs always
contains same or increasing offset values. We can instead
find the most extreme value used and never attempt to load it.

Similarly, we can note for all dx >= 0 and dx < (dwidth - cn)
where xofs[dx] + cn < xofs[dwidth-cn] implies dx < (dwidth - cn).

Thus, we can use this to control our loop termination optimally.

This fixes #16137 with little or no performance impact. I have
also added a debug check as a sanity check.
2020-01-03 14:46:59 -06:00
Alexander Alekhin
07729e396d imgproc(resize): avoid unnecessary type conversions 2019-12-26 00:02:52 +00:00
Alexander Alekhin
4733a19bab
Merge pull request #16194 from alalek:fix_16192
* imgproc(test): resize(LANCZOS4) reproducer 16192

* imgproc: fix resize LANCZOS4 coefficients generation
2019-12-19 13:20:42 +03:00
Vitaly Tuzov
f5a84f75c4 Fix for CV_8UC2 linear resize vectorization 2019-12-18 21:41:36 +00:00
Paul Murphy
1c4a64f0a1 Merge pull request #16138 from pmur:reg_16137
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization

The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.

The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.

Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.

This should resolve #16137

* imgproc(resize): extra check
2019-12-12 13:00:44 +03:00
Paul Murphy
a011035ed6 Merge pull request #15257 from pmur:resize
* resize: HResizeLinear reduce duplicate work

There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.

The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this.  The performance
improvement is 10-50% depending on the scaling.

* imgproc: vectorize HResizeLinear

Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.

While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.

* Update U8 processing for non-bitexact linear resize

* core: hal: vsx: improve v_load_expand_q

With a little help, we can do this quickly without gprs on
all VSX enabled targets.

* resize: Fix cn == 3 step per feedback

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
2019-12-09 14:54:06 +03:00
clunietp
2185bce4b7 Fix 13577 2019-11-18 07:41:34 -05:00
Vitaly Tuzov
3b015dfc7d Merge pull request #14210 from terfendail:wui_512
AVX512 wide universal intrinsics (#14210)

* Added implementation of 512-bit wide universal intrinsics(WIP)

* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks

* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros

* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines

* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.

* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build

* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Vitaly Tuzov
99b39aa5bd Fixed out of bound reading in LINEAR_EXACT resize for 8UC3 2019-03-05 17:21:21 +03:00
Vitaly Tuzov
334c4d62b5 Merge pull request #13781 from terfendail:warp_wintr
Resize reworked using wide universal intrinsics (#13781)

* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize

* Reworked linear resize using new wide LUT intrinsics

* Fix for VSX intrinsics
2019-02-20 14:30:28 +03:00
Alexander Alekhin
2d5ccc7b3e imgproc(resize): update checks (static analyzers) 2018-12-03 13:13:48 +03:00
maver1
e397434cb6 Merge pull request #12877 from maver1:3.4
* Updated ICV packages and IPP integration

* core(test): minMaxIdx IPP regression test

* core(ipp): workaround minMaxIdx problem

* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun

* Returned semicolon after CV_INSTRUMENT_REGION_IPP()
2018-10-24 15:02:53 +03:00
Michał Janiszewski
c8e6ce304f Catch exceptions by const-reference
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
2018-10-16 22:43:54 +02:00
take1014
24af70c7e0 resolves 11283 2018-10-12 23:08:25 +09:00
Vitaly Tuzov
9d602f2752 Replaced SSE2 area resize implementation with wide universal intrinsic implementation 2018-10-08 16:27:52 +03:00
Alexander Alekhin
92ec971453 Merge pull request #12526 from terfendail:avx2_resize_fix 2018-09-14 15:57:47 +00:00
Hamdi Sahloul
5d54def264 Add semicolons after CV_INSTRUMENT macros 2018-09-14 06:45:31 +09:00
Vitaly Tuzov
29770e13e8 Fixed bit-exact resize SIMD implementation for AVX2 baseline 2018-09-13 18:20:27 +03:00
Hamdi Sahloul
a39e0daacf Utilize CV_UNUSED macro 2018-09-07 20:33:52 +09:00
Vitaly Tuzov
f9a5c4d181 Fixed bit-exact resize wide intrinsics implementation for 16U 2018-09-03 20:37:25 +03:00
Vitaly Tuzov
e345cb03d5 Bit-exact resize reworked to use wide intrinsics (#12038)
* Bit-exact resize reworked to use wide intrinsics

* Reworked bit-exact resize row data loading

* Added bit-exact resize row data loaders for SIMD256 and SIMD512

* Fixed type punned pointer dereferencing warning

* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize
2018-08-31 16:54:05 +03:00
Alexander Alekhin
b09a4a98d4 opencv: Use cv::AutoBuffer<>::data() 2018-07-04 19:11:29 +03:00
gnthibault
b46fef327e Fixed Assertin error due to Size.area() overflowing 2018-06-08 11:22:36 +02:00
Alexander Alekhin
5d36ee2fe7 imgproc: apply CV_OVERRIDE/CV_FINAL 2018-03-28 17:57:59 +03:00
Maksim Shabunin
8b87c4b96a Fixed several warnings produced by clang 6 and static analyzers 2018-01-16 15:26:28 +03:00
Alexander Alekhin
8acd05f12a Merge pull request #10421 from cezheng:patch-1 2018-01-05 09:24:33 +00:00
Vadim Pisarevsky
3f68d6d8a7 Merge pull request #10392 from terfendail:bitexact_fallback 2017-12-22 13:23:55 +00:00
Vitaly Tuzov
5fdb42a7c9 Added fallback to generic linear resize in case bit-exact resize of provided matrix isn't supported 2017-12-22 14:29:50 +03:00
Ce Zheng
602b08d9c7
Update resize inline comments
Reading through the implementation, I feel this line of comment is not consistent with the actually code, so this is for correcting it.
2017-12-22 16:03:12 +08:00
Vitaly Tuzov
019162486c Disabled universal intrinsic based implementation for bit-exact resize of 3-channel images 2017-12-22 10:08:30 +03:00
Vitaly Tuzov
1eb2fa9efb Added universal intrinsics based implementations for CV_8UC2, CV_8UC3, CV_8UC4 bit-exact resizes. 2017-12-20 17:17:10 +03:00