Commit Graph

55 Commits

Author SHA1 Message Date
Everton Constantino
75315fb297 Merge pull request #15494 from everton1984:hal_vector_get_n
Improving VSX performance of integral function

* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.

* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.

* Adding SSE/NEON/AVX intrinsics.

* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.

* core(simd): add tests for v_extract_n/v_broadcast_element

- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations

* core(simd): fix compilation

- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only

* cleanup
2019-11-20 13:41:07 +03:00
Vitaly Tuzov
3b015dfc7d Merge pull request #14210 from terfendail:wui_512
AVX512 wide universal intrinsics (#14210)

* Added implementation of 512-bit wide universal intrinsics(WIP)

* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks

* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros

* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines

* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.

* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build

* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Brad Kelly
0fe17eeb68 Implementing AVX512 Support for 1 channel mats for CV_64F format 2019-03-22 09:44:23 -07:00
Brad Kelly
507f8add1c Implementing AVX512 Support for 2 and 4 channel mats for CV_64F format 2019-02-19 11:31:20 -08:00
Brad Kelly
0165ffa90d Implementing AVX512 support for 3 channel cv::integral for CV_64F 2019-01-14 16:11:01 -08:00
Vitaly Tuzov
6b84990620 integral() implementation updated to utilize wide universal intrinsics 2018-10-01 17:25:43 +03:00
Hamdi Sahloul
5d54def264 Add semicolons after CV_INSTRUMENT macros 2018-09-14 06:45:31 +09:00
Alexander Alekhin
b09a4a98d4 opencv: Use cv::AutoBuffer<>::data() 2018-07-04 19:11:29 +03:00
Alexander Alekhin
0ed3209b00 ocl: avoid unnecessary loading/initializing OpenCL subsystem
If there are no OpenCL/UMat methods calls from application.

OpenCL subsystem is initialized:
- haveOpenCL() is called from application
- useOpenCL() is called from application
- access to OpenCL allocator: UMat is created (empty UMat is ignored) or UMat <-> Mat conversions are called

Don't call OpenCL functions if OPENCV_OPENCL_RUNTIME=disabled
(independent from OpenCL linkage type)
2017-11-28 14:02:42 +03:00
Pavel Vlasov
11c2ffaf1c Update for IPP for OpenCV 2017u2 integration;
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
2017-04-25 15:53:12 +03:00
Maksim Shabunin
3d5c0f1faf HAL interface for cv::integral 2016-09-29 12:12:10 +03:00
Pavel Vlasov
30a6cee2fe Instrumentation for OpenCV API regions and IPP functions; 2016-08-19 18:10:03 +03:00
Pavel Vlasov
680ca88ce0 Outdated ICV restrictions were removed; 2016-08-19 15:08:39 +03:00
Pavel Vlasov
89eee6ca99 Fixes for IPP integration:
dotProd_16s - disabled for IPP 9.0.0;
filter2D - fixed kernel preparation;
morphology - conditions fix and disabled FilterMin and FilterMax for IPP 9.0.0;
GaussianBlur - disabled for CV_8UC1 due to buffer overflow;
integral - disabled for IPP 9.0.0;

IppAutoBuffer class was added;
2015-10-12 10:51:28 +03:00
Pavel Vlasov
e837d69f8f IPPInitSingelton was added to contain IPP related global variables;
OPENCV_IPP env var now allows to select IPP architecture level for IPP9+;
IPP initialization logic was unified across modules;
2015-10-01 09:58:48 +03:00
Vadim Pisarevsky
56e637d5f4 Merge pull request #4135 from lupustr3:ipp_code_refactoring 2015-06-24 16:18:55 +00:00
Dmitry Budnikov
a5a21019b2 ipp_countNonZero build fix;
Removed IPP port for tiny arithm.cpp functions

Additional warnings fix on various platforms.

Build without OPENCL and GCC warnings fixed

Fixed warnings, trailing spaces and removed unused secure_cpy.

IPP code refactored.

IPP code path  implemented as separate static functions to simplify future work with IPP code and make it more readable.
2015-06-18 12:47:07 +03:00
Vadim Pisarevsky
5a94a95fbf improvements in Haar CascadeClassifier: 1) use CV_32S instead of CV_32F for the integral of squares (which is more accurate and more efficient); 2) skip the window if its contrast is too low 2015-05-28 19:33:21 +03:00
Ilya Lavrenov
fc0869735d used popcnt 2015-01-12 10:59:30 +03:00
Ilya Lavrenov
9436103ed6 SSE2 optimization of cv::integral 2014-12-29 15:51:55 +03:00
Pavel Vlasov
45958eaabc Implementation detector and selector for IPP and OpenCL;
IPP can be switched on and off on runtime;

Optional implementation collector was added (switched off by default in CMake). Gathers data of implementation used in functions and report this info through performance TS;

TS modifications for implementations control;
2014-10-15 14:24:41 +04:00
Adil Ibragimov
8a4a1bb018 Several type of formal refactoring:
1. someMatrix.data -> someMatrix.prt()
2. someMatrix.data + someMatrix.step * lineIndex -> someMatrix.ptr( lineIndex )
3. (SomeType*) someMatrix.data -> someMatrix.ptr<SomeType>()
4. someMatrix.data -> !someMatrix.empty() ( or !someMatrix.data -> someMatrix.empty() ) in logical expressions
2014-08-13 15:21:35 +04:00
Alexander Alekhin
55188fe991 world fix 2014-08-05 20:12:35 +04:00
vbystricky
09bcc061dd Change kernel for optimization. Remove restriction to align data
Fix kernel compilation errors on AMD system

Fix licanse information in cl file

Support CV_64F destination type

Change build options of the kernel

Optimize sum of square

Remove separate kernel for integral square

Increase epsilon for perfomance tests

Increase epsilon for perfomance tests

Test double support on AMD devices

Fix some issues

Try to fix problems with AMD device

Try to solve problem with AMD device

Fix error of destination size in kernel

Fix warnings
2014-06-24 18:32:52 +04:00
vbystricky
606df0469a Fix pointer conversion 2014-06-16 18:14:05 +04:00
vbystricky
504bc7634a Remove pre_invalid parameter 2014-06-16 13:07:39 +04:00
Alexander Alekhin
1f638a3e5b icv: enable functions 2014-05-12 15:38:38 +04:00
vbystricky
c9103730b6 Disable ipp integral in IPPICV version 2014-04-30 12:55:34 +04:00
Ilya Lavrenov
ce0941160e added status check 2014-04-17 11:08:02 +04:00
Vadim Pisarevsky
f417c79d16 Merge pull request #1932 from seth-planet:master 2014-04-10 13:36:14 +04:00
vbystricky
38913bf5f6 Change all 'ippStsNoErr==' to '0<=', and all 'ippStsNoErr!=' to '0>' 2014-04-07 14:31:34 +04:00
vbystricky
824ed8a3ae Fix errors 2014-04-07 14:31:31 +04:00
vbystricky
1b3651d8ee Undo changes ipp to ippicv prefix of function names 2014-04-07 14:30:03 +04:00
vbystricky
a9a0ea3706 Fix error not initialized IppStatus before ipp functions call 2014-04-07 14:26:50 +04:00
vbystricky
01a66a2938 Prepare codes for ippicv library 2014-04-07 14:24:05 +04:00
sprice
75ed2f52f1 Merge branch 'master' of https://github.com/Itseez/opencv
Conflicts:
	modules/features2d/include/opencv2/features2d.hpp
	modules/features2d/src/freak.cpp
	modules/features2d/src/stardetector.cpp
2014-03-06 15:39:06 -08:00
Ilya Lavrenov
78c2b3ca2a refactored imgproc 2014-01-27 18:47:16 +04:00
Leszek Swirski
9c22d4887c imgproc: IPP compilation fix and minor cleanup 2013-12-20 12:40:40 +00:00
sprice
1c15776cf5 Merge branch 'master' of https://github.com/Itseez/opencv
Conflicts:
	modules/imgproc/src/sumpixels.cpp
2013-12-11 14:35:51 -08:00
Ilya Lavrenov
73aa43d2ca fixed warnings 2013-12-07 01:05:07 +04:00
Ilya Lavrenov
3eaa8f149b added cv::intergal to T-API 2013-12-06 13:18:25 +04:00
sprice
2cc11e2c6a Updating STAR detector and FREAK descriptor to work with large and/or 16-bit images 2013-12-04 18:13:34 -08:00
Ilya Lavrenov
036e99d03a fixed ipp-related warnings 2013-10-05 14:54:00 +04:00
Roman Donchenko
3c137f7a04 Converted tabs to spaces. 2013-08-21 18:59:26 +04:00
kdrobnyh
f8eb806565 Add IPP support to integral function 2013-07-10 11:25:36 +04:00
Andrey Kamaev
bd0e0b5800 Merged the trunk r8589:8653 - all changes related to build warnings 2012-06-15 13:04:17 +00:00
Andrey Kamaev
95d659a3cf Refactored Tegra related macro usage 2011-12-27 11:21:45 +00:00
Andrey Kamaev
583ceef6a5 Terga optimization for integral_8u32s 2011-10-31 08:00:20 +00:00
Vadim Pisarevsky
2d2b8a496e renamed "None()" to "noArray()" to avoid conflicts with X11 (ticket #1122) 2011-06-08 06:55:04 +00:00
Vadim Pisarevsky
0c877f62e9 replaced "const InputArray&" => "InputArray"; made InputArray and OutputArray references. added "None()" constant (no array()). 2011-06-06 14:51:27 +00:00