1760 Commits

Author SHA1 Message Date
Alexander Alekhin
6158bd2afa Merge pull request #15103 from alalek:simd_intrinsics_in_user_code 2019-07-25 11:36:36 +00:00
Hugo Lindström
2ee00e7f7d Merge pull request #15059 from hugolm84:improved-support-for-wince
* Improve support for Windows Embedded Compact

* Remove redundant set(WINCE true) and format CMake
2019-07-24 23:12:09 +03:00
Alexander Alekhin
8bac8b513c core: support SIMD intrinsics in user code 2019-07-19 20:33:32 +00:00
Alexander Alekhin
002904e445 Merge pull request #15050 from alalek:core_fix_base64_packed_struct 2019-07-18 19:07:06 +00:00
Alexander Alekhin
4ea8526e9f core(persistence): fix writeRaw() / readRaw() struct support
- writeRaw(): support structs
- readRaw(): 'len' is buffer limit in bytes (documentation is fixed)
2019-07-16 14:03:39 +03:00
Hugo Lindström
245c256b1c Support compilation for <=VS13 2019-07-12 19:02:36 +02:00
Alexander Alekhin
69560588fe Merge pull request #14953 from alalek:core_static_analysis_eval_expr 2019-07-02 09:44:29 +00:00
Vitaly Tuzov
9befb7a1d7 Merge pull request #14916 from terfendail:wsignmask_deprecated
* Avoid using v_signmask universal intrinsic and mark it as deprecated

* Renamed v_find_negative to v_scan_forward
2019-07-01 19:53:51 +03:00
Alexander Alekhin
44836c7f78 core: evaluate CV_Error() parameters during static scans 2019-07-01 18:17:03 +03:00
Stefan Brüns
e9a2e665b2 Explicitly default operator= for Vec<T, n>
Due to the explicitly declared copy constructor Vec<T, n>::Vec(Vec <T,n>&)
GCC 9 warns if there is no assignment operator, as having one typically
requires the other (rule-of-three, constructor/destructor/assignment).

As the values are just a plain array the default assignment operator does
the right thing. Tell the compiler explicitly to default it.

Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
2019-06-29 22:11:00 +02:00
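The fix described in e9a2e665b2 can be illustrated with a minimal sketch. `SmallVec` below is a hypothetical stand-in, not OpenCV's `cv::Vec`, and the warning involved is assumed to be GCC 9's `-Wdeprecated-copy`. The type has a user-declared copy constructor and stores its values in a plain array, so explicitly defaulting the copy assignment operator gives the correct element-wise copy.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-size vector type; NOT OpenCV's cv::Vec.
template <typename T, int n>
struct SmallVec {
    T val[n];

    SmallVec() : val() {}  // value-initialize all elements

    // User-declared copy constructor. With only this declared, relying on the
    // implicitly generated copy assignment is deprecated, which is what the
    // compiler warning (assumed: -Wdeprecated-copy) points out.
    SmallVec(const SmallVec& other) {
        for (int i = 0; i < n; ++i) val[i] = other.val[i];
    }

    // The members are a plain array, so the compiler-generated assignment
    // already does the right thing. Request it explicitly.
    SmallVec& operator=(const SmallVec&) = default;
};

// Small self-check: assignment copies every element.
inline void demoSmallVecAssignment() {
    SmallVec<int, 3> a;
    a.val[0] = 1; a.val[1] = 2; a.val[2] = 3;
    SmallVec<int, 3> b;
    b = a;  // uses the explicitly defaulted operator=
    assert(b.val[0] == 1 && b.val[1] == 2 && b.val[2] == 3);
}
```

Defaulting the operator, rather than hand-writing a loop, keeps the type's copy semantics identical to what the compiler would have generated anyway.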
Rostislav Vasilikhin
f2f600f807 fixed multi instrumentations 2019-06-27 01:17:26 +03:00
Alexander Alekhin
e8a703a71d core(intrin): v_load_low() workaround for aarch64+clang 2019-06-25 17:29:04 +03:00
Alexander Alekhin
779f59da6b pre: OpenCV 3.4.7 (version++) 2019-06-21 16:57:17 +03:00
Alexander Alekhin
aa6c66aa54 Merge pull request #14848 from alalek:build_warnings_avx512 2019-06-21 13:53:52 +00:00
Alexander Alekhin
5ac55fc132 core: eliminate AVX512 build warnings
from MSVS2017 and GCC8 -O1 mode
2019-06-20 20:00:09 +03:00
Alexander Alekhin
681e0323f2 core: backport toLowerCase()/toUpperCase() 2019-06-20 17:48:18 +03:00
Vitaly Tuzov
a29e59a770 Rename parameters in AVX512 implementation of v_load_deinterleave and v_store_interleave 2019-06-14 14:16:30 +03:00
Vitaly Tuzov
d2aadabc5e Merge pull request #14743 from terfendail:wui512_fixvswarn
Fix for MSVS2019 build warnings (#14743)

* AVX512 arch support for MSVS

* Fix for MSVS2019 build warnings: updated integral() AVX512 implementation

* Fix for MSVS2019 build warnings: reworked v_rotate_right AVX512 implementation

* fix indentation
2019-06-11 23:07:39 +03:00
Alexander Alekhin
52644f067e Merge pull request #14764 from alalek:core_intrin_drop_hasSIMD_checks 2019-06-09 17:11:45 +00:00
Alexander Alekhin
6d916c5bb4 Merge pull request #14440 from alalek:async_array 2019-06-08 20:57:15 +00:00
Alexander Alekhin
1e9ad5476d core(intrin): drop hasSIMD128 checks
- use compile-time checks instead (`#if CV_SIMD128`)
- runtime checks are useless
2019-06-08 19:20:20 +00:00
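The compile-time approach named in 1e9ad5476d can be sketched without OpenCV. `DEMO_SIMD128` below is a hypothetical macro that stands in for `CV_SIMD128`, and the SSE2 path is only an assumed example of a 128-bit backend. The vector branch is selected by the preprocessor, so no runtime "has SIMD" query is needed.

```cpp
#include <cassert>
#include <cstdint>

// DEMO_SIMD128 is a hypothetical stand-in for CV_SIMD128: it is decided at
// compile time from the target's instruction set, not queried at run time.
#if defined(__SSE2__)
#  include <emmintrin.h>
#  define DEMO_SIMD128 1
#else
#  define DEMO_SIMD128 0
#endif

// Sum of 32-bit integers: 128-bit vector loop when available, scalar tail otherwise.
inline std::int32_t sumInt32(const std::int32_t* src, int n) {
    int i = 0;
    std::int32_t total = 0;
#if DEMO_SIMD128
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        acc = _mm_add_epi32(acc, v);  // four lane-wise additions per step
    }
    std::int32_t lanes[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i) total += src[i];  // scalar tail, or the whole array without SIMD
    return total;
}

// Small self-check that gives the same answer on either code path.
inline void demoSumInt32() {
    const std::int32_t data[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    assert(sumInt32(data, 10) == 55);
}
```

If the code was compiled with the vector path enabled, the target is already assumed to support it, which is why a separate runtime check adds nothing.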
Alexander Alekhin
4a8fd71a2e core: fix visibility handling 2019-06-07 07:23:15 +00:00
Alexander Alekhin
aab9ef4290 Merge pull request #14667 from asashour:javadoc 2019-06-06 10:57:39 +00:00
Ahmed Ashour
5c56b8ce92 java: generated code to have javadoc 2019-06-05 12:44:03 +02:00
Ahmed Ashour
1aca1d582e Fix some typos 2019-06-05 12:24:13 +02:00
Vitaly Tuzov
3b015dfc7d Merge pull request #14210 from terfendail:wui_512
AVX512 wide universal intrinsics (#14210)

* Added implementation of 512-bit wide universal intrinsics(WIP)

* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks

* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros

* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines

* Added implementation of 512-bit wide universal intrinsics(WIP): Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.

* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build

* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Vitaly Tuzov
723165f878 fix for AVX2 version of v_reduce_min intrinsic 2019-05-31 16:14:54 +03:00
Vitaly Tuzov
f0fb91f2d4 Fixed v_signmask implementation for AVX2, updated universal intrinsics tests. 2019-05-24 19:34:54 +03:00
Alexander Alekhin
9340af1a8a core: Async API / AsyncArray 2019-05-18 19:32:23 +00:00
catree
b5e2ec4ea4 Fix typo in NormTypes documentation. 2019-05-16 19:22:41 +02:00
Vitaly Tuzov
7a55f2af3b Updated AVX2 implementation of v_popcount for u8. 2019-05-15 19:39:25 +03:00
Vitaly Tuzov
1220dd4877 Updated v_popcount description, reference implementation and test. 2019-05-14 18:59:40 +03:00
Vitaly Tuzov
96ab78dc4f Reworked v_popcount implementation to provide number of bits in a single lane 2019-05-14 18:59:38 +03:00
Sayed Adel
5a77f4cee3 Merge pull request #14007 from seiko2plus:core_avx512_infa
* core: improve AVX512 infrastructure by adding more CPU features groups

* cmake: use groups for AVX512 optimization flags

* core: remove gap in CPU flags enumeration

* cmake: restore default CPU_DISPATCH
2019-05-05 14:19:49 +03:00
Sayed Adel
afb157df67 core:vsx fix sum of v_reduce_sad 2019-04-27 02:01:24 +02:00
Alexander Alekhin
b95fdc1992 Merge pull request #14394 from alalek:build_support_memory_sanitizers 2019-04-26 16:13:52 +00:00
Alexander Alekhin
d17699363c Merge pull request #14385 from terfendail:intrin_sad 2019-04-26 15:34:02 +00:00
Vitaly Tuzov
18d10d6b86 Fixed v_reduce_sad intrinsics implementation and added tests 2019-04-24 14:53:59 +03:00
Alexander Alekhin
c1981f28ad build: +OPENCV_ENABLE_MEMORY_SANITIZER flag 2019-04-22 21:35:25 +00:00
Vitaly Tuzov
4a54aa3fbd Cleared up deprecated intrinsics for FP16 2019-04-22 10:35:37 +03:00
Alexander Alekhin
b38de57f9a ts: test tags for flexible/reliable tests filtering
- added functionality to collect memory usage of OpenCL subsystem
- memory usage of fastMalloc() (disabled by default):
  * It is not accurate sometimes - external memory profiler is required.
- specify common `CV_TEST_TAG_` macros
- added applyTestTag() function
- write memory usage / enabled tags into Google Tests output file (.xml)
2019-04-08 19:12:49 +00:00
Alexander Alekhin
dad2247b56 Merge tag '3.4.6' 2019-04-07 11:02:40 +00:00
Alexander Alekhin
33b765d797 OpenCV version++ (3.4.6)
OpenCV 3.4.6
2019-04-06 21:43:23 +00:00
Alexander Alekhin
d6b82dcd65 Merge pull request #14162 from alalek:eliminate_coverity_scan_issues
core: eliminate coverity scan issues (#14162)

* core(hal): avoid using of r,g,b,a parameters in interleave/deinterleave

- static analysis tools blame on possible parameters reordering
- align AVX parameters with corresponding SSE/NEON/VSX/cpp code

* core: avoid "i,j" parameters in Matx methods

- static analysis tools blame on possible parameters reordering

* core: resolve coverity scan issues
2019-03-27 15:48:00 +03:00
Alexander Alekhin
55366caecd Merge pull request #14155 from alalek:fix_macos_ocl_warnings_3.4 2019-03-26 15:34:49 +00:00
Alexander Alekhin
6686559c70 ocl: define CL_SILENCE_DEPRECATION on MacOSX 2019-03-26 13:11:53 +03:00
Alexander Alekhin
cedd78d526 Merge pull request #14142 from mshabunin:fix-c-api-3.4 2019-03-25 18:58:28 +00:00
Maksim Shabunin
41da3ef1d2 Fixed cvdef.h for MSVC C users 2019-03-25 16:44:08 +03:00
Sayed Adel
f41359688b core:vsx Add support for VSX3 half precision conversions 2019-03-20 10:19:42 +02:00
Sayed Adel
4fe2d9bdbc core:vsx Several improvements(3)
* optimize v_lut_deinterleave
* optimize v_interleave_/pairs/quads/triplets
* optimize v_lut, use vec_extract instead of aligned store
2019-03-19 12:30:50 +02:00
Sayed Adel
872e7894b4 core:vsx working around gcc aligned memory access bug
- allow cmake to check sanity of vsx aligned ld/st
- force universal intrinsics v_load_aligned/v_store_aligned
  to fall back to unaligned ld/st if the cmake runtime vsx aligned test fails
2019-03-14 01:55:40 +02:00
Alexander Alekhin
80e5642ca2 pre: OpenCV 3.4.6 (version++) 2019-03-12 13:29:42 +03:00
Alexander Alekhin
842c58a7d6 core(intrin): NEON v_load_expand_q() support unaligned addr 2019-03-11 12:06:05 +00:00
Alexander Alekhin
8b541e450b imgproc: dispatch color*
Lab/XYZ modes have been postponed (color_lab.cpp):
- need to split code for tables initialization and for pixels processing first
- no significant performance improvements for switching between SSE42 / AVX2 code generation
2019-03-07 15:45:05 +03:00
Sayed Adel
5478165e16 core:vsx Fix narrowing warning on vector splats 2019-03-01 00:48:38 +00:00
Alexander Alekhin
a9f67c2d1d Merge pull request #13905 from terfendail:pyr_wintr2 2019-02-28 14:53:42 +00:00
berak
20afae5a14 core: fix mat matx multiplication 2019-02-28 14:22:54 +01:00
Vitaly Tuzov
9548093b46 Horizontal line processing for pyrDown() reworked using wide universal intrinsics. 2019-02-28 00:12:57 +03:00
Vitaly Tuzov
334c4d62b5 Merge pull request #13781 from terfendail:warp_wintr
Resize reworked using wide universal intrinsics (#13781)

* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize

* Reworked linear resize using new wide LUT intrinsics

* Fix for VSX intrinsics
2019-02-20 14:30:28 +03:00
Alexander Alekhin
cd66f6e3db core: dispatch matmul
- gemm: keep baseline only (lapack is 10x+ faster, let's reduce binary size)
- transform / distTransform
- scaleAdd (32f/64f only)
- Mahalanobis: keep baseline only (no perf tests)
- mulTransposed: keep baseline only (no perf tests)
- dot
2019-02-18 14:36:46 +03:00
klemens
5d9c6723ee spelling fixes
backport 997b7b18af
2019-02-11 15:35:10 +03:00
Namgoo Lee
fb8e652c3f Add CV_16UC1 support for cuda::CLAHE
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.

The amount of memory needed for building histogram is:

    65536 * 4byte = 256KB

and shared memory limit is 48KB typically.

Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.

There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.
2019-02-06 17:21:55 +00:00
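The memory arithmetic in fb8e652c3f can be checked with a short sketch. The helper names are hypothetical. The 4-byte bin size and the 48 KB shared memory limit are taken from the commit message itself, which calls the limit typical, not universal.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a full histogram: one bin per possible pixel value.
constexpr std::size_t histogramBytes(unsigned bitsPerPixel, std::size_t bytesPerBin) {
    return (std::size_t{1} << bitsPerPixel) * bytesPerBin;
}

// Shared memory limit quoted in the commit message ("48KB typically").
constexpr std::size_t kTypicalSharedLimitBytes = 48 * 1024;

constexpr bool fitsInSharedMemory(std::size_t bytes) {
    return bytes <= kTypicalSharedLimitBytes;
}

// CV_16UC1: 65536 bins * 4 bytes = 256 KB, which exceeds the limit.
static_assert(histogramBytes(16, 4) == 256 * 1024, "16-bit histogram is 256 KB");
static_assert(!fitsInSharedMemory(histogramBytes(16, 4)), "16-bit case needs global memory");

// CV_8UC1: 256 bins * 4 bytes = 1 KB, which fits.
static_assert(fitsInSharedMemory(histogramBytes(8, 4)), "8-bit case fits in shared memory");

// Runtime mirror of the compile-time checks above.
inline void demoClaheHistogramSize() {
    assert(histogramBytes(16, 4) == 262144);
    assert(histogramBytes(8, 4) == 1024);
}
```

The 16-bit histogram is more than five times the quoted limit, which is consistent with the commit building it in global memory for that case.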
Alexander Alekhin
dc5e69b4d4 Revert "Merge pull request #13586 from eightco:Core_bugfix3"
This reverts commit 3721c8bb06
except changes in modules/dnn/test/test_tf_importer.cpp
2019-01-18 18:29:12 +03:00
Lee Jaehwan
3721c8bb06 Merge pull request #13586 from eightco:Core_bugfix3
* Add Operator override for multi-channel Mat with literal constant.

* simple test

* Operator overloading channel constraint for primitive types

* fix some test for #13586
2019-01-17 17:23:09 +03:00
Vitaly Tuzov
ea882d58c6 Added CV_ALWAYS_INLINE macro 2019-01-11 22:40:35 +03:00
Alexander Alekhin
b11566bfc7 Merge pull request #13553 from luctowers:master 2019-01-09 13:33:45 +00:00
Lucas Towers
9cc12ff0ac Fix improper defining of CV_XADD when using Intel C++ 2019-01-09 14:41:21 +03:00
Namgoo Lee
4b4874e67a Remove build warning msg with CUDA10.0 2019-01-08 10:57:12 +09:00
Vitaly Tuzov
c8f59bf1e0 Fixed operations on Mat and Matx simultaneously 2018-12-25 19:22:09 +03:00
Alexander Alekhin
f35e043cf9 Merge tag '3.4.5' 2018-12-21 21:48:03 +03:00
Alexander Alekhin
8f1356c3c5 OpenCV version++ (3.4.5)
OpenCV 3.4.5
2018-12-21 17:31:20 +03:00
Vitaly Tuzov
06f32e3b3e Reworked separable filter to use wide universal intrinsics 2018-12-19 17:50:09 +03:00
Alexander Alekhin
f605898bae core: fix eigen2cv() - don't change fixed type of 'dst' 2018-12-16 06:43:08 +00:00
Sayed Adel
4e16ae9a1f core:vsx fix build failure on GCC<=6 due implementation of v_reduce_sum(v_float64x2) 2018-12-14 19:24:12 +00:00
Vitaly Tuzov
3903174f7c Merge pull request #13334 from terfendail:histogram_wintr
* added performance test for compareHist

* compareHist reworked to use wide universal intrinsics

* Disabled vectorization for CV_COMP_CORREL and CV_COMP_BHATTACHARYYA if f64 is unsupported
2018-12-13 14:20:22 +03:00
Alexander Alekhin
a811059bfb Merge pull request #13336 from sergiud:core_sse_immediates_gcc-5.4.0 2018-11-30 09:51:59 +00:00
Sergiu Deitsch
e43a5ff9be fixed gcc 5.4.0 compilation errors 2018-11-30 08:48:19 +01:00
Vitaly Tuzov
00c9ab8c23 Merge pull request #13317 from terfendail:norm_wintr
* Added performance tests for hal::norm functions

* Added sum of absolute differences intrinsic

* norm implementation updated to use wide universal intrinsics

* improve and fix v_reduce_sad on VSX
2018-11-29 19:34:14 +03:00
Maksim Shabunin
89f0e0a8d1 Fixed misleading indentation in intrin_cpp.hpp 2018-11-27 15:29:37 +03:00
Etienne Brateau
736683ce2f Fix missing check part (defined(__cplusplus)) in header types_c.h 2018-11-22 01:39:09 +01:00
Alexander Alekhin
6e67fd2752 Merge pull request #13224 from seiko2plus:core_ppc64le_infa 2018-11-20 21:26:05 +00:00
Sayed Adel
474a0dac49 core: several improvements and fixes on ppc64le infrastructure
- add infrastructure support for Power9/VSX3
- fix missing VSX flags on GCC4.9 and CLANG4(#13210, #13222)
- fix disable VSX optimization on GCC by using flag ENABLE_VSX
- flag ENABLE_VSX is deprecated now, use CPU_BASELINE, CPU_DISPATCH instead
- add VSX3 to arithmetic dispatchable flags
2018-11-20 15:28:46 +00:00
1over
b6367f5821 fixed operator- for Rect 2018-11-20 00:48:17 +01:00
Alexander Alekhin
605071e76f Merge pull request #13146 from terfendail:bilateral_nan 2018-11-19 15:59:12 +00:00
Alexander Alekhin
183bc5c281 Merge tag '3.4.4'
OpenCV 3.4.4
2018-11-17 13:00:28 +00:00
Alexander Alekhin
a1fe8f754f OpenCV version++ (3.4.4)
OpenCV 3.4.4
2018-11-17 10:22:17 +00:00
Alexander Alekhin
1d5a528107 Merge pull request #12354 from alalek:samples_find_file 2018-11-16 22:40:49 +03:00
Vitaly Tuzov
f5b6bea2d4 Raised bilateralFilter processing precision for CV_32F matrices containing NaNs 2018-11-16 12:07:04 +03:00
Alexander Alekhin
1c04a5ec47 Merge pull request #12965 from terfendail:medianBlur_wintr 2018-11-16 00:47:11 +00:00
Alexander Alekhin
2fa9bd221d core: add utils::findDataFile() / samples::findFile() 2018-11-16 00:25:06 +00:00
Alexander Alekhin
96c71dd3d2 dnn: reduce set of ignored warnings 2018-11-15 13:15:59 +03:00
Vitaly Tuzov
28fd967148 Updated bilateralFilter implementations to use wide universal intrinsics 2018-11-09 15:27:30 +03:00
Alexander Alekhin
bb7cfcbcdb Merge pull request #12064 from seiko2plus:coreUnvintrinArithm2 2018-11-08 14:02:40 +00:00
Vitaly Tuzov
e5d7f446d6 Merge pull request #13056 from terfendail:box_wintr
* Updated boxFilter implementations to use wide universal intrinsics

* boxFilter implementation moved to separate file

* Replaced ROUNDUP macro with roundUp() function
2018-11-07 23:59:36 +03:00
Alexander Alekhin
d4e3405db2 Merge pull request #13045 from LaurentBerger:kmeansdoc
typo in kmeans doc
2018-11-06 20:00:47 +03:00
LaurentBerger
5132102863 typo in kmeans doc 2018-11-04 21:30:31 +01:00
Alexander Alekhin
79dc0ed175 docs: intro formatting update, minor cleanup 2018-11-04 02:36:24 +00:00
Sayed Adel
93ffebc273 core: reimplement SIMD arithmetic, logic and comparison operations into wide universal intrinsics
- initialize arithmetic dispatcher
- add new universal intrinsic v_absdiffs
- add new universal intrinsic v_pack_b
- add accumulate version of universal intrinsic v_round
- fix sse/avx2:uint8 multiplication overflow
- reimplement arithmetic, logic and comparison operations into wide universal intrinsics
  with full support for all types
- reimplement IPP arithmetic, logic and comparison operations in a separate file arithm_ipp.hpp
- avoid scalar multiplication if scaling factor eq 1 and use integer multiplication
- move C arithmetic operations to precomp.hpp and delete [arithm_simd|arithm_core].hpp
- add compatibility with new opencv4 divide policy
2018-10-30 12:48:31 +02:00
Rostislav Vasilikhin
daff6e6484 _mm256_zeroupper replaced by zeroall 2018-10-26 18:12:07 +03:00
Alexander Alekhin
7f608db244 core: move compiler defines from base.hpp into cvdef.h 2018-10-25 03:02:01 +00:00