Commit Graph

4351 Commits

Author SHA1 Message Date
Alexander Alekhin
6bdb9ca725 OpenCV release (3.4.8)
OpenCV 3.4.8
2019-10-09 14:42:29 +03:00
Maksim Shabunin
1ca74c3c03 Merge pull request #15544 from mshabunin:disable_posix_memalign
* Disable posix_memalign by default

* core: fix memalign parameter handling
2019-10-09 14:06:12 +03:00
Alexander Smorkalov
c3a588037a Merge pull request #15666 from seanm:Wnewline 2019-10-09 11:04:44 +00:00
Marcin Tolysz
3fd36c1be1 Merge pull request #15658 from tolysz:patch-1
* Cuda + OpenGL on ARM

There might be multiple ways of getting OpenCV compile on Tegra (NVIDIA Jetson) platform, but mainly they modify CUDA(8,9,10...) source code, this one fixes it for all installations. 
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This way is exactly the same as the one proposed but the code change happens in OpenCV.

* Updated,
The link provided mentions: cuda8 + 9, I have cuda 10 + 10.1 (and can confirm it is still defined this way).
NVIDIA is probably using some other "secret" backend with Jetson.
2019-10-09 11:38:10 +03:00
Sean McBride
24effe8cd6 Fixed clang -Wnewline-eof warning by adding newline to end of file 2019-10-09 10:12:09 +03:00
Sayed Adel
f2fe6f40c2 Merge pull request #15510 from seiko2plus:issue15506
* core: rework and optimize SIMD implementation of dotProd

  - add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
  - add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
    pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
  - fix clang build on ppc64le
  - support wide universal intrinsics for dotProd_32s
  - remove raw SIMD and activate universal intrinsics for dotProd_8
  - implement SIMD optimization for dotProd_s16&u16
  - extend performance test data types of dotprod
  - fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
  - optimize v_mul_expand(int32) on VSX

* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast

  this changes made depend on "terfendail" review
2019-10-07 22:01:35 +03:00
Alexander Alekhin
7837ae0e19 Merge pull request #15654 from sturkmen72:patch-3 2019-10-07 16:15:04 +00:00
Marcin Tolysz
53400d86e2 Fix compiler warnings for latest cuda npp which defines this itself as:
```
#define NPP_VER_MAJOR 10
#define NPP_VER_MINOR 2
#define NPP_VER_PATCH 0
#define NPP_VER_BUILD 243

#define NPP_VERSION (NPP_VER_MAJOR * 1000 +     \
                     NPP_VER_MINOR *  100 +     \
                     NPP_VER_PATCH)
2019-10-07 11:45:26 +01:00
Suleyman TURKMEN
c0489963bb
Update copy.cpp 2019-10-07 11:59:52 +03:00
Alexander Alekhin
98fc098216 Merge pull request #15646 from alalek:fix_avx512_detection 2019-10-05 15:30:09 +00:00
Alexander Alekhin
22d0c57a1c Merge pull request #15602 from alalek:core_softfloat_ubsan_shift 2019-10-05 15:27:35 +00:00
Alexander Alekhin
bdc097495a fix avx512 detection
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
2019-10-05 11:03:57 +00:00
Alexander Alekhin
77346d7286 core: workaround transform() inplace calls 2019-10-01 16:52:14 +03:00
Alexander Alekhin
ed9bca969c core: fix UBSAN in softfloat 2019-09-27 16:29:50 +03:00
Alexander Alekhin
677b94c92e Merge pull request #15579 from alalek:ocl_use_host_mem_ptr_flag 2019-09-25 15:12:59 +00:00
Alexander Alekhin
eacadf0e73 core(ocl): add flag OPENCV_OPENCL_ENABLE_MEM_USE_HOST_PTR
to control CL_MEM_USE_HOST_PTR usage
2019-09-25 15:12:36 +03:00
Alexander Alekhin
d2cacac07a Merge pull request #15573 from alalek:build_cxx11_warnings 2019-09-24 22:08:55 +00:00
Wenzhao Xiang
c2096771cb Merge pull request #15371 from Wenzhao-Xiang:gsoc_2019
[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)

* [GSoC 2019]

Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
     This perf test is based on benchmark.js(https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads:
     This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
     Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.

* [GSoC2019] 

1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api

* [GSoC2019]

1. fix emscripten define
2. use fallback function for f16

* [GSoC2019]

Fix rebase issue
2019-09-24 16:30:42 +03:00
Alexander Alekhin
3cf9185159 Merge pull request #15538 from terfendail:wui_checkany 2019-09-23 15:52:24 +00:00
Maksim Shabunin
c8abf2ad14 backport: fixed warnings produced by clang-9.0.0
ea3dc78986
83fc27cb99
2019-09-23 18:36:18 +03:00
Alexander Alekhin
fcc69d5a60 core(test): fix check conditions 2019-09-22 11:28:41 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00
Vitaly Tuzov
66842f5a18 Extended v_check_any/v_check_all universal intrinsics to support 64-bit integer 2019-09-19 18:31:31 +03:00
Paul E. Murphy
b465c82696 core: workaround old gcc vec_mul{e,o} (Issue #15506)
ISA 2.07 (aka POWER8) effectively extended the expanding multiply
operation to word types. The altivec intrinsics prior to gcc 8 did
not get the update.

Workaround this deficiency similar to other fixes.

This was exposed by commit 33fb253a66
which leverages the int -> dword expanding multiply.

This fixes Issue #15506
2019-09-12 09:54:02 -05:00
Alexander Alekhin
0a13633411 Merge pull request #15444 from alalek:ocl_fix_fft_kernel 2019-09-04 16:25:34 +00:00
Alexander Alekhin
8bd2720c28 core(ocl): fix fft kernel compilation
- error: variables in the local address space can only be declared in the outermost scope of a kernel function
2019-09-03 15:46:53 +03:00
Alexander Alekhin
7e46766c8d Merge pull request #15437 from devnexen:fbsd_opencl_build_fix 2019-09-03 12:21:02 +00:00
Alexander Alekhin
9ef5373776 Merge pull request #15435 from alalek:update_version_3.4.8-pre 2019-09-03 12:04:23 +00:00
Alexander Alekhin
abd7d63b74 Merge pull request #15424 from mshabunin:add-cmake-docs 2019-09-03 10:50:45 +00:00
David Carlier
6769ee3748 OpenCL: FreeBSD build fix 2019-09-02 18:30:53 +01:00
Alexander Alekhin
0fda243a05 pre: OpenCV 3.4.8 (version++) 2019-09-02 14:20:49 +03:00
Alexander Alekhin
048ddbf9ee Merge pull request #15339 from pmur:dotprod-32s-vsx 2019-08-31 11:16:04 +00:00
Alexander Alekhin
2a6527e751 Merge pull request #15402 from ChipKerchner:normUnroll 2019-08-31 11:10:05 +00:00
Maksim Shabunin
f3aab47f94 Assorted documentation fixes
* removed private flann documentation
* common tutorial images moved to doc/images
* grouping issues
2019-08-31 01:50:11 +03:00
Alexander Alekhin
f224d740a3 Merge pull request #15414 from kuzi117:instr 2019-08-30 12:03:19 +00:00
Braedy Kuzma
9bf8b496d6 Use commonly supported instruction mnemonic. 2019-08-29 10:00:40 -06:00
Braedy Kuzma
d4120dd2fe Disambiguate vecpopcnt for (u)dword2. 2019-08-29 09:54:56 -06:00
Vitaly Tuzov
d134ec54c5 Extend tests for v_check_any and v_check_all intrinsics 2019-08-28 14:53:31 +03:00
Alexander Alekhin
ca7640e10f Merge pull request #15401 from ChipKerchner:vectorReduceInt8Bug 2019-08-27 19:59:39 +00:00
ChipKerchner
288e6f9c07 –Improve vectorization in the 'norm' functions 2019-08-27 12:15:19 -05:00
ChipKerchner
70b883cfeb Fix macro bug with v_reduce_min and v_reduce_max for chars in VSX 2019-08-27 11:38:53 -05:00
Vitaly Tuzov
1b40528e1a Fix for AVX2 implementation of v_check_any(), v_check_all() intrinsics 2019-08-27 14:31:23 +03:00
Alexander Alekhin
d7409604b5 core: handle empty Mat in Mat_ assignment operators 2019-08-23 16:54:24 +03:00
Alexander Alekhin
56e832ee43 Merge pull request #15372 from alalek:core_stat_fix_intrin 2019-08-22 20:52:54 +00:00
Alexander Alekhin
8a0b93bc4d core: update fastmath.hpp 2019-08-22 16:43:07 +03:00
Alexander Alekhin
8b1fe8f6e0 core: fix stat SIMD code 2019-08-22 16:37:26 +03:00
Zyrin
869ea22f34 Use std::move in Mat_<T> move constructors 2019-08-21 11:12:00 +02:00
Zyrin
8ef8088686 Fix stack overflow on gcc with c++17 (#15343) 2019-08-21 10:57:03 +02:00
Paul E. Murphy
33fb253a66 core: vectorize dotProd_32s
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.

For PPC, do a full multiply (32x32->64b), convert to DP
then accumulate. This may be slightly less precise for
some inputs. But is 1.5x faster than the above which
is about 1.5x than the FMA above for ~2.5x speedup.
2019-08-20 15:28:36 -05:00
luz.paz
fcc7d8dd4e Fix modules/ typos
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`

backporting of commit: ec43292e1e
2019-08-16 17:34:29 +03:00
Alexander Alekhin
13ecd5bb25 Merge pull request #15122 from pmur:fast-math-improvements 2019-08-14 19:28:05 +00:00
Alexander Alekhin
a703b9ed84 Merge pull request #15101 from alalek:cmake_initialization 2019-08-14 19:17:07 +00:00
Alexander Alekhin
32772a5436 3.4: backported changes from 'master' branch 2019-08-14 16:36:08 +03:00
Alexander Alekhin
7c96857c02 Merge pull request #15292 from alalek:build_warnings_xcode_10_3 2019-08-13 14:18:48 +00:00
Alexander Alekhin
15b8a8d935 build: eliminate warnings with Xcode 10.3 2019-08-13 15:06:13 +03:00
Hugo Lindström
935067ee05 Merge pull request #15265 from hugolm84:wince-armv7-supports-neon
* WINCE 8.0 requires ARMv7 Thumb2 and thus have NEON instructions

* Only add NEON if on _ARM_
2019-08-09 18:01:37 +03:00
Alexander Alekhin
5ef548a985 cmake: update initialization 2019-08-08 15:23:16 +03:00
Paul E. Murphy
f38a61c66d fast_math: implement optimized PPC routines
Implement cvRound using inline asm. No compiler support
exists today to properly optimize this. This results in
about a 4x speedup over the default rounding. Likewise,
simplify the growing number of rounding function overloads.

For P9 enabled targets, utilize the classification
testing instruction to test for Inf/Nan values. Operation
speedup is about 1.2x for FP32, and 1.5x for FP64 operands.

For P8 targets, fallback to the GCC nan inline. It provides
a 1.1/1.4x improvement for FP32/FP64 arguments.
2019-08-07 15:01:18 -05:00
Paul E. Murphy
3f92bcc11a fast_math: selectively use GCC rounding builtins when available
Add a new macro definition OPENCV_USE_FASTMATH_GCC_BUILTINS to enable
usage of GCC inline math functions, if available and requested by the
user.

Likewise, enable it for POWER. This is nearly always a substantial
improvement over using integer manipulation as most operations can
be done in several instructions with no branching. The result is a
1.5-1.8x speedup in the ceil/floor operations.

1. As tested with AT 12.0-1 (GCC 8.3.1) compiler on P9 LE.
2019-08-07 15:01:18 -05:00
Paul E. Murphy
b2135be594 fast_math: add extra perf/unit tests
Add a basic sanity test to verify the rounding functions
work as expected.

Likewise, extend the rounding performance test to cover the
additional float -> int fast math functions.
2019-08-07 14:59:46 -05:00
Alexander Alekhin
821f17d666 Merge pull request #15235 from pmur:vsx-v_signmask-vbpermq 2019-08-06 20:09:22 +00:00
Victor Romero
987bb2ca61 Fix build for UWP
backport of commit: f18cbd036a
2019-08-05 17:19:36 +03:00
Paul E. Murphy
1031b7f4bc hal: vsx: further optimize v_signmask
Use the quadword bit permutation instruction to creatively move
the sign bits to create the mask. Note that values above 127 will
result in 0.
2019-08-05 09:00:22 -05:00
Alexander Alekhin
ba934ff1ce Merge pull request #15202 from hugolm84:support_build_shared_for_wince 2019-08-02 15:34:02 +00:00
Hugo Lindström
03fe1cb7fc Support building shared libraries on WINCE. 2019-08-01 15:28:04 +02:00
Maksim Shabunin
6d5ac67681 Restored IPP call reduction 2019-07-31 15:41:22 +03:00
Maksim Shabunin
eec9fa9d5e Merge pull request #15181 from berak:java_print_blob 2019-07-30 14:13:02 +00:00
berak
4d3989817c java: fix Mat.toString() for higher dimensions 2019-07-29 19:39:09 +02:00
Alexander Alekhin
2693ed9b22 Merge tag '3.4.7' 2019-07-25 19:19:49 +00:00
Alexander Alekhin
4a7ca5a291 OpenCV version++ (3.4.7)
OpenCV 3.4.7
2019-07-25 19:01:19 +00:00
Chip Kerchner
0db4fb1835 Merge pull request #15136 from ChipKerchner:dotProd_unroll
* Unroll multiply and add instructions in dotProd_32f - 35% faster.

* Eliminate unnecessary v_reduce_sum instructions.
2019-07-25 21:21:32 +03:00
Alexander Alekhin
6158bd2afa Merge pull request #15103 from alalek:simd_intrinsics_in_user_code 2019-07-25 11:36:36 +00:00
Hugo Lindström
2ee00e7f7d Merge pull request #15059 from hugolm84:improved-support-for-wince
* Improve support for Windows Embedded Compact

* Remove redundant set(WINCE true) and format CMake
2019-07-24 23:12:09 +03:00
Alexander Alekhin
8bac8b513c core: support SIMD intrinsics in user code 2019-07-19 20:33:32 +00:00
Alexander Alekhin
002904e445 Merge pull request #15050 from alalek:core_fix_base64_packed_struct 2019-07-18 19:07:06 +00:00
Alexander Alekhin
4ea8526e9f core(persistence): fix writeRaw() / readRaw() struct support
- writeRaw(): support structs
- readRaw(): 'len' is buffer limit in bytes (documentation is fixed)
2019-07-16 14:03:39 +03:00
Alexander Alekhin
c3b838b738 core(persistence): struct storage layout without alignment gaps 2019-07-15 21:37:20 +00:00
Hugo Lindström
245c256b1c Support compiliation for <=VS13 2019-07-12 19:02:36 +02:00
Alexander Alekhin
69560588fe Merge pull request #14953 from alalek:core_static_analysis_eval_expr 2019-07-02 09:44:29 +00:00
Vitaly Tuzov
9befb7a1d7 Merge pull request #14916 from terfendail:wsignmask_deprecated
* Avoid using v_signmask universal intrinsic and mark it as deprecated

* Renamed v_find_negative to v_scan_forward
2019-07-01 19:53:51 +03:00
Alexander Alekhin
44836c7f78 core: evaluate CV_Error() parameters during static scans 2019-07-01 18:17:03 +03:00
Stefan Brüns
e9a2e665b2 Explicitly default operator= for Vec<T, n>
Due to the explicitly declared copy constructor Vec<T, n>::Vec(Vec <T,n>&)
GCC 9 warns if there is no assignment operator, as having one typically
requires the other (rule-of-three, constructor/desctructor/assginment).

As the values are just a plain array the default assignment operator does
the right thing. Tell the compiler explicitly to default it.

Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
2019-06-29 22:11:00 +02:00
Alexander Alekhin
e59c6caee5 Merge pull request #14905 from savuor:fix/inst_region_unique 2019-06-27 10:16:59 +00:00
Rostislav Vasilikhin
f2f600f807 fixed multi instrumentations 2019-06-27 01:17:26 +03:00
Alexander Alekhin
4112866821 Merge pull request #14886 from alalek:fix_grabcut_kmeans_call_14879 2019-06-26 20:03:04 +00:00
Alexander Alekhin
e8a703a71d core(intrin): v_load_low() workaround for aarch64+clang 2019-06-25 17:29:04 +03:00
Alexander Alekhin
4a6888ccf6 imgproc: fix kmeans() call from grabCut() 2019-06-25 13:42:04 +03:00
Alexander Alekhin
779f59da6b pre: OpenCV 3.4.7 (version++) 2019-06-21 16:57:17 +03:00
Alexander Alekhin
aa6c66aa54 Merge pull request #14848 from alalek:build_warnings_avx512 2019-06-21 13:53:52 +00:00
Alexander Alekhin
5ac55fc132 core: eliminate AVX512 build warnings
from MSVS2017 and GCC8 -O1 mode
2019-06-20 20:00:09 +03:00
Alexander Alekhin
681e0323f2 core: backport toLowerCase()/toUpperCase() 2019-06-20 17:48:18 +03:00
Vitaly Tuzov
a29e59a770 Rename parameters in AVX512 implementation of v_load_deinterleave and v_store_interleave 2019-06-14 14:16:30 +03:00
Alexander Alekhin
1c661e7548 Merge pull request #14784 from alalek:backport_14779 2019-06-11 20:36:16 +00:00
Vitaly Tuzov
d2aadabc5e Merge pull request #14743 from terfendail:wui512_fixvswarn
Fix for MSVS2019 build warnings (#14743)

* AVX512 arch support for MSVS

* Fix for MSVS2019 build warnings: updated integral() AVX512 implementation

* Fix for MSVS2019 build warnings: reworked v_rotate_right AVX512 implementation

* fix indentation
2019-06-11 23:07:39 +03:00
Alexander Alekhin
f8791f072d core: avoid function type cast, make happy UBSAN
backporting of commit: d3d13c41c4
2019-06-11 19:36:47 +00:00
Alexander Alekhin
52644f067e Merge pull request #14764 from alalek:core_intrin_drop_hasSIMD_checks 2019-06-09 17:11:45 +00:00
Alexander Alekhin
6d916c5bb4 Merge pull request #14440 from alalek:async_array 2019-06-08 20:57:15 +00:00
Alexander Alekhin
1e9ad5476d core(intrin): drop hasSIMD128 checks
- use compile-time checks instead (`#if CV_SIMD128`)
- runtime checks are useless
2019-06-08 19:20:20 +00:00
Alexander Alekhin
4a8fd71a2e core: fix visibility handling 2019-06-07 07:23:15 +00:00
Alexander Alekhin
aab9ef4290 Merge pull request #14667 from asashour:javadoc 2019-06-06 10:57:39 +00:00
Ahmed Ashour
5c56b8ce92 java: generated code to have javadoc 2019-06-05 12:44:03 +02:00
Ahmed Ashour
1aca1d582e Fix some typos 2019-06-05 12:24:13 +02:00
Ted Steiner
f1fb002682 Merge pull request #14678 from tedsteiner:qnx
Fix build issue on QNX platform (#14678)

* QNX compatibility

* core: unify gettimeofday() usage
2019-06-04 19:45:21 +03:00
Vitaly Tuzov
3b015dfc7d Merge pull request #14210 from terfendail:wui_512
AVX512 wide universal intrinsics (#14210)

* Added implementation of 512-bit wide universal intrinsics(WIP)

* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave

* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks

* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16

* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros

* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()

* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces

* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines

* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.

* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()

* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build

* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Vitaly Tuzov
723165f878 fix for AVX2 version of v_reduce_min intrinsic 2019-05-31 16:14:54 +03:00
Alexander Alekhin
29b3f66507 Merge pull request #14606 from asashour:java_inline_return 2019-05-29 20:03:54 +00:00
Ahmed Ashour
ca8a1d2cff java: generated code inline return 2019-05-28 08:34:02 +02:00
Alexander Alekhin
d73261844e Merge pull request #14622 from asashour:junit 2019-05-27 14:55:45 +00:00
Alexander Alekhin
1094ed64e9 Merge pull request #14605 from terfendail:wsignmask_fixavx 2019-05-27 13:19:39 +00:00
Vitaly Tuzov
f0fb91f2d4 Fixed v_signmask implementation for AVX2, updated universal intrinsics tests. 2019-05-24 19:34:54 +03:00
Ahmed Ashour
f9564e053d java: test: use assertNotNull and assertFalse 2019-05-24 10:45:09 +02:00
Ahmed Ashour
f3319f6140 java: remove redundant declaration of java.lang package 2019-05-23 14:06:34 +02:00
Alexander Alekhin
9340af1a8a core: Async API / AsyncArray 2019-05-18 19:32:23 +00:00
catree
b5e2ec4ea4 Fix typo in NormTypes documentation. 2019-05-16 19:22:41 +02:00
Alexander Alekhin
84fd8190f3 Merge pull request #14232 from terfendail:popcount_rework 2019-05-15 17:58:11 +00:00
Vitaly Tuzov
7a55f2af3b Updated AVX2 implementation of v_popcount for u8. 2019-05-15 19:39:25 +03:00
Daniel Ingram
962d57b4d6 Merge pull request #14559 from daniel-s-ingram:master
* Fix typo: 'divisble' -> 'divisible'

* Fix typo: 'One of arguments' -> 'One of the arguments'
2019-05-15 18:41:43 +03:00
Vitaly Tuzov
1220dd4877 Updated v_popcount description, reference implementation and test. 2019-05-14 18:59:40 +03:00
Vitaly Tuzov
96ab78dc4f Reworked v_popcount implementation to provide number of bits in a single lane 2019-05-14 18:59:38 +03:00
Sayed Adel
5a77f4cee3 Merge pull request #14007 from seiko2plus:core_avx512_infa
* core: improve AVX512 infrastructure by adding more CPU features groups

* cmake: use groups for AVX512 optimization flags

* core: remove gap in CPU flags enumeration

* cmake: restore default CPU_DISPATCH
2019-05-05 14:19:49 +03:00
Sayed Adel
afb157df67 core:vsx fix sum of v_reduce_sad 2019-04-27 02:01:24 +02:00
Alexander Alekhin
b95fdc1992 Merge pull request #14394 from alalek:build_support_memory_sanitizers 2019-04-26 16:13:52 +00:00
Alexander Alekhin
d17699363c Merge pull request #14385 from terfendail:intrin_sad 2019-04-26 15:34:02 +00:00
Vitaly Tuzov
18d10d6b86 Fixed v_reduce_sad intrinsics implementation and added tests 2019-04-24 14:53:59 +03:00
Alexander Alekhin
c1981f28ad build: +OPENCV_ENABLE_MEMORY_SANITIZER flag 2019-04-22 21:35:25 +00:00
Alexander Alekhin
de4be8e610 Merge pull request #14384 from terfendail:fp16_depint 2019-04-22 15:52:53 +00:00
masa-iwm
5c404bb142 Merge pull request #14376 from masa-iwm:3.4
* fix getting platformIDs in initializeContextFromD3D11Device
2019-04-22 18:50:31 +03:00
Vitaly Tuzov
4a54aa3fbd Cleared up deprecated intrinsics for FP16 2019-04-22 10:35:37 +03:00
Alexander Alekhin
b38de57f9a ts: test tags for flexible/reliable tests filtering
- added functionality to collect memory usage of OpenCL sybsystem
- memory usage of fastMalloc() (disabled by default):
  * It is not accurate sometimes - external memory profiler is required.
- specify common `CV_TEST_TAG_` macros
- added applyTestTag() function
- write memory usage / enabled tags into Google Tests output file (.xml)
2019-04-08 19:12:49 +00:00
Alexander Alekhin
dad2247b56 Merge tag '3.4.6' 2019-04-07 11:02:40 +00:00
Alexander Alekhin
33b765d797 OpenCV version++ (3.4.6)
OpenCV 3.4.6
2019-04-06 21:43:23 +00:00
Alexander Alekhin
1e583942b9 core(lda): don't perform calculations in constructor
- exceptions from constructor will not cause destructor calls
2019-03-31 21:48:44 +00:00
David Carlier
06a4c20f60 OpenBSD build fix
required for close calls.
2019-03-31 10:54:47 +01:00
Alexander Alekhin
d6b82dcd65
Merge pull request #14162 from alalek:eliminate_coverity_scan_issues
core: eliminate coverity scan issues (#14162)

* core(hal): avoid using of r,g,b,a parameters in interleave/deinterleave

- static analysis tools blame on possible parameters reordering
- align AVX parameters with corresponding SSE/NEO/VSX/cpp code

* core: avoid "i,j" parameters in Matx methods

- static analysis tools blame on possible parameters reordering

* core: resolve coverity scan issues
2019-03-27 15:48:00 +03:00
Alexander Alekhin
5368a4ac41 Merge pull request #14102 from alalek:core_refactor_eigenvalues 2019-03-27 12:46:51 +00:00
Alexander Alekhin
55366caecd Merge pull request #14155 from alalek:fix_macos_ocl_warnings_3.4 2019-03-26 15:34:49 +00:00
Alexander Alekhin
6686559c70 ocl: define CL_SILENCE_DEPRECATION on MacOSX 2019-03-26 13:11:53 +03:00
Alexander Alekhin
cedd78d526 Merge pull request #14142 from mshabunin:fix-c-api-3.4 2019-03-25 18:58:28 +00:00
Maksim Shabunin
41da3ef1d2 Fixed cvdef.h for MSVC C users 2019-03-25 16:44:08 +03:00
iPanda
097fc1a271 Merge pull request #13972 from Mainvooid:add_cuda_support_for_D3D11_interop
* Add CUDA support for D3D11 interop. #13888

color_detail.hpp: fixed build error : dynamic initialization is not supported for a __constant__ variable.
directx.cpp: Add CUDA support(cl_nv_d3d11_sharing) for D3D11 interop.  #13888

Update directx.cpp

Format adjustment.

Update directx.cpp

fix error.

Update directx.cpp

Format adjustment

Update directx.cpp

fix trailing whitespace.

fix format errors

convert indentation to spaces .
Trim trailing whitespace.
Add information about source of cl_d3d11_ext.h
Avoid unrelated changes.

Increase compile-time conditional judgment.

Increase the judgment of whether the OCL device has the required extensions at compile time.

Add compilation option  `HAVE_CLNVEXT`.Check CL support in runtime.

Check result of `clGetExtensionFunctionAddressForPlatform` for KHR is invalid.It always can get the address(from OpenCL.dll),So I check NV support(from nvopencl64.dll) before KHR when `HAVE_CLNVEXT` is enabled.

Delete cl_d3d11_ext.h

Modified parameter list

fix "cannot open include file: 'CL/cl_d3d11_ext.h'"

 remove not referenced var

fix C2143: syntax error

Improve compile-time judgment.

dlrectx.cpp Modify the detection order.
initializeContextFromD3D11Device:
```
    // try with NV(Need to check it first)
    // try with KHR
```

fix warnig C4100

Revert "fix warnig C4100"

This reverts commit 76e5becb67780071d0cbde61cc4f5f807ad7c5ac.

fix warning C4100

fix warning C4505

Format alignment

Format adjustment and automatically detect header files.

Automatically detect header files when users are not configured or configuration errors occur.

avoid unrelated changes.

Update .cmake

Update .cmake

* fix build errors

* fix warning:defined but not used

* Revert "fix warning:defined but not used"

This reverts commit 7ab3537cd0.

* fix warning:defined but not used

* fix build error for mac

* fix build error for win

* optimizing branch judgment

* Revert "optimizing branch judgment"

This reverts commit 88b72b870e.

* fix warning C4702: unreachable code

* remove unused code

* Fix problems that may lead to undefined behavior

* Add status check

* fix error C2664,C2665 : cannot convert argument

* Format adjustment

VSCODE will automatically format the indentation to 4 spaces in some situation.

* fix error C2440

* fix error C2440

* add cl_d3d11_ext.h

* Format adjustment

* remove unnecessary checks
2019-03-24 18:34:09 +03:00
Alexander Alekhin
4e60db9030 Merge pull request #14110 from seiko2plus:core_vsx_fp16 2019-03-20 16:04:20 +00:00
Alexander Alekhin
a8e635f177 Merge pull request #14069 from terfendail:transform_wintr 2019-03-20 15:39:40 +00:00
Sayed Adel
f41359688b core:vsx Add support for VSX3 half precision conversions 2019-03-20 10:19:42 +02:00
Alexander Alekhin
6c862fae13 Merge pull request #14099 from seiko2plus:vsx_improvements_3 2019-03-19 14:59:58 +00:00
Alexander Alekhin
93a402d0f2 core: fix Core_EigenNonSymmetric.convergence test 2019-03-19 15:18:43 +03:00
Sayed Adel
4fe2d9bdbc core:vsx Several improvements(3)
* optimize v_lut_deinterleave
 * optimize v_interleave_/pairs/quads/triplets
 * optimize v_lut, use vec_extract instead of aligned store
2019-03-19 12:30:50 +02:00
Vitaly Tuzov
d43597c199 transform() implementation updated to utilize wide universal intrinsics 2019-03-18 20:33:19 +03:00
Alexander Alekhin
5451b89aed core: refactor EigenvalueDecomposition (hqr2)
- fix resource allocation management
- reduce variables scope
- fix complex_div
- fix comments, constants
- simplify add/sub operations
2019-03-18 19:07:34 +03:00
Alexander Alekhin
a7c4ee9ae1 core: add iterations limit check in eigenNonSymmetric() 2019-03-18 17:49:17 +03:00
Alexander Alekhin
c03a936cfc Merge pull request #14040 from seiko2plus:issue13211 2019-03-14 13:04:35 +00:00
Sayed Adel
872e7894b4 core:vsx working around gcc aligned memory access bug
- allow cmake to check sanity of vsx aligned ld/st
 - force universal intrinsics v_load_aligned/v_store_aligned
   to failback to unaligned ld/st if cmake runtime vsx aligned test fail
2019-03-14 01:55:40 +02:00
Alexander Alekhin
8c8715c4dd fix static analysis issues 2019-03-13 17:19:39 +03:00
Alexander Alekhin
80e5642ca2 pre: OpenCV 3.4.6 (version++) 2019-03-12 13:29:42 +03:00
Alexander Alekhin
98e2bb8721 Merge pull request #14022 from alalek:core_fix_neon_intrinsics 2019-03-11 13:52:43 +00:00
Alexander Alekhin
842c58a7d6 core(intrin): NEON v_load_expand_q() support unaligned addr 2019-03-11 12:06:05 +00:00
Giles Payne
11dbd86aa3 Merge pull request #13956 from komakai:java-mat-class-improvements
* Expose more C++ functionality in the Java wrapper of the Mat class
In particular expose methods for handling Mat with more than 2 dimensions
* add constructors taking an array of dimension sizes
* add constructor taking an existing Mat and an array of Ranges
* add override of the create method taking an array of dimension sizes
* add overrides of the ones and zeros methods taking an array of dimension sizes
* add override of the submat method taking an array of ranges
* add overrides of put and get taking arrays of indices
* add wrapper for copySize method
* fix crash in the JNI wrapper of the reshape(int cn, int[] newshape) method
* add test for each method added to Mat.java

* Fix broken test
2019-03-10 00:11:04 +03:00
Alexander Alekhin
7e8cc580c9 Merge pull request #13997 from alalek:imgproc_dispatch_cvtcolor 2019-03-08 16:18:44 +00:00
Alexander Alekhin
8b541e450b imgproc: dispatch color*
Lab/XYZ modes have been postponed (color_lab.cpp):
- need to split code for tables initialization and for pixels processing first
- no significant performance improvements for switching between SSE42 / AVX2 code generation
2019-03-07 15:45:05 +03:00
Alexander Alekhin
b9d2e6664d Merge pull request #13979 from alalek:issue_13772 2019-03-07 09:53:25 +00:00
Alexander Alekhin
7bcfe8f204 Merge pull request #13974 from alalek:fix_build_warnings 2019-03-05 14:57:50 +00:00
Alexander Alekhin
7366eebebb core: fix condition in OutputArray::create(allowTransposed=True) 2019-03-05 16:26:59 +03:00
Alexander Alekhin
35edad3e74 build: fix warnings 2019-03-05 14:47:04 +03:00
Alexander Alekhin
f5b58e5fc9 bindings: backport generator from OpenCV 4.x
- better handling of enum arguments
- less merge conflicts
2019-03-01 20:18:48 +00:00
Sayed Adel
5478165e16 core:vsx Fix narrowing warning on vector splats 2019-03-01 00:48:38 +00:00
Alexander Alekhin
a9f67c2d1d Merge pull request #13905 from terfendail:pyr_wintr2 2019-02-28 14:53:42 +00:00
Alexander Alekhin
ed8f0ab265 Merge pull request #13931 from berak:core_fix_mat_max_mult 2019-02-28 14:29:31 +00:00
berak
20afae5a14 core: fix mat matx multiplication 2019-02-28 14:22:54 +01:00
Vitaly Tuzov
9548093b46 Horizontal line processing for pyrDown() reworked using wide universal intrinsics. 2019-02-28 00:12:57 +03:00
catree
576ab3df9a Add division operators for Matx. 2019-02-27 19:36:23 +01:00
Alexander Alekhin
f1f0f630c7 core: disable I/O perf test
- can be enable separately if needed
- not stable (due storage I/O processing)
2019-02-27 18:07:45 +03:00
Alexander Alekhin
fd49ee5f39 core: dispatch merge.cpp 2019-02-23 15:42:26 +00:00
Alexander Alekhin
93a36b0df1 core: keep history of merge.cpp 2019-02-23 15:41:39 +00:00
Alexander Alekhin
4e12febe90 core: clone merge.simd.hpp 2019-02-23 15:41:33 +00:00
Alexander Alekhin
6eabe6bc14 core: clone merge.dispatch.cpp 2019-02-23 15:41:33 +00:00
Alexander Alekhin
91d152e2c2 core: dispatch split.cpp 2019-02-22 09:54:31 +00:00
Alexander Alekhin
1d8b30bf4f core: keep history of split.cpp 2019-02-22 09:18:51 +00:00
Alexander Alekhin
0311770e8b core: clone split.simd.hpp 2019-02-22 09:18:27 +00:00
Alexander Alekhin
82cd2f8c93 core: clone split.dispatch.cpp 2019-02-22 09:17:51 +00:00
Alexander Alekhin
3064e40d9e Merge pull request #13866 from alalek:core_dispatch_mean 2019-02-20 11:50:21 +00:00
Vitaly Tuzov
334c4d62b5 Merge pull request #13781 from terfendail:warp_wintr
Resize reworked using wide universal intrinsics (#13781)

* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize

* Reworked linear resize using new wide LUT intrinsics

* Fix for VSX intrinsics
2019-02-20 14:30:28 +03:00
Alexander Alekhin
dc84cf9914 core: dispatch mean.cpp 2019-02-19 16:58:32 +03:00
Alexander Alekhin
4b82c8a22b core: keep history of mean.cpp 2019-02-19 16:46:46 +03:00
Alexander Alekhin
7af7bcae18 core: clone mean.dispatch.cpp 2019-02-19 16:46:28 +03:00
Alexander Alekhin
93cea6e46e core: clone mean.simd.hpp 2019-02-19 16:45:42 +03:00
Alexander Alekhin
cd66f6e3db core: dispatch matmul
- gemm: keep baseline only (lapack is 10x+ faster, lets reduce binary size)
- transform / distTransform
- scaleAdd (32f/64f only)
- Mahalanobis: keep baseline only (no perf tests)
- mulTransposed: keep baseline only (no perf tests)
- dot
2019-02-18 14:36:46 +03:00
Alexander Alekhin
fbde57dba8 core: keep history of matmul.cpp 2019-02-14 19:07:41 +03:00
Alexander Alekhin
dcee7b1605 core: clone matmul.dispatch.cpp 2019-02-14 19:07:37 +03:00
Alexander Alekhin
b769ad2c23 core: clone matmul.simd.hpp 2019-02-14 19:07:37 +03:00
Alexander Alekhin
e3633ec4a2 core: dispatch count_non_zero 2019-02-14 13:16:20 +03:00
Alexander Alekhin
0b49680339 core: keep history of count_non_zero.cpp 2019-02-14 13:15:43 +03:00
Alexander Alekhin
439e43a027 core: clone count_non_zero.dispatch.cpp 2019-02-14 13:15:39 +03:00
Alexander Alekhin
af8a3a0b66 core: clone count_non_zero.simd.hpp 2019-02-14 13:15:39 +03:00
Alexander Alekhin
b40a7ffbe4 core: dispatch sum 2019-02-13 18:17:38 +03:00
Alexander Alekhin
c88e6b344b core: keep history of sum.cpp 2019-02-13 13:49:36 +03:00
Alexander Alekhin
6e88bff3e3 core: clone sum.dispatch.cpp 2019-02-13 13:49:29 +03:00
Alexander Alekhin
5aceac6b93 core: clone sum.simd.hpp 2019-02-13 13:49:29 +03:00
Alexander Alekhin
2e28ff78c1 Merge pull request #13780 from alalek:core_dispatch_convertTo 2019-02-12 12:08:30 +00:00
klemens
5d9c6723ee spelling fixes
backport 997b7b18af
2019-02-11 15:35:10 +03:00
Alexander Alekhin
f8786c9bf4 Merge pull request #13783 from alalek:fix_13741 2019-02-09 15:25:48 +00:00
Alexander Alekhin
d32d576d6d core: dispatch convert_scale 2019-02-08 18:32:10 +03:00