Commit Graph

2617 Commits

Author SHA1 Message Date
Alexander Alekhin
b6a58818bb Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-11-11 20:25:42 +00:00
Alexander Alekhin
f42d5399aa core(persistence): add more checks for implementation limitations 2019-11-07 14:19:00 +03:00
Igor Murzov
cdbfdcc363 Fix OpenCL device detection when some OpenCL platform has no devices
It's not an error if some OpenCL platform has no devices. This makes
OpenCL device detection work correctly in the following scenario:

$ OPENCV_OPENCL_DEVICE=:GPU: ./opencv_test_dnn

OpenCV version: 4.1.2-dev
OpenCV VCS version: 4.1.2-80-g467748ee98-dirty
Build type: Debug
Compiler: /usr/bin/g++  (ver 7.4.0)
Parallel framework: pthreads
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2019.0.0 Gold (-) Jul 24 2018
OpenCL Platforms:
    AMD Accelerated Parallel Processing
    Portable Computing Language
        CPU: pthread-AMD Ryzen 7 2700X Eight-Core Processor (OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-znver1)
    NVIDIA CUDA
        dGPU: GeForce GTX 1080 (OpenCL 1.2 CUDA)
Current OpenCL device:
    Type = dGPU
    Name = GeForce GTX 1080
    Version = OpenCL 1.2 CUDA
    Driver version = 430.26
2019-11-05 20:02:39 +03:00
Alexander Alekhin
dcf72e49e2 core(persistence): fix processing of multiple documents 2019-11-05 18:28:15 +03:00
Alexander Alekhin
0d7f770996 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-11-04 09:58:29 +00:00
yuriyluxriot
4e156a162f Merge pull request #15812 from yuriyluxriot:fls_replaces_tls
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implment TLS value cleanup when thread has been terminated on Windows Vista and above

* Fix 32-bit build

* Fixed calling convention of cleanup callback

* WINAPI changed to NTAPI

* Use proper guard macro
2019-11-01 22:33:12 +03:00
Chip Kerchner
ed7e4273cd Merge pull request #15555 from ChipKerchner:flipVectorize
* Vectorize flipHoriz and flipVert functions.

* Change v_load_mirror_1 to use vec_revb for VSX

* Only use vec_revb in ISA3.0

* Removing vec_revb code since some of the older compilers don't fully support it.

* Use new v_reverse intrinsic and cleanup code.

* Ensure there are no alignment issues with copies
2019-11-01 22:30:48 +03:00
Alexander Alekhin
ea5499fa51 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-29 20:46:51 +00:00
Alexander Alekhin
bad4e5c3eb Merge pull request #15692 from alalek:core_tls_handle_thread_termination 2019-10-29 20:40:35 +00:00
Alexander Alekhin
6ec5ae0215 core(trace): add ITT control parameter
- OPENCV_TRACE_ITT_ENABLE
2019-10-26 15:03:51 +00:00
Alexander Alekhin
055ffc0425 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-24 18:21:19 +00:00
Alexander Alekhin
17e2bf5717 core(tls): implement releasing of TLS on thread termination
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using of .detachData() + .cleanupDetachedData() instead of .gather() method

(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
2019-10-24 06:36:18 +00:00
Alexander Alekhin
938d8dce06 Merge pull request #15685 from pmur:cnz64f-simd 2019-10-18 20:19:40 +00:00
Alexander Alekhin
ad5d14ec0e Merge pull request #15701 from alalek:issue_15691 2019-10-16 11:13:07 +00:00
Alexander Alekhin
823884b064 core(alloc): force initialization of memalign flag
- before main() launch
2019-10-15 13:07:11 +03:00
Alexander Alekhin
6a7d1c15d3 core(ipp): skip huge input in flip()
- IPP/SSE4.2 works well
2019-10-14 18:26:19 +03:00
Alexander Alekhin
e42560bed5 Merge pull request #15659 from malfet:use-atomic-in-getExpTab32f 2019-10-12 20:27:58 +00:00
Alexander Alekhin
d6630ab35b Merge pull request #15655 from malfet:use-atomic-in-parallel-for 2019-10-12 20:26:15 +00:00
Paul E. Murphy
ec91a3d59d core: vectorize countNonZero64f
Improves performance a bit. 2.2x on P9 and 2 - 3x on coffee lake
x86-64.
2019-10-11 09:02:46 -05:00
Alexander Alekhin
65573784c4 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-09 19:46:18 +00:00
Maksim Shabunin
1ca74c3c03 Merge pull request #15544 from mshabunin:disable_posix_memalign
* Disable posix_memalign by default

* core: fix memalign parameter handling
2019-10-09 14:06:12 +03:00
Marcin Tolysz
3fd36c1be1 Merge pull request #15658 from tolysz:patch-1
* Cuda + OpenGL on ARM

There might be multiple ways of getting OpenCV compile on Tegra (NVIDIA Jetson) platform, but mainly they modify CUDA(8,9,10...) source code, this one fixes it for all installations. 
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This way is exactly the same as the one proposed but the code change happens in OpenCV.

* Updated,
The link provided mentions: cuda8 + 9, I have cuda 10 + 10.1 (and can confirm it is still defined this way).
NVIDIA is probably using some other "secret" backend with Jetson.
2019-10-09 11:38:10 +03:00
Nikita Shulga
ec37364762 Use std::atomic in getExpTab32f and getLogTab32f
Reads and writes to volatile bool are not guaranteed to be atomic.
2019-10-07 16:35:07 -07:00
Nikita Shulga
23288b7cb5 Use atomic operations to modify flagNestedParallelFor
This ensures uniform behavior on any C++11 compliant compiler
2019-10-07 16:26:30 -07:00
Sayed Adel
f2fe6f40c2 Merge pull request #15510 from seiko2plus:issue15506
* core: rework and optimize SIMD implementation of dotProd

  - add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
  - add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
    pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
  - fix clang build on ppc64le
  - support wide universal intrinsics for dotProd_32s
  - remove raw SIMD and activate universal intrinsics for dotProd_8
  - implement SIMD optimization for dotProd_s16&u16
  - extend performance test data types of dotprod
  - fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
  - optimize v_mul_expand(int32) on VSX

* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast

  this changes made depend on "terfendail" review
2019-10-07 22:01:35 +03:00
Suleyman TURKMEN
c0489963bb
Update copy.cpp 2019-10-07 11:59:52 +03:00
Alexander Alekhin
626bfbf309 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-05 15:45:31 +00:00
Alexander Alekhin
98fc098216 Merge pull request #15646 from alalek:fix_avx512_detection 2019-10-05 15:30:09 +00:00
Alexander Alekhin
22d0c57a1c Merge pull request #15602 from alalek:core_softfloat_ubsan_shift 2019-10-05 15:27:35 +00:00
Alexander Alekhin
bdc097495a fix avx512 detection
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
2019-10-05 11:03:57 +00:00
Alexander Alekhin
3fb6617d62 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-02 17:49:19 +03:00
Alexander Alekhin
77346d7286 core: workaround transform() inplace calls 2019-10-01 16:52:14 +03:00
Alexander Alekhin
ed9bca969c core: fix UBSAN in softfloat 2019-09-27 16:29:50 +03:00
Alexander Alekhin
bc927f9788 Merge pull request #15591 from alalek:core_persistence_fix 2019-09-26 12:59:37 +00:00
Alexander Alekhin
e2a5a6a05c Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-25 18:32:44 +00:00
Alexander Alekhin
677b94c92e Merge pull request #15579 from alalek:ocl_use_host_mem_ptr_flag 2019-09-25 15:12:59 +00:00
Alexander Alekhin
eacadf0e73 core(ocl): add flag OPENCV_OPENCL_ENABLE_MEM_USE_HOST_PTR
to control CL_MEM_USE_HOST_PTR usage
2019-09-25 15:12:36 +03:00
Alexander Alekhin
6e246ee58c core(persistence): fix reserveNodeSpace() implementation
- avoid data copying after buffer block shrink
- resize current block in case of single FileNode
2019-09-25 15:02:20 +03:00
Wenzhao Xiang
c2096771cb Merge pull request #15371 from Wenzhao-Xiang:gsoc_2019
[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)

* [GSoC 2019]

Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
     This perf test is based on benchmark.js(https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads:
     This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
     Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.

* [GSoC2019] 

1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api

* [GSoC2019]

1. fix emscripten define
2. use fallback function for f16

* [GSoC2019]

Fix rebase issue
2019-09-24 16:30:42 +03:00
Alexander Alekhin
a74fe2ec01 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00
Alexander Alekhin
bea2c75452 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-05 14:29:22 +03:00
Alexander Alekhin
0a13633411 Merge pull request #15444 from alalek:ocl_fix_fft_kernel 2019-09-04 16:25:34 +00:00
Alexander Alekhin
8bd2720c28 core(ocl): fix fft kernel compilation
- error: variables in the local address space can only be declared in the outermost scope of a kernel function
2019-09-03 15:46:53 +03:00
David Carlier
6769ee3748 OpenCL: FreeBSD build fix 2019-09-02 18:30:53 +01:00
Alexander Alekhin
048ddbf9ee Merge pull request #15339 from pmur:dotprod-32s-vsx 2019-08-31 11:16:04 +00:00
Alexander Alekhin
2a6527e751 Merge pull request #15402 from ChipKerchner:normUnroll 2019-08-31 11:10:05 +00:00
ChipKerchner
288e6f9c07 –Improve vectorization in the 'norm' functions 2019-08-27 12:15:19 -05:00
Alexander Alekhin
a7b954f655 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-23 19:24:37 +03:00
Kazuma Furuhashi
ccecd3405a Merge pull request #15007 from 284km:fixatypo
s/last_occurence/last_occurrence/
2019-08-22 17:32:25 +03:00