Commit Graph

4272 Commits

Author SHA1 Message Date
Alexander Alekhin
d42e04d0df core(SIMD): fix MSA build - add v_reduce_min/max for u8/s8 2020-01-20 15:10:03 +03:00
Chip Kerchner
301626ba26 Merge pull request #15488 from ChipKerchner:vectorizeMinMax2
Vectorize minMaxIdx functions

* Updated documentation and intrinsic tests for v_reduce

* Add other files back in from the forced push

* Prevent an constant overflow with v_reduce for int8 type

* Another alternative to fix constant overflow warning.

* Fix another compiler warning.

* Update comments and change comparison form to be consistent with other vectorized loops.

* Change return type of v_reduce_min & max for v_uint8 and v_uint16 to be same as lane type.

* Cast v_reduce functions to int to avoid overflow. Reduce number of parameters in MINMAXIDX_REDUCE macro.

* Restore cast type for v_reduce_min & max to LaneType
2020-01-17 19:37:35 +03:00
Alexander Alekhin
a9f3acb125 core(simd): fix NEON alignmnet issue 2020-01-11 18:39:50 +00:00
Alexander Alekhin
e180cc050b
Merge pull request #16236 from alalek:fix_core_simd_emulator
* core: fix intrin_cpp, allow to build modules with SIMD emulator

* core(arithm): fix v_zero initialization

* core(simd): 'strict' types for binary/bitwise operations

* features2d: avoid aligned load issue in GCC 5.4 with emulated SIMD

* core(simd): alignment checks in SIMD emulator
2020-01-10 21:31:02 +03:00
Nuzhny007
7d484d21f7 Fixed compilation on windows with openvx 2020-01-06 06:32:56 +03:00
Alexander Alekhin
523f081923 core(check): add Size_<int> 2019-12-28 13:50:39 +00:00
Brian Wignall
f9c514b391 Fix spelling typos
backport commit 659ffaddb4
2019-12-27 12:46:53 +00:00
Alexander Alekhin
5e2bcc9149 Merge tag '3.4.9' 2019-12-20 12:44:15 +03:00
Alexander Alekhin
64e6cf9fe5 release: OpenCV 3.4.9 2019-12-19 18:16:47 +03:00
Alexander Alekhin
dff8e29f98 Merge pull request #16139 from alalek:core_flip_avoid_unaligned 2019-12-19 10:29:07 +00:00
Alexander Alekhin
8d22ac200f core: workaround flipHoriz() alignment issues 2019-12-19 00:05:23 +00:00
Tatsuro Shibamura
971ae00942 Merge pull request #16027 from shibayan:arm64-windows10
* Support ARM64 Windows 10 platform

* Fixed detection issue for ARM64 Windows 10

* Try enabling ARM NEON intrin

* build: disable NEON with MSVC compiler

* samples(directx): gdi32 dependency
2019-12-17 00:23:30 +03:00
Alexander Alekhin
a45928045a
Merge pull request #16150 from alalek:cmake_avoid_deprecated_link_private
* cmake: avoid deprecated LINK_PRIVATE/LINK_PUBLIC

see CMP0023 (CMake 2.8.12+)

* cmake: fix 3rdparty list

- don't include OpenCV modules
2019-12-13 17:52:40 +03:00
RAJKIRAN NATARAJAN
e6ce752da1 Merge pull request #15966 from saskatchewancatch:issue-15760
Add checks for empty operands in Matrix expressions that don't check properly

* Starting to add checks for empty operands in Matrix expressions that
don't check properly.

* Adding checks and delcarations for checker functions

* Fix signatures and add checks for each class of Matrix Expr operation

* Make it catch the right exception

* Don't expose helper functions to public API
2019-12-12 19:23:57 +03:00
Alexander Alekhin
f2cce5fd8c Merge pull request #16125 from alalek:core_safe_xadd 2019-12-11 14:15:46 +00:00
Alexander Alekhin
7d61426279 Merge pull request #16124 from alalek:issue_13354 2019-12-11 14:15:23 +00:00
Alexander Alekhin
416848066c core: provide safe implementations of CV_XADD() only 2019-12-11 00:48:45 +00:00
Alexander Alekhin
76b5e19eb3 core: add "namespace cv" in CV_StaticAssert fallback implementation 2019-12-11 00:35:13 +00:00
Alexander Alekhin
a675c4937a core: OPENCV_INCLUDE_PORT_FILE for custom platform configuration 2019-12-11 00:31:45 +00:00
Alexander Alekhin
a2642d83d3 Merge pull request #16093 from alalek:core_itt_thread_name_16072 2019-12-09 18:29:53 +00:00
Paul Murphy
a011035ed6 Merge pull request #15257 from pmur:resize
* resize: HResizeLinear reduce duplicate work

There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.

The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this.  The performance
improvement is 10-50% depending on the scaling.

* imgproc: vectorize HResizeLinear

Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.

While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.

* Update U8 processing for non-bitexact linear resize

* core: hal: vsx: improve v_load_expand_q

With a little help, we can do this quickly without gprs on
all VSX enabled targets.

* resize: Fix cn == 3 step per feedback

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
2019-12-09 14:54:06 +03:00
Alexander Alekhin
816f82682b core(trace/itt): avoid calling __itt_thread_set_name() by default
- don't override current application thread names
- set name for own threads only
2019-12-07 21:41:15 +00:00
Alexander Alekhin
76a27e3399 pre: OpenCV 3.4.9 (version++) 2019-12-05 18:28:38 +00:00
Alexander Alekhin
72f35e0626
Merge pull request #16052 from alalek:issue_16040
* calib3d: use normalized input in solvePnPGeneric()

* calib3d: java regression test for solvePnPGeneric

* calib3d: python regression test for solvePnPGeneric
2019-12-05 15:36:39 +03:00
Alexander Alekhin
f21bde4d9f
Merge pull request #16046 from alalek:issue_15990
* core: disable invalid constructors in C API by default

- C API objects will lose their default initializers through constructors

* samples: stop using of C API
2019-12-05 14:48:18 +03:00
Alexander Alekhin
95e36fd488 Merge pull request #16055 from alalek:issue_16041 2019-12-05 07:54:17 +00:00
Alexander Alekhin
4dfa0a0383 bindings: basic support for #if preprocessor directives
- #if 0
- #ifdef __OPENCV_BUILD
2019-12-04 18:42:31 +03:00
Alexander Alekhin
818585fd12 core(tls): unblock TlsAbstraction destructor call
- required to unregister callbacks from system
2019-12-04 08:27:01 +00:00
Vadim Levin
8d74101f07 Merge pull request #15955 from VadimLevin:dev/vlevin/generator_tests
Tests for argument conversion of Python bindings generator

* Tests for parsing elemental types from Python bindings

  - Add positive and negative tests for int, float, double, size_t,
    const char*, bool.
  - Tests with wrong conversion behavior are skipped.

* Move implicit conversion of bool to integer/floating types to wrong
conversion behavior.
2019-11-29 16:24:13 +03:00
Alexander Alekhin
70146700aa Merge pull request #15839 from alalek:core_simd_v_setall_template 2019-11-27 19:19:35 +00:00
Brian Wignall
af997529a1 Fix some typos 2019-11-26 18:41:19 +03:00
Alexander Alekhin
50ac880335 Merge pull request #15971 from alalek:core_kmeans_handle_overflow 2019-11-22 21:36:02 +00:00
Natsu
54e6f5c237 Merge pull request #15970 from akemimadoka:master
* Fix android armv7 c++_static init crash

* core: move initialization of 'ios_base::Init' for Android
2019-11-22 18:42:25 +03:00
Alexander Alekhin
3266ac7667 core(kmeans): bailout if can't select cluster center 2019-11-22 14:40:02 +00:00
Alexander Alekhin
ec55b6f6db core: fix MSA build 2019-11-21 18:59:41 +03:00
Everton Constantino
75315fb297 Merge pull request #15494 from everton1984:hal_vector_get_n
Improving VSX performance of integral function

* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.

* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.

* Adding SSE/NEON/AVX intrinsics.

* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.

* core(simd): add tests for v_extract_n/v_broadcast_element

- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations

* core(simd): fix compilation

- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only

* cleanup
2019-11-20 13:41:07 +03:00
Alexander Alekhin
e07a488012
Merge pull request #15925 from alalek:core_test_simd_cpp_emulation
core(test): extending tests with SIMD C++ emulation code (intrin_cpp.hpp)

* core(test): test SIMD CPP emulation code (intrin_cpp.hpp)

* core(simd): eliminate build warnings from intrin_cpp.hpp
2019-11-19 21:08:45 +03:00
clunietp
2185bce4b7 Fix 13577 2019-11-18 07:41:34 -05:00
Alexander Alekhin
6773b938b3 Merge pull request #15896 from alalek:build_gcc_9 2019-11-14 14:22:02 +00:00
Christoph Bachhuber
c638f085aa Refactor for clarity and avoiding code duplication
Implement GArik's comments

Remove unnecessary c_str()

Fix brace position
2019-11-12 19:22:42 +01:00
Alexander Alekhin
7ecdcf6ca6 build: GCC9 compilation 2019-11-12 18:49:34 +03:00
Alexander Alekhin
e9dcecf9b4 Merge pull request #15826 from alalek:cmake_fix_itt_define_condition 2019-11-10 09:22:22 +00:00
Alexander Alekhin
d66aa2e0ff Merge pull request #15848 from alalek:backport_test_15842 2019-11-07 16:48:41 +00:00
Alexander Alekhin
54d9597522 Merge pull request #15814 from i-murzov:3.4-ocl-cleanup 2019-11-06 09:54:29 +00:00
Igor Murzov
cdbfdcc363 Fix OpenCL device detection when some OpenCL platform has no devices
It's not an error if some OpenCL platform has no devices. This makes
OpenCL device detection work correctly in the following scenario:

$ OPENCV_OPENCL_DEVICE=:GPU: ./opencv_test_dnn

OpenCV version: 4.1.2-dev
OpenCV VCS version: 4.1.2-80-g467748ee98-dirty
Build type: Debug
Compiler: /usr/bin/g++  (ver 7.4.0)
Parallel framework: pthreads
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2019.0.0 Gold (-) Jul 24 2018
OpenCL Platforms:
    AMD Accelerated Parallel Processing
    Portable Computing Language
        CPU: pthread-AMD Ryzen 7 2700X Eight-Core Processor (OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-znver1)
    NVIDIA CUDA
        dGPU: GeForce GTX 1080 (OpenCL 1.2 CUDA)
Current OpenCL device:
    Type = dGPU
    Name = GeForce GTX 1080
    Version = OpenCL 1.2 CUDA
    Driver version = 430.26
2019-11-05 20:02:39 +03:00
TH3CHARLie
2c2716de0f core(test): add test for YAML parse multiple documents
- added removal of temporary file
2019-11-05 18:39:07 +03:00
Igor Murzov
6d5b900324 Simplify OpenCL info dumping code:
* Reduce code nesting
* Drop redundant .c_str() calls
2019-11-05 14:49:49 +03:00
Alexander Alekhin
a893969ec9 core(simd): v_setall template 2019-11-03 12:49:25 +00:00
yuriyluxriot
4e156a162f Merge pull request #15812 from yuriyluxriot:fls_replaces_tls
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implment TLS value cleanup when thread has been terminated on Windows Vista and above

* Fix 32-bit build

* Fixed calling convention of cleanup callback

* WINAPI changed to NTAPI

* Use proper guard macro
2019-11-01 22:33:12 +03:00
Chip Kerchner
ed7e4273cd Merge pull request #15555 from ChipKerchner:flipVectorize
* Vectorize flipHoriz and flipVert functions.

* Change v_load_mirror_1 to use vec_revb for VSX

* Only use vec_revb in ISA3.0

* Removing vec_revb code since some of the older compilers don't fully support it.

* Use new v_reverse intrinsic and cleanup code.

* Ensure there are no alignment issues with copies
2019-11-01 22:30:48 +03:00