Commit Graph

4182 Commits

Author SHA1 Message Date
Alexander Alekhin
eacadf0e73 core(ocl): add flag OPENCV_OPENCL_ENABLE_MEM_USE_HOST_PTR
to control CL_MEM_USE_HOST_PTR usage
2019-09-25 15:12:36 +03:00
Alexander Alekhin
3cf9185159 Merge pull request #15538 from terfendail:wui_checkany 2019-09-23 15:52:24 +00:00
Alexander Alekhin
fcc69d5a60 core(test): fix check conditions 2019-09-22 11:28:41 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00
Vitaly Tuzov
66842f5a18 Extended v_check_any/v_check_all universal intrinsics to support 64-bit integer 2019-09-19 18:31:31 +03:00
Paul E. Murphy
b465c82696 core: workaround old gcc vec_mul{e,o} (Issue #15506)
ISA 2.07 (aka POWER8) effectively extended the expanding multiply
operation to word types. The altivec intrinsics prior to gcc 8 did
not get the update.

Workaround this deficiency similar to other fixes.

This was exposed by commit 33fb253a66
which leverages the int -> dword expanding multiply.

This fixes Issue #15506
2019-09-12 09:54:02 -05:00
Alexander Alekhin
0a13633411 Merge pull request #15444 from alalek:ocl_fix_fft_kernel 2019-09-04 16:25:34 +00:00
Alexander Alekhin
8bd2720c28 core(ocl): fix fft kernel compilation
- error: variables in the local address space can only be declared in the outermost scope of a kernel function
2019-09-03 15:46:53 +03:00
Alexander Alekhin
7e46766c8d Merge pull request #15437 from devnexen:fbsd_opencl_build_fix 2019-09-03 12:21:02 +00:00
Alexander Alekhin
9ef5373776 Merge pull request #15435 from alalek:update_version_3.4.8-pre 2019-09-03 12:04:23 +00:00
Alexander Alekhin
abd7d63b74 Merge pull request #15424 from mshabunin:add-cmake-docs 2019-09-03 10:50:45 +00:00
David Carlier
6769ee3748 OpenCL: FreeBSD build fix 2019-09-02 18:30:53 +01:00
Alexander Alekhin
0fda243a05 pre: OpenCV 3.4.8 (version++) 2019-09-02 14:20:49 +03:00
Alexander Alekhin
048ddbf9ee Merge pull request #15339 from pmur:dotprod-32s-vsx 2019-08-31 11:16:04 +00:00
Alexander Alekhin
2a6527e751 Merge pull request #15402 from ChipKerchner:normUnroll 2019-08-31 11:10:05 +00:00
Maksim Shabunin
f3aab47f94 Assorted documentation fixes
* removed private flann documentation
* common tutorial images moved to doc/images
* grouping issues
2019-08-31 01:50:11 +03:00
Alexander Alekhin
f224d740a3 Merge pull request #15414 from kuzi117:instr 2019-08-30 12:03:19 +00:00
Braedy Kuzma
9bf8b496d6 Use commonly supported instruction mnemonic. 2019-08-29 10:00:40 -06:00
Braedy Kuzma
d4120dd2fe Disambiguate vecpopcnt for (u)dword2. 2019-08-29 09:54:56 -06:00
Vitaly Tuzov
d134ec54c5 Extend tests for v_check_any and v_check_all intrinsics 2019-08-28 14:53:31 +03:00
Alexander Alekhin
ca7640e10f Merge pull request #15401 from ChipKerchner:vectorReduceInt8Bug 2019-08-27 19:59:39 +00:00
ChipKerchner
288e6f9c07 –Improve vectorization in the 'norm' functions 2019-08-27 12:15:19 -05:00
ChipKerchner
70b883cfeb Fix macro bug with v_reduce_min and v_reduce_max for chars in VSX 2019-08-27 11:38:53 -05:00
Vitaly Tuzov
1b40528e1a Fix for AVX2 implementation of v_check_any(), v_check_all() intrinsics 2019-08-27 14:31:23 +03:00
Alexander Alekhin
d7409604b5 core: handle empty Mat in Mat_ assignment operators 2019-08-23 16:54:24 +03:00
Alexander Alekhin
56e832ee43 Merge pull request #15372 from alalek:core_stat_fix_intrin 2019-08-22 20:52:54 +00:00
Alexander Alekhin
8a0b93bc4d core: update fastmath.hpp 2019-08-22 16:43:07 +03:00
Alexander Alekhin
8b1fe8f6e0 core: fix stat SIMD code 2019-08-22 16:37:26 +03:00
Zyrin
869ea22f34 Use std::move in Mat_<T> move constructors 2019-08-21 11:12:00 +02:00
Zyrin
8ef8088686 Fix stack overflow on gcc with c++17 (#15343) 2019-08-21 10:57:03 +02:00
Paul E. Murphy
33fb253a66 core: vectorize dotProd_32s
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.

For PPC, do a full multiply (32x32->64b), convert to DP
then accumulate. This may be slightly less precise for
some inputs. But is 1.5x faster than the above which
is about 1.5x than the FMA above for ~2.5x speedup.
2019-08-20 15:28:36 -05:00
luz.paz
fcc7d8dd4e Fix modules/ typos
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`

backporting of commit: ec43292e1e
2019-08-16 17:34:29 +03:00
Alexander Alekhin
13ecd5bb25 Merge pull request #15122 from pmur:fast-math-improvements 2019-08-14 19:28:05 +00:00
Alexander Alekhin
a703b9ed84 Merge pull request #15101 from alalek:cmake_initialization 2019-08-14 19:17:07 +00:00
Alexander Alekhin
32772a5436 3.4: backported changes from 'master' branch 2019-08-14 16:36:08 +03:00
Alexander Alekhin
7c96857c02 Merge pull request #15292 from alalek:build_warnings_xcode_10_3 2019-08-13 14:18:48 +00:00
Alexander Alekhin
15b8a8d935 build: eliminate warnings with Xcode 10.3 2019-08-13 15:06:13 +03:00
Hugo Lindström
935067ee05 Merge pull request #15265 from hugolm84:wince-armv7-supports-neon
* WINCE 8.0 requires ARMv7 Thumb2 and thus have NEON instructions

* Only add NEON if on _ARM_
2019-08-09 18:01:37 +03:00
Alexander Alekhin
5ef548a985 cmake: update initialization 2019-08-08 15:23:16 +03:00
Paul E. Murphy
f38a61c66d fast_math: implement optimized PPC routines
Implement cvRound using inline asm. No compiler support
exists today to properly optimize this. This results in
about a 4x speedup over the default rounding. Likewise,
simplify the growing number of rounding function overloads.

For P9 enabled targets, utilize the classification
testing instruction to test for Inf/Nan values. Operation
speedup is about 1.2x for FP32, and 1.5x for FP64 operands.

For P8 targets, fallback to the GCC nan inline. It provides
a 1.1/1.4x improvement for FP32/FP64 arguments.
2019-08-07 15:01:18 -05:00
Paul E. Murphy
3f92bcc11a fast_math: selectively use GCC rounding builtins when available
Add a new macro definition OPENCV_USE_FASTMATH_GCC_BUILTINS to enable
usage of GCC inline math functions, if available and requested by the
user.

Likewise, enable it for POWER. This is nearly always a substantial
improvement over using integer manipulation as most operations can
be done in several instructions with no branching. The result is a
1.5-1.8x speedup in the ceil/floor operations.

1. As tested with AT 12.0-1 (GCC 8.3.1) compiler on P9 LE.
2019-08-07 15:01:18 -05:00
Paul E. Murphy
b2135be594 fast_math: add extra perf/unit tests
Add a basic sanity test to verify the rounding functions
work as expected.

Likewise, extend the rounding performance test to cover the
additional float -> int fast math functions.
2019-08-07 14:59:46 -05:00
Alexander Alekhin
821f17d666 Merge pull request #15235 from pmur:vsx-v_signmask-vbpermq 2019-08-06 20:09:22 +00:00
Victor Romero
987bb2ca61 Fix build for UWP
backport of commit: f18cbd036a
2019-08-05 17:19:36 +03:00
Paul E. Murphy
1031b7f4bc hal: vsx: further optimize v_signmask
Use the quadword bit permutation instruction to creatively move
the sign bits to create the mask. Note that values above 127 will
result in 0.
2019-08-05 09:00:22 -05:00
Alexander Alekhin
ba934ff1ce Merge pull request #15202 from hugolm84:support_build_shared_for_wince 2019-08-02 15:34:02 +00:00
Hugo Lindström
03fe1cb7fc Support building shared libraries on WINCE. 2019-08-01 15:28:04 +02:00
Maksim Shabunin
6d5ac67681 Restored IPP call reduction 2019-07-31 15:41:22 +03:00
Maksim Shabunin
eec9fa9d5e Merge pull request #15181 from berak:java_print_blob 2019-07-30 14:13:02 +00:00
berak
4d3989817c java: fix Mat.toString() for higher dimensions 2019-07-29 19:39:09 +02:00