[GSoC] OpenCV.js: WASM SIMD optimization 2.0
* gsoc_2020_simd Add perf test for filter2d
* add perf test for kernel scharr and kernel gaussianBlur
* add perf test for blur, medianBlur, erode, dilate
* fix the errors for the opencv PR robot
fix the trailing whitespace.
* add perf tests for kernel remap, warpAffine, warpPerspective, pyrDown
* fix a bug in modules/js/perf/perf_imgproc/perf_remap.js
* add function smoothBorder in helpfun.js and remove replicated function in perf test of warpAffine and warpPerspective
* fix the trailing white space issues
* add OpenCV.js loader
* Implement the Loader with help of WebAssembly Feature Detection, remove trailing whitespaces
* modify the explanation for loader in js_setup.markdown and fix bug in loader.js
- OpenCL kernel cleanup processing is asynchronous and can be called even after forced clFinish()
- buffers are released later in asynchronous mode
- silence these false positive cases for asynchronous cleanup
changed OpenCV license from BSD to Apache 2 license
* as discussed and announced earlier, changed OpenCV license from BSD to Apache 2. Many files still contain old-style copyrights though
* changed wording a bit; preserve the original OpenCV BSD license
- Added cross compile cmake file for target riscv64-clang
- Extended cmake for RISC-V and added instruction checks
- Created intrin_rvv.hpp with C++ version universal intrinsics
* Add documentation about usage of cv2eigen functions in eigen.hpp
* Fixed Doxygen syntax.
Co-authored-by: Alexander Smorkalov <smorkalov.a.m@gmail.com>
Objc binding
* Initial work on Objective-C wrapper
* Objective-C generator script; update manually generated wrappers
* Add Mat tests
* Core Tests
* Imgproc wrapper generation and tests
* Fixes for Imgcodecs wrapper
* Miscellaneous fixes. Swift build support
* Objective-C wrapper build/install
* Add Swift wrappers for videoio/objdetect/feature2d
* Framework build; iOS support
* Fix toArray functions; use enum types whenever possible
* Use enum types where possible; prepare test build
* Update test
* Add test runner scripts for iOS and macOS
* Add test scripts and samples
* Build fixes
* Fix build (cmake 3.17.x compatibility)
* Fix warnings
* Fix enum name conflicting handling
* Add support for document generation with Jazzy
* Swift/Native fast accessor functions
* Add Objective-C wrapper for calib3d, dnn, ml, photo and video modules
* Remove IntOut/FloatOut/DoubleOut classes
* Fix iOS default test platform value
* Fix samples
* Revert default framework name to opencv2
* Add converter util functions
* Fix failing test
* Fix whitespace
* Add handling for deprecated methods; fix warnings; define __OPENCV_BUILD
* Suppress cmake warnings
* Reduce severity of "jazzy not found" log message
* Fix incorrect #include of compatibility header in ios.h
* Use explicit returns in subscript/get implementation
* Reduce minimum required cmake version to 3.15 for Objective-C/Swift binding
* fixed #17044
1. fixed Python part of the tutorial about using OpenCV XML-YAML-JSON I/O functionality from C++ and Python.
2. added startWriteStruct() and endWriteStruct() methods to FileStorage
3. modified FileStorage::write() methods to make them work well inside sequences, not only mappings.
* try to fix the doc builder
* added Python regression test for FileStorage I/O API ([TODO] iterating through long sequences can be very slow)
* fixed yaml testing
* Fix integer overflow in parseOption().
Previous code does not work for values like 100000MB.
* Fix warning during 32-bit build on inactive code path.
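
For illustration, the overflow described above for values like 100000MB can be avoided by doing the suffix multiplication in 64 bits. This is a minimal sketch, not the actual parseOption() code: the helper name `parseSizeOption`, the accepted suffixes and the error handling are all assumptions.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not OpenCV's parseOption): parses "<number>", "<number>KB"
// or "<number>MB" into a byte count. The multiplication is done in 64 bits, so
// "100000MB" does not wrap the way a 32-bit int product would.
inline std::uint64_t parseSizeOption(const std::string& value)
{
    std::size_t pos = 0;
    const std::uint64_t number = std::stoull(value, &pos);  // throws on non-numeric input
    const std::string suffix = value.substr(pos);

    std::uint64_t scale = 1;
    if (suffix == "KB")
        scale = 1024ULL;
    else if (suffix == "MB")
        scale = 1024ULL * 1024ULL;
    else if (!suffix.empty())
        throw std::invalid_argument("unknown size suffix: " + suffix);

    // Reject inputs whose scaled value would not fit even in 64 bits.
    if (number > std::numeric_limits<std::uint64_t>::max() / scale)
        throw std::out_of_range("size option overflows 64 bits");

    return number * scale;
}
```

With this sketch, `parseSizeOption("100000MB")` yields 104857600000 bytes, which is well beyond the 32-bit range.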
* fix build without C++11
* add eigen tensor conversion functions
* add eigen tensor conversion tests
* add support for column major order
* update eigen tensor tests
* fix coding style and add conditional compilation
* fix conditional compilation checks
* remove whitespace
* rearrange functions for easier reading
* reformat function documentation and add tensormap unit test
* cleanup documentation of unit test
* remove condition duplication
* check Eigen major version, not minor version
* restrict to Eigen v3.3.0+
* add documentation note and add type checking to cv2eigen_tensormap()
* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)
* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code
* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)
* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
trying to fix handling file storages with extremely long lines
* trying to fix handling of file storages with extremely long lines: https://github.com/opencv/opencv/issues/11061
* fixed erroneous pointer access in JSON parser.
* it's now crash-test time! temporarily set the initial parser buffer size to just 40 bytes. let's run all the test and check if the buffer is always correctly resized and handled
* fixed pointer use in JSON parser; added the proper test to catch this case
* fixed the test to make it more challenging. generate test json with
*
**
***
etc. shape
* Reduce LLC loads, stores and multiplies on MulTransposed - 8% faster on VSX
* Add is_same method so c++11 is not required
* Remove trailing whitespaces.
* Change is_same to DataType depth check
Vectorize minMaxIdx functions
* Updated documentation and intrinsic tests for v_reduce
* Add other files back in from the forced push
* Prevent a constant overflow with v_reduce for int8 type
* Another alternative to fix constant overflow warning.
* Fix another compiler warning.
* Update comments and change comparison form to be consistent with other vectorized loops.
* Change return type of v_reduce_min & max for v_uint8 and v_uint16 to be same as lane type.
* Cast v_reduce functions to int to avoid overflow. Reduce number of parameters in MINMAXIDX_REDUCE macro.
* Restore cast type for v_reduce_min & max to LaneType
* add cv::compare test when Mat type == CV_16F
* add assertion in cv::compare when src.depth() == CV_16F
* cv::compare assertion minor fix
* core: add more checks
Add checks for empty operands in Matrix expressions that don't check properly
* Starting to add checks for empty operands in Matrix expressions that
don't check properly.
* Adding checks and declarations for checker functions
* Fix signatures and add checks for each class of Matrix Expr operation
* Make it catch the right exception
* Don't expose helper functions to public API
* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticeable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
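
The duplicate-pass issue described for the HResizeLinear 2x unroll can be sketched on a simplified loop. This is not the OpenCV kernel: the function name `scaleRowsUnrolled2`, the per-row operation (a plain multiply by `alpha`) and the returned pass counter are assumptions made only to show the loop structure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-row kernel: dst[k][x] = src[k][x] * alpha[x].
// dst is assumed to have the same shape as src, and every row is assumed to
// hold at least alpha.size() elements.
// Returns the number of row passes performed, so duplicate work is visible.
inline int scaleRowsUnrolled2(const std::vector<std::vector<float>>& src,
                              std::vector<std::vector<float>>& dst,
                              const std::vector<float>& alpha)
{
    const int count = static_cast<int>(src.size());
    const std::size_t width = alpha.size();
    int rowPasses = 0;
    int k = 0;

    // 2x unrolled body: rows k and k+1 are handled together, so k advances by 2.
    // Advancing by 1 here would process most rows twice.
    for (; k <= count - 2; k += 2)
    {
        for (std::size_t x = 0; x < width; ++x)
        {
            dst[k][x] = src[k][x] * alpha[x];
            dst[k + 1][x] = src[k + 1][x] * alpha[x];
        }
        rowPasses += 2;
    }

    // Tail: at most one leftover row when count is odd.
    for (; k < count; ++k)
    {
        for (std::size_t x = 0; x < width; ++x)
            dst[k][x] = src[k][x] * alpha[x];
        rowPasses += 1;
    }
    return rowPasses;
}
```

For five rows the sketch performs exactly five row passes: two unrolled iterations plus one tail row.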
* calib3d: use normalized input in solvePnPGeneric()
* calib3d: java regression test for solvePnPGeneric
* calib3d: python regression test for solvePnPGeneric
* core: disable invalid constructors in C API by default
- C API objects will lose their default initializers through constructors
* samples: stop using the C API
Tests for argument conversion of Python bindings generator
* Tests for parsing elemental types from Python bindings
- Add positive and negative tests for int, float, double, size_t,
const char*, bool.
- Tests with wrong conversion behavior are skipped.
* Move implicit conversion of bool to integer/floating types to wrong
conversion behavior.
Improving VSX performance of integral function
* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.
* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.
* Adding SSE/NEON/AVX intrinsics.
* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.
* core(simd): add tests for v_extract_n/v_broadcast_element
- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations
* core(simd): fix compilation
- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only
* cleanup
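
As a rough scalar model of what v_extract_n and v_broadcast_element are described as doing above (get the n-th element of a vector register, and fill a register with one element), the sketch below uses `std::array` in place of a SIMD register. The names `Lanes`, `extractN` and `broadcastElement` are hypothetical and are not the HAL API.

```cpp
#include <array>
#include <cstddef>

// Scalar model of a fixed-width vector register (hypothetical type, not a HAL type).
template <typename T, std::size_t N>
using Lanes = std::array<T, N>;

// Returns lane I of v; the index is a compile-time constant and is range-checked.
template <std::size_t I, typename T, std::size_t N>
inline T extractN(const Lanes<T, N>& v)
{
    static_assert(I < N, "lane index out of range");
    return v[I];
}

// Returns a register in which every lane holds the value of lane I of v.
template <std::size_t I, typename T, std::size_t N>
inline Lanes<T, N> broadcastElement(const Lanes<T, N>& v)
{
    static_assert(I < N, "lane index out of range");
    Lanes<T, N> r;
    r.fill(v[I]);
    return r;
}
```

Keeping the lane index as a template parameter mirrors the fact that many SIMD extract instructions need an immediate operand, which is presumably why a runtime `get(i)` member was dropped.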
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implement TLS value cleanup when thread has been terminated on Windows Vista and above
* Fix 32-bit build
* Fixed calling convention of cleanup callback
* WINAPI changed to NTAPI
* Use proper guard macro
* Vectorize flipHoriz and flipVert functions.
* Change v_load_mirror_1 to use vec_revb for VSX
* Only use vec_revb in ISA3.0
* Removing vec_revb code since some of the older compilers don't fully support it.
* Use new v_reverse intrinsic and cleanup code.
* Ensure there are no alignment issues with copies
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using .detachData() + .cleanupDetachedData() instead of .gather() method
(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
Fixing bug with comparison of v_int64x2 or v_uint64x2
* Casting v_uint64x2 to v_float64x2 and comparing does NOT work in all cases. Rewrite using epi64 instructions - faster too.
* Fix bad merge.
* Fix equal comparison for non-SSE4.1. Add test cases for v_int64x2 comparisons.
* Try to fix merge conflict.
* Only test v_int64x2 comparisons if CV_SIMD_64F
* Fix compiler warning.
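
To illustrate why comparing 64-bit integers through a floating-point view can fail, here is a scalar sketch. It assumes the cast in question is a bit reinterpretation (not a numeric conversion) and that `double` is a 64-bit IEEE 754 type. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit double");

// Reinterprets the 64 bits as a double (bit copy, no numeric conversion).
inline double bitsAsDouble(std::uint64_t bits)
{
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Equality routed through doubles: wrong for some bit patterns.
inline bool equalViaDouble(std::uint64_t a, std::uint64_t b)
{
    return bitsAsDouble(a) == bitsAsDouble(b);
}

// Plain integer equality, the behaviour a 64-bit integer compare should have.
inline bool equalAsInteger(std::uint64_t a, std::uint64_t b)
{
    return a == b;
}
```

Two failure modes show up under these assumptions: a NaN bit pattern such as 0x7FF8000000000000 never compares equal to itself, and 0x0 and 0x8000000000000000 (positive and negative zero) compare equal even though the integers differ.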
* New v_reverse HAL intrinsic for reversing the ordering of a vector
* Fix conflict.
* Try to resolve conflict again.
* Try one more time.
* Add _MM_SHUFFLE. Remove non-vectorized code in SSE2. Fix copy and paste issue with NEON.
* Change v_uint16x8 SSE2 version to use shuffles
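
A scalar sketch of what a lane-reversing intrinsic does, and how it might be used to flip a row horizontally in fixed-size chunks. `std::array` stands in for a SIMD register. The names `reverseLanes` and `flipRow` are hypothetical and this is not the OpenCV implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Scalar model of a lane reversal: lane i of the result is lane N-1-i of v.
template <typename T, std::size_t N>
inline std::array<T, N> reverseLanes(std::array<T, N> v)
{
    std::reverse(v.begin(), v.end());
    return v;
}

// Flips a row horizontally: N-element chunks are read from the left, reversed,
// and written to the mirrored position on the right. A scalar tail handles the
// elements that do not fill a whole chunk.
template <std::size_t N, typename T>
inline std::vector<T> flipRow(const std::vector<T>& src)
{
    const std::size_t width = src.size();
    std::vector<T> dst(width);
    std::size_t x = 0;
    for (; x + N <= width; x += N)
    {
        std::array<T, N> chunk;
        std::copy_n(src.begin() + x, N, chunk.begin());
        chunk = reverseLanes(chunk);
        std::copy_n(chunk.begin(), N, dst.begin() + (width - x - N));
    }
    for (; x < width; ++x)
        dst[width - 1 - x] = src[x];
    return dst;
}
```

The result should match `std::reverse` applied to the whole row. The chunked form only shows where a register-level reversal would fit.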