- persistence.cpp code expects a specific sizeof value for the structures passed to it
- this assumption leads to memory corruption problems
- fixed/worked around the test to prevent memory corruption on 32-bit Linux systems
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - IPP pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE4.2 due to poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::Canny - new integration, disabled when threaded
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
Add support for std::array<T, N> (#8535)
* Add support for std::array<T, N>
* Add std::array<Mat, N> support
* Remove UMat constructor with std::array parameter
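The idea behind accepting std::array<T, N> next to other containers can be sketched without OpenCV. In the sketch below, `ArrayView` and `sum` are hypothetical names chosen for illustration; this is not OpenCV's proxy class, only an assumption about the general shape of such an overload, where N is deduced by a templated constructor.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an input-array proxy type (not OpenCV code).
// It records only a pointer and an element count; it does not own the data.
template <typename T>
struct ArrayView {
    const T* data;
    std::size_t size;

    // Existing-style constructor: accept a std::vector<T>.
    ArrayView(const std::vector<T>& v) : data(v.data()), size(v.size()) {}

    // Added overload: accept a fixed-size std::array<T, N>; N is deduced.
    template <std::size_t N>
    ArrayView(const std::array<T, N>& a) : data(a.data()), size(N) {}
};

// Sums the viewed elements; either container converts implicitly.
inline int sum(ArrayView<int> view) {
    int total = 0;
    for (std::size_t i = 0; i < view.size; ++i) total += view.data[i];
    return total;
}
```

With this in place, a call such as `sum(std::array<int, 3>{{1, 2, 3}})` compiles the same way as `sum(std::vector<int>{1, 2, 3})`, with no copy into a vector.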
Gemm kernels for Intel GPU (#8104)
* Fix an issue with Kernel object reset release when consecutive Kernel::run calls
Kernel::run launches OCL GPU kernels and sets an event callback function
that decrements the ref count of the UMat, or removes the UMat, when the launched workloads
are completed. However, some OCL kernels require multiple calls to the
Kernel::run function with some kernel parameters changed (e.g., input
and output buffer offsets) to get the final computation result.
In that case, the current implementation requires unnecessary
synchronization and cleanupMat.
This fix requires the user to specify whether there will be more work or not.
If there is no remaining computation, Kernel::run will reset the
kernel object.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* GEMM kernel optimization for Intel GEN
The optimized kernels use the cl_intel_subgroups extension for better
performance.
Note: These optimized kernels will be part of ISAAC, via code generation,
under the MIT license.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Fix API compatibility error
This patch fixes an OCV API compatibility error. The error was reported
due to the interface changes of Kernel::run. To resolve the issue,
an overloaded Kernel::run function is added. It takes a flag indicating
whether there is more work to be done with the kernel object without
releasing resources related to it.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Renaming intel_gpu_gemm.cpp to intel_gpu_gemm.inl.hpp
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Revert "Fix API compatibility error"
This reverts commit 2ef427db91.
Conflicts:
modules/core/src/intel_gpu_gemm.inl.hpp
* Revert "Fix an issue with Kernel object reset release when consecutive Kernel::run calls"
This reverts commit cc7f9f5469.
* Fix the case of uninitialized D
When C is null and beta is non-zero, D is used without initialization.
This resolves the issue.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* fix potential output error due to 0 * nan
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* whitespace fix, eliminate non-ASCII symbols
* fix build warning
Although both `cl_platform_info` and `cl_device_info` are defined as macro `cl_uint`, `cl_platform_info` must be used to get
the platform information.
- don't use undefined flag=0. It should be CONSTANT instead.
- don't allow 'UMat* m=NULL' argument (except LOCAL/CONSTANT flags).
Providing NULL __global pointers is not handled well in this case.
It is better to use '-D' macro defines instead (at least for performance).
Append zero to trailing decimal place for FileStorage JSON write of a float or double value (#7952)
* Fix for FileStorage JSON write of a float or double value that has no fractional part: append a zero character after the trailing decimal point to meet the JSON standard.
* keep the strlen return value as size_t rather than casting it to int unnecessarily
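The JSON fix above boils down to a small string post-processing step. The following is a minimal sketch under the assumption that the number has already been formatted as text; `fixTrailingDecimalPoint` is a hypothetical helper name, not a function from persistence.cpp.

```cpp
#include <string>

// Hypothetical helper: JSON requires at least one digit after a decimal
// point, so a number formatted as "1." is turned into "1.0".
// Strings that do not end with '.' are returned unchanged.
inline std::string fixTrailingDecimalPoint(std::string s) {
    if (!s.empty() && s.back() == '.')
        s.push_back('0');
    return s;
}
```

For example, `fixTrailingDecimalPoint("1.")` gives `"1.0"`, while `"2.5"` passes through untouched.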
* moved BLAS/LAPACK detection scripts from opencv_contrib/dnn to the main repository.
* trying to fix the bug with undefined symbols sgesdd_ and dgesdd_
* removed extra whitespaces; disabled LAPACK on IOS
Currently, selecting a submatrix of an N-dimensional matrix requires
two lines of code, while only one line of code is required when using a 2D
array.
I added functionality to select an N-dim submatrix using a
vector instead of a Range pointer. This allows initializer lists to
be used for a one-line selection.
This allows an N-dimensional array to be set up in one line instead of two when using C++11 initializer lists. cv::Mat(3, {zDim, yDim, xDim}, ...) can be used instead of having to create an int pointer to hold the size array.
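Why a vector parameter enables the one-line form can be shown with a standalone sketch. Nothing below is cv::Mat code: `Range` and `subSizes` are hypothetical names, and the half-open [start, end) convention is an assumption made for the example. The point is only that a `std::vector` parameter can be filled directly from a braced initializer list, whereas a pointer parameter needs a separately declared array.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical half-open range [start, end), for illustration only.
struct Range {
    int start;
    int end;
};

// Computes the sizes of the sub-array selected by one range per dimension.
// Taking std::vector parameters lets callers pass braced lists inline.
inline std::vector<int> subSizes(const std::vector<int>& dims,
                                 const std::vector<Range>& ranges) {
    if (dims.size() != ranges.size())
        throw std::invalid_argument("one range per dimension is required");
    std::vector<int> out;
    out.reserve(dims.size());
    for (std::size_t i = 0; i < dims.size(); ++i) {
        const Range& r = ranges[i];
        if (r.start < 0 || r.start > r.end || r.end > dims[i])
            throw std::out_of_range("range exceeds dimension");
        out.push_back(r.end - r.start);
    }
    return out;
}
```

A single call such as `subSizes({4, 5, 6}, {{0, 2}, {1, 4}, {2, 6}})` then describes both the array shape and the selection without any temporary arrays.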
Maximum depth limit variable was added to the instrumentation structure;
Trace names console output fix: improper tree formatting could happen;
Output in case of error was added;
Custom regions improvements;
Improved timing and weight calculation for parallel regions; new TC (threads counter) value to indicate how many different threads accessed a particular node;
parallel_for, warnings fixes and ReturnAddress code from Alexander Alekhin;
* use hasSIMD128 rather than calling checkHardwareSupport
* add SIMD check in spatialgradient.cpp
* add SIMD check in stereosgbm.cpp
* add SIMD check in canny.cpp
In YAML 1.0 the colon is mandatory. See http://yaml.org/spec/1.0/#id2558635.
This also allows prior releases to read YAML files created with the current version.
* use __GNUC_MINOR__ in correct place to check the version of GCC
* check processor support of FP16 at run time
* check compiler support of FP16 and pass correct compiler option
* rely on ENABLE_AVX on gcc since AVX is generated when -mf16c is passed
* guard correctly using ifdef for the various configurations
* use v_float16x4 correctly by including the right header file
- calculate ticksTotal instead of ticksMean
- local / global width is based on ticksTotal value
- added instrumentation for OpenCL program compilation
- added instrumentation for OpenCL kernel execution
Minor fix in MatAllocator::upload
Minor fix in MatAllocator::copy
Minor fix in setSize function
Minor fix in Mat::Mat
Minor fix in cvMatNDToMat function
Minor fix in _InputArray::getMatVector
Minor fix in _InputArray::getUMatVector
Minor fix in cv::hconcat
Minor fix in cv::vconcat
Minor fix in cv::setIdentity
Minor fix in cv::trace
Minor fix in transposeI_ template function
Minor fix in reduceC_ template function
Minor fix in sort_ template function
Minor fix in sortIdx_ template function
Minor fix in cvRange function
Minor fix in MatConstIterator::seek
Minor fix in SparseMat::create
Minor fix in SparseMat::copyTo
Minor fix in SparseMat::convertTo
Minor fix in SparseMat::convertTo
Minor fix in SparseMat::ptr
Minor fix in SparseMat::resizeHashTab
Fixes indentation
* use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation
* define v_load_f16/v_store_f16 since v_load can't be distinguished when a short pointer is passed
* brush up implementation for old compilers (guard correctly)
* add test for v_load_f16 and round trip conversion of v_float16x4
* fix conversion error