* another round of dnn optimization:
* increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly
* improved SIMD optimization of pooling layer, optimized average pooling
* cleaned up convolution layer implementation
* made activation layer "attacheable" to all other layers, including fully connected and addition layer.
* fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it desctibes the topology.
* greatly optimized permutation layer, which improved SSD performance
* parallelized element-wise binary/ternary/... ops (sum, prod, max)
* also, added missing copyrights to many of the layer implementation files
* temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders
Fixed snprintf for VS 2013 (#8816)
* Fixed snprintf for VS 2013
* snprintf: removed declaration from header, changed implementation
* cv_snprintf corrected according to comments
* update snprintf patch
* avoid link error (move the implementation of software version to header)
* make getConvertFuncFp16 local (move from precomp.hpp to convert.hpp)
* fix error on 32bit x86
Parallelize Canny with custom gradient (#8694)
* New Canny implementation. Restructuring code in parallelCanny class. Align mag buffer and map.
* Fix warnings.
* Missing SIMD check added.
* Replaced local trailingZeros in contours.cpp. Use alignSize in canny.cpp
* Fix warnings in alignSize and allocate just minimum extra columns.
* Fix another warning in map.create.
* Exchange for loop by do loop to avoid double check at the beginning.
Define extra SIMD CANNY_CHECK to avoid unnecessary continue.
* Correct the existing documented T-API functions to match the doxygen format.
* docs: fix comments style
* T-API documentation: minor formatting changes
- persistence.cpp code expects special sizeof value for passed structures
- this assumption is lead to memory corruption problems
- fixed/workarounded test to prevent memory corruption on Linux 32-bit systems
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
Add support for std::array<T, N> (#8535)
* Add support for std::array<T, N>
* Add std::array<Mat, N> support
* Remove UMat constructor with std::array parameter
Gemm kernels for Intel GPU (#8104)
* Fix an issue with Kernel object reset release when consecutive Kernel::run calls
Kernel::run launch OCL gpu kernels and set a event callback function
to decreate the ref count of UMat or remove UMat when the lauched workloads
are completed. However, for some OCL kernels requires multiple call of
Kernel::run function with some kernel parameter changes (e.g., input
and output buffer offset) to get the final computation result.
In the case, the current implementation requires unnecessary
synchronization and cleanupMat.
This fix requires the user to specify whether there will be more work or not.
If there is no remaining computation, the Kernel::run will reset the
kernel object
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* GEMM kernel optimization for Intel GEN
The optimized kernels uses cl_intel_subgroups extension for better
performance.
Note: This optimized kernels will be part of ISAAC in a code generation
way under MIT license.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Fix API compatibility error
This patch fixes a OCV API compatibility error. The error was reported
due to the interface changes of Kernel::run. To resolve the issue,
An overloaded function of Kernel::run is added. It take a flag indicating
whether there are more work to be done with the kernel object without
releasing resources related to it.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Renaming intel_gpu_gemm.cpp to intel_gpu_gemm.inl.hpp
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Revert "Fix API compatibility error"
This reverts commit 2ef427db91.
Conflicts:
modules/core/src/intel_gpu_gemm.inl.hpp
* Revert "Fix an issue with Kernel object reset release when consecutive Kernel::run calls"
This reverts commit cc7f9f5469.
* Fix the case of uninitialization D
When C is null and beta is non-zero, D is used without initialization.
This resloves the issue
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* fix potential output error due to 0 * nan
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* whitespace fix, eliminate non-ASCII symbols
* fix build warning
This test case uses a matrix with more dimensions than columns. Without
the fix in
b45e784beb
this crashes with a segmentation fault, hangs or simply fails with wrong
values.
- use suffixes like '.avx.cpp'
- added CMake-generated files for '.simd.hpp' optimization approach
- wrap HAL intrinsic headers into separate namespaces for different build flags
- automatic vzeroupper insertion (via CV_INSTRUMENT_REGION macro)
`template<typename _Tp> inline const _Tp* Mat_<_Tp>::operator [](int y) const` does not support 3d matrix since it checks rows.
This operator[] shall check size.p[0] instead.
* Fix the documentation for Mat::diag(int).
Fix issue #8181
* Fix the documentation for Mat::diag(int).
Fix issue #8181.
* Add support for printing out cv::Complex.
* Remove extra spaces.
* cv::Complex is submitted as a new pull request.
Add support for printing out cv::Complex. (#8208)
* Add support for printing out cv::Complex.
* Conform to the format of std::complex.
* Remove extra spaces.
* Remove extra spaces.
Although both `cl_platform_info` and `cl_device_info` are defined as macro `cl_uint`, it needs to use `cl_platform_info` to get
the platform information.
- don't use undefined flag=0. It should be CONSTANT instead.
- don't allow 'UMat* m=NULL' argument (except LOCAL/CONSTANT flags).
This case is not handled well to provide NULL __global pointers.
It is better to use '-D' macro defines instead (at least for performance)
Fix typos in the documentation for AutoBuffer. (#8197)
* Allocate 1000 floats to match the documentation
Fix the documentation of `AutoBuffer`. By default, the following code
```.cpp
cv::AutoBuffer<float> m;
````
allocates only 264 floats. But the comment in the demonstration code says it allocates 1000 floats, which is
not correct.
* fix typo in the comment.
Append zero to trailing decimal place for FileStorage JSON write of a float or double value (#7952)
* Fix for FileStorage JSON write of a float or double value that has no fractional part; appends a zero character after the trailing decimal place to meet JSON standard.
* strlen return to size_t type rather than unnecessary cast to int
* moved BLAS/LAPACK detection scripts from opencv_contrib/dnn to the main repository.
* trying to fix the bug with undefined symbols sgesdd_ and dgesdd_
* removed extra whitespaces; disabled LAPACK on IOS
Currently, to select a submatrix of a N-dimensional matrix, it requires
two lines of code while only one line of code is required if using a 2D
array.
I added functionality to be able to select an N-dim submatrix using a
vector list instead of a Range pointer. This allows initializer lists to
be used for a one-line selection.
This allows for an N-dimensional array to be setup in one line instead of two when using C++11 initializer lists. cv::Mat(3, {zDim, yDim, xDim}, ...) can be used instead of having to create an int pointer to hold the size array.
Maximum depth limit var was added to the instrumentation structure;
Trace names output console output fix: improper tree formatting could happen;
Output in case of error was added;
Custom regions improvements;
Improved timing and weight calculation for parallel regions; New TC (threads counter) value to indicate how many different threads accessed particular node;
parallel_for, warnings fixes and ReturnAddress code from Alexander Alekhin;
* use hasSIMD128 rather than calling checkHardwareSupport
* add SIMD check in spartialgradient.cpp
* add SIMD check in stereosgbm.cpp
* add SIMD check in canny.cpp
A bug in ICC improperly identified the first parameter as "void*"
rather than the proper "volatile long*". This is scheduled to be
fixed in ICC in a future release.
This patch casts only to a "long*" to preserve backwards compatibility
with the ICC 16 and ICC 17 releases.
In YAML 1.0 the colon is mandatory. See http://yaml.org/spec/1.0/#id2558635.
This also allows prior releases to read YAML files created with the current version.
[GSOC] New camera model for stitching pipeline
* implement estimateAffine2D
estimates affine transformation using robust RANSAC method.
* uses RANSAC framework in calib3d
* includes accuracy test
* uses SVD decomposition for solving 3 point equation
* implement estimateAffinePartial2D
estimates limited affine transformation
* includes accuracy test
* stitching: add affine matcher
initial version of matcher that estimates affine transformation
* stitching: added affine transform estimator
initial version of estimator that simply chain transformations in homogeneous coordinates
* calib3d: rename estimateAffine3D test
test Calib3d_EstimateAffineTransform rename to Calib3d_EstimateAffine3D. This is more descriptive and prevents confusion with estimateAffine2D tests.
* added perf test for estimateAffine functions
tests both estimateAffine2D and estimateAffinePartial2D
* calib3d: compare error in square in estimateAffine2D
* incorporates fix from #6768
* rerun affine estimation on inliers
* stitching: new API for parallel feature finding
due to ABI breakage new functionality is added to `FeaturesFinder2`, `SurfFeaturesFinder2` and `OrbFeaturesFinder2`
* stitching: add tests for parallel feature find API
* perf test (about linear speed up)
* accuracy test compares results with serial version
* stitching: use dynamic_cast to overcome ABI issues
adding parallel API to FeaturesFinder breaks ABI. This commit uses dynamic_cast and hardcodes thread-safe finders to avoid breaking ABI.
This should be replaced by proper method similar to FeaturesMatcher on next ABI break.
* use estimateAffinePartial2D in AffineBestOf2NearestMatcher
* add constructor to AffineBestOf2NearestMatcher
* allows to choose between full affine transform and partial affine transform. Other params are the as for BestOf2NearestMatcher
* added protected field
* samples: stitching_detailed support affine estimator and matcher
* added new flags to choose matcher and estimator
* stitching: rework affine matcher
represent transformation in homogeneous coordinates
affine matcher: remove duplicite code
rework flow to get rid of duplicite code
affine matcher: do not center points to (0, 0)
it is not needed for affine model. it should not affect estimation in any way.
affine matcher: remove unneeded cv namespacing
* stitching: add stub bundle adjuster
* adds stub bundle adjuster that does nothing
* can be used in place of standard bundle adjusters to omit bundle adjusting step
* samples: stitching detailed, support no budle adjust
* uses new NoBundleAdjuster
* added affine warper
* uses R to get whole affine transformation and propagates rotation and translation to plane warper
* add affine warper factory class
* affine warper: compensate transformation
* samples: stitching_detailed add support for affine warper
* add Stitcher::create method
this method follows similar constructor methods and returns smart pointer. This allows constructing Stitcher according to OpenCV guidelines.
* supports multiple stitcher configurations (PANORAMA and SCANS) for convenient setup
* returns cv::Ptr
* stitcher: dynamicaly determine correct estimator
we need to use affine estimator for affine matcher
* preserves ABI (but add hints for ABI 4)
* uses dynamic_cast hack to inject correct estimator
* sample stitching: add support for multiple modes
shows how to use different configurations of stitcher easily (panorama stitching and scans affine model)
* stitcher: find features in parallel
use new FeatureFinder API to find features in parallel. Parallelized using TBB.
* stitching: disable parallel feature finding for OCL
it does not bring much speedup to run features finder in parallel when OpenCL is enabled, because finder needs to wait for OCL device.
Also, currently ORB is not thread-safe when OCL is enabled.
* stitching: move matcher tests
move matchers tests perf_stich.cpp -> perf_matchers.cpp
* stitching: add affine stiching integration test
test basic affine stitching (SCANS mode of stitcher) with images that have only translation between them
* enable surf for stitching tests
stitching.b12 test was failing with surf
investigated the issue, surf is producing good result. Transformation is only slightly different from ORB, so that resulting pano does not exactly match ORB's result. That caused sanity check to fail.
* added size checks similar to other tests
* sanity check will be applied only for ORB
* stitching: fix wrong estimator choice
if case was exactly wrong, estimators were chosen wrong
added logging for estimated transformation
* enable surf for matchers stitching tests
* enable SURF
* rework sanity checking. Check estimated transform instead of matches. Est. transform should be more stable and comparable between SURF and ORB.
* remove regression checking for VectorFeatures tests. It has a lot if data andtest is the same as previous except it test different vector size for performance, so sanity checking does not add any value here. Added basic sanity asserts instead.
* stitching tests: allow relative error for transform
* allows .01 relative error for estimated homography sanity check in stitching matchers tests
* fix VS warning
stitching tests: increase relative error
increase relative error to make it pass on all platforms (results are still good).
stitching test: allow bigger relative error
transformation can differ in small values (with small absolute difference, but large relative difference). transformation output still looks usable for all platforms. This difference affects only mac and windows, linux passes fine with small difference.
* stitching: add tests for affine matcher
uses s1, s2 images. added also new sanity data.
* stitching tests: use different data for matchers tests
this data should yeild more stable transformation (it has much more matches, especially for surf). Sanity data regenerated.
* stitching test: rework tests for matchers
* separated rotation and translations as they are different by scale.
* use appropriate absolute error for them separately. (relative error does not work for values near zero.)
* stitching: fix affine warper compensation
calculation of rotation and translation extracted for plane warper was wrong
* stitching test: enable surf for opencl integration tests
* enable SURF with correct guard (HAVE_OPENCV_XFEATURES2D)
* add OPENCL guard and correct namespace as usual for opencl tests
* stitching: add ocl accuracy test for affine warper
test consistent results with ocl on and off
* stitching: add affine warper ocl perf test
add affine warper to existing warper perf tests. Added new sanity data.
* stitching: do not overwrite inliers in affine matcher
* estimation is run second time on inliers only, inliers produces in second run will not be therefore correct for all matches
* calib3d: add Levenberg–Marquardt refining to estimateAffine2D* functions
this adds affine Levenberg–Marquardt refining to estimateAffine2D functions similar to what is done in findHomography.
implements Levenberg–Marquardt refinig for both full affine and partial affine transformations.
* stitching: remove reestimation step in affine matcher
reestimation step is not needed. estimateAffine2D* functions are running their own reestimation on inliers using the Levenberg-Marquardt algorithm, which is better than simply rerunning RANSAC on inliers.
* implement partial affine bundle adjuster
bundle adjuster that expect affine transform with 4DOF. Refines parameters for all cameras together.
stitching: fix bug in BundleAdjusterAffinePartial
* use the invers properly
* use static buffer for invers to speed it up
* samples: add affine bundle adjuster option to stitching_detailed
* add support for using affine bundle adjuster with 4DOF
* improve logging of initial intristics
* sttiching: add affine bundle adjuster test
* fix build warnings
* stitching: increase limit on sanity check
prevents spurious test failures on mac. values are still pretty fine.
* stitching: set affine bundle adjuster for SCANS mode
* fix bug with AffineBestOf2NearestMatcher (we want to select affine partial mode)
* select right bundle adjuster
* stitching: increase error bound for matcher tests
* this prevents failure on mac. tranformation is still ok.
* stitching: implement affine bundle adjuster
* implements affine bundle adjuster that is using full affine transform
* existing test case modified to test both affinePartial an full affine bundle adjuster
* add stitching tutorial
* show basic usage of stitching api (Stitcher class)
* stitching: add more integration test for affine stitching
* added new datasets to existing testcase
* removed unused include
* calib3d: move `haveCollinearPoints` to common header
* added comment to make that this also checks too close points
* calib3d: redone checkSubset for estimateAffine* callback
* use common function to check collinearity
* this also ensures that point will not be too close to each other
* calib3d: change estimateAffine* functions API
* more similar to `findHomography`, `findFundamentalMat`, `findEssentialMat` and similar
* follows standard recommended semantic INPUTS, OUTPUTS, FLAGS
* allows to disable refining
* supported LMEDS robust method (tests yet to come) along with RANSAC
* extended docs with some tips
* calib3d: rewrite estimateAffine2D test
* rewrite in googletest style
* parametrize to test both robust methods (RANSAC and LMEDS)
* get rid of boilerplate
* calib3d: rework estimateAffinePartial2D test
* rework in googletest style
* add testing for LMEDS
* calib3d: rework estimateAffine*2D perf test
* test for LMEDS speed
* test with/without Levenberg-Marquart
* remove sanity checking (this is covered by accuracy tests)
* calib3d: improve estimateAffine*2D tests
* test transformations in loop
* improves test by testing more potential transformations
* calib3d: rewrite kernels for estimateAffine*2D functions
* use analytical solution instead of SVD
* this version is faster especially for smaller amount of points
* calib3d: tune up perf of estimateAffine*2D functions
* avoid copying inliers
* avoid converting input points if not necessary
* check only `from` point for collinearity, as `to` does not affect stability of transform
* tutorials: add commands examples to stitching tutorials
* add some examples how to run stitcher sample code
* mention stitching_detailed.cpp
* calib3d: change computeError for estimateAffine*2D
* do error computing in floats instead of doubles
this have required precision + we were storing the result in float anyway. This make code faster and allows auto-vectorization by smart compilers.
* documentation: mention estimateAffine*2D function
* refer to new functions on appropriate places
* prefer estimateAffine*2D over estimateRigidTransform
* stitching: add camera models documentations
* mention camera models in module documentation to give user a better overview and reduce confusion
* use __GNUC_MINOR__ in correct place to check the version of GCC
* check processor support of FP16 at run time
* check compiler support of FP16 and pass correct compiler option
* rely on ENABLE_AVX on gcc since AVX is generated when mf16c is passed
* guard correctly using ifdef in case of various configuration
* use v_float16x4 correctly by including the right header file
- calculate ticksTotal instead of ticksMean
- local / global width is based on ticksTotal value
- added instrumentation for OpenCL program compilation
- added instrumentation for OpenCL kernel execution
Minor fix in MatAllocator::upload
Minor fix in MatAllocator::copy
Minor fix in setSize function
Minor fix in Mat::Mat
Minor fix in cvMatNDToMat function
Minor fix in _InputArray::getMatVector
Minor fix in _InputArray::getUMatVector
Minor fix in cv::hconcat
Minor fix in cv::vconcat
Minor fix in cv::setIdentity
Minor fix in cv::trace
Minor fix in transposeI_ template function
Minor fix in reduceC_ template function
Minor fix in sort_ template function
Minor fix in sortIdx_ template function
Minor fix in cvRange function
Minor fix in MatConstIterator::seek
Minor fix in SparseMat::create
Minor fix in SparseMat::copyTo
Minor fix in SparseMat::convertTo
Minor fix in SparseMat::convertTo
Minor fix in SparseMat::ptr
Minor fix in SparseMat::resizeHashTab
Fixes indentation
* use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation
* define v_load_f16/v_store_f16 since v_load can't be distinguished when short pointer passed
* brush up implementation on old compiler (guard correctly)
* add test for v_load_f16 and round trip conversion of v_float16x4
* fix conversion error