mirror of
https://github.com/opencv/opencv.git
synced 2024-11-24 19:20:28 +08:00
2922738b6d
Gemm kernels for Intel GPU (#8104) * Fix an issue with Kernel object reset release when consecutive Kernel::run calls: Kernel::run launches OCL GPU kernels and sets an event callback function to decrease the ref count of UMat or remove UMat when the launched workloads are completed. However, some OCL kernels require multiple calls of the Kernel::run function with some kernel parameter changes (e.g., input and output buffer offset) to get the final computation result. In that case, the current implementation requires unnecessary synchronization and cleanupMat. This fix requires the user to specify whether there will be more work or not. If there is no remaining computation, Kernel::run will reset the kernel object. Signed-off-by: Woo, Insoo <insoo.woo@intel.com> * GEMM kernel optimization for Intel GEN: The optimized kernels use the cl_intel_subgroups extension for better performance. Note: These optimized kernels will be part of ISAAC in a code generation way under MIT license. Signed-off-by: Woo, Insoo <insoo.woo@intel.com> * Fix API compatibility error: This patch fixes an OCV API compatibility error. The error was reported due to the interface changes of Kernel::run. To resolve the issue, an overloaded function of Kernel::run is added. It takes a flag indicating whether there is more work to be done with the kernel object without releasing resources related to it. Signed-off-by: Woo, Insoo <insoo.woo@intel.com> * Renaming intel_gpu_gemm.cpp to intel_gpu_gemm.inl.hpp Signed-off-by: Woo, Insoo <insoo.woo@intel.com> * Revert "Fix API compatibility error" This reverts commit |
||
---|---|---|
calib3d | ||
core | ||
cudaarithm | ||
cudabgsegm | ||
cudacodec | ||
cudafeatures2d | ||
cudafilters | ||
cudaimgproc | ||
cudalegacy | ||
cudaobjdetect | ||
cudaoptflow | ||
cudastereo | ||
cudawarping | ||
cudev | ||
features2d | ||
flann | ||
highgui | ||
imgcodecs | ||
imgproc | ||
java | ||
ml | ||
objdetect | ||
photo | ||
python | ||
shape | ||
stitching | ||
superres | ||
ts | ||
video | ||
videoio | ||
videostab | ||
viz | ||
world | ||
CMakeLists.txt |