Commit Graph

7 Commits

Author SHA1 Message Date
Alexander Alekhin
40533dbf69
Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map).
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).


- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED.
dGPU has much more power, so results are x16-17 better than single cpu core. 
Patched version is not worse than convertFp16 and convertTo baseline.
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 12:56:52 +03:00
Mikhail Nikolskii
a604d44d06
Merge pull request #19755 from mikhail-nikolskiy:ffmpeg-umat
cv::UMat output/input in VideoCapture/VideoWriter (data stays in GPU memory)

* FFMPEG with UMat input/output

* OpenCL_D3D* context

* fix Linux build

* cosmetic changes

* fix build if USE_AV_HW_CODECS=0

* simplify how child context pointer stored in parent context

* QSV interop with OpenCL on Windows

* detect_msdk.cmake via pkg-config

* fix av_buffer_ref() usage

* revert windows-decode-mfx whitelisting; remove debug msg

* address review comments

* rename property to HW_ACCELERATION_USE_OPENCL

* fix issue with "cl_khr_d3d11_sharing" extension not reported by OpenCL GPU+CPU platform

* core(ocl): add OpenCL stubs for configurations without OpenCL

* videoio(ffmpeg): update #if guards

* Put OpenCL related code under HAVE_OPENCL; simplify reuse of media context from OpenCL context

* videoio(test): skip unsupported tests

- plugins don't support OpenCL/UMat yet
- change handling of *_USE_OPENCL flag

* videoio(ffmpeg): OpenCL dependency

* videoio(ffmpeg): MediaSDK/oneVPL dependency

* cleanup, logging

* cmake: fix handling of 3rdparty interface targets

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2021-05-14 16:48:50 +00:00
Alexander Alekhin
1d0334fc07 Merge pull request #19584 from diablodale:fix19573_ocl_move 2021-02-21 19:20:03 +00:00
Dale Phurrough
96a15434a2
add move construct/assigns to cv::ocl main classes
- enables inline construct and assigns with r-values
- enables compiler-created default move
  construct/assigns
- includes test cases
2021-02-20 18:56:04 +01:00
Dale Phurrough
77e26a7db3
add noexcept to default constructors of cv::ocl
- follows iso c++ guideline C.44
- enables default compiler-created constructors to
  also be noexcept
2021-02-20 14:16:47 +01:00
Alexander Alekhin
2129c72bc0 core(OpenCL): thread-local OpenCL execution context 2020-09-02 05:04:20 +00:00
Alexander Alekhin
efcf307b4c ocl: cleanup dead code in case of disabled OpenCL 2020-08-31 11:30:42 +00:00