opencv/modules/core
Alexander Alekhin 40533dbf69
Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map).
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).


- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED.
dGPU has much more power, so results are x16-17 better than single cpu core. 
Patched version is not worse than convertFp16 and convertTo baseline.
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 12:56:52 +03:00
..
3rdparty/SoftFloat Add install component for 3rdparty libraries licenses 2018-03-06 16:32:30 +03:00
cmake/parallel core(parallel): plugins support 2021-02-15 17:07:36 +00:00
doc Merge pull request #23342 from n0099:#23335 2023-05-03 14:15:53 +03:00
include/opencv2 Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement 2024-01-26 12:56:52 +03:00
misc Merge pull request #24454 from komakai:refactorObjcRange 2023-10-27 14:31:41 +03:00
perf Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement 2024-01-26 12:56:52 +03:00
src Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement 2024-01-26 12:56:52 +03:00
test Merge pull request #23736 from seanm:c++11-simplifications 2024-01-19 16:53:08 +03:00
CMakeLists.txt build: first class cuda support 2023-12-26 09:39:18 +03:00