opencv/modules
Alexander Alekhin 40533dbf69
Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

The patched version is no slower than either convertFp16 or the convertTo baseline, except for MatUMat 32->16, where the baseline uses CPU code plus a mapped dst buffer.
Gaps remain against noOpenCL (CPU-only) mode because of T-API implementation issues (unnecessary synchronization).


- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

The convertTo baseline could not compile the OpenCL kernel for FP16 properly; this is FIXED.
The dGPU has much more compute power, so the UMat-to-UMat results are 16-17x better than a single CPU core.
The patched version is no slower than either convertFp16 or the convertTo baseline.
Gaps remain against noOpenCL (CPU-only) mode because of T-API implementation issues (unnecessary synchronization) and the required memory transfers.
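
The 16-17x figure can be checked against the UMat-to-UMat rows of the dGPU table. The following is a minimal sketch, not part of the patch: the timings are copied from the table, while the dictionary names and the `speedup` helper are hypothetical and exist only for this illustration.

```python
# Timings copied from the "12700K + AMD dGPU" table (units as reported by the perf tests).
# NO_OCL, PATCH_DGPU and speedup() are illustrative names, not OpenCV identifiers.
NO_OCL = {
    "ConvertFP16FP32UMatUMat": 3.016,
    "ConvertFP32FP16UMatUMat": 2.704,
}
PATCH_DGPU = {
    "ConvertFP16FP32UMatUMat": 0.170,
    "ConvertFP32FP16UMatUMat": 0.176,
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return baseline / candidate


speedups = {name: speedup(NO_OCL[name], PATCH_DGPU[name]) for name in NO_OCL}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster than noOCL")
# ConvertFP16FP32UMatUMat: 17.7x faster than noOCL
# ConvertFP32FP16UMatUMat: 15.4x faster than noOCL
```

The two ratios bracket the quoted 16-17x range. The rows with a Mat on either side do not show this gain, which is consistent with the memory-transfer cost noted above.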

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 12:56:52 +03:00

|Module|Last commit|Date|
|---|---|---|
|calib3d|Added exception warning to calibrateCamera description.|2023-12-26 09:23:11 +03:00|
|core|Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement|2024-01-26 12:56:52 +03:00|
|dnn|Merge pull request #23736 from seanm:c++11-simplifications|2024-01-19 16:53:08 +03:00|
|features2d|Added Java bindings for BOWImgDescriptorExtractor constructor.|2023-10-31 11:23:47 +03:00|
|flann|Merge pull request #23109 from seanm:misc-warnings|2023-10-06 13:33:21 +03:00|
|gapi|Ifdef OpenVINO API 1.0 usage in G-API module|2024-01-17 13:28:53 +00:00|
|highgui|fix highgui qt's statusbar text got cropped|2024-01-07 06:32:29 -05:00|
|imgcodecs|Merge pull request #24875 from tailsu:sd/jpeg-turbo-color-extensions|2024-01-23 14:32:56 +03:00|
|imgproc|Merge pull request #24750 from YusukeKameda:4.x|2024-01-18 15:06:36 +03:00|
|java|Merge pull request #24869 from alexlyulkov:al/android-camera-view-rotate|2024-01-17 21:35:35 +03:00|
|js|Merge pull request #24458 from laolaolulu:4.x|2023-11-13 14:51:20 +03:00|
|ml|Merge pull request #23109 from seanm:misc-warnings|2023-10-06 13:33:21 +03:00|
|objc|Merge pull request #24136 from komakai:visionos_support|2023-12-20 15:35:10 +03:00|
|objdetect|Merge pull request #24873 from AleksandrPanov:fix_charuco_board|2024-01-23 15:33:56 +03:00|
|photo|Merge pull request #23109 from seanm:misc-warnings|2023-10-06 13:33:21 +03:00|
|python|Merge pull request #23736 from seanm:c++11-simplifications|2024-01-19 16:53:08 +03:00|
|stitching|Merge pull request #23736 from seanm:c++11-simplifications|2024-01-19 16:53:08 +03:00|
|ts|Merge pull request #23736 from seanm:c++11-simplifications|2024-01-19 16:53:08 +03:00|
|video|Merge pull request #24852 from Octopus136:4.x|2024-01-17 10:20:03 +03:00|
|videoio|Merge pull request #23736 from seanm:c++11-simplifications|2024-01-19 16:53:08 +03:00|
|world|cmake: use /INCREMENTAL:NO with MSVS 2015|2023-12-07 19:46:27 +00:00|