opencv/modules/core/src/opencl
Alexander Alekhin 40533dbf69
Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892
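
For context, the element-wise operation that an FP16-to-FP32 conversion performs can be sketched in plain C++. This is a minimal illustration of decoding an IEEE 754 binary16 bit pattern, assuming CV_16F storage follows that layout; the function name `half_bits_to_float` is hypothetical, and the code is not OpenCV's implementation nor the kernel in convert.cl.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode one IEEE 754 binary16 bit pattern into a float.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
float half_bits_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;

    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        // Normal number: (1024 + mantissa) * 2^(exponent - 15 - 10).
        value = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return sign ? -value : value;
}
```

Every binary16 value is exactly representable in binary32, so this direction is lossless; the FP32-to-FP16 direction needs rounding and is not covered by this sketch.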

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

The patched version is no worse than convertFp16 or the convertTo baseline, except for MatUMat 32->16, where the baseline uses CPU code plus a mapped dst buffer.
There are still gaps against the noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization).


- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

The convertTo baseline could not compile the OpenCL kernel for FP16 properly; this is now fixed.
The dGPU has much more compute power, so the UMat-to-UMat results are roughly 16-17x faster than a single CPU core.
The patched version is no worse than convertFp16 or the convertTo baseline.
There are still gaps against the noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization) and the required memory transfers.
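
The 16-17 times figure quoted above can be checked against the table with a small sketch. The helper `speedup` and the `Row` struct are hypothetical names introduced only for this illustration; the numbers are the noOCL and convertTo PATCH dGPU columns of the two UMat-to-UMat rows, in whatever unit the table reports (lower is better).

```cpp
#include <cstdio>

// Ratio of a baseline timing to a candidate timing; values above 1.0 mean
// the candidate is faster. Both arguments must use the same unit.
inline double speedup(double baseline, double candidate) {
    return baseline / candidate;
}

// Hypothetical record type for one table row.
struct Row {
    const char* test;
    double no_ocl;      // "noOCL" column
    double patch_dgpu;  // "convertTo PATCH dGPU" column
};

// UMat-to-UMat rows copied from the AMD dGPU table.
const Row kUMatUMatRows[] = {
    {"ConvertFP16FP32UMatUMat::OCL_Core", 3.016, 0.170},
    {"ConvertFP32FP16UMatUMat::OCL_Core", 2.704, 0.176},
};

// Print the speedup of the patched dGPU path over the CPU-only run.
void print_speedups() {
    for (const Row& r : kUMatUMatRows) {
        std::printf("%s: x%.1f\n", r.test, speedup(r.no_ocl, r.patch_dgpu));
    }
}
```

The two ratios come out near 17.7 and 15.4, which average to about 16.5. The Mat-involving rows are not included because they are dominated by the memory transfers and synchronization mentioned above.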

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 12:56:52 +03:00
..
runtime Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-12-03 12:32:49 +00:00
arithm.cl core: follow IEEE 754 rules for floating-point division 2018-10-14 10:47:50 +00:00
convert.cl Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement 2024-01-26 12:56:52 +03:00
copymakeborder.cl optimized cv::copyMakeBorder 2014-06-02 15:46:44 +04:00
copyset.cl Merge pull request #2810 from ilya-lavrenov:tapi_copytomask 2014-06-04 12:23:36 +04:00
cvtclr_dx.cl Merge pull request #19783 from mikhail-nikolskiy:interop-perf 2021-03-25 21:27:31 +00:00
fft.cl core(ocl): fix fft kernel compilation 2019-09-03 15:46:53 +03:00
flip.cl Merge pull request #16608 from vpisarev:fix_mac_ocl_tests 2020-02-21 16:13:41 +03:00
gemm.cl use LOCAL_SIZE+1 2014-10-28 15:18:31 +03:00
halfconvert.cl dnn(ocl): avoid mess FP16/FP32 in convolution layer 2020-12-15 08:51:24 +00:00
inrange.cl OCL: fix incompatibility with Mali runtime 2023-12-21 00:30:44 +03:00
intel_gemm.cl core(ocl): buffer bounds in intelblas_gemm_buffer_NT 2021-09-10 12:10:41 +00:00
lut.cl minor optimization of cv::LUT 2014-07-02 18:50:21 +04:00
meanstddev.cl Remove mul24 since id can be larger 2^23 2014-09-08 13:11:58 +04:00
minmaxloc.cl ocl: fix unaligned memory access 2015-07-06 13:58:17 +03:00
mixchannels.cl other kernels now use row scheme 2014-05-26 12:19:06 +03:00
mulspectrums.cl other kernels now use row scheme 2014-05-26 12:19:06 +03:00
normalize.cl optimized cv::normalize in case of mask 2014-06-02 15:33:19 +04:00
reduce2.cl Merge pull request #13879 from chacha21:REDUCE_SUM2 2023-04-28 20:42:52 +03:00
reduce.cl Remove mul24 since id can be larger 2^23 2014-09-08 13:11:58 +04:00
repeat.cl optimized cv::repeat 2014-05-23 13:16:27 +03:00
set_identity.cl optimized cv::setIdentity 2014-06-16 13:41:43 +04:00
split_merge.cl other kernels now use row scheme 2014-05-26 12:19:06 +03:00
transpose.cl attempt to fix compilation of OpenCL cv::transpose for AMD 2014-08-29 16:59:30 +04:00