opencv

mirror of https://github.com/opencv/opencv.git synced 2025-08-06 14:36:36 +08:00

Alexander Alekhin 40533dbf69 Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement

core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map). There are still gaps against noOpenCL (CPU only) mode due to T-API implementation issues (unnecessary synchronization).

- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED.
dGPU has much more power, so results are x16-17 better than single cpu core.
Patched version is not worse than convertFp16 and convertTo baseline. There are still gaps against noOpenCL (CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>		2024-01-26 12:56:52 +03:00
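The "x16-17" remark in the commit message can be checked against the AMD dGPU timings for the UMat-to-UMat tests. The snippet below is only an illustrative sketch: the `TIMINGS` table and the `speedup` helper are hypothetical names introduced here, the numbers are copied from the noOCL and convertTo PATCH dGPU columns, and it assumes the reported values are times (lower is better).

```python
# Timings copied from the commit message's AMD dGPU results
# (units as reported there; assumed to be times, lower is better).
# Each entry: (noOCL, i.e. CPU only; convertTo PATCH dGPU).
TIMINGS = {
    "ConvertFP16FP32UMatUMat": (3.016, 0.170),
    "ConvertFP32FP16UMatUMat": (2.704, 0.176),
}


def speedup(baseline, patched):
    """Return how many times faster `patched` is than `baseline`."""
    return baseline / patched


if __name__ == "__main__":
    # Print the CPU-only vs. patched dGPU ratio for each UMatUMat test.
    for name, (cpu_only, patched_gpu) in TIMINGS.items():
        print(f"{name}: x{speedup(cpu_only, patched_gpu):.1f}")
```

The two ratios come out at roughly x17.7 and x15.4, which brackets the "x16-17" figure quoted in the message.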
calib3d	Added exception warning to calibrateCamera description.	2023-12-26 09:23:11 +03:00
core	Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement	2024-01-26 12:56:52 +03:00
dnn	Merge pull request #23736 from seanm:c++11-simplifications	2024-01-19 16:53:08 +03:00
features2d	Added Java bindings for BOWImgDescriptorExtractor constructor.	2023-10-31 11:23:47 +03:00
flann	Merge pull request #23109 from seanm:misc-warnings	2023-10-06 13:33:21 +03:00
gapi	Ifdef OpenVINO API 1.0 usage in G-API module	2024-01-17 13:28:53 +00:00
highgui	fix highgui qt's statusbar text got cropped	2024-01-07 06:32:29 -05:00
imgcodecs	Merge pull request #24875 from tailsu:sd/jpeg-turbo-color-extensions	2024-01-23 14:32:56 +03:00
imgproc	Merge pull request #24750 from YusukeKameda:4.x	2024-01-18 15:06:36 +03:00
java	Merge pull request #24869 from alexlyulkov:al/android-camera-view-rotate	2024-01-17 21:35:35 +03:00
js	Merge pull request #24458 from laolaolulu:4.x	2023-11-13 14:51:20 +03:00
ml	Merge pull request #23109 from seanm:misc-warnings	2023-10-06 13:33:21 +03:00
objc	Merge pull request #24136 from komakai:visionos_support	2023-12-20 15:35:10 +03:00
objdetect	Merge pull request #24873 from AleksandrPanov:fix_charuco_board	2024-01-23 15:33:56 +03:00
photo	Merge pull request #23109 from seanm:misc-warnings	2023-10-06 13:33:21 +03:00
python	Merge pull request #23736 from seanm:c++11-simplifications	2024-01-19 16:53:08 +03:00
stitching	Merge pull request #23736 from seanm:c++11-simplifications	2024-01-19 16:53:08 +03:00
ts	Merge pull request #23736 from seanm:c++11-simplifications	2024-01-19 16:53:08 +03:00
video	Merge pull request #24852 from Octopus136:4.x	2024-01-17 10:20:03 +03:00
videoio	Merge pull request #23736 from seanm:c++11-simplifications	2024-01-19 16:53:08 +03:00
world	cmake: use /INCREMENTAL:NO with MSVS 2015	2023-12-07 19:46:27 +00:00