opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-15 18:09:11 +08:00

Author	SHA1	Message	Date
Yuantao Feng	3afe8ddaf8	core: Rename `cv::float16_t` to `cv::hfloat` (#25217) * rename cv::float16_t to cv::fp16_t * add typedef fp16_t float16_t * remove zero(), bits() from fp16_t class * fp16_t -> hfloat * remove cv::float16_t::fromBits; add hfloatFromBits * undo changes in conv_winograd_f63.simd.hpp and conv_block.simd.hpp * undo some changes in dnn	2024-03-21 23:44:19 +03:00
Alexander Alekhin	40533dbf69	Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement

core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map). There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).

- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED. dGPU has much more power, so results are x16-17 better than single cpu core.

Patched version is not worse than convertFp16 and convertTo baseline. There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>	2024-01-26 12:56:52 +03:00
Alexander Alekhin	624d532000	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2020-12-17 21:05:34 +00:00
Alexander Alekhin	c240355cc6	dnn(ocl): avoid mess FP16/FP32 in convolution layer	2020-12-15 08:51:24 +00:00
Alexander Alekhin	f8791f072d	core: avoid function type cast, make happy UBSAN backporting of commit: `d3d13c41c4`	2019-06-11 19:36:47 +00:00
Alexander Alekhin	d3d13c41c4	core: avoid function type cast, make happy UBSAN oss-fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14115	2019-06-11 07:06:29 +00:00
Alexander Alekhin	dfef04b325	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2019-02-12 17:54:40 +03:00
Alexander Alekhin	39b90ae9fb	core: dispatch convert	2019-02-08 18:32:10 +03:00
Alexander Alekhin	5527c41468	core: clone convert.dispatch.cpp	2019-02-08 16:29:16 +03:00

9 Commits