Alexander Alekhin
efc9837df1
Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage
...
DNN: avoid CV_16S usage for FP16 #24892
**Merge after**: #24918
TODO:
- [x] measure performance changes
- [x] optimize convertTo for OpenCL: #24918
12700K iGPU:
|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 16:34:17 +03:00
fengyuentau
83acb656f1
integrate bias handling in ocl kernel
2024-01-11 11:15:17 +08:00
Alexander Alekhin
99c94d3d83
dnn(ocl): don't try KERNEL_TYPE_GEMM_LIKE with kernel_w > 16
...
- OpenCL kernel code doesn't support that
2023-12-21 13:30:57 +00:00
fengyuentau
0cdff46725
tune for opencl
2022-08-14 17:47:48 +08:00
Alexander Alekhin
8b4fa2605e
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-12-03 12:32:49 +00:00
yuki takehara
a6277370ca
Merge pull request #21107 from take1014:remove_assert_21038
...
resolves #21038
* remove C assert
* revert C header
* fix several points in review
* fix test_ds.cpp
2021-11-27 18:34:52 +00:00
Alexander Alekhin
cca4c47781
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-10-08 11:05:45 +00:00
Alexander Alekhin
81e7988eb9
Merge pull request #20840 from alalek:dnn_ocl_cleanup_code
2021-10-08 05:07:51 +00:00
Alexander Alekhin
8c2dd5fb9a
dnn(ocl4dnn): cleanup dead code, improve logging
2021-10-08 00:39:40 +00:00
Alexander Alekhin
724e04e979
dnn(ocl4dnn): add extra checks to convolution layer
...
- prevent running code over unsupported/non-tested configurations
- prevent integer div by zero
2021-10-07 23:18:32 +00:00
Alexander Alekhin
37c3f0d8a0
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-10-02 17:57:18 +00:00
Alexander Alekhin
f977d10a19
dnn(ocl): fix conv DWCONV workgroup
2021-10-01 18:52:07 +00:00
Alexander Alekhin
846317ef37
dnn(ocl): fix conv BASIC workgroup
2021-09-29 14:55:46 +00:00
Alexander Alekhin
c3ac834526
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-09-11 21:27:26 +00:00
Alexander Alekhin
35e824c287
dnn(ocl): fix out of bound access in GEMM-like kernels
...
- dropped usage of CreateSubBuffer() - buffers lifetime management issue
- fixed elementwise offset
- avoid out of bounds read access
2021-09-06 18:17:21 +00:00
Alexander Alekhin
5578ad5e14
dnn(ocl): fix automatic globalsize adjusting
...
- if kernel code doesn't support that
2021-09-06 03:11:29 +00:00
Alexander Alekhin
407adc7061
dnn(ocl): fix buffer offsets in IDLF kernel
...
- drop CreateSubBuffer
- fix FUSED_CONV_ELTWISE mode
2021-09-04 15:28:35 +00:00
Alexander Alekhin
5aa7435d25
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-09-02 15:24:04 +00:00
Alexander Alekhin
ae6fabc6fe
dnn(ocl): drop CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE check
...
- it is a hint and it should not block kernel execution
2021-08-30 20:40:14 +00:00
Alexander Alekhin
6fbfc58602
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-08-21 17:25:18 +00:00
Alexander Alekhin
f28e4b86fb
dnn(ocl): fix top initialization in verifyResult
2021-08-21 16:04:13 +00:00
Alexander Alekhin
35eaacd1db
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-03-27 15:35:16 +00:00
Alexander Alekhin
86d0a86141
dnn(ocl): fix gemm kernel scheduling
2021-03-26 00:35:00 +00:00
Alexander Alekhin
624d532000
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-12-17 21:05:34 +00:00
Alexander Alekhin
7631056b8a
Merge pull request #19114 from alalek:issue_18937
2020-12-15 20:47:05 +00:00
Alexander Alekhin
c240355cc6
dnn(ocl): avoid mess FP16/FP32 in convolution layer
2020-12-15 08:51:24 +00:00
Alexander Alekhin
4b3d2c8834
dnn(ocl): fix gemm kernels with beta=0
...
- dst is not initialized, may include NaN values
- 0*NaN produces NaN
2020-12-15 00:58:43 +00:00
Alexander Alekhin
2155296a13
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-11-27 14:08:06 +00:00
Sergei Slashchinin
f4f462c50b
Merge pull request #18862 from sl-sergei:support_pool1d
...
Support for Pool1d layer for OpenCV and OpenCL targets
* Initial version of Pool1d support
* Fix variable naming
* Fix 1d pooling for OpenCL
* Change support logic, remove unnecessary variable, split the tests
* Remove other depricated variables
* Fix warning. Check tests
* Change support check logic
* Change support check logic, 2
2020-11-24 16:52:45 +00:00
Alexander Alekhin
295afd5882
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-09-28 21:33:29 +00:00
Alexander Alekhin
c08f29c803
dnn(opencl): fix convolution kernel w/o bias with activation
2020-09-27 23:42:30 +00:00
Alexander Alekhin
f52a2cf5e1
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-09-19 17:03:08 +00:00
Alexander Alekhin
4fa82809df
ocl: avoid rescheduling of async kernels
2020-09-18 14:53:50 +00:00
Alexander Alekhin
1f2c83845d
backport: checks and fixes from static code analyzers results
...
original commit: 71f665bd8c
2020-09-02 19:05:47 +00:00
Alexander Alekhin
71f665bd8c
checks and fixes from static code analyzers results
2020-09-02 21:59:34 +03:00
luz.paz
fcc7d8dd4e
Fix modules/ typos
...
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`
backporting of commit: ec43292e1e
2019-08-16 17:34:29 +03:00
luz.paz
ec43292e1e
Fix modules/ typos
...
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`
2019-08-15 18:02:09 -04:00
Alexander Alekhin
332c37f332
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-03-06 11:43:16 +03:00
Alexander Alekhin
80d37ba698
dnn: fix usage of CV_LOG_VERBOSE macro
2019-03-02 14:49:21 +00:00
Alexander Alekhin
22dbcf98c5
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2018-11-17 14:17:35 +00:00
Alexander Alekhin
96c71dd3d2
dnn: reduce set of ignored warnings
2018-11-15 13:15:59 +03:00
Alexander Alekhin
a8b0db4e5d
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2018-09-28 14:14:47 +03:00
Alexander Alekhin
fae329a0ca
Merge pull request #12650 from alalek:dnn_ocl4dnn_verification_test
...
* dnn(ocl4dnn): update kernel checks
* dnn: workaround for IDLF kernels on Intel iGPU
* dnn(test): remove "skip" check for unstable cases
2018-09-27 12:54:23 +03:00
Dmitry Kurtaev
24ab751547
Merge pull request #12565 from dkurt:dnn_non_intel_gpu
...
* Remove isIntel check from deep learning layers
* Remove fp16->fp32 fallbacks where it's not necessary
* Fix Kernel::run to prevent localsize > globalsize
2018-09-26 16:27:00 +03:00
Alexander Alekhin
e6171d17f8
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2018-09-18 12:49:52 +03:00
Lubov Batanina
43f889ae1f
Merge pull request #12519 from l-bat:l-bat/onnx_parser
...
Support asymmetric padding in pooling layer (#12519 )
* Add Inception_V1 support in ONNX
* Add asymmetric padding in OpenCL and Inference engine
* Refactoring
2018-09-17 20:26:17 +03:00
Alexander Alekhin
df8b057b44
avoid Ptr<> == NULL
checks
2018-09-09 19:30:46 +00:00
Dmitry Kurtaev
faa6c4e1e1
Faster-RCNN anf RFCN models on CPU using Intel's Inference Engine backend.
...
Enable Torch layers tests with Intel's Inference Engine backend.
2018-07-25 19:04:55 +03:00
Alexander Alekhin
ee743afebe
dnn(ocl): don't use getUMat() for long live objects
2018-07-20 17:53:55 +03:00
Alexander Alekhin
78d07e841d
Merge pull request #11959 from pengli:3.4
2018-07-17 11:20:02 +00:00