Commit Graph

1098 Commits

Author SHA1 Message Date
Dmitry Kurtaev
af14780526 Fix Region layer with OpenVINO in case of different width/height 2023-05-18 17:45:30 +03:00
Abduragim Shtanchaev
d2143bcd44
Merge pull request #23614 from Abdurrahheem:lstm_layout_attribute
LSTM ONNX Layout Attribute Support #23614 

### Explanation

This PR contains necessary changes to support `layout` attribute. This attributes is present in [ONNX](https://github.com/onnx/onnx/blob/main/docs/Operators.md#lstm) and [Torch](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#lstm) (in touch it is name as `batch_first=True`) libraries. When `layout = 1` input to LSTM layer is expected to have batch dimension first -> `[batch_size, sequence_length, features]` vs `layout = 0` - default `[sequence_length, batch_size, features]`

### Test Data

Test data and data generator for PR located here [#1063](https://github.com/opencv/opencv_extra/pull/1063)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-05-17 22:46:56 +03:00
Zihao Mu
5229312ad2
Merge pull request #22275 from zihaomu:fp16_support_conv
DNN: FP16 support on Convolution 2D #22275 

## FP16 support on ARM platform
This PR proposes to support FP16 backend in Convolution.
For now, we only support FP16 at ARM aarch64.

In addition to adding fp16, I also added `seperateIm2col` optimization in this patch.

## How to use FP16 to speed up convolution?
```
Net net = readNet(modelPath);
net.setPreferableTarget(DNN_TARGET_CPU_FP16);
net.setInput(blob);
Mat output = net.forward();
```

### TODO List
| Task | Status | Remarks |
|:-------:|:--------:|:------------:|
| Convolution 2D FP16 | ✔️ | Done |
| Winograd FP16 | Because the current modification has reached 2k lines, winograd fp16 will be completed in the next PR. |  |
| Accuracy Test | ✔️ | Done |
| Performance Test | ✔️ | Done |
| Compiler bug | ✔️ | Done |

### Speed Test for FP 16.

**Test on M1 chip, 4 threads.**

| Model Name | FP32 (Conv+Wino) | Conv(FP16) + Wino(FP 32) |
|:-------:|:--------:|:------------:|
| ReseNet 50 | 26.0 ms | **18.05 ms** (25% speed up)|
| MobileNet V2 | 4.17 ms | **3.09 ms (29% speed up)** |

### Speed Test for `seperateIm2col` trick on X86.
**Test on AMD 5600x, 12 threads.**
| Model Name | 4.x | Patch |
|:-------:|:--------:|:------------:|
| MobileNet V2 | 5.6 ms | **3.0 ms (46% speed up)** |

### Performance Test

#### Performance Test of X86 platform: AMD 5600X, with `-perf_threas=1`
|Name of Test|4.x|patch|patch vs 4.x (x-factor)|
|---|:-:|:-:|:-:|
|Name of Test|4.x 0|fp16pr final|fp16pr final vs 4.x 0 (x-factor)|
|---|:-:|:-:|:-:|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)|0.001|0.001|1.00|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)|0.001|0.001|1.03|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.001|0.001|0.92|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.002|0.003|0.95|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.006|0.006|1.00|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.045|0.033|1.39|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)|0.011|0.009|1.17|
|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)|0.109|0.078|1.39|
|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)|0.040|0.042|0.94|
|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.326|0.342|0.95|
|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.580|0.589|0.99|
|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)|1.293|1.382|0.94|
|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)|3.590|3.710|0.97|
|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)|1.120|1.191|0.94|
|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)|2.576|2.872|0.90|
|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)|4.599|4.670|0.98|
|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)|9.230|9.582|0.96|
|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)|65.946|69.381|0.95|
|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)|18.915|19.289|0.98|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)|1.404|1.457|0.96|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)|2.060|1.501|1.37|
|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)|1.409|1.464|0.96|
|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)|1.793|1.838|0.98|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)|1.207|1.199|1.01|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)|1.277|1.275|1.00|
|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|2.319|2.370|0.98|
|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)|1.351|1.346|1.00|
|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)|3.520|3.612|0.97|
|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|1.876|1.880|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)|1.981|1.995|0.99|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)|2.620|2.627|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)|4.202|4.123|1.02|
|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|2.429|2.445|0.99|
|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|2.591|2.576|1.01|
|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)|3.005|2.998|1.00|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)|3.515|3.532|1.00|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)|3.115|3.134|0.99|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|3.937|3.899|1.01|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)|5.533|5.471|1.01|
|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|3.472|3.464|1.00|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)|4.302|4.322|1.00|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)|6.100|6.035|1.01|
|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)|6.580|6.484|1.01|
|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|9.741|9.634|1.01|
|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|10.131|10.156|1.00|
|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|12.391|12.350|1.00|
|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)|91.074|87.893|1.04|
|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|5.903|5.903|1.00|
|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)|6.890|6.794|1.01|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)|5.160|5.131|1.01|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|4.970|5.036|0.99|
|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|5.045|5.015|1.01|
|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)|11.583|11.343|1.02|
|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)|5.348|5.320|1.01|
|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)|5.357|5.396|0.99|
|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)|6.050|6.006|1.01|
|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)|5.952|5.953|1.00|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)|8.014|8.014|1.00|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)|12.472|12.577|0.99|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)|10.803|10.655|1.01|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)|18.429|13.405|1.37|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)|6.659|6.647|1.00|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)|14.192|13.819|1.03|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)|6.045|6.068|1.00|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)|12.742|12.828|0.99|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)|8.046|7.773|1.04|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)|17.440|17.192|1.01|
|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)|15.418|14.972|1.03|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)|0.430|0.430|1.00|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)|6.692|6.663|1.00|
|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)|6.350|6.347|1.00|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)|0.267|0.265|1.01|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)|7.755|7.558|1.03|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)|0.203|0.202|1.00|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|10.663|10.576|1.01|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)|10.827|10.614|1.02|
|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)|7.049|6.947|1.01|
|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)|6.900|6.901|1.00|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)|0.165|0.165|1.00|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)|17.953|17.251|1.04|
|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)|7.430|7.320|1.01|
|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)|22.187|21.705|1.02|
|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)|8.349|8.126|1.03|
|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)|8.273|8.297|1.00|
|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)|8.169|8.094|1.01|
|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)|13.602|13.359|1.02|
|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)|8.633|8.584|1.01|
|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)|29.339|28.897|1.02|
|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|13.000|12.920|1.01|
|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|14.262|13.319|1.07|
|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|27.453|27.253|1.01|
|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)|32.052|27.269|1.18|
|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|15.363|15.208|1.01|
|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|18.543|18.434|1.01|
|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)|39.114|37.954|1.03|
|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|36.271|36.972|0.98|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|19.262|19.427|0.99|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|19.298|19.349|1.00|
|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|20.261|19.847|1.02|
|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|21.867|21.525|1.02|
|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)|51.756|49.979|1.04|
|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|28.133|27.060|1.04|
|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)|25.035|24.980|1.00|
|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|25.858|25.821|1.00|
|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|27.313|27.149|1.01|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)|28.219|28.111|1.00|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|46.025|46.674|0.99|
|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|30.220|29.446|1.03|
|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|49.410|48.708|1.01|
|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|38.203|38.001|1.01|
|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|39.961|39.021|1.02|
|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|48.685|47.075|1.03|
|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|75.114|72.586|1.03|
|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)|41.222|41.144|1.00|
|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)|46.220|46.353|1.00|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)|98.201|98.771|0.99|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|100.106|96.971|1.03|
|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|146.977|140.445|1.05|
|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)|198.618|194.665|1.02|


#### Performance Test of ARM platform: apple M1, with `-perf_threas=1`

Min (ms)

|Name of Test|4.x|patch|4.x vs patch (x-factor)|
|---|:-:|:-:|:-:|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)|0.001|0.001|1.07|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)|0.001|0.001|1.10|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.002|0.002|0.97|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.003|0.003|0.84|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.009|0.009|1.00|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.027|0.030|0.90|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)|0.008|0.007|1.07|
|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)|0.066|0.072|0.91|
|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)|0.090|0.054|1.68|
|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.328|0.409|0.80|
|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.659|0.697|0.95|
|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)|1.266|1.403|0.90|
|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)|3.550|4.145|0.86|
|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)|1.188|1.375|0.86|
|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)|2.683|3.236|0.83|
|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)|4.491|5.501|0.82|
|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)|8.916|10.181|0.88|
|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)|69.995|72.296|0.97|
|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)|22.531|23.139|0.97|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)|2.239|1.933|1.16|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU_FP16)|-|1.010|-|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)|3.134|2.068|1.52|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU_FP16)|-|1.062|-|
|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)|1.918|1.920|1.00|
|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU_FP16)|-|1.014|-|
|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)|2.340|2.352|0.99|
|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)|-|1.247|-|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)|1.116|1.111|1.00|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU_FP16)|-|1.114|-|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)|1.116|1.112|1.00|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU_FP16)|-|1.113|-|
|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|3.067|3.085|0.99|
|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)|-|1.622|-|
|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)|1.153|1.187|0.97|
|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU_FP16)|-|1.150|-|
|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)|4.804|4.849|0.99|
|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU_FP16)|-|2.922|-|
|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|1.463|1.469|1.00|
|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)|-|1.459|-|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)|1.577|1.580|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU_FP16)|-|1.580|-|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)|1.826|1.818|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU_FP16)|-|1.817|-|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)|6.541|5.081|1.29|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU_FP16)|-|2.809|-|
|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|1.912|1.919|1.00|
|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)|-|1.919|-|
|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|1.961|1.971|0.99|
|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|1.961|-|
|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)|2.317|2.329|0.99|
|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU_FP16)|-|2.322|-|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)|2.920|2.947|0.99|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU_FP16)|-|2.924|-|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)|2.467|2.466|1.00|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU_FP16)|-|2.496|-|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|3.028|2.997|1.01|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)|-|2.986|-|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)|4.353|4.355|1.00|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU_FP16)|-|4.355|-|
|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|2.762|2.793|0.99|
|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)|-|2.797|-|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)|3.428|3.226|1.06|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU_FP16)|-|3.223|-|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)|3.967|3.957|1.00|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU_FP16)|-|3.960|-|
|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)|4.806|4.387|1.10|
|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU_FP16)|-|4.366|-|
|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|14.509|11.756|1.23|
|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|6.510|-|
|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|13.718|13.287|1.03|
|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)|-|7.190|-|
|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|15.133|14.853|1.02|
|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)|-|8.671|-|
|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)|41.928|43.328|0.97|
|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU_FP16)|-|38.072|-|
|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|4.409|4.428|1.00|
|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)|-|4.427|-|
|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)|6.144|5.363|1.15|
|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU_FP16)|-|5.368|-|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)|3.926|3.932|1.00|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)|-|3.938|-|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|3.920|3.915|1.00|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)|-|3.950|-|
|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|3.767|3.764|1.00|
|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|3.762|-|
|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)|19.959|13.875|1.44|
|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU_FP16)|-|7.781|-|
|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)|3.951|3.955|1.00|
|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)|-|3.969|-|
|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)|4.050|4.034|1.00|
|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU_FP16)|-|4.093|-|
|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)|4.923|4.506|1.09|
|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU_FP16)|-|4.509|-|
|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)|4.759|4.476|1.06|
|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU_FP16)|-|4.447|-|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)|6.079|5.628|1.08|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU_FP16)|-|5.625|-|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)|19.843|17.523|1.13|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)|-|8.917|-|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)|8.334|8.247|1.01|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU_FP16)|-|8.246|-|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)|23.164|18.199|1.27|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)|-|9.305|-|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)|5.184|5.178|1.00|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU_FP16)|-|5.149|-|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)|17.990|18.103|0.99|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)|-|9.777|-|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)|4.831|4.522|1.07|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU_FP16)|-|4.523|-|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)|17.328|17.319|1.00|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)|-|8.948|-|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)|5.944|5.961|1.00|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU_FP16)|-|5.936|-|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)|19.811|20.064|0.99|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)|-|11.705|-|
|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)|22.398|17.686|1.27|
|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU_FP16)|-|9.859|-|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)|0.416|0.416|1.00|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|0.417|-|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)|5.356|5.110|1.05|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU_FP16)|-|5.114|-|
|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)|5.092|4.748|1.07|
|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU_FP16)|-|4.754|-|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)|0.260|0.229|1.13|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|0.229|-|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)|5.872|5.460|1.08|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU_FP16)|-|5.460|-|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)|0.161|0.161|1.00|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|0.161|-|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|7.176|7.175|1.00|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|7.162|-|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)|7.174|7.185|1.00|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU_FP16)|-|7.157|-|
|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)|5.400|5.180|1.04|
|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU_FP16)|-|5.201|-|
|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)|5.330|5.188|1.03|
|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU_FP16)|-|5.177|-|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)|0.115|0.115|1.00|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|0.115|-|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)|26.156|20.222|1.29|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU_FP16)|-|11.203|-|
|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)|5.627|5.543|1.02|
|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU_FP16)|-|5.506|-|
|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)|27.925|27.741|1.01|
|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU_FP16)|-|17.217|-|
|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)|6.359|6.062|1.05|
|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU_FP16)|-|6.048|-|
|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)|6.559|6.322|1.04|
|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU_FP16)|-|6.280|-|
|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)|6.412|6.200|1.03|
|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU_FP16)|-|6.197|-|
|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)|9.167|8.624|1.06|
|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU_FP16)|-|8.626|-|
|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)|6.755|6.491|1.04|
|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU_FP16)|-|6.520|-|
|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)|35.664|34.752|1.03|
|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU_FP16)|-|20.260|-|
|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|9.514|9.414|1.01|
|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|9.462|-|
|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|10.631|9.963|1.07|
|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|9.935|-|
|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|37.465|36.798|1.02|
|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)|-|19.569|-|
|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)|38.157|36.157|1.06|
|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU_FP16)|-|18.902|-|
|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|10.356|10.401|1.00|
|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|10.360|-|
|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|12.641|12.150|1.04|
|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)|-|12.162|-|
|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)|50.545|50.505|1.00|
|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU_FP16)|-|27.950|-|
|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|54.233|49.603|1.09|
|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)|-|26.515|-|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|13.779|12.968|1.06|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|12.984|-|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|15.809|15.329|1.03|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|15.433|-|
|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|14.563|14.527|1.00|
|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|14.480|-|
|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|16.714|16.484|1.01|
|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|16.362|-|
|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)|77.832|65.729|1.18|
|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU_FP16)|-|32.065|-|
|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|21.903|20.386|1.07|
|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU_FP16)|-|20.416|-|
|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)|20.405|18.148|1.12|
|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU_FP16)|-|18.128|-|
|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|20.334|18.521|1.10|
|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|18.495|-|
|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|21.527|19.584|1.10|
|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|19.630|-|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)|22.715|20.057|1.13|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)|-|20.068|-|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|26.228|24.992|1.05|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|24.957|-|
|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|21.524|21.581|1.00|
|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|21.782|-|
|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|34.094|31.964|1.07|
|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|31.925|-|
|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|28.677|27.813|1.03|
|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|27.808|-|
|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|31.274|27.892|1.12|
|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|27.910|-|
|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|30.533|30.007|1.02|
|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|30.089|-|
|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|39.837|38.312|1.04|
|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|38.477|-|
|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)|32.480|29.237|1.11|
|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU_FP16)|-|29.452|-|
|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)|33.544|32.832|1.02|
|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU_FP16)|-|32.784|-|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)|134.481|130.678|1.03|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU_FP16)|-|70.134|-|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|127.930|126.530|1.01|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)|-|65.261|-|
|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|201.346|187.007|1.08|
|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU_FP16)|-|91.525|-|
|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)|252.038|245.587|1.03|
|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU_FP16)|-|125.477|-|

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-05-17 09:38:33 +03:00
wanli
46991bcd62 Solve the bug of same shape broadcast with CUDA 2023-05-15 13:55:38 +08:00
Alexander Smorkalov
85b04f0b4d
Merge pull request #23557 from WanliZhong:eltwise_cpu_bug
fix nary elementwise bug in cpu
2023-05-11 15:56:46 +03:00
wanli
85cc4086c8 fix nary elementwise bug in cpu 2023-05-10 14:29:33 +08:00
Alexander Smorkalov
25c28c5da4
Merge pull request #23485 from zihaomu:add_onnx_where
DNN: add ONNX where node support
2023-05-05 09:21:07 +03:00
zihaomu
0513741a85 add broadcast where node 2023-05-05 11:16:19 +08:00
Alexander Alekhin
3c76b33532 Merge pull request #22614 from zihaomu:add_std2DB_API 2023-05-01 19:37:23 +00:00
zihaomu
8be93a6de7 add scale factor to DB demo. 2023-04-30 22:03:21 +08:00
Abduragim Shtanchaev
3b1ee0549b added test for lstm without hidden
states initialization
2023-04-25 16:01:13 +03:00
Dmitry Kurtaev
aa57833ad5
Merge pull request #23409 from dkurt:dnn_tflite_quant
Import and inference INT8 quantized TFLite model #23409

### Pull Request Readiness Checklist

* Support quantized TFLite models
* Enable fused activations (FP32, INT8)

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1048

![res](https://user-images.githubusercontent.com/25801568/231433201-566b4bd6-ccff-462c-9e74-adbdcdf3648b.png)

on the image, green boxes are from TFLite and red boxes from OpenCV

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-04-24 13:44:10 +03:00
Alexander Alekhin
9ab0ff6cf2 Merge pull request #23511 from zihaomu:issue_23465 2023-04-22 04:01:26 +00:00
Zihao Mu
601778e0e6
Merge pull request #22750 from zihaomu:improve_blobFromImage
DNN: Add New API blobFromImageParam #22750

The purpose of this PR:

1. Add new API `blobFromImageParam` to extend `blobFromImage` API. It can support the different data layout (NCHW or NHWC), and letter_box.
2. ~~`blobFromImage` can output `CV_16F`~~

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-04-21 19:10:17 +03:00
zihaomu
54e1a8709d fix the bug, disable the fast1x1 when padding is not 0. 2023-04-21 10:55:07 +08:00
Abduragim Shtanchaev
b3a2444bcf
Merge pull request #23501 from Abdurrahheem:additional_lstm_tests
Added LSTM and GRU tests for various batch and input length sizes #23501

Added tests with various sequence length and batch sizes
Test data: https://github.com/opencv/opencv_extra/pull/1057

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-04-20 10:11:33 +03:00
zihaomu
51281f8d69 support the split node of onnx opset >= 13 2023-04-11 16:18:50 +08:00
Dmitry Kurtaev
5e1d33329b Several fixes for ONNX importer: Expand, Gather 2023-03-27 22:15:26 +03:00
Dmitry Kurtaev
5df6b4a756
Merge pull request #23325 from dkurt:dnn_input_info
Propagate inputs info for ONNX and TFLite models

### Pull Request Readiness Checklist

Needed for generic applications such as benchmarking pipelines. So OpenCV can tell about the default input shapes specified in the models.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-03-21 14:50:53 +03:00
zihaomu
ee3740af00 move global skip out of if loop, and add opencv_deny_list 2023-03-13 22:16:51 +08:00
Alexander Alekhin
bdff0949bb dnn(tflite): add 3rdparty flatbuffers with pre-generated schema 2023-02-21 16:06:19 +00:00
Dmitry Kurtaev
76350cd30f
Merge pull request #23161 from dkurt:dnn_tflite
TFLite models importer

* initial commit

* Refactor TFLiteImporter

* Better FlatBuffers detection

* Add permute before 4D->3D reshape

* Track layers layout

* TFLite Convolution2DTransposeBias layer

* Skip TFLite tests without FlatBuffers

* Fix check of FlatBuffers in tests. Add readNetFromTFLite from buffer

* TFLite Max Unpooling test

* Add skip for TFLite unpooling test

* Revert DW convolution workaround

* Fix ObjC bindings

* Better errors handling

* Regenerate TFLite schema using flatc

* dnn(tflite): more checks, better logging

* Checks for unimplemented fusion. Fix tests
2023-02-13 14:00:20 +00:00
Yuantao Feng
c2b7c1f13b
Merge pull request #23219 from fengyuentau:add_gelu
Add GELU layer for vision transformers

* add gelu and gelu approximation

* drop setKernelParams
2023-02-10 18:03:29 +00:00
Alexander Alekhin
96a45e842e
Merge pull request #23061 from WanliZhong:gemm_cuda
DNN: make GEMM can be supported with transA and transB in CUDA
2023-02-09 00:06:32 +03:00
wanli
4718a4bf81 make GEMM can be supported with transA and transB in CUDA 2023-01-31 15:14:17 +08:00
Alexander Alekhin
cd44aa0bb1 Merge pull request #23162 from zihaomu:issue_23151 2023-01-28 13:00:43 +00:00
zihaomu
f45a12439a fix depth wise issue. 2023-01-28 11:41:00 +08:00
Yuantao Feng
4d918ba40b
Merge pull request #23047 from fengyuentau:layer_norm
dnn: add layer normalization for vision transformers

* add layer norm onnx parser, impl and tests

* add onnx graph simplifier for layer norm expanded

* handle the case when constants are of type Initializer

* add test case for layer norm expanded with initializers

* use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16

* use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull

* template hasBias

* remove trailing whitespace

* use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt

* refactor implementation, optimize parallel_for

* disable layer norm expanded

* remove the removal of layer norm optional outputs
2023-01-27 16:35:59 +03:00
Alexander Alekhin
18cbfa4a4f Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2023-01-23 00:11:12 +00:00
zihaomu
840b1d5c94 add depthwise add fuse 2023-01-11 08:42:51 +08:00
Dmitry Kurtaev
8681686d8f
Merge pull request #22957 from dkurt:new_openvino_api
Switch to new OpenVINO API after 2022.1 release

* Pass Layer_Test_Convolution_DLDT.Accuracy/0 test

* Pass test Test_Caffe_layers.Softmax

* Failed 136 tests

* Fix Concat. Failed 120 tests

* Custom nGraph ops. 19 failed tests

* Set and get properties from Core

* Read model from buffer

* Change MaxPooling layer output names. Restore reshape

* Cosmetic changes

* Cosmetic changes

* Override getOutputsInfo

* Fixes for OpenVINO < 2022.1

* Async inference for 2021.4 and less

* Compile model with config

* Fix serialize for 2022.1

* Asynchronous inference with 2022.1

* Handle 1d outputs

* Work with model with dynamic output shape

* Fixes with 1d output for old API

* Control outputs by nGraph function for all OpenVINO versions

* Refer inputs in PrePostProcessor by indices

* Fix cycled dependency between InfEngineNgraphNode and InfEngineNgraphNet.
Add InferRequest callback only for async inference. Do not capture InferRequest object.

* Fix tests thresholds

* Fix HETERO:GPU,CPU plugin issues with unsupported layer
2022-12-23 16:58:41 +00:00
Alexander Smorkalov
9012e6dd9b
Merge pull request #22965 from vrabaud:numpy_fix
Remove references to deprecated NumPy type aliases.
2022-12-23 15:34:02 +03:00
Alexander Smorkalov
4930516652
Merge pull request #22898 from fengyuentau:slice_neg_steps
dnn: support ONNX Slice with negative steps by adding and using cv::flipND
2022-12-23 14:15:06 +03:00
Vincent Rabaud
ad568edd7f Remove references to deprecated NumPy type aliases.
This change replaces references to a number of deprecated NumPy
type aliases (np.bool, np.int, np.float, np.complex, np.object,
np.str) with their recommended replacement (bool, int, float,
complex, object, str).

Those types were deprecated in 1.20 and are removed in 1.24,
cf https://github.com/numpy/numpy/pull/22607.
2022-12-23 13:53:49 +03:00
Alexander Alekhin
1f41d06f9a Merge pull request #23008 from mshabunin:fix-yolov4-tiny-hash 2022-12-23 10:14:25 +00:00
fengyuentau
34a0897f90 add cv::flipND; support onnx slice with negative steps via cv::flipND 2022-12-23 16:39:53 +08:00
Maksim Shabunin
d35fbe6bfc dnn: updated YOLOv4-tiny model and tests 2022-12-22 15:49:21 +03:00
Alexander Alekhin
6b4f3e5fab Merge pull request #22993 from alalek:fixup_21738 2022-12-21 19:50:51 +00:00
Yuantao Feng
a2b3acfc6e
dnn: add the CANN backend (#22634)
* cann backend impl v1

* cann backend impl v2: use opencv parsers to build models for cann

* adjust fc according to the new transA and transB

* put cann net in cann backend node and reuse forwardLayer

* use fork() to create a child process and compile cann model

* remove legacy code

* remove debug code

* fall bcak to CPU backend if there is one layer not supoorted by CANN backend

* fix netInput forward
2022-12-21 09:04:41 +03:00
Alexander Alekhin
cdbb893b27 dnn: disable OpenCL code path in MatMul processing
- this mode is not supported by 22828
2022-12-20 09:46:48 +00:00
Alexander Alekhin
1102b7eff8 dnn: fix gather layer implementation
- support FP16 data
2022-12-20 06:09:34 +00:00
zoom
4891818114 make MatMul support 3D or 4D with broadcast 2022-12-15 10:36:08 +08:00
Alexander Alekhin
8ba44e7d55 Merge pull request #22882 from zihaomu:gemm_first_const 2022-12-08 14:18:33 +00:00
Zihao Mu
0a650b573b
Merge pull request #22840 from zihaomu:optimze_conv_memory_usage
DNN: reduce the memory used in convolution layer

* reduce the memory in winograd and disabel the test when usage memory is larger than 2gb.

* remove VERY_LOG tag
2022-12-08 12:57:13 +00:00
Alexander Alekhin
b16f76eede Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-12-03 12:39:41 +00:00
Alexander Alekhin
d16b3b2487 dnn(test): restore openvino tests with 'Cannot get memory' message 2022-12-03 01:34:48 +00:00
Alexander Smorkalov
e14ca39fd7
Merge pull request #22857 from fengyuentau:batched_nms
dnn: add batched nms
2022-11-30 12:37:49 +03:00
Alexander Smorkalov
421ba8730a
Merge pull request #22809 from fengyuentau:tile
dnn: support ONNX Tile
2022-11-29 14:42:28 +03:00
zihaomu
0d56524b72 gemm support transA and transB, and first input is constance. 2022-11-29 17:13:36 +08:00
fengyuentau
9fded9ca53 batched nms impl 2022-11-29 15:32:34 +08:00
fengyuentau
441624a5fb tile impl 2022-11-29 11:15:38 +08:00
zoom
5044af69d1 let MatMul can work when both two inputs are const 2022-11-27 17:32:41 +08:00
zoom
ef2677b0a6 Make MatMul layer support 3d or 4d operation with const input 2022-11-10 11:41:44 +08:00
Zihao Mu
903bf0147e
Merge pull request #22666 from zihaomu:support_onnx_qdq_model
DNN: let Quant and Dequant of ONNX_importer support the Constant input.

* let Quant and Dequant support the Constant input.

* fix negative value of axis.
2022-10-31 16:06:31 +00:00
Alexander Smorkalov
22f8fb4d5c Do not fail tests in Yolo v7 model was not found. 2022-10-24 17:59:18 +03:00
Dmitry Kurtaev
35b2cff295
Merge pull request #22656 from dkurt:halide_fixes
* Fixes for Halide
* Enable some Halide tests
2022-10-21 17:49:49 +03:00
Alexander Smorkalov
5d292826b2
Merge pull request #22593 from zihaomu:optimize_wino
optimize winograd futher more
2022-10-19 13:08:32 +03:00
Alexander Smorkalov
f378f02954
Merge pull request #22652 from rogday:cuda_test_fixes
Address CUDA-related errors
2022-10-19 09:37:12 +03:00
Smirnov Egor
dd14cf6a9c address CUDA-related errors and enable cuda in elementwise ops 2022-10-18 16:54:42 +03:00
Alexander Smorkalov
ec7fc5adca
Merge pull request #22529 from fengyuentau:scatter_scatternd
DNN: supports Scatter and ScatterND from ONNX
2022-10-17 14:57:46 +03:00
fengyuentau
d24d8f2abe implementation of scatter and scatternd with conformance tests enabled 2022-10-17 11:30:32 +08:00
zoom
d816442e4d Make Unsqueeze layer support negative axes. 2022-10-14 18:00:19 +08:00
Zihao Mu
0fa43e3aac Optimize the winograd futher more. 2022-10-14 10:15:45 +08:00
Alexander Smorkalov
ec26541771
Merge pull request #22577 from zihaomu:Disable_winograd_branch_in_tryquantize
DNN: add enableWinograd API for Net
2022-10-11 09:44:00 +03:00
Alexander Smorkalov
3419e64dcf
Merge pull request #22611 from zihaomu:greaterOrEqual
DNN: support GreaterOrEqual and LessOrEqual op in ONNX
2022-10-10 11:43:44 +03:00
Zihao Mu
1e2ceca4df add enableWinograd API for Net. 2022-10-09 09:33:07 +08:00
Alexander Alekhin
347246901e Merge pull request #21745 from alalek:dnn_plugin_openvino 2022-10-08 22:32:25 +00:00
Zihao Mu
9821fae59d add greater_or_equal and less_or_equal ONNX support 2022-10-08 15:51:40 +08:00
Alexander Alekhin
43b2bb2c25 dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
zoom
4557971481 enhance slice layer
refactor the code for parsing Slice layer
add test for Slice layer
let 'begin' and 'end' resize to dims
add opset message comment
2022-10-01 17:12:07 +08:00
Alexander Smorkalov
a6274647a4
Merge pull request #21738 from rogday:gather
add Gather implementation
2022-09-19 16:21:14 +03:00
Egor Smirnov
65f71ce2eb add Gather implementation 2022-09-19 15:06:44 +03:00
Alexander Smorkalov
6aefb8e86f
Merge pull request #22290 from fengyuentau:naive_yolov7
Support for YOLOv7 ONNX (not simplified)
2022-09-19 14:43:18 +03:00
fengyuentau
4aef9b1c93 dnn: support yolov7 (not simplified) 2022-09-19 18:38:03 +08:00
anton
337452b4c0 changed names of permutations if Reshpe is in NHWC 2022-09-03 19:02:41 +02:00
Alexander Smorkalov
d2c48b898c Merge pull request #22306 from zihaomu:qgemm_and_squeeze_opset13_onnximporter 2022-08-30 06:33:57 +00:00
Zihao Mu
2d837efba7 add qgemm and squeeze op13 supported on ONNXImporter 2022-08-30 09:50:29 +08:00
Alexander Smorkalov
2619099fe5
Merge pull request #22337 from zihaomu:load_ONNX_fp16_as_fp32
DNN: load fp16 ONNX model as fp32
2022-08-29 09:32:25 +03:00
Zihao Mu
bb64db98d8
Further optimization of Conv2D, fused Conv_Add_Activation, bring latest code from ficus OpConv.fx. (#22401) 2022-08-26 12:57:25 +03:00
Zihao Mu
7eaec9dd22 load fp16 as fp32 and align fp16 and double in onnx_graph_simplifie 2022-08-26 10:04:44 +08:00
Alexander Alekhin
2ebdc04787 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-08-14 15:50:42 +00:00
Alexander Smorkalov
bb71cb200e
Merge pull request #22199 from zihaomu:bug_fix_22195
DNN: Reduce Layer (add dynamic batch and ReduceSum support)
2022-08-11 12:59:51 +03:00
Alexander Smorkalov
b2b7193374
Merge pull request #22311 from zihaomu:layer_fused_optmized_mish
DNN: add another two Mish activation to onnx_graph_simplifier
2022-08-05 14:22:06 +03:00
Zihao Mu
0614c40b42 add more skip for very long test case in test_dnn. 2022-08-02 14:58:05 +08:00
Zihao Mu
d4640f4647 support ReduceLayer without reshape layer. 2022-08-02 10:32:31 +08:00
Zihao Mu
3c5377ca1b add another Mish graph simplifier. 2022-07-28 11:21:29 +08:00
rogday
ed69bcae2d
Merge pull request #21865 from rogday:nary_eltwise_layers
Reimplementation of Element-wise layers with broadcasting support

* init

* semi-working initial version

* add small_vector

* wip

* remove smallvec

* add nary function

* replace auto with Mat in lambda expr used in transform

* uncomment asserts

* autobuffer shape_buf & step_buf

* fix a missing bracket

* fixed a missing addLayer in parseElementWise

* solve one-dimensional broadcast

* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called

* one autobuffer for step & shape

* temporal fix for the missing original dimension information

* fix parseUnsqueeze when it gets a 1d tensor constant

* support sum/mean/min/max with only one input

* reuse old code to handle cases of two non-constant inputs

* add condition to handle div & mul of two non-constant inputs

* use || instead of or

* remove trainling spaces

* enlarge buf in binary_forward to contain other buffer

* use autobuffer in nary_forward

* generate data randomly and add more cases for perf

* add op and, or & xor

* update perf_dnn

* remove some comments

* remove legacy; add two ONNX conformance tests in filter

* move from cpu_denylist to all_denylist

* adjust parsing for inputs>=2

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
2022-07-19 06:14:05 +03:00
Zihao Mu
1b8fba8e26 support ReduceSum with two input and dynamic shape batch size in ReduceLayer. 2022-07-13 13:46:16 +08:00
Zihao Mu
45fbb67aba fix scale layer can not handle 1x1 weight correctly. 2022-07-13 11:25:27 +08:00
Zihao Mu
a80fcacd90
Merge pull request #21372 from zihaomu:dnn_quantize_per_tensor
Add per_tensor_quantize to int8 quantize

* add per_tensor_quantize to dnn int8 module.

* change api flag from perTensor to perChannel, and recognize quantize type and onnx importer.

* change the default to hpp
2022-07-05 19:14:42 +03:00
Zihao Mu
59b870a87a
Merge pull request #21910 from zihaomu:fast_conv_ARM
DNN: Accelerating convolution

* Fast Conv of ARM, X86 and universal intrinsics.

* improve code style.

* error fixed.

* improve the License

* optimize memory allocated and Adjust the threshold.

* change FasterRCNN_vgg16 to 2GB memory.
2022-07-01 13:03:15 +03:00
Zihao Mu
ef94275eb6 bug fixed of GEMM node in ONNX_importer 2022-06-22 21:08:48 +08:00
Wanli
a6ca48a1c2
Merge pull request #22100 from WanliZhong:issue_22015
Fix issue 22015, let Clip layer support 1-3 inputs

* Fix issue 22015.
Let layer Clip support 1-3 inputs.

* Resolve other problems caused by modifications

* Update onnx_importer.cpp

added extra checks to min/max handling in Clip

* Add assertions to check the size of the input

* Add test for clip with min and max initializers

* Separate test for "clip_init_min_max". Change the check method for input_size to provide a clearer message in case of problem.

* Add tests for clip with min or max initializers

* Change the implementation of getting input

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
2022-06-22 14:21:16 +03:00
Zihao Mu
2411b825b4 bug fixed of GEMM node in ONNX_importer 2022-06-22 15:00:17 +08:00
Alexander Alekhin
583bd1a6e2 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-06-04 19:10:35 +00:00
Alexander Alekhin
978dc76653 Merge pull request #22006 from rogday:21947_fix 2022-05-24 19:26:02 +00:00
rogday
a2ad997e97 fix vector access in TF::sortByExecutionOrder 2022-05-24 00:05:13 +03:00
Alexander Alekhin
d9bf522b27 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-05-23 16:06:14 +00:00
rogday
93dc0679ec
Merge pull request #21818 from rogday:revert_renaming
* add prefixes to layer names and layer output names

* dnn: OPENCV_DNN_ONNX_USE_LEGACY_NAMES runtime parameter

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2022-05-23 14:50:42 +00:00
Alexander Alekhin
bb5462e327 Merge pull request #21991 from fengyuentau:qconv_asympad 2022-05-19 17:20:04 +00:00
fengyuentau
ff88132620 support asymmetric paddings for qconv 2022-05-16 19:01:37 +08:00
Zihao Mu
64ded50bbf parsing depth2space and space2depth of ONNX importer 2022-04-29 10:17:02 +08:00
rogday
9cd5a0a1e6
Merge pull request #21884 from rogday:cuda_cleanup
Fix CUDA compilation issues and adjust thresholds.

* Fix CUDA compilation issues and adjust thresholds.

* add conformance tests to denylist
2022-04-19 16:40:25 +00:00
OpenCV Developers
2985739b8c Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-04-16 14:41:15 +00:00
rogday
a2b84e9897 add assert to tf graph simplifier to address security concerns 2022-04-13 22:50:27 +03:00
OpenCV Pushbot
66f3c2673c
Merge pull request #21831 from zihaomu:sign_layer_onnx
DNN: Add sign, shrink and reciprocal for onnx_impoter
2022-04-13 17:08:30 +00:00
OpenCV Pushbot
03c9648f2e
Merge pull request #21854 from opencv-pushbot:dnn_test_update_checks_face_detector_4.x 2022-04-12 17:20:22 +00:00
OpenCV Developers
e3a55af336 dnn(test): update opencv_face_detector checks
original commit: be4a432bea
2022-04-11 20:27:06 +00:00
OpenCV Developers
be4a432bea dnn(test): update opencv_face_detector checks 2022-04-11 20:26:25 +00:00
zihaomu
e36948cfbc add ONNX OP sign, shrink and reciprocal 2022-04-07 15:32:12 +08:00
Alexander Alekhin
08d44f588f dnn(test): update OpenVINO tests 2022.1.0 (OpenCV 4.x) 2022-04-05 14:13:38 +00:00
Alexander Alekhin
13a995cc1d Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-04-02 19:45:44 +00:00
Alexander Alekhin
4d927e73f1 dnn(test): update OpenVINO tests 2022.1.0 2022-04-02 17:42:53 +00:00
Zihao Mu
7b582b71ba
Merge pull request #21036 from fengyuentau:timvx_backend_support
dnn: TIM-VX NPU backend support

* Add TimVX NPU backend for DNN module.

* use official branch from tim-vx repo; fix detecting viv sdk

Co-authored-by: fytao <yuantao.feng@outlook.com>
2022-03-31 21:42:11 +00:00
Alexander Alekhin
5e434073d4 Merge pull request #21796 from alalek:dnn_reduce_fixup_21601 2022-03-30 22:26:28 +00:00
Alexander Alekhin
6f5cf8c15f dnn: fix ReduceLayer implementation, update OpenVINO tests 2022-03-30 20:03:41 +00:00
Alexander Alekhin
b687bc807a dnn(test): update OpenVINO tests 2021.4.2 2022-03-30 18:58:35 +00:00
Alexander Alekhin
1339ebaa84 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-03-26 16:00:28 +00:00
Alexander Alekhin
c9b90884da Merge pull request #21601 from zihaomu:add_reduceLayer 2022-03-26 10:20:10 +00:00
luz paz
8e8e4bbabc dnn: fix various dnn related typos
Fixes source comments and documentation related to dnn code.
2022-03-23 18:12:12 -04:00
Zihao Mu
b6b5c27cec Support for some reduce layers for onnx 2022-03-18 10:19:13 +08:00
rogday
93353aea70
Merge pull request #21522 from rogday:lstm
Fix LSTM support in ONNX

* fix LSTM and add peephole support

* disable old tests

* turn lambdas into functions

* more hacks for  c++98

* add assertions

* slice fixes

* backport of cuda-related fixes

* address review comments
2022-03-15 09:14:05 +03:00
Alexander Alekhin
901e0ddfe4 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-03-05 19:46:28 +00:00
Alexander Alekhin
5cc27fd3b5 Merge pull request #21542 from rogday:split_expand 2022-02-28 22:38:24 +00:00
Egor Smirnov
375fe81311 fix slice and expand 2022-02-28 17:18:07 +03:00
Maksim Shabunin
a251474144 Update filters in ONNX tests 2022-02-15 11:56:28 +03:00
Maksim Shabunin
45cbf70265 Update filters in ONNX tests 2022-02-14 17:16:49 +03:00
Alexander Alekhin
19926e2979 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-02-11 17:32:37 +00:00
Alexander Alekhin
effce0573b dnn: drop legacy Inference Engine NN builder API 2022-02-10 11:55:24 +00:00
Alexander Alekhin
57d3002ee1 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-02-06 16:10:43 +00:00
Alexander Alekhin
1f70d4e2a5 dnn(test): re-enable ONNX split tests for OpenVINO 2022-02-06 10:36:15 +00:00
Alexander Alekhin
870c8d3c4e dnn(test): fix int8 tolerances 2022-01-31 12:54:01 +00:00
Alexander Alekhin
d573472a86 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-01-31 12:53:45 +00:00
Alexander Alekhin
a7e6a1059c dnn(test): fix outputs handling in ONNX conformance
- ONNX output is 1 tensor per defined output instead of N tensors from outputs of "output" layer
2022-01-29 23:29:51 +00:00
Zihao Mu
9e3ba487fa
Merge pull request #21518 from zihaomu:resize_onnx_opset13
Add resize layer compatible with ONNX opset13 version
2022-01-28 17:55:01 +03:00
Alexander Alekhin
9699e2b483 dnn(onnx): handle non-default ONNX domains
- re-enable quantized models tests
2021-12-25 01:38:52 +00:00
Alexander Alekhin
217fea9667 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-12-24 16:48:07 +00:00
Alexander Alekhin
6e299b582a dnn(test): decompose 'DynamicAxes' test 2021-12-23 00:47:27 +00:00
Alexander Alekhin
9777fbacf6 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-12-22 15:57:02 +00:00
Alexander Alekhin
c78a8dfd2d fix 4.x links 2021-12-22 13:24:30 +00:00
Alexander Alekhin
b1a57c4cb2 fix 3.4 links 2021-12-22 12:38:21 +00:00
rogday
0a178a687a fix const/x in Div 2021-12-20 19:53:37 +03:00
Alexander Alekhin
80492d663e Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-12-18 16:19:06 +00:00
Smirnov Egor
71a22e45b0 add celu, hardsigmoid, selu, thresholdedrelu layers 2021-12-18 03:19:54 +03:00
Alexander Alekhin
04ee99f1a3 Merge pull request #21282 from alalek:dnn_test_vulkan_skip 2021-12-17 19:39:35 +00:00
Smirnov Egor
1bd382c1d0 Add acos, acosh, asin, asinh, atan, atanh, cos, cosh, erf, hardswish, sin, sinh, softplus, softsign, tan layers 2021-12-17 18:19:40 +03:00
Alexander Alekhin
249c508126 dnn(test): skip failed tests on Vulkan backend 2021-12-17 14:01:04 +00:00
Smirnov Egor
fec2c7e715 fix Flatten layer 2021-12-17 16:29:56 +03:00
Gruhuang
b4bb98ea60
Merge pull request #21268 from pccvlab:tf_Arg
add argmax and argmin parsing for tensorflow

* add argmax and argmin for tf

* remove whitespace

* remove whitespace

* remove static_cast

Signed-off-by: Crayon-new <1349159541@qq.com>
2021-12-16 17:06:02 +00:00
Alexander Alekhin
6d677bbd63 dnn(test): update ONNX conformance filters (4.x) 2021-12-16 12:09:31 +00:00