OpenCV deep learning module samples
Model Zoo
Object detection
Model | Scale | Size WxH | Mean subtraction | Channels order |
---|---|---|---|---|
MobileNet-SSD, Caffe | 0.00784 (2/255) | 300x300 | 127.5 127.5 127.5 | BGR |
OpenCV face detector | 1.0 | 300x300 | 104 177 123 | BGR |
SSDs from TensorFlow | 0.00784 (2/255) | 300x300 | 127.5 127.5 127.5 | RGB |
YOLO | 0.00392 (1/255) | 416x416 | 0 0 0 | RGB |
VGG16-SSD | 1.0 | 300x300 | 104 117 123 | BGR |
Faster-RCNN | 1.0 | 800x600 | 102.9801 115.9465 122.7717 | BGR |
R-FCN | 1.0 | 800x600 | 102.9801 115.9465 122.7717 | BGR |
Faster-RCNN, ResNet backbone | 1.0 | 300x300 | 103.939 116.779 123.68 | RGB |
Faster-RCNN, InceptionV2 backbone | 0.00784 (2/255) | 300x300 | 127.5 127.5 127.5 | RGB |
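
The columns above describe input preprocessing. As a minimal sketch, and not the sample code itself, the snippet below applies `(value - mean) * scale` to a single pixel, swapping blue and red first when the model expects RGB. The names `PARAMS` and `preprocess_pixel` are hypothetical, and the sketch assumes the mean triple is listed in the channel order the model expects. In practice OpenCV's `cv2.dnn.blobFromImage` performs the resize, mean subtraction, scaling and channel swap on whole images.

```python
# A few rows copied from the object detection table: (scale, mean, channel order).
PARAMS = {
    "MobileNet-SSD, Caffe": (2 / 255, (127.5, 127.5, 127.5), "BGR"),
    "OpenCV face detector": (1.0, (104.0, 177.0, 123.0), "BGR"),
    "YOLO": (1 / 255, (0.0, 0.0, 0.0), "RGB"),
}


def preprocess_pixel(bgr_pixel, scale, mean, order):
    """Normalize one BGR pixel as (value - mean) * scale in the model's channel order.

    Hypothetical helper for illustration; assumes `mean` is given in `order`.
    """
    # OpenCV loads images as BGR, so RGB models need B and R swapped first.
    channels = bgr_pixel[::-1] if order == "RGB" else bgr_pixel
    return tuple((c - m) * scale for c, m in zip(channels, mean))


if __name__ == "__main__":
    pure_blue = (255, 0, 0)  # B, G, R
    for name, (scale, mean, order) in PARAMS.items():
        print(name, preprocess_pixel(pure_blue, scale, mean, order))
```

With a scale of 2/255 and a mean of 127.5, pixel values in [0, 255] land in roughly [-1, 1]; with 1/255 and a zero mean they land in [0, 1]; with a scale of 1.0 only the mean is removed.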
Face detection
The original model, with single-precision floating-point weights, has been quantized using the TensorFlow framework.
For the best accuracy, run the model on BGR images resized to 300x300, subtracting the mean values (104, 177, 123) from the blue, green and red channels respectively.
The following accuracy metrics were obtained with the COCO object detection evaluation tool on the FDDB dataset (see script), both with images resized to 300x300 and with the original image sizes kept.
AP - Average Precision                            | FP32/FP16 | UINT8          | FP32/FP16 | UINT8          |
AR - Average Recall                               | 300x300   | 300x300        | any size  | any size       |
--------------------------------------------------|-----------|----------------|-----------|----------------|
AP @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] | 0.408     | 0.408          | 0.378     | 0.328 (-0.050) |
AP @[ IoU=0.50      | area=   all | maxDets=100 ] | 0.849     | 0.849          | 0.797     | 0.790 (-0.007) |
AP @[ IoU=0.75      | area=   all | maxDets=100 ] | 0.251     | 0.251          | 0.208     | 0.140 (-0.068) |
AP @[ IoU=0.50:0.95 | area= small | maxDets=100 ] | 0.050     | 0.051 (+0.001) | 0.107     | 0.070 (-0.037) |
AP @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] | 0.381     | 0.379 (-0.002) | 0.380     | 0.368 (-0.012) |
AP @[ IoU=0.50:0.95 | area= large | maxDets=100 ] | 0.455     | 0.455          | 0.412     | 0.337 (-0.075) |
AR @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] | 0.299     | 0.299          | 0.279     | 0.246 (-0.033) |
AR @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] | 0.482     | 0.482          | 0.476     | 0.436 (-0.040) |
AR @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] | 0.496     | 0.496          | 0.491     | 0.451 (-0.040) |
AR @[ IoU=0.50:0.95 | area= small | maxDets=100 ] | 0.189     | 0.193 (+0.004) | 0.284     | 0.232 (-0.052) |
AR @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] | 0.481     | 0.480 (-0.001) | 0.470     | 0.458 (-0.012) |
AR @[ IoU=0.50:0.95 | area= large | maxDets=100 ] | 0.528     | 0.528          | 0.520     | 0.462 (-0.058) |
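
The values in parentheses read as the UINT8 score minus the FP32/FP16 score at the same input size. The short check below recomputes that difference for a few rows of the "any size" columns; `ANY_SIZE_ROWS` and `quantization_delta` are hypothetical names used only for this illustration, and the numbers are copied from the table above.

```python
# (metric label, FP32/FP16 score, UINT8 score) from the "any size" columns.
ANY_SIZE_ROWS = [
    ("AP IoU=0.50:0.95 area=all", 0.378, 0.328),
    ("AP IoU=0.50 area=all", 0.797, 0.790),
    ("AP IoU=0.75 area=all", 0.208, 0.140),
    ("AR IoU=0.50:0.95 area=large", 0.520, 0.462),
]


def quantization_delta(fp_score, uint8_score):
    """Return UINT8 minus FP32/FP16, rounded to the table's three decimals."""
    return round(uint8_score - fp_score, 3)


deltas = {label: quantization_delta(fp, q) for label, fp, q in ANY_SIZE_ROWS}

if __name__ == "__main__":
    for label, delta in deltas.items():
        print(f"{label}: {delta:+.3f}")
```

Read this way, quantization costs almost nothing at 300x300 input, while at the original image sizes the drop is largest for the strict IoU=0.75 threshold and for large faces.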
Classification
Model | Scale | Size WxH | Mean subtraction | Channels order |
---|---|---|---|---|
GoogLeNet | 1.0 | 224x224 | 104 117 123 | BGR |
SqueezeNet | 1.0 | 227x227 | 0 0 0 | BGR |
Semantic segmentation
Model | Scale | Size WxH | Mean subtraction | Channels order |
---|---|---|---|---|
ENet | 0.00392 (1/255) | 1024x512 | 0 0 0 | RGB |
FCN8s | 1.0 | 500x500 | 0 0 0 | BGR |