Merge pull request #25710 from gursimarsingh:improved_object_detection_sample

Merged the yolo_detector and object_detection samples #25710

Relates to #25006

This pull request merges the yolo_detector.cpp sample into the object_detection.cpp sample. It also improves how bounding boxes and labels are rendered on the output images.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Gursimar Singh 2024-09-18 18:49:46 +05:30 committed by GitHub
parent 46b800f506
commit e823493af1
6 changed files with 684 additions and 790 deletions


@@ -150,13 +150,12 @@ Once we have our ONNX graph of the model, we just simply can run with OpenCV's s
 3. Run the following command:
 @code{.cpp}
-./bin/example_dnn_yolo_detector --input=<path_to_your_input_file> \
-                --classes=<path_to_class_names_file> \
+./bin/example_dnn_object_detection <model_name> --input=<path_to_your_input_file> \
+                --labels=<path_to_class_names_file> \
                 --thr=<confidence_threshold> \
                 --nms=<non_maximum_suppression_threshold> \
                 --mean=<mean_normalization_value> \
                 --scale=<scale_factor> \
-                --yolo=<yolo_model_version> \
                 --padvalue=<padding_value> \
                 --paddingmode=<padding_mode> \
                 --backend=<computation_backend> \
@@ -166,7 +165,7 @@ Once we have our ONNX graph of the model, we just simply can run with OpenCV's s
 @endcode
 - --input: File path to your input image or video. If omitted, it will capture frames from a camera.
-- --classes: File path to a text file containing class names for object detection.
+- --labels: File path to a text file containing class names for object detection.
 - --thr: Confidence threshold for detection (e.g., 0.5).
 - --nms: Non-maximum suppression threshold (e.g., 0.4).
 - --mean: Mean normalization value (e.g., 0.0 for no mean normalization).
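The mean, scale, padvalue and paddingmode options above map onto OpenCV's `Image2BlobParams`, which the sample uses for its letterbox-style preprocessing. A minimal sketch is shown below; the helper name `makeInputBlob` and the concrete values are illustrative only, and the enum name for padding mode 2 is assumed to be `DNN_PMODE_LETTERBOX`.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>

// Sketch: build a network input blob the way the sample's preprocess step does,
// resizing to the model input size while keeping the aspect ratio (mode 2) and
// filling the borders with the padding value.
cv::Mat makeInputBlob(const cv::Mat& frame)
{
    cv::dnn::Image2BlobParams params(
        1.0 / 255.0,                   // --scale
        cv::Size(640, 640),            // --width x --height
        cv::Scalar(),                  // --mean
        true,                          // --rgb: swap BGR to RGB
        CV_32F,                        // blob depth
        cv::dnn::DNN_LAYOUT_NCHW,      // layout
        cv::dnn::DNN_PMODE_LETTERBOX,  // --paddingmode=2 (assumed enum name)
        114.0);                        // --padvalue
    return cv::dnn::blobFromImageWithParams(frame, params);
}
```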
@@ -191,43 +190,28 @@ To demonstrate how to run OpenCV YOLO samples without your own pretrained model,
 Run the YOLOX detector(with default values):
 @code{.sh}
-git clone https://github.com/opencv/opencv_extra.git
-cd opencv_extra/testdata/dnn
-python download_models.py yolox_s_inf_decoder
-cd ..
-export OPENCV_TEST_DATA_PATH=$(pwd)
+cd opencv/samples/dnn
+export OPENCV_DOWNLOAD_CACHE_DIR=<path to download the model>
+cd ../data
+export OPENCV_SAMPLES_DATA_PATH=$(pwd)
+python download_models.py yolov8x --save_dir=$OPENCV_DOWNLOAD_CACHE_DIR
 cd <build directory of OpenCV>
-./bin/example_dnn_yolo_detector
+./bin/example_dnn_object_detection yolov8x
 @endcode
 This will execute the YOLOX detector with your camera.
 For YOLOv8 (for instance), follow these additional steps:
 @code{.sh}
-cd opencv_extra/testdata/dnn
-python download_models.py yolov8
-cd ..
-export OPENCV_TEST_DATA_PATH=$(pwd)
+cd opencv/samples/dnn
+export OPENCV_DOWNLOAD_CACHE_DIR=<path to download the model>
+cd ../data
+export OPENCV_SAMPLES_DATA_PATH=$(pwd)
+python download_models.py yolov8n --save_dir=$OPENCV_DOWNLOAD_CACHE_DIR
 cd <build directory of OpenCV>
-./bin/example_dnn_yolo_detector --model=onnx/models/yolov8n.onnx --yolo=yolov8 --mean=0.0 --scale=0.003921568627 --paddingmode=2 --padvalue=144.0 --thr=0.5 --nms=0.4 --rgb=0
+./bin/example_dnn_object_detection yolov8n --model=onnx/models/yolov8n.onnx --mean=0.0 --scale=0.003921568627 --paddingmode=2 --padvalue=144.0 --thr=0.5 --nms=0.4 --rgb=0
 @endcode
-For YOLOv10, follow these steps:
-@code{.sh}
-cd opencv_extra/testdata/dnn
-python download_models.py yolov10
-cd ..
-export OPENCV_TEST_DATA_PATH=$(pwd)
-cd <build directory of OpenCV>
-./bin/example_dnn_yolo_detector --model=onnx/models/yolov10s.onnx --yolo=yolov10 --width=640 --height=480 --scale=0.003921568627 --padvalue=114
-@endcode
-This will run `YOLOv10` detector on first camera found on your system. If you want to run it on a image/video file, you can use `--input` option to specify the path to the file.
 VIDEO DEMO:
 @youtube{NHtRlndE2cg}
@@ -238,30 +222,30 @@ module this is also quite easy to achieve. Below we will outline the sample impl
 - Import required libraries
-@snippet samples/dnn/yolo_detector.cpp includes
+@snippet samples/dnn/object_detection.cpp includes
 - Read ONNX graph and create neural network model:
-@snippet samples/dnn/yolo_detector.cpp read_net
+@snippet samples/dnn/object_detection.cpp read_net
 - Read image and pre-process it:
-@snippet samples/dnn/yolo_detector.cpp preprocess_params
-@snippet samples/dnn/yolo_detector.cpp preprocess_call
-@snippet samples/dnn/yolo_detector.cpp preprocess_call_func
+@snippet samples/dnn/object_detection.cpp preprocess_params
+@snippet samples/dnn/object_detection.cpp preprocess_call
+@snippet samples/dnn/object_detection.cpp preprocess_call_func
 - Inference:
-@snippet samples/dnn/yolo_detector.cpp forward_buffers
-@snippet samples/dnn/yolo_detector.cpp forward
+@snippet samples/dnn/object_detection.cpp forward_buffers
+@snippet samples/dnn/object_detection.cpp forward
 - Post-Processing
 All post-processing steps are implemented in function `yoloPostProcess`. Please pay attention,
 that NMS step is not included into onnx graph. Sample uses OpenCV function for it.
-@snippet samples/dnn/yolo_detector.cpp postprocess
+@snippet samples/dnn/object_detection.cpp postprocess
 - Draw predicted boxes
-@snippet samples/dnn/yolo_detector.cpp draw_boxes
+@snippet samples/dnn/object_detection.cpp draw_boxes
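As the tutorial notes, the exported ONNX graphs do not include an NMS step, so the sample filters the raw YOLO candidates with OpenCV's `NMSBoxes`. A minimal sketch of that call follows; `keepBestBoxes` is a hypothetical helper name and the thresholds are simply the sample defaults.

```cpp
#include <opencv2/dnn.hpp>
#include <vector>

// Sketch: keep only confident, non-overlapping detections.
// NMSBoxes drops boxes scoring below confThreshold and suppresses boxes whose
// IoU with a higher-scoring box exceeds nmsThreshold; the surviving indices
// are returned in keptIndices.
void keepBestBoxes(const std::vector<cv::Rect2d>& boxes,
                   const std::vector<float>& confidences,
                   std::vector<int>& keptIndices,
                   float confThreshold = 0.5f,
                   float nmsThreshold = 0.4f)
{
    cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, keptIndices);
}
```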


@@ -12,6 +12,8 @@ std::string findFile(const std::string& filename);
 std::string findModel(const std::string& filename, const std::string& sha1);
+std::vector<std::string> findAliases(std::string& zooFile, const std::string& sampleType);
 inline int getBackendID(const String& backend) {
     std::map<String, int> backendIDs = {
         {"default", cv::dnn::DNN_BACKEND_DEFAULT},
@@ -177,8 +179,33 @@ std::string genPreprocArguments(const std::string& modelName, const std::string&
               modelName, zooFile)+
           genArgument(prefix + "labels", "Path to a text file with names of classes to label detected objects.",
               modelName, zooFile)+
+          genArgument(prefix + "postprocessing", "Indicate the postprocessing type of model i.e. yolov8, yolonas, etc.",
+              modelName, zooFile)+
           genArgument(prefix + "sha1", "Optional path to hashsum of downloaded model to be loaded from models.yml",
               modelName, zooFile)+
           genArgument(prefix + "download_sha", "Optional path to hashsum of downloaded model to be loaded from models.yml",
               modelName, zooFile);
+}
+
+std::vector<std::string> findAliases(std::string& zooFile, const std::string& sampleType) {
+    std::vector<std::string> aliases;
+    zooFile = findFile(zooFile);
+    cv::FileStorage fs(zooFile, cv::FileStorage::READ);
+    cv::FileNode root = fs.root();
+    for (const auto& node : root) {
+        std::string alias = node.name();
+        cv::FileNode sampleNode = node["sample"];
+        if (!sampleNode.empty() && sampleNode.isString()) {
+            std::string sampleValue = (std::string)sampleNode;
+            if (sampleValue == sampleType) {
+                aliases.push_back(alias);
+            }
+        }
+    }
+    return aliases;
+}
 }
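A short usage sketch of the new helper, mirroring what the updated sample does when it prints the available model aliases; the function name `listObjectDetectionAliases` is illustrative only.

```cpp
#include <iostream>
#include <string>
#include <vector>
#include "common.hpp"  // declares findAliases(zooFile, sampleType), added by this patch

// Sketch: print every alias in models.yml whose "sample" field is "object_detection".
void listObjectDetectionAliases()
{
    std::string zoo = "models.yml";  // resolved through findFile() inside findAliases
    for (const std::string& alias : findAliases(zoo, "object_detection"))
        std::cout << alias << std::endl;
}
```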


@@ -16,9 +16,9 @@ yolov8x:
   width: 640
   height: 640
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
-  background_label_id: 0
-  sample: "yolo_detector"
+  labels: "object_detection_classes_yolo.txt"
+  postprocessing: "yolov8"
+  sample: "object_detection"

 yolov8s:
   load_info:
@@ -30,11 +30,11 @@ yolov8s:
   width: 640
   height: 640
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
-  background_label_id: 0
-  sample: "yolo_detector"
+  labels: "object_detection_classes_yolo.txt"
+  postprocessing: "yolov8"
+  sample: "object_detection"

-yolov8n:
+yolov8:
   load_info:
     url: "https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov8n.onnx"
     sha1: "68f864475d06e2ec4037181052739f268eeac38d"
@@ -44,9 +44,9 @@ yolov8n:
   width: 640
   height: 640
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
-  background_label_id: 0
-  sample: "yolo_detector"
+  labels: "object_detection_classes_yolo.txt"
+  postprocessing: "yolov8"
+  sample: "object_detection"

 yolov8m:
   load_info:
@@ -58,9 +58,9 @@ yolov8m:
   width: 640
   height: 640
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
-  background_label_id: 0
-  sample: "yolo_detector"
+  labels: "object_detection_classes_yolo.txt"
+  postprocessing: "yolov8"
+  sample: "object_detection"

 yolov8l:
   load_info:
@@ -72,8 +72,8 @@ yolov8l:
   width: 640
   height: 640
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
-  background_label_id: 0
+  labels: "object_detection_classes_yolo.txt"
+  postprocessing: "yolov8"
   sample: "yolo_detector"

 # YOLO4 object detection family from Darknet (https://github.com/AlexeyAB/darknet)
@@ -90,7 +90,7 @@ yolov4:
   width: 416
   height: 416
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
+  labels: "object_detection_classes_yolo.txt"
   background_label_id: 0
   sample: "object_detection"
@@ -105,7 +105,7 @@ yolov4-tiny:
   width: 416
   height: 416
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
+  labels: "object_detection_classes_yolo.txt"
   background_label_id: 0
   sample: "object_detection"
@@ -120,7 +120,7 @@ yolov3:
   width: 416
   height: 416
   rgb: true
-  classes: "object_detection_classes_yolo.txt"
+  labels: "object_detection_classes_yolo.txt"
   background_label_id: 0
   sample: "object_detection"
@@ -135,24 +135,10 @@ tiny-yolo-voc:
   width: 416
   height: 416
   rgb: true
-  classes: "object_detection_classes_pascal_voc.txt"
+  labels: "object_detection_classes_pascal_voc.txt"
   background_label_id: 0
   sample: "object_detection"

-yolov8:
-  load_info:
-    url: "https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov8n.onnx"
-    sha1: "68f864475d06e2ec4037181052739f268eeac38d"
-  model: "yolov8n.onnx"
-  mean: [0, 0, 0]
-  scale: 0.00392
-  width: 640
-  height: 640
-  rgb: true
-  postprocessing: "yolov8"
-  classes: "object_detection_classes_yolo.txt"
-  sample: "object_detection"
-
 # Caffe implementation of SSD model from https://github.com/chuanqi305/MobileNet-SSD
 ssd_caffe:
   load_info:
@@ -165,7 +151,7 @@ ssd_caffe:
   width: 300
   height: 300
   rgb: false
-  classes: "object_detection_classes_pascal_voc.txt"
+  labels: "object_detection_classes_pascal_voc.txt"
   sample: "object_detection"

 # TensorFlow implementation of SSD model from https://github.com/tensorflow/models/tree/master/research/object_detection
@@ -183,7 +169,7 @@ ssd_tf:
   width: 300
   height: 300
   rgb: true
-  classes: "object_detection_classes_coco.txt"
+  labels: "object_detection_classes_coco.txt"
   sample: "object_detection"

 # TensorFlow implementation of Faster-RCNN model from https://github.com/tensorflow/models/tree/master/research/object_detection


@@ -1,68 +1,114 @@
//![includes]
#include <fstream> #include <fstream>
#include <sstream> #include <sstream>
#include <opencv2/dnn.hpp> #include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp> #include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp> #include <opencv2/highgui.hpp>
#if defined(HAVE_THREADS)
#define USE_THREADS 1
#endif
#ifdef USE_THREADS
#include <mutex> #include <mutex>
#include <thread> #include <thread>
#include <queue> #include <queue>
#endif
#include "iostream"
#include "common.hpp" #include "common.hpp"
//![includes]
std::string param_keys =
"{ help h | | Print help message. }"
"{ @alias | | An alias name of model to extract preprocessing parameters from models.yml file. }"
"{ zoo | models.yml | An optional path to file with preprocessing parameters }"
"{ device | 0 | camera device number. }"
"{ input i | | Path to input image or video file. Skip this argument to capture frames from a camera. }"
"{ framework f | | Optional name of an origin framework of the model. Detect it automatically if it does not set. }"
"{ classes | | Optional path to a text file with names of classes to label detected objects. }"
"{ thr | .5 | Confidence threshold. }"
"{ nms | .4 | Non-maximum suppression threshold. }"
"{ async | 0 | Number of asynchronous forwards at the same time. "
"Choose 0 for synchronous mode }";
std::string backend_keys = cv::format(
"{ backend | 0 | Choose one of computation backends: "
"%d: automatically (by default), "
"%d: Intel's Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit), "
"%d: OpenCV implementation, "
"%d: VKCOM, "
"%d: CUDA }", cv::dnn::DNN_BACKEND_DEFAULT, cv::dnn::DNN_BACKEND_INFERENCE_ENGINE, cv::dnn::DNN_BACKEND_OPENCV, cv::dnn::DNN_BACKEND_VKCOM, cv::dnn::DNN_BACKEND_CUDA);
std::string target_keys = cv::format(
"{ target | 0 | Choose one of target computation devices: "
"%d: CPU target (by default), "
"%d: OpenCL, "
"%d: OpenCL fp16 (half-float precision), "
"%d: VPU, "
"%d: Vulkan, "
"%d: CUDA, "
"%d: CUDA fp16 (half-float preprocess) }", cv::dnn::DNN_TARGET_CPU, cv::dnn::DNN_TARGET_OPENCL, cv::dnn::DNN_TARGET_OPENCL_FP16, cv::dnn::DNN_TARGET_MYRIAD, cv::dnn::DNN_TARGET_VULKAN, cv::dnn::DNN_TARGET_CUDA, cv::dnn::DNN_TARGET_CUDA_FP16);
std::string keys = param_keys + backend_keys + target_keys;
using namespace cv; using namespace cv;
using namespace dnn; using namespace dnn;
using namespace std;
float confThreshold, nmsThreshold; const string about =
std::vector<std::string> classes; "Firstly, download required models using `download_models.py` (if not already done). Set environment variable OPENCV_DOWNLOAD_CACHE_DIR to specify where models should be downloaded. Also, point OPENCV_SAMPLES_DATA_PATH to opencv/samples/data.\n"
"To run:\n"
"\t ./example_dnn_object_detection model_name --input=path/to/your/input/image/or/video (don't give --input flag if want to use device camera)\n"
"Sample command:\n"
"\t ./example_dnn_object_detection yolov8 --input=$OPENCV_SAMPLES_DATA_PATH/baboon.jpg\n"
inline void preprocess(const Mat& frame, Net& net, Size inpSize, float scale, "Model path can also be specified using --model argument. ";
const Scalar& mean, bool swapRB);
void postprocess(Mat& frame, const std::vector<Mat>& out, Net& net, int backend); const string param_keys =
"{ help h | | Print help message. }"
"{ @alias | | An alias name of model to extract preprocessing parameters from models.yml file. }"
"{ zoo | ../dnn/models.yml | An optional path to file with preprocessing parameters }"
"{ device | 0 | camera device number. }"
"{ input i | | Path to input image or video file. Skip this argument to capture frames from a camera. }"
"{ thr | .5 | Confidence threshold. }"
"{ nms | .4 | Non-maximum suppression threshold. }"
"{ async | 0 | Number of asynchronous forwards at the same time. "
"Choose 0 for synchronous mode }"
"{ padvalue | 114.0 | padding value. }"
"{ paddingmode | 2 | Choose one of padding modes: "
"0: resize to required input size without extra processing, "
"1: Image will be cropped after resize, "
"2: Resize image to the desired size while preserving the aspect ratio of original image }";
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame); const string backend_keys = format(
"{ backend | default | Choose one of computation backends: "
"default: automatically (by default), "
"openvino: Intel's Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit), "
"opencv: OpenCV implementation, "
"vkcom: VKCOM, "
"cuda: CUDA, "
"webnn: WebNN }");
void callback(int pos, void* userdata); const string target_keys = format(
"{ target | cpu | Choose one of target computation devices: "
"cpu: CPU target (by default), "
"opencl: OpenCL, "
"opencl_fp16: OpenCL fp16 (half-float precision), "
"vpu: VPU, "
"vulkan: Vulkan, "
"cuda: CUDA, "
"cuda_fp16: CUDA fp16 (half-float preprocess) }");
string keys = param_keys + backend_keys + target_keys;
float confThreshold, nmsThreshold, scale, paddingValue;
vector<string> labels;
Scalar meanv;
bool swapRB;
int inpWidth, inpHeight;
size_t asyncNumReq = 0;
ImagePaddingMode paddingMode;
string modelName, framework;
static void preprocess(const Mat& frame, Net& net, Size inpSize);
static void postprocess(Mat& frame, const vector<Mat>& outs, Net& net, int backend, vector<int>& classIds, vector<float>& confidences, vector<Rect>& boxes, const string yolo_name);
static void drawPred(vector<int>& classIds, vector<float>& confidences, vector<Rect>& boxes, Mat& frame, FontFace& sans, int stdSize, int stdWeight, int stdImgSize, int stdThickness);
static void callback(int pos, void* userdata);
static Scalar getColor(int classId);
static void yoloPostProcessing(
const vector<Mat>& outs,
vector<int>& keep_classIds,
vector<float>& keep_confidences,
vector<Rect2d>& keep_boxes,
float conf_threshold,
float iou_threshold,
const string& yolo_name);
static void printAliases(string& zooFile){
vector<string> aliases = findAliases(zooFile, "object_detection");
cout<<"Alias choices: [ ";
for (auto it: aliases){
cout<<"'"<<it<<"' ";
}
cout<<"]"<<endl;
}
static Scalar getTextColor(Scalar bgColor) {
double luminance = 0.299 * bgColor[2] + 0.587 * bgColor[1] + 0.114 * bgColor[0];
return luminance > 128 ? Scalar(0, 0, 0) : Scalar(255, 255, 255);
}
#ifdef USE_THREADS
template <typename T> template <typename T>
class QueueFPS : public std::queue<T> class QueueFPS : public std::queue<T>
{ {
@@ -112,233 +158,362 @@ private:
TickMeter tm; TickMeter tm;
std::mutex mutex; std::mutex mutex;
}; };
#endif // USE_THREADS
int main(int argc, char** argv) int main(int argc, char** argv)
{ {
CommandLineParser parser(argc, argv, keys); CommandLineParser parser(argc, argv, keys);
const std::string modelName = parser.get<String>("@alias"); string zooFile = parser.get<String>("zoo");
const std::string zooFile = parser.get<String>("zoo"); if (!parser.has("@alias") || parser.has("help"))
{
cout << about << endl;
parser.printMessage();
printAliases(zooFile);
return -1;
}
zooFile = findFile(zooFile);
modelName = parser.get<String>("@alias");
keys += genPreprocArguments(modelName, zooFile); keys += genPreprocArguments(modelName, zooFile);
parser = CommandLineParser(argc, argv, keys); parser = CommandLineParser(argc, argv, keys);
parser.about("Use this script to run object detection deep learning networks using OpenCV.");
if (argc == 1 || parser.has("help")) if (!parser.has("model"))
{ {
parser.printMessage(); cout << "Path to model is not provided in command line or model alias is not correct" << endl;
return 0; printAliases(zooFile);
return -1;
} }
confThreshold = parser.get<float>("thr"); confThreshold = parser.get<float>("thr");
nmsThreshold = parser.get<float>("nms"); nmsThreshold = parser.get<float>("nms");
float scale = parser.get<float>("scale"); //![preprocess_params]
Scalar mean = parser.get<Scalar>("mean"); scale = parser.get<float>("scale");
bool swapRB = parser.get<bool>("rgb"); meanv = parser.get<Scalar>("mean");
int inpWidth = parser.get<int>("width"); swapRB = parser.get<bool>("rgb");
int inpHeight = parser.get<int>("height"); inpWidth = parser.get<int>("width");
size_t asyncNumReq = parser.get<int>("async"); inpHeight = parser.get<int>("height");
CV_Assert(parser.has("model")); int async = parser.get<int>("async");
std::string modelPath = findFile(parser.get<String>("model")); paddingValue = parser.get<float>("padvalue");
std::string configPath = findFile(parser.get<String>("config")); const string yolo_name = parser.get<String>("postprocessing");
paddingMode = static_cast<ImagePaddingMode>(parser.get<int>("paddingmode"));
//![preprocess_params]
String sha1 = parser.get<String>("sha1");
const string modelPath = findModel(parser.get<String>("model"), sha1);
const string configPath = findFile(parser.get<String>("config"));
framework = modelPath.substr(modelPath.rfind('.') + 1);
// Open file with classes names. if (parser.has("labels"))
if (parser.has("classes"))
{ {
std::string file = parser.get<String>("classes"); const string file = findFile(parser.get<String>("labels"));
std::ifstream ifs(file.c_str()); ifstream ifs(file.c_str());
if (!ifs.is_open()) if (!ifs.is_open())
CV_Error(Error::StsError, "File " + file + " not found"); CV_Error(Error::StsError, "File " + file + " not found");
std::string line; string line;
while (std::getline(ifs, line)) while (getline(ifs, line))
{ {
classes.push_back(line); labels.push_back(line);
} }
} }
//![read_net]
// Load a model. Net net = readNet(modelPath, configPath);
Net net = readNet(modelPath, configPath, parser.get<String>("framework")); int backend = getBackendID(parser.get<String>("backend"));
int backend = parser.get<int>("backend");
net.setPreferableBackend(backend); net.setPreferableBackend(backend);
net.setPreferableTarget(parser.get<int>("target")); net.setPreferableTarget(getTargetID(parser.get<String>("target")));
std::vector<String> outNames = net.getUnconnectedOutLayersNames(); //![read_net]
// Create a window // Create a window
static const std::string kWinName = "Deep learning object detection in OpenCV"; static const string kWinName = "Deep learning object detection in OpenCV";
namedWindow(kWinName, WINDOW_NORMAL); namedWindow(kWinName, WINDOW_AUTOSIZE);
int initialConf = (int)(confThreshold * 100); int initialConf = (int)(confThreshold * 100);
createTrackbar("Confidence threshold, %", kWinName, &initialConf, 99, callback); createTrackbar("Confidence threshold, %", kWinName, &initialConf, 99, callback, &net);
// Open a video file or an image file or a camera stream. // Open a video file or an image file or a camera stream.
VideoCapture cap; VideoCapture cap;
if (parser.has("input")) bool openSuccess = parser.has("input") ? cap.open(parser.get<String>("input")) : cap.open(parser.get<int>("device"));
cap.open(parser.get<String>("input")); if (!openSuccess){
else cout << "Could not open input file or camera device" << endl;
cap.open(parser.get<int>("device")); return 0;
}
#ifdef USE_THREADS FontFace sans("sans");
bool process = true;
// Frames capturing thread int stdSize = 15;
QueueFPS<Mat> framesQueue; int stdWeight = 150;
std::thread framesThread([&](){ int stdImgSize = 512;
Mat frame; int stdThickness = 2;
while (process) vector<int> classIds;
{ vector<float> confidences;
cap >> frame; vector<Rect> boxes;
if (!frame.empty())
framesQueue.push(frame.clone());
else
break;
}
});
// Frames processing thread if (async > 0 && backend == DNN_BACKEND_INFERENCE_ENGINE){
QueueFPS<Mat> processedFramesQueue; asyncNumReq = async;
QueueFPS<std::vector<Mat> > predictionsQueue; }
std::thread processingThread([&](){
std::queue<AsyncArray> futureOutputs; if (async != 0) {
Mat blob; // Threading is enabled
while (process) bool process = true;
{
// Get a next frame // Frames capturing thread
QueueFPS<Mat> framesQueue;
std::thread framesThread([&]() {
Mat frame; Mat frame;
{ while (process) {
if (!framesQueue.empty()) cap >> frame;
{ if (!frame.empty())
frame = framesQueue.get(); framesQueue.push(frame.clone());
if (asyncNumReq)
{
if (futureOutputs.size() == asyncNumReq)
frame = Mat();
}
else
framesQueue.clear(); // Skip the rest of frames
}
}
// Process the frame
if (!frame.empty())
{
preprocess(frame, net, Size(inpWidth, inpHeight), scale, mean, swapRB);
processedFramesQueue.push(frame);
if (asyncNumReq)
{
futureOutputs.push(net.forwardAsync());
}
else else
break;
}
});
// Frames processing thread
QueueFPS<Mat> processedFramesQueue;
QueueFPS<std::vector<Mat>> predictionsQueue;
std::thread processingThread([&]() {
std::queue<AsyncArray> futureOutputs;
Mat blob;
while (process) {
// Get the next frame
Mat frame;
{ {
std::vector<Mat> outs; if (!framesQueue.empty()) {
net.forward(outs, outNames); frame = framesQueue.get();
predictionsQueue.push(outs); if (asyncNumReq) {
if (futureOutputs.size() == asyncNumReq)
frame = Mat();
}
}
}
// Process the frame
if (!frame.empty()) {
preprocess(frame, net, Size(inpWidth, inpHeight));
processedFramesQueue.push(frame);
if (asyncNumReq) {
futureOutputs.push(net.forwardAsync());
} else {
vector<Mat> outs;
net.forward(outs, net.getUnconnectedOutLayersNames());
predictionsQueue.push(outs);
}
}
while (!futureOutputs.empty() &&
futureOutputs.front().wait_for(std::chrono::seconds(0))) {
AsyncArray async_out = futureOutputs.front();
futureOutputs.pop();
Mat out;
async_out.get(out);
predictionsQueue.push({out});
} }
} }
});
while (!futureOutputs.empty() && // Postprocessing and rendering loop
futureOutputs.front().wait_for(std::chrono::seconds(0))) while (waitKey(100) < 0) {
{ if (predictionsQueue.empty())
AsyncArray async_out = futureOutputs.front(); continue;
futureOutputs.pop();
Mat out; vector<Mat> outs = predictionsQueue.get();
async_out.get(out); Mat frame = processedFramesQueue.get();
predictionsQueue.push({out});
classIds.clear();
confidences.clear();
boxes.clear();
postprocess(frame, outs, net, backend, classIds, confidences, boxes, yolo_name);
drawPred(classIds, confidences, boxes, frame, sans, stdSize, stdWeight, stdImgSize, stdThickness);
int imgWidth = max(frame.rows, frame.cols);
int size = static_cast<int>((stdSize * imgWidth) / (stdImgSize * 1.5));
int weight = static_cast<int>((stdWeight * imgWidth) / (stdImgSize * 1.5));
if (predictionsQueue.counter > 1) {
string label = format("Camera: %.2f FPS", framesQueue.getFPS());
rectangle(frame, Point(0, 0), Point(10 * size, 3 * size + size / 4), Scalar::all(255), FILLED);
putText(frame, label, Point(0, size), Scalar::all(0), sans, size, weight);
label = format("Network: %.2f FPS", predictionsQueue.getFPS());
putText(frame, label, Point(0, 2 * size), Scalar::all(0), sans, size, weight);
label = format("Skipped frames: %d", framesQueue.counter - predictionsQueue.counter);
putText(frame, label, Point(0, 3 * size), Scalar::all(0), sans, size, weight);
} }
imshow(kWinName, frame);
} }
});
// Postprocessing and rendering loop process = false;
while (waitKey(1) < 0) framesThread.join();
{ processingThread.join();
if (predictionsQueue.empty()) } else {
continue; if (asyncNumReq)
CV_Error(Error::StsNotImplemented, "Asynchronous forward is supported only with Inference Engine backend.");
// Threading is disabled, run synchronously
Mat frame, blob;
while (waitKey(100) < 0) {
cap >> frame;
if (frame.empty()) {
waitKey();
break;
}
preprocess(frame, net, Size(inpWidth, inpHeight));
std::vector<Mat> outs = predictionsQueue.get(); vector<Mat> outs;
Mat frame = processedFramesQueue.get(); net.forward(outs, net.getUnconnectedOutLayersNames());
postprocess(frame, outs, net, backend); classIds.clear();
confidences.clear();
boxes.clear();
if (predictionsQueue.counter > 1) postprocess(frame, outs, net, backend, classIds, confidences, boxes, yolo_name);
{
std::string label = format("Camera: %.2f FPS", framesQueue.getFPS());
putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
label = format("Network: %.2f FPS", predictionsQueue.getFPS()); drawPred(classIds, confidences, boxes, frame, sans, stdSize, stdWeight, stdImgSize, stdThickness);
putText(frame, label, Point(0, 30), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
label = format("Skipped frames: %d", framesQueue.counter - predictionsQueue.counter); vector<double> layersTimes;
putText(frame, label, Point(0, 45), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); int imgWidth = max(frame.rows, frame.cols);
int size = static_cast<int>((stdSize * imgWidth) / (stdImgSize * 1.5));
int weight = static_cast<int>((stdWeight * imgWidth) / (stdImgSize * 1.5));
double freq = getTickFrequency() / 1000;
double t = net.getPerfProfile(layersTimes) / freq;
string label = format("Inference time: %.2f ms", t);
putText(frame, label, Point(0, size), Scalar(0, 255, 0), sans, size, weight);
imshow(kWinName, frame);
} }
imshow(kWinName, frame);
} }
process = false;
framesThread.join();
processingThread.join();
#else // USE_THREADS
if (asyncNumReq)
CV_Error(Error::StsNotImplemented, "Asynchronous forward is supported only with Inference Engine backend.");
// Process frames.
Mat frame, blob;
while (waitKey(1) < 0)
{
cap >> frame;
if (frame.empty())
{
waitKey();
break;
}
preprocess(frame, net, Size(inpWidth, inpHeight), scale, mean, swapRB);
std::vector<Mat> outs;
net.forward(outs, outNames);
postprocess(frame, outs, net, backend);
// Put efficiency information.
std::vector<double> layersTimes;
double freq = getTickFrequency() / 1000;
double t = net.getPerfProfile(layersTimes) / freq;
std::string label = format("Inference time: %.2f ms", t);
putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
imshow(kWinName, frame);
}
#endif // USE_THREADS
return 0; return 0;
} }
inline void preprocess(const Mat& frame, Net& net, Size inpSize, float scale, void preprocess(const Mat& frame, Net& net, Size inpSize)
const Scalar& mean, bool swapRB)
{ {
static Mat blob; Size size(inpSize.width <= 0 ? frame.cols : inpSize.width, inpSize.height <= 0 ? frame.rows : inpSize.height);
// Create a 4D blob from a frame.
if (inpSize.width <= 0) inpSize.width = frame.cols;
if (inpSize.height <= 0) inpSize.height = frame.rows;
blobFromImage(frame, blob, 1.0, inpSize, Scalar(), swapRB, false, CV_8U);
// Run a model. // Prepare the blob from the image
net.setInput(blob, "", scale, mean); Mat inp;
if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN if(framework == "weights"){ // checks whether model is darknet
blobFromImage(frame, inp, scale, size, meanv, swapRB, false, CV_32F);
}
else{
//![preprocess_call]
Image2BlobParams imgParams(
scale,
size,
meanv,
swapRB,
CV_32F,
DNN_LAYOUT_NCHW,
paddingMode,
paddingValue);
inp = blobFromImageWithParams(frame, imgParams);
//![preprocess_call]
}
// Set the blob as the network input
net.setInput(inp);
// Check if the model is Faster-RCNN or R-FCN
if (net.getLayer(0)->outputNameToIndex("im_info") != -1)
{ {
resize(frame, frame, inpSize); // Resize the frame and prepare imInfo
Mat imInfo = (Mat_<float>(1, 3) << inpSize.height, inpSize.width, 1.6f); resize(frame, frame, size);
Mat imInfo = (Mat_<float>(1, 3) << size.height, size.width, 1.6f);
net.setInput(imInfo, "im_info"); net.setInput(imInfo, "im_info");
} }
} }
void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net, int backend) void yoloPostProcessing(
const vector<Mat>& outs,
vector<int>& keep_classIds,
vector<float>& keep_confidences,
vector<Rect2d>& keep_boxes,
float conf_threshold,
float iou_threshold,
const string& yolo_name)
{ {
static std::vector<int> outLayers = net.getUnconnectedOutLayers(); // Retrieve
static std::string outLayerType = net.getLayer(outLayers[0])->type; vector<int> classIds;
vector<float> confidences;
vector<Rect2d> boxes;
vector<Mat> outs_copy = outs;
if (yolo_name == "yolov8")
{
transposeND(outs_copy[0], {0, 2, 1}, outs_copy[0]);
}
if (yolo_name == "yolonas")
{
// outs contains 2 elements of shape [1, 8400, 80] and [1, 8400, 4]. Concat them to get [1, 8400, 84]
Mat concat_out;
// squeeze the first dimension
outs_copy[0] = outs_copy[0].reshape(1, outs_copy[0].size[1]);
outs_copy[1] = outs_copy[1].reshape(1, outs_copy[1].size[1]);
hconcat(outs_copy[1], outs_copy[0], concat_out);
outs_copy[0] = concat_out;
// remove the second element
outs_copy.pop_back();
// unsqueeze the first dimension
outs_copy[0] = outs_copy[0].reshape(0, vector<int>{1, 8400, 84});
}
for (auto preds : outs_copy)
{
preds = preds.reshape(1, preds.size[1]); // [1, 8400, 85] -> [8400, 85]
for (int i = 0; i < preds.rows; ++i)
{
// filter out non-object
float obj_conf = (yolo_name == "yolov8" || yolo_name == "yolonas") ? 1.0f : preds.at<float>(i, 4);
if (obj_conf < conf_threshold)
continue;
Mat scores = preds.row(i).colRange((yolo_name == "yolov8" || yolo_name == "yolonas") ? 4 : 5, preds.cols);
double conf;
Point maxLoc;
minMaxLoc(scores, 0, &conf, 0, &maxLoc);
conf = (yolo_name == "yolov8" || yolo_name == "yolonas") ? conf : conf * obj_conf;
if (conf < conf_threshold)
continue;
// get bbox coords
float* det = preds.ptr<float>(i);
double cx = det[0];
double cy = det[1];
double w = det[2];
double h = det[3];
// [x1, y1, x2, y2]
if (yolo_name == "yolonas") {
boxes.push_back(Rect2d(cx, cy, w, h));
} else {
boxes.push_back(Rect2d(cx - 0.5 * w, cy - 0.5 * h,
cx + 0.5 * w, cy + 0.5 * h));
}
classIds.push_back(maxLoc.x);
confidences.push_back(static_cast<float>(conf));
}
}
// NMS
vector<int> keep_idx;
NMSBoxes(boxes, confidences, conf_threshold, iou_threshold, keep_idx);
for (auto i : keep_idx)
{
keep_classIds.push_back(classIds[i]);
keep_confidences.push_back(confidences[i]);
keep_boxes.push_back(boxes[i]);
}
}
void postprocess(Mat& frame, const vector<Mat>& outs, Net& net, int backend, vector<int>& classIds, vector<float>& confidences, vector<Rect>& boxes, const string yolo_name)
{
static vector<int> outLayers = net.getUnconnectedOutLayers();
static string outLayerType = net.getLayer(outLayers[0])->type;
std::vector<int> classIds;
std::vector<float> confidences;
std::vector<Rect> boxes;
if (outLayerType == "DetectionOutput") if (outLayerType == "DetectionOutput")
{ {
// Network produces output blob with a shape 1x1xNx7 where N is a number of // Network produces output blob with a shape 1x1xNx7 where N is a number of
@@ -405,14 +580,46 @@ void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net, int backend
} }
} }
} }
else else if (outLayerType == "Identity")
CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType); {
//![forward_buffers]
vector<int> keep_classIds;
vector<float> keep_confidences;
vector<Rect2d> keep_boxes;
//![forward_buffers]
// NMS is used inside Region layer only on DNN_BACKEND_OPENCV for another backends we need NMS in sample //![postprocess]
// or NMS is required if number of outputs > 1 yoloPostProcessing(outs, keep_classIds, keep_confidences, keep_boxes, confThreshold, nmsThreshold, yolo_name);
//![postprocess]
for (size_t i = 0; i < keep_classIds.size(); ++i)
{
classIds.push_back(keep_classIds[i]);
confidences.push_back(keep_confidences[i]);
Rect2d box = keep_boxes[i];
boxes.push_back(Rect(cvFloor(box.x), cvFloor(box.y), cvFloor(box.width-box.x), cvFloor(box.height-box.y)));
}
if (framework == "onnx"){
Image2BlobParams paramNet;
paramNet.scalefactor = scale;
paramNet.size = Size(inpWidth, inpHeight);
paramNet.mean = meanv;
paramNet.swapRB = swapRB;
paramNet.paddingmode = paddingMode;
paramNet.blobRectsToImageRects(boxes, boxes, frame.size());
}
}
else
{
CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);
}
// NMS is used inside Region layer only on DNN_BACKEND_OPENCV for other backends we need NMS in sample
// or NMS is required if the number of outputs > 1
if (outLayers.size() > 1 || (outLayerType == "Region" && backend != DNN_BACKEND_OPENCV)) if (outLayers.size() > 1 || (outLayerType == "Region" && backend != DNN_BACKEND_OPENCV))
{ {
std::map<int, std::vector<size_t> > class2indices; map<int, vector<size_t> > class2indices;
for (size_t i = 0; i < classIds.size(); i++) for (size_t i = 0; i < classIds.size(); i++)
{ {
if (confidences[i] >= confThreshold) if (confidences[i] >= confThreshold)
@@ -420,20 +627,20 @@ void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net, int backend
class2indices[classIds[i]].push_back(i); class2indices[classIds[i]].push_back(i);
} }
} }
std::vector<Rect> nmsBoxes; vector<Rect> nmsBoxes;
std::vector<float> nmsConfidences; vector<float> nmsConfidences;
std::vector<int> nmsClassIds; vector<int> nmsClassIds;
for (std::map<int, std::vector<size_t> >::iterator it = class2indices.begin(); it != class2indices.end(); ++it) for (map<int, vector<size_t> >::iterator it = class2indices.begin(); it != class2indices.end(); ++it)
{ {
std::vector<Rect> localBoxes; vector<Rect> localBoxes;
std::vector<float> localConfidences; vector<float> localConfidences;
std::vector<size_t> classIndices = it->second; vector<size_t> classIndices = it->second;
for (size_t i = 0; i < classIndices.size(); i++) for (size_t i = 0; i < classIndices.size(); i++)
{ {
localBoxes.push_back(boxes[classIndices[i]]); localBoxes.push_back(boxes[classIndices[i]]);
localConfidences.push_back(confidences[classIndices[i]]); localConfidences.push_back(confidences[classIndices[i]]);
} }
std::vector<int> nmsIndices; vector<int> nmsIndices;
NMSBoxes(localBoxes, localConfidences, confThreshold, nmsThreshold, nmsIndices); NMSBoxes(localBoxes, localConfidences, confThreshold, nmsThreshold, nmsIndices);
for (size_t i = 0; i < nmsIndices.size(); i++) for (size_t i = 0; i < nmsIndices.size(); i++)
{ {
@@ -447,36 +654,49 @@ void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net, int backend
classIds = nmsClassIds; classIds = nmsClassIds;
confidences = nmsConfidences; confidences = nmsConfidences;
} }
for (size_t idx = 0; idx < boxes.size(); ++idx)
{
Rect box = boxes[idx];
drawPred(classIds[idx], confidences[idx], box.x, box.y,
box.x + box.width, box.y + box.height, frame);
}
} }
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame) void drawPred(vector<int>& classIds, vector<float>& confidences, vector<Rect>& boxes, Mat& frame, FontFace& sans, int stdSize, int stdWeight, int stdImgSize, int stdThickness)
{ {
rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0)); int imgWidth = max(frame.rows, frame.cols);
int size = (stdSize*imgWidth)/stdImgSize;
int weight = (stdWeight*imgWidth)/stdImgSize;
int thickness = (stdThickness*imgWidth)/stdImgSize;
std::string label = format("%.2f", conf); for (size_t idx = 0; idx < boxes.size(); ++idx){
if (!classes.empty()) Scalar boxColor = getColor(classIds[idx]);
{ int left = boxes[idx].x;
CV_Assert(classId < (int)classes.size()); int top = boxes[idx].y;
label = classes[classId] + ": " + label; int right = boxes[idx].x + boxes[idx].width;
int bottom = boxes[idx].y + boxes[idx].height;
rectangle(frame, Point(left, top), Point(right, bottom), boxColor, thickness);
string label = format("%.2f", confidences[idx]);
if (!labels.empty())
{
CV_Assert(classIds[idx] < (int)labels.size());
label = labels[classIds[idx]] + ": " + label;
}
Rect r = getTextSize(Size(), label, Point(), sans, size, weight);
int baseline = r.y + r.height;
Size labelSize = Size(r.width, r.height + size/4 - baseline);
top = max(top-thickness/2, labelSize.height);
rectangle(frame, Point(left-thickness/2, top-(labelSize.height)),
Point(left + labelSize.width, top), boxColor, FILLED);
putText(frame, label, Point(left, top-size/4), getTextColor(boxColor), sans, size, weight);
} }
int baseLine;
Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = max(top, labelSize.height);
rectangle(frame, Point(left, top - labelSize.height),
Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar());
} }
void callback(int pos, void*) void callback(int pos, void*)
{ {
confThreshold = pos * 0.01f; confThreshold = pos * 0.01f;
} }
Scalar getColor(int classId) {
int r = min((classId >> 0 & 1) * 128 + (classId >> 3 & 1) * 64 + (classId >> 6 & 1) * 32 + 80, 255);
int g = min((classId >> 1 & 1) * 128 + (classId >> 4 & 1) * 64 + (classId >> 7 & 1) * 32 + 40, 255);
int b = min((classId >> 2 & 1) * 128 + (classId >> 5 & 1) * 64 + (classId >> 8 & 1) * 32 + 40, 255);
return Scalar(b, g, r);
}


@@ -12,10 +12,22 @@ from tf_text_graph_common import readTextMessage
from tf_text_graph_ssd import createSSDGraph from tf_text_graph_ssd import createSSDGraph
from tf_text_graph_faster_rcnn import createFasterRCNNGraph from tf_text_graph_faster_rcnn import createFasterRCNNGraph
backends = (cv.dnn.DNN_BACKEND_DEFAULT, cv.dnn.DNN_BACKEND_INFERENCE_ENGINE, cv.dnn.DNN_BACKEND_OPENCV, def help():
cv.dnn.DNN_BACKEND_VKCOM, cv.dnn.DNN_BACKEND_CUDA) print(
targets = (cv.dnn.DNN_TARGET_CPU, cv.dnn.DNN_TARGET_OPENCL, cv.dnn.DNN_TARGET_OPENCL_FP16, cv.dnn.DNN_TARGET_MYRIAD, cv.dnn.DNN_TARGET_HDDL, '''
cv.dnn.DNN_TARGET_VULKAN, cv.dnn.DNN_TARGET_CUDA, cv.dnn.DNN_TARGET_CUDA_FP16) Firstly, download required models using `download_models.py` (if not already done). Set environment variable OPENCV_DOWNLOAD_CACHE_DIR to specify where models should be downloaded. Also, point OPENCV_SAMPLES_DATA_PATH to opencv/samples/data.\n"\n
To run:
python object_detection.py model_name(e.g yolov8) --input=path/to/your/input/image/or/video (don't pass --input to use device camera)
Sample command:
python object_detection.py yolov8 --input=path/to/image
Model path can also be specified using --model argument
'''
)
backends = ("default", "openvino", "opencv", "vkcom", "cuda")
targets = ("cpu", "opencl", "opencl_fp16", "ncs2_vpu", "hddl_vpu", "vulkan", "cuda", "cuda_fp16")
parser = argparse.ArgumentParser(add_help=False) parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('--zoo', default=os.path.join(os.path.dirname(os.path.abspath(__file__)), 'models.yml'), parser.add_argument('--zoo', default=os.path.join(os.path.dirname(os.path.abspath(__file__)), 'models.yml'),
@@ -30,27 +42,27 @@ parser.add_argument('--framework', choices=['caffe', 'tensorflow', 'darknet', 'd
'Detect it automatically if it does not set.') 'Detect it automatically if it does not set.')
parser.add_argument('--thr', type=float, default=0.5, help='Confidence threshold') parser.add_argument('--thr', type=float, default=0.5, help='Confidence threshold')
parser.add_argument('--nms', type=float, default=0.4, help='Non-maximum suppression threshold') parser.add_argument('--nms', type=float, default=0.4, help='Non-maximum suppression threshold')
parser.add_argument('--backend', choices=backends, default=cv.dnn.DNN_BACKEND_DEFAULT, type=int, parser.add_argument('--backend', default="default", type=str, choices=backends,
help="Choose one of computation backends: " help="Choose one of computation backends: "
"%d: automatically (by default), " "default: automatically (by default), "
"%d: Intel's Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit), " "openvino: Intel's Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit), "
"%d: OpenCV implementation, " "opencv: OpenCV implementation, "
"%d: VKCOM, " "vkcom: VKCOM, "
"%d: CUDA" % backends) "cuda: CUDA, "
parser.add_argument('--target', choices=targets, default=cv.dnn.DNN_TARGET_CPU, type=int, "webnn: WebNN")
help='Choose one of target computation devices: ' parser.add_argument('--target', default="cpu", type=str, choices=targets,
'%d: CPU target (by default), ' help="Choose one of target computation devices: "
'%d: OpenCL, ' "cpu: CPU target (by default), "
'%d: OpenCL fp16 (half-float precision), ' "opencl: OpenCL, "
'%d: NCS2 VPU, ' "opencl_fp16: OpenCL fp16 (half-float precision), "
'%d: HDDL VPU, ' "ncs2_vpu: NCS2 VPU, "
'%d: Vulkan, ' "hddl_vpu: HDDL VPU, "
'%d: CUDA, ' "vulkan: Vulkan, "
'%d: CUDA fp16 (half-float preprocess)' % targets) "cuda: CUDA, "
"cuda_fp16: CUDA fp16 (half-float preprocess)")
parser.add_argument('--async', type=int, default=0, parser.add_argument('--async', type=int, default=0,
dest='asyncN', dest='use_threads',
help='Number of asynchronous forwards at the same time. ' help='Choose 0 for synchronous mode and 1 for asynchronous mode')
'Choose 0 for synchronous mode')
args, _ = parser.parse_known_args() args, _ = parser.parse_known_args()
add_preproc_args(args.zoo, parser, 'object_detection') add_preproc_args(args.zoo, parser, 'object_detection')
parser = argparse.ArgumentParser(parents=[parser], parser = argparse.ArgumentParser(parents=[parser],
@@ -58,9 +70,14 @@ parser = argparse.ArgumentParser(parents=[parser],
formatter_class=argparse.ArgumentDefaultsHelpFormatter) formatter_class=argparse.ArgumentDefaultsHelpFormatter)
args = parser.parse_args() args = parser.parse_args()
args.model = findFile(args.model) if args.alias is None or hasattr(args, 'help'):
args.config = findFile(args.config) help()
args.classes = findFile(args.classes) exit(1)
args.model = findModel(args.model, args.sha1)
if args.config is not None:
args.config = findFile(args.config)
args.labels = findFile(args.labels)
# If config specified, try to load it as TensorFlow Object Detection API's pipeline. # If config specified, try to load it as TensorFlow Object Detection API's pipeline.
config = readTextMessage(args.config) config = readTextMessage(args.config)
@@ -77,40 +94,38 @@ if 'model' in config:
# Load names of classes # Load names of classes
classes = None labels = None
if args.classes: if args.labels:
with open(args.classes, 'rt') as f: with open(args.labels, 'rt') as f:
classes = f.read().rstrip('\n').split('\n') labels = f.read().rstrip('\n').split('\n')
# Load a network # Load a network
net = cv.dnn.readNet(args.model, args.config, args.framework) net = cv.dnn.readNet(args.model, args.config, args.framework)
net.setPreferableBackend(args.backend) net.setPreferableBackend(get_backend_id(args.backend))
net.setPreferableTarget(args.target) net.setPreferableTarget(get_target_id(args.target))
outNames = net.getUnconnectedOutLayersNames() outNames = net.getUnconnectedOutLayersNames()
confThreshold = args.thr confThreshold = args.thr
nmsThreshold = args.nms nmsThreshold = args.nms
stdSize = 0.8
stdWeight = 2
stdImgSize = 512
asyncN = 0
def get_color(class_id):
r = min((class_id >> 0 & 1) * 128 + (class_id >> 3 & 1) * 64 + (class_id >> 6 & 1) * 32 + 80, 255)
g = min((class_id >> 1 & 1) * 128 + (class_id >> 4 & 1) * 64 + (class_id >> 7 & 1) * 32 + 40, 255)
b = min((class_id >> 2 & 1) * 128 + (class_id >> 5 & 1) * 64 + (class_id >> 8 & 1) * 32 + 40, 255)
return (int(b), int(g), int(r))
def get_text_color(bg_color):
luminance = 0.299 * bg_color[2] + 0.587 * bg_color[1] + 0.114 * bg_color[0]
return (0, 0, 0) if luminance > 128 else (255, 255, 255)
def postprocess(frame, outs): def postprocess(frame, outs):
frameHeight = frame.shape[0] frameHeight = frame.shape[0]
frameWidth = frame.shape[1] frameWidth = frame.shape[1]
def drawPred(classId, conf, left, top, right, bottom):
# Draw a bounding box.
cv.rectangle(frame, (left, top), (right, bottom), (0, 255, 0))
label = '%.2f' % conf
# Print a label of class.
if classes:
assert(classId < len(classes))
label = '%s: %s' % (classes[classId], label)
labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)
top = max(top, labelSize[1])
cv.rectangle(frame, (left, top - labelSize[1]), (left + labelSize[0], top + baseLine), (255, 255, 255), cv.FILLED)
cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))
layerNames = net.getLayerNames() layerNames = net.getLayerNames()
lastLayerId = net.getLayerId(layerNames[-1]) lastLayerId = net.getLayerId(layerNames[-1])
lastLayer = net.getLayer(lastLayerId) lastLayer = net.getLayer(lastLayerId)
@@ -194,17 +209,33 @@ def postprocess(frame, outs):
else: else:
indices = np.arange(0, len(classIds)) indices = np.arange(0, len(classIds))
return boxes, classIds, confidences, indices
def drawPred(classIds, confidences, boxes, indices, fontSize, fontThickness):
for i in indices: for i in indices:
box = boxes[i] box = boxes[i]
left = box[0] left = box[0]
top = box[1] top = box[1]
width = box[2] right = box[0] + box[2]
height = box[3] bottom = box[1] + box[3]
drawPred(classIds[i], confidences[i], left, top, left + width, top + height) bg_color = get_color(classIds[i])
cv.rectangle(frame, (left, top), (right, bottom), bg_color, fontThickness)
label = '%.2f' % confidences[i]
# Print a label of class.
if labels:
assert(classIds[i] < len(labels))
label = '%s: %s' % (labels[classIds[i]], label)
labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, fontSize, fontThickness)
top = max(top, labelSize[1])
cv.rectangle(frame, (int(left-fontThickness/2), top - labelSize[1]), (left + labelSize[0], top + baseLine), bg_color, cv.FILLED)
cv.putText(frame, label, (left, top-fontThickness), cv.FONT_HERSHEY_SIMPLEX, fontSize, get_text_color(bg_color), fontThickness)
# Process inputs # Process inputs
winName = 'Deep learning object detection in OpenCV' winName = 'Deep learning object detection in OpenCV'
cv.namedWindow(winName, cv.WINDOW_NORMAL) cv.namedWindow(winName, cv.WINDOW_AUTOSIZE)
def callback(pos): def callback(pos):
global confThreshold global confThreshold
@@ -252,7 +283,7 @@ def framesThreadBody():
processedFramesQueue = queue.Queue() processedFramesQueue = queue.Queue()
predictionsQueue = QueueFPS() predictionsQueue = QueueFPS()
def processingThreadBody(): def processingThreadBody():
global processedFramesQueue, predictionsQueue, args, process global processedFramesQueue, predictionsQueue, args, process, asyncN
futureOutputs = [] futureOutputs = []
while process: while process:
@@ -261,8 +292,8 @@ def processingThreadBody():
try: try:
frame = framesQueue.get_nowait() frame = framesQueue.get_nowait()
if args.asyncN: if asyncN:
if len(futureOutputs) == args.asyncN: if len(futureOutputs) == asyncN:
frame = None # Skip the frame frame = None # Skip the frame
else: else:
framesQueue.queue.clear() # Skip the rest of frames framesQueue.queue.clear() # Skip the rest of frames
@@ -277,7 +308,7 @@ def processingThreadBody():
# Create a 4D blob from a frame. # Create a 4D blob from a frame.
inpWidth = args.width if args.width else frameWidth inpWidth = args.width if args.width else frameWidth
inpHeight = args.height if args.height else frameHeight inpHeight = args.height if args.height else frameHeight
blob = cv.dnn.blobFromImage(frame, size=(inpWidth, inpHeight), swapRB=args.rgb, ddepth=cv.CV_8U) blob = cv.dnn.blobFromImage(frame, size=(inpWidth, inpHeight), swapRB=args.rgb, ddepth=cv.CV_32F)
processedFramesQueue.put(frame) processedFramesQueue.put(frame)
# Run a model # Run a model
@@ -286,7 +317,7 @@ def processingThreadBody():
frame = cv.resize(frame, (inpWidth, inpHeight)) frame = cv.resize(frame, (inpWidth, inpHeight))
net.setInput(np.array([[inpHeight, inpWidth, 1.6]], dtype=np.float32), 'im_info') net.setInput(np.array([[inpHeight, inpWidth, 1.6]], dtype=np.float32), 'im_info')
if args.asyncN: if asyncN:
futureOutputs.append(net.forwardAsync()) futureOutputs.append(net.forwardAsync())
else: else:
outs = net.forward(outNames) outs = net.forward(outNames)
@@ -298,40 +329,68 @@ def processingThreadBody():
del futureOutputs[0] del futureOutputs[0]
if args.use_threads:
framesThread = Thread(target=framesThreadBody)
framesThread.start()
framesThread = Thread(target=framesThreadBody) processingThread = Thread(target=processingThreadBody)
framesThread.start() processingThread.start()
processingThread = Thread(target=processingThreadBody) #
processingThread.start() # Postprocessing and rendering loop
#
while cv.waitKey(1) < 0:
try:
# Request prediction first because they put after frames
outs = predictionsQueue.get_nowait()
frame = processedFramesQueue.get_nowait()
imgWidth = max(frame.shape[:2])
fontSize = (stdSize*imgWidth)/stdImgSize
fontThickness = max(1,(stdWeight*imgWidth)//stdImgSize)
# boxes, classIds, confidences, indices = postprocess(frame, outs)
# Postprocessing and rendering loop drawPred(classIds, confidences, boxes, indices, fontSize, fontThickness)
# fontSize = fontSize/2
while cv.waitKey(1) < 0: # Put efficiency information.
try: if predictionsQueue.counter > 1:
# Request prediction first because they put after frames label = 'Camera: %.2f FPS' % (framesQueue.getFPS())
outs = predictionsQueue.get_nowait() cv.rectangle(frame, (0, 0), (int(260*fontSize), int(80*fontSize)), (255,255,255), cv.FILLED)
frame = processedFramesQueue.get_nowait() cv.putText(frame, label, (0, int(25*fontSize)), cv.FONT_HERSHEY_SIMPLEX, fontSize, (0, 0, 0), fontThickness)
postprocess(frame, outs) label = 'Network: %.2f FPS' % (predictionsQueue.getFPS())
cv.putText(frame, label, (0, int(2*25*fontSize)), cv.FONT_HERSHEY_SIMPLEX, fontSize, (0, 0, 0), fontThickness)
# Put efficiency information. label = 'Skipped frames: %d' % (framesQueue.counter - predictionsQueue.counter)
if predictionsQueue.counter > 1: cv.putText(frame, label, (0, int(3*25*fontSize)), cv.FONT_HERSHEY_SIMPLEX, fontSize, (0, 0, 0), fontThickness)
label = 'Camera: %.2f FPS' % (framesQueue.getFPS())
cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
label = 'Network: %.2f FPS' % (predictionsQueue.getFPS()) cv.imshow(winName, frame)
cv.putText(frame, label, (0, 30), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0)) except queue.Empty:
pass
label = 'Skipped frames: %d' % (framesQueue.counter - predictionsQueue.counter)
cv.putText(frame, label, (0, 45), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
cv.imshow(winName, frame)
except queue.Empty:
pass
process = False process = False
framesThread.join() framesThread.join()
processingThread.join() processingThread.join()
else:
# Non-threaded processing if --async is 0
while cv.waitKey(1) < 0:
hasFrame, frame = cap.read()
if not hasFrame:
cv.waitKey()
break
frameHeight = frame.shape[0]
frameWidth = frame.shape[1]
inpWidth = args.width if args.width else frameWidth
inpHeight = args.height if args.height else frameHeight
blob = cv.dnn.blobFromImage(frame, size=(inpWidth, inpHeight), swapRB=args.rgb, ddepth=cv.CV_32F)
net.setInput(blob, scalefactor=args.scale, mean=args.mean)
outs = net.forward(outNames)
boxes, classIds, confidences, indices = postprocess(frame, outs)
drawPred(classIds, confidences, boxes, indices, (stdSize*max(frame.shape[:2]))/stdImgSize, (stdWeight*max(frame.shape[:2]))//stdImgSize)
cv.imshow(winName, frame)


@@ -1,382 +0,0 @@
/**
* @file yolo_detector.cpp
* @brief Yolo Object Detection Sample
* @author OpenCV team
*/
//![includes]
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <fstream>
#include <sstream>
#include "iostream"
#include "common.hpp"
#include <opencv2/highgui.hpp>
//![includes]
using namespace cv;
using namespace cv::dnn;
void getClasses(std::string classesFile);
void drawPrediction(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);
void yoloPostProcessing(
std::vector<Mat>& outs,
std::vector<int>& keep_classIds,
std::vector<float>& keep_confidences,
std::vector<Rect2d>& keep_boxes,
float conf_threshold,
float iou_threshold,
const std::string& model_name,
const int nc
);
std::vector<std::string> classes;
std::string keys =
"{ help h | | Print help message. }"
"{ device | 0 | camera device number. }"
"{ model | onnx/models/yolox_s_inf_decoder.onnx | Default model. }"
"{ yolo | yolox | yolo model version. }"
"{ input i | | Path to input image or video file. Skip this argument to capture frames from a camera. }"
"{ classes | | Optional path to a text file with names of classes to label detected objects. }"
"{ nc | 80 | Number of classes. Default is 80 (coming from COCO dataset). }"
"{ thr | .5 | Confidence threshold. }"
"{ nms | .4 | Non-maximum suppression threshold. }"
"{ mean | 0.0 | Normalization constant. }"
"{ scale | 1.0 | Preprocess input image by multiplying on a scale factor. }"
"{ width | 640 | Preprocess input image by resizing to a specific width. }"
"{ height | 640 | Preprocess input image by resizing to a specific height. }"
"{ rgb | 1 | Indicate that model works with RGB input images instead BGR ones. }"
"{ padvalue | 114.0 | padding value. }"
"{ paddingmode | 2 | Choose one of computation backends: "
"0: resize to required input size without extra processing, "
"1: Image will be cropped after resize, "
"2: Resize image to the desired size while preserving the aspect ratio of original image }"
"{ backend | 0 | Choose one of computation backends: "
"0: automatically (by default), "
"1: Halide language (http://halide-lang.org/), "
"2: Intel's Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit), "
"3: OpenCV implementation, "
"4: VKCOM, "
"5: CUDA }"
"{ target | 0 | Choose one of target computation devices: "
"0: CPU target (by default), "
"1: OpenCL, "
"2: OpenCL fp16 (half-float precision), "
"3: VPU, "
"4: Vulkan, "
"6: CUDA, "
"7: CUDA fp16 (half-float preprocess) }"
"{ async | 0 | Number of asynchronous forwards at the same time. "
"Choose 0 for synchronous mode }";
void getClasses(std::string classesFile)
{
std::ifstream ifs(classesFile.c_str());
if (!ifs.is_open())
CV_Error(Error::StsError, "File " + classesFile + " not found");
std::string line;
while (std::getline(ifs, line))
classes.push_back(line);
}
void drawPrediction(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));
std::string label = format("%.2f", conf);
if (!classes.empty())
{
CV_Assert(classId < (int)classes.size());
label = classes[classId] + ": " + label;
}
int baseLine;
Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = max(top, labelSize.height);
rectangle(frame, Point(left, top - labelSize.height),
Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar());
}
void yoloPostProcessing(
std::vector<Mat>& outs,
std::vector<int>& keep_classIds,
std::vector<float>& keep_confidences,
std::vector<Rect2d>& keep_boxes,
float conf_threshold,
float iou_threshold,
const std::string& model_name,
const int nc=80)
{
// Retrieve all candidate detections (class ids, confidences, boxes) before NMS
std::vector<int> classIds;
std::vector<float> confidences;
std::vector<Rect2d> boxes;
if (model_name == "yolov8" || model_name == "yolov10" ||
model_name == "yolov9")
{
cv::transposeND(outs[0], {0, 2, 1}, outs[0]);
}
if (model_name == "yolonas")
{
// outs contains 2 elements of shape [1, 8400, 80] and [1, 8400, 4]. Concat them to get [1, 8400, 84]
Mat concat_out;
// squeeze the first dimension
outs[0] = outs[0].reshape(1, outs[0].size[1]);
outs[1] = outs[1].reshape(1, outs[1].size[1]);
cv::hconcat(outs[1], outs[0], concat_out);
outs[0] = concat_out;
// remove the second element
outs.pop_back();
// unsqueeze the first dimension
outs[0] = outs[0].reshape(0, std::vector<int>{1, 8400, nc + 4});
}
// check that the last dimension is nc + 5 (with objectness) or nc + 4 (without it)
CV_CheckEQ(outs[0].dims, 3, "Invalid output shape. The shape should be [1, #anchors, nc + 5 or nc + 4]");
CV_CheckEQ((outs[0].size[2] == nc + 5 || outs[0].size[2] == nc + 4), true, "Invalid output shape: last dimension must be nc + 5 or nc + 4");
for (auto preds : outs)
{
preds = preds.reshape(1, preds.size[1]); // [1, 8400, 85] -> [8400, 85]
for (int i = 0; i < preds.rows; ++i)
{
// filter out rows with low objectness; models without an objectness term (yolov8/v9/v10, yolonas) use 1.0
float obj_conf = (model_name == "yolov8" || model_name == "yolonas" ||
model_name == "yolov9" || model_name == "yolov10") ? 1.0f : preds.at<float>(i, 4);
if (obj_conf < conf_threshold)
continue;
Mat scores = preds.row(i).colRange((model_name == "yolov8" || model_name == "yolonas" || model_name == "yolov9" || model_name == "yolov10") ? 4 : 5, preds.cols);
double conf;
Point maxLoc;
minMaxLoc(scores, 0, &conf, 0, &maxLoc);
conf = (model_name == "yolov8" || model_name == "yolonas" || model_name == "yolov9" || model_name == "yolov10") ? conf : conf * obj_conf;
if (conf < conf_threshold)
continue;
// get bbox coords
float* det = preds.ptr<float>(i);
double cx = det[0];
double cy = det[1];
double w = det[2];
double h = det[3];
// [x1, y1, x2, y2]
if (model_name == "yolonas" || model_name == "yolov10"){
boxes.push_back(Rect2d(cx, cy, w, h));
} else {
boxes.push_back(Rect2d(cx - 0.5 * w, cy - 0.5 * h,
cx + 0.5 * w, cy + 0.5 * h));
}
classIds.push_back(maxLoc.x);
confidences.push_back(static_cast<float>(conf));
}
}
// NMS
std::vector<int> keep_idx;
NMSBoxes(boxes, confidences, conf_threshold, iou_threshold, keep_idx);
for (auto i : keep_idx)
{
keep_classIds.push_back(classIds[i]);
keep_confidences.push_back(confidences[i]);
keep_boxes.push_back(boxes[i]);
}
}
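// Minimal usage sketch (assumes 'outs' already holds the raw network outputs):
//   std::vector<int> ids; std::vector<float> confs; std::vector<Rect2d> dets;
//   yoloPostProcessing(outs, ids, confs, dets, 0.5f, 0.4f, "yolox", 80);
// Row layout recap from the branches above: YOLOv5/v6/v7 and YOLOX rows are
// [cx, cy, w, h, obj_conf, class scores...]; YOLOv8/v9/v10 and YOLO-NAS omit the
// objectness term, and YOLO-NAS/YOLOv10 already emit corner boxes [x1, y1, x2, y2].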
/**
* @function main
* @brief Main function
*/
int main(int argc, char** argv)
{
CommandLineParser parser(argc, argv, keys);
parser.about("Use this script to run object detection deep learning networks using OpenCV.");
if (parser.has("help"))
{
parser.printMessage();
return 0;
}
CV_Assert(parser.has("model"));
CV_Assert(parser.has("yolo"));
// resolve the model path: findFile returns the given path unchanged if it exists, otherwise it searches the OpenCV samples data locations
std::string weightPath = findFile(parser.get<String>("model"));
std::string yolo_model = parser.get<String>("yolo");
int nc = parser.get<int>("nc");
float confThreshold = parser.get<float>("thr");
float nmsThreshold = parser.get<float>("nms");
//![preprocess_params]
float paddingValue = parser.get<float>("padvalue");
bool swapRB = parser.get<bool>("rgb");
int inpWidth = parser.get<int>("width");
int inpHeight = parser.get<int>("height");
Scalar scale = parser.get<float>("scale");
Scalar mean = parser.get<Scalar>("mean");
ImagePaddingMode paddingMode = static_cast<ImagePaddingMode>(parser.get<int>("paddingmode"));
//![preprocess_params]
// check if yolo model is valid
if (yolo_model != "yolov5" && yolo_model != "yolov6"
&& yolo_model != "yolov7" && yolo_model != "yolov8"
&& yolo_model != "yolov10" && yolo_model !="yolov9"
&& yolo_model != "yolox" && yolo_model != "yolonas")
CV_Error(Error::StsError, "Invalid yolo model: " + yolo_model);
// get classes
if (parser.has("classes"))
{
getClasses(findFile(parser.get<String>("classes")));
}
// load model
//![read_net]
Net net = readNet(weightPath);
int backend = parser.get<int>("backend");
net.setPreferableBackend(backend);
net.setPreferableTarget(parser.get<int>("target"));
//![read_net]
VideoCapture cap;
Mat img;
bool isImage = false;
bool isCamera = false;
// Check if input is given
if (parser.has("input"))
{
String input = parser.get<String>("input");
// Check if the input is an image
if (input.find(".jpg") != String::npos || input.find(".png") != String::npos)
{
img = imread(findFile(input));
if (img.empty())
{
CV_Error(Error::StsError, "Cannot read image file: " + input);
}
isImage = true;
}
else
{
cap.open(input);
if (!cap.isOpened())
{
CV_Error(Error::StsError, "Cannot open video " + input);
}
isCamera = true;
}
}
else
{
int cameraIndex = parser.get<int>("device");
cap.open(cameraIndex);
if (!cap.isOpened())
{
CV_Error(Error::StsError, cv::format("Cannot open camera #%d", cameraIndex));
}
isCamera = true;
}
// image pre-processing
//![preprocess_call]
Size size(inpWidth, inpHeight);
Image2BlobParams imgParams(
scale,
size,
mean,
swapRB,
CV_32F,
DNN_LAYOUT_NCHW,
paddingMode,
paddingValue);
// rescale boxes back to original image
Image2BlobParams paramNet;
paramNet.scalefactor = scale;
paramNet.size = size;
paramNet.mean = mean;
paramNet.swapRB = swapRB;
paramNet.paddingmode = paddingMode;
//![preprocess_call]
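// paramNet mirrors the preprocessing parameters so that blobRectsToImageRects() below
// can undo the resize/letterbox step and map the detected boxes back to the coordinate
// system of the original image.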
//![forward_buffers]
std::vector<Mat> outs;
std::vector<int> keep_classIds;
std::vector<float> keep_confidences;
std::vector<Rect2d> keep_boxes;
std::vector<Rect> boxes;
//![forward_buffers]
Mat inp;
while (waitKey(1) < 0)
{
if (isCamera)
cap >> img;
if (img.empty())
{
std::cout << "Empty frame" << std::endl;
waitKey();
break;
}
//![preprocess_call_func]
inp = blobFromImageWithParams(img, imgParams);
//![preprocess_call_func]
//![forward]
net.setInput(inp);
net.forward(outs, net.getUnconnectedOutLayersNames());
//![forward]
//![postprocess]
yoloPostProcessing(
outs, keep_classIds, keep_confidences, keep_boxes,
confThreshold, nmsThreshold,
yolo_model,
nc);
//![postprocess]
// convert Rect2d (which stores [x1, y1, x2, y2] in its x, y, width, height fields) to Rect (x, y, width, height)
//![draw_boxes]
for (auto box : keep_boxes)
{
boxes.push_back(Rect(cvFloor(box.x), cvFloor(box.y), cvFloor(box.width - box.x), cvFloor(box.height - box.y)));
}
paramNet.blobRectsToImageRects(boxes, boxes, img.size());
for (size_t idx = 0; idx < boxes.size(); ++idx)
{
Rect box = boxes[idx];
drawPrediction(keep_classIds[idx], keep_confidences[idx], box.x, box.y,
box.width + box.x, box.height + box.y, img);
}
const std::string kWinName = "Yolo Object Detector";
namedWindow(kWinName, WINDOW_NORMAL);
imshow(kWinName, img);
//![draw_boxes]
outs.clear();
keep_classIds.clear();
keep_confidences.clear();
keep_boxes.clear();
boxes.clear();
if (isImage)
{
waitKey();
break;
}
}
}