opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-17 10:58:00 +08:00

Author	SHA1	Message	Date
Yuantao Feng	e3858cc5a3	Merge pull request #25147 from fengyuentau:dnn/elementwise_layers/speedup * added v_erf and implemented gelu acceleration via vectorization * remove anonymous v_erf and use v_erf from intrin_math * enable perf for ov and cuda backend	2024-07-08 14:24:36 +03:00
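For reference, the GELU activation that #25147 accelerates can be written in terms of the error function. The following is a minimal scalar sketch of that textbook formula, not the vectorized `v_erf` code from the PR; the names `gelu` and `gelu_batch` are hypothetical.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_batch(values):
    """Apply GELU element by element (a SIMD version would process several lanes at once)."""
    return [gelu(v) for v in values]


if __name__ == "__main__":
    print(gelu_batch([-1.0, 0.0, 1.0]))
```

A vectorized implementation would evaluate the same expression on whole SIMD registers, which is why a fast `erf` is the key building block.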
Abduragim Shtanchaev	efbc9f0b66	Merge pull request #25861 from Abdurrahheem:ash/torch-attention-export-fix-4x Support for Unflatten operation required by Attention layer - 4.x #25861 ### Pull Request Readiness Checklist All test data and models for PR are located [#1190](https://github.com/opencv/opencv_extra/pull/1190) This PR fixes an issue raised when importing a batched vanilla `Attention` layer from `PyTorch` via ONNX. Currently the batched version of the `Attention` layer in PyTorch [has unflatten operation inside](`e3b3431c42/torch/nn/functional.py (L5500C17-L5500C31)`). The `unflatten` operation causes an issue in the `reshape` layer (see Reshape_2 in the graph below) due to incorrect output of the `slice` layer. In particular, this PR fixes the `slice` and `concat` layers to handle the `unflatten` operation. <img width="673" alt="image" src="https://github.com/opencv/opencv/assets/44877829/5b612b31-657a-47f1-83a4-0ac35a950abd"> See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-07-04 16:25:31 +03:00
Yuantao Feng	a7fd9446cf	Merge pull request #25630 from fengyuentau:nary-multi-thread dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630

This PR introduces the following changes:

- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable

## Performance

### i7-12700K, RAM 64GB, Ubuntu 22.04

```
Geometric mean (ms)

Name of Test                                      opencv perf      opencv perf      opencv perf core.x64.0606
                                                  core.x64.0606    core.x64.0606    vs opencv perf core.x64.0606
                                                                                    (x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU            16.116           11.161           1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU         17.469           11.446           1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU         17.531           11.469           1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU       28.653           13.682           2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU     21.899           13.422           1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU        21.738           13.185           1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU         16.172           11.473           1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU        16.309           11.565           1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU         16.166           11.454           1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU         16.157           11.443           1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU         163.459          15.234           10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU     10.880           10.868           1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU     10.947           11.058           0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU     10.948           10.910           1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU     10.874           10.871           1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU     10.971           10.920           1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU         17.546           11.462           1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU         16.175           11.475           1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU                11.339           11.333           1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU                16.154           11.102           1.46
```

### Apple M1, RAM 16GB, macOS 14.4.1

```
Geometric mean (ms)

Name of Test                                      opencv perf      opencv perf          opencv perf core.m1.0606.patch
                                                  core.m1.0606     core.m1.0606.patch   vs opencv perf core.m1.0606
                                                                                        (x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU            28.418           3.768                7.54
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU         6.942            5.679                1.22
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU         5.822            5.653                1.03
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU       5.751            5.628                1.02
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU     5.797            5.599                1.04
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU        7.272            5.578                1.30
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU         5.777            5.562                1.04
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU        5.819            5.559                1.05
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU         5.830            5.574                1.05
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU         5.759            5.567                1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU         342.260          74.655               4.58
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU     8.338            8.280                1.01
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU     8.359            8.309                1.01
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU     8.412            8.295                1.01
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU     8.380            8.297                1.01
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU     8.356            8.323                1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU         6.818            5.561                1.23
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU         5.805            5.570                1.04
NHWC_C::Layer_NaryEltwise::OCV/CPU                3.834            4.817                0.80
NHWC_H::Layer_NaryEltwise::OCV/CPU                28.402           3.771                7.53
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake	2024-07-03 10:09:05 +03:00
Wanli	6e1864e3fc	Merge pull request #24941 from WanliZhong:v_exp Add support for v_exp (exponential) #24941 This PR aims to implement `v_exp(v_float16 x)`, `v_exp(v_float32 x)` and `v_exp(v_float64 x)`. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-07-02 12:32:49 +03:00
Alexander Smorkalov	3d74d646d8	Fixed CuDNN runtime version check for CuDNN 9+.	2024-07-01 17:33:24 +03:00
Yuantao Feng	3f13ce797b	Merge pull request #25779 from fengyuentau:dnn/fix_onnx_depthtospace dnn: add DepthToSpace and SpaceToDepth #25779 We are working on updating WeChat QRCode module. One of the new models is a fully convolutional model and hence it should be able to run with different input shapes. However, it has an operator `DepthToSpace`, which is parsed as a subgraph of `Reshape -> Permute -> Reshape` with a fixed shape getting during parsing. The subgraph itself is not a problem, but the true problem is the subgraph with a fixed input and output shape regardless input changes. This does not allow the model to run with different input shapes. Solution is to add a dedicated layer for DepthtoSpace and SpaceToDepth. Backend support: - [x] CPU - [x] CUDA - [x] OpenCL - [x] OpenVINO - [x] CANN - [x] TIMVX - ~Vulkan~ (missing fundamental tools, like permutation and reshape) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-06-21 19:28:22 +03:00
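To illustrate why a dedicated layer helps in #25779, here is a minimal NumPy sketch of `DepthToSpace` as the `Reshape -> Permute -> Reshape` chain, following the DCR mode described in the ONNX operator documentation. It is not the OpenCV implementation; the function name `depth_to_space_dcr` is hypothetical. The point is that every intermediate shape is derived from the input at call time, so the same code handles any input size.

```python
import numpy as np


def depth_to_space_dcr(x: np.ndarray, blocksize: int) -> np.ndarray:
    """ONNX-style DepthToSpace (DCR mode) on an NCHW tensor."""
    b, c, h, w = x.shape
    if c % (blocksize * blocksize) != 0:
        raise ValueError("channels must be divisible by blocksize**2")
    c_out = c // (blocksize * blocksize)
    # Reshape: split the channel axis into (block row, block col, output channel).
    tmp = x.reshape(b, blocksize, blocksize, c_out, h, w)
    # Permute: interleave block offsets with the spatial axes.
    tmp = tmp.transpose(0, 3, 4, 1, 5, 2)
    # Reshape: merge (h, block row) and (w, block col) into the new spatial axes.
    return tmp.reshape(b, c_out, h * blocksize, w * blocksize)


if __name__ == "__main__":
    # The same function works for different input shapes.
    print(depth_to_space_dcr(np.zeros((1, 4, 2, 3)), 2).shape)
    print(depth_to_space_dcr(np.zeros((2, 8, 5, 7)), 2).shape)
```

A subgraph whose reshape targets were frozen at parse time would only be valid for the first of these two inputs.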
Kumataro	1bd5ca1ebe	Merge pull request #25686 from Kumataro:fix25674 Suppress build warnings for GCC14 #25686 Close #25674 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-06-02 14:14:04 +03:00
CNOCycle	98b8825031	Merge pull request #25613 from CNOCycle:tflite/ops Support Global_Pool_2D ops in .tflite model #25613 ### Pull Request Readiness Checklist Merge with extra: https://github.com/opencv/opencv_extra/pull/1180 This PR adds support for `GlobalAveragePooling2D` and `GlobalMaxPool2D` on the TFlite backend. When the `keep_dims` option is enabled, the output is a 2D tensor, necessitating the inclusion of an additional flatten layer. Additionally, the names of these layers have been updated to match the output tensor names generated by `generate.py` from the opencv_extra repository. - [X] I agree to contribute to the project under Apache 2 License. - [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [X] The feature is well documented and sample code can be built with the project CMake	2024-05-31 19:31:21 +03:00
Abduragim Shtanchaev	d7f04a9d33	Merge pull request #25660 from Abdurrahheem:ash/fix-slice-empty-input Slice layer parser fix to support empty input case #25660 This PR fixes the Slice layer's parser to handle empty input cases (cases with initializer). It fixes the issue raised in #24838 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-05-31 13:13:36 +03:00
Danial Javady	05e48605a0	Merge pull request #25412 from ZelboK:update-cudnn-to-9 Refactor DNN module to build with cudnn 9 #25412 A lot of APIs that are currently being used in the dnn module have been removed in cudnn 9. They were deprecated in 8. This PR updates said code accordingly to the newer API. Some key notes: 1) This is my first PR. I am new to OpenCV. 2) `opencv_test_core` tests pass 3) On a 3080, cuda 12.4 (should be irrelevant since I didn't build the `opencv_modules`), gcc 11.4, WSL 2. 4) For brevity I will avoid including macro code that will allow for older versions of cudnn to build. I was unable to get the tests working for `opencv_test_dnn` and `opencv_perf_dnn`. The errors I get are of the following: ``` OpenCV tests: Can't find required data file: dnn/onnx/conformance/node/test_reduce_prod_default_axes_keepdims_example/model.onnx in function 'findData' " thrown in the test body. ``` So before I spend more time investigating I was hoping to get a maintainer to point me in the right direction here. I would like to run these tests and confirm things are working as intended. I may have missed some details. ### Pull Request Readiness Checklist relevant issue (https://github.com/opencv/opencv/issues/24983) - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-05-28 09:54:08 +03:00
Yuantao Feng	bc0618b688	Merge pull request #25582 from fengyuentau:dnn/dump_pbtxt Current net exporter `dump` and `dumpToFile` exports the network structure (and its params) to a .dot file which works with `graphviz`. This is hard to use and not friendly to new user. What's worse, the produced picture is not looking pretty. dnn: better net exporter that works with netron #25582 This PR introduces new exporter `dumpToPbtxt` and uses this new exporter by default with environment variable `OPENCV_DNN_NETWORK_DUMP`. It mimics the string output of a onnx model but modified with dnn-specific changes, see below for an example. ![image](https://github.com/opencv/opencv/assets/17219438/0644bed1-da71-4019-8466-88390698e4df) ## Usage Call `cv::dnn::Net::dumpToPbtxt`: ```cpp TEST(DumpNet, dumpToPbtxt) { std::string path = "/path/to/model.onnx"; auto net = readNet(path); Mat input(std::vector<int>{1, 3, 640, 480}, CV_32F); net.setInput(input); net.dumpToPbtxt("yunet.pbtxt"); } ``` Set `export OPENCV_DNN_NETWORK_DUMP=1` ```cpp TEST(DumpNet, env) { std::string path = "/path/to/model.onnx"; auto net = readNet(path); Mat input(std::vector<int>{1, 3, 640, 480}, CV_32F); net.setInput(input); net.forward(); } ``` --- Note: - `pbtxt` is registered as one of the ONNX model suffix in netron. So you can see `module: ai.onnx` and such in the model. - We can get the string output of an ONNX model with the following script ```python import onnx net = onnx.load("/path/to/model.onnx") net_str = str(net) file = open("/path/to/model.pbtxt", "w") file.write(net_str) file.close() ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-05-17 11:07:05 +03:00
CNOCycle	7713c84465	Merge pull request #25297 from CNOCycle:tflite/transpose Support Transpose op in TFlite #25297

Merge with extra: https://github.com/opencv/opencv_extra/pull/1168

The purpose of this PR is to introduce support for the Transpose op in TFlite format and to add a shape comparison between the output tensors and the references. In some occasional cases, the shape of the output tensor is `[1,4,1,1]`, while the shape of the reference tensor is `[1,4]`. Consequently, the norm check incorrectly reports that the test has passed, as the residual is zero.

Below is a Python script for generating testing data. The generated data can be integrated into the repo `opencv_extra`.

```python
import numpy as np
import tensorflow as tf

PREFIX_TFL = '/path/to/opencv_extra/testdata/dnn/tflite/'

def generator(input_tensor, model, saved_name):

    # convert keras model to .tflite format
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    #converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.optimizations = [None]
    tflite_model = converter.convert()
    with open(f'{PREFIX_TFL}/{saved_name}.tflite', 'wb') as f:
        f.write(tflite_model)

    # save the input tensor to .npy
    if input_tensor.ndim == 4:
        opencv_tensor = np.transpose(input_tensor, (0,3,1,2))
    else:
        opencv_tensor = input_tensor
    opencv_tensor = np.copy(opencv_tensor, order='C').astype(np.float32)
    np.save(f'{PREFIX_TFL}/{saved_name}_inp.npy', opencv_tensor)

    # generate output tensor and save it to .npy
    mat_out = model(input_tensor).numpy()
    mat_out = np.copy(mat_out, order='C').astype(np.float32)
    if mat_out.ndim == 4:
        mat_out = np.transpose(mat_out, (0,3,1,2))
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    out_name = interpreter.get_output_details()[0]['name']
    np.save(f'{PREFIX_TFL}/{saved_name}_out_{out_name}.npy', mat_out)

def build_transpose():

    model_name = "keras_permute"
    mat_in = np.array([[[1,2,3], [4,5,6]]], dtype=np.float32)

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(2,3)))
    model.add(tf.keras.layers.Permute((2,1)))
    model.summary()

    generator(mat_in, model, model_name)

if __name__ == '__main__':
    build_transpose()
```

### Pull Request Readiness Checklist

- [x] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake	2024-05-15 20:07:25 +03:00
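The shape-comparison problem described in #25297 can be reproduced in a few lines of NumPy. This is a sketch of the idea only, not the OpenCV test helper; `norm_only_check` and `outputs_match` are hypothetical names, and the norm-only check is assumed to compare flattened data.

```python
import numpy as np


def norm_only_check(out: np.ndarray, ref: np.ndarray, tol: float = 1e-5) -> bool:
    """Compare flattened data only; blind to differences in shape."""
    return float(np.linalg.norm(out.ravel() - ref.ravel())) <= tol


def outputs_match(out: np.ndarray, ref: np.ndarray, tol: float = 1e-5) -> bool:
    """Require identical shapes before comparing values."""
    if out.shape != ref.shape:
        return False
    return bool(np.allclose(out, ref, rtol=0.0, atol=tol))


if __name__ == "__main__":
    out = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
    ref = np.arange(4, dtype=np.float32).reshape(1, 4)
    print(norm_only_check(out, ref))  # residual is zero despite the shape mismatch
    print(outputs_match(out, ref))    # shape check catches the mismatch
```

Checking the shape first turns a silent pass into an explicit failure, which is the behaviour the PR adds to the tests.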
alexlyulkov	03507e06b4	Merge pull request #25518 from alexlyulkov:al/fixed-gemm-openvino Fixed OpenVINO gemm layer #25518 Fixed OpenVINO gemm layer The problem was that our layer didn't properly handle all the possible gemm options in OpenVINO mode Fixes #25472 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-05-14 17:41:19 +03:00
Alexander Smorkalov	ac9a858377	Merge pull request #25524 from alexlyulkov:al/openvino-layers Added more OpenVINO layers to dnn	2024-05-03 13:16:56 +03:00
Wanli	ed47cce1c5	change fcn8s-heavy-pascal tests from caffe to onnx	2024-05-03 00:15:09 +08:00
Alexander Lyulkov	f3f29fa62c	Added more OpenVINO layers to dnn	2024-05-02 14:37:40 +03:00
alexlyulkov	f9dd20eb07	Merge pull request #25414 from alexlyulkov:al/range-fixed Fixed ONNX range layer #25414 Partially address https://github.com/opencv/opencv/issues/25363 Fixed ONNX range layer. It should support any input type. Added tests (extra [PR](https://github.com/opencv/opencv_extra/pull/1170)) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-04-17 09:38:21 +03:00
Yuantao Feng	197626a5bf	Merge pull request #25387 from fengyuentau:complete-float16_t-renaming Rename remaining float16_t for future proof #25387 Resolves comment: https://github.com/opencv/opencv/pull/25217#discussion_r1547733187. `std::float16_t` and `std::bfloat16_t` are introduced since c++23: https://en.cppreference.com/w/cpp/types/floating-point. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-04-11 14:02:44 +03:00
Liutong HAN	5be158a2b6	Further optimize fastDepthwiseConv for RVV.	2024-04-07 11:34:41 +08:00
Yuantao Feng	55d7e3f8cc	Merge pull request #1165 from fengyuentau:gold_yolo [BugFix] dnn (ONNX): Force dropping constant inputs in parseClip if they are shared #25319 Resolves https://github.com/opencv/opencv/issues/25278 Merge with https://github.com/opencv/opencv_extra/pull/1165 In Gold-YOLO, `Div` has a constant input `B=6` which is then parsed into a `Const` layer in the ONNX importer, but `Clip` also has the shared constant input `max=6` which is already a `Const` layer and is then connected to the `Elementwise` layer. This should not happen because in the `forward()` of the `Elementwise` layer, the legacy code goes through each input and applies the activation to it. More details on https://github.com/opencv/opencv/issues/25278#issuecomment-2032199630. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-04-03 15:56:59 +03:00
Dmitry Kurtaev	13c95efa74	Merge pull request #25312 from dkurt:dnn_hotfix_tflite Ownership check in TFLite importer #25312 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/25310 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-04-03 09:41:40 +03:00
HAN Liutong	eba158fb0c	Merge pull request #25230 from hanliutong:rvv-conv Optimize int8 layers in DNN modules by using RISC-V Vector intrinsic. #25230

This patch optimizes 3 functions in the int8 layer by using RVV Native Intrinsic. It was tested on QEMU using VLEN=128 and VLEN=256 with `./bin/opencv_test_dnn --gtest_filter="Int8"`. On the real device (k230, VLEN=128), `EfficientDet_int8` in `opencv_perf_dnn` showed a performance improvement of 1.46x.

| Name of Test | Original | optimized | Speed-up |
| ------------------------------------------ | -------- | ---------- | -------- |
| EfficientDet_int8::DNNTestNetwork::OCV/CPU | 2843.467 | 1947.013 | 1.46 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake	2024-03-31 16:47:06 +03:00
Yuantao Feng	b758897c29	Merge pull request #25271 from fengyuentau:matmul_bias Merge with https://github.com/opencv/opencv_extra/pull/1158 Todo: - [x] Fix Attention pattern recognition. - [x] Handle other backends. Benchmark: "VIT_B_32 OCV/CPU", M1, results in milliseconds. \| Model \| 4.x \| This PR \| \| - \| - \| - \| \| VIT_B_32 OCV/CPU \| 87.66 \| 83.83 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-29 17:35:23 +03:00
Alexander Smorkalov	9fc4b61074	Merge pull request #25291 from dkurt:einsum_openvino Einsum OpenVINO backend	2024-03-29 15:54:26 +03:00
Dmitry Kurtaev	cfa42e4338	Einsum OpenVINO backend	2024-03-29 14:29:45 +03:00
Dmitry Kurtaev	01dc010436	Merge pull request #25273 from dkurt:tflite_new_layers TFLite new layers #25273 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/25272, https://github.com/opencv/opencv/issues/24965 Merge with extra: https://github.com/opencv/opencv_extra/pull/1160 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-29 11:21:13 +03:00
Yuantao Feng	accf200408	Merge pull request #25238 from fengyuentau:optimized_const dnn: avoid const layer forwarding in layer norm layer and attention layer #25238 While profiling ViTs with dnn, I found that `ConstLayer` can take a noticeable proportion of the inference time, which is unexpected. This comes from the data copy during the inference of `ConstLayer`. There is a chance that we can improve the efficiency of data copying, but the easiest and most convenient way is to avoid `ConstLayer`. This PR changes how we handle constants in the layer normalization layer and the attention layer: they are stored in the layer blobs instead of being turned into constant layers. Checklists: - [x] Backend compatibility in layer normalization layer. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-26 15:09:51 +03:00
Alexander Smorkalov	fc34554475	Merge pull request #25184 from dkurt:avoid_extra_memset Avoid extra memset	2024-03-25 13:07:49 +03:00
Yuantao Feng	025e7602b9	Merge pull request #25166 from fengyuentau:fix_cann_gemm dnn (CANN): Fix incorrect shape of 1d bias in Gemm #25166 Gemm layer was refactored some time ago. Users found that the mobilenet example in https://github.com/opencv/opencv/wiki/Huawei-CANN-Backend does not work because of incorrect shape set for 1d bias in Gemm. This PR resolves this issue. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-25 09:47:28 +03:00
Dmitry Kurtaev	0b6c9a2123	Merge pull request #25181 from dkurt:release_conv_weights Release convolution weightsMat after usage #25181 ### Pull Request Readiness Checklist related (but not resolved): https://github.com/opencv/opencv/issues/24134 Minor memory footprint improvement. Also, adds a test for VmHWM. RAM top memory usage (-230MB) \| YOLOv3 (237MB file) \| 4.x \| PR \| \|---------------------\|---------\|---------\| \| no winograd \| 808 MB \| 581 MB \| \| winograd \| 1985 MB \| 1750 MB \| See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-25 09:03:28 +03:00
Oleg Pipikin	6da2ddcf0e	Fix for OpenVINO 2024.0 Remove support for OpenVINO lower than 2022.1 release Remove legacy InferenceEngine wrappers	2024-03-18 15:05:50 +04:00
Dmitry Kurtaev	6a370ba9e7	Avoid extra memset in convolution initialization	2024-03-08 10:46:07 +03:00
Dmitry Kurtaev	98aed21dd4	Avoid copy of ONNX graph during import	2024-03-05 18:22:46 +03:00
Alexander Smorkalov	daa8f7dfc6	Partially back-port #25075 to 4.x	2024-03-05 12:15:39 +03:00
Laurent Berger	5fe3933346	Merge pull request #25120 from LaurentBerger:I25103 Fixed ReduceMean layer behaviour #25120 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake `a93c31e3c9/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc (L433-L443)`	2024-03-04 09:36:53 +03:00
CSBVision	e8582f2cf8	Update net_impl.cpp See issue #25112	2024-03-01 14:56:00 +01:00
Yuantao Feng	5aa5c39210	Merge pull request #25076 from fengyuentau:improve_attention dnn: try improving performance of Attention layer #25076 Checklist: - [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward - [x] Use layer internal buffer over temporary Mat buffer - [x] Try a single fastGemmBatch on the Q/K/V calculation Performance: Performance test case is `Layer_Attention.VisionTransformer/0`, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}. Data is in millisecond. \| \| macOS 14.2.1, Apple M1 \| Ubuntu 22.04.2, Intel i7 12700K \| \| - \| - \| - \| \| Current \| 10.96 \| 1.58 \| \| w/ Mat \| 6.27 \| 1.41 \| \| w/ Internals \| 5.87 \| 1.38 \| \| w/ fastGemmBatch \| 6.12 \| 2.14 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-02-28 16:47:08 +03:00
Laurent Berger	3c712cf77d	Merge pull request #25100 from LaurentBerger:I25077 Fix issue #25077 #25100 Fixes https://github.com/opencv/opencv/issues/25077 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-02-27 14:15:11 +03:00
Dhanwanth1803	12aa0fe898	Merge pull request #24985 from Dhanwanth1803:hardswish Fixes #24974 support HardSwishInt8 #24985 As given very clearly in the issue #24974 I made the required 2 changes to implement HardSwish Layer in INT8. Requesting comments. resolves https://github.com/opencv/opencv/issues/24974 - [X] I agree to contribute to the project under Apache 2 License. - [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [X] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake Co-authored-by: Dhanwanth1803 <dhanwanthvarala@gmail,com>	2024-02-16 18:19:29 +03:00
fengyuentau	fcaa8ce3c2	fix incorrect steps and elemsize when dtype changes	2024-02-06 16:27:25 +08:00
Haosonn	87f749277d	Merge pull request #24768 from Haosonn:pre-pr-2 Vulkan backend for NaryEltwiseLayer in DNN module #24768 We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by: - add a basic framework for Vulkan backend in ``NaryEltwiseLayer`` - add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation) - typo fixed: - Wrong info output in ``context.cpp`` Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by - find out the best ``VkMemoryProperty`` for various discrete GPUs - prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake Co-authored-by: IskXCr <IskXCr@outlook.com>	2024-01-29 18:41:49 +03:00
Alexander Alekhin	efc9837df1	Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage DNN: avoid CV_16S usage for FP16 #24892 Merge after: #24918 TODO: - [x] measure performance changes - [x] optimize convertTo for OpenCL: #24918 12700K iGPU: \|Name of Test\|0\|1\|1 vs 0 (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|AlexNet::DNNTestNetwork::OCV/OCL_FP16\|7.441\|7.480\|0.99\| \|CRNN::DNNTestNetwork::OCV/OCL_FP16\|10.776\|10.736\|1.00\| \|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16\|52.762\|52.833\|1.00\| \|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16\|60.694\|60.721\|1.00\| \|EfficientNet::DNNTestNetwork::OCV/OCL_FP16\|33.373\|33.173\|1.01\| \|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16\|81.840\|81.724\|1.00\| \|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16\|20.965\|20.927\|1.00\| \|Inception_5h::DNNTestNetwork::OCV/OCL_FP16\|22.204\|22.173\|1.00\| \|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16\|47.115\|47.460\|0.99\| \|MPHand::DNNTestNetwork::OCV/OCL_FP16\|6.760\|6.670\|1.01\| \|MPPalm::DNNTestNetwork::OCV/OCL_FP16\|10.188\|10.171\|1.00\| \|MPPose::DNNTestNetwork::OCV/OCL_FP16\|12.510\|12.561\|1.00\| \|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16\|17.290\|17.072\|1.01\| \|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16\|19.473\|19.306\|1.01\| \|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16\|22.874\|23.404\|0.98\| \|OpenFace::DNNTestNetwork::OCV/OCL_FP16\|9.568\|9.517\|1.01\| \|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16\|539.899\|539.845\|1.00\| \|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16\|18.015\|18.769\|0.96\| \|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16\|63.122\|63.540\|0.99\| \|ResNet_50::DNNTestNetwork::OCV/OCL_FP16\|34.947\|34.925\|1.00\| \|SFace::DNNTestNetwork::OCV/OCL_FP16\|10.249\|10.206\|1.00\| \|SSD::DNNTestNetwork::OCV/OCL_FP16\|213.068\|213.108\|1.00\| \|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16\|4.867\|4.878\|1.00\| 
\|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16\|200.563\|190.788\|1.05\| \|VitTrack::DNNTestNetwork::OCV/OCL_FP16\|7.528\|7.173\|1.05\| \|YOLOX::DNNTestNetwork::OCV/OCL_FP16\|132.858\|132.701\|1.00\| \|YOLOv3::DNNTestNetwork::OCV/OCL_FP16\|209.559\|208.809\|1.00\| \|YOLOv4::DNNTestNetwork::OCV/OCL_FP16\|221.357\|220.924\|1.00\| \|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16\|24.446\|24.382\|1.00\| \|YOLOv5::DNNTestNetwork::OCV/OCL_FP16\|43.922\|44.080\|1.00\| \|YOLOv8::DNNTestNetwork::OCV/OCL_FP16\|64.159\|63.842\|1.00\| \|YuNet::DNNTestNetwork::OCV/OCL_FP16\|10.177\|10.231\|0.99\| \|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16\|15.121\|15.445\|0.98\| Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>	2024-01-26 16:34:17 +03:00
Yuantao Feng	37156a4719	Merge pull request #24925 from fengyuentau:loongarch_handle_warnings Handle warnings in loongson-related code #24925 See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16 Warnings need to be handled before we add the loongson server to our CI. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-01-26 13:38:00 +03:00
Sean McBride	e64857c561	Merge pull request #23736 from seanm:c++11-simplifications Removed all pre-C++11 code, workarounds, and branches #23736 This removes a bunch of pre-C++11 workarounds that are no longer necessary as C++11 is now required. It is a nice clean up and simplification. * No longer unconditionally #include <array> in cvdef.h, include explicitly where needed * Removed deprecated CV_NODISCARD, already unused in the codebase * Removed some pre-C++11 workarounds, and simplified some backwards compat defines * Removed CV_CXX_STD_ARRAY * Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE * Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code. * Updated some documentation consequently. * Removed all tests of CV_CXX11, now assume it's always true * Fixed links. --------- Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com> Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2024-01-19 16:53:08 +03:00
fengyuentau	d269de0a03	initial commit	2024-01-18 11:17:50 +08:00
Alexander Smorkalov	ac4c0bffac	Merge pull request #24813 from fengyuentau:speedup_scatter dnn: improve scatter and scatterND speed with multi-threading	2024-01-17 17:16:50 +03:00
Alexander Smorkalov	84bb1cda4e	Merge pull request #24865 from asmorkalov:as/dnn_concat_assert Normalize axis parameter in DNN Concat to handle negative values	2024-01-16 14:39:28 +03:00
Alexander Smorkalov	26cf82a56c	Normalize axis parameter in DNN Concat to handle negative values.	2024-01-16 12:22:22 +03:00
Alexander Smorkalov	99c86bb40c	Merge pull request #24556 from plctlab:rvp Optimization based on RISC-V P Packed SIMD Extension v0.5.2	2024-01-16 11:36:31 +03:00
Alexander Smorkalov	68dc02e302	Merge pull request #24858 from Dhanwanth1803:avx-fix Use AVX2 overload instead of AVX in AVX2 scope	2024-01-16 09:14:31 +03:00

1838 Commits