Mirror of https://github.com/opencv/opencv.git, synced 2025-06-12 04:12:52 +08:00
Latest commit:

* first commit
* turned C from input to constant; force C constant in impl; better handling 0d/1d cases
* integrate with gemm from ficus nn
* fix const inputs
* adjust threshold for int8 tryQuantize
* adjust threshold for int8 quantized 2
* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet
* add gemm perf against innerproduct
* add perf tests for innerproduct with bias
* fix perf
* add memset
* renamings for next step
* add dedicated perf gemm
* add innerproduct in perf_gemm
* remove gemm and innerproduct perf tests from perf_layer
* add perf cases for vit sizes; prepack constants
* remove batched gemm; fix wrong trans; optimize KC
* remove prepacking for const A; several fixes for const B prepacking
* add todos and gemm expression
* add optimized branch for avx/avx2
* trigger build
* update macros and signature
* update signature
* fix macro
* fix bugs for neon aarch64 & x64
* add backends: cuda, cann, inf_ngraph and vkcom
* fix cuda backend
* test commit for cuda
* test cuda backend
* remove debug message from cuda backend
* use cpu dispatcher
* fix neon macro undef in dispatcher
* fix dispatcher
* fix inner kernel for neon aarch64
* fix compiling issue on armv7; try fixing accuracy issue on other platforms
* broadcast C with beta multiplied; improve func namings
* fix bug for avx and avx2
* put all platform-specific kernels in dispatcher
* fix typos
* attempt to fix compile issues on x64
* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon
* fix typo
* quick fix: add macros for pack4
* quick fix: use vmlaq_f32 for armv7
* quick fix for missing macro of fast gemm pack f32 4
* disable conformance tests when optimized branches are not supported
* disable perf tests when optimized branches are not supported
* decouple cv_try_neon and cv_neon_aarch64
* drop googlenet_2023; add fastGemmBatched
* fix step in fastGemmBatched
* cpu: fix initialization ofb; gpu: support batch
* quick followup fix for cuda
* add default kernels
* quick followup fix to avoid macro redef
* optimized kernels for lasx
* resolve mis-alignment; remove comments
* tune performance for x64 platform
* tune performance for neon aarch64
* tune for armv7
* comment time consuming tests
* quick follow-up fix

File | ||
---|---|---|
cityscapes_semsegm_test_enet.py | ||
imagenet_cls_test_alexnet.py | ||
imagenet_cls_test_googlenet.py | ||
imagenet_cls_test_inception.py | ||
npy_blob.cpp | ||
npy_blob.hpp | ||
pascal_semsegm_test_fcn.py | ||
test_backends.cpp | ||
test_caffe_importer.cpp | ||
test_common.cpp | ||
test_common.hpp | ||
test_common.impl.hpp | ||
test_darknet_importer.cpp | ||
test_googlenet.cpp | ||
test_halide_layers.cpp | ||
test_ie_models.cpp | ||
test_int8_layers.cpp | ||
test_layers.cpp | ||
test_main.cpp | ||
test_misc.cpp | ||
test_model.cpp | ||
test_nms.cpp | ||
test_onnx_conformance_layer_filter__cuda_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter__halide_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter__openvino.inl.hpp | ||
test_onnx_conformance_layer_filter__vulkan_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter_opencv_all_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter_opencv_cpu_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter_opencv_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter_opencv_ocl_fp16_denylist.inl.hpp | ||
test_onnx_conformance_layer_filter_opencv_ocl_fp32_denylist.inl.hpp | ||
test_onnx_conformance_layer_parser_denylist.inl.hpp | ||
test_onnx_conformance.cpp | ||
test_onnx_importer.cpp | ||
test_precomp.hpp | ||
test_tf_importer.cpp | ||
test_tflite_importer.cpp | ||
test_torch_importer.cpp |