opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-15 18:09:11 +08:00

Author	SHA1	Message	Date
casualwinds	7b399c4248	Merge pull request #24280 from casualwind:parallel_opt

Optimization for parallelization when large core number #24280

Problem description: When the number of cores is large, OpenCV's thread library may reduce performance when processing parallel jobs.

The reason for this problem: The thread pool initializes as many threads as there are cores, so when the number of cores is large the main thread spends too much time waking up unnecessary threads. When a parallel job needs to be executed, the main thread wakes up all threads in sequence and only then waits for the job-completion signal. When the number of threads is larger than the number of parallel slices of a job, the threads woken first may already have completed the job while the main thread is still waking up the others. The threads woken after that point have nothing to do, and the broadcasts made by the waking threads take a lot of time, which reduces performance.

Solution: Reduce the time the main thread spends waking up worker threads through the following two methods:

- The number of threads awakened by the main thread is adjusted according to the number of parallel slices of the job. If the number of threads is greater than the number of job slices, fewer threads are awakened.
- While waking up threads in sequence, if the main thread finds that all parallel job slices have been allocated, it leaves the loop early and waits for the job-completion signal.

Performance Test: The tests were run in the manner described by https://github.com/opencv/opencv/wiki/HowToUsePerfTests. At core number = 160, there is a big performance gain in some cases. Take the following cases in the video module as examples:

- OpticalFlowPyrLK_self::Path_Idx_Cn_NPoints_WSize_Deriv::("cv/optflow/frames/VGA_%02d.png", 2, 1, (9, 9), 11, true): performance improves 191%: 0.185405 ms -> 0.0636496 ms
- perf::DenseOpticalFlow_VariationalRefinement::(320x240, 10, 10): performance improves 112%: 23.88938 ms -> 11.2562 ms

Among all the modules, the performance improvement is greatest in the video module, and there are also some improvements in the other modules. At core number = 160, the times listed below are the geometric mean of the average time of all cases for one module. The optimization applies to every module.

Overall time (ms):

| module name | gapi | dnn | features2d | objdetect | core | imgproc | stitching | video |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| original | 0.185 | 1.586 | 9.998 | 11.846 | 0.205 | 0.215 | 164.409 | 0.803 |
| optimized | 0.174 | 1.353 | 9.535 | 11.105 | 0.199 | 0.185 | 153.972 | 0.489 |
| Performance improves | 6% | 17% | 5% | 7% | 3% | 16% | 7% | 64% |

Meanwhile, it was found that changing the order of the test cases affects some of them. For example, when we ran opencv_perf_gapi with the option --gtest-shuffle, the performance of the TestPerformance::CmpWithScalarPerfTestFluid/CmpWithScalarPerfTest::(compare_f, CMP_GE, 1920x1080, 32FC1, { gapi.kernel_package }) case changed by 30% compared to the run without shuffle. I would like to ask if you have also encountered such a situation and could you share your experience?

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake	2023-09-27 16:21:20 +03:00
Alexander Alekhin	bc8c912c7a	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-12-24 13:54:58 +00:00
Vincent Rabaud	7463e9b8bb	Even faster CV_PAUSE on SkyLake and above. No need to loop as RDTSC is 3 to 4 times faster than _mm_pause.	2022-12-19 14:15:34 +01:00
Alexander Alekhin	420db56ffd	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-12-18 02:17:17 +00:00
Vincent Rabaud	b7b08fa0c3	Fix slower CV_PAUSE on SkyLake and above. This is fixing https://github.com/opencv/opencv/issues/22852	2022-12-15 14:18:57 +01:00
wxsheng	4154bd0667	Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX * Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX * Add resize.lasx.cpp for Loongson SIMD acceleration * Add imgwarp.lasx.cpp for Loongson SIMD acceleration * Add LASX acceleration support for dnn/conv * Add CV_PAUSE(v) for Loongarch * Set LASX by default on Loongarch64 * LoongArch: tune test threshold for Core/HAL.mat_decomp/15 Co-authored-by: shengwenxue <shengwenxue@loongson.cn>	2022-09-10 09:39:43 +03:00
Alexander Alekhin	87d4970e8b	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2021-10-04 19:50:01 +00:00
Alexander Alekhin	62414e3073	core(parallel): suppress TSAN warning	2021-10-04 10:46:32 +00:00
Alexander Alekhin	e5d78960c6	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2021-02-12 21:34:49 +00:00
Vincent Rabaud	847b16fb76	Disable thread sanitization when CV_USE_GLOBAL_WORKERS_COND_VAR is not set. This fixes #19463	2021-02-09 14:12:39 +01:00
Alexander Smorkalov	7228d2a824	Added initial version of cmake toolchain for RISC-V architecture.	2020-04-27 12:42:38 +03:00
Alexander Alekhin	560f85f8e5	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2020-01-28 14:26:57 +03:00
Alexander Alekhin	e83438c23d	core(build): fix i386 compilation	2020-01-26 00:00:25 +00:00
Alexander Alekhin	92b9888837	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2019-12-12 13:02:19 +03:00
Alexander Alekhin	816f82682b	core(trace/itt): avoid calling __itt_thread_set_name() by default - don't override current application thread names - set name for own threads only	2019-12-07 21:41:15 +00:00
Alexander Alekhin	a74fe2ec01	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2019-09-20 21:11:49 +00:00
mipsopen-fwu	b1ea91d8bd	Merge pull request #15422 from mipsopen-fwu:msa-dev * Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added. Signed-off-by: Fei Wu <fwu@wavecomp.com> * Removed some unused code in mips.toolchain.cmake. Signed-off-by: Fei Wu <fwu@wavecomp.com> * Added comments for mips toolchain configuration and disabled compiling warnings for libpng. Signed-off-by: Fei Wu <fwu@wavecomp.com> * Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2. Signed-off-by: Fei Wu <fwu@wavecomp.com> * 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake. 2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h. 3. Removed hasSIMD128() in intrin_msa.hpp. 4. Define CPU_MSA as 150. Signed-off-by: Fei Wu <fwu@wavecomp.com> * 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp. 2. Removed unnecessary CV_MSA related code block in dotProd_8u(). Signed-off-by: Fei Wu <fwu@wavecomp.com> * 1. Defined CPU_MSA_FLAGS_ON as "-mmsa". 2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp. Signed-off-by: Fei Wu <fwu@wavecomp.com> * Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h. Signed-off-by: Fei Wu <fwu@wavecomp.com>	2019-09-20 19:52:48 +03:00
cyy	10fb88d027	Merge pull request #12391 from DEEPIR:master fix some errors found by static analyzer. (#12391) * fix possible divided by zero and by negative values * only 4 elements are used in these arrays * fix uninitialized member * use boolean type for semantic boolean variables * avoid invalid array index * to avoid exception and because base64_beg is only used in this block * use std::atomic<bool> to avoid thread control race condition	2018-09-04 16:39:19 +03:00
cyy	09837928d9	Merge pull request #12357 from DEEPIR:master * fix some static analyzer warnings * fix some static analyzer warnings * fix race condition of workthread control	2018-09-02 16:34:43 +03:00
Alexander Alekhin	98c8584b88	next: drop CV_CXX11 conditions (the define itself is still here for compatibility)	2018-04-10 18:09:54 +03:00
Alexander Alekhin	7dc162cb42	core: fix mm_pause() for non-SSE i386 builds; replaced with the safe, binary-compatible 'rep; nop' asm instruction	2018-04-09 18:37:35 +03:00
Alexander Alekhin	491502a349	core: fix parallel_for data race	2018-02-16 21:13:48 +03:00
luz.paz	5718d09e39	Misc. modules/ typos Found via `codespell`	2018-02-12 07:09:43 -05:00
Alexander Alekhin	914f57f28d	core(parallel_for): fix data race	2018-02-06 18:19:50 +03:00
Alexander Alekhin	b10fedde56	core(parallel_for): cleanup, remove 'dont_wait' (can be replaced with has_wake_signal)	2018-02-06 16:10:41 +03:00
Sayed Adel	4e1d396ce1	core:ppc Add yield support	2018-01-31 04:03:35 +00:00
Alexander Alekhin	c49d5d5252	core: fix pthreads performance OpenCV pthreads-based implementation changes: - rework worker threads pool, allow to execute job by the main thread too - rework synchronization scheme (wait for job completion, threads 'pong' answer is not required) - allow "active wait" (spin) by worker threads and by the main thread - use _mm_pause() during active wait (support for Hyper-Threading technology) - use sched_yield() to avoid preemption of still working other workers - don't use getTickCount() - optional builtin thread pool profiler (disabled by compilation flag)	2018-01-26 04:09:11 +00:00
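
The "active wait" idea from commit c49d5d5252 (spin briefly with a pause instruction, then yield instead of burning the core) can be illustrated roughly as follows. This is a minimal sketch, not OpenCV's code: `cpuRelax`, `waitForFlag` and `spin_limit` are hypothetical names, and `std::this_thread::yield()` stands in for the `sched_yield()` call mentioned in the commit message.

```cpp
#include <atomic>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
// Hint to the CPU that we are in a spin loop (friendlier to Hyper-Threading).
static inline void cpuRelax() { _mm_pause(); }
#else
// Assumption: on other targets this sketch simply does nothing while spinning.
static inline void cpuRelax() {}
#endif

// Hypothetical helper: wait until `done` becomes true.
// Returns true if the flag was observed during the spin phase,
// false if the spin budget ran out and the yielding phase was entered.
bool waitForFlag(const std::atomic<bool>& done, int spin_limit)
{
    // Phase 1: bounded active wait.
    for (int i = 0; i < spin_limit; ++i)
    {
        if (done.load(std::memory_order_acquire))
            return true;
        cpuRelax();
    }
    // Phase 2: give the time slice away so that other workers can finish.
    while (!done.load(std::memory_order_acquire))
        std::this_thread::yield();
    return false;
}
```

A short spin avoids the cost of a sleep and wake-up when the job finishes quickly; the yielding phase keeps a long wait from preempting workers that are still busy.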

27 Commits
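
The two wake-up rules described in commit 7b399c4248 (wake no more threads than there are job slices, and stop waking once every slice has been claimed) can be sketched as below. This is an illustration under stated assumptions rather than the code from the pull request: `workersToWake`, `wakeUntilAllocated`, the `next_slice` counter and the `wake` callback are all hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>

// Hypothetical helper: never wake more workers than there are slices to run.
int workersToWake(int pool_workers, int job_slices)
{
    assert(pool_workers >= 0 && job_slices >= 0);
    return std::min(pool_workers, job_slices);
}

// Hypothetical wake-up loop. `next_slice` is assumed to be a shared counter
// that workers advance as they claim slices; `wake(i)` signals worker i.
// Returns the number of workers that were actually woken.
template <typename WakeFn>
int wakeUntilAllocated(int pool_workers, int job_slices,
                       const std::atomic<int>& next_slice, WakeFn wake)
{
    const int limit = workersToWake(pool_workers, job_slices);
    int woken = 0;
    for (int i = 0; i < limit; ++i)
    {
        // All slices are already claimed: waking more threads is wasted work.
        if (next_slice.load(std::memory_order_acquire) >= job_slices)
            break;
        wake(i);
        ++woken;
    }
    return woken;
}
```

With a pool of 160 workers and a job of 4 slices, this loop signals at most 4 workers, and fewer if the workers woken first have already claimed every slice.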