opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-12 23:49:36 +08:00

Author	SHA1	Message	Date
Alexander Alekhin	762a5c8334	imgproc: align GaussianBlur/sepFilter2D OpenCL with CPU version	2020-07-08 15:13:48 +00:00
Alexander Alekhin	49a75079f2	Merge pull request #17047 from alalek:fix_permissions	2020-04-13 12:34:08 +00:00
Alexander Alekhin	f0ffc52435	fix files permissions	2020-04-13 04:29:55 +00:00
Tomoaki Teshima	3371e679ce	fix OpenCL spec violation	2020-04-07 14:34:55 +09:00
Rostislav Vasilikhin	d99a4af229	Merge pull request #13379 from savuor:color_5x5 RGB to/from Gray rewritten to wide intrinsics (#13379) * 5x5 to RGB added * RGB25x5 added * Gray2RGB added * Gray2RGB5x5 added * vx_set moved out of loops * RGB5x52Gray added * RGB2Gray written * warnings fixed (int -> (u)short conversion) * warning fixed * warning fixed * "i < n-vsize+1" to "i <= n-vsize" * RGBA2mRGBA vectorized * try to fix ARM builds * fixed ARM build for RGB2RGB5x5 * mRGBA2RGBA: saturation, vectorization * fixed CL implementation of mRGBA2RGBA (saturation added)	2018-12-14 17:01:01 +03:00
Tomoaki Teshima	87a4f4ab3a	Merge pull request #11409 from tomoaki0705/fixCLAHEfailure Arm: fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4 (#11409) * fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4 * avoid the race condition in the reduce * imgproc(ocl): simplify CLAHE code * remove unused class	2018-04-27 16:41:56 +03:00
Rostislav Vasilikhin	64916d3d83	Merge pull request #10869 from savuor:color_cpp_split color.cpp split (#10869) * initial split is done * files renamed (these names are excluded during compilation) * IPP code moved to corresponding files * splineBuild, splineInterpolate -> color_lab.cpp * Lab, Luv: little refactored * it compiles (didn't check work); Lab OCL code moved to color_lab.cpp * cvtcolor.cl: Lab/Luv part moved to color_lab.cl * cvtcolor.cl: color_rgb.cl extracted * cvtcolor.cl: color_yuv.cl separated * cvtcolor.cl: color_hsv.cl extracted * cvtcolor.cl: extracted to color_lab.cl and color_rgb.cl * helper functions moved to hpp file * Lab, Luv: moved to color_lab.cpp * CPU XYZ: to color_lab.cpp * OCL XYZ: to color_lab.cpp * warning fixed * CvtHelper added * CPU YUV: to color_yuv.cpp, helpers to color.hpp * CPU HLS/HSV: to color_hsv.cpp * CPU BGR2BGR: to color_rgb.cpp * CPU RGB: to color_rgb.cpp * extra arg removed * CPU YUV: to color_yuv.cpp * color code decoded * OclHelper added, some funcs rewritten * color_lab.cpp: refactored to use OclHelper * OCL RGB: to color_rgb.cpp * OCL HLS/HSV: to color_hsv.cpp * OCL YUV: to color_yuv.cpp * OCL YUV planes: to color_yuv.cpp * OCL: color code reduced * licence to demosaicing.cpp * IPP func tables to color_rgb.cpp * code cleanup * HAVE_OPENCL ifdefs added * helpers made more common * fixed two plane YUV with separate mats * fixed warning in gcc7.2.0 * precomp header fixed * color space classification functions fixed * helpers fixed * rename: isSRGB -> is_sRGB	2018-03-15 14:10:40 +03:00
luz.paz	d05714995c	Misc. modules/ cont. pt2 Found via `codespell`	2018-02-13 11:28:11 -05:00
Alexander Alekhin	813ff37967	imgproc(ocl): fix RGB2RGBA kernel out of range access	2017-12-20 14:19:46 +00:00
Rostislav Vasilikhin	cc547e8260	Bit-exact version of Luv2RGB_b (#9470 ) * lab_tetra squashed * initial version is almost written * unfinished work * compilation fixed, to be debugged * Lab test removed * more fixes * Luv2RGBinteger: channels order fixed * Lab structs removed * good trilinear interpolation added * several fixes * removed Luv2RGB interpolations, XYZ tables; 8-cell LUT added * no_interpolate made 8-cell * interpolations rewritten to 8-cell, minor fixes * packed interpolation added for RGB2Luv * tetra implemented * removing unnecessary code * LUT building merged * changes ported to color.cpp * minor fixes; try to suppress warnings * fixed v range of Luv * fixed incorrect src channel number * minor fixes * preliminary version of Luv2RGBinteger is done * Luv2RGB_b is in progress * XYZ color constants converted to softfloat * Luv test: precision fixed * Luv bit-exactness test added * warnings fixed * compilation fixed, error message fixed * Luv check is limited to [0-2,0-2,0-2] by XYZ * L->Y generation moved to LUT * LUTs added for up and vp of Luv2RGB_b * still works * fixed-point is done, works at maxerr 2 * vectorized code is done, 2x slower than original * perf improved by 10% * extra comments removed * code moved to color.cpp * test_lab.cpp updated * minor refactoring * test added for Luv2RGB * OCL Luv2RGB_b: XYZ are limited to [0, 2]; docs updated * Luv2RGB_b rewritten to universal intrinsics * test_lab.cpp moved to luv_tetra branch	2017-09-21 14:20:45 +03:00
Alexander Alekhin	89bb028bfc	imgproc(ocl): don't use doubles to process float data	2017-09-07 12:42:20 +03:00
Alexander Alekhin	e3b12bdb59	imgproc(ocl): eliminate div by zero in Canny	2017-08-29 19:29:53 +03:00
Rostislav Vasilikhin	4b75be801e	initial version of Lab2RGB_f tetrahedral interpolation written RGB2Lab_f added, bugs fixed, moved to float several bugs fixed LUT fixed, no switch in tetraInterpolate() temporary code; to be removed and rewritten before refactoring extra interpolations removed, some things to do left added Lab2RGB_b +XYZ version, etc. basic version is done, to be sped up tetra refactored interpolations: LUT for weights, refactor., etc. address arithm optimized initial version of vectorized code added (not compiling now) compilation fixed, now segfaults a lot of fixes, vectorization temp. disabled fixed trilinear shift size, max error dropped from 19 to 10 fixed several bugs (255 vs 256, signed vs unsigned, bIdx) minor changes packed: address arithmetics fixed shorter code experiments with pure integer calculations Lab2RGB max error decreased to 2; need to clean the code ready for vectorization; need cleaning vectorized, to be debugged precision fixed, max error is 2 Lab->XYZ shortened minor fixes Lab2RGB_f version fixed, to be completely rewritten using _b code RGB2Lab_f vectorized minors moved to separate file refactored Lab2RGB to float and int versions minor fix Lab2RGB_f vectorized minor refactoring Lab2RGBint refactored: process methods, vectorize by 4 pix Lab2RGB_f int version is done cleanup extra code code copied to color.cpp fixed blue idx bug optimizations enabled when testing; mulFracConst introduced divConst -> mulFracConst calc min time in perf instead of avg minors process() slightly sped up Lab2RGB_f: disabled int version reinterpret added, minor fixes in names some warnings fixed changes transferred to color.cpp RGB2Lab_f code (and trilinear interpolation code) moved to rgb2lab_faster whitespace shift negative fixed more warnings fixed "constant condition" warnings fixed, little speed up minor changes test_photo decolor fixed changes copied to test_lab.cpp idx bounds checking in LUT init several fixes WIP: softfloat almost integrated 
test_lab partially rewritten to SoftFloat color.cpp rewritten to SoftFloat test_lab.cpp: accuracy code added several fixes RGB2Lab_b testing fixed splineBuild() rewritten to SoftFloat accuracy control improved rounding fixed Luv <=> RGB: rewritten to SoftFloat OCL cvtColor Lab and Lut rewritten to SoftFloat minor fixes refactored to new SoftFloat interface round() -> cvRound, etc. fixed OCL tests softfloat.cpp: internal functions made static, unused ones removed meaningful constants extra lines removed unused function removed unfinished work it works, need to fix TODOs refactoring; more calls rewritten mulFracConst removed constants made bit exact; minors changes moved to color.cpp fixed 1 bug and 4 warnings OCL: fixed constants pow(x, _1_3f) replaced by cubeRoot(x) fixed compilation on MSVC32 magic constants explained file with internal accuracy&speed tests moved to lab_tetra branch	2017-07-17 00:32:30 +03:00
Rostislav Vasilikhin	aa621d6f3c	magic constants explained	2017-07-06 00:30:53 +03:00
Rostislav Vasilikhin	704c688225	OCL code fixed, fix for NEON added	2017-07-05 22:08:49 +03:00
Maksim Shabunin	ce50df564c	Fixed cvtColor OCL compilation issue (BGRA2mBGRA)	2017-04-05 11:48:29 +03:00
Alexander Alekhin	ba8a6e3533	ocl: don't use vload4 for 3 channel images	2017-03-03 19:36:38 +03:00
mshabunin	8c66531c42	imgproc/CLAHE/ocl: Removed unnecessary __local variable	2017-01-13 16:25:43 +03:00
Li Peng	396921dd23	5x5 gaussian blur optimization Add new 5x5 gaussian blur kernel for CV_8UC1 format, it is 50% ~ 70% faster than current ocl kernel in the perf test. Signed-off-by: Li Peng <peng.li@intel.com>	2016-12-06 09:42:37 +08:00
Li Peng	b69cdb2434	Image pyramids upsampling optimization Add new ocl kernel for image pyramids upsampling, It is 35% faster than current OCL kernel in perf test. Signed-off-by: Li Peng <peng.li@intel.com>	2016-12-02 13:54:58 +08:00
Li Peng	2ca5a7e862	more optimization for warpAffine and warpPerspective Add new OpenCL kernels for bicubic interpolation, it is 20% faster than current warp image kernel with bicubic interpolation. Signed-off-by: Li Peng <peng.li@intel.com>	2016-11-30 15:43:41 +08:00
Vadim Pisarevsky	c47267ef7f	Merge pull request #7538 from Tetragramm:CLAHEfix	2016-11-29 16:42:05 +00:00
Alexander Alekhin	90b52cd9b8	Merge pull request #7726 from pengli:warp_image	2016-11-29 12:19:18 +00:00
Li Peng	b72d196753	optimization for warpAffine and warpPerspective Add new ocl kernels for warpAffine and warpPerspective, The average performance improvement is about 30%. The new ocl kernels require CV_8UC1 format and support nearest neighbor and bilinear interpolation. Signed-off-by: Li Peng <peng.li@intel.com>	2016-11-29 14:55:58 +08:00
Rostislav Vasilikhin	7db43f9fff	fixed wrong equivalence in YUV conversion (#7481) * fixed wrong equivalence in YUV conversion * fixed channel order from YVU to YUV	2016-11-23 17:39:18 +03:00
Li Peng	6cb73356b1	laplacian ocl kernel optimization This ocl kernel is 46%~171% faster than current laplacian 3x3 ocl kernel in the perf test, with image format "CV_8UC1". Signed-off-by: Li Peng <peng.li@intel.com>	2016-11-17 12:01:02 +08:00
Li Peng	8d4a7d3dcc	sobel and scharr ocl kernel optimization It improves 108%~230% performance in the perf test with image format "CV_8UC1" and kernel size 3. Signed-off-by: Li Peng <peng.li@intel.com>	2016-11-14 15:34:59 +08:00
Alexander Alekhin	17ffb28807	Merge pull request #7602 from mshabunin:fix-opencl-warnings	2016-11-09 12:35:00 +00:00
Li Peng	8f63f51e81	gaussian blur ocl kernel optimization This ocl kernel is for 3x3 kernel size and CV_8UC1 format It is 115% ~ 300% faster than current ocl path in perf test python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture* Signed-off-by: Li Peng <peng.li@intel.com>	2016-11-08 11:22:26 +08:00
Alexander Alekhin	442380bfac	Merge pull request #7585 from pengli:morph_filter	2016-11-07 17:11:32 +00:00
mshabunin	3e28d51779	Fixed several OpenCL compiler warnings	2016-11-07 16:49:12 +03:00
Li Peng	35198b84a4	morph ocl kernel for erode and dilate filter This kernel is for CV_8UC1 format and 3x3 kernel size, It is about 33% ~ 55% faster than current ocl kernel with below perf test python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture* python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture* Also add accuracy test cases for this kernel, the test command is ./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3* Signed-off-by: Li Peng <peng.li@intel.com>	2016-11-04 12:24:24 +08:00
Tetragramm	17df65e666	Fix the OpenCL portion to match the c++ code. Fix an undiscovered bug in the c++ code.	2016-11-03 20:41:16 -05:00
Vadim Pisarevsky	2b7866f21b	Merge pull request #7503 from pengli:box_filter_v2	2016-10-29 21:20:06 +00:00
Li Peng	3607da9f6b	ocl kernel performance optimization for box filter The optimization is for CV_8UC1 format and 3x3 box filter, it is 15%~87% faster than current ocl kernel with below perf test ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture* Also add test cases for this ocl kernel. Signed-off-by: Li Peng <peng.li@intel.com>	2016-10-26 11:56:11 +08:00
LukeZhu	ef47bcc88b	Fix the problem: filterSmall.cl report error with double	2016-10-17 15:12:42 +08:00
Alexander Alekhin	b8e08d5d3c	ocl: fix Canny for Intel devices There is an issue with processing of abs(short) function for negative argument. Affected OpenCL devices: - iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0 ) - CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))	2016-08-09 12:48:06 +03:00
ohnozzy	db9f611767	Add OpenCL support to linearPolar & logPolar Add OpenCL support to linearPolar & logPolar. The OpenCL code uses float instead of double, so that it does not require the cl_khr_fp64 extension, with slight precision loss. Add explicit conversion Add explicit conversion from double to float to eliminate warning during compilation.	2016-04-24 08:37:56 +08:00
Zhigang Gong	0b08d2559e	fix potential race condition in canny.cl. See the below code snippet: while(l_counter != 0) { int mod = l_counter % LOCAL_TOTAL; int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0); for (int i = 0; i < pix_per_thr; ++i) { int index = atomic_dec(&l_counter) - 1; .... } .... barrier(CLK_LOCAL_MEM_FENCE); } If we don't put a barrier before the for loop, there is a possibility that some work items enter this loop while the others do not; then l_counter will be reduced in the for loop and may reach zero, and the other work items may be unable to enter the while loop. If this happens, it breaks the barrier's rule, which requires all work items to reach the same barrier, and it may hang the GPU depending on the implementation of the OpenCL platform. This issue was raised at: https://github.com/Itseez/opencv/issues/5175 Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>	2016-03-15 19:11:15 +08:00
Zhigang Gong	0f7de40e66	Fixed the race condition between inc and dec on the l_counter. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>	2015-05-26 22:06:18 +08:00
Zhigang Gong	3c85200989	Avoid negative index for a local buffer in Canny.cl. int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0); The pix_per_thr * LOCAL_TOTAL may be larger than l_counter. Thus the index of l_stack may be negative which may cause serious problems. Let's skip the loop when we get negative index and we need to add back the lcounter to keep its balance and avoid potential negative counter. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>	2015-05-26 08:48:05 +08:00
Pavel Rojtberg	1ea41e7246	fix gftt opencv kernel when using mask	2015-04-22 16:13:50 +02:00
Yan Wang	6e7050555e	Optimize pyrUp_unrolled() by mad function. It could improve performance when image size is large. E.g. OCL_PyrUpFixture_PyrUp.PyrUp/18	2014-11-26 16:55:08 +08:00
Alexander Alekhin	569a95e9e1	Merge pull request #3394 from akarsakov:ocl_canny	2014-11-07 12:42:24 +00:00
Alexander Karsakov	7c870014fb	Correctly unrolled some cycles	2014-11-07 12:13:00 +03:00
Alexander Karsakov	0ec0aeb7d0	Minor optimization for ocl_canny	2014-11-06 13:07:33 +03:00
vbystricky	957e5ef8eb	Fix OpenCL version of HoughLinesP function	2014-11-05 14:31:06 +04:00
Alexander Karsakov	643c906f3d	Added optimized loading to YUV2RGB_422 kernel	2014-10-28 15:07:51 +03:00
Alexander Karsakov	1466621f99	Added loading 4 pixels in line instead of 2 to RGB[A] -> YUV(420) kernel	2014-10-27 16:00:34 +03:00
Alexander Karsakov	60367907fe	Used direct float calculations	2014-10-21 17:18:03 +03:00


293 Commits