Commit Graph

293 Commits

Author SHA1 Message Date
Alexander Alekhin
762a5c8334 imgproc: align GaussianBlur/sepFilter2D OpenCL with CPU version 2020-07-08 15:13:48 +00:00
Alexander Alekhin
49a75079f2 Merge pull request #17047 from alalek:fix_permissions 2020-04-13 12:34:08 +00:00
Alexander Alekhin
f0ffc52435 fix files permissions 2020-04-13 04:29:55 +00:00
Tomoaki Teshima
3371e679ce fix OpenCL spec violation 2020-04-07 14:34:55 +09:00
Rostislav Vasilikhin
d99a4af229 Merge pull request #13379 from savuor:color_5x5
RGB to/from Gray rewritten to wide intrinsics (#13379)

* 5x5 to RGB added

* RGB25x5 added

* Gray2RGB added

* Gray2RGB5x5 added

* vx_set moved out of loops

* RGB5x52Gray added

* RGB2Gray written

* warnings fixed (int -> (u)short conversion)

* warning fixed

* warning fixed

* "i < n-vsize+1" to "i <= n-vsize"

* RGBA2mRGBA vectorized

* try to fix ARM builds

* fixed ARM build for RGB2RGB5x5

* mRGBA2RGBA: saturation, vectorization

* fixed CL implementation of mRGBA2RGBA (saturation added)
2018-12-14 17:01:01 +03:00
Tomoaki Teshima
87a4f4ab3a Merge pull request #11409 from tomoaki0705/fixCLAHEfailure
Arm: fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4 (#11409)

* fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4
  * avoid the race condition in the reduce

* imgproc(ocl): simplify CLAHE code

* remove unused class
2018-04-27 16:41:56 +03:00
Rostislav Vasilikhin
64916d3d83 Merge pull request #10869 from savuor:color_cpp_split
color.cpp split (#10869)

* initial split is done

* files renamed (these names are excluded during compilation)

* IPP code moved to corresponding files

* splineBuild, splineInterpolate -> color_lab.cpp

* Lab, Luv: little refactored

* it compiles (didn't check work); Lab OCL code moved to color_lab.cpp

* cvtcolor.cl: Lab/Luv part moved to color_lab.cl

* cvtcolor.cl: color_rgb.cl extracted

* cvtcolor.cl: color_yuv.cl separated

* cvtcolor.cl: color_hsv.cl extracted

* cvtcolor.cl: extracted to color_lab.cl and color_rgb.cl

* helper functions moved to hpp file

* Lab, Luv: moved to color_lab.cpp

* CPU XYZ: to color_lab.cpp

* OCL XYZ: to color_lab.cpp

* warning fixed

* CvtHelper added

* CPU YUV: to color_yuv.cpp, helpers to color.hpp

* CPU HLS/HSV: to color_hsv.cpp

* CPU BGR2BGR: to color_rgb.cpp

* CPU RGB: to color_rgb.cpp

* extra arg removed

* CPU YUV: to color_yuv.cpp

* color code decoded

* OclHelper added, some funcs rewritten

* color_lab.cpp: refactored to use OclHelper

* OCL RGB: to color_rgb.cpp

* OCL HLS/HSV: to color_hsv.cpp

* OCL YUV: to color_yuv.cpp

* OCL YUV planes: to color_yuv.cpp

* OCL: color code reduced

* licence to demosaicing.cpp

* IPP func tables to color_rgb.cpp

* code cleanup

* HAVE_OPENCL ifdefs added

* helpers made more common

* fixed two plane YUV with separate mats

* fixed warning in gcc7.2.0

* precomp header fixed

* color space classification functions fixed

* helpers fixed

* rename: isSRGB -> is_sRGB
2018-03-15 14:10:40 +03:00
luz.paz
d05714995c Misc. modules/ cont. pt2
Found via `codespell`
2018-02-13 11:28:11 -05:00
Alexander Alekhin
813ff37967 imgproc(ocl): fix RGB2RGBA kernel out of range access 2017-12-20 14:19:46 +00:00
Rostislav Vasilikhin
cc547e8260 Bit-exact version of Luv2RGB_b (#9470)
* lab_tetra squashed

* initial version is almost written

* unfinished work

* compilation fixed, to be debugged

* Lab test removed

* more fixes

* Luv2RGBinteger: channels order fixed

* Lab structs removed

* good trilinear interpolation added

* several fixes

* removed Luv2RGB interpolations, XYZ tables; 8-cell LUT added

* no_interpolate made 8-cell

* interpolations rewritten to 8-cell, minor fixes

* packed interpolation added for RGB2Luv

* tetra implemented

* removing unnecessary code

* LUT building merged

* changes ported to color.cpp

* minor fixes; try to suppress warnings

* fixed v range of Luv

* fixed incorrect src channel number

* minor fixes

* preliminary version of Luv2RGBinteger is done

* Luv2RGB_b is in progress

* XYZ color constants converted to softfloat

* Luv test: precision fixed

* Luv bit-exactness test added

* warnings fixed

* compilation fixed, error message fixed

* Luv check is limited to [0-2,0-2,0-2] by XYZ

* L->Y generation moved to LUT

* LUTs added for up and vp of Luv2RGB_b

* still works

* fixed-point is done, works at maxerr 2

* vectorized code is done, 2x slower than original

* perf improved by 10%

* extra comments removed

* code moved to color.cpp

* test_lab.cpp updated

* minor refactoring

* test added for Luv2RGB

* OCL Luv2RGB_b: XYZ are limited to [0, 2]; docs updated

* Luv2RGB_b rewritten to universal intrinsics

* test_lab.cpp moved to luv_tetra branch
2017-09-21 14:20:45 +03:00
Alexander Alekhin
89bb028bfc imgproc(ocl): don't use doubles to process float data 2017-09-07 12:42:20 +03:00
Alexander Alekhin
e3b12bdb59 imgproc(ocl): eliminate div by zero in Canny 2017-08-29 19:29:53 +03:00
Rostislav Vasilikhin
4b75be801e initial version of Lab2RGB_f tetrahedral interpolation written
RGB2Lab_f added, bugs fixed, moved to float

several bugs fixed

LUT fixed, no switch in tetraInterpolate()

temporary code; to be removed and rewritten

before refactoring

extra interpolations removed, some things to do left

added Lab2RGB_b +XYZ version, etc.

basic version is done, to be sped up

tetra refactored

interpolations: LUT for weights, refactor., etc.

address arithm optimized

initial version of vectorized code added (not compiling now)

compilation fixed, now segfaults

a lot of fixes, vectorization temp. disabled

fixed trilinear shift size, max error dropped from 19 to 10

fixed several bugs (255 vs 256, signed vs unsigned, bIdx)

minor changes

packed: address arithmetics fixed

shorter code

experiments with pure integer calculations

Lab2RGB max error decreased to 2; need to clean the code

ready for vectorization; need cleaning

vectorized, to be debugged

precision fixed, max error is 2

Lab->XYZ shortened

minor fixes

Lab2RGB_f version fixed, to be completely rewritten using _b code

RGB2Lab_f vectorized

minors

moved to separate file

refactored Lab2RGB to float and int versions

minor fix

Lab2RGB_f vectorized

minor refactoring

Lab2RGBint refactored: process methods, vectorize by 4 pix

Lab2RGB_f int version is done

cleanup extra code

code copied to color.cpp

fixed blue idx bug

optimizations enabled when testing; mulFracConst introduced

divConst -> mulFracConst

calc min time in perf instead of avg

minors

process() slightly sped up

Lab2RGB_f: disabled int version

reinterpret added, minor fixes in names

some warnings fixed

changes transferred to color.cpp

RGB2Lab_f code (and trilinear interpolation code) moved to rgb2lab_faster

whitespace

shift negative fixed

more warnings fixed

"constant condition" warnings fixed, little speed up

minor changes

test_photo decolor fixed

changes copied to test_lab.cpp

idx bounds checking in LUT init

several fixes

WIP: softfloat almost integrated

test_lab partially rewritten to SoftFloat

color.cpp rewritten to SoftFloat

test_lab.cpp: accuracy code added

several fixes

RGB2Lab_b testing fixed

splineBuild() rewritten to SoftFloat

accuracy control improved

rounding fixed

Luv <=> RGB: rewritten to SoftFloat

OCL cvtColor Lab and Lut rewritten to SoftFloat

minor fixes

refactored to new SoftFloat interface

round() -> cvRound, etc.

fixed OCL tests

softfloat.cpp: internal functions made static, unused ones removed

meaningful constants

extra lines removed

unused function removed

unfinished work

it works, need to fix TODOs

refactoring; more calls rewritten

mulFracConst removed

constants made bit exact; minors

changes moved to color.cpp

fixed 1 bug and 4 warnings

OCL: fixed constants

pow(x, _1_3f) replaced by cubeRoot(x)

fixed compilation on MSVC32

magic constants explained

file with internal accuracy&speed tests moved to lab_tetra branch
2017-07-17 00:32:30 +03:00
Rostislav Vasilikhin
aa621d6f3c magic constants explained 2017-07-06 00:30:53 +03:00
Rostislav Vasilikhin
704c688225 OCL code fixed, fix for NEON added 2017-07-05 22:08:49 +03:00
Maksim Shabunin
ce50df564c Fixed cvtColor OCL compilation issue (BGRA2mBGRA) 2017-04-05 11:48:29 +03:00
Alexander Alekhin
ba8a6e3533 ocl: don't use vload4 for 3 channel images 2017-03-03 19:36:38 +03:00
mshabunin
8c66531c42 imgproc/CLAHE/ocl: Removed unnecessary __local variable 2017-01-13 16:25:43 +03:00
Li Peng
396921dd23 5x5 gaussian blur optimization
Add new 5x5 gaussian blur kernel for CV_8UC1 format,
it is 50% ~ 70% faster than current ocl kernel in the perf test.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-12-06 09:42:37 +08:00
Li Peng
b69cdb2434 Image pyramids upsampling optimization
Add new ocl kernel for image pyramids upsampling,
It is 35% faster than current OCL kernel in perf test.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-12-02 13:54:58 +08:00
Li Peng
2ca5a7e862 more optimization for warpAffine and warpPerspective
Add new OpenCL kernels for bicubic interploation, it is 20% faster
than current warp image kernel with bicubic interploation.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-30 15:43:41 +08:00
Vadim Pisarevsky
c47267ef7f Merge pull request #7538 from Tetragramm:CLAHEfix 2016-11-29 16:42:05 +00:00
Alexander Alekhin
90b52cd9b8 Merge pull request #7726 from pengli:warp_image 2016-11-29 12:19:18 +00:00
Li Peng
b72d196753 optimization for warpAffine and warpPerspective
Add new ocl kernels for warpAffine and warpPerspective,
The average performance improvemnt is about 30%. The new
ocl kernels require CV_8UC1 format and support nearest
neighbor and bilinear interpolation.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-29 14:55:58 +08:00
Rostislav Vasilikhin
7db43f9fff fixed wrong equivalence in YUV conversion (#7481)
* fixed wrong equivalence in YUV conversion

* fixed channel order from YVU to YUV
2016-11-23 17:39:18 +03:00
Li Peng
6cb73356b1 laplacian ocl kernel optimization
This ocl kernel is 46%~171% faster than current laplacian 3x3
ocl kernel in the perf test, with image format "CV_8UC1".

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-17 12:01:02 +08:00
Li Peng
8d4a7d3dcc sobel and scharr ocl kernel optimization
It improves 108%~230% performance in the perf test
with image format "CV_8UC1" and kernel size 3.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-14 15:34:59 +08:00
Alexander Alekhin
17ffb28807 Merge pull request #7602 from mshabunin:fix-opencl-warnings 2016-11-09 12:35:00 +00:00
Li Peng
8f63f51e81 gaussian blur ocl kernel optimization
This ocl kernel is for 3x3 kernel size and CV_8UC1 format
It is 115% ~ 300% faster than current ocl path in perf test

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-08 11:22:26 +08:00
Alexander Alekhin
442380bfac Merge pull request #7585 from pengli:morph_filter 2016-11-07 17:11:32 +00:00
mshabunin
3e28d51779 Fixed several OpenCL compiler warnings 2016-11-07 16:49:12 +03:00
Li Peng
35198b84a4 morph ocl kernel for erode and dilate filter
This kernel is for CV_8UC1 format and 3x3 kernel size,
It is about 33% ~ 55% faster than current ocl kernel with below perf test

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*

Also add accuracy test cases for this kernel, the test command is

./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-04 12:24:24 +08:00
Tetragramm
17df65e666 Fix the OpenCL portion to match the c++ code.
Fix an undiscovered bug in the c++ code.
2016-11-03 20:41:16 -05:00
Vadim Pisarevsky
2b7866f21b Merge pull request #7503 from pengli:box_filter_v2 2016-10-29 21:20:06 +00:00
Li Peng
3607da9f6b ocl kernel performance optimization for box filter
The optimization is for CV_8UC1 format and 3x3 box filter,
it is 15%~87% faster than current ocl kernel with below perf test

./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*

Also add test cases for this ocl kernel.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-10-26 11:56:11 +08:00
LukeZhu
ef47bcc88b Fix the problem: filterSmall.cl report error with double 2016-10-17 15:12:42 +08:00
Alexander Alekhin
b8e08d5d3c ocl: fix Canny for Intel devices
There is an issue with processing of abs(short) function for
negative argument.

Affected OpenCL devices:
- iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0 )
- CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))
2016-08-09 12:48:06 +03:00
ohnozzy
db9f611767 Add OpenCL support to linearPolar & logPolar
Add OpenCL  support to linearPolar & logPolar.
The OpenCL code use float instead of double, so that it does not require
cl_khr_fp64 extension, with slight precision lost.

Add explicit conversion

Add explicit conversion from double to float to eliminate warning during
compilation.
2016-04-24 08:37:56 +08:00
Zhigang Gong
0b08d2559e fix potential race condition in canny.cl.
See the below code snippet:

while(l_counter != 0)
{
    int mod = l_counter % LOCAL_TOTAL;
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);

    for (int i = 0; i < pix_per_thr; ++i)
    {
        int index = atomic_dec(&l_counter) - 1;
        ....
    }
    ....
    barrier(CLK_LOCAL_MEM_FENCE);
}

If we don't put a barrier before the for loop, then there is a possiblity
that some work item enter this loop but the others are not, the the l_counter
will be reduced in the for loop and may be changed to zero, and the other
work items may can't enter the while loop. If this happens, it breaks the
barrier's rule which requires all the work items reach the same barrier.
And it may hang the GPU depends on the implementation of opencl platform.

This issue is raised at:
https://github.com/Itseez/opencv/issues/5175

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2016-03-15 19:11:15 +08:00
Zhigang Gong
0f7de40e66 Fixed the race condition between inc and dec on the l_counter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-26 22:06:18 +08:00
Zhigang Gong
3c85200989 Avoid negative index for a local buffer in Canny.cl.
int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
The pix_per_thr * LOCAL_TOTAL may be larger than l_counter.
Thus the index of l_stack may be negative which may cause serious
problems. Let's skip the loop when we get negative index and we need
to add back the lcounter to keep its balance and avoid potential
negative counter.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-26 08:48:05 +08:00
Pavel Rojtberg
1ea41e7246 fix gftt opencv kernel when using mask 2015-04-22 16:13:50 +02:00
Yan Wang
6e7050555e Optimize pyrUp_unrolled() by mad function.
It could improve performance when image size is large.
E.g. OCL_PyrUpFixture_PyrUp.PyrUp/18
2014-11-26 16:55:08 +08:00
Alexander Alekhin
569a95e9e1 Merge pull request #3394 from akarsakov:ocl_canny 2014-11-07 12:42:24 +00:00
Alexander Karsakov
7c870014fb Correctly unrolled some cycles 2014-11-07 12:13:00 +03:00
Alexander Karsakov
0ec0aeb7d0 Minor optimization for ocl_canny 2014-11-06 13:07:33 +03:00
vbystricky
957e5ef8eb Fix OpenCL version of HoughLinesP function 2014-11-05 14:31:06 +04:00
Alexander Karsakov
643c906f3d Added optimized loading to YUV2RGB_422 kernel 2014-10-28 15:07:51 +03:00
Alexander Karsakov
1466621f99 Added loading 4 pixels in line instead of 2 to RGB[A] -> YUV(420) kernel 2014-10-27 16:00:34 +03:00
Alexander Karsakov
60367907fe Used direct float calculations 2014-10-21 17:18:03 +03:00