Commit Graph

267 Commits

Author SHA1 Message Date
Li Peng
6cb73356b1 laplacian ocl kernel optimization
This ocl kernel is 46%~171% faster than current laplacian 3x3
ocl kernel in the perf test, with image format "CV_8UC1".

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-17 12:01:02 +08:00
Li Peng
8d4a7d3dcc sobel and scharr ocl kernel optimization
It improves 108%~230% performance in the perf test
with image format "CV_8UC1" and kernel size 3.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-14 15:34:59 +08:00
Alexander Alekhin
17ffb28807 Merge pull request #7602 from mshabunin:fix-opencl-warnings 2016-11-09 12:35:00 +00:00
Li Peng
8f63f51e81 gaussian blur ocl kernel optimization
This ocl kernel is for 3x3 kernel size and CV_8UC1 format
It is 115% ~ 300% faster than current ocl path in perf test

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-08 11:22:26 +08:00
Alexander Alekhin
442380bfac Merge pull request #7585 from pengli:morph_filter 2016-11-07 17:11:32 +00:00
mshabunin
3e28d51779 Fixed several OpenCL compiler warnings 2016-11-07 16:49:12 +03:00
Li Peng
35198b84a4 morph ocl kernel for erode and dilate filter
This kernel is for CV_8UC1 format and 3x3 kernel size,
It is about 33% ~ 55% faster than current ocl kernel with below perf test

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*

Also add accuracy test cases for this kernel, the test command is

./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*

Signed-off-by: Li Peng <peng.li@intel.com>
2016-11-04 12:24:24 +08:00
Vadim Pisarevsky
2b7866f21b Merge pull request #7503 from pengli:box_filter_v2 2016-10-29 21:20:06 +00:00
Li Peng
3607da9f6b ocl kernel performance optimization for box filter
The optimization is for CV_8UC1 format and 3x3 box filter,
it is 15%~87% faster than current ocl kernel with below perf test

./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*

Also add test cases for this ocl kernel.

Signed-off-by: Li Peng <peng.li@intel.com>
2016-10-26 11:56:11 +08:00
LukeZhu
ef47bcc88b Fix the problem: filterSmall.cl report error with double 2016-10-17 15:12:42 +08:00
Alexander Alekhin
b8e08d5d3c ocl: fix Canny for Intel devices
There is an issue with processing of abs(short) function for
negative argument.

Affected OpenCL devices:
- iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0 )
- CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))
2016-08-09 12:48:06 +03:00
ohnozzy
db9f611767 Add OpenCL support to linearPolar & logPolar
Add OpenCL  support to linearPolar & logPolar.
The OpenCL code use float instead of double, so that it does not require
cl_khr_fp64 extension, with slight precision lost.

Add explicit conversion

Add explicit conversion from double to float to eliminate warning during
compilation.
2016-04-24 08:37:56 +08:00
Zhigang Gong
0b08d2559e fix potential race condition in canny.cl.
See the below code snippet:

while(l_counter != 0)
{
    int mod = l_counter % LOCAL_TOTAL;
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);

    for (int i = 0; i < pix_per_thr; ++i)
    {
        int index = atomic_dec(&l_counter) - 1;
        ....
    }
    ....
    barrier(CLK_LOCAL_MEM_FENCE);
}

If we don't put a barrier before the for loop, then there is a possiblity
that some work item enter this loop but the others are not, the the l_counter
will be reduced in the for loop and may be changed to zero, and the other
work items may can't enter the while loop. If this happens, it breaks the
barrier's rule which requires all the work items reach the same barrier.
And it may hang the GPU depends on the implementation of opencl platform.

This issue is raised at:
https://github.com/Itseez/opencv/issues/5175

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2016-03-15 19:11:15 +08:00
Zhigang Gong
0f7de40e66 Fixed the race condition between inc and dec on the l_counter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-26 22:06:18 +08:00
Zhigang Gong
3c85200989 Avoid negative index for a local buffer in Canny.cl.
int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
The pix_per_thr * LOCAL_TOTAL may be larger than l_counter.
Thus the index of l_stack may be negative which may cause serious
problems. Let's skip the loop when we get negative index and we need
to add back the lcounter to keep its balance and avoid potential
negative counter.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-26 08:48:05 +08:00
Pavel Rojtberg
1ea41e7246 fix gftt opencv kernel when using mask 2015-04-22 16:13:50 +02:00
Yan Wang
6e7050555e Optimize pyrUp_unrolled() by mad function.
It could improve performance when image size is large.
E.g. OCL_PyrUpFixture_PyrUp.PyrUp/18
2014-11-26 16:55:08 +08:00
Alexander Alekhin
569a95e9e1 Merge pull request #3394 from akarsakov:ocl_canny 2014-11-07 12:42:24 +00:00
Alexander Karsakov
7c870014fb Correctly unrolled some cycles 2014-11-07 12:13:00 +03:00
Alexander Karsakov
0ec0aeb7d0 Minor optimization for ocl_canny 2014-11-06 13:07:33 +03:00
vbystricky
957e5ef8eb Fix OpenCL version of HoughLinesP function 2014-11-05 14:31:06 +04:00
Alexander Karsakov
643c906f3d Added optimized loading to YUV2RGB_422 kernel 2014-10-28 15:07:51 +03:00
Alexander Karsakov
1466621f99 Added loading 4 pixels in line instead of 2 to RGB[A] -> YUV(420) kernel 2014-10-27 16:00:34 +03:00
Alexander Karsakov
60367907fe Used direct float calculations 2014-10-21 17:18:03 +03:00
Alexander Karsakov
5aa9ac9a77 Added OCL code for YUV422 -> RGB[A]|BGR[A] color conversion 2014-10-21 17:18:03 +03:00
Alexander Karsakov
c8707b891b Added OCL code for RGB[A]|BGR[A] -> YUV_[YV12|IYUV] color conversion 2014-10-21 17:18:03 +03:00
Alexander Karsakov
1cc17a7186 Added OCL code for YUV2BGR_YV12 and YUV2BGR_IYUV color conversions 2014-10-21 17:18:02 +03:00
Alexander Karsakov
85b60ee3cb Added support for YUV2RGB[A]_NV21 and YUV2BGR[A]_NV21 conversion 2014-10-21 17:18:02 +03:00
Vadim Pisarevsky
397870d7a5 Merge pull request #3279 from akarsakov:ocl_houghlines 2014-10-09 14:56:45 +00:00
Alexander Karsakov
66a8acfd3d Optimization for HoughLinesP 2014-10-07 17:53:33 +04:00
Alexander Alekhin
14d5358982 Merge pull request #3210 from akarsakov:ocl_gftt_opt 2014-10-07 09:06:54 +00:00
Alexander Karsakov
eaf5a163b1 Added HoughLinesP OCL implementation 2014-09-29 16:48:16 +04:00
Alexander Karsakov
3695a31606 Combined counter and corner buffers into one 2014-09-29 11:10:57 +04:00
Vadim Pisarevsky
470f427a95 Merge pull request #3232 from Chuanbo-Weng:master 2014-09-18 11:48:29 +00:00
Chuanbo Weng
c5552788c5 Use vload to read unaligned data instead of dereference operator.
According to opencl 1.2 spec 6.1.5:
    For arguments to a __kernel function declared to be a pointer to a
    data type, the OpenCL compiler can assume that the pointee is always
    appropriately aligned as required by the data type. The behavior of
    an unaligned load or store is undefined, except for the
    vloadn, vload_halfn, vstoren, and vstore_halfn functions defined in
    section 6.12.7.

Original code read data of type T from address not aligned by multiple
of sizeof(T), so the result is incorrect. With this patch, the cases
./opencv_perf_imgproc
--gtest_filter=OCL_ImgSize_TmplSize_Method_MatType_MatchTemplate.MatchTemplate/*
could work well with beignet 0.9.3.

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
2014-09-17 19:28:07 +08:00
Alexander Karsakov
8c08714b8c Remove two "set" kernel call 2014-09-11 18:11:23 +04:00
vbystricky
b0bf8478e5 Optimization OpenCL version of Filter2D 2014-09-11 12:59:51 +04:00
Alexander Karsakov
39b27a19be Refactoring and optimization 2014-09-05 12:20:29 +04:00
Alexander Karsakov
d59a6fa518 Optimization for getLines 2014-09-05 11:37:16 +04:00
Alexander Karsakov
fee8f29f48 Refactoring, minor optimization 2014-09-04 16:31:30 +04:00
Alexander Karsakov
07d57db91c Fixed calculation of l_stack_size 2014-09-03 17:40:17 +04:00
Alexander Karsakov
214dab39f6 Fixed BORDER_REFLECT and BORDER_REFLECT_101 extrapolation for case x > 2*maxV 2014-09-02 11:53:31 +04:00
Alexander Alekhin
b332152bef Merge pull request #2956 from ilya-lavrenov:tapi_accumulate 2014-08-28 09:08:51 +00:00
Alexander Karsakov
f7aadd07f6 Added getLines, fill_accum_local kernels 2014-08-27 17:57:22 +04:00
Vadim Pisarevsky
d66815978a Merge pull request #3117 from KruchDmitriy:canny_opt 2014-08-27 10:07:55 +00:00
VBystricky
9ee0789174 Fix issues 2014-08-26 14:39:11 +04:00
vbystricky
e75cd74f5a Optimize OpenCL version of Laplacian filter for kernel size great than 3 2014-08-25 17:56:09 +04:00
Alexander Karsakov
038bfb98ec Added fill_accum kernel 2014-08-25 13:55:09 +04:00
Ilya Lavrenov
a350b76738 optimization of cv::accumulate** 2014-08-25 11:25:01 +04:00
Alexander Karsakov
5c1f71de51 Added make_point_list kernel 2014-08-22 16:50:01 +04:00