Add OpenCL support to linearPolar & logPolar.
The OpenCL code uses float instead of double, so it does not require the
cl_khr_fp64 extension, at the cost of a slight loss of precision.
Add explicit conversion
Add explicit conversions from double to float to eliminate warnings during
compilation.
Consider the following code snippet:

    while (l_counter != 0)
    {
        int mod = l_counter % LOCAL_TOTAL;
        int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
        for (int i = 0; i < pix_per_thr; ++i)
        {
            int index = atomic_dec(&l_counter) - 1;
            ....
        }
        ....
        barrier(CLK_LOCAL_MEM_FENCE);
    }

If there is no barrier before the for loop, some work items may enter the
for loop while others have not yet evaluated the while condition. l_counter
is decremented inside the for loop and may reach zero, in which case the
remaining work items never enter the while loop. This violates the rule that
all work items in a work-group must reach the same barrier, and, depending on
the OpenCL platform implementation, it may hang the GPU.
This issue was reported at:
https://github.com/Itseez/opencv/issues/5175
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
pix_per_thr * LOCAL_TOTAL may be larger than l_counter, so the index into
l_stack may become negative, which can cause serious problems. Leave the
loop when a negative index is obtained, and add the decrement back to
l_counter to keep it balanced and avoid a negative counter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
According to the OpenCL 1.2 specification, section 6.1.5:
    For arguments to a __kernel function declared to be a pointer to a
    data type, the OpenCL compiler can assume that the pointee is always
    appropriately aligned as required by the data type. The behavior of
    an unaligned load or store is undefined, except for the
    vloadn, vload_halfn, vstoren, and vstore_halfn functions defined in
    section 6.12.7.
The original code read data of type T from addresses that were not a
multiple of sizeof(T), so the results were incorrect. With this patch, the
test cases
    ./opencv_perf_imgproc --gtest_filter=OCL_ImgSize_TmplSize_Method_MatType_MatchTemplate.MatchTemplate/*
work correctly with Beignet 0.9.3.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
New hysteresis
Delete whitespace
Fix problem with mad24
Dynamic work group size
Fix problem with warnings
Fix some problems with border
Another fix
Delete trailing whitespace
Some changes
Fix problem with warning