Inlining and customizing sampling functions to reduce memory traffic and compute
Improve calcOrientation implementation.
Using more efficient rounding routines.
Removing unnecessary use of local memory
1. Added more sync to reduction.
2. Turned off Image2D feature. Probably its support is not detected correctly.
3. Temporary disabled descriptor tests - can't localize a problem of the ocl descriptor.