- new reduce implementation (with Kepler optimizations)
- saturate_cast via inline asm
- video SIMD instructions in element operations
- float arithmetic instead of double
- new deviceSupports function
wrote more complicated tests for them
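
For reference, the behaviour that a saturate_cast to an 8-bit unsigned value is expected to have can be sketched in portable C++. This is only an illustration of the semantics, not the asm-based code from this change; the names saturate_u8_from_int and saturate_u8_from_float are hypothetical, and rounding halves to even in the default rounding mode is an assumption.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical reference helpers (not the OpenCV API): clamp to [0, 255].
inline std::uint8_t saturate_u8_from_int(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<std::uint8_t>(v);
}

inline std::uint8_t saturate_u8_from_float(float v) {
    // Clamp before rounding so huge values never reach lrintf.
    // The negated comparison also sends NaN to 0.
    if (!(v > 0.0f)) return 0;
    if (v >= 255.0f) return 255;
    // lrintf rounds using the current rounding mode
    // (round-to-nearest-even by default).
    return static_cast<std::uint8_t>(std::lrintf(v));
}
```

A device version would be expected to return the same values for the same inputs, which makes helpers like these usable as a host-side oracle in tests.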
implemented own versions of warpAffine and warpPerspective for different border interpolation types
refactored some gpu tests
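
The border handling mentioned for warpAffine and warpPerspective comes down to mapping an out-of-range source coordinate back into the image. A minimal host-side sketch of that mapping follows, assuming the usual replicate / reflect / reflect-101 / wrap modes; the enum and function names are hypothetical and this is not the GPU code from this change. The constant-border mode is left out because it returns a fill value rather than an index.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.
enum class Border { Replicate, Reflect, Reflect101, Wrap };

// Maps coordinate i onto a valid index in [0, len).
inline int border_index(int i, int len, Border mode) {
    assert(len > 0);
    if (i >= 0 && i < len) return i;          // already inside the image
    switch (mode) {
    case Border::Replicate:                   // aaaa|abcdefgh|hhhh
        return i < 0 ? 0 : len - 1;
    case Border::Wrap: {                      // efgh|abcdefgh|abcd
        int r = i % len;
        return r < 0 ? r + len : r;
    }
    case Border::Reflect:                     // dcba|abcdefgh|hgfe
    case Border::Reflect101: {                // edcb|abcdefgh|gfed
        if (len == 1) return 0;
        const int delta = (mode == Border::Reflect101) ? 1 : 0;
        // Mirror repeatedly until the index falls inside the image.
        do {
            if (i < 0) i = -i - 1 + delta;
            else       i = len - 1 - (i - len) - delta;
        } while (i < 0 || i >= len);
        return i;
    }
    }
    return 0;
}
```

Applying the mapping separately to the x and y coordinates gives the source pixel to read for any destination pixel whose back-projected position falls outside the image.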