IPP can be switched on and off on runtime;
Optional implementation collector was added (switched off by default in CMake). Gathers data of implementation used in functions and report this info through performance TS;
TS modifications for implementations control;
1. decrease branch number in CL code by replacing them into weights
2. decrease local mem pressure in reduce operation by using private variables
3. decrease image sampler pressure by caching data into local memory
4. remove unnecessary sync point on the HOST side.
Pull requests:
#943 from jet47:cuda-5.5-support
#944 from jet47:cmake-2.8.11-cuda-fix
#912 from SpecLad:contributing
#934 from SpecLad:parallel-for
#931 from jet47:gpu-test-fixes
#932 from bitwangyaoyao:2.4_fixBFM
#918 from bitwangyaoyao:2.4_samples
#924 from pengx17:2.4_arithm_fix
#925 from pengx17:2.4_canny_tmp_fix
#927 from bitwangyaoyao:2.4_perf
#930 from pengx17:2.4_haar_ext
#928 from apavlenko:bugfix_3027
#920 from asmorkalov:android_move
#910 from pengx17:2.4_oclgfft
#913 from janm399:2.4
#916 from bitwangyaoyao:2.4_fixPyrLK
#919 from abidrahmank:2.4
#923 from pengx17:2.4_macfix
Conflicts:
modules/calib3d/src/stereobm.cpp
modules/features2d/src/detectors.cpp
modules/gpu/src/error.cpp
modules/gpu/src/precomp.hpp
modules/imgproc/src/distransform.cpp
modules/imgproc/src/morph.cpp
modules/ocl/include/opencv2/ocl/ocl.hpp
modules/ocl/perf/perf_color.cpp
modules/ocl/perf/perf_imgproc.cpp
modules/ocl/perf/perf_match_template.cpp
modules/ocl/perf/precomp.cpp
modules/ocl/perf/precomp.hpp
modules/ocl/src/arithm.cpp
modules/ocl/src/canny.cpp
modules/ocl/src/filtering.cpp
modules/ocl/src/haar.cpp
modules/ocl/src/hog.cpp
modules/ocl/src/imgproc.cpp
modules/ocl/src/opencl/haarobjectdetect.cl
modules/ocl/src/pyrlk.cpp
modules/video/src/bgfg_gaussmix2.cpp
modules/video/src/lkpyramid.cpp
platforms/linux/scripts/cmake_arm_gnueabi_hardfp.sh
platforms/linux/scripts/cmake_arm_gnueabi_softfp.sh
platforms/scripts/ABI_compat_generator.py
samples/ocl/facedetect.cpp