Commit Graph

336 Commits

Author SHA1 Message Date
Yuantao Feng
3afe8ddaf8
core: Rename cv::float16_t to cv::hfloat (#25217)
* rename cv::float16_t to cv::fp16_t

* add typedef fp16_t float16_t

* remove zero(), bits() from fp16_t class

* fp16_t -> hfloat

* remove cv::float16_t::fromBits; add hfloatFromBits

* undo changes in conv_winograd_f63.simd.hpp and conv_block.simd.hpp

* undo some changes in dnn
2024-03-21 23:44:19 +03:00
Alexander Alekhin
40533dbf69
Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918

relates #24909
relates #24917
relates #24892

Performance changes:

- [x] 12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map).
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).


- [x] 12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED.
dGPU has much more power, so results are x16-17 better than single cpu core. 
Patched version is not worse than convertFp16 and convertTo baseline.
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 12:56:52 +03:00
Alexander Smorkalov
23f27d8dbe Use OpenCV logging instead of std::cerr. 2023-07-19 10:49:54 +03:00
Alexander Smorkalov
d4861bfd1f Merge remote-tracking branch 'origin/3.4' into merge-3.4 2023-05-24 14:37:48 +03:00
Sean McBride
58e4a880a2 Deprecated convertTypeStr and made new variant that also takes the buffer size
This allows removing the unsafe sprintf.
2023-04-26 09:48:15 -04:00
Giles Payne
38e35d5137 Fix ocl::device::isIntel implementation 2023-04-24 22:01:53 +09:00
Sean McBride
47bea69322
Merge pull request #23055 from seanm:sprintf2
* Replaced most remaining sprintf with snprintf
* Deprecated encodeFormat and introduced new method that takes the buffer length
* Also increased buffer size at call sites to be a little bigger, in case int is 64 bit
2023-04-18 09:22:59 +03:00
Namgoo Lee
24547f40ff remove const from functions returning by value 2022-05-26 21:30:41 +09:00
Alexander Alekhin
92925e846d Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-12-29 15:22:13 +00:00
Alexander Alekhin
c5a86c22a4 core(ocl): add option to abort on OpenCL kernel build failure
- exceptions are catched by fallback CPU path
- OPENCV_OPENCL_ABORT_ON_BUILD_ERROR (disabled by default)
2021-12-29 00:04:45 +00:00
Alexander Alekhin
c3ac834526 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-09-11 21:27:26 +00:00
Dale Phurrough
de1a459879
fix opencv/opencv#20613
* copy 4.x selectOpenCLDevice() -- it is compatible
* filter platforms rather than trying only first matching
* this works on 3.4 and 4.x master
2021-09-10 21:18:01 +02:00
Alexander Alekhin
5578ad5e14 dnn(ocl): fix automatic globalsize adjusting
- if kernel code doesn't support that
2021-09-06 03:11:29 +00:00
Alexander Alekhin
aaff125608 core(ocl): debug capabilities 2021-09-04 15:37:39 +00:00
Alexander Alekhin
5aa7435d25 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-09-02 15:24:04 +00:00
Alexander Alekhin
f25951c412 core(ocl): handle NULL in dumpValue() debug call
- NULL is used for allocation of workgroup local variables
2021-08-30 11:47:51 +00:00
Alexander Alekhin
4c05a697fa Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-08-28 21:30:28 +00:00
Dale Phurrough
54a9e00970
fix opencv/opencv#20594 - exception handling with refcounts 2021-08-25 14:38:02 +02:00
Joe Howse
6a3d925a47 OpenCL: core support for FP16, more channel orders
* Support cl_image conversion for CL_HALF_FLOAT (float16)

* Support cl_image conversion for additional channel orders:
  CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG, CL_RA

* Comment on why cl_image conversion is unsupported for CL_RGB

* Predict optimal vector width for float16

* ocl::kernelToStr: support float16

* ocl::Device::halfFPConfig: drop artificial requirement for OpenCL
  version >= 1.2. Even OpenCL 1.0 supports the underlying config
  property, CL_DEVICE_HALF_FP_CONFIG.

* dumpOpenCLInformation: provide info on OpenCL half-float support
  and preferred half-float vector width

* randu: support default range [-1.0, 1.0] for float16

* TestBase::warmup: support float16
2021-06-30 14:14:37 -03:00
JoeHowse
34183237ce
Merge pull request #20203 from JoeHowse:clMath-patches
Fix dynamic loading of clBLAS and clFFT (formerly, clAmdBlas and clAmdFft)

* Fix dynamic loading of clBLAS and clFFT

* Update filenames and function names for clBLAS (formerly, clAmdBlas)

* Update filenames and function names for clFFT (formerly, clAmdFft)

* Uncomment teardown of clFFT; tear down clFFT in same way as clBLAS

* Fix generators for clBLAS and clFFT headers

* Update generators to parse recent clBLAS and clFFT library headers

* Update generators to be compatible with Python 3

* Re-generate OpenCV's clBLAS and clFFT headers

* Update function calls to match names in newly generated headers

* Disable (and comment on) teardown code for clBLAS and clFFT

* Renaming *clamd* files

* Renaming *clamdblas* files to *clblas*

* Renaming *clamdfft* files to *clfft*

* Update generator for CL headers

* Update generator to be compatible with Python 3
2021-06-07 20:24:27 +00:00
Dale Phurrough
c2ce3d927a
UMat usageFlags fixes opencv/opencv#19807
- corrects code to support non- USAGE_DEFAULT settings
- accuracy, regression, perf test cases
- not tested on the 3.x branch
2021-06-03 16:33:03 +02:00
Alexander Alekhin
cb51a155b2 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-05-29 19:00:14 +00:00
Alexander Alekhin
450dc92452 Merge pull request #20172 from alalek:fixup_19334 2021-05-28 14:09:52 +00:00
Alexander Alekhin
3d394943e6 core(ocl): avoid limit of Image kernel args 2021-05-28 00:43:59 +00:00
Mikhail Nikolskii
a604d44d06
Merge pull request #19755 from mikhail-nikolskiy:ffmpeg-umat
cv::UMat output/input in VideoCapture/VideoWriter (data stays in GPU memory)

* FFMPEG with UMat input/output

* OpenCL_D3D* context

* fix Linux build

* cosmetic changes

* fix build if USE_AV_HW_CODECS=0

* simplify how child context pointer stored in parent context

* QSV interop with OpenCL on Windows

* detect_msdk.cmake via pkg-config

* fix av_buffer_ref() usage

* revert windows-decode-mfx whitelisting; remove debug msg

* address review comments

* rename property to HW_ACCELERATION_USE_OPENCL

* fix issue with "cl_khr_d3d11_sharing" extension not reported by OpenCL GPU+CPU platform

* core(ocl): add OpenCL stubs for configurations without OpenCL

* videoio(ffmpeg): update #if guards

* Put OpenCL related code under HAVE_OPENCL; simplify reuse of media context from OpenCL context

* videoio(test): skip unsupported tests

- plugins don't support OpenCL/UMat yet
- change handling of *_USE_OPENCL flag

* videoio(ffmpeg): OpenCL dependency

* videoio(ffmpeg): MediaSDK/oneVPL dependency

* cleanup, logging

* cmake: fix handling of 3rdparty interface targets

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2021-05-14 16:48:50 +00:00
Alexander Alekhin
1d0334fc07 Merge pull request #19584 from diablodale:fix19573_ocl_move 2021-02-21 19:20:03 +00:00
Dale Phurrough
96a15434a2
add move construct/assigns to cv::ocl main classes
- enables inline construct and assigns with r-values
- enables compiler-created default move
  construct/assigns
- includes test cases
2021-02-20 18:56:04 +01:00
Dale Phurrough
4badf640bf add noexcept to default constructors of cv::ocl
- follows iso c++ guideline C.44
- enables default compiler-created constructors to
  also be noexcept

original commit: 77e26a7db3

- handled KernelArg, Image2D
2021-02-20 16:20:25 +00:00
Dale Phurrough
77e26a7db3
add noexcept to default constructors of cv::ocl
- follows iso c++ guideline C.44
- enables default compiler-created constructors to
  also be noexcept
2021-02-20 14:16:47 +01:00
Alexander Alekhin
e85b41f9be Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-01-25 22:42:13 +00:00
Alexander Alekhin
857f339914 Merge pull request #19385 from alalek:ocl_isOpenCLActivated_update 2021-01-25 13:54:00 +00:00
Alexander Alekhin
37e656082b core(ocl): update isOpenCLActivated()
- reuse g_isOpenCLAvailable variable instead
2021-01-24 01:25:17 +00:00
Alexander Alekhin
cd59516433 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2021-01-22 21:29:21 +00:00
Alexander Alekhin
212815a10d core(ocl): fix lifetime handling of Image kernel args 2021-01-18 06:24:36 +00:00
Alexander Alekhin
15265918a7 Merge pull request #19133 from diablodale:fix19132-opencvactivated 2020-12-23 12:08:38 +00:00
Alexander Alekhin
624d532000 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-12-17 21:05:34 +00:00
Alexander Alekhin
d159417474 Merge pull request #19101 from alalek:issue_5209 2020-12-16 22:13:18 +00:00
Dale Phurrough
bb59b81d82
remove g_isOpenCVActivated assign and clarify 2020-12-16 00:27:32 +01:00
Alexander Alekhin
4b3d2c8834 dnn(ocl): fix gemm kernels with beta=0
- dst is not initialized, may include NaN values
- 0*NaN produces NaN
2020-12-15 00:58:43 +00:00
Alexander Alekhin
392991fa0b core(opencl): add version check before clCreateFromGLTexture() call 2020-12-13 20:57:26 +00:00
Dale Phurrough
c08e38acd0
fix missing addref() in ocl::Context::create(str)
- fix https://github.com/opencv/opencv/issues/18906
- unable to add related test cases as there is
  no public access to Context:Impl refcounts
2020-11-25 01:53:41 +01:00
masa-iwm
5ac0712cf1
Merge pull request #18593 from masa-iwm:master
Add support thread-local directx (OpenCL interop) initialization

* support thread-local directx (OpenCL interop) initialization

* reflect reviews

* Remove verbose function prototype declarations

* Countermeasures for VC warnings. (declaration of 'platform' hides class member)

* core(directx): remove internal stuff from public headers
2020-10-18 21:22:06 +00:00
Alexander Alekhin
c945ea125a ocl: fix PlatformInfo usage 2020-09-25 19:22:12 +00:00
Alexander Alekhin
f52a2cf5e1 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-09-19 17:03:08 +00:00
Alexander Alekhin
4fa82809df ocl: avoid rescheduling of async kernels 2020-09-18 14:53:50 +00:00
Alexander Alekhin
8711653530 ocl: fixes for OpenCL multiple contexts support 2020-09-03 20:34:49 +00:00
Alexander Alekhin
2129c72bc0 core(OpenCL): thread-local OpenCL execution context 2020-09-02 05:04:20 +00:00
Alexander Alekhin
0428dce27d Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-09-01 20:59:00 +00:00
Alexander Alekhin
efcf307b4c ocl: cleanup dead code in case of disabled OpenCL 2020-08-31 11:30:42 +00:00
Alexander Alekhin
b45273eccb Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-08-14 19:45:45 +00:00