mirror of
https://github.com/opencv/opencv.git
synced 2024-12-14 00:39:13 +08:00
2f35847960
2x more accurate float => bfloat conversion #26321

There is a trick that makes `float => bfloat` conversion more accurate (_original reference needed, is it done this way in PyTorch?_). In simplified form it looks like this:

```
uint16_t f2bf(float x)
{
    union { unsigned u; float f; } u;
    u.f = x;
    // return (uint16_t)(u.u >> 16); <== the old method before this patch
    return (uint16_t)((u.u + 0x8000) >> 16);
}
```

It works correctly for almost all valid floating-point values, positive, zero or negative, and even for some extreme cases, like `+/-inf`, `nan` etc. Adding `0x8000` to the integer representation of the 32-bit float before taking the highest 16 bits reduces the rounding error by ~2x. The slight problem with this improved method is that numbers very close to or equal to `+/-FLT_MAX` are mistakenly converted to `+/-inf`, respectively.

This patch implements an improved algorithm for `float => bfloat` conversion in scalar and vector form. It fixes the above-mentioned problem using some extra bit magic, i.e. `0x8000` is not added to numbers that are very big by absolute value:

```
// the actual implementation is more efficient,
// without conditions or floating-point operations, see the source code
return (uint16_t)((u.u + (fabsf(x) <= big_threshold ? 0x8000 : 0)) >> 16);
```

The corresponding test has been added as well. This is the output of the test:

```
[----------] 1 test from Core_BFloat
[ RUN      ] Core_BFloat.convert
maxerr0 = 0.00774842, mean0 = 0.00190643, stddev0 = 0.00186063
maxerr1 = 0.00389057, mean1 = 0.000952614, stddev1 = 0.000931268
[       OK ] Core_BFloat.convert (7 ms)
```

Here `maxerr0, mean0, stddev0` are for the original method and `maxerr1, mean1, stddev1` are for the new method. The maximum error, the mean error and the standard deviation all drop by roughly half.

**Note:** _Actually, on ~32,000,000 random FP32 numbers with uniformly distributed sign, exponent and mantissa the new method is always at least as accurate as the old one._ The test also checks all the corner cases, where we see no degradation either vs the original method.

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
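
The rounding trick and the `FLT_MAX` guard described in the commit message above can be illustrated with a minimal, self-contained C++ sketch. This is not the code from the patch: the commit message says the actual implementation avoids conditions entirely, while this sketch uses a plain integer comparison for readability. The helper names (`float_bits`, `bf2f`, `f2bf_trunc`, `f2bf_round`, `check_examples`) and the cutoff constant `0x7f7f8000` are assumptions chosen for illustration. The sketch also uses `memcpy` instead of the union from the simplified snippet, since reading an inactive union member is not well-defined in C++.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Copy the raw IEEE 754 bits of a float into an integer.
// memcpy is used instead of a union to stay within defined C++ behavior.
static inline uint32_t float_bits(float x)
{
    uint32_t u;
    std::memcpy(&u, &x, sizeof(u));
    return u;
}

// Widen a bfloat16 bit pattern back to float (the low 16 bits become zero).
static inline float bf2f(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float x;
    std::memcpy(&x, &u, sizeof(x));
    return x;
}

// Old method: keep the highest 16 bits, which truncates toward zero.
static inline uint16_t f2bf_trunc(float x)
{
    return (uint16_t)(float_bits(x) >> 16);
}

// Improved method: add 0x8000 before the shift, which rounds to the nearest
// bfloat16 value (ties round away from zero in this sketch).
static inline uint16_t f2bf_round(float x)
{
    uint32_t u = float_bits(x);
    uint32_t mag = u & 0x7fffffffu; // bit pattern without the sign bit

    // ASSUMPTION: 0x7f7f8000 is an illustrative cutoff, not taken from the patch.
    // At or above it, adding 0x8000 would carry into the all-ones exponent
    // (turning a finite value into inf), so the bias is skipped and the value
    // is truncated instead. inf and nan patterns also fall in this range.
    uint32_t bias = mag < 0x7f7f8000u ? 0x8000u : 0u;
    return (uint16_t)((u + bias) >> 16);
}

// A few hand-checked cases.
static void check_examples()
{
    float tie = 1.00390625f;                  // 1 + 2^-8, bits 0x3f808000
    assert(f2bf_trunc(tie) == 0x3f80);        // truncation drops the low half
    assert(f2bf_round(tie) == 0x3f81);        // rounding moves up one step
    assert(f2bf_round(FLT_MAX) == 0x7f7f);    // stays finite thanks to the guard
    assert(f2bf_round(-FLT_MAX) == 0xff7f);
    assert(std::isinf(bf2f(f2bf_round(INFINITY))));
}
```

Calling `check_examples()` from a `main` function exercises the cases above. Without the guard, `FLT_MAX` (bits `0x7f7fffff`) plus `0x8000` becomes `0x7f807fff`, whose upper half `0x7f80` is the bfloat16 pattern for `+inf`, which is exactly the problem the commit message describes.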
Name | ||
---|---|---|
ocl | ||
ref_reduce_arg.impl.hpp | ||
test_allocator.cpp | ||
test_arithm.cpp | ||
test_async.cpp | ||
test_concatenation.cpp | ||
test_conjugate_gradient.cpp | ||
test_countnonzero.cpp | ||
test_cuda.cpp | ||
test_downhill_simplex.cpp | ||
test_ds.cpp | ||
test_dxt.cpp | ||
test_eigen.cpp | ||
test_hal_core.cpp | ||
test_hasnonzero.cpp | ||
test_intrin128.simd.hpp | ||
test_intrin256.simd.hpp | ||
test_intrin512.simd.hpp | ||
test_intrin_emulator.cpp | ||
test_intrin_utils.hpp | ||
test_intrin.cpp | ||
test_io.cpp | ||
test_logtagconfigparser.cpp | ||
test_logtagmanager.cpp | ||
test_lpsolver.cpp | ||
test_main.cpp | ||
test_mat.cpp | ||
test_math.cpp | ||
test_misc.cpp | ||
test_opencl.cpp | ||
test_operations.cpp | ||
test_precomp.hpp | ||
test_ptr.cpp | ||
test_quaternion.cpp | ||
test_rand.cpp | ||
test_rotatedrect.cpp | ||
test_umat.cpp | ||
test_utils_tls.impl.hpp | ||
test_utils.cpp |