Parallelize Canny with custom gradient (#8694)
* New Canny implementation. Restructuring code in parallelCanny class. Align mag buffer and map.
* Fix warnings.
* Missing SIMD check added.
* Replaced local trailingZeros in contours.cpp. Use alignSize in canny.cpp
* Fix warnings in alignSize and allocate just minimum extra columns.
* Fix another warning in map.create.
* Exchange for loop by do loop to avoid double check at the beginning.
Define extra SIMD CANNY_CHECK to avoid unnecessary continue.
* Correct the existing documented T-API functions to match the doxygen format.
* docs: fix comments style
* T-API documentation: minor formatting changes
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
Add support for std::array<T, N> (#8535)
* Add support for std::array<T, N>
* Add std::array<Mat, N> support
* Remove UMat constructor with std::array parameter
Gemm kernels for Intel GPU (#8104)
* Fix an issue with Kernel object reset release when consecutive Kernel::run calls
Kernel::run launches OCL GPU kernels and sets an event callback function
to decrease the ref count of the UMat or remove the UMat when the launched
workloads are completed. However, some OCL kernels require multiple calls of
the Kernel::run function with some kernel parameter changes (e.g., input
and output buffer offset) to get the final computation result.
In that case, the current implementation requires unnecessary
synchronization and cleanupMat.
This fix requires the user to specify whether there will be more work or not.
If there is no remaining computation, the Kernel::run will reset the
kernel object
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* GEMM kernel optimization for Intel GEN
The optimized kernels use the cl_intel_subgroups extension for better
performance.
Note: These optimized kernels will be part of ISAAC in a code generation
way under MIT license.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Fix API compatibility error
This patch fixes an OCV API compatibility error. The error was reported
due to the interface changes of Kernel::run. To resolve the issue,
an overloaded function of Kernel::run is added. It takes a flag indicating
whether there is more work to be done with the kernel object without
releasing resources related to it.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Renaming intel_gpu_gemm.cpp to intel_gpu_gemm.inl.hpp
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Revert "Fix API compatibility error"
This reverts commit 2ef427db91.
Conflicts:
modules/core/src/intel_gpu_gemm.inl.hpp
* Revert "Fix an issue with Kernel object reset release when consecutive Kernel::run calls"
This reverts commit cc7f9f5469.
* Fix the case of uninitialized D
When C is null and beta is non-zero, D is used without initialization.
This resolves the issue.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* fix potential output error due to 0 * nan
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
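The two fixes above both concern the GEMM update D = alpha * A * B + beta * C. The following is a minimal scalar sketch of that handling, not the OpenCL kernel from this patch: the name `gemmReference`, the row-major layout, and the choice to treat a null C as contributing nothing are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM: D = alpha * A * B + beta * C (row-major).
// A is M x K, B is K x N, C (optional) and D are M x N.
std::vector<float> gemmReference(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 const float* C,  // may be nullptr
                                 float alpha, float beta,
                                 std::size_t M, std::size_t N, std::size_t K)
{
    assert(A.size() == M * K && B.size() == K * N);

    std::vector<float> D(M * N, 0.0f);  // D always starts from a defined value
    const bool useC = (C != nullptr) && (beta != 0.0f);

    for (std::size_t i = 0; i < M; ++i)
    {
        for (std::size_t j = 0; j < N; ++j)
        {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];

            float d = alpha * acc;
            // Skip the term entirely instead of computing 0 * C[...]:
            // under IEEE 754, 0 * NaN is NaN and would corrupt the output.
            if (useC)
                d += beta * C[i * N + j];
            D[i * N + j] = d;
        }
    }
    return D;
}
```

With beta equal to zero, a NaN stored in C never reaches D, and a null C with a non-zero beta yields alpha * A * B instead of reading uninitialized memory.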
* whitespace fix, eliminate non-ASCII symbols
* fix build warning
- use suffixes like '.avx.cpp'
- added CMake-generated files for '.simd.hpp' optimization approach
- wrap HAL intrinsic headers into separate namespaces for different build flags
- automatic vzeroupper insertion (via CV_INSTRUMENT_REGION macro)
`template<typename _Tp> inline const _Tp* Mat_<_Tp>::operator [](int y) const` does not support 3d matrix since it checks rows.
This operator[] shall check size.p[0] instead.
* Fix the documentation for Mat::diag(int).
Fix issue #8181
* Fix the documentation for Mat::diag(int).
Fix issue #8181.
* Add support for printing out cv::Complex.
* Remove extra spaces.
* cv::Complex is submitted as a new pull request.
Add support for printing out cv::Complex. (#8208)
* Add support for printing out cv::Complex.
* Conform to the format of std::complex.
* Remove extra spaces.
* Remove extra spaces.
Fix typos in the documentation for AutoBuffer. (#8197)
* Allocate 1000 floats to match the documentation
Fix the documentation of `AutoBuffer`. By default, the following code
```.cpp
cv::AutoBuffer<float> m;
```
allocates only 264 floats. But the comment in the demonstration code says it allocates 1000 floats, which is
not correct.
* fix typo in the comment.
Currently, to select a submatrix of a N-dimensional matrix, it requires
two lines of code while only one line of code is required if using a 2D
array.
I added functionality to be able to select an N-dim submatrix using a
vector list instead of a Range pointer. This allows initializer lists to
be used for a one-line selection.
This allows for an N-dimensional array to be setup in one line instead of two when using C++11 initializer lists. cv::Mat(3, {zDim, yDim, xDim}, ...) can be used instead of having to create an int pointer to hold the size array.
Maximum depth limit var was added to the instrumentation structure;
Trace names output console output fix: improper tree formatting could happen;
Output in case of error was added;
Custom regions improvements;
Improved timing and weight calculation for parallel regions; New TC (threads counter) value to indicate how many different threads accessed particular node;
parallel_for, warnings fixes and ReturnAddress code from Alexander Alekhin;
* use hasSIMD128 rather than calling checkHardwareSupport
* add SIMD check in spatialgradient.cpp
* add SIMD check in stereosgbm.cpp
* add SIMD check in canny.cpp
A bug in ICC improperly identified the first parameter as "void*"
rather than the proper "volatile long*". This is scheduled to be
fixed in ICC in a future release.
This patch casts only to a "long*" to preserve backwards compatibility
with the ICC 16 and ICC 17 releases.
[GSOC] New camera model for stitching pipeline
* implement estimateAffine2D
estimates affine transformation using robust RANSAC method.
* uses RANSAC framework in calib3d
* includes accuracy test
* uses SVD decomposition for solving 3 point equation
* implement estimateAffinePartial2D
estimates limited affine transformation
* includes accuracy test
* stitching: add affine matcher
initial version of matcher that estimates affine transformation
* stitching: added affine transform estimator
initial version of estimator that simply chain transformations in homogeneous coordinates
* calib3d: rename estimateAffine3D test
test Calib3d_EstimateAffineTransform renamed to Calib3d_EstimateAffine3D. This is more descriptive and prevents confusion with estimateAffine2D tests.
* added perf test for estimateAffine functions
tests both estimateAffine2D and estimateAffinePartial2D
* calib3d: compare error in square in estimateAffine2D
* incorporates fix from #6768
* rerun affine estimation on inliers
* stitching: new API for parallel feature finding
due to ABI breakage new functionality is added to `FeaturesFinder2`, `SurfFeaturesFinder2` and `OrbFeaturesFinder2`
* stitching: add tests for parallel feature find API
* perf test (about linear speed up)
* accuracy test compares results with serial version
* stitching: use dynamic_cast to overcome ABI issues
adding parallel API to FeaturesFinder breaks ABI. This commit uses dynamic_cast and hardcodes thread-safe finders to avoid breaking ABI.
This should be replaced by proper method similar to FeaturesMatcher on next ABI break.
* use estimateAffinePartial2D in AffineBestOf2NearestMatcher
* add constructor to AffineBestOf2NearestMatcher
* allows choosing between full affine transform and partial affine transform. Other params are the same as for BestOf2NearestMatcher
* added protected field
* samples: stitching_detailed support affine estimator and matcher
* added new flags to choose matcher and estimator
* stitching: rework affine matcher
represent transformation in homogeneous coordinates
affine matcher: remove duplicate code
rework flow to get rid of duplicate code
affine matcher: do not center points to (0, 0)
it is not needed for affine model. it should not affect estimation in any way.
affine matcher: remove unneeded cv namespacing
* stitching: add stub bundle adjuster
* adds stub bundle adjuster that does nothing
* can be used in place of standard bundle adjusters to omit bundle adjusting step
* samples: stitching detailed, support no bundle adjust
* uses new NoBundleAdjuster
* added affine warper
* uses R to get whole affine transformation and propagates rotation and translation to plane warper
* add affine warper factory class
* affine warper: compensate transformation
* samples: stitching_detailed add support for affine warper
* add Stitcher::create method
this method follows similar constructor methods and returns smart pointer. This allows constructing Stitcher according to OpenCV guidelines.
* supports multiple stitcher configurations (PANORAMA and SCANS) for convenient setup
* returns cv::Ptr
* stitcher: dynamically determine correct estimator
we need to use affine estimator for affine matcher
* preserves ABI (but add hints for ABI 4)
* uses dynamic_cast hack to inject correct estimator
* sample stitching: add support for multiple modes
shows how to use different configurations of stitcher easily (panorama stitching and scans affine model)
* stitcher: find features in parallel
use new FeatureFinder API to find features in parallel. Parallelized using TBB.
* stitching: disable parallel feature finding for OCL
it does not bring much speedup to run features finder in parallel when OpenCL is enabled, because finder needs to wait for OCL device.
Also, currently ORB is not thread-safe when OCL is enabled.
* stitching: move matcher tests
move matchers tests perf_stich.cpp -> perf_matchers.cpp
* stitching: add affine stitching integration test
test basic affine stitching (SCANS mode of stitcher) with images that have only translation between them
* enable surf for stitching tests
stitching.b12 test was failing with surf
investigated the issue, surf is producing good result. Transformation is only slightly different from ORB, so that resulting pano does not exactly match ORB's result. That caused sanity check to fail.
* added size checks similar to other tests
* sanity check will be applied only for ORB
* stitching: fix wrong estimator choice
if case was exactly wrong, estimators were chosen wrong
added logging for estimated transformation
* enable surf for matchers stitching tests
* enable SURF
* rework sanity checking. Check estimated transform instead of matches. Est. transform should be more stable and comparable between SURF and ORB.
* remove regression checking for VectorFeatures tests. It has a lot of data and the test is the same as the previous one except that it tests different vector sizes for performance, so sanity checking does not add any value here. Added basic sanity asserts instead.
* stitching tests: allow relative error for transform
* allows .01 relative error for estimated homography sanity check in stitching matchers tests
* fix VS warning
stitching tests: increase relative error
increase relative error to make it pass on all platforms (results are still good).
stitching test: allow bigger relative error
transformation can differ in small values (with small absolute difference, but large relative difference). transformation output still looks usable for all platforms. This difference affects only mac and windows, linux passes fine with small difference.
* stitching: add tests for affine matcher
uses s1, s2 images. added also new sanity data.
* stitching tests: use different data for matchers tests
this data should yield more stable transformation (it has much more matches, especially for surf). Sanity data regenerated.
* stitching test: rework tests for matchers
* separated rotation and translations as they are different by scale.
* use appropriate absolute error for them separately. (relative error does not work for values near zero.)
* stitching: fix affine warper compensation
calculation of rotation and translation extracted for plane warper was wrong
* stitching test: enable surf for opencl integration tests
* enable SURF with correct guard (HAVE_OPENCV_XFEATURES2D)
* add OPENCL guard and correct namespace as usual for opencl tests
* stitching: add ocl accuracy test for affine warper
test consistent results with ocl on and off
* stitching: add affine warper ocl perf test
add affine warper to existing warper perf tests. Added new sanity data.
* stitching: do not overwrite inliers in affine matcher
* estimation is run a second time on inliers only, so inliers produced in the second run will not be correct for all matches
* calib3d: add Levenberg–Marquardt refining to estimateAffine2D* functions
this adds affine Levenberg–Marquardt refining to estimateAffine2D functions similar to what is done in findHomography.
implements Levenberg–Marquardt refining for both full affine and partial affine transformations.
* stitching: remove reestimation step in affine matcher
reestimation step is not needed. estimateAffine2D* functions are running their own reestimation on inliers using the Levenberg-Marquardt algorithm, which is better than simply rerunning RANSAC on inliers.
* implement partial affine bundle adjuster
bundle adjuster that expects affine transform with 4DOF. Refines parameters for all cameras together.
stitching: fix bug in BundleAdjusterAffinePartial
* use the inverse properly
* use static buffer for inverse to speed it up
* samples: add affine bundle adjuster option to stitching_detailed
* add support for using affine bundle adjuster with 4DOF
* improve logging of initial intrinsics
* stitching: add affine bundle adjuster test
* fix build warnings
* stitching: increase limit on sanity check
prevents spurious test failures on mac. values are still pretty fine.
* stitching: set affine bundle adjuster for SCANS mode
* fix bug with AffineBestOf2NearestMatcher (we want to select affine partial mode)
* select right bundle adjuster
* stitching: increase error bound for matcher tests
* this prevents failure on mac. transformation is still ok.
* stitching: implement affine bundle adjuster
* implements affine bundle adjuster that is using full affine transform
* existing test case modified to test both affinePartial and full affine bundle adjuster
* add stitching tutorial
* show basic usage of stitching api (Stitcher class)
* stitching: add more integration test for affine stitching
* added new datasets to existing testcase
* removed unused include
* calib3d: move `haveCollinearPoints` to common header
* added comment to make clear that this also checks for points that are too close
* calib3d: redone checkSubset for estimateAffine* callback
* use common function to check collinearity
* this also ensures that point will not be too close to each other
* calib3d: change estimateAffine* functions API
* more similar to `findHomography`, `findFundamentalMat`, `findEssentialMat` and similar
* follows standard recommended semantic INPUTS, OUTPUTS, FLAGS
* allows to disable refining
* supported LMEDS robust method (tests yet to come) along with RANSAC
* extended docs with some tips
* calib3d: rewrite estimateAffine2D test
* rewrite in googletest style
* parametrize to test both robust methods (RANSAC and LMEDS)
* get rid of boilerplate
* calib3d: rework estimateAffinePartial2D test
* rework in googletest style
* add testing for LMEDS
* calib3d: rework estimateAffine*2D perf test
* test for LMEDS speed
* test with/without Levenberg-Marquardt
* remove sanity checking (this is covered by accuracy tests)
* calib3d: improve estimateAffine*2D tests
* test transformations in loop
* improves test by testing more potential transformations
* calib3d: rewrite kernels for estimateAffine*2D functions
* use analytical solution instead of SVD
* this version is faster especially for smaller amount of points
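For the minimal three-point case, an affine transform can be solved in closed form without SVD. The sketch below uses Cramer's rule on the 3x3 system; it illustrates the general idea only and is not the kernel from this commit. The `Pt` type, the function name, and the collinearity threshold are assumptions.

```cpp
#include <array>
#include <cmath>

struct Pt { double x, y; };  // hypothetical point type for this sketch

// Solves to = [m0 m1 m2; m3 m4 m5] * (from.x, from.y, 1)^T for three pairs.
// Returns false when the source points are (nearly) collinear.
bool affineFrom3Points(const std::array<Pt, 3>& from,
                       const std::array<Pt, 3>& to,
                       std::array<double, 6>& m)
{
    const double x1 = from[0].x, y1 = from[0].y;
    const double x2 = from[1].x, y2 = from[1].y;
    const double x3 = from[2].x, y3 = from[2].y;

    // Determinant of [[x1 y1 1], [x2 y2 1], [x3 y3 1]]:
    // twice the signed area of the source triangle.
    const double det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2);
    if (std::fabs(det) < 1e-12)  // assumed threshold
        return false;

    // Cramer's rule, once for the x outputs (row 0) and once for the y outputs (row 1).
    for (int r = 0; r < 2; ++r)
    {
        const double u1 = (r == 0) ? to[0].x : to[0].y;
        const double u2 = (r == 0) ? to[1].x : to[1].y;
        const double u3 = (r == 0) ? to[2].x : to[2].y;

        m[3 * r + 0] = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det;
        m[3 * r + 1] = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det;
        m[3 * r + 2] = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
                        + u1 * (x2 * y3 - x3 * y2)) / det;
    }
    return true;
}
```

A closed form like this avoids the fixed overhead of a general decomposition, which matters most when the solver runs once per RANSAC sample on a handful of points.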
* calib3d: tune up perf of estimateAffine*2D functions
* avoid copying inliers
* avoid converting input points if not necessary
* check only `from` point for collinearity, as `to` does not affect stability of transform
* tutorials: add commands examples to stitching tutorials
* add some examples how to run stitcher sample code
* mention stitching_detailed.cpp
* calib3d: change computeError for estimateAffine*2D
* do error computing in floats instead of doubles
this has the required precision, and we were storing the result in float anyway. This makes the code faster and allows auto-vectorization by smart compilers.
* documentation: mention estimateAffine*2D function
* refer to new functions on appropriate places
* prefer estimateAffine*2D over estimateRigidTransform
* stitching: add camera models documentations
* mention camera models in module documentation to give user a better overview and reduce confusion
* use __GNUC_MINOR__ in correct place to check the version of GCC
* check processor support of FP16 at run time
* check compiler support of FP16 and pass correct compiler option
* rely on ENABLE_AVX on gcc since AVX is generated when mf16c is passed
* guard correctly using ifdef in case of various configuration
* use v_float16x4 correctly by including the right header file
- calculate ticksTotal instead of ticksMean
- local / global width is based on ticksTotal value
- added instrumentation for OpenCL program compilation
- added instrumentation for OpenCL kernel execution
* use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation
* define v_load_f16/v_store_f16 since v_load can't be distinguished when short pointer passed
* brush up implementation on old compiler (guard correctly)
* add test for v_load_f16 and round trip conversion of v_float16x4
* fix conversion error
* use universal intrinsic for accumulate series using float/double
* accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
* add v_cvt_f64_high in both SSE/NEON
* add test for conversion v_cvt_f64_high in test_intrin.cpp
* improve some existing universal intrinsic by using new instructions in Aarch64
* add workaround for Android build in intrin_neon.hpp
* Added 2-channel ops to match existing 3-channel and 4-channel ops
* v_load_deinterleave() and v_store_interleave()
* Implements float32x4 only on SSE (but all types on NEON and CPP)
* Includes tests
* Will be used to vectorize 2D functions, such as estimateAffine2D()
* raise an error when wrong bit depth passed
* raise a build error when wrong depth is specified for cvtScaleHalf_
* remove unnecessary safe check in cvtScaleHalf_
* use intrinsic instead of direct pointer access
* update the explanation
Major changes:
- modify the Base64 functions to be compatible with `cvWriteRawData` and so
on.
- add a Base64 flag for FileStorage and outputs raw data in Base64
automatically.
- complete all testing and documentation.
The three new functions:
```cpp
void cvStartWriteRawData_Base64(::CvFileStorage* fs, const char* name,
                                int len, const char* dt);
void cvWriteRawData_Base64(::CvFileStorage* fs, const void* _data, int len);
void cvEndWriteRawData_Base64(::CvFileStorage* fs);
```
Test is also updated. (And it's remarkable that there is a bug in
`cvWriteReadData`.)
* check compiler support
* check HW support before executing
* add test doing round trip conversion from / to FP32
* treat array correctly if size is not multiple of 4
* add declaration to prevent warning
* make it possible to enable fp16 on 32bit ARM
* make the conversion possible on non-supported HW, too.
* add test using both HW and SW implementation
Changed statements of type "#if __CUDA_ARCH__ >= 200" to
"#if defined __CUDA_ARCH__ && __CUDA_ARCH__ >= 200" in order to
avoid warnings about __CUDA_ARCH__ being undefined.
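A small sketch of the guarded form follows. The function name is hypothetical; the point is only that `defined` is tested first, so the macro is never evaluated in host code where it does not exist.

```cpp
// Hypothetical helper: true only when compiled for a CUDA device of
// compute capability 2.0 or higher. In host code __CUDA_ARCH__ is undefined,
// and testing `defined` first avoids warnings about an undefined macro.
constexpr bool builtForComputeCapability20()
{
#if defined __CUDA_ARCH__ && __CUDA_ARCH__ >= 200
    return true;
#else
    return false;
#endif
}
```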
computes the complement of the Jaccard Index as described in
https://en.wikipedia.org/wiki/Jaccard_index. For rectangles this reduces
to computing the intersection over the union.
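As a sketch of that computation for axis-aligned rectangles: the `Box` type and function name below are hypothetical stand-ins, and returning 0 for two empty rectangles is an assumption of this example.

```cpp
#include <algorithm>

struct Box { double x, y, width, height; };  // hypothetical axis-aligned rectangle

// Jaccard distance = 1 - area(intersection) / area(union).
double jaccardDistanceSketch(const Box& a, const Box& b)
{
    const double areaA = a.width * a.height;
    const double areaB = b.width * b.height;
    // Two empty rectangles are treated as identical here (an assumption).
    if (areaA + areaB <= 0.0)
        return 0.0;

    // Overlap extent along each axis; non-positive means no overlap.
    const double iw = std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x);
    const double ih = std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y);
    const double inter = (iw > 0.0 && ih > 0.0) ? iw * ih : 0.0;

    const double uni = areaA + areaB - inter;
    return 1.0 - inter / uni;
}
```

Identical rectangles give 0 and disjoint rectangles give 1.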
- added new functions from core module: split, merge, add, sub, mul, div, ...
- added function replacement mechanism
- added example of HAL replacement library
Instead of chaining a bunch of sanity checks together with "&&", let's just have several asserts. That way, when an assert fails, you don't get a monstrous
"<huge evil expression> failed" error, but only the bit you care about, making your life rather a lot easier.
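The difference can be sketched with a stand-in assertion macro. `SKETCH_ASSERT` and both check functions are hypothetical names for illustration, and the conditions are made up.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an assertion macro that reports the failed expression.
#define SKETCH_ASSERT(expr)                                              \
    do {                                                                 \
        if (!(expr))                                                     \
            throw std::runtime_error(std::string(#expr) + " failed");    \
    } while (0)

// One combined check: a failure reports the whole expression.
void checkCombined(int rows, int cols, int channels)
{
    SKETCH_ASSERT(rows > 0 && cols > 0 && channels >= 1 && channels <= 4);
}

// Separate checks: a failure names only the condition that was violated.
void checkSeparate(int rows, int cols, int channels)
{
    SKETCH_ASSERT(rows > 0);
    SKETCH_ASSERT(cols > 0);
    SKETCH_ASSERT(channels >= 1 && channels <= 4);
}
```

Calling `checkSeparate(3, 0, 1)` reports only the `cols > 0` condition, while `checkCombined(3, 0, 1)` reports the full chained expression.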
The 12 and 16 arguments Matx constructors differ from all others,
leaving values initialized and requiring the argument number to be equal
to the channels number.
dotProd_16s - disabled for IPP 9.0.0;
filter2D - fixed kernel preparation;
morphology - conditions fix and disabled FilterMin and FilterMax for IPP 9.0.0;
GaussianBlur - disabled for CV_8UC1 due to buffer overflow;
integral - disabled for IPP 9.0.0;
IppAutoBuffer class was added;
HAVE_IPP_ICV_ONLY will be undefined if OpenCV was linked against ICV packet from IPP9 or greater. ICV9+ packets will be aligned with IPP in OpenCV APIs
This will ease code management between IPP and ICV
MSVC and GCC compilers interpret cv::String a(0) as a valid
statement with conversion of "int" argument to "const char*".
This patch forbids this expected behaviour.
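How the patch itself blocks the conversion is not shown in this message. One common way to get the same effect is an `int` overload that is deleted (or private), sketched here on a hypothetical `TextSketch` class that is not cv::String.

```cpp
#include <string>
#include <type_traits>

// Hypothetical string wrapper used only for this illustration.
class TextSketch
{
public:
    TextSketch(const char* s) : data_(s ? s : "") {}

    // Without this overload, the literal 0 is a null pointer constant and
    // `TextSketch a(0)` would silently select the const char* constructor.
    TextSketch(int) = delete;

    const std::string& str() const { return data_; }

private:
    std::string data_;
};

static_assert(std::is_constructible<TextSketch, const char*>::value,
              "construction from a C string still works");
static_assert(!std::is_constructible<TextSketch, int>::value,
              "construction from an int is rejected at compile time");
```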
IPP_VERSION_MAJOR * 100 + IPP_VERSION_MINOR*10 + IPP_VERSION_UPDATE
to manage changes between updates more easily.
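The formula packs the three version components into one comparable integer. A minimal sketch, with a hypothetical helper name that is not taken from OpenCV:

```cpp
// Hypothetical helper mirroring the formula above.
constexpr int packedIppVersion(int major, int minor, int update)
{
    return major * 100 + minor * 10 + update;
}

// Version 9.0.1 packs to 901, so a single comparison can gate a workaround.
static_assert(packedIppVersion(9, 0, 1) == 901, "9.0.1 packs to 901");
static_assert(packedIppVersion(9, 0, 0) < packedIppVersion(9, 0, 1),
              "updates of the same release are ordered");
```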
IPP_DISABLE_BLOCK was added to ease tracking of disabled IPP functions;
some mandatory string keys like paths must not be empty. Add the special
default value `<none>` so the CommandLineParser can enforce this and
generate an according error message for us.
Disables TLS copy constructor and operator, as they can lead to errors and reservation of too many keys in TLS storage;
gather method was added to TLS to gather data from all threads;
- IPP is disabled by default when compiler is mingw (couldn't make it
work)
- fixed some warnings
- fixed some `__GNUC__` version checks (for correctness and convenience)
- removed UTF-8 BOM from hough.cpp (fixes #5253)
rewrite & change convertFromGLBuffer() & convertToGLBuffer() into acquireGLBuffer() & releaseGLBuffer(), respectively
opengl sample: added buffer support
tested and fixed buffer support on Windows
change glFlush() call to glFinish()
added UMat::release() call; fixed functions' names
adopted & implemented API suggestion(s) from Alexander
fixed unreachable code warning
added more info to the mapGLBuffer/unmapGLBuffer description
add template specialization Mat::push_back() for MatExpr parameters
extend push_back MatExpr to mat in unit test
cast to object instead of reference
test with multi-row MatExpr input
057cd52 first versions: cv::ogl::convertFromGLTexture2D & cv::ogl::convertToGLTexture2D
5656e94 added autogenerated stuff for cl_gl.h
765f1fd resolved CL functions in opengl.cpp
9f9fee3 implemented function cv::ogl::ocl::initializeContextFromGLTexture2D()
a792adb cv::ogl::ocl::initializeContextFromGLTexture2D() - added linux support (glx.h)
51c2869 added missing error message in function cv::ogl::ocl::initializeContextFromGLTexture2D()
513b887 fixed extension call in function cv::ogl::ocl::initializeContextFromGLTexture2D()
475a3e9 added CL-GL interop Windows sample (gpu/opengl_interop.cpp)
07af28f added building of CL-GL interop sample - Windows only
befe3a2 fixed whitespace errors & doxygen warnings (precommit_docs)
551251a changed function name to cv::ogl::ocl::initializeContextFromGL(), removed unused argument
4d5f009 changed CL_DEVICES_FOR_GL_CONTEXT_KHR to CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR
9fc3055 changed CL_DEVICES_FOR_GL_CONTEXT_KHR to CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KH
6d31cee Revert "changed CL_DEVICES_FOR_GL_CONTEXT_KHR to CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KH"
cc6a025 added texture format check in cv::ogl::convertFromGLTexture2D()
063a2c1 CL-GL sample: added Linux implementation (Xlib/GLX)
c392ae9 fixed trailing whitespace
85a80d0 fixed include files
ae23628 excluded samples/opengl from build case 2
9870ea5 added android EGL support
530b64c added doxygen documentation comments to CL-GL interop functions
Commits:
added new function, cv::ocl::attachContext(String& platformName, void* platformID, void* context, void* deviceID), which allows attaching an externally created OpenCL context to OpenCV.
add definitions of clRetainDevice, clRetainContext funcs
removed definitions for clRetainContext, clRetainDevice
fixed build issue under Linux
fixed uninitialized vars, replace dbgassert in error handling
remove function which is not ready yet
add new function, cv::ocl::convertFromBuffer(int rows, int cols, int type, void* cl_mem_obj, UMat& dst, UMatUsageFlags usageFlags = cv::USAGE_DEFAULT) which attaches user allocated OpenCL clBuffer to UMat
uncommented clGetMemObjectInfo definition (otherwise prevent opencv build)
fixed build issue on linux and android
add step parameter to cv::ocl::convertFromBuffer func
suppress compile-time warning
added sample opencl-opencv interoperability (showcase for cv::ocl::convertFromBuffer func)
CMakeLists.txt modified to not create sample build script if OpenCL SDK not found in system
fixed build issue (apple opencl include dir and spaces in CMake file)
added call to clRetainContext for attachContext func and call to clRetainMemObject for convertFromBuffer func
uncommented clRetainMemObject definition
added comments and cleanup
add local path to cmake modules search dirs (instead of replacing)
remove REQUIRED for find_package call (sample build together with opencv). need to try standalone sample build
opencl-interop sample moved to standalone build
set minimum version requirement for sample's cmake to 3.1
put cmake_minimum_required under condition, so do not check if samples are not built
remove code dups for setSize, updateContinuityFlag, and finalizeHdr
commented out cmake_minimum_required(VERSION 3.1)
add safety check for cmake version
add convertFromImage func and update opencl-interop sample
uncommented clGetImageInfo defs
uncommented clEnqueueCopyImageToBuffer defs
fixed clEnqueueCopyImageToBuffer defs
add doxygen comments
remove doxygen @fn tag
try to restart buildbot
add doxygen comments to directx interop funcs
remove internal header, use fwd declarations in affected compile units instead
Removed IPP port for tiny arithm.cpp functions
Additional warnings fix on various platforms.
Build without OPENCL and GCC warnings fixed
Fixed warnings, trailing spaces and removed unused secure_cpy.
IPP code refactored.
IPP code path implemented as separate static functions to simplify future work with IPP code and make it more readable.
2. Algorithm::load/save added (moved from StatModel)
3. copyrights updated; added copyright/licensing info for ffmpeg
4. some warnings from Xcode 6.x are fixed
- note: uses VFPv3 instructions
- also added overloaded cvRound variants with float and int parameters
- thanks to Marina Kolpakova from Itseez for idea
- thanks to developers from #llvm IRC channel for help with inline asm