opencv

mirror of https://github.com/opencv/opencv.git synced 2025-01-10 22:28:13 +08:00

Author	SHA1	Message	Date
Li Peng	7a4c5e9421	slice layer ocl support Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-29 22:34:32 +08:00
Alexander Alekhin	2876670de3	dnn(ocl): fix build options for Apple OpenCL	2018-01-28 01:54:25 +00:00
Alexander Alekhin	104502c5be	Merge pull request #10676 from dkurt:dnn_for_newer_mobilenet_ssd	2018-01-26 04:02:21 +00:00
Li Peng	2493083935	mvn, batch_norm and relu layer fusion Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-25 18:57:05 +08:00
Li Peng	e15928b49e	convolution and tanh layer fusion Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-25 17:45:33 +08:00
Dmitry Kurtaev	9e9926a2f0	PriorBox layer with explicit normalized sizes	2018-01-24 14:01:42 +03:00
Alexander Alekhin	26e0f408f0	Merge pull request #10639 from pengli:dnn	2018-01-19 10:01:41 +00:00
Li Peng	fe494297e4	more update on MVN layer ocl implementation cut one ocl kernel if normVariance is disabled, also use native_powr for performance reason. Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-19 22:54:04 +08:00
Alexander Alekhin	c3569211d5	Merge pull request #10591 from drkoller:master	2018-01-19 09:44:21 +00:00
Li Peng	2124361ff7	ocl support for Deconvolution layer Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-18 23:40:22 +08:00
David Koller	d1a3b530be	Make DNN Crop layer match Caffe default offset behavior and add parametric unit test for crop layer.	2018-01-17 10:52:36 -05:00
Li Peng	e77af4ae33	MVN layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-17 17:11:32 +08:00
Li Peng	7bc017601f	Power, Tanh and Channels ReLU layer ocl support Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-17 17:11:27 +08:00
Li Peng	4189214d04	batch_norm layer ocl update use a batch_norm ocl kernel to do the work Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-16 19:01:58 +08:00
Dmitry Kurtaev	1f4fdfd599	Untrainable version of Scale layer from Caffe	2018-01-13 10:35:29 +03:00
Dmitry Kurtaev	64a9e92390	Merge pull request #10466 from dkurt:reduce_umat_try_2 * UMat blobs are wrapped * Replace getUMat and getMat at OpenCLBackendWrapper	2018-01-10 21:50:54 +03:00
Li Peng	e3b42bf93b	batch_norm and blank layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-09 21:58:46 +08:00
Li Peng	67f9406cbe	add normalize_bbox layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-05 19:38:36 +08:00
Li Peng	f99a135eda	add eltwise layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-05 19:38:30 +08:00
Li Peng	34bfd7ef51	add ocl implementation of proposal layer Signed-off-by: Li Peng <peng.li@intel.com>	2018-01-04 18:40:51 +08:00
Alexander Alekhin	7d67d60fb1	cmake(opt): AVX512_SKX	2017-12-29 07:18:11 +00:00
Alexander Alekhin	a65b5df5da	Merge pull request #10416 from fenrus75:avx512	2017-12-28 15:56:56 +00:00
Alexander Alekhin	898ca38257	cmake: AVX512 -> AVX_512F	2017-12-28 15:20:27 +00:00
Arjan van de Ven	2938860b3f	Provide a few AVX512 optimized functions for the DNN module This patch adds AVX512 optimized fastConv as well as the hookups needed to get these called in the convolution_layer. AVX512 fastConv is code-identical on a C level to the AVX2 one, but is measurably faster due to AVX512 having more registers available to cache results in. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2017-12-26 16:00:17 +00:00
Alexander Alekhin	adf43e7d2a	build: fix MSVS2010 build error	2017-12-23 00:06:34 +00:00
Dmitry Kurtaev	c67e75b68f	Refactor NMS procedure at RegionLayer	2017-12-21 12:21:45 +03:00
Dmitry Kurtaev	0ed2cbc931	R-FCN models support	2017-12-20 10:43:22 +03:00
Alexander Alekhin	dcdd6af5a8	Merge pull request #10341 from pengli:dnn	2017-12-19 14:04:55 +00:00
Li Peng	910d7dab1f	prior box layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-19 17:44:10 +08:00
Dmitry Kurtaev	2b43d4f477	Fix default pooling layer type	2017-12-17 16:46:40 +03:00
Maksim Shabunin	1033f2b1bd	Fixed 3 issues found by static analysis	2017-12-15 17:29:26 +03:00
Vadim Pisarevsky	62359f70ff	Merge pull request #10306 from dkurt:faster_rcnn	2017-12-15 12:23:53 +00:00
Dmitry Kurtaev	08112f3821	Faster-RCNN models support	2017-12-15 12:16:21 +03:00
Alexander Alekhin	0da947e6b3	dnn: more debug information	2017-12-14 19:21:17 +03:00
Alexander Alekhin	c231472ad6	Merge pull request #10290 from tomoaki0705:fixVS2012Round	2017-12-13 15:30:21 +00:00
Tomoaki Teshima	ecb6bcf2e0	fix build error on Visual Studio 2012 * round doesn't exist in the standard library of Visual Studio 2012 * apply the correct computation of ROI	2017-12-13 17:40:07 +03:00
Alexander Alekhin	eff42f6387	dnn: more debug info	2017-12-12 12:04:10 +03:00
Vadim Pisarevsky	7e680bd9ff	Merge pull request #10215 from dkurt:dnn_js	2017-12-11 12:47:52 +00:00
Vadim Pisarevsky	c24f10d647	Merge pull request #10268 from dkurt:fix_scale_layer	2017-12-08 18:46:50 +00:00
Dmitry Kurtaev	f503515082	JavaScript bindings for dnn module	2017-12-08 18:33:48 +03:00
Dmitry Kurtaev	e307065c8e	Scale layer in case of 2D inputs	2017-12-08 17:34:59 +03:00
Alexander Alekhin	f2070c9f5d	Merge pull request #10255 from dkurt:dnn_roi_pooling	2017-12-08 11:20:07 +00:00
Dmitry Kurtaev	17dcf0e82d	ROIPooling layer	2017-12-07 19:04:38 +03:00
Dmitry Kurtaev	ef0650179b	Fix conv/deconv/fc layers FLOPS computation	2017-12-07 11:42:04 +03:00
Alexander Alekhin	6074f92d48	Merge pull request #10228 from pengli:dnn_new	2017-12-06 15:50:12 +00:00
Li Peng	59cbaca4d3	detection_output layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-06 22:35:59 +08:00
Li Peng	66feea6cac	region layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-07 02:26:46 +08:00
Li Peng	7707c9bfba	reorg layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-07 02:26:46 +08:00
Li Peng	85b1c4060c	support axis in concat layer ocl path Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-07 02:26:46 +08:00
Li Peng	07bec6bdcd	reshape layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-07 02:26:40 +08:00
Li Peng	7b7033ac60	permute layer ocl implementation Signed-off-by: Li Peng <peng.li@intel.com>	2017-12-05 22:10:05 +08:00
Dmitry Kurtaev	bbbec300a6	nn.BatchNormalization and nn.Dropout layers from Torch	2017-12-04 12:57:21 +03:00
Dmitry Kurtaev	99ed085752	Update PriorBox layer	2017-11-27 16:47:20 +03:00
Alexander Alekhin	f071a48ec7	Merge pull request #10143 from pengli:ocl4dnn	2017-11-23 18:47:14 +00:00
Alexander Alekhin	0f34628af7	dnn: drop OpenCL code path for DetectionOutputLayer getUMat()/getMat() calls are scope based. Results of these calls can't be stored somewhere for future usage.	2017-11-21 17:28:42 +03:00
Alexander Alekhin	438e456ce9	Merge pull request #10113 from wzw-intel:fusion	2017-11-20 18:13:33 +00:00
Dmitry Kurtaev	6c5dd5cf6d	Replace caffe::NormalizedBBox to local structure	2017-11-20 18:03:31 +03:00
Wu Zhiwen	45d11dde57	dnn(ocl4dnn): add fusion support for Power activation and eltwise add Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>	2017-11-20 14:58:53 +08:00
Li Peng	55260a8d3c	reshape mat before doing computation in fc layer Signed-off-by: Li Peng <peng.li@intel.com>	2017-11-13 09:29:50 +08:00
Li Peng	8f99083726	Add new layer forward interface Add layer forward interface with InputArrayOfArrays and OutputArrayOfArrays parameters, it allows UMat buffer to be processed and transferred in the layers. Signed-off-by: Li Peng <peng.li@intel.com>	2017-11-09 15:59:39 +08:00
Alexander Alekhin	bacc96f4e8	dnn(ocl): fix softmax global/local size consistency	2017-11-02 17:08:40 +03:00
Dmitry Kurtaev	03cefa7bfe	Set zero confidences in case of no detections	2017-10-30 10:17:57 +03:00
Vadim Pisarevsky	e0e40405ed	Merge pull request #9847 from wzw-intel:ocl4dnn_fusion	2017-10-27 13:59:46 +00:00
Dmitry Kurtaev	4b52b8df34	Layers for fast-neural-style models: https://github.com/jcjohnson/fast-neural-style	2017-10-27 14:26:45 +03:00
Vadim Pisarevsky	bc93775385	Merge pull request #9862 from sovrasov:dnn_nms	2017-10-27 11:19:57 +00:00
Vadim Pisarevsky	825c0ffdb4	Merge pull request #9874 from dkurt:fix_identity_permute_layer	2017-10-27 11:11:48 +00:00
Vladislav Sovrasov	7e3e9144de	dnn: add an accuracy test for NMS	2017-10-25 13:40:56 +03:00
Vladislav Sovrasov	c704942b8a	dnn: add documentation for NMS, fix missing experimental namespace	2017-10-25 13:35:49 +03:00
Vladislav Sovrasov	acedb4a579	dnn: make NMS function public	2017-10-25 13:35:49 +03:00
Dmitry Kurtaev	a36ebaecdc	PReLU layer for multidimensional input	2017-10-23 16:13:03 +03:00
Dmitry Kurtaev	a3a446c197	Output blobs shapes initialization in case of identity permutation (NCHW->NCHW)	2017-10-17 17:15:25 +03:00
Wu Zhiwen	2d8f2c2aea	dnn(ocl4dnn): add fusion support ocl4dnn supports following fusion styles: Conv + [BN] + [Scale] + [ReLU/PReLU] Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>	2017-10-16 19:18:36 +08:00
Maksim Shabunin	b066dd36ff	Fixed uninitialized class fields	2017-10-16 13:47:43 +03:00
Alexander Alekhin	df5b2224d7	Merge pull request #9829 from pengli:ocl4dnn	2017-10-12 11:26:20 +00:00
Li Peng	937b8e4277	dnn(ocl4dnn): support log softmax in ocl4dnn Signed-off-by: Li Peng <peng.li@intel.com>	2017-10-12 09:51:13 +08:00
Vadim Pisarevsky	8b168175ec	Merge pull request #9636 from dkurt:duplicate_lp_norm_layer	2017-10-11 13:36:14 +00:00
Vladislav Sovrasov	f7175f5050	dnn: fix additional text boxes handling after the latest adaptations for TF	2017-10-11 14:04:48 +03:00
Vladislav Sovrasov	050916fd6b	dnn: modify priorBox layer	2017-10-11 11:43:50 +03:00
Dmitry Kurtaev	905a9dada2	Removed LPNormalize layer.	2017-10-10 20:38:55 +03:00
Vadim Pisarevsky	b7ff9ddcdd	Merge pull request #9705 from AlexeyAB:dnn_darknet_yolo_v2	2017-10-10 12:02:03 +00:00
Vadim Pisarevsky	046045239c	Merge pull request #9750 from dkurt:feature_dnn_tf_text_graph	2017-10-10 10:06:24 +00:00
AlexeyAB	ecc34dc521	Added DNN Darknet Yolo v2 for object detection	2017-10-09 21:08:44 +03:00
Dmitry Kurtaev	eabf728682	PReLU layer from Caffe	2017-10-09 20:30:37 +03:00
Alexander Alekhin	e615fafe2d	build: fix MSVS2010	2017-10-08 23:32:22 +03:00
Dmitry Kurtaev	e4aa39f9e5	Text TensorFlow graphs parsing. MobileNet-SSD for 90 classes.	2017-10-08 22:25:29 +03:00
Vadim Pisarevsky	21bd834a59	Merge pull request #9772 from dkurt:fix_caffe_eltwise_and_fc_layers	2017-10-06 13:47:54 +00:00
Vadim Pisarevsky	b969d86415	Merge pull request #9787 from dkurt:feature_dnn_resize_nearest_neighbor	2017-10-06 13:46:50 +00:00
Vadim Pisarevsky	fe58b58937	Merge pull request #9778 from dkurt:dnn_colorization	2017-10-06 11:48:05 +00:00
Dmitry Kurtaev	b9f94c9315	Nearest neighbor resize layer	2017-10-06 14:33:26 +03:00
Dmitry Kurtaev	e268606e26	Grayscale colorization model (https://github.com/richzhang/colorization ) test.	2017-10-06 09:33:41 +03:00
Dmitry Kurtaev	ad8bbaf008	Multidimensional eltwise layer. Fixed fully-connected layer axis.	2017-10-04 14:01:44 +03:00
Dmitry Kurtaev	2a21c10837	Fix TensorFlow split layer	2017-10-02 22:44:42 +03:00
pengli	e340ff9c3a	Merge pull request #9114 from pengli:dnn_rebase add libdnn acceleration to dnn module (#9114) * import libdnn code Signed-off-by: Li Peng <peng.li@intel.com> * add convolution layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add pooling layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add softmax layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add lrn layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add innerproduct layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add HAVE_OPENCL macro Signed-off-by: Li Peng <peng.li@intel.com> * fix for convolution ocl Signed-off-by: Li Peng <peng.li@intel.com> * enable getUMat() for multi-dimension Mat Signed-off-by: Li Peng <peng.li@intel.com> * use getUMat for ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * use CV_OCL_RUN macro Signed-off-by: Li Peng <peng.li@intel.com> * set OPENCL target when it is available and disable fuseLayer for OCL target for the time being Signed-off-by: Li Peng <peng.li@intel.com> * fix innerproduct accuracy test Signed-off-by: Li Peng <peng.li@intel.com> * remove trailing space Signed-off-by: Li Peng <peng.li@intel.com> * Fixed tensorflow demo bug. Root cause is that tensorflow has different algorithm with libdnn to calculate convolution output dimension. libdnn don't calculate output dimension anymore and just use one passed in by config. * split gemm ocl file split it into gemm_buffer.cl and gemm_image.cl Signed-off-by: Li Peng <peng.li@intel.com> * Fix compile failure Signed-off-by: Li Peng <peng.li@intel.com> * check env flag for auto tuning Signed-off-by: Li Peng <peng.li@intel.com> * switch to new ocl kernels for softmax layer Signed-off-by: Li Peng <peng.li@intel.com> * update softmax layer on some platform subgroup extension may not work well, fallback to non subgroup ocl acceleration. 
Signed-off-by: Li Peng <peng.li@intel.com> * fallback to cpu path for fc layer with multi output Signed-off-by: Li Peng <peng.li@intel.com> * update output message Signed-off-by: Li Peng <peng.li@intel.com> * update fully connected layer fallback to gemm API if libdnn return false Signed-off-by: Li Peng <peng.li@intel.com> * Add ReLU OCL implementation * disable layer fusion for now Signed-off-by: Li Peng <peng.li@intel.com> * Add OCL implementation for concat layer Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> * libdnn: update license and copyrights Also refine libdnn coding style Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> Signed-off-by: Li Peng <peng.li@intel.com> * DNN: Don't link OpenCL library explicitly * DNN: Make default preferableTarget to DNN_TARGET_CPU User should set it to DNN_TARGET_OPENCL explicitly if want to use OpenCL acceleration. Also don't fusion when using DNN_TARGET_OPENCL * DNN: refine coding style * Add getOpenCLErrorString * DNN: Use int32_t/uint32_t instread of alias * Use namespace ocl4dnn to include libdnn things * remove extra copyTo in softmax ocl path Signed-off-by: Li Peng <peng.li@intel.com> * update ReLU layer ocl path Signed-off-by: Li Peng <peng.li@intel.com> * Add prefer target property for layer class It is used to indicate the target for layer forwarding, either the default CPU target or OCL target. Signed-off-by: Li Peng <peng.li@intel.com> * Add cl_event based timer for cv::ocl * Rename libdnn to ocl4dnn Signed-off-by: Li Peng <peng.li@intel.com> Signed-off-by: wzw <zhiwen.wu@intel.com> * use UMat for ocl4dnn internal buffer Remove allocateMemory which use clCreateBuffer directly Signed-off-by: Li Peng <peng.li@intel.com> Signed-off-by: wzw <zhiwen.wu@intel.com> * enable buffer gemm in ocl4dnn innerproduct Signed-off-by: Li Peng <peng.li@intel.com> * replace int_tp globally for ocl4dnn kernels. 
Signed-off-by: wzw <zhiwen.wu@intel.com> Signed-off-by: Li Peng <peng.li@intel.com> * create UMat for layer params Signed-off-by: Li Peng <peng.li@intel.com> * update sign ocl kernel Signed-off-by: Li Peng <peng.li@intel.com> * update image based gemm of inner product layer Signed-off-by: Li Peng <peng.li@intel.com> * remove buffer gemm of inner product layer call cv::gemm API instead Signed-off-by: Li Peng <peng.li@intel.com> * change ocl4dnn forward parameter to UMat Signed-off-by: Li Peng <peng.li@intel.com> * Refine auto-tuning mechanism. - Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory for fine-tuned kernel configuration. e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp, the cache directory will be /home/tmp/spatialkernels/ on Linux. - Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable auto-tuning. - OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling for OpenCL command queue. This fix basic kernel get wrong running time, i.e. 0ms. - If creating cache directory failed, disable auto-tuning. * Detect and create cache dir on windows Signed-off-by: Li Peng <peng.li@intel.com> * Refine gemm like convolution kernel. Signed-off-by: Li Peng <peng.li@intel.com> * Fix redundant swizzleWeights calling when use cached kernel config. * Fix "out of resource" bug when auto-tuning too many kernels. * replace cl_mem with UMat in ocl4dnnConvSpatial class * OCL4DNN: reduce the tuning kernel candidate. This patch could reduce 75% of the tuning candidates with less than 2% performance impact for the final result. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> * replace cl_mem with umat in ocl4dnn convolution Signed-off-by: Li Peng <peng.li@intel.com> * remove weight_image_ of ocl4dnn inner product Actually it is unused in the computation Signed-off-by: Li Peng <peng.li@intel.com> * Various fixes for ocl4dnn 1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel()) 2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp 3. 
Code comments cleanup 4. ignore check on OCL cpu device Signed-off-by: Li Peng <peng.li@intel.com> * add build option for log softmax Signed-off-by: Li Peng <peng.li@intel.com> * remove unused ocl kernels in ocl4dnn Signed-off-by: Li Peng <peng.li@intel.com> * replace ocl4dnnSet with opencv setTo Signed-off-by: Li Peng <peng.li@intel.com> * replace ALIGN with cv::alignSize Signed-off-by: Li Peng <peng.li@intel.com> * check kernel build options Signed-off-by: Li Peng <peng.li@intel.com> * Handle program compilation fail properly. * Use std::numeric_limits<float>::infinity() for large float number * check ocl4dnn kernel compilation result Signed-off-by: Li Peng <peng.li@intel.com> * remove unused ctx_id Signed-off-by: Li Peng <peng.li@intel.com> * change clEnqueueNDRangeKernel to kernel.run() Signed-off-by: Li Peng <peng.li@intel.com> * change cl_mem to UMat in image based gemm Signed-off-by: Li Peng <peng.li@intel.com> * check intel subgroup support for lrn and pooling layer Signed-off-by: Li Peng <peng.li@intel.com> * Fix convolution bug if group is greater than 1 Signed-off-by: Li Peng <peng.li@intel.com> * Set default layer preferableTarget to be DNN_TARGET_CPU Signed-off-by: Li Peng <peng.li@intel.com> * Add ocl perf test for convolution Signed-off-by: Li Peng <peng.li@intel.com> * Add more ocl accuracy test Signed-off-by: Li Peng <peng.li@intel.com> * replace cl_image with ocl::Image2D Signed-off-by: Li Peng <peng.li@intel.com> * Fix build failure in elementwise layer Signed-off-by: Li Peng <peng.li@intel.com> * use getUMat() to get blob data Signed-off-by: Li Peng <peng.li@intel.com> * replace cl_mem handle with ocl::KernelArg Signed-off-by: Li Peng <peng.li@intel.com> * dnn(build): don't use C++11, OPENCL_LIBRARIES fix * dnn(ocl4dnn): remove unused OpenCL kernels * dnn(ocl4dnn): extract OpenCL code into .cl files * dnn(ocl4dnn): refine auto-tuning Defaultly disable auto-tuning, set OPENCV_OCL4DNN_ENABLE_AUTO_TUNING environment variable to enable it. 
Use a set of pre-tuned configs as default config if auto-tuning is disabled. These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet, AlexNet, ResNet-50. If default config is not suitable, use the first available kernel config from the candidates. Candidate priority from high to low is gemm like kernel, IDLF kernel, basic kernel. * dnn(ocl4dnn): pooling doesn't use OpenCL subgroups * dnn(ocl4dnn): fix perf test OpenCV has default 3sec time limit for each performance test. Warmup OpenCL backend outside of perf measurement loop. * use ocl::KernelArg as much as possible Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): fix bias bug for gemm like kernel * dnn(ocl4dnn): wrap cl_mem into UMat Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): Refine signature of kernel config - Use more readable string as signature of kernel config - Don't count device name and vendor in signature string - Default kernel configurations are tuned for Intel GPU with 24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model. * dnn(ocl4dnn): swap width/height in configuration * dnn(ocl4dnn): enable configs for Intel OpenCL runtime only * core: make configuration helper functions accessible from non-core modules * dnn(ocl4dnn): update kernel auto-tuning behavior Avoid unwanted creation of directories * dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash * dnn(ocl4dnn): remove redundant code * dnn(ocl4dnn): Add more clear message for simd size mismatch. 
* dnn(ocl4dnn): add const to const argument Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): force compiler use a specific SIMD size for IDLF kernel * dnn(ocl4dnn): drop unused tuneLocalSize() * dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method * dnn(ocl4dnn): sanitize file names used for cache * dnn(perf): enable Network tests with OpenCL * dnn(ocl4dnn/conv): drop computeGlobalSize() * dnn(ocl4dnn/conv): drop unused fields * dnn(ocl4dnn/conv): simplify ctor * dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL * dnn(ocl4dnn/conv): drop unsupported double / untested half types * dnn(ocl4dnn/conv): drop unused variable * dnn(ocl4dnn/conv): alignSize/divUp * dnn(ocl4dnn/conv): use enum values * dnn(ocl4dnn): drop unused innerproduct variable Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): add an generic function to check cl option support * dnn(ocl4dnn): run softmax subgroup version kernel first Signed-off-by: Li Peng <peng.li@intel.com>	2017-10-02 15:38:00 +03:00
Vadim Pisarevsky	5e93c82023	Merge pull request #9491 from dkurt:tf_lstm	2017-09-28 21:04:06 +00:00
Vadim Pisarevsky	68cc2e292d	Merge pull request #9734 from dkurt:fix_deconv_layer_kernel_layout	2017-09-28 11:42:57 +00:00
Vadim Pisarevsky	45365e4df1	Merge pull request #9691 from dkurt:padding_layer_refactoring	2017-09-28 11:34:28 +00:00
Dmitry Kurtaev	6e593cd1f0	Swap dimensions of deconvolution kernel	2017-09-27 22:38:34 +03:00
Alexander Alekhin	3dee92ec50	fix usage of CV_FMA3 macro	2017-09-26 17:23:54 +03:00
Dmitry Kurtaev	84cec17913	LSTM layer for TensorFlow importer	2017-09-26 12:59:36 +03:00
Dmitry Kurtaev	222149b9c6	Refactored Padding layer	2017-09-22 12:39:00 +03:00
Dmitry Kurtaev	17a85b16fc	Remove reorder_dims attribute of Reshape layer	2017-09-21 16:42:03 +03:00
Dmitry Kurtaev	d891e9b1d8	Layers for MobileNet from TensorFlow	2017-09-15 20:17:30 +03:00
Vadim Pisarevsky	6bf8fe815d	Merge pull request #9384 from dkurt:torch_split	2017-09-15 13:05:05 +00:00
Vadim Pisarevsky	41b23fde9f	Merge pull request #9524 from dkurt:dnn_torch_openface	2017-09-15 12:38:12 +00:00
Dmitry Kurtaev	0ce7c33bc8	Torch's Concat and ConcatTable doesn't use Split layer	2017-09-14 09:26:57 +03:00
Dmitry Kurtaev	7dc6b1d7d4	Layers for OpenFace face recognition network	2017-09-14 09:11:31 +03:00
Dmitry Kurtaev	58b890b9f7	Dilated convolution import from TensorFlow	2017-09-13 18:44:14 +03:00
Maksim Shabunin	248e2c7d47	Fixed some issues found by static analysis	2017-09-08 12:22:12 +03:00
dkurt	339793143c	Unit tests for TensorFlow importer	2017-08-03 11:29:48 +03:00
Aleksandr Rybnikov	8d6b8b45b6	Added ELU and test for it	2017-08-02 11:13:59 +03:00
Alexander Alekhin	2959e7aba9	Merge pull request #9188 from arrybn:mobilenet_ssd_sample	2017-08-01 11:12:54 +00:00
Aleksandr Rybnikov	ce1cc352d9	MobileNet SSD sample	2017-08-01 12:30:27 +03:00
Tomoaki Teshima	0f91faddae	fix linker error when trying CPU_BASELINE=AVX	2017-07-21 21:13:47 +09:00
Aleksandr Rybnikov	7d1140340e	Rewrote googlenet tests	2017-07-18 18:49:14 +03:00
Vadim Pisarevsky	0488d9bdb2	optimize out scaleLayer & concatLayer whenever possible fixed problem in concat layer by disabling memory re-use in layers with multiple inputs trying to fix the tests when Halide is used to run deep nets another attempt to fix Halide tests see if the Halide tests will pass with concat layer fusion turned off trying to fix failures in halide tests; another try one more experiment to make halide_concat & halide_enet tests pass continue attempts to fix halide tests moving on uncomment parallel concat layer seemingly fixed failures in Halide tests and re-enabled concat layer fusion; thanks to dkurt for the patch	2017-07-14 18:30:53 +03:00
Alexander Alekhin	4784c7be5f	dnn: cleanup dispatched code, fix SIMD128 types	2017-07-13 19:00:34 +03:00
Alexander Alekhin	c3e6de293f	dnn: code cleanup, refactor detection output layer	2017-07-13 19:00:34 +03:00
Alexander Alekhin	520da7aaaf	Merge pull request #9111 from vpisarev:dnn_optim_avx1	2017-07-13 12:27:05 +00:00
dkurt	3203635765	Eltwise layer fixes	2017-07-10 12:58:11 +03:00
Vadim Pisarevsky	ed9564106c	reuse AVX2-optimized kernels for AVX1 CPUs (like IvyBridge)	2017-07-06 21:36:59 +03:00
Maksim Shabunin	e0393f8557	Fixed some issues found by static analysis (4th round)	2017-06-30 12:26:53 +03:00
Vadim Pisarevsky	ac49a17a82	Merge pull request #9022 from dkurt:keep_conv_weights_for_halide	2017-06-29 11:09:17 +00:00
Maksim Shabunin	ace0701a46	Merge pull request #9019 from alalek:dnn_trace	2017-06-29 07:33:46 +00:00
Alexander Alekhin	324851882a	Merge pull request #9025 from mshabunin:fix-static-3	2017-06-28 20:50:21 +00:00
Maksim Shabunin	a769d69a9d	Fixed several issues found by static analysis	2017-06-28 18:06:18 +03:00
dkurt	b46f5b1b38	Align convolutional layer weights separately from origin ones	2017-06-28 17:05:56 +03:00
Alexander Alekhin	ed10383359	dnn: added trace macros	2017-06-28 14:57:26 +03:00
Vadim Pisarevsky	c5faa9aefa	Merge pull request #9013 from arrybn:ssd_last_layers_optim	2017-06-28 10:38:55 +00:00
Vadim Pisarevsky	bbb14d3746	Merge pull request #9003 from dkurt:halide_bug_fixes	2017-06-28 08:48:27 +00:00
Aleksandr Rybnikov	ec321e651f	Removed usage of std::map in DetectionOutput layer	2017-06-28 11:31:38 +03:00
Vadim Pisarevsky	8b3d6603d5	another round of dnn optimization (#9011) * another round of dnn optimization: * increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly * improved SIMD optimization of pooling layer, optimized average pooling * cleaned up convolution layer implementation * made activation layer "attachable" to all other layers, including fully connected and addition layer. * fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it describes the topology. * greatly optimized permutation layer, which improved SSD performance * parallelized element-wise binary/ternary/... ops (sum, prod, max) * also, added missing copyrights to many of the layer implementation files * temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders	2017-06-28 11:15:22 +03:00
Alexander Alekhin	f8a75c4361	dispatch: added CV_TRY_${OPT} macro, fix dnn build - 1: OPT is available directly or via dispatcher - 0: optimization is not compiled at all	2017-06-27 17:05:15 +03:00
dkurt	121789f78e	Fixed some bugs from Halide tests	2017-06-27 14:52:46 +03:00
Alexander Alekhin	93091ba203	dnn: AVX2 fix invalid unaligned read	2017-06-26 19:48:42 +03:00
Alexander Alekhin	93729784bb	dnn: move module from opencv_contrib `e6f63c7a38/modules/dnn`	2017-06-26 13:41:51 +03:00

335 Commits