diff --git a/modules/ml/doc/boosting.rst b/modules/ml/doc/boosting.rst
index 7c5bc83fce..76a9293fc3 100644
--- a/modules/ml/doc/boosting.rst
+++ b/modules/ml/doc/boosting.rst
@@ -63,41 +63,30 @@ training examples are recomputed at each training iteration. Examples deleted at
 
 .. [FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. Technical Report, Dept. of Statistics*, Stanford University, 1998.
 
-CvBoostParams
+Boost::Params
 -------------
-.. ocv:struct:: CvBoostParams : public CvDTreeParams
+.. ocv:struct:: Boost::Params : public DTrees::Params
 
     Boosting training parameters.
 
-    There is one structure member that you can set directly:
-
-  .. ocv:member:: int split_criteria
-
-     Splitting criteria used to choose optimal splits during a weak tree construction. Possible values are:
-
-        * **CvBoost::DEFAULT** Use the default for the particular boosting method, see below.
-        * **CvBoost::GINI** Use Gini index. This is default option for Real AdaBoost; may be also used for Discrete AdaBoost.
-        * **CvBoost::MISCLASS** Use misclassification rate. This is default option for Discrete AdaBoost; may be also used for Real AdaBoost.
-        * **CvBoost::SQERR** Use least squares criteria. This is default and the only option for LogitBoost and Gentle AdaBoost.
-
-The structure is derived from :ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.
+The structure is derived from ``DTrees::Params`` but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.
 
 All parameters are public. You can initialize them by a constructor and then override some of them directly if you want.
 
-CvBoostParams::CvBoostParams
+Boost::Params::Params
 ----------------------------
 The constructors.
 
-.. ocv:function:: CvBoostParams::CvBoostParams()
+.. ocv:function:: Boost::Params::Params()
 
-.. ocv:function:: CvBoostParams::CvBoostParams( int boost_type, int weak_count, double weight_trim_rate, int max_depth, bool use_surrogates, const float* priors )
+.. ocv:function:: Boost::Params::Params( int boost_type, int weak_count, double weight_trim_rate, int max_depth, bool use_surrogates, const float* priors )
 
     :param boost_type: Type of the boosting algorithm. Possible values are:
 
-        * **CvBoost::DISCRETE** Discrete AdaBoost.
-        * **CvBoost::REAL** Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
-        * **CvBoost::LOGIT** LogitBoost. It can produce good regression fits.
-        * **CvBoost::GENTLE** Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.
+        * **Boost::DISCRETE** Discrete AdaBoost.
+        * **Boost::REAL** Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
+        * **Boost::LOGIT** LogitBoost. It can produce good regression fits.
+        * **Boost::GENTLE** Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.
 
         Gentle AdaBoost and Real AdaBoost are often the preferable choices.
 
@@ -105,131 +94,54 @@ The constructors.
 
     :param weight_trim_rate: A threshold between 0 and 1 used to save computational time. Samples with summary weight :math:`\leq 1 - weight\_trim\_rate` do not participate in the *next* iteration of training. Set this parameter to 0 to turn off this functionality.
 
-See :ocv:func:`CvDTreeParams::CvDTreeParams` for description of other parameters.
+See ``DTrees::Params`` for a description of the other parameters.
 
 Default parameters are:
 
 ::
 
-    CvBoostParams::CvBoostParams()
+    Boost::Params::Params()
     {
-        boost_type = CvBoost::REAL;
-        weak_count = 100;
-        weight_trim_rate = 0.95;
-        cv_folds = 0;
-        max_depth = 1;
+        boostType = Boost::REAL;
+        weakCount = 100;
+        weightTrimRate = 0.95;
+        CVFolds = 0;
+        maxDepth = 1;
     }
 
-CvBoostTree
------------
-.. ocv:class:: CvBoostTree : public CvDTree
-
-The weak tree classifier, a component of the boosted tree classifier :ocv:class:`CvBoost`, is a derivative of :ocv:class:`CvDTree`. Normally, there is no need to use the weak classifiers directly. However, they can be accessed as elements of the sequence ``CvBoost::weak``, retrieved by :ocv:func:`CvBoost::get_weak_predictors`.
-
-.. note:: In case of LogitBoost and Gentle AdaBoost, each weak predictor is a regression tree, rather than a classification tree. Even in case of Discrete AdaBoost and Real AdaBoost, the ``CvBoostTree::predict`` return value (:ocv:member:`CvDTreeNode::value`) is not an output class label. A negative value "votes" for class #0, a positive value - for class #1. The votes are weighted. The weight of each individual tree may be increased or decreased using the method ``CvBoostTree::scale``.
-
-CvBoost
+Boost
 -------
-.. ocv:class:: CvBoost : public CvStatModel
+.. ocv:class:: Boost : public DTrees
 
-Boosted tree classifier derived from :ocv:class:`CvStatModel`.
+Boosted tree classifier derived from ``DTrees``.
 
-CvBoost::CvBoost
+Boost::create
 ----------------
-Default and training constructors.
+Creates an empty model.
 
-.. ocv:function:: CvBoost::CvBoost()
+.. ocv:function:: Ptr<Boost> Boost::create(const Params& params=Params())
 
-.. ocv:function:: CvBoost::CvBoost( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvBoostParams params=CvBoostParams() )
+Use ``StatModel::train`` to train the model, ``StatModel::train<Boost>(traindata, params)`` to create and train the model, or ``StatModel::load<Boost>(filename)`` to load a pre-trained model.
 
-.. ocv:function:: CvBoost::CvBoost( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvBoostParams params=CvBoostParams() )
-
-.. ocv:pyfunction:: cv2.Boost([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <Boost object>
-
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvBoost::train
---------------
-Trains a boosted tree classifier.
-
-.. ocv:function:: bool CvBoost::train( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvBoostParams params=CvBoostParams(), bool update=false )
-
-.. ocv:function:: bool CvBoost::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvBoostParams params=CvBoostParams(), bool update=false )
-
-.. ocv:function:: bool CvBoost::train( CvMLData* data, CvBoostParams params=CvBoostParams(), bool update=false )
-
-.. ocv:pyfunction:: cv2.Boost.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval
-
-    :param update: Specifies whether the classifier needs to be updated (``true``, the new weak tree classifiers added to the existing ensemble) or the classifier needs to be rebuilt from scratch (``false``).
-
-The train method follows the common template of :ocv:func:`CvStatModel::train`. The responses must be categorical, which means that boosted trees cannot be built for regression, and there should be two classes.
-
-CvBoost::predict
-----------------
-Predicts a response for an input sample.
-
-.. ocv:function:: float CvBoost::predict( const cv::Mat& sample, const cv::Mat& missing=Mat(), const cv::Range& slice=Range::all(), bool rawMode=false, bool returnSum=false ) const
-
-.. ocv:function:: float CvBoost::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weak_responses=0, CvSlice slice=CV_WHOLE_SEQ, bool raw_mode=false, bool return_sum=false ) const
-
-.. ocv:pyfunction:: cv2.Boost.predict(sample[, missing[, slice[, rawMode[, returnSum]]]]) -> retval
-
-    :param sample: Input sample.
-
-    :param missing: Optional mask of missing measurements. To handle missing measurements, the weak classifiers must include surrogate splits (see ``CvDTreeParams::use_surrogates``).
-
-    :param weak_responses: Optional output parameter, a floating-point vector with responses of each individual weak classifier. The number of elements in the vector must be equal to the slice length.
-
-    :param slice: Continuous subset of the sequence of weak classifiers to be used for prediction. By default, all the weak classifiers are used.
-
-    :param rawMode: Normally, it should be set to ``false``.
-
-    :param returnSum: If ``true`` then return sum of votes instead of the class label.
-
-The method runs the sample through the trees in the ensemble and returns the output class label based on the weighted voting.
-
-CvBoost::prune
---------------
-Removes the specified weak classifiers.
-
-.. ocv:function:: void CvBoost::prune( CvSlice slice )
-
-.. ocv:pyfunction:: cv2.Boost.prune(slice) -> None
-
-    :param slice: Continuous subset of the sequence of weak classifiers to be removed.
-
-The method removes the specified weak classifiers from the sequence.
-
-.. note:: Do not confuse this method with the pruning of individual decision trees, which is currently not supported.
-
-
-CvBoost::calc_error
--------------------
-Returns error of the boosted tree classifier.
-
-.. ocv:function:: float CvBoost::calc_error( CvMLData* _data, int type , std::vector<float> *resp = 0 )
-
-The method is identical to :ocv:func:`CvDTree::calc_error` but uses the boosted tree classifier as predictor.
-
-
-CvBoost::get_weak_predictors
-----------------------------
-Returns the sequence of weak tree classifiers.
-
-.. ocv:function:: CvSeq* CvBoost::get_weak_predictors()
-
-The method returns the sequence of weak classifiers. Each element of the sequence is a pointer to the :ocv:class:`CvBoostTree` class or to some of its derivatives.
-
-CvBoost::get_params
--------------------
-Returns current parameters of the boosted tree classifier.
-
-.. ocv:function:: const CvBoostParams& CvBoost::get_params() const
-
-
-CvBoost::get_data
+Boost::getBParams
 -----------------
-Returns used train data of the boosted tree classifier.
+Returns the boosting parameters.
 
-.. ocv:function:: const CvDTreeTrainData* CvBoost::get_data() const
+.. ocv:function:: Params Boost::getBParams() const
+
+The method returns the training parameters.
+
+Boost::setBParams
+-----------------
+Sets the boosting parameters.
+
+.. ocv:function:: void Boost::setBParams( const Params& p )
+
+    :param p: Training parameters of type ``Boost::Params``.
+
+The method sets the training parameters.
+
+Prediction with Boost
+---------------------
+
+Use ``StatModel::predict(samples, results, flags)`` for prediction. Pass ``flags=StatModel::RAW_OUTPUT`` to get the raw sum from the Boost classifier.
diff --git a/modules/ml/doc/decision_trees.rst b/modules/ml/doc/decision_trees.rst
index de6fc99d63..9400ae4b96 100644
--- a/modules/ml/doc/decision_trees.rst
+++ b/modules/ml/doc/decision_trees.rst
@@ -3,10 +3,7 @@ Decision Trees
 
 The ML classes discussed in this section implement Classification and Regression Tree algorithms described in [Breiman84]_.
 
-The class
-:ocv:class:`CvDTree` represents a single decision tree that may be used alone or as a base class in tree ensembles (see
-:ref:`Boosting` and
-:ref:`Random Trees` ).
+The class ``cv::ml::DTrees`` represents a single decision tree or a collection of decision trees. It's also a base class for ``RTrees`` and ``Boost``.
 
 A decision tree is a binary tree (tree where each non-leaf node has two child nodes). It can be used either for classification or for regression. For classification, each tree leaf is marked with a class label; multiple leaves may have the same label. For regression, a constant is also assigned to each tree leaf, so the approximation function is piecewise constant.
 
@@ -55,123 +52,107 @@ Besides the prediction that is an obvious use of decision trees, the tree can be
 Importance of each variable is computed over all the splits on this variable in the tree, primary and surrogate ones. Thus, to compute variable importance correctly, the surrogate splits must be enabled in the training parameters, even if there is no missing data.
 
 
-CvDTreeSplit
+DTrees::Split
-------------
+-------------
-.. ocv:struct:: CvDTreeSplit
+.. ocv:class:: DTrees::Split
 
+  The class represents a split in a decision tree. It has public members:
 
-  The structure represents a possible decision tree node split. It has public members:
-
-  .. ocv:member:: int var_idx
+  .. ocv:member:: int varIdx
 
      Index of variable on which the split is created.
 
-  .. ocv:member:: int inversed
+  .. ocv:member:: bool inversed
 
-     If it is not null then inverse split rule is used that is left and right branches are exchanged in the rule expressions below.
+     If true, then the inverse split rule is used (i.e. left and right branches are exchanged in the rule expressions below).
 
   .. ocv:member:: float quality
 
-     The split quality, a positive number. It is used to choose the best primary split, then to choose and sort the surrogate splits. After the tree is constructed, it is also used to compute variable importance.
+     The split quality, a positive number. It is used to choose the best split.
 
-  .. ocv:member:: CvDTreeSplit* next
+  .. ocv:member:: int next
 
-     Pointer to the next split in the node list of splits.
+     Index of the next split in the list of splits for the node.
 
-  .. ocv:member:: int[] subset
-
-     Bit array indicating the value subset in case of split on a categorical variable. The rule is: ::
-
-        if var_value in subset
-          then next_node <- left
-          else next_node <- right
-
-  .. ocv:member:: float ord::c
+  .. ocv:member:: float c
 
      The threshold value in case of split on an ordered variable. The rule is: ::
 
-        if var_value < ord.c
-          then next_node<-left
-          else next_node<-right
+       if var_value < c
+         then next_node<-left
+         else next_node<-right
 
-  .. ocv:member:: int ord::split_point
+  .. ocv:member:: int subsetOfs
 
-     Used internally by the training algorithm.
+     Offset of the bitset used by the split on a categorical variable. The rule is: ::
 
-CvDTreeNode
+        if bitset[var_value] == 1
+          then next_node <- left
+          else next_node <- right
+
+DTrees::Node
------------
+------------
-.. ocv:struct:: CvDTreeNode
+.. ocv:class:: DTrees::Node
 
+  The class represents a decision tree node. It has public members:
 
-  The structure represents a node in a decision tree. It has public members:
+  .. ocv:member:: double value
+
+    Value at the node: a class label in case of classification or estimated function value in case of regression.
 
-  .. ocv:member:: int class_idx
+  .. ocv:member:: int classIdx
 
     Class index normalized to 0..class_count-1 range and assigned to the node. It is used internally in classification trees and tree ensembles.
 
-  .. ocv:member:: int Tn
+  .. ocv:member:: int parent
 
-    Tree index in a ordered sequence of pruned trees. The indices are used during and after the pruning procedure. The root node has the maximum value ``Tn`` of the whole tree, child nodes have ``Tn`` less than or equal to the parent's ``Tn``, and nodes with :math:`Tn \leq CvDTree::pruned\_tree\_idx` are not used at prediction stage (the corresponding branches are considered as cut-off), even if they have not been physically deleted from the tree at the pruning stage.
+    Index of the parent node.
 
-  .. ocv:member:: double value
+  .. ocv:member:: int left
 
-    Value at the node: a class label in case of classification or estimated function value in case of regression.
+    Index of the left child node.
 
-  .. ocv:member:: CvDTreeNode* parent
+  .. ocv:member:: int right
 
-    Pointer to the parent node.
+    Index of the right child node.
 
-  .. ocv:member:: CvDTreeNode* left
+  .. ocv:member:: int defaultDir
 
-    Pointer to the left child node.
+    Default direction to go (-1: left or +1: right). It is used when values are missing.
 
-  .. ocv:member:: CvDTreeNode* right
+  .. ocv:member:: int split
 
-    Pointer to the right child node.
+    Index of the first split.
 
-  .. ocv:member:: CvDTreeSplit* split
-
-    Pointer to the first (primary) split in the node list of splits.
-
-  .. ocv:member:: int sample_count
-
-    The number of samples that fall into the node at the training stage. It is used to resolve the difficult cases - when the variable for the primary split is missing and all the variables for other surrogate splits are missing too. In this case the sample is directed to the left if ``left->sample_count > right->sample_count`` and to the right otherwise.
-
-  .. ocv:member:: int depth
-
-    Depth of the node. The root node depth is 0, the child nodes depth is the parent's depth + 1.
-
-Other numerous fields of ``CvDTreeNode`` are used internally at the training stage.
-
-CvDTreeParams
--------------
-.. ocv:struct:: CvDTreeParams
+DTrees::Params
+---------------
+.. ocv:class:: DTrees::Params
 
 The structure contains all the decision tree training parameters. You can initialize it by default constructor and then override any parameters directly before training, or the structure may be fully initialized using the advanced variant of the constructor.
 
-CvDTreeParams::CvDTreeParams
+DTrees::Params::Params
 ----------------------------
-The constructors.
+The constructors.
 
-.. ocv:function:: CvDTreeParams::CvDTreeParams()
+.. ocv:function:: DTrees::Params::Params()
 
-.. ocv:function:: CvDTreeParams::CvDTreeParams( int max_depth, int min_sample_count, float regression_accuracy, bool use_surrogates, int max_categories, int cv_folds, bool use_1se_rule, bool truncate_pruned_tree, const float* priors )
+.. ocv:function:: DTrees::Params::Params( int maxDepth, int minSampleCount, double regressionAccuracy, bool useSurrogates, int maxCategories, int CVFolds, bool use1SERule, bool truncatePrunedTree, const Mat& priors )
 
-    :param max_depth: The maximum possible depth of the tree. That is the training algorithms attempts to split a node while its depth is less than ``max_depth``. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure in the beginning of the section), and/or if the tree is pruned.
+    :param maxDepth: The maximum possible depth of the tree. That is, the training algorithm attempts to split a node while its depth is less than ``maxDepth``. The root node has zero depth. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure in the beginning of the section), and/or if the tree is pruned.
 
-    :param min_sample_count: If the number of samples in a node is less than this parameter then the node will not be split.
+    :param minSampleCount: If the number of samples in a node is less than this parameter then the node will not be split.
 
-    :param regression_accuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be split.
+    :param regressionAccuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be split further.
 
-    :param use_surrogates: If true then surrogate splits will be built. These splits allow to work with missing data and compute variable importance correctly.
+    :param useSurrogates: If true then surrogate splits will be built. These splits make it possible to work with missing data and to compute variable importance correctly. Note: this is currently not implemented.
 
-    :param max_categories: Cluster possible values of a categorical variable into ``K`` :math:`\leq` ``max_categories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``max_categories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision trees engines (including ML) try to find sub-optimal split in this case by clustering all the samples into ``max_categories`` clusters that is some categories are merged together. The clustering is applied only in ``n``>2-class classification problems for categorical variables with ``N > max_categories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
+    :param maxCategories: Cluster possible values of a categorical variable into ``K<=maxCategories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``maxCategories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision tree engines (including our implementation) try to find a suboptimal split in this case by clustering all the samples into ``maxCategories`` clusters, that is, some categories are merged together. The clustering is applied only in ``n > 2``-class classification problems for categorical variables with ``N > maxCategories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
 
-    :param cv_folds: If ``cv_folds > 1`` then prune a tree with ``K``-fold cross-validation where ``K`` is equal to ``cv_folds``.
+    :param CVFolds: If ``CVFolds > 1`` then the algorithm prunes the built decision tree using a ``K``-fold cross-validation procedure where ``K`` is equal to ``CVFolds``.
 
-    :param use_1se_rule: If true then a pruning will be harsher. This will make a tree more compact and more resistant to the training data noise but a bit less accurate.
+    :param use1SERule: If true then pruning will be harsher. This will make the tree more compact and more resistant to the training data noise but a bit less accurate.
 
-    :param truncate_pruned_tree: If true then pruned branches are physically removed from the tree. Otherwise they are retained and it is possible to get results from the original unpruned (or pruned less aggressively) tree by decreasing ``CvDTree::pruned_tree_idx`` parameter.
+    :param truncatePrunedTree: If true then pruned branches are physically removed from the tree. Otherwise they are retained and it is possible to get results from the original unpruned (or pruned less aggressively) tree.
 
     :param priors: The array of a priori class probabilities, sorted by the class label value. The parameter can be used to tune the decision tree preferences toward a certain class. For example, if you want to detect some rare anomaly occurrence, the training base will likely contain much more normal cases than anomalies, so a very good classification performance will be achieved just by considering every case as normal. To avoid this, the priors can be specified, where the anomaly probability is artificially increased (up to 0.5 or even greater), so the weight of the misclassified anomalies becomes much bigger, and the tree is adjusted properly. You can also think about this parameter as weights of prediction categories which determine relative weights that you give to misclassification. That is, if the weight of the first category is 1 and the weight of the second category is 10, then each mistake in predicting the second category is equivalent to making 10 mistakes in predicting the first category.
 
@@ -179,142 +160,82 @@ The default constructor initializes all the parameters with the default values t
 
 ::
 
-    CvDTreeParams() : max_categories(10), max_depth(INT_MAX), min_sample_count(10),
-        cv_folds(10), use_surrogates(true), use_1se_rule(true),
-        truncate_pruned_tree(true), regression_accuracy(0.01f), priors(0)
-    {}
+    DTrees::Params::Params()
+    {
+        maxDepth = INT_MAX;
+        minSampleCount = 10;
+        regressionAccuracy = 0.01f;
+        useSurrogates = false;
+        maxCategories = 10;
+        CVFolds = 10;
+        use1SERule = true;
+        truncatePrunedTree = true;
+        priors = Mat();
+    }
 
 
-CvDTreeTrainData
+DTrees
+------
+
+.. ocv:class:: DTrees : public StatModel
+
+The class represents a single decision tree or a collection of decision trees. The current public interface of the class allows the user to train only a single decision tree; however, the class is capable of storing multiple decision trees and using them for prediction (by summing responses or using a voting scheme), and the classes derived from ``DTrees`` (such as ``RTrees`` and ``Boost``) use this capability to implement decision tree ensembles.
+
+DTrees::create
 ----------------
-.. ocv:struct:: CvDTreeTrainData
+Creates an empty model.
 
-Decision tree training data and shared data for tree ensembles. The structure is mostly used internally for storing both standalone trees and tree ensembles efficiently. Basically, it contains the following types of information:
+.. ocv:function:: Ptr<DTrees> DTrees::create(const Params& params=Params())
 
-#. Training parameters, an instance of :ocv:class:`CvDTreeParams`.
+The static method creates an empty decision tree with the specified parameters. It should then be trained using the ``train`` method (see ``StatModel::train``). Alternatively, you can load the model from a file using ``StatModel::load<DTrees>(filename)``.
 
-#. Training data preprocessed to find the best splits more efficiently. For tree ensembles, this preprocessed data is reused by all trees. Additionally, the training data characteristics shared by all trees in the ensemble are stored here: variable types, the number of classes, a class label compression map, and so on.
+DTrees::getDParams
+------------------
+Returns the training parameters.
 
-#. Buffers, memory storages for tree nodes, splits, and other elements of the constructed trees.
+.. ocv:function:: Params DTrees::getDParams() const
 
-There are two ways of using this structure. In simple cases (for example, a standalone tree or the ready-to-use "black box" tree ensemble from machine learning, like
-:ref:`Random Trees` or
-:ref:`Boosting` ), there is no need to care or even to know about the structure. You just construct the needed statistical model, train it, and use it. The ``CvDTreeTrainData`` structure is constructed and used internally. However, for custom tree algorithms or another sophisticated cases, the structure may be constructed and used explicitly. The scheme is the following:
+The method returns the training parameters.
 
-#.
-    The structure is initialized using the default constructor, followed by ``set_data``, or it is built using the full form of constructor. The parameter ``_shared`` must be set to ``true``.
-
-#.
-    One or more trees are trained using this data (see the special form of the method :ocv:func:`CvDTree::train`).
-
-#.
-    The structure is released as soon as all the trees using it are released.
-
-CvDTree
--------
-.. ocv:class:: CvDTree : public CvStatModel
-
-The class implements a decision tree as described in the beginning of this section.
-
-
-CvDTree::train
---------------
-Trains a decision tree.
-
-.. ocv:function:: bool CvDTree::train( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvDTreeParams params=CvDTreeParams() )
-
-.. ocv:function:: bool CvDTree::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvDTreeParams params=CvDTreeParams() )
-
-.. ocv:function:: bool CvDTree::train( CvMLData* trainData, CvDTreeParams params=CvDTreeParams() )
-
-.. ocv:function:: bool CvDTree::train( CvDTreeTrainData* trainData, const CvMat* subsampleIdx )
-
-.. ocv:pyfunction:: cv2.DTree.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]) -> retval
-
-There are four ``train`` methods in :ocv:class:`CvDTree`:
-
-* The **first two** methods follow the generic :ocv:func:`CvStatModel::train` conventions. It is the most complete form. Both data layouts (``tflag=CV_ROW_SAMPLE`` and ``tflag=CV_COL_SAMPLE``) are supported, as well as sample and variable subsets, missing measurements, arbitrary combinations of input and output variable types, and so on. The last parameter contains all of the necessary training parameters (see the :ocv:class:`CvDTreeParams` description).
-
-* The **third** method uses :ocv:class:`CvMLData` to pass training data to a decision tree.
-
-* The **last** method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed :ocv:class:`CvDTreeTrainData` instance and an optional subset of the training set. The indices in ``subsampleIdx`` are counted relatively to the ``_sample_idx`` , passed to the ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]`` , then ``subsampleIdx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.
-
-The function is parallelized with the TBB library.
-
-
-
-CvDTree::predict
-----------------
-Returns the leaf node of a decision tree corresponding to the input vector.
-
-.. ocv:function:: CvDTreeNode* CvDTree::predict( const Mat& sample, const Mat& missingDataMask=Mat(), bool preprocessedInput=false ) const
-
-.. ocv:function:: CvDTreeNode* CvDTree::predict( const CvMat* sample, const CvMat* missingDataMask=0, bool preprocessedInput=false ) const
-
-.. ocv:pyfunction:: cv2.DTree.predict(sample[, missingDataMask[, preprocessedInput]]) -> retval
-
-    :param sample: Sample for prediction.
-
-    :param missingDataMask: Optional input missing measurement mask.
-
-    :param preprocessedInput: This parameter is normally set to ``false``, implying a regular input. If it is ``true``, the method assumes that all the values of the discrete input variables have been already normalized to :math:`0` to :math:`num\_of\_categories_i-1` ranges since the decision tree uses such normalized representation internally. It is useful for faster prediction with tree ensembles. For ordered input variables, the flag is not used.
-
-The method traverses the decision tree and returns the reached leaf node as output. The prediction result, either the class label or the estimated function value, may be retrieved as the ``value`` field of the :ocv:class:`CvDTreeNode` structure, for example: ``dtree->predict(sample,mask)->value``.
-
-
-
-CvDTree::calc_error
+DTrees::setDParams
 -------------------
-Returns error of the decision tree.
+Sets the training parameters.
 
-.. ocv:function:: float CvDTree::calc_error( CvMLData* trainData, int type, std::vector<float> *resp = 0 )
+.. ocv:function:: void DTrees::setDParams( const Params& p )
 
-    :param trainData: Data for the decision tree.
+    :param p: Training parameters of type ``DTrees::Params``.
 
-    :param type: Type of error. Possible values are:
-
-        * **CV_TRAIN_ERROR** Error on train samples.
-
-        * **CV_TEST_ERROR** Error on test samples.
-
-    :param resp: If it is not null then size of this vector will be set to the number of samples and each element will be set to result of prediction on the corresponding sample.
-
-The method calculates error of the decision tree. In case of classification it is the percentage of incorrectly classified samples and in case of regression it is the mean of squared errors on samples.
+The method sets the training parameters.
 
 
-CvDTree::getVarImportance
--------------------------
-Returns the variable importance array.
+DTrees::getRoots
+-------------------
+Returns the indices of the root nodes.
 
-.. ocv:function:: Mat CvDTree::getVarImportance()
+.. ocv:function:: const std::vector<int>& DTrees::getRoots() const
 
-.. ocv:function:: const CvMat* CvDTree::get_var_importance()
+DTrees::getNodes
+-------------------
+Returns all the nodes.
 
-.. ocv:pyfunction:: cv2.DTree.getVarImportance() -> retval
+.. ocv:function:: const std::vector<Node>& DTrees::getNodes() const
 
-CvDTree::get_root
------------------
-Returns the root of the decision tree.
+All the node indices mentioned above (left, right, parent, and root indices) are indices into the returned vector.
 
-.. ocv:function:: const CvDTreeNode* CvDTree::get_root() const
+DTrees::getSplits
+-------------------
+Returns all the splits.
 
+.. ocv:function:: const std::vector<Split>& DTrees::getSplits() const
 
-CvDTree::get_pruned_tree_idx
-----------------------------
-Returns the ``CvDTree::pruned_tree_idx`` parameter.
+All the split indices mentioned above (split, next, etc.) are indices into the returned vector.
 
-.. ocv:function:: int CvDTree::get_pruned_tree_idx() const
+DTrees::getSubsets
+-------------------
+Returns all the bitsets for categorical splits.
 
-The parameter ``DTree::pruned_tree_idx`` is used to prune a decision tree. See the ``CvDTreeNode::Tn`` parameter.
-
-CvDTree::get_data
------------------
-Returns used train data of the decision tree.
-
-.. ocv:function:: CvDTreeTrainData* CvDTree::get_data() const
-
-Example: building a tree for classifying mushrooms.  See the ``mushroom.cpp`` sample that demonstrates how to build and use the
-decision tree.
+.. ocv:function:: const std::vector<int>& DTrees::getSubsets() const
 
+``Split::subsetOfs`` is an offset into the returned vector.
 
 .. [Breiman84] Breiman, L., Friedman, J. Olshen, R. and Stone, C. (1984), *Classification and Regression Trees*, Wadsworth.
diff --git a/modules/ml/doc/ertrees.rst b/modules/ml/doc/ertrees.rst
deleted file mode 100644
index 7e6d03e7fc..0000000000
--- a/modules/ml/doc/ertrees.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-Extremely randomized trees
-==========================
-
-Extremely randomized trees have been introduced by Pierre Geurts, Damien Ernst and Louis Wehenkel in the article "Extremely randomized trees", 2006 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.7485&rep=rep1&type=pdf]. The algorithm of growing Extremely randomized trees is similar to :ref:`Random Trees` (Random Forest), but there are two differences:
-
-#. Extremely randomized trees don't apply the bagging procedure to construct a set of the training samples for each tree. The same input training set is used to train all trees.
-
-#. Extremely randomized trees pick a node split very extremely (both a variable index and variable splitting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable splitting value) among random subset of variables.
-
-
-CvERTrees
-----------
-.. ocv:class:: CvERTrees : public CvRTrees
-
-    The class implements the Extremely randomized trees algorithm. ``CvERTrees`` is inherited from :ocv:class:`CvRTrees` and has the same interface, so see description of :ocv:class:`CvRTrees` class to get details. To set the training parameters of Extremely randomized trees the same class :ocv:struct:`CvRTParams` is used.
diff --git a/modules/ml/doc/expectation_maximization.rst b/modules/ml/doc/expectation_maximization.rst
index b79dea820b..82450be4be 100644
--- a/modules/ml/doc/expectation_maximization.rst
+++ b/modules/ml/doc/expectation_maximization.rst
@@ -91,22 +91,23 @@ already a good enough approximation).
 
 EM
 --
-.. ocv:class:: EM : public Algorithm
+.. ocv:class:: EM : public StatModel
 
-The class implements the EM algorithm as described in the beginning of this section. It is inherited from :ocv:class:`Algorithm`.
+The class implements the EM algorithm as described in the beginning of this section.
 
+EM::Params
+----------
+.. ocv:class:: EM::Params
 
-EM::EM
-------
-The constructor of the class
+The class describes EM training parameters. It includes:
 
-.. ocv:function:: EM::EM(int nclusters=EM::DEFAULT_NCLUSTERS, int covMatType=EM::COV_MAT_DIAGONAL, const TermCriteria& termCrit=TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, EM::DEFAULT_MAX_ITERS, FLT_EPSILON) )
+  .. ocv:member:: int nclusters
+  
+    The number of mixture components in the Gaussian mixture model. The default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some EM implementations can determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
 
-.. ocv:pyfunction:: cv2.EM([nclusters[, covMatType[, termCrit]]]) -> <EM object>
+  .. ocv:member:: int covMatType
 
-    :param nclusters: The number of mixture components in the Gaussian mixture model. Default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
-
-    :param covMatType: Constraint on covariance matrices which defines type of matrices. Possible values are:
+    Constraint on the covariance matrices that defines their type. Possible values are:
 
         * **EM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. There is the only parameter :math:`\mu_k` to be estimated for each matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with ``covMatType=EM::COV_MAT_DIAGONAL``.
 
@@ -114,23 +115,30 @@ The constructor of the class
 
         * **EM::COV_MAT_GENERIC** A symmetric positively defined matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.
 
-    :param termCrit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when relative change of likelihood logarithm is less than ``termCrit.epsilon``. Default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
+  .. ocv:member:: TermCriteria termCrit
+  
+    The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when the relative change of the likelihood logarithm is less than ``termCrit.epsilon``. The default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
+
+
+EM::create
+----------
+Creates an empty EM model.
+
+.. ocv:function:: Ptr<EM> EM::create(const Params& params=Params())
+
+    :param params: EM parameters; see the ``EM::Params`` description above.
+
+The model should then be trained using the ``StatModel::train(traindata, flags)`` method. Alternatively, you can use one of the ``EM::train*`` methods or load the model from a file using ``StatModel::load<EM>(filename)``.
 
 EM::train
 ---------
-Estimates the Gaussian mixture parameters from a samples set.
+Static methods that estimate the Gaussian mixture parameters from a set of samples.
 
-.. ocv:function:: bool EM::train(InputArray samples, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())
+.. ocv:function:: Ptr<EM> EM::train(InputArray samples, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())
 
-.. ocv:function:: bool EM::trainE(InputArray samples, InputArray means0, InputArray covs0=noArray(), InputArray weights0=noArray(), OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())
+.. ocv:function:: Ptr<EM> EM::train_startWithE(InputArray samples, InputArray means0, InputArray covs0=noArray(), InputArray weights0=noArray(), OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())
 
-.. ocv:function:: bool EM::trainM(InputArray samples, InputArray probs0, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())
-
-.. ocv:pyfunction:: cv2.EM.train(samples[, logLikelihoods[, labels[, probs]]]) -> retval, logLikelihoods, labels, probs
-
-.. ocv:pyfunction:: cv2.EM.trainE(samples, means0[, covs0[, weights0[, logLikelihoods[, labels[, probs]]]]]) -> retval, logLikelihoods, labels, probs
-
-.. ocv:pyfunction:: cv2.EM.trainM(samples, probs0[, logLikelihoods[, labels[, probs]]]) -> retval, logLikelihoods, labels, probs
+.. ocv:function:: Ptr<EM> EM::train_startWithM(InputArray samples, InputArray probs0, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())
 
     :param samples: Samples from which the Gaussian mixture model will be estimated. It should be a one-channel matrix, each row of which is a sample. If the matrix does not have ``CV_64F`` type it will be converted to the inner matrix of such type for the further computing.
 
@@ -147,6 +155,8 @@ Estimates the Gaussian mixture parameters from a samples set.
     :param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample). It has :math:`nsamples \times 1` size and ``CV_32SC1`` type.
 
     :param probs: The optional output matrix that contains posterior probabilities of each Gaussian mixture component given the each sample. It has :math:`nsamples \times nclusters` size and ``CV_64FC1`` type.
+    
+    :param params: The Gaussian mixture parameters; see the ``EM::Params`` description above.
 
 Three versions of training method differ in the initialization of Gaussian mixture model parameters and start step:
 
@@ -167,15 +177,13 @@ Unlike many of the ML models, EM is an unsupervised learning algorithm and it do
 :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
 
 The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
-:ocv:class:`CvNormalBayesClassifier`.
+``NormalBayesClassifier``.
 
-EM::predict
------------
+EM::predict2
+------------
 Returns a likelihood logarithm value and an index of the most probable mixture component for the given sample.
 
-.. ocv:function:: Vec2d EM::predict(InputArray sample, OutputArray probs=noArray()) const
-
-.. ocv:pyfunction:: cv2.EM.predict(sample[, probs]) -> retval, probs
+.. ocv:function:: Vec2d EM::predict2(InputArray sample, OutputArray probs=noArray()) const
 
     :param sample: A sample for classification. It should be a one-channel matrix of :math:`1 \times dims` or :math:`dims \times 1` size.
 
@@ -183,28 +191,29 @@ Returns a likelihood logarithm value and an index of the most probable mixture c
 
 The method returns a two-element ``double`` vector. Zero element is a likelihood logarithm value for the sample. First element is an index of the most probable mixture component for the given sample.
 
-CvEM::isTrained
----------------
-Returns ``true`` if the Gaussian mixture model was trained.
 
-.. ocv:function:: bool EM::isTrained() const
+EM::getMeans
+------------
+Returns the cluster centers (means of the Gaussian mixture).
 
-.. ocv:pyfunction:: cv2.EM.isTrained() -> retval
+.. ocv:function:: Mat EM::getMeans() const
 
-EM::read, EM::write
--------------------
-See :ocv:func:`Algorithm::read` and :ocv:func:`Algorithm::write`.
+Returns a matrix with the number of rows equal to the number of mixtures and the number of columns equal to the space dimensionality.
 
-EM::get, EM::set
-----------------
-See :ocv:func:`Algorithm::get` and :ocv:func:`Algorithm::set`. The following parameters are available:
 
-* ``"nclusters"``
-* ``"covMatType"``
-* ``"maxIters"``
-* ``"epsilon"``
-* ``"weights"`` *(read-only)*
-* ``"means"`` *(read-only)*
-* ``"covs"`` *(read-only)*
+EM::getWeights
+--------------
+Returns the weights of the mixtures.
 
-..
+.. ocv:function:: Mat EM::getWeights() const
+
+Returns a vector with the number of elements equal to the number of mixtures.
+
+
+EM::getCovs
+--------------
+Returns the covariance matrices.
+
+.. ocv:function:: void EM::getCovs(std::vector<Mat>& covs) const
+
+Returns the covariance matrices in the output vector ``covs``. The number of matrices equals the number of Gaussian mixtures; each matrix is a square floating-point matrix of size :math:`N \times N`, where :math:`N` is the space dimensionality.
diff --git a/modules/ml/doc/gradient_boosted_trees.rst b/modules/ml/doc/gradient_boosted_trees.rst
deleted file mode 100644
index b83c47e4e1..0000000000
--- a/modules/ml/doc/gradient_boosted_trees.rst
+++ /dev/null
@@ -1,272 +0,0 @@
-.. _Gradient Boosted Trees:
-
-Gradient Boosted Trees
-======================
-
-.. highlight:: cpp
-
-Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
-Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
-In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
-classification and regression problems. Moreover, it can use any
-differential loss function, some popular ones are implemented.
-Decision trees (:ocv:class:`CvDTree`) usage as base learners allows to process ordered
-and categorical variables.
-
-.. _Training GBT:
-
-Training the GBT model
-----------------------
-
-Gradient Boosted Trees model represents an ensemble of single regression trees
-built in a greedy fashion. Training procedure is an iterative process
-similar to the numerical optimization via the gradient descent method. Summary loss
-on the training set depends only on the current model predictions for the
-training samples,  in other words
-:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
-\equiv \mathcal{L}(F)`. And the :math:`\mathcal{L}(F)`
-gradient can be computed as follows:
-
-.. math::
-    grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
-    \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,
-    \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .
-
-At every training step, a single regression tree is built to predict an
-antigradient vector components. Step length is computed corresponding to the
-loss function and separately for every region determined by the tree leaf. It
-can be eliminated by changing values of the leaves  directly.
-
-See below the main scheme of the training process:
-
-#.
-    Find the best constant model.
-#.
-    For :math:`i` in :math:`[1,M]`:
-
-    #.
-        Compute the antigradient.
-    #.
-        Grow a regression tree to predict antigradient components.
-    #.
-        Change values in the tree leaves.
-    #.
-        Add the tree to the model.
-
-
-The following loss functions are implemented for regression problems:
-
-*
-    Squared loss (``CvGBTrees::SQUARED_LOSS``):
-    :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
-*
-    Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
-    :math:`L(y,f(x))=|y-f(x)|`
-*
-    Huber loss (``CvGBTrees::HUBER_LOSS``):
-    :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
-    \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
-    \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,
-
-    where :math:`\delta` is the :math:`\alpha`-quantile estimation of the
-    :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`.
-
-
-The following loss functions are implemented for classification problems:
-
-*
-    Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
-    :math:`K` functions are built, one function for each output class, and
-    :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=0}1(y=k)\ln{p_k(x)}`,
-    where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
-    is the estimation of the probability of :math:`y=k`.
-
-As a result, you get the following model:
-
-.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,
-
-where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
-is a regularization parameter from the interval :math:`(0,1]`, further called
-*shrinkage*.
-
-.. _Predicting with GBT:
-
-Predicting with the GBT Model
------------------------------
-
-To get the GBT model prediction, you need to compute the sum of responses of
-all the trees in the ensemble. For regression problems, it is the answer.
-For classification problems, the result is :math:`\arg\max_{i=1..K}(f_i(x))`.
-
-
-.. highlight:: cpp
-
-
-CvGBTreesParams
----------------
-.. ocv:struct:: CvGBTreesParams : public CvDTreeParams
-
-GBT training parameters.
-
-The structure contains parameters for each single decision tree in the ensemble,
-as well as the whole model characteristics. The structure is derived from
-:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
-cross-validation, pruning, and class priorities are not used.
-
-CvGBTreesParams::CvGBTreesParams
---------------------------------
-.. ocv:function:: CvGBTreesParams::CvGBTreesParams()
-
-.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )
-
-   :param loss_function_type: Type of the loss function used for training
-    (see :ref:`Training GBT`). It must be one of the
-    following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
-    ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
-    types are used for regression problems, and the last one for
-    classification.
-
-   :param weak_count: Count of boosting algorithm iterations. ``weak_count*K`` is the total
-    count of trees in the GBT model, where ``K`` is the output classes count
-    (equal to one in case of a regression).
-
-   :param shrinkage: Regularization parameter (see :ref:`Training GBT`).
-
-   :param subsample_portion: Portion of the whole training set used for each algorithm iteration.
-    Subset is generated randomly. For more information see
-    http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.
-
-   :param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).
-
-   :param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).
-
-By default the following constructor is used:
-
-.. code-block:: cpp
-
-    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
-        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )
-
-CvGBTrees
----------
-.. ocv:class:: CvGBTrees : public CvStatModel
-
-The class implements the Gradient boosted tree model as described in the beginning of this section.
-
-CvGBTrees::CvGBTrees
---------------------
-Default and training constructors.
-
-.. ocv:function:: CvGBTrees::CvGBTrees()
-
-.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )
-
-.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )
-
-.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvGBTrees::train
-----------------
-Trains a Gradient boosted tree model.
-
-.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)
-
-.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )
-
-.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)
-
-.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval
-
-The first train method follows the common template (see :ocv:func:`CvStatModel::train`).
-Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
-``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
-``CV_32S`` or ``CV_32F``. In both cases it is converted into the ``CV_32F``
-matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must be a
-list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
-a dummy parameter.
-
-The second form of :ocv:func:`CvGBTrees::train` function uses :ocv:class:`CvMLData` as a
-data set container. ``update`` is still a dummy parameter.
-
-All parameters specific to the GBT model are passed into the training function
-as a :ocv:class:`CvGBTreesParams` structure.
-
-
-CvGBTrees::predict
-------------------
-Predicts a response for an input sample.
-
-.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const
-
-.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const
-
-.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval
-
-   :param sample: Input feature vector that has the same format as every training set
-    element. If not all the variables were actually used during training,
-    ``sample`` contains forged values at the appropriate places.
-
-   :param missing: Missing values mask, which is a dimensional matrix of the same size as
-    ``sample`` having the ``CV_8U`` type. ``1`` corresponds to the missing value
-    in the same position in the ``sample`` vector. If there are no missing values
-    in the feature vector, an empty matrix can be passed instead of the missing mask.
-
-   :param weakResponses: Matrix used to obtain predictions of all the trees.
-    The matrix has :math:`K` rows,
-    where :math:`K` is the count of output classes (1 for the regression case).
-    The matrix has as many columns as the ``slice`` length.
-
-   :param slice: Parameter defining the part of the ensemble used for prediction.
-    If ``slice = Range::all()``, all trees are used. Use this parameter to
-    get predictions of the GBT models with different ensemble sizes learning
-    only one model.
-
-   :param k: Number of tree ensembles built in case of the classification problem
-    (see :ref:`Training GBT`). Use this
-    parameter to change the output to sum of the trees' predictions in the
-    ``k``-th ensemble only. To get the total GBT model prediction, ``k`` value
-    must be -1. For regression problems, ``k`` is also equal to -1.
-
-The method predicts the response corresponding to the given sample
-(see :ref:`Predicting with GBT`).
-The result is either the class label or the estimated function value. The
-:ocv:func:`CvGBTrees::predict` method enables using the parallel version of the GBT model
-prediction if the OpenCV is built with the TBB library. In this case, predictions
-of single trees are computed in a parallel fashion.
-
-
-CvGBTrees::clear
-----------------
-Clears the model.
-
-.. ocv:function:: void CvGBTrees::clear()
-
-.. ocv:pyfunction:: cv2.GBTrees.clear() -> None
-
-The function deletes the data set information and all the weak models and sets all internal
-variables to the initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
-destructor.
-
-
-CvGBTrees::calc_error
----------------------
-Calculates a training or testing error.
-
-.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )
-
-   :param _data: Data set.
-
-   :param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test
-    (``CV_TEST_ERROR``).
-
-   :param resp: If non-zero, a vector of predictions on the corresponding data set is
-    returned.
-
-If the :ocv:class:`CvMLData` data is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
-used to get a training/testing error easily and (optionally) all predictions
-on the training/testing set. If the Intel* TBB* library is used, the error is computed in a
-parallel way, namely, predictions for different samples are computed at the same time.
-In case of a regression problem, a mean squared error is returned. For
-classifications, the result is a misclassification error in percent.
diff --git a/modules/ml/doc/k_nearest_neighbors.rst b/modules/ml/doc/k_nearest_neighbors.rst
index 05413c7785..6e16641450 100644
--- a/modules/ml/doc/k_nearest_neighbors.rst
+++ b/modules/ml/doc/k_nearest_neighbors.rst
@@ -5,9 +5,9 @@ K-Nearest Neighbors
 
 The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (**K**) of the nearest neighbors of the sample using voting, calculating weighted sum, and so on. The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.
 
-CvKNearest
+KNearest
 ----------
-.. ocv:class:: CvKNearest : public CvStatModel
+.. ocv:class:: KNearest : public StatModel
 
 The class implements K-Nearest Neighbors model as described in the beginning of this section.
 
@@ -17,65 +17,32 @@ The class implements K-Nearest Neighbors model as described in the beginning of
    * (Python) An example of grid search digit recognition using KNearest can be found at opencv_source/samples/python2/digits_adjust.py
    * (Python) An example of video digit recognition using KNearest can be found at opencv_source/samples/python2/digits_video.py
 
-CvKNearest::CvKNearest
+KNearest::create
 ----------------------
-Default and training constructors.
+Creates an empty model.
 
-.. ocv:function:: CvKNearest::CvKNearest()
+.. ocv:function:: Ptr<KNearest> KNearest::create(const Params& params=Params())
 
-.. ocv:function:: CvKNearest::CvKNearest( const Mat& trainData, const Mat& responses, const Mat& sampleIdx=Mat(), bool isRegression=false, int max_k=32 )
+    :param params: The model parameters: the default number of neighbors to use in the ``predict`` method (in ``KNearest::findNearest`` this number must be passed explicitly) and a flag indicating whether a classification or regression model should be trained.
 
-.. ocv:function:: CvKNearest::CvKNearest( const CvMat* trainData, const CvMat* responses, const CvMat* sampleIdx=0, bool isRegression=false, int max_k=32 )
+The static method creates an empty KNearest classifier. It should then be trained using the ``train`` method (see ``StatModel::train``). Alternatively, you can load the model from a file using ``StatModel::load<KNearest>(filename)``.
 
-See :ocv:func:`CvKNearest::train` for additional parameters descriptions.
 
-CvKNearest::train
------------------
-Trains the model.
-
-.. ocv:function:: bool CvKNearest::train( const Mat& trainData, const Mat& responses, const Mat& sampleIdx=Mat(), bool isRegression=false, int maxK=32, bool updateBase=false )
-
-.. ocv:function:: bool CvKNearest::train( const CvMat* trainData, const CvMat* responses, const CvMat* sampleIdx=0, bool is_regression=false, int maxK=32, bool updateBase=false )
-
-.. ocv:pyfunction:: cv2.KNearest.train(trainData, responses[, sampleIdx[, isRegression[, maxK[, updateBase]]]]) -> retval
-
-    :param isRegression: Type of the problem: ``true`` for regression and ``false`` for classification.
-
-    :param maxK: Number of maximum neighbors that may be passed to the method :ocv:func:`CvKNearest::find_nearest`.
-
-    :param updateBase: Specifies whether the model is trained from scratch (``update_base=false``), or it is updated using the new training data (``update_base=true``). In the latter case, the parameter ``maxK`` must not be larger than the original value.
-
-The method trains the K-Nearest model. It follows the conventions of the generic :ocv:func:`CvStatModel::train` approach with the following limitations:
-
-* Only ``CV_ROW_SAMPLE`` data layout is supported.
-* Input variables are all ordered.
-* Output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ).
-* Variable subsets (``var_idx``) and missing measurements are not supported.
-
-CvKNearest::find_nearest
+KNearest::findNearest
 ------------------------
 Finds the neighbors and predicts responses for input vectors.
 
-.. ocv:function:: float CvKNearest::find_nearest( const Mat& samples, int k, Mat* results=0, const float** neighbors=0, Mat* neighborResponses=0, Mat* dist=0 ) const
+.. ocv:function:: float KNearest::findNearest( InputArray samples, int k, OutputArray results, OutputArray neighborResponses=noArray(), OutputArray dist=noArray() ) const
 
-.. ocv:function:: float CvKNearest::find_nearest( const Mat& samples, int k, Mat& results, Mat& neighborResponses, Mat& dists) const
+    :param samples: Input samples stored by rows. It is a single-precision floating-point matrix of ``<number_of_samples> * <number_of_features>`` size.
 
-.. ocv:function:: float CvKNearest::find_nearest( const CvMat* samples, int k, CvMat* results=0, const float** neighbors=0, CvMat* neighborResponses=0, CvMat* dist=0 ) const
+    :param k: Number of used nearest neighbors. Should be greater than 1.
 
-.. ocv:pyfunction:: cv2.KNearest.find_nearest(samples, k[, results[, neighborResponses[, dists]]]) -> retval, results, neighborResponses, dists
+    :param results: Vector with results of prediction (regression or classification) for each input sample. It is a single-precision floating-point vector with ``<number_of_samples>`` elements.
 
+    :param neighborResponses: Optional output values for corresponding neighbors. It is a single-precision floating-point matrix of ``<number_of_samples> * k`` size.
 
-    :param samples: Input samples stored by rows. It is a single-precision floating-point matrix of :math:`number\_of\_samples \times number\_of\_features` size.
-
-    :param k: Number of used nearest neighbors. It must satisfy constraint: :math:`k \le` :ocv:func:`CvKNearest::get_max_k`.
-
-    :param results: Vector with results of prediction (regression or classification) for each input sample. It is a single-precision floating-point vector with ``number_of_samples`` elements.
-
-    :param neighbors: Optional output pointers to the neighbor vectors themselves. It is an array of ``k*samples->rows`` pointers.
-
-    :param neighborResponses: Optional output values for corresponding ``neighbors``. It is a single-precision floating-point matrix of :math:`number\_of\_samples \times k` size.
-
-    :param dist: Optional output distances from the input vectors to the corresponding ``neighbors``. It is a single-precision floating-point matrix of :math:`number\_of\_samples \times k` size.
+    :param dist: Optional output distances from the input vectors to the corresponding neighbors. It is a single-precision floating-point matrix of ``<number_of_samples> * k`` size.
 
 For each input vector (a row of the matrix ``samples``), the method finds the ``k`` nearest neighbors.  In case of regression, the predicted result is a mean value of the particular vector's neighbor responses. In case of classification, the class is determined by voting.
 
@@ -87,110 +54,18 @@ If only a single input vector is passed, all output matrices are optional and th
 
 The function is parallelized with the TBB library.
 
-CvKNearest::get_max_k
+KNearest::getDefaultK
 ---------------------
-Returns the number of maximum neighbors that may be passed to the method :ocv:func:`CvKNearest::find_nearest`.
+Returns the default number of neighbors
 
-.. ocv:function:: int CvKNearest::get_max_k() const
+.. ocv:function:: int KNearest::getDefaultK() const
 
-CvKNearest::get_var_count
--------------------------
-Returns the number of used features (variables count).
+The function returns the default number of neighbors that is used in a simpler ``predict`` method, not ``findNearest``.
 
-.. ocv:function:: int CvKNearest::get_var_count() const
+KNearest::setDefaultK
+---------------------
+Sets the default number of neighbors
 
-CvKNearest::get_sample_count
-----------------------------
-Returns the total number of train samples.
+.. ocv:function:: void KNearest::setDefaultK(int k)
 
-.. ocv:function:: int CvKNearest::get_sample_count() const
-
-CvKNearest::is_regression
--------------------------
-Returns type of the problem: ``true`` for regression and ``false`` for classification.
-
-.. ocv:function:: bool CvKNearest::is_regression() const
-
-
-
-The sample below (currently using the obsolete ``CvMat`` structures) demonstrates the use of the k-nearest classifier for 2D point classification: ::
-
-    #include "ml.h"
-    #include "highgui.h"
-
-    int main( int argc, char** argv )
-    {
-        const int K = 10;
-        int i, j, k, accuracy;
-        float response;
-        int train_sample_count = 100;
-        CvRNG rng_state = cvRNG(-1);
-        CvMat* trainData = cvCreateMat( train_sample_count, 2, CV_32FC1 );
-        CvMat* trainClasses = cvCreateMat( train_sample_count, 1, CV_32FC1 );
-        IplImage* img = cvCreateImage( cvSize( 500, 500 ), 8, 3 );
-        float _sample[2];
-        CvMat sample = cvMat( 1, 2, CV_32FC1, _sample );
-        cvZero( img );
-
-        CvMat trainData1, trainData2, trainClasses1, trainClasses2;
-
-        // form the training samples
-        cvGetRows( trainData, &trainData1, 0, train_sample_count/2 );
-        cvRandArr( &rng_state, &trainData1, CV_RAND_NORMAL, cvScalar(200,200), cvScalar(50,50) );
-
-        cvGetRows( trainData, &trainData2, train_sample_count/2, train_sample_count );
-        cvRandArr( &rng_state, &trainData2, CV_RAND_NORMAL, cvScalar(300,300), cvScalar(50,50) );
-
-        cvGetRows( trainClasses, &trainClasses1, 0, train_sample_count/2 );
-        cvSet( &trainClasses1, cvScalar(1) );
-
-        cvGetRows( trainClasses, &trainClasses2, train_sample_count/2, train_sample_count );
-        cvSet( &trainClasses2, cvScalar(2) );
-
-        // learn classifier
-        CvKNearest knn( trainData, trainClasses, 0, false, K );
-        CvMat* nearests = cvCreateMat( 1, K, CV_32FC1);
-
-        for( i = 0; i < img->height; i++ )
-        {
-            for( j = 0; j < img->width; j++ )
-            {
-                sample.data.fl[0] = (float)j;
-                sample.data.fl[1] = (float)i;
-
-                // estimate the response and get the neighbors' labels
-                response = knn.find_nearest(&sample,K,0,0,nearests,0);
-
-                // compute the number of neighbors representing the majority
-                for( k = 0, accuracy = 0; k < K; k++ )
-                {
-                    if( nearests->data.fl[k] == response)
-                        accuracy++;
-                }
-                // highlight the pixel depending on the accuracy (or confidence)
-                cvSet2D( img, i, j, response == 1 ?
-                    (accuracy > 5 ? CV_RGB(180,0,0) : CV_RGB(180,120,0)) :
-                    (accuracy > 5 ? CV_RGB(0,180,0) : CV_RGB(120,120,0)) );
-            }
-        }
-
-        // display the original training samples
-        for( i = 0; i < train_sample_count/2; i++ )
-        {
-            CvPoint pt;
-            pt.x = cvRound(trainData1.data.fl[i*2]);
-            pt.y = cvRound(trainData1.data.fl[i*2+1]);
-            cvCircle( img, pt, 2, CV_RGB(255,0,0), CV_FILLED );
-            pt.x = cvRound(trainData2.data.fl[i*2]);
-            pt.y = cvRound(trainData2.data.fl[i*2+1]);
-            cvCircle( img, pt, 2, CV_RGB(0,255,0), CV_FILLED );
-        }
-
-        cvNamedWindow( "classifier result", 1 );
-        cvShowImage( "classifier result", img );
-        cvWaitKey(0);
-
-        cvReleaseMat( &trainClasses );
-        cvReleaseMat( &trainData );
-        return 0;
-    }
+The function sets the default number of neighbors that is used in a simpler ``predict`` method, not ``findNearest``.
diff --git a/modules/ml/doc/ml.rst b/modules/ml/doc/ml.rst
index b83e7dedc3..86da3ac4ff 100644
--- a/modules/ml/doc/ml.rst
+++ b/modules/ml/doc/ml.rst
@@ -15,9 +15,7 @@ Most of the classification and regression algorithms are implemented as C++ clas
     support_vector_machines
     decision_trees
     boosting
-    gradient_boosted_trees
     random_trees
-    ertrees
     expectation_maximization
     neural_networks
     mldata
diff --git a/modules/ml/doc/mldata.rst b/modules/ml/doc/mldata.rst
index c3092d1490..8a3b796e30 100644
--- a/modules/ml/doc/mldata.rst
+++ b/modules/ml/doc/mldata.rst
@@ -1,279 +1,126 @@
-MLData
+Training Data
 ===================
 
 .. highlight:: cpp
 
-For the machine learning algorithms, the data set is often stored in a file of the ``.csv``-like format. The file contains a table of predictor and response values where each row of the table corresponds to a sample. Missing values are supported. The UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/) provides many data sets stored in such a format to the machine learning community. The class ``MLData`` is implemented to easily load the data for training one of the OpenCV machine learning algorithms. For float values, only the  ``'.'`` separator is supported. The table can have a header and in such case the user have to set the number of the header lines to skip them duaring the file reading.
+In machine learning there is a notion of training data. Training data includes several components:
 
-CvMLData
+* A set of training samples. Each training sample is a vector of values (in computer vision it is sometimes referred to as a feature vector). Usually all the vectors have the same number of components (features); the OpenCV ml module assumes that. Each feature can be ordered (i.e. its values are floating-point numbers that can be compared with each other and strictly ordered, i.e. sorted) or categorical (i.e. its value belongs to a fixed set of values that can be integers, strings etc.).
+
+* An optional set of responses corresponding to the samples. Training data with no responses is used in unsupervised learning algorithms that learn the structure of the supplied data based on distances between different samples. Training data with responses is used in supervised learning algorithms, which learn the function mapping samples to responses. Usually the responses are scalar values, ordered (when we deal with a regression problem) or categorical (when we deal with a classification problem; in this case the responses are often called "labels"). Some algorithms, most notably neural networks, can handle not only scalar, but also multi-dimensional or vector responses.
+
+* Another optional component is the mask of missing measurements. Most algorithms require all the components in all the training samples to be valid, but some other algorithms, such as decision trees, can handle the cases of missing measurements.
+
+* In the case of a classification problem, the user may want to give different weights to different classes. This is useful, for example, when the user wants to
+  shift prediction accuracy towards a lower false-alarm rate or a higher hit rate, or to
+  compensate for significantly different numbers of training samples from different classes.
+
+* In addition to that, each training sample may be given a weight, if the user wants the algorithm to pay special attention to certain training samples and adjust the training model accordingly.
+
+* Also, the user may wish not to use the whole training data at once, but rather use parts of it, e.g. to do parameter optimization via a cross-validation procedure.
+
+As you can see, training data can have a rather complex structure; besides, it may be very big and/or not entirely available, so there is a need for an abstraction of this concept. In OpenCV ml there is the ``cv::ml::TrainData`` class for that.
+
+TrainData
 --------
-.. ocv:class:: CvMLData
+.. ocv:class:: TrainData
 
-Class for loading the data from a ``.csv`` file.
-::
+Class encapsulating training data. Please note that the class only specifies the interface of training data, but not the implementation. All the statistical model classes in ml take ``Ptr<TrainData>``. In other words, you can create your own class derived from ``TrainData`` and pass a smart pointer to an instance of this class to ``StatModel::train``.
 
-    class CV_EXPORTS CvMLData
-    {
-    public:
-        CvMLData();
-        virtual ~CvMLData();
+TrainData::loadFromCSV
+----------------------
+Reads the dataset from a .csv file and returns the ready-to-use training data.
 
-        int read_csv(const char* filename);
-
-        const CvMat* get_values() const;
-        const CvMat* get_responses();
-        const CvMat* get_missing() const;
-
-        void set_response_idx( int idx );
-        int get_response_idx() const;
-
-
-        void set_train_test_split( const CvTrainTestSplit * spl);
-        const CvMat* get_train_sample_idx() const;
-        const CvMat* get_test_sample_idx() const;
-        void mix_train_and_test_idx();
-
-        const CvMat* get_var_idx();
-        void change_var_idx( int vi, bool state );
-
-        const CvMat* get_var_types();
-        void set_var_types( const char* str );
-
-        int get_var_type( int var_idx ) const;
-        void change_var_type( int var_idx, int type);
-
-        void set_delimiter( char ch );
-        char get_delimiter() const;
-
-        void set_miss_ch( char ch );
-        char get_miss_ch() const;
-
-        const std::map<String, int>& get_class_labels_map() const;
-
-    protected:
-        ...
-    };
-
-CvMLData::read_csv
-------------------
-Reads the data set from a ``.csv``-like ``filename`` file and stores all read values in a matrix.
-
-.. ocv:function:: int CvMLData::read_csv(const char* filename)
+.. ocv:function:: Ptr<TrainData> TrainData::loadFromCSV(const String& filename, int headerLineCount, int responseStartIdx=-1, int responseEndIdx=-1, const String& varTypeSpec=String(), char delimiter=',', char missch='?')
 
     :param filename: The input file name
 
-While reading the data, the method tries to define the type of variables (predictors and responses): ordered or categorical. If a value of the variable is not numerical (except for the label for a missing value), the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all existing values of the variable are numerical, the type of the variable is set to ``CV_VAR_ORDERED``. So, the default definition of variables types works correctly for all cases except the case of a categorical variable with numerical class labels. In this case, the type ``CV_VAR_ORDERED`` is set. You should change the type to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`. For categorical variables, a common map is built to convert a string class label to the numerical class label. Use :ocv:func:`CvMLData::get_class_labels_map` to obtain this map.
+    :param headerLineCount: The number of lines at the beginning to skip; besides the header, the function also skips empty lines and lines starting with '#'.
+    
+    :param responseStartIdx: Index of the first output variable. If -1, the function considers the last variable as the response
+    
+    :param responseEndIdx: Index of the last output variable + 1. If -1, then there is a single response variable at ``responseStartIdx``.
+    
+    :param varTypeSpec: The optional text string that specifies the variables' types. It has the format ``ord[n1-n2,n3,n4-n5,...]cat[n6,n7-n8,...]``. That is, variables from n1 to n2 (inclusive range), n3, n4 to n5 ... are considered ordered and n6, n7 to n8 ... are considered categorical. The range [n1..n2] + [n3] + [n4..n5] + ... + [n6] + [n7..n8] should cover all the variables. If ``varTypeSpec`` is not specified, then the algorithm uses the following rules:
+        1. all input variables are considered ordered by default. If some column contains non-numerical values, e.g. 'apple', 'pear', 'apple', 'apple', 'mango', the corresponding variable is considered categorical.
+        2. if there are several output variables, they are all considered ordered. An error is reported when non-numerical values are used.
+        3. if there is a single output variable, then it is considered categorical if its values are non-numerical or are all integers. Otherwise, it is considered ordered.
+    
+    :param delimiter: The character used to separate values in each line.
+    
+    :param missch: The character used to specify missing measurements. It should not be a digit. Although it is a non-numerical value, it does not affect the decision of whether a variable is ordered or categorical.
 
-Also, when reading the data, the method constructs the mask of missing values. For example, values are equal to `'?'`.
+TrainData::create
+-----------------
+Creates training data from in-memory arrays.
 
-CvMLData::get_values
---------------------
-Returns a pointer to the matrix of predictors and response values
+.. ocv:function:: Ptr<TrainData> TrainData::create(InputArray samples, int layout, InputArray responses, InputArray varIdx=noArray(), InputArray sampleIdx=noArray(), InputArray sampleWeights=noArray(), InputArray varType=noArray())
 
-.. ocv:function:: const CvMat* CvMLData::get_values() const
+    :param samples: matrix of samples. It should have ``CV_32F`` type.
+    
+    :param layout: it's either ``ROW_SAMPLE``, which means that each training sample is a row of ``samples``, or ``COL_SAMPLE``, which means that each training sample occupies a column of ``samples``.
+    
+    :param responses: matrix of responses. If the responses are scalar, they should be stored as a single row or as a single column. The matrix should have type ``CV_32F`` or ``CV_32S`` (in the former case the responses are considered as ordered by default; in the latter case - as categorical)
+    
+    :param varIdx: vector specifying which variables to use for training. It can be an integer vector (``CV_32S``) containing 0-based variable indices or byte vector (``CV_8U``) containing a mask of active variables.
+    
+    :param sampleIdx: vector specifying which samples to use for training. It can be an integer vector (``CV_32S``) containing 0-based sample indices or byte vector (``CV_8U``) containing a mask of training samples.
+    
+    :param sampleWeights: optional vector with weights for each sample. It should have ``CV_32F`` type.
+    
+    :param varType: optional vector of type ``CV_8U`` and size ``<number_of_variables_in_samples> + <number_of_variables_in_responses>``, containing the type of each input and output variable. Ordered variables are denoted by the value ``VAR_ORDERED``, and categorical ones by ``VAR_CATEGORICAL``.
 
-The method returns a pointer to the matrix of predictor and response ``values``  or ``0`` if the data has not been loaded from the file yet.
 
-The row count of this matrix equals the sample count. The column count equals predictors ``+ 1`` for the response (if exists) count. This means that each row of the matrix contains values of one sample predictor and response. The matrix type is ``CV_32FC1``.
-
-CvMLData::get_responses
------------------------
-Returns a pointer to the matrix of response values
-
-.. ocv:function:: const CvMat* CvMLData::get_responses()
-
-The method returns a pointer to the matrix of response values or throws an exception if the data has not been loaded from the file yet.
-
-This is a single-column matrix of the type ``CV_32FC1``. Its row count is equal to the sample count, one column and .
-
-CvMLData::get_missing
----------------------
-Returns a pointer to the mask matrix of missing values
-
-.. ocv:function:: const CvMat* CvMLData::get_missing() const
-
-The method returns a pointer to the mask matrix of missing values or throws an exception if the data has not been loaded from the file yet.
-
-This matrix has the same size as the  ``values`` matrix (see :ocv:func:`CvMLData::get_values`) and the type ``CV_8UC1``.
-
-CvMLData::set_response_idx
+TrainData::getTrainSamples
 --------------------------
-Specifies index of response column in the data matrix
+Returns the matrix of training samples
 
-.. ocv:function:: void CvMLData::set_response_idx( int idx )
+.. ocv:function:: Mat TrainData::getTrainSamples(int layout=ROW_SAMPLE, bool compressSamples=true, bool compressVars=true) const
 
-The method sets the index of a response column in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) or throws an exception if the data has not been loaded from the file yet.
+    :param layout: The requested layout. If it's different from the initial one, the matrix is transposed.
+    
+    :param compressSamples: if true, the function returns only the training samples (specified by ``sampleIdx``).
+    
+    :param compressVars: if true, the function returns the shorter training samples, containing only the active variables.
+    
+In the current implementation the function tries to avoid physical data copying and returns the matrix stored inside ``TrainData`` (unless transposition or compression is needed).
 
-The old response columns become predictors. If ``idx < 0``, there is no response.
 
-CvMLData::get_response_idx
---------------------------
-Returns index of the response column in the loaded data matrix
+TrainData::getTrainResponses
+----------------------------
+Returns the vector of responses
 
-.. ocv:function:: int CvMLData::get_response_idx() const
+.. ocv:function:: Mat TrainData::getTrainResponses() const
 
-The method returns the index of a response column in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) or throws an exception if the data has not been loaded from the file yet.
+The function returns ordered or the original categorical responses. Usually it's used in regression algorithms.
 
-If ``idx < 0``, there is no response.
 
+TrainData::getClassLabels
+----------------------------
+Returns the vector of class labels
 
-CvMLData::set_train_test_split
-------------------------------
-Divides the read data set into two disjoint training and test subsets.
+.. ocv:function:: Mat TrainData::getClassLabels() const
 
-.. ocv:function:: void CvMLData::set_train_test_split( const CvTrainTestSplit * spl )
+The function returns the vector of unique labels that occur in the responses.
 
-This method sets parameters for such a split using ``spl`` (see :ocv:class:`CvTrainTestSplit`) or throws an exception if the data has not been loaded from the file yet.
 
-CvMLData::get_train_sample_idx
-------------------------------
-Returns the matrix of sample indices for a training subset
+TrainData::getTrainNormCatResponses
+-----------------------------------
+Returns the vector of normalized categorical responses
 
-.. ocv:function:: const CvMat* CvMLData::get_train_sample_idx() const
+.. ocv:function:: Mat TrainData::getTrainNormCatResponses() const
 
-The method returns the matrix of sample indices for a training subset. This is a single-row  matrix of the type ``CV_32SC1``. If data split is not set, the method returns ``0``. If the data has not been loaded from the file yet, an exception is thrown.
+The function returns the vector of responses. Each response is an integer from 0 to ``<number_of_classes>-1``. The actual label value can then be retrieved from the class label vector, see ``TrainData::getClassLabels``.
 
-CvMLData::get_test_sample_idx
------------------------------
-Returns the matrix of sample indices for a testing subset
+TrainData::setTrainTestSplitRatio
+-----------------------------------
+Splits the training data into the training and test parts
 
-.. ocv:function:: const CvMat* CvMLData::get_test_sample_idx() const
+.. ocv:function:: void TrainData::setTrainTestSplitRatio(double ratio, bool shuffle=true)
 
+The function selects a subset of the specified relative size, which is then used as the training set. If the function is not called, all the data is used for training. Please note that for each of ``TrainData::getTrain*`` there is a corresponding ``TrainData::getTest*``, so that the test subset can be retrieved and processed as well.
 
-CvMLData::mix_train_and_test_idx
---------------------------------
-Mixes the indices of training and test samples
 
-.. ocv:function:: void CvMLData::mix_train_and_test_idx()
-
-The method shuffles the indices of training and test samples preserving sizes of training and test subsets if the data split is set by :ocv:func:`CvMLData::get_values`. If the data has not been loaded from the file yet, an exception is thrown.
-
-CvMLData::get_var_idx
----------------------
-Returns the indices of the active variables in the data matrix
-
-.. ocv:function:: const CvMat* CvMLData::get_var_idx()
-
-The method returns the indices of variables (columns) used in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`).
-
-It returns ``0`` if the used subset is not set. It throws an exception if the data has not been loaded from the file yet. Returned matrix is a single-row matrix of the type ``CV_32SC1``. Its column count is equal to the size of the used variable subset.
-
-CvMLData::change_var_idx
-------------------------
-Enables or disables particular variable in the loaded data
-
-.. ocv:function:: void CvMLData::change_var_idx( int vi, bool state )
-
-By default, after reading the data set all variables in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) are used. But you may want to use only a subset of variables and include/exclude (depending on ``state`` value) a variable with the ``vi`` index from the used subset. If the data has not been loaded from the file yet, an exception is thrown.
-
-CvMLData::get_var_types
------------------------
-Returns a matrix of the variable types.
-
-.. ocv:function:: const CvMat* CvMLData::get_var_types()
-
-The function returns a single-row matrix of the type ``CV_8UC1``, where each element is set to either ``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``. The number of columns is equal to the number of variables. If data has not been loaded from file yet an exception is thrown.
-
-CvMLData::set_var_types
------------------------
-Sets the variables types in the loaded data.
-
-.. ocv:function:: void CvMLData::set_var_types( const char* str )
-
-In the string, a variable type is followed by a list of variables indices. For example: ``"ord[0-17],cat[18]"``, ``"ord[0,2,4,10-12], cat[1,3,5-9,13,14]"``, ``"cat"`` (all variables are categorical), ``"ord"`` (all variables are ordered).
-
-CvMLData::get_header_lines_number
----------------------------------
-Returns a number of the table header lines.
-
-.. ocv:function:: int CvMLData::get_header_lines_number() const
-
-CvMLData::set_header_lines_number
----------------------------------
-Sets a number of the table header lines.
-
-.. ocv:function:: void CvMLData::set_header_lines_number( int n )
-
-By default it is supposed that the table does not have a header, i.e. it contains only the data.
-
-CvMLData::get_var_type
-----------------------
-Returns type of the specified variable
-
-.. ocv:function:: int CvMLData::get_var_type( int var_idx ) const
-
-The method returns the type of a variable by the index ``var_idx`` ( ``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``).
-
-CvMLData::change_var_type
--------------------------
-Changes type of the specified variable
-
-.. ocv:function:: void CvMLData::change_var_type( int var_idx, int type)
-
-The method changes type of variable with index ``var_idx`` from existing type to ``type`` ( ``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``).
-
-CvMLData::set_delimiter
------------------------
-Sets the delimiter in the file used to separate input numbers
-
-.. ocv:function:: void CvMLData::set_delimiter( char ch )
-
-The method sets the delimiter for variables in a file. For example: ``','`` (default), ``';'``, ``' '`` (space), or other characters. The floating-point separator ``'.'`` is not allowed.
-
-CvMLData::get_delimiter
------------------------
-Returns the currently used delimiter character.
-
-.. ocv:function:: char CvMLData::get_delimiter() const
-
-
-CvMLData::set_miss_ch
----------------------
-Sets the character used to specify missing values
-
-.. ocv:function:: void CvMLData::set_miss_ch( char ch )
-
-The method sets the character used to specify missing values. For example: ``'?'`` (default), ``'-'``. The floating-point separator ``'.'`` is not allowed.
-
-CvMLData::get_miss_ch
----------------------
-Returns the currently used missing value character.
-
-.. ocv:function:: char CvMLData::get_miss_ch() const
-
-CvMLData::get_class_labels_map
--------------------------------
-Returns a map that converts strings to labels.
-
-.. ocv:function:: const std::map<String, int>& CvMLData::get_class_labels_map() const
-
-The method returns a map that converts string class labels to the numerical class labels. It can be used to get an original class label as in a file.
-
-CvTrainTestSplit
-----------------
-.. ocv:struct:: CvTrainTestSplit
-
-Structure setting the split of a data set read by :ocv:class:`CvMLData`.
-::
-
-    struct CvTrainTestSplit
-    {
-        CvTrainTestSplit();
-        CvTrainTestSplit( int train_sample_count, bool mix = true);
-        CvTrainTestSplit( float train_sample_portion, bool mix = true);
-
-        union
-        {
-            int count;
-            float portion;
-        } train_sample_part;
-        int train_sample_part_mode;
-
-        bool mix;
-    };
-
-There are two ways to construct a split:
-
-* Set the training sample count (subset size) ``train_sample_count``. Other existing samples are located in a test subset.
-
-* Set a training sample portion in ``[0,..1]``. The flag ``mix`` is used to mix training and test samples indices when the split is set. Otherwise, the data set is split in the storing order: the first part of samples of a given size is a training subset, the second part is a test subset.
+Other methods
+-------------
+The class includes many other methods that can be used to access normalized categorical input variables, to access training data by parts so that it does not have to fit into memory, etc.
diff --git a/modules/ml/doc/neural_networks.rst b/modules/ml/doc/neural_networks.rst
index 776bf243bd..166e2e2f4b 100644
--- a/modules/ml/doc/neural_networks.rst
+++ b/modules/ml/doc/neural_networks.rst
@@ -29,17 +29,17 @@ In other words, given the outputs
 Different activation functions may be used. ML implements three standard functions:
 
 *
-    Identity function ( ``CvANN_MLP::IDENTITY``     ):
+    Identity function ( ``ANN_MLP::IDENTITY``     ):
     :math:`f(x)=x`
 *
-    Symmetrical sigmoid ( ``CvANN_MLP::SIGMOID_SYM``     ):
+    Symmetrical sigmoid ( ``ANN_MLP::SIGMOID_SYM``     ):
     :math:`f(x)=\beta*(1-e^{-\alpha x})/(1+e^{-\alpha x}`     ), which is the default choice for MLP. The standard sigmoid with
     :math:`\beta =1, \alpha =1`     is shown below:
 
     .. image:: pics/sigmoid_bipolar.png
 
 *
-    Gaussian function ( ``CvANN_MLP::GAUSSIAN``     ):
+    Gaussian function ( ``ANN_MLP::GAUSSIAN``     ):
     :math:`f(x)=\beta e^{-\alpha x*x}`     , which is not completely supported at the moment.
 
 In ML, all the neurons have the same activation functions, with the same free parameters (
@@ -95,60 +95,90 @@ The second (default) one is a batch RPROP algorithm.
 .. [RPROP93] M. Riedmiller and H. Braun, *A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm*, Proc. ICNN, San Francisco (1993).
 
 
-CvANN_MLP_TrainParams
+ANN_MLP::Params
 ---------------------
-.. ocv:struct:: CvANN_MLP_TrainParams
+.. ocv:class:: ANN_MLP::Params
 
-  Parameters of the MLP training algorithm. You can initialize the structure by a constructor or the individual parameters can be adjusted after the structure is created.
+  Parameters of the MLP and of the training algorithm. You can initialize the structure by a constructor or the individual parameters can be adjusted after the structure is created.
+
+  The network structure:
+  
+  .. ocv:member:: Mat layerSizes
+  
+     The number of elements in each layer of the network. The very first element specifies the number of elements in the input layer, and the last element specifies the number of elements in the output layer.
+     
+  .. ocv:member:: int activateFunc
+  
+     The activation function. Currently the only fully supported activation function is ``ANN_MLP::SIGMOID_SYM``.
+     
+  .. ocv:member:: double fparam1
+  
+     The first parameter of activation function, 0 by default.
+     
+  .. ocv:member:: double fparam2
+  
+     The second parameter of the activation function, 0 by default.
+     
+     .. note::
+     
+         If you are using the default ``ANN_MLP::SIGMOID_SYM`` activation function with the default parameter values ``fparam1=0`` and ``fparam2=0``, then the function used is ``y = 1.7159*tanh(2/3 * x)``, so the output ranges over [-1.7159, 1.7159] instead of [0,1].
 
   The back-propagation algorithm parameters:
 
-  .. ocv:member:: double bp_dw_scale
+  .. ocv:member:: double bpDWScale
 
      Strength of the weight gradient term. The recommended value is about 0.1.
 
-  .. ocv:member:: double bp_moment_scale
+  .. ocv:member:: double bpMomentScale
 
      Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. The value 0.1 or so is good enough
 
   The RPROP algorithm parameters (see [RPROP93]_ for details):
 
-  .. ocv:member:: double rp_dw0
+  .. ocv:member:: double rpDW0
 
      Initial value :math:`\Delta_0` of update-values :math:`\Delta_{ij}`.
 
-  .. ocv:member:: double rp_dw_plus
+  .. ocv:member:: double rpDWPlus
 
      Increase factor :math:`\eta^+`. It must be >1.
 
-  .. ocv:member:: double rp_dw_minus
+  .. ocv:member:: double rpDWMinus
 
      Decrease factor :math:`\eta^-`. It must be <1.
 
-  .. ocv:member:: double rp_dw_min
+  .. ocv:member:: double rpDWMin
 
      Update-values lower limit :math:`\Delta_{min}`. It must be positive.
 
-  .. ocv:member:: double rp_dw_max
+  .. ocv:member:: double rpDWMax
 
      Update-values upper limit :math:`\Delta_{max}`. It must be >1.
 
 
-CvANN_MLP_TrainParams::CvANN_MLP_TrainParams
+ANN_MLP::Params::Params
 --------------------------------------------
-The constructors.
+Construct the parameter structure
 
-.. ocv:function:: CvANN_MLP_TrainParams::CvANN_MLP_TrainParams()
+.. ocv:function:: ANN_MLP::Params::Params()
 
-.. ocv:function:: CvANN_MLP_TrainParams::CvANN_MLP_TrainParams( CvTermCriteria term_crit, int train_method, double param1, double param2=0 )
+.. ocv:function:: ANN_MLP::Params::Params( const Mat& layerSizes, int activateFunc, double fparam1, double fparam2, TermCriteria termCrit, int trainMethod, double param1, double param2=0 )
 
-    :param term_crit: Termination criteria of the training algorithm. You can specify the maximum number of iterations (``max_iter``) and/or how much the error could change between the iterations to make the algorithm continue (``epsilon``).
+    :param layerSizes: Integer vector specifying the number of neurons in each layer including the input and output layers.
+
+    :param activateFunc: Parameter specifying the activation function for each neuron: one of  ``ANN_MLP::IDENTITY``, ``ANN_MLP::SIGMOID_SYM``, and ``ANN_MLP::GAUSSIAN``.
+
+    :param fparam1: The first parameter of the activation function, :math:`\alpha`. See the formulas in the introduction section.
+
+    :param fparam2: The second parameter of the activation function, :math:`\beta`. See the formulas in the introduction section.
+
+    :param termCrit: Termination criteria of the training algorithm. You can specify the maximum number of iterations (``maxCount``) and/or how much the error could change between the iterations to make the algorithm continue (``epsilon``).
 
     :param train_method: Training method of the MLP. Possible values are:
 
-        * **CvANN_MLP_TrainParams::BACKPROP** The back-propagation algorithm.
+        * **ANN_MLP::Params::BACKPROP** The back-propagation algorithm.
 
-        * **CvANN_MLP_TrainParams::RPROP** The RPROP algorithm.
+        * **ANN_MLP::Params::RPROP** The RPROP algorithm.
 
     :param param1: Parameter of the training method. It is ``rp_dw0`` for ``RPROP`` and ``bp_dw_scale`` for ``BACKPROP``.
 
@@ -158,126 +188,54 @@ By default the RPROP algorithm is used:
 
 ::
 
-    CvANN_MLP_TrainParams::CvANN_MLP_TrainParams()
+    ANN_MLP::Params::Params()
     {
-        term_crit = cvTermCriteria( CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 1000, 0.01 );
+        layerSizes = Mat();
+        activateFunc = SIGMOID_SYM;
+        fparam1 = fparam2 = 0;
+        termCrit = TermCriteria( TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 0.01 );
         train_method = RPROP;
-        bp_dw_scale = bp_moment_scale = 0.1;
-        rp_dw0 = 0.1; rp_dw_plus = 1.2; rp_dw_minus = 0.5;
-        rp_dw_min = FLT_EPSILON; rp_dw_max = 50.;
+        bpDWScale = bpMomentScale = 0.1;
+        rpDW0 = 0.1; rpDWPlus = 1.2; rpDWMinus = 0.5;
+        rpDWMin = FLT_EPSILON; rpDWMax = 50.;
     }
 
-CvANN_MLP
+ANN_MLP
 ---------
-.. ocv:class:: CvANN_MLP : public CvStatModel
+.. ocv:class:: ANN_MLP : public StatModel
 
 MLP model.
 
-Unlike many other models in ML that are constructed and trained at once, in the MLP model these steps are separated. First, a network with the specified topology is created using the non-default constructor or the method :ocv:func:`CvANN_MLP::create`. All the weights are set to zeros. Then, the network is trained using a set of input and output vectors. The training procedure can be repeated more than once, that is, the weights can be adjusted based on the new training data.
+Unlike many other models in ML that are constructed and trained at once, in the MLP model these steps are separated. First, a network with the specified topology is created using the non-default constructor or the method :ocv:func:`ANN_MLP::create`. All the weights are set to zeros. Then, the network is trained using a set of input and output vectors. The training procedure can be repeated more than once, that is, the weights can be adjusted based on the new training data.
 
 
-CvANN_MLP::CvANN_MLP
+ANN_MLP::create
 --------------------
-The constructors.
+Creates empty model
 
-.. ocv:function:: CvANN_MLP::CvANN_MLP()
+.. ocv:function:: Ptr<ANN_MLP> ANN_MLP::create(const Params& params=Params())
 
-.. ocv:function:: CvANN_MLP::CvANN_MLP( const CvMat* layerSizes, int activateFunc=CvANN_MLP::SIGMOID_SYM, double fparam1=0, double fparam2=0 )
+Use ``StatModel::train`` to train the model, ``StatModel::train<ANN_MLP>(traindata, params)`` to create and train the model, ``StatModel::load<ANN_MLP>(filename)`` to load the pre-trained model. Note that the train method has optional flags, and the following flags are handled by ``ANN_MLP``:
 
-.. ocv:pyfunction::  cv2.ANN_MLP([layerSizes[, activateFunc[, fparam1[, fparam2]]]]) -> <ANN_MLP object>
+        * **UPDATE_WEIGHTS** Algorithm updates the network weights, rather than computes them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
 
-The advanced constructor allows to create MLP with the specified topology. See :ocv:func:`CvANN_MLP::create` for details.
+        * **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from original one. In this case, you should take care of proper normalization.
 
-CvANN_MLP::create
------------------
-Constructs MLP with the specified topology.
+        * **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, by transforming it to the certain range depending on the used activation function.
 
-.. ocv:function:: void CvANN_MLP::create( const Mat& layerSizes, int activateFunc=CvANN_MLP::SIGMOID_SYM, double fparam1=0, double fparam2=0 )
 
-.. ocv:function:: void CvANN_MLP::create( const CvMat* layerSizes, int activateFunc=CvANN_MLP::SIGMOID_SYM, double fparam1=0, double fparam2=0 )
+ANN_MLP::setParams
+-------------------
+Sets the new network parameters
 
-.. ocv:pyfunction:: cv2.ANN_MLP.create(layerSizes[, activateFunc[, fparam1[, fparam2]]]) -> None
+.. ocv:function:: void ANN_MLP::setParams(const Params& params)
 
-    :param layerSizes: Integer vector specifying the number of neurons in each layer including the input and output layers.
+    :param params: The new parameters
 
-    :param activateFunc: Parameter specifying the activation function for each neuron: one of  ``CvANN_MLP::IDENTITY``, ``CvANN_MLP::SIGMOID_SYM``, and ``CvANN_MLP::GAUSSIAN``.
+The existing network, if any, will be destroyed and a new empty one will be created. It should be re-trained after that.
 
-    :param fparam1: Free parameter of the activation function, :math:`\alpha`. See the formulas in the introduction section.
+ANN_MLP::getParams
+-------------------
+Retrieves the current network parameters
 
-    :param fparam2: Free parameter of the activation function, :math:`\beta`. See the formulas in the introduction section.
-
-The method creates an MLP network with the specified topology and assigns the same activation function to all the neurons.
-
-CvANN_MLP::train
-----------------
-Trains/updates MLP.
-
-.. ocv:function:: int CvANN_MLP::train( const Mat& inputs, const Mat& outputs, const Mat& sampleWeights, const Mat& sampleIdx=Mat(), CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(), int flags=0 )
-
-.. ocv:function:: int CvANN_MLP::train( const CvMat* inputs, const CvMat* outputs, const CvMat* sampleWeights, const CvMat* sampleIdx=0, CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(), int flags=0 )
-
-.. ocv:pyfunction:: cv2.ANN_MLP.train(inputs, outputs, sampleWeights[, sampleIdx[, params[, flags]]]) -> retval
-
-    :param inputs: Floating-point matrix of input vectors, one vector per row.
-
-    :param outputs: Floating-point matrix of the corresponding output vectors, one vector per row.
-
-    :param sampleWeights: (RPROP only) Optional floating-point vector of weights for each sample. Some samples may be more important than others for training. You may want to raise the weight of certain classes to find the right balance between hit-rate and false-alarm rate, and so on.
-
-    :param sampleIdx: Optional integer vector indicating the samples (rows of ``inputs`` and ``outputs``) that are taken into account.
-
-    :param params: Training parameters. See the :ocv:class:`CvANN_MLP_TrainParams` description.
-
-    :param flags: Various parameters to control the training algorithm. A combination of the following parameters is possible:
-
-            * **UPDATE_WEIGHTS** Algorithm updates the network weights, rather than computes them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
-
-            * **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from original one. In this case, you should take care of proper normalization.
-
-            * **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, by transforming it to the certain range depending on the used activation function.
-
-This method applies the specified training algorithm to computing/adjusting the network weights. It returns the number of done iterations.
-
-The RPROP training algorithm is parallelized with the TBB library.
-
-If you are using the default ``cvANN_MLP::SIGMOID_SYM`` activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results.
-
-CvANN_MLP::predict
-------------------
-Predicts responses for input samples.
-
-.. ocv:function:: float CvANN_MLP::predict( const Mat& inputs, Mat& outputs ) const
-
-.. ocv:function:: float CvANN_MLP::predict( const CvMat* inputs, CvMat* outputs ) const
-
-.. ocv:pyfunction:: cv2.ANN_MLP.predict(inputs[, outputs]) -> retval, outputs
-
-    :param inputs: Input samples.
-
-    :param outputs: Predicted responses for corresponding samples.
-
-The method returns a dummy value which should be ignored.
-
-If you are using the default ``cvANN_MLP::SIGMOID_SYM`` activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1].
-
-CvANN_MLP::get_layer_count
---------------------------
-Returns the number of layers in the MLP.
-
-.. ocv:function:: int CvANN_MLP::get_layer_count()
-
-CvANN_MLP::get_layer_sizes
---------------------------
-Returns numbers of neurons in each layer of the MLP.
-
-.. ocv:function:: const CvMat* CvANN_MLP::get_layer_sizes()
-
-The method returns the integer vector specifying the number of neurons in each layer including the input and output layers of the MLP.
-
-CvANN_MLP::get_weights
-----------------------
-Returns neurons weights of the particular layer.
-
-.. ocv:function:: double* CvANN_MLP::get_weights(int layer)
-
-    :param layer: Index of the particular layer.
+.. ocv:function:: Params ANN_MLP::getParams() const
diff --git a/modules/ml/doc/normal_bayes_classifier.rst b/modules/ml/doc/normal_bayes_classifier.rst
index dbd6ae229c..e3aba21c32 100644
--- a/modules/ml/doc/normal_bayes_classifier.rst
+++ b/modules/ml/doc/normal_bayes_classifier.rst
@@ -9,55 +9,26 @@ This simple classification model assumes that feature vectors from each class ar
 
 .. [Fukunaga90] K. Fukunaga. *Introduction to Statistical Pattern Recognition*. second ed., New York: Academic Press, 1990.
 
-CvNormalBayesClassifier
+NormalBayesClassifier
 -----------------------
-.. ocv:class:: CvNormalBayesClassifier : public CvStatModel
+.. ocv:class:: NormalBayesClassifier : public StatModel
 
 Bayes classifier for normally distributed data.
 
-CvNormalBayesClassifier::CvNormalBayesClassifier
-------------------------------------------------
-Default and training constructors.
+NormalBayesClassifier::create
+-----------------------------
+Creates empty model
 
-.. ocv:function:: CvNormalBayesClassifier::CvNormalBayesClassifier()
+.. ocv:function:: Ptr<NormalBayesClassifier> NormalBayesClassifier::create(const NormalBayesClassifier::Params& params=Params())
 
-.. ocv:function:: CvNormalBayesClassifier::CvNormalBayesClassifier( const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat() )
+    :param params: The model parameters. There are none so far; the structure is used as a placeholder for possible extensions.
 
-.. ocv:function:: CvNormalBayesClassifier::CvNormalBayesClassifier( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0 )
+Use ``StatModel::train`` to train the model, ``StatModel::train<NormalBayesClassifier>(traindata, params)`` to create and train the model, ``StatModel::load<NormalBayesClassifier>(filename)`` to load the pre-trained model.
 
-.. ocv:pyfunction:: cv2.NormalBayesClassifier([trainData, responses[, varIdx[, sampleIdx]]]) -> <NormalBayesClassifier object>
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvNormalBayesClassifier::train
-------------------------------
-Trains the model.
-
-.. ocv:function:: bool CvNormalBayesClassifier::train( const Mat& trainData, const Mat& responses, const Mat& varIdx = Mat(), const Mat& sampleIdx=Mat(), bool update=false )
-
-.. ocv:function:: bool CvNormalBayesClassifier::train( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx = 0, const CvMat* sampleIdx=0, bool update=false )
-
-.. ocv:pyfunction:: cv2.NormalBayesClassifier.train(trainData, responses[, varIdx[, sampleIdx[, update]]]) -> retval
-
-    :param update: Identifies whether the model should be trained from scratch (``update=false``) or should be updated using the new training data (``update=true``).
-
-The method trains the Normal Bayes classifier. It follows the conventions of the generic :ocv:func:`CvStatModel::train` approach with the following limitations:
-
-* Only ``CV_ROW_SAMPLE`` data layout is supported.
-* Input variables are all ordered.
-* Output variable is categorical , which means that elements of ``responses`` must be integer numbers, though the vector may have the ``CV_32FC1`` type.
-* Missing measurements are not supported.
-
-CvNormalBayesClassifier::predict
---------------------------------
+NormalBayesClassifier::predictProb
+----------------------------------
 Predicts the response for sample(s).
 
-.. ocv:function:: float CvNormalBayesClassifier::predict(  const Mat& samples,  Mat* results=0, Mat* results_prob=0 ) const
+.. ocv:function:: float NormalBayesClassifier::predictProb( InputArray inputs, OutputArray outputs, OutputArray outputProbs, int flags=0 ) const
 
-.. ocv:function:: float CvNormalBayesClassifier::predict( const CvMat* samples, CvMat* results=0, CvMat* results_prob=0 ) const
-
-.. ocv:pyfunction:: cv2.NormalBayesClassifier.predict(samples) -> retval, results
-
-The method estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``samples``. In case of multiple input vectors, there should be one output vector ``results``. The predicted class for a single input vector is returned by the method. The vector ``results_prob`` contains the output probabilities coresponding to each element of ``result``.
-
-The function is parallelized with the TBB library.
+The method estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``inputs``. In case of multiple input vectors, there should be one output vector ``outputs``. The predicted class for a single input vector is returned by the method. The vector ``outputProbs`` contains the output probabilities corresponding to each element of ``outputs``.
diff --git a/modules/ml/doc/random_trees.rst b/modules/ml/doc/random_trees.rst
index 8d7911d368..3b851261e9 100644
--- a/modules/ml/doc/random_trees.rst
+++ b/modules/ml/doc/random_trees.rst
@@ -40,179 +40,65 @@ For the random trees usage example, please, see letter_recog.cpp sample in OpenC
 
   * And other articles from the web site http://www.stat.berkeley.edu/users/breiman/RandomForests/cc_home.htm
 
-CvRTParams
-----------
-.. ocv:struct:: CvRTParams : public CvDTreeParams
+RTrees::Params
+--------------
+.. ocv:struct:: RTrees::Params : public DTrees::Params
 
     Training parameters of random trees.
 
 The set of training parameters for the forest is a superset of the training parameters for a single tree. However, random trees do not need all the functionality/features of decision trees. Most noticeably, the trees are not pruned, so the cross-validation parameters are not used.
 
 
-CvRTParams::CvRTParams:
+RTrees::Params::Params
 -----------------------
-The constructors.
+The constructors
 
-.. ocv:function:: CvRTParams::CvRTParams()
+.. ocv:function:: RTrees::Params::Params()
 
-.. ocv:function:: CvRTParams::CvRTParams( int max_depth, int min_sample_count, float regression_accuracy, bool use_surrogates, int max_categories, const float* priors, bool calc_var_importance, int nactive_vars, int max_num_of_trees_in_the_forest, float forest_accuracy, int termcrit_type )
+.. ocv:function:: RTrees::Params::Params( int maxDepth, int minSampleCount, double regressionAccuracy, bool useSurrogates, int maxCategories, const Mat& priors, bool calcVarImportance, int nactiveVars, TermCriteria termCrit )
 
-    :param max_depth: the depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.
+    :param maxDepth: the depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.
 
-    :param min_sample_count: minimum samples required at a leaf node for it to be split. A reasonable value is a small percentage of the total data e.g. 1%.
+    :param minSampleCount: minimum samples required at a leaf node for it to be split. A reasonable value is a small percentage of the total data e.g. 1%.
 
-    :param max_categories: Cluster possible values of a categorical variable into ``K`` :math:`\leq` ``max_categories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``max_categories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision trees engines (including ML) try to find sub-optimal split in this case by clustering all the samples into ``max_categories`` clusters that is some categories are merged together. The clustering is applied only in ``n``>2-class classification problems for categorical variables with ``N > max_categories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
+    :param maxCategories: Cluster possible values of a categorical variable into ``K <= maxCategories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``maxCategories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision tree engines (including ML) try to find a suboptimal split in this case by clustering all the samples into ``maxCategories`` clusters, that is, some categories are merged together. The clustering is applied only in ``n``>2-class classification problems for categorical variables with ``N > maxCategories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
 
-    :param calc_var_importance: If true then variable importance will be calculated and then it can be retrieved by :ocv:func:`CvRTrees::get_var_importance`.
+    :param calcVarImportance: If true then variable importance will be calculated and then it can be retrieved by ``RTrees::getVarImportance``.
 
-    :param nactive_vars: The size of the randomly selected subset of features at each tree node and that are used to find the best split(s). If you set it to 0 then the size will be set to the square root of the total number of features.
+    :param nactiveVars: The size of the randomly selected subset of features at each tree node and that are used to find the best split(s). If you set it to 0 then the size will be set to the square root of the total number of features.
 
-    :param max_num_of_trees_in_the_forest: The maximum number of trees in the forest (surprise, surprise). Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes pass a certain number of trees. Also to keep in mind, the number of tree increases the prediction time linearly.
-
-    :param forest_accuracy: Sufficient accuracy (OOB error).
-
-    :param termcrit_type: The type of the termination criteria:
-
-        * **CV_TERMCRIT_ITER** Terminate learning by the ``max_num_of_trees_in_the_forest``;
-
-        * **CV_TERMCRIT_EPS** Terminate learning by the ``forest_accuracy``;
-
-        * **CV_TERMCRIT_ITER | CV_TERMCRIT_EPS** Use both termination criteria.
-
-For meaning of other parameters see :ocv:func:`CvDTreeParams::CvDTreeParams`.
+    :param termCrit: The termination criteria that specify when the training algorithm stops: either when the specified number of trees is trained and added to the ensemble or when sufficient accuracy (measured as OOB error) is achieved. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and levels off past a certain number of trees. Also keep in mind that the prediction time increases linearly with the number of trees.
 
 The default constructor sets all parameters to default values which are different from default values of :ocv:class:`CvDTreeParams`:
 
 ::
 
-    CvRTParams::CvRTParams() : CvDTreeParams( 5, 10, 0, false, 10, 0, false, false, 0 ),
-        calc_var_importance(false), nactive_vars(0)
+    RTrees::Params::Params() : DTrees::Params( 5, 10, 0, false, 10, 0, false, false, Mat() ),
+        calcVarImportance(false), nactiveVars(0)
     {
-        term_crit = cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 50, 0.1 );
+        termCrit = TermCriteria( TermCriteria::MAX_ITER + TermCriteria::EPS, 50, 0.1 );
     }
 
 
-CvRTrees
+RTrees
 --------
-.. ocv:class:: CvRTrees : public CvStatModel
+.. ocv:class:: RTrees : public DTrees
 
     The class implements the random forest predictor as described in the beginning of this section.
 
-CvRTrees::train
+RTrees::create
 ---------------
-Trains the Random Trees model.
+Creates the empty model
 
-.. ocv:function:: bool CvRTrees::train( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvRTParams params=CvRTParams() )
+.. ocv:function:: Ptr<RTrees> RTrees::create(const RTrees::Params& params=Params())
 
-.. ocv:function:: bool CvRTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvRTParams params=CvRTParams() )
+Use ``StatModel::train`` to train the model, ``StatModel::train<RTrees>(traindata, params)`` to create and train the model, ``StatModel::load<RTrees>(filename)`` to load the pre-trained model.
 
-.. ocv:function:: bool CvRTrees::train( CvMLData* data, CvRTParams params=CvRTParams() )
-
-.. ocv:pyfunction:: cv2.RTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]) -> retval
-
-The method :ocv:func:`CvRTrees::train` is very similar to the method :ocv:func:`CvDTree::train` and follows the generic method :ocv:func:`CvStatModel::train` conventions. All the parameters specific to the algorithm training are passed as a :ocv:class:`CvRTParams` instance. The estimate of the training error (``oob-error``) is stored in the protected class member ``oob_error``.
-
-The function is parallelized with the TBB library.
-
-CvRTrees::predict
------------------
-Predicts the output for an input sample.
-
-.. ocv:function:: float CvRTrees::predict( const Mat& sample, const Mat& missing=Mat() ) const
-
-.. ocv:function:: float CvRTrees::predict( const CvMat* sample, const CvMat* missing = 0 ) const
-
-.. ocv:pyfunction:: cv2.RTrees.predict(sample[, missing]) -> retval
-
-    :param sample: Sample for classification.
-
-    :param missing: Optional missing measurement mask of the sample.
-
-The input parameters of the prediction method are the same as in :ocv:func:`CvDTree::predict`  but the return value type is different. This method returns the cumulative result from all the trees in the forest (the class that receives the majority of voices, or the mean of the regression function estimates).
-
-
-CvRTrees::predict_prob
-----------------------
-Returns a fuzzy-predicted class label.
-
-.. ocv:function:: float CvRTrees::predict_prob( const cv::Mat& sample, const cv::Mat& missing = cv::Mat() ) const
-
-.. ocv:function:: float CvRTrees::predict_prob( const CvMat* sample, const CvMat* missing = 0 ) const
-
-.. ocv:pyfunction:: cv2.RTrees.predict_prob(sample[, missing]) -> retval
-
-    :param sample: Sample for classification.
-
-    :param missing: Optional missing measurement mask of the sample.
-
-The function works for binary classification problems only. It returns the number between 0 and 1. This number represents probability or confidence of the sample belonging to the second class. It is calculated as the proportion of decision trees that classified the sample to the second class.
-
-
-CvRTrees::getVarImportance
+RTrees::getVarImportance
 ----------------------------
 Returns the variable importance array.
 
-.. ocv:function:: Mat CvRTrees::getVarImportance()
+.. ocv:function:: Mat RTrees::getVarImportance() const
 
-.. ocv:function:: const CvMat* CvRTrees::get_var_importance()
+The method returns the variable importance vector, computed at the training stage when ``RTrees::Params::calcVarImportance`` is set to true. If this flag was set to false, an empty matrix is returned.
 
-.. ocv:pyfunction:: cv2.RTrees.getVarImportance() -> retval
-
-The method returns the variable importance vector, computed at the training stage when ``CvRTParams::calc_var_importance`` is set to true. If this flag was set to false, the ``NULL`` pointer is returned. This differs from the decision trees where variable importance can be computed anytime after the training.
-
-
-CvRTrees::get_proximity
------------------------
-Retrieves the proximity measure between two training samples.
-
-.. ocv:function:: float CvRTrees::get_proximity( const CvMat* sample1, const CvMat* sample2, const CvMat* missing1 = 0, const CvMat* missing2 = 0 ) const
-
-    :param sample1: The first sample.
-
-    :param sample2: The second sample.
-
-    :param missing1: Optional missing measurement mask of the first sample.
-
-    :param missing2:  Optional missing measurement mask of the second sample.
-
-The method returns proximity measure between any two samples. This is a ratio of those trees in the ensemble, in which the samples fall into the same leaf node, to the total number of the trees.
-
-CvRTrees::calc_error
---------------------
-Returns error of the random forest.
-
-.. ocv:function:: float CvRTrees::calc_error( CvMLData* data, int type, std::vector<float>* resp=0 )
-
-The method is identical to :ocv:func:`CvDTree::calc_error` but uses the random forest as predictor.
-
-
-CvRTrees::get_train_error
--------------------------
-Returns the train error.
-
-.. ocv:function:: float CvRTrees::get_train_error()
-
-The method works for classification problems only. It returns the proportion of incorrectly classified train samples.
-
-
-CvRTrees::get_rng
------------------
-Returns the state of the used random number generator.
-
-.. ocv:function:: CvRNG* CvRTrees::get_rng()
-
-
-CvRTrees::get_tree_count
-------------------------
-Returns the number of trees in the constructed random forest.
-
-.. ocv:function:: int CvRTrees::get_tree_count() const
-
-
-CvRTrees::get_tree
-------------------
-Returns the specific decision tree in the constructed random forest.
-
-.. ocv:function:: CvForestTree* CvRTrees::get_tree(int i) const
-
-    :param i: Index of the decision tree.
diff --git a/modules/ml/doc/support_vector_machines.rst b/modules/ml/doc/support_vector_machines.rst
index 9793bd6e3f..003ec4dc6a 100644
--- a/modules/ml/doc/support_vector_machines.rst
+++ b/modules/ml/doc/support_vector_machines.rst
@@ -14,21 +14,21 @@ SVM implementation in OpenCV is based on [LibSVM]_.
 .. [LibSVM] C.-C. Chang and C.-J. Lin. *LIBSVM: a library for support vector machines*, ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. (http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf)
 
 
-CvParamGrid
+ParamGrid
 -----------
-.. ocv:struct:: CvParamGrid
+.. ocv:class:: ParamGrid
 
   The structure represents the logarithmic grid range of statmodel parameters. It is used for optimizing statmodel accuracy by varying model parameters, the accuracy estimate being computed by cross-validation.
 
-  .. ocv:member:: double CvParamGrid::min_val
+  .. ocv:member:: double ParamGrid::minVal
 
      Minimum value of the statmodel parameter.
 
-  .. ocv:member:: double CvParamGrid::max_val
+  .. ocv:member:: double ParamGrid::maxVal
 
      Maximum value of the statmodel parameter.
 
-  .. ocv:member:: double CvParamGrid::step
+  .. ocv:member:: double ParamGrid::logStep
 
      Logarithmic step for iterating the statmodel parameter.
 
@@ -36,88 +36,78 @@ The grid determines the following iteration sequence of the statmodel parameter
 
 .. math::
 
-    (min\_val, min\_val*step, min\_val*{step}^2, \dots,  min\_val*{step}^n),
+    (minVal, minVal*logStep, minVal*{logStep}^2, \dots,  minVal*{logStep}^n),
 
 where :math:`n` is the maximal index satisfying
 
 .. math::
 
-    \texttt{min\_val} * \texttt{step} ^n <  \texttt{max\_val}
+    \texttt{minVal} * \texttt{logStep} ^n <  \texttt{maxVal}
 
-The grid is logarithmic, so ``step`` must always be greater then 1.
+The grid is logarithmic, so ``logStep`` must always be greater than 1.
 
-CvParamGrid::CvParamGrid
+ParamGrid::ParamGrid
 ------------------------
 The constructors.
 
-.. ocv:function:: CvParamGrid::CvParamGrid()
+.. ocv:function:: ParamGrid::ParamGrid()
 
-.. ocv:function:: CvParamGrid::CvParamGrid( double min_val, double max_val, double log_step )
+.. ocv:function:: ParamGrid::ParamGrid( double minVal, double maxVal, double logStep )
 
 The full constructor initializes corresponding members. The default constructor creates a dummy grid:
 
 ::
 
-    CvParamGrid::CvParamGrid()
+    ParamGrid::ParamGrid()
     {
-        min_val = max_val = step = 0;
+        minVal = maxVal = 0;
+        logStep = 1;
     }
 
-CvParamGrid::check
-------------------
-Checks validness of the grid.
 
-.. ocv:function:: bool CvParamGrid::check()
-
-Returns ``true`` if the grid is valid and ``false`` otherwise. The grid is valid if and only if:
-
-* Lower bound of the grid is less then the upper one.
-* Lower bound of the grid is positive.
-* Grid step is greater then 1.
-
-CvSVMParams
+SVM::Params
 -----------
-.. ocv:struct:: CvSVMParams
+.. ocv:class:: SVM::Params
 
 SVM training parameters.
 
-The structure must be initialized and passed to the training method of :ocv:class:`CvSVM`.
+The structure must be initialized and passed to the training method of :ocv:class:`SVM`.
 
-CvSVMParams::CvSVMParams
+SVM::Params::Params
 ------------------------
-The constructors.
+The constructors
 
-.. ocv:function:: CvSVMParams::CvSVMParams()
+.. ocv:function:: SVM::Params::Params()
 
-.. ocv:function:: CvSVMParams::CvSVMParams( int svm_type, int kernel_type, double degree, double gamma, double coef0, double Cvalue, double nu, double p, CvMat* class_weights, CvTermCriteria term_crit )
+.. ocv:function:: SVM::Params::Params( int svmType, int kernelType, double degree, double gamma, double coef0, double Cvalue, double nu, double p, const Mat& classWeights, TermCriteria termCrit )
 
-    :param svm_type: Type of a SVM formulation. Possible values are:
+    :param svmType: Type of a SVM formulation. Possible values are:
 
-        * **CvSVM::C_SVC** C-Support Vector Classification. ``n``-class classification (``n`` :math:`\geq` 2), allows imperfect separation of classes with penalty multiplier ``C`` for outliers.
+        * **SVM::C_SVC** C-Support Vector Classification. ``n``-class classification (``n`` :math:`\geq` 2), allows imperfect separation of classes with penalty multiplier ``C`` for outliers.
 
-        * **CvSVM::NU_SVC** :math:`\nu`-Support Vector Classification. ``n``-class classification with possible imperfect separation. Parameter :math:`\nu`  (in the range 0..1, the larger the value, the smoother the decision boundary) is used instead of ``C``.
+        * **SVM::NU_SVC** :math:`\nu`-Support Vector Classification. ``n``-class classification with possible imperfect separation. Parameter :math:`\nu`  (in the range 0..1, the larger the value, the smoother the decision boundary) is used instead of ``C``.
 
-        * **CvSVM::ONE_CLASS** Distribution Estimation (One-class SVM). All the training data are from the same class, SVM builds a boundary that separates the class from the rest of the feature space.
+        * **SVM::ONE_CLASS** Distribution Estimation (One-class SVM). All the training data are from the same class, SVM builds a boundary that separates the class from the rest of the feature space.
 
-        * **CvSVM::EPS_SVR** :math:`\epsilon`-Support Vector Regression. The distance between feature vectors from the training set and the fitting hyper-plane must be less than ``p``. For outliers the penalty multiplier ``C`` is used.
+        * **SVM::EPS_SVR** :math:`\epsilon`-Support Vector Regression. The distance between feature vectors from the training set and the fitting hyper-plane must be less than ``p``. For outliers the penalty multiplier ``C`` is used.
 
-        * **CvSVM::NU_SVR** :math:`\nu`-Support Vector Regression. :math:`\nu` is used instead of ``p``.
+        * **SVM::NU_SVR** :math:`\nu`-Support Vector Regression. :math:`\nu` is used instead of ``p``.
 
         See [LibSVM]_ for details.
 
-    :param kernel_type: Type of a SVM kernel. Possible values are:
+    :param kernelType: Type of a SVM kernel. Possible values are:
 
-        * **CvSVM::LINEAR** Linear kernel. No mapping is done, linear discrimination (or regression) is done in the original feature space. It is the fastest option. :math:`K(x_i, x_j) = x_i^T x_j`.
+        * **SVM::LINEAR** Linear kernel. No mapping is done, linear discrimination (or regression) is done in the original feature space. It is the fastest option. :math:`K(x_i, x_j) = x_i^T x_j`.
 
-        * **CvSVM::POLY** Polynomial kernel: :math:`K(x_i, x_j) = (\gamma x_i^T x_j + coef0)^{degree}, \gamma > 0`.
+        * **SVM::POLY** Polynomial kernel: :math:`K(x_i, x_j) = (\gamma x_i^T x_j + coef0)^{degree}, \gamma > 0`.
 
-        * **CvSVM::RBF** Radial basis function (RBF), a good choice in most cases. :math:`K(x_i, x_j) = e^{-\gamma ||x_i - x_j||^2}, \gamma > 0`.
+        * **SVM::RBF** Radial basis function (RBF), a good choice in most cases. :math:`K(x_i, x_j) = e^{-\gamma ||x_i - x_j||^2}, \gamma > 0`.
 
-        * **CvSVM::SIGMOID** Sigmoid kernel: :math:`K(x_i, x_j) = \tanh(\gamma x_i^T x_j + coef0)`.
+        * **SVM::SIGMOID** Sigmoid kernel: :math:`K(x_i, x_j) = \tanh(\gamma x_i^T x_j + coef0)`.
 
-        * **CvSVM::CHI2** Exponential Chi2 kernel, similar to the RBF kernel: :math:`K(x_i, x_j) = e^{-\gamma \chi^2(x_i,x_j)}, \chi^2(x_i,x_j) = (x_i-x_j)^2/(x_i+x_j), \gamma > 0`.
+        * **SVM::CHI2** Exponential Chi2 kernel, similar to the RBF kernel: :math:`K(x_i, x_j) = e^{-\gamma \chi^2(x_i,x_j)}, \chi^2(x_i,x_j) = (x_i-x_j)^2/(x_i+x_j), \gamma > 0`.
 
-        * **CvSVM::INTER** Histogram intersection kernel. A fast kernel. :math:`K(x_i, x_j) = min(x_i,x_j)`.
+        * **SVM::INTER** Histogram intersection kernel. A fast kernel. :math:`K(x_i, x_j) = min(x_i,x_j)`.
 
     :param degree: Parameter ``degree`` of a kernel function (POLY).
 
@@ -131,19 +121,19 @@ The constructors.
 
     :param p: Parameter :math:`\epsilon` of a SVM optimization problem (EPS_SVR).
 
-    :param class_weights: Optional weights in the C_SVC problem , assigned to particular classes. They are multiplied by ``C`` so the parameter ``C`` of class ``#i`` becomes :math:`class\_weights_i * C`. Thus these weights affect the misclassification penalty for different classes. The larger weight, the larger penalty on misclassification of data from the corresponding class.
+    :param classWeights: Optional weights in the C_SVC problem, assigned to particular classes. They are multiplied by ``C`` so the parameter ``C`` of class ``#i`` becomes ``classWeights(i) * C``. Thus these weights affect the misclassification penalty for different classes. The larger the weight, the larger the penalty on misclassification of data from the corresponding class.
 
-    :param term_crit: Termination criteria of the iterative SVM training procedure which solves a partial case of constrained quadratic optimization problem. You can specify tolerance and/or the maximum number of iterations.
+    :param termCrit: Termination criteria of the iterative SVM training procedure which solves a partial case of constrained quadratic optimization problem. You can specify tolerance and/or the maximum number of iterations.
 
 The default constructor initialize the structure with following values:
 
 ::
 
-    CvSVMParams::CvSVMParams() :
-        svm_type(CvSVM::C_SVC), kernel_type(CvSVM::RBF), degree(0),
-        gamma(1), coef0(0), C(1), nu(0), p(0), class_weights(0)
+    SVM::Params::Params() :
+        svmType(SVM::C_SVC), kernelType(SVM::RBF), degree(0),
+        gamma(1), coef0(0), C(1), nu(0), p(0), classWeights(0)
     {
-        term_crit = cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, FLT_EPSILON );
+        termCrit = TermCriteria( TermCriteria::MAX_ITER+TermCriteria::EPS, 1000, FLT_EPSILON );
     }
 
 A comparison of different kernels on the following 2D test case with four classes. Four C_SVC SVMs have been trained (one against rest) with auto_train. Evaluation on three different kernels (CHI2, INTER, RBF). The color depicts the class with max score. Bright means max-score > 0, dark means max-score < 0.
@@ -151,10 +141,9 @@ A comparison of different kernels on the following 2D test case with four classe
 .. image:: pics/SVM_Comparison.png
 
 
-
-CvSVM
+SVM
 -----
-.. ocv:class:: CvSVM : public CvStatModel
+.. ocv:class:: SVM : public StatModel
 
 Support Vector Machines.
 
@@ -164,55 +153,27 @@ Support Vector Machines.
    * (Python) An example of grid search digit recognition using SVM can be found at opencv_source/samples/python2/digits_adjust.py
    * (Python) An example of video digit recognition using SVM can be found at opencv_source/samples/python2/digits_video.py
 
-CvSVM::CvSVM
+SVM::create
 ------------
-Default and training constructors.
+Creates an empty model.
 
-.. ocv:function:: CvSVM::CvSVM()
+.. ocv:function:: Ptr<SVM> SVM::create(const Params& p=Params(), const Ptr<Kernel>& customKernel=Ptr<Kernel>())
 
-.. ocv:function:: CvSVM::CvSVM( const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), CvSVMParams params=CvSVMParams() )
+    :param p: SVM parameters
+    :param customKernel: the optional custom kernel to use. It must implement the ``SVM::Kernel`` interface.
 
-.. ocv:function:: CvSVM::CvSVM( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, CvSVMParams params=CvSVMParams() )
-
-.. ocv:pyfunction:: cv2.SVM([trainData, responses[, varIdx[, sampleIdx[, params]]]]) -> <SVM object>
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvSVM::train
-------------
-Trains an SVM.
-
-.. ocv:function:: bool CvSVM::train( const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), CvSVMParams params=CvSVMParams() )
-
-.. ocv:function:: bool CvSVM::train( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, CvSVMParams params=CvSVMParams() )
-
-.. ocv:pyfunction:: cv2.SVM.train(trainData, responses[, varIdx[, sampleIdx[, params]]]) -> retval
-
-The method trains the SVM model. It follows the conventions of the generic :ocv:func:`CvStatModel::train` approach with the following limitations:
-
-* Only the ``CV_ROW_SAMPLE`` data layout is supported.
-
-* Input variables are all ordered.
-
-* Output variables can be either categorical (``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC``), or ordered (``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR``), or not required at all (``params.svm_type=CvSVM::ONE_CLASS``).
-
-* Missing measurements are not supported.
-
-All the other parameters are gathered in the
-:ocv:class:`CvSVMParams` structure.
+Use ``StatModel::train`` to train the model, ``StatModel::train<SVM>(traindata, params)`` to create and train the model, and ``StatModel::load<SVM>(filename)`` to load a pre-trained model. Since SVM has several parameters, you may want to find the best parameters for your problem. This can be done with ``SVM::trainAuto``.
 
 
-CvSVM::train_auto
+SVM::trainAuto
 -----------------
 Trains an SVM with optimal parameters.
 
-.. ocv:function:: bool CvSVM::train_auto( const Mat& trainData, const Mat& responses, const Mat& varIdx, const Mat& sampleIdx, CvSVMParams params, int k_fold = 10, CvParamGrid Cgrid = CvSVM::get_default_grid(CvSVM::C), CvParamGrid gammaGrid = CvSVM::get_default_grid(CvSVM::GAMMA), CvParamGrid pGrid = CvSVM::get_default_grid(CvSVM::P), CvParamGrid nuGrid  = CvSVM::get_default_grid(CvSVM::NU), CvParamGrid coeffGrid = CvSVM::get_default_grid(CvSVM::COEF), CvParamGrid degreeGrid = CvSVM::get_default_grid(CvSVM::DEGREE), bool balanced=false)
+.. ocv:function:: bool SVM::trainAuto( const Ptr<TrainData>& data, int kFold = 10, ParamGrid Cgrid = SVM::getDefaultGrid(SVM::C), ParamGrid gammaGrid  = SVM::getDefaultGrid(SVM::GAMMA), ParamGrid pGrid = SVM::getDefaultGrid(SVM::P), ParamGrid nuGrid = SVM::getDefaultGrid(SVM::NU), ParamGrid coeffGrid = SVM::getDefaultGrid(SVM::COEF), ParamGrid degreeGrid = SVM::getDefaultGrid(SVM::DEGREE), bool balanced=false)
 
-.. ocv:function:: bool CvSVM::train_auto( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx, const CvMat* sampleIdx, CvSVMParams params, int kfold = 10, CvParamGrid Cgrid = get_default_grid(CvSVM::C), CvParamGrid gammaGrid = get_default_grid(CvSVM::GAMMA), CvParamGrid pGrid = get_default_grid(CvSVM::P), CvParamGrid nuGrid = get_default_grid(CvSVM::NU), CvParamGrid coeffGrid = get_default_grid(CvSVM::COEF), CvParamGrid degreeGrid = get_default_grid(CvSVM::DEGREE), bool balanced=false )
+    :param data: the training data that can be constructed using ``TrainData::create`` or ``TrainData::loadFromCSV``.
 
-.. ocv:pyfunction:: cv2.SVM.train_auto(trainData, responses, varIdx, sampleIdx, params[, k_fold[, Cgrid[, gammaGrid[, pGrid[, nuGrid[, coeffGrid[, degreeGrid[, balanced]]]]]]]]) -> retval
-
-    :param k_fold: Cross-validation parameter. The training set is divided into ``k_fold`` subsets. One subset is used to test the model, the others form the train set. So, the SVM algorithm is executed ``k_fold`` times.
+    :param kFold: Cross-validation parameter. The training set is divided into ``kFold`` subsets. One subset is used to test the model, the others form the train set. So, the SVM algorithm is executed ``kFold`` times.
 
     :param \*Grid: Iteration grid for the corresponding SVM parameter.
 
@@ -220,97 +181,76 @@ Trains an SVM with optimal parameters.
 
 The method trains the SVM model automatically by choosing the optimal
 parameters ``C``, ``gamma``, ``p``, ``nu``, ``coef0``, ``degree`` from
-:ocv:class:`CvSVMParams`. Parameters are considered optimal
+:ocv:class:`SVM::Params`. Parameters are considered optimal
 when the cross-validation estimate of the test set error
 is minimal.
 
-If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma``, set ``gamma_grid.step = 0``, ``gamma_grid.min_val``, ``gamma_grid.max_val`` as arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma``.
+If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma``, set ``gammaGrid.logStep = 0`` and set ``gammaGrid.minVal`` and ``gammaGrid.maxVal`` to arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma``.
 
 And, finally, if the optimization in a parameter is required but
-the corresponding grid is unknown, you may call the function :ocv:func:`CvSVM::get_default_grid`. To generate a grid, for example, for ``gamma``, call ``CvSVM::get_default_grid(CvSVM::GAMMA)``.
+the corresponding grid is unknown, you may call the function :ocv:func:`SVM::getDefaultGrid`. To generate a grid, for example, for ``gamma``, call ``SVM::getDefaultGrid(SVM::GAMMA)``.
 
 This function works for the classification
-(``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC``)
+(``params.svmType=SVM::C_SVC`` or ``params.svmType=SVM::NU_SVC``)
 as well as for the regression
-(``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR``). If ``params.svm_type=CvSVM::ONE_CLASS``, no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
-
-CvSVM::predict
---------------
-Predicts the response for input sample(s).
-
-.. ocv:function:: float CvSVM::predict( const Mat& sample, bool returnDFVal=false ) const
-
-.. ocv:function:: float CvSVM::predict( const CvMat* sample, bool returnDFVal=false ) const
-
-.. ocv:function:: float CvSVM::predict( const CvMat* samples, CvMat* results, bool returnDFVal=false ) const
-
-.. ocv:pyfunction:: cv2.SVM.predict(sample[, returnDFVal]) -> retval
-
-.. ocv:pyfunction:: cv2.SVM.predict_all(samples[, results]) -> results
-
-    :param sample: Input sample for prediction.
-
-    :param samples: Input samples for prediction.
-
-    :param returnDFVal: Specifies a type of the return value. If ``true`` and the problem is 2-class classification then the method returns the decision function value that is signed distance to the margin, else the function returns a class label (classification) or estimated function value (regression).
-
-    :param results: Output prediction responses for corresponding samples.
-
-If you pass one sample then prediction result is returned. If you want to get responses for several samples then you should pass the ``results`` matrix where prediction results will be stored.
-
-The function is parallelized with the TBB library.
+(``params.svmType=SVM::EPS_SVR`` or ``params.svmType=SVM::NU_SVR``). If ``params.svmType=SVM::ONE_CLASS``, no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
 
 
-CvSVM::get_default_grid
+SVM::getDefaultGrid
 -----------------------
 Generates a grid for SVM parameters.
 
-.. ocv:function:: CvParamGrid CvSVM::get_default_grid( int param_id )
+.. ocv:function:: ParamGrid SVM::getDefaultGrid( int param_id )
 
     :param param_id: SVM parameters IDs that must be one of the following:
 
-            * **CvSVM::C**
+            * **SVM::C**
 
-            * **CvSVM::GAMMA**
+            * **SVM::GAMMA**
 
-            * **CvSVM::P**
+            * **SVM::P**
 
-            * **CvSVM::NU**
+            * **SVM::NU**
 
-            * **CvSVM::COEF**
+            * **SVM::COEF**
 
-            * **CvSVM::DEGREE**
+            * **SVM::DEGREE**
 
         The grid is generated for the parameter with this ID.
 
-The function generates a grid for the specified parameter of the SVM algorithm. The grid may be passed to the function :ocv:func:`CvSVM::train_auto`.
+The function generates a grid for the specified parameter of the SVM algorithm. The grid may be passed to the function :ocv:func:`SVM::trainAuto`.
 
-CvSVM::get_params
+SVM::getParams
 -----------------
 Returns the current SVM parameters.
 
-.. ocv:function:: CvSVMParams CvSVM::get_params() const
+.. ocv:function:: SVM::Params SVM::getParams() const
 
-This function may be used to get the optimal parameters obtained while automatically training :ocv:func:`CvSVM::train_auto`.
+This function may be used to get the optimal parameters obtained by automatic training with :ocv:func:`SVM::trainAuto`.
 
-CvSVM::get_support_vector
+SVM::getSupportVectors
 --------------------------
-Retrieves a number of support vectors and the particular vector.
+Retrieves all the support vectors.
 
-.. ocv:function:: int CvSVM::get_support_vector_count() const
+.. ocv:function:: Mat SVM::getSupportVectors() const
 
-.. ocv:function:: const float* CvSVM::get_support_vector(int i) const
+The method returns all the support vectors as a floating-point matrix, where the support vectors are stored as matrix rows.
 
-.. ocv:pyfunction:: cv2.SVM.get_support_vector_count() -> retval
+SVM::getDecisionFunction
+--------------------------
+Retrieves the decision function.
 
-    :param i: Index of the particular support vector.
+.. ocv:function:: double SVM::getDecisionFunction(int i, OutputArray alpha, OutputArray svidx) const
 
-The methods can be used to retrieve a set of support vectors.
-
-CvSVM::get_var_count
+    :param i: the index of the decision function. If the problem solved is regression, 1-class or 2-class classification, then there will be just one decision function and the index should always be 0. Otherwise, in the case of N-class classification, there will be N*(N-1)/2 decision functions.
+    
+    :param alpha: the optional output vector for weights, corresponding to different support vectors. In the case of linear SVM all the alpha's will be 1's.
+    
+    :param svidx: the optional output vector of indices of support vectors within the matrix of support vectors (which can be retrieved by ``SVM::getSupportVectors``). In the case of linear SVM each decision function consists of a single "compressed" support vector.
+    
+The method returns the ``rho`` parameter of the decision function, a scalar subtracted from the weighted sum of kernel responses.
+    
+Prediction with SVM
 --------------------
-Returns the number of used features (variables count).
 
-.. ocv:function:: int CvSVM::get_var_count() const
-
-.. ocv:pyfunction:: cv2.SVM.get_var_count() -> retval
+Use ``StatModel::predict(samples, results, flags)`` for prediction. Pass ``flags=StatModel::RAW_OUTPUT`` to get the raw response from the SVM (in the case of a regression, 1-class, or 2-class classification problem).