Merge pull request #16919 from alalek:backport_16860

2025-06-10 11:03:03 +08:00 · 2020-03-27 16:44:05 +00:00 · 2020-03-27 16:44:05 +00:00 · 222a48577f
commit 222a48577f
parent 353273579b 2740901378
1 changed files with 531 additions and 222 deletions
--- a/modules/calib3d/include/opencv2/calib3d.hpp
+++ b/modules/calib3d/include/opencv2/calib3d.hpp
@ -51,12 +51,136 @@
 /**
  @defgroup calib3d Camera Calibration and 3D Reconstruction
-The functions in this section use a so-called pinhole camera model. In this model, a scene view is
+The functions in this section use a so-called pinhole camera model. The view of a scene
-formed by projecting 3D points into the image plane using a perspective transformation.
+is obtained by projecting a scene's 3D point \f$P_w\f$ into the image plane using a perspective
 transformation which forms the corresponding pixel \f$p\f$. Both \f$P_w\f$ and \f$p\f$ are
 represented in homogeneous coordinates, i.e. as 3D and 2D homogeneous vectors, respectively. You will
 find a brief introduction to projective geometry, homogeneous vectors and homogeneous
 transformations at the end of this section's introduction. For more succinct notation, we often drop
 the 'homogeneous' and say vector instead of homogeneous vector.
-\f[s  \; m' = A [R|t] M'\f]
+The distortion-free projective transformation given by a pinhole camera model is shown below.
-or
+\f[s \; p = A \begin{bmatrix} R|t \end{bmatrix} P_w,\f]
 where \f$P_w\f$ is a 3D point expressed with respect to the world coordinate system,
 \f$p\f$ is a 2D pixel in the image plane, \f$A\f$ is the intrinsic camera matrix,
 \f$R\f$ and \f$t\f$ are the rotation and translation that describe the change of coordinates from
 world to camera coordinate systems (or camera frame) and \f$s\f$ is the projective transformation's
 arbitrary scaling and not part of the camera model.
 The intrinsic camera matrix \f$A\f$ (notation used as in @cite Zhang2000 and also generally notated
 as \f$K\f$) projects 3D points given in the camera coordinate system to 2D pixel coordinates, i.e.
 \f[p = A P_c.\f]
 The camera matrix \f$A\f$ is composed of the focal lengths \f$f_x\f$ and \f$f_y\f$, which are
 expressed in pixel units, and the principal point \f$(c_x, c_y)\f$, which is usually close to the
 image center:
 \f[A = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1},\f]
 and thus
 \f[s \vecthree{u}{v}{1} = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1} \vecthree{X_c}{Y_c}{Z_c}.\f]
 The matrix of intrinsic parameters does not depend on the scene viewed. So, once estimated, it can
 be re-used as long as the focal length is fixed (in case of a zoom lens). However, if an image from the
 camera is scaled by a factor, all of these parameters need to be scaled (multiplied/divided,
 respectively) by the same factor.
 The joint rotation-translation matrix \f$[R|t]\f$ is the matrix product of a projective
 transformation and a homogeneous transformation. The 3-by-4 projective transformation maps 3D points
 represented in camera coordinates to 2D points in the image plane and represented in normalized
 camera coordinates \f$x' = X_c / Z_c\f$ and \f$y' = Y_c / Z_c\f$:
 \f[Z_c \begin{bmatrix}
 x' \\
 y' \\
 1
 \end{bmatrix} = \begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0
 \end{bmatrix}
 \begin{bmatrix}
 X_c \\
 Y_c \\
 Z_c \\
 1
 \end{bmatrix}.\f]
 The homogeneous transformation is encoded by the extrinsic parameters \f$R\f$ and \f$t\f$ and
 represents the change of basis from world coordinate system \f$w\f$ to the camera coordinate system
 \f$c\f$. Thus, given the representation of the point \f$P\f$ in world coordinates, \f$P_w\f$, we
 obtain \f$P\f$'s representation in the camera coordinate system, \f$P_c\f$, by
 \f[P_c = \begin{bmatrix}
 R & t \\
 0 & 1
 \end{bmatrix} P_w.\f]
 This homogeneous transformation is composed out of \f$R\f$, a 3-by-3 rotation matrix, and \f$t\f$, a
 3-by-1 translation vector:
 \f[\begin{bmatrix}
 R & t \\
 0 & 1
 \end{bmatrix} = \begin{bmatrix}
 r_{11} & r_{12} & r_{13} & t_x \\
 r_{21} & r_{22} & r_{23} & t_y \\
 r_{31} & r_{32} & r_{33} & t_z \\
 0 & 0 & 0 & 1
 \end{bmatrix},
 \f]
 and therefore
 \f[\begin{bmatrix}
 X_c \\
 Y_c \\
 Z_c \\
 1
 \end{bmatrix} = \begin{bmatrix}
 r_{11} & r_{12} & r_{13} & t_x \\
 r_{21} & r_{22} & r_{23} & t_y \\
 r_{31} & r_{32} & r_{33} & t_z \\
 0 & 0 & 0 & 1
 \end{bmatrix}
 \begin{bmatrix}
 X_w \\
 Y_w \\
 Z_w \\
 1
 \end{bmatrix}.\f]
 Combining the projective transformation and the homogeneous transformation, we obtain the projective
 transformation that maps 3D points in world coordinates into 2D points in the image plane and in
 normalized camera coordinates:
 \f[Z_c \begin{bmatrix}
 x' \\
 y' \\
 1
 \end{bmatrix} = \begin{bmatrix} R|t \end{bmatrix} \begin{bmatrix}
 X_w \\
 Y_w \\
 Z_w \\
 1
 \end{bmatrix} = \begin{bmatrix}
 r_{11} & r_{12} & r_{13} & t_x \\
 r_{21} & r_{22} & r_{23} & t_y \\
 r_{31} & r_{32} & r_{33} & t_z
 \end{bmatrix}
 \begin{bmatrix}
 X_w \\
 Y_w \\
 Z_w \\
 1
 \end{bmatrix},\f]
 with \f$x' = X_c / Z_c\f$ and \f$y' = Y_c / Z_c\f$. Putting the equations for intrinsics and extrinsics together, we can write out
 \f$s \; p = A \begin{bmatrix} R|t \end{bmatrix} P_w\f$ as
 \f[s \vecthree{u}{v}{1} = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}
 \begin{bmatrix}
@ -69,62 +193,81 @@ X_w \\
 Y_w \\
 Z_w \\
 1
 \end{bmatrix}.\f]
 If \f$Z_c \ne 0\f$, the transformation above is equivalent to the following,
 \f[\begin{bmatrix}
 u \\
 v
 \end{bmatrix} = \begin{bmatrix}
 f_x X_c/Z_c + c_x \\
 f_y Y_c/Z_c + c_y
 \end{bmatrix}\f]
-where:
+with
-   \f$(X_w, Y_w, Z_w)\f$ are the coordinates of a 3D point in the world coordinate space
+\f[\vecthree{X_c}{Y_c}{Z_c} = \begin{bmatrix}
-   \f$(u, v)\f$ are the coordinates of the projection point in pixels
+R|t
-   \f$A\f$ is a camera matrix, or a matrix of intrinsic parameters
+\end{bmatrix} \begin{bmatrix}
-   \f$(c_x, c_y)\f$ is a principal point that is usually at the image center
+X_w \\
-   \f$f_x, f_y\f$ are the focal lengths expressed in pixel units.
+Y_w \\
-
+Z_w \\
-Thus, if an image from the camera is scaled by a factor, all of these parameters should be scaled
+1
-(multiplied/divided, respectively) by the same factor. The matrix of intrinsic parameters does not
+\end{bmatrix}.\f]
 depend on the scene viewed. So, once estimated, it can be re-used as long as the focal length is
 fixed (in case of zoom lens). The joint rotation-translation matrix \f$[R|t]\f$ is called a matrix of
 extrinsic parameters. It is used to describe the camera motion around a static scene, or vice versa,
 rigid motion of an object in front of a still camera. That is, \f$[R|t]\f$ translates coordinates of a
 world point \f$(X_w, Y_w, Z_w)\f$ to a coordinate system, fixed with respect to the camera.
 The transformation above is equivalent to the following (when \f$Z_c \ne 0\f$ ):
 \f[\begin{array}{l}
 \vecthree{X_c}{Y_c}{Z_c} = R  \vecthree{X_w}{Y_w}{Z_w} + t \\
 x' = X_c/Z_c \\
 y' = Y_c/Z_c \\
 u = f_x \times x' + c_x \\
 v = f_y \times y' + c_y
 \end{array}\f]
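The step-by-step form above can be sketched in plain C++ without any OpenCV dependency. This is only an illustration under stated assumptions: `Vec3`, `Mat33`, `Pixel`, and `projectPinhole` are hypothetical names introduced here, not part of the calib3d API, and the point is assumed to have a nonzero \f$Z_c\f$.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper types for this sketch; they are not OpenCV types.
using Vec3 = std::array<double, 3>;
using Mat33 = std::array<std::array<double, 3>, 3>;

struct Pixel { double u, v; };

// Distortion-free pinhole projection of a world point Pw:
//   Pc = R * Pw + t,  x' = Xc / Zc,  y' = Yc / Zc,
//   u = fx * x' + cx, v = fy * y' + cy.
inline Pixel projectPinhole(const Mat33& R, const Vec3& t, const Vec3& Pw,
                            double fx, double fy, double cx, double cy)
{
    // Change of basis from world to camera coordinates.
    Vec3 Pc{};
    for (int i = 0; i < 3; ++i)
        Pc[i] = R[i][0] * Pw[0] + R[i][1] * Pw[1] + R[i][2] * Pw[2] + t[i];

    // The perspective division is only defined for Zc != 0.
    assert(Pc[2] != 0.0);
    const double xn = Pc[0] / Pc[2];  // normalized coordinate x'
    const double yn = Pc[1] / Pc[2];  // normalized coordinate y'

    // Apply the intrinsic parameters.
    return Pixel{fx * xn + cx, fy * yn + cy};
}
```

For example, with an identity rotation, zero translation, the point (1, 2, 4), focal lengths of 800 pixels and a principal point of (320, 240), the sketch yields the pixel (520, 640).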
 The following figure illustrates the pinhole camera model.
 ![Pinhole camera model](pics/pinhole_camera_model.png)
-Real lenses usually have some distortion, mostly radial distortion and slight tangential distortion.
+Real lenses usually have some distortion, mostly radial distortion, and slight tangential distortion.
 So, the above model is extended as:
-\f[\begin{array}{l}
+\f[\begin{bmatrix}
-\vecthree{X_c}{Y_c}{Z_c} = R  \vecthree{X_w}{Y_w}{Z_w} + t \\
+u \\
-x' = X_c/Z_c \\
+v
-y' = Y_c/Z_c \\
+\end{bmatrix} = \begin{bmatrix}
-x'' = x'  \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x' y' + p_2(r^2 + 2 x'^2) + s_1 r^2 + s_2 r^4 \\
+f_x x'' + c_x \\
-y'' = y'  \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y' + s_3 r^2 + s_4 r^4 \\
+f_y y'' + c_y
-\text{where} \quad r^2 = x'^2 + y'^2  \\
+\end{bmatrix}\f]
 u = f_x \times x'' + c_x \\
 v = f_y \times y'' + c_y
 \end{array}\f]
-\f$k_1\f$, \f$k_2\f$, \f$k_3\f$, \f$k_4\f$, \f$k_5\f$, and \f$k_6\f$ are radial distortion coefficients. \f$p_1\f$ and \f$p_2\f$ are
+where
-tangential distortion coefficients. \f$s_1\f$, \f$s_2\f$, \f$s_3\f$, and \f$s_4\f$, are the thin prism distortion
+
-coefficients. Higher-order coefficients are not considered in OpenCV.
+\f[\begin{bmatrix}
 x'' \\
 y''
 \end{bmatrix} = \begin{bmatrix}
 x' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x' y' + p_2(r^2 + 2 x'^2) + s_1 r^2 + s_2 r^4 \\
 y' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y' + s_3 r^2 + s_4 r^4 \\
 \end{bmatrix}\f]
 with
 \f[r^2 = x'^2 + y'^2\f]
 and
 \f[\begin{bmatrix}
 x'\\
 y'
 \end{bmatrix} = \begin{bmatrix}
 X_c/Z_c \\
 Y_c/Z_c
 \end{bmatrix},\f]
 if \f$Z_c \ne 0\f$.
 The distortion parameters are the radial coefficients \f$k_1\f$, \f$k_2\f$, \f$k_3\f$, \f$k_4\f$, \f$k_5\f$, and \f$k_6\f$,
 the tangential distortion coefficients \f$p_1\f$ and \f$p_2\f$, and the thin prism distortion
 coefficients \f$s_1\f$, \f$s_2\f$, \f$s_3\f$, and \f$s_4\f$. Higher-order coefficients are not considered in OpenCV.
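As a rough illustration, the distortion equations above can be written out directly in C++. The `Distortion` struct and the `distortNormalized` function are hypothetical names used only for this sketch (OpenCV itself takes the coefficients as a flat distCoeffs vector), and the tilt parameters \f$\tau_x\f$ and \f$\tau_y\f$ are deliberately left out.

```cpp
#include <array>
#include <cassert>

// Hypothetical container for this sketch; all coefficients default to zero,
// which corresponds to the ideal distortion-free model.
struct Distortion {
    double k1 = 0, k2 = 0, k3 = 0, k4 = 0, k5 = 0, k6 = 0;  // radial
    double p1 = 0, p2 = 0;                                  // tangential
    double s1 = 0, s2 = 0, s3 = 0, s4 = 0;                  // thin prism
};

// Maps normalized coordinates (x', y') to distorted coordinates (x'', y'').
inline std::array<double, 2> distortNormalized(double x, double y, const Distortion& d)
{
    const double r2 = x * x + y * y;  // r^2 = x'^2 + y'^2
    const double r4 = r2 * r2;
    const double r6 = r4 * r2;

    const double num = 1 + d.k1 * r2 + d.k2 * r4 + d.k3 * r6;
    const double den = 1 + d.k4 * r2 + d.k5 * r4 + d.k6 * r6;
    assert(den != 0.0);  // the rational radial term must be defined
    const double radial = num / den;

    const double xd = x * radial + 2 * d.p1 * x * y + d.p2 * (r2 + 2 * x * x)
                      + d.s1 * r2 + d.s2 * r4;
    const double yd = y * radial + d.p1 * (r2 + 2 * y * y) + 2 * d.p2 * x * y
                      + d.s3 * r2 + d.s4 * r4;
    return {xd, yd};
}
```

The pixel coordinates then follow from \f$u = f_x x'' + c_x\f$ and \f$v = f_y y'' + c_y\f$ as in the equation above.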
 The next figures show two common types of radial distortion: barrel distortion
 (\f$ 1 + k_1 r^2 + k_2 r^4 + k_3 r^6 \f$ monotonically decreasing)
 and pincushion distortion (\f$ 1 + k_1 r^2 + k_2 r^4 + k_3 r^6 \f$ monotonically increasing).
 Radial distortion is always monotonic for real lenses,
-and if the estimator produces a non monotonic result,
+and if the estimator produces a non-monotonic result,
 this should be considered a calibration failure.
-More generally, radial distortion must be monotonic and the distortion function, must be bijective.
+More generally, radial distortion must be monotonic and the distortion function must be bijective.
 A failed estimation result may look deceptively good near the image center
 but will work poorly in e.g. AR/SFM applications.
 The optimization method used in OpenCV camera calibration does not include these constraints as
@ -134,22 +277,28 @@ See [issue #15992](https://github.com/opencv/opencv/issues/15992) for additional
 ![](pics/distortion_examples.png)
 ![](pics/distortion_examples2.png)
-In some cases the image sensor may be tilted in order to focus an oblique plane in front of the
+In some cases, the image sensor may be tilted in order to focus an oblique plane in front of the
 camera (Scheimpflug principle). This can be useful for particle image velocimetry (PIV) or
 triangulation with a laser fan. The tilt causes a perspective distortion of \f$x''\f$ and
-\f$y''\f$. This distortion can be modelled in the following way, see e.g. @cite Louhichi07.
+\f$y''\f$. This distortion can be modeled in the following way, see e.g. @cite Louhichi07.
-\f[\begin{array}{l}
+\f[\begin{bmatrix}
-s\vecthree{x'''}{y'''}{1} =
+u \\
 v
 \end{bmatrix} = \begin{bmatrix}
 f_x x''' + c_x \\
 f_y y''' + c_y
 \end{bmatrix},\f]
 where
 \f[s\vecthree{x'''}{y'''}{1} =
 \vecthreethree{R_{33}(\tau_x, \tau_y)}{0}{-R_{13}(\tau_x, \tau_y)}
 {0}{R_{33}(\tau_x, \tau_y)}{-R_{23}(\tau_x, \tau_y)}
-{0}{0}{1} R(\tau_x, \tau_y) \vecthree{x''}{y''}{1}\\
+{0}{0}{1} R(\tau_x, \tau_y) \vecthree{x''}{y''}{1}\f]
 u = f_x \times x''' + c_x \\
 v = f_y \times y''' + c_y
 \end{array}\f]
-where the matrix \f$R(\tau_x, \tau_y)\f$ is defined by two rotations with angular parameter \f$\tau_x\f$
+and the matrix \f$R(\tau_x, \tau_y)\f$ is defined by two rotations with angular parameter
-and \f$\tau_y\f$, respectively,
+\f$\tau_x\f$ and \f$\tau_y\f$, respectively,
 \f[
 R(\tau_x, \tau_y) =
@ -168,8 +317,8 @@ vector. That is, if the vector contains four elements, it means that \f$k_3=0\f$
 coefficients do not depend on the scene viewed. Thus, they also belong to the intrinsic camera
 parameters. And they remain the same regardless of the captured image resolution. If, for example, a
 camera has been calibrated on images of 320 x 240 resolution, absolutely the same distortion
-coefficients can be used for 640 x 480 images from the same camera while \f$f_x\f$, \f$f_y\f$, \f$c_x\f$, and
+coefficients can be used for 640 x 480 images from the same camera while \f$f_x\f$, \f$f_y\f$,
-\f$c_y\f$ need to be scaled appropriately.
+\f$c_x\f$, and \f$c_y\f$ need to be scaled appropriately.
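A minimal sketch of such a rescaling is given below. `Intrinsics` and `scaleIntrinsics` are hypothetical names for this illustration, and the sketch assumes the simple convention that the principal point scales by the same factor as the image; some pipelines apply a half-pixel offset for pixel centers, which is not modeled here. The distortion coefficients are left untouched.

```cpp
#include <cassert>

// Hypothetical holder for the four intrinsic parameters of the camera matrix.
struct Intrinsics { double fx, fy, cx, cy; };

// Rescales intrinsics calibrated at one resolution to another resolution.
// sx and sy are the ratios newWidth / oldWidth and newHeight / oldHeight.
inline Intrinsics scaleIntrinsics(const Intrinsics& in, double sx, double sy)
{
    assert(sx > 0.0 && sy > 0.0);
    return Intrinsics{in.fx * sx, in.fy * sy, in.cx * sx, in.cy * sy};
}
```

Going from 320 x 240 to 640 x 480 corresponds to `sx = sy = 2`, so all four parameters double while the distortion coefficients stay the same.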
 The functions below use the above model to do the following:
@ -181,8 +330,63 @@ pattern (every view is described by several 3D-2D point correspondences).
 -   Estimate the relative position and orientation of the stereo camera "heads" and compute the
 *rectification* transformation that makes the camera optical axes parallel.
 <B> Homogeneous Coordinates </B><br>
 Homogeneous coordinates are a system of coordinates used in projective geometry. Their use
 allows points at infinity to be represented by finite coordinates and simplifies formulas when compared
 to their Cartesian counterparts, e.g. they have the advantage that affine transformations can be
 expressed as linear homogeneous transformations.
 One obtains the homogeneous vector \f$P_h\f$ by appending a 1 to an n-dimensional Cartesian
 vector \f$P\f$, e.g. for a 3D Cartesian vector the mapping \f$P \rightarrow P_h\f$ is:
 \f[\begin{bmatrix}
 X \\
 Y \\
 Z
 \end{bmatrix} \rightarrow \begin{bmatrix}
 X \\
 Y \\
 Z \\
 1
 \end{bmatrix}.\f]
 For the inverse mapping \f$P_h \rightarrow P\f$, one divides all elements of the homogeneous vector
 by its last element, e.g. for a 3D homogeneous vector one gets its 2D Cartesian counterpart by:
 \f[\begin{bmatrix}
 X \\
 Y \\
 W
 \end{bmatrix} \rightarrow \begin{bmatrix}
 X / W \\
 Y / W
 \end{bmatrix},\f]
 if \f$W \ne 0\f$.
 Due to this mapping, all multiples \f$k P_h\f$, for \f$k \ne 0\f$, of a homogeneous point represent
 the same point \f$P\f$. An intuitive understanding of this property is that under a projective
 transformation, all multiples of \f$P_h\f$ are mapped to the same point. This is the physical
 observation one makes for pinhole cameras, as all points along a ray through the camera's pinhole are
 projected to the same image point, e.g. all points along the red ray in the image of the pinhole
 camera model above would be mapped to the same image coordinate. This property is also the source
 for the scale ambiguity \f$s\f$ in the equation of the pinhole camera model.
 As mentioned, by using homogeneous coordinates we can express any change of basis parameterized by
 \f$R\f$ and \f$t\f$ as a linear transformation, e.g. the change of basis from coordinate system
 0 to coordinate system 1 becomes:
 \f[P_1 = R P_0 + t \rightarrow P_{h_1} = \begin{bmatrix}
 R & t \\
 0 & 1
 \end{bmatrix} P_{h_0}.\f]
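The mappings above can be illustrated with a small stdlib-only C++ sketch. All names (`toHomogeneous`, `fromHomogeneous`, `makeRigid`, `mul` and the array aliases) are hypothetical and exist only for this example; it shows that multiplying by the 4-by-4 matrix gives the same result as \f$R P_0 + t\f$.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper types for this sketch; they are not OpenCV types.
using Vec3 = std::array<double, 3>;
using Vec4 = std::array<double, 4>;
using Mat33 = std::array<std::array<double, 3>, 3>;
using Mat44 = std::array<std::array<double, 4>, 4>;

// P -> P_h: append a 1 to the Cartesian vector.
inline Vec4 toHomogeneous(const Vec3& p) { return {p[0], p[1], p[2], 1.0}; }

// P_h -> P: divide by the last element, which must be nonzero.
inline Vec3 fromHomogeneous(const Vec4& ph)
{
    assert(ph[3] != 0.0);
    return {ph[0] / ph[3], ph[1] / ph[3], ph[2] / ph[3]};
}

// Builds the 4-by-4 homogeneous transformation [R t; 0 1].
inline Mat44 makeRigid(const Mat33& R, const Vec3& t)
{
    Mat44 M{};  // zero-initialized
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            M[i][j] = R[i][j];
        M[i][3] = t[i];
    }
    M[3][3] = 1.0;
    return M;
}

// Matrix-vector product M * v.
inline Vec4 mul(const Mat44& M, const Vec4& v)
{
    Vec4 out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out[i] += M[i][j] * v[j];
    return out;
}
```

A typical use is `fromHomogeneous(mul(makeRigid(R, t), toHomogeneous(P0)))`, which returns \f$P_1\f$. Scaling the homogeneous input by any nonzero factor leaves the result unchanged.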
@note
-   -   A calibration sample for 3 cameras in horizontal position can be found at
+    -   Many functions in this module take a camera matrix as an input parameter. Although all
        functions assume the same structure of this parameter, they may name it differently. The
        parameter's description, however, will be clear in that a camera matrix with the structure
        shown above is required.
    -   A calibration sample for 3 cameras in a horizontal position can be found at
        opencv_source_code/samples/cpp/3calibration.cpp
    -   A calibration sample based on a sequence of images can be found at
        opencv_source_code/samples/cpp/calibration.cpp
@ -527,10 +731,11 @@ CV_EXPORTS_W void composeRT( InputArray rvec1, InputArray tvec1,
 /** @brief Projects 3D points to an image plane.
-@param objectPoints Array of object points, 3xN/Nx3 1-channel or 1xN/Nx1 3-channel (or
+@param objectPoints Array of object points expressed with respect to the world coordinate frame. A 3xN/Nx3
-vector\<Point3f\> ), where N is the number of points in the view.
+1-channel or 1xN/Nx1 3-channel (or vector\<Point3f\> ), where N is the number of points in the view.
-@param rvec Rotation vector. See Rodrigues for details.
+@param rvec The rotation vector (@ref Rodrigues) that, together with tvec, performs a change of
-@param tvec Translation vector.
+basis from world to camera coordinate system, see @ref calibrateCamera for details.
@param tvec The translation vector, see parameter description above.
@param cameraMatrix Camera matrix \f$A = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}\f$ .
@param distCoeffs Input vector of distortion coefficients
 \f$(k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6 [, s_1, s_2, s_3, s_4[, \tau_x, \tau_y]]]])\f$ of
@ -542,20 +747,21 @@ points with respect to components of the rotation vector, translation vector, fo
 coordinates of the principal point and the distortion coefficients. In the old interface different
 components of the jacobian are returned via different output parameters.
@param aspectRatio Optional "fixed aspect ratio" parameter. If the parameter is not 0, the
-function assumes that the aspect ratio (*fx/fy*) is fixed and correspondingly adjusts the jacobian
+function assumes that the aspect ratio (\f$f_x / f_y\f$) is fixed and correspondingly adjusts the
-matrix.
+jacobian matrix.
-The function computes projections of 3D points to the image plane given intrinsic and extrinsic
+The function computes the 2D projections of 3D points to the image plane, given intrinsic and
-camera parameters. Optionally, the function computes Jacobians - matrices of partial derivatives of
+extrinsic camera parameters. Optionally, the function computes Jacobians - matrices of partial
-image points coordinates (as functions of all the input parameters) with respect to the particular
+derivatives of image points coordinates (as functions of all the input parameters) with respect to
-parameters, intrinsic and/or extrinsic. The Jacobians are used during the global optimization in
+the particular parameters, intrinsic and/or extrinsic. The Jacobians are used during the global
-calibrateCamera, solvePnP, and stereoCalibrate . The function itself can also be used to compute a
+optimization in @ref calibrateCamera, @ref solvePnP, and @ref stereoCalibrate. The function itself
-re-projection error given the current intrinsic and extrinsic parameters.
+can also be used to compute a re-projection error, given the current intrinsic and extrinsic
 parameters.
-@note By setting rvec=tvec=(0,0,0) or by setting cameraMatrix to a 3x3 identity matrix, or by
+@note By setting rvec = tvec = \f$[0, 0, 0]\f$, or by setting cameraMatrix to a 3x3 identity matrix,
-passing zero distortion coefficients, you can get various useful partial cases of the function. This
+or by passing zero distortion coefficients, one can get various useful partial cases of the
-means that you can compute the distorted coordinates for a sparse set of points or apply a
+function. This means that one can compute the distorted coordinates for a sparse set of points or apply
-perspective transformation (and also compute the derivatives) in the ideal zero-distortion setup.
+a perspective transformation (and also compute the derivatives) in the ideal zero-distortion setup.
 */
 CV_EXPORTS_W void projectPoints( InputArray objectPoints,
                                 InputArray rvec, InputArray tvec,
@ -1280,44 +1486,48 @@ CV_EXPORTS_W bool findCirclesGrid( InputArray image, Size patternSize,
                                   OutputArray centers, int flags = CALIB_CB_SYMMETRIC_GRID,
                                   const Ptr<FeatureDetector> &blobDetector = SimpleBlobDetector::create());
-/** @brief Finds the camera intrinsic and extrinsic parameters from several views of a calibration pattern.
+/** @brief Finds the camera intrinsic and extrinsic parameters from several views of a calibration
 pattern.
@param objectPoints In the new interface it is a vector of vectors of calibration pattern points in
 the calibration pattern coordinate space (e.g. std::vector<std::vector<cv::Vec3f>>). The outer
-vector contains as many elements as the number of the pattern views. If the same calibration pattern
+vector contains as many elements as the number of pattern views. If the same calibration pattern
 is shown in each view and it is fully visible, all the vectors will be the same. Although, it is
-possible to use partially occluded patterns, or even different patterns in different views. Then,
+possible to use partially occluded patterns or even different patterns in different views. Then,
-the vectors will be different. The points are 3D, but since they are in a pattern coordinate system,
+the vectors will be different. Although the points are 3D, they all lie in the calibration pattern's
-then, if the rig is planar, it may make sense to put the model to a XY coordinate plane so that
+XY coordinate plane (thus 0 in the Z-coordinate), if the used calibration pattern is a planar rig.
 Z-coordinate of each input object point is 0.
 In the old interface all the vectors of object points from different views are concatenated
 together.
@param imagePoints In the new interface it is a vector of vectors of the projections of calibration
 pattern points (e.g. std::vector<std::vector<cv::Vec2f>>). imagePoints.size() and
-objectPoints.size() and imagePoints[i].size() must be equal to objectPoints[i].size() for each i.
+objectPoints.size(), and imagePoints[i].size() and objectPoints[i].size() for each i, must be equal,
-In the old interface all the vectors of object points from different views are concatenated
+respectively. In the old interface all the vectors of object points from different views are
-together.
+concatenated together.
@param imageSize Size of the image used only to initialize the intrinsic camera matrix.
-@param cameraMatrix Output 3x3 floating-point camera matrix
+@param cameraMatrix Input/output 3x3 floating-point camera matrix
 \f$A = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}\f$ . If CV\_CALIB\_USE\_INTRINSIC\_GUESS
 and/or CALIB_FIX_ASPECT_RATIO are specified, some or all of fx, fy, cx, cy must be
 initialized before calling the function.
-@param distCoeffs Output vector of distortion coefficients
+@param distCoeffs Input/output vector of distortion coefficients
 \f$(k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6 [, s_1, s_2, s_3, s_4[, \tau_x, \tau_y]]]])\f$ of
 4, 5, 8, 12 or 14 elements.
-@param rvecs Output vector of rotation vectors (see Rodrigues ) estimated for each pattern view
+@param rvecs Output vector of rotation vectors (@ref Rodrigues ) estimated for each pattern view
-(e.g. std::vector<cv::Mat>>). That is, each k-th rotation vector together with the corresponding
+(e.g. std::vector<cv::Mat>). That is, each i-th rotation vector together with the corresponding
-k-th translation vector (see the next output parameter description) brings the calibration pattern
+i-th translation vector (see the next output parameter description) brings the calibration pattern
-from the model coordinate space (in which object points are specified) to the world coordinate
+from the object coordinate space (in which object points are specified) to the camera coordinate
-space, that is, a real position of the calibration pattern in the k-th pattern view (k=0.. *M* -1).
+space. In more technical terms, the tuple of the i-th rotation and translation vector performs
-@param tvecs Output vector of translation vectors estimated for each pattern view.
+a change of basis from object coordinate space to camera coordinate space. Due to its duality, this
-@param stdDeviationsIntrinsics Output vector of standard deviations estimated for intrinsic parameters.
+tuple is equivalent to the position of the calibration pattern with respect to the camera coordinate
- Order of deviations values:
+space.
@param tvecs Output vector of translation vectors estimated for each pattern view, see parameter
 description above.
@param stdDeviationsIntrinsics Output vector of standard deviations estimated for intrinsic
 parameters. Order of deviations values:
 \f$(f_x, f_y, c_x, c_y, k_1, k_2, p_1, p_2, k_3, k_4, k_5, k_6 , s_1, s_2, s_3,
 s_4, \tau_x, \tau_y)\f$ If one of the parameters is not estimated, its deviation is equal to zero.
-@param stdDeviationsExtrinsics Output vector of standard deviations estimated for extrinsic parameters.
+@param stdDeviationsExtrinsics Output vector of standard deviations estimated for extrinsic
- Order of deviations values: \f$(R_1, T_1, \dotsc , R_M, T_M)\f$ where M is number of pattern views,
+parameters. Order of deviations values: \f$(R_0, T_0, \dotsc , R_{M - 1}, T_{M - 1})\f$ where M is
- \f$R_i, T_i\f$ are concatenated 1x3 vectors.
+the number of pattern views. \f$R_i, T_i\f$ are concatenated 1x3 vectors.
 @param perViewErrors Output vector of the RMS re-projection error estimated for each pattern view.
@param flags Different flags that may be zero or a combination of the following values:
 -   **CALIB_USE_INTRINSIC_GUESS** cameraMatrix contains valid initial values of
@ -1328,7 +1538,7 @@ estimate extrinsic parameters. Use solvePnP instead.
 -   **CALIB_FIX_PRINCIPAL_POINT** The principal point is not changed during the global
 optimization. It stays at the center or at a different location specified when
 CALIB_USE_INTRINSIC_GUESS is set too.
-   **CALIB_FIX_ASPECT_RATIO** The functions considers only fy as a free parameter. The
+-   **CALIB_FIX_ASPECT_RATIO** The function considers only fy as a free parameter. The
 ratio fx/fy stays the same as in the input cameraMatrix . When
 CALIB_USE_INTRINSIC_GUESS is not set, the actual input values of fx and fy are
 ignored, only their ratio is computed and used further.
@ -1362,10 +1572,10 @@ supplied distCoeffs matrix is used. Otherwise, it is set to 0.
 The function estimates the intrinsic camera parameters and extrinsic parameters for each of the
 views. The algorithm is based on @cite Zhang2000 and @cite BouguetMCT . The coordinates of 3D object
 points and their corresponding 2D projections in each view must be specified. That may be achieved
-by using an object with a known geometry and easily detectable feature points. Such an object is
+by using an object with known geometry and easily detectable feature points. Such an object is
 called a calibration rig or calibration pattern, and OpenCV has built-in support for a chessboard as
-a calibration rig (see findChessboardCorners ). Currently, initialization of intrinsic parameters
+a calibration rig (see @ref findChessboardCorners). Currently, initialization of intrinsic
-(when CALIB_USE_INTRINSIC_GUESS is not set) is only implemented for planar calibration
+parameters (when CALIB_USE_INTRINSIC_GUESS is not set) is only implemented for planar calibration
 patterns (where Z-coordinates of the object points must be all zeros). 3D calibration rigs can also
 be used as long as initial cameraMatrix is provided.
@ -1384,11 +1594,11 @@ The algorithm performs the following steps:
    objectPoints. See projectPoints for details.
@note
-   If you use a non-square (=non-NxN) grid and findChessboardCorners for calibration, and
+    If you use a non-square (i.e. non-N-by-N) grid and @ref findChessboardCorners for calibration,
-    calibrateCamera returns bad values (zero distortion coefficients, an image center very far from
+    and @ref calibrateCamera returns bad values (zero distortion coefficients, \f$c_x\f$ and
-    (w/2-0.5,h/2-0.5), and/or large differences between \f$f_x\f$ and \f$f_y\f$ (ratios of 10:1 or more)),
+    \f$c_y\f$ very far from the image center, and/or large differences between \f$f_x\f$ and
-    then you have probably used patternSize=cvSize(rows,cols) instead of using
+    \f$f_y\f$ (ratios of 10:1 or more)), then you are probably using patternSize=cvSize(rows,cols)
-    patternSize=cvSize(cols,rows) in findChessboardCorners .
+    instead of using patternSize=cvSize(cols,rows) in @ref findChessboardCorners.
@sa
   findChessboardCorners, solvePnP, initCameraMatrix2D, stereoCalibrate, undistort
@ -1444,27 +1654,34 @@ CV_EXPORTS_W void calibrationMatrixValues( InputArray cameraMatrix, Size imageSi
                                           CV_OUT double& focalLength, CV_OUT Point2d& principalPoint,
                                           CV_OUT double& aspectRatio );
-/** @brief Calibrates the stereo camera.
+/** @brief Calibrates a stereo camera setup. This function finds the intrinsic parameters
 for each of the two cameras and the extrinsic parameters between the two cameras.
-@param objectPoints Vector of vectors of the calibration pattern points.
+@param objectPoints Vector of vectors of the calibration pattern points. The same structure as
 in @ref calibrateCamera. For each pattern view, both cameras need to see the same object
 points. Therefore, objectPoints.size(), imagePoints1.size(), and imagePoints2.size() need to be
 equal as well as objectPoints[i].size(), imagePoints1[i].size(), and imagePoints2[i].size() need to
 be equal for each i.
@param imagePoints1 Vector of vectors of the projections of the calibration pattern points,
-observed by the first camera.
+observed by the first camera. The same structure as in @ref calibrateCamera.
@param imagePoints2 Vector of vectors of the projections of the calibration pattern points,
-observed by the second camera.
+observed by the second camera. The same structure as in @ref calibrateCamera.
-@param cameraMatrix1 Input/output first camera matrix:
+@param cameraMatrix1 Input/output camera matrix for the first camera, the same as in
-\f$\vecthreethree{f_x^{(j)}}{0}{c_x^{(j)}}{0}{f_y^{(j)}}{c_y^{(j)}}{0}{0}{1}\f$ , \f$j = 0,\, 1\f$ . If
+@ref calibrateCamera. Furthermore, for the stereo case, additional flags may be used, see below.
-any of CALIB_USE_INTRINSIC_GUESS , CALIB_FIX_ASPECT_RATIO ,
+@param distCoeffs1 Input/output vector of distortion coefficients, the same as in
-CALIB_FIX_INTRINSIC , or CALIB_FIX_FOCAL_LENGTH are specified, some or all of the
+@ref calibrateCamera.
-matrix components must be initialized. See the flags description for details.
+@param cameraMatrix2 Input/output camera matrix for the second camera. See description for
-@param distCoeffs1 Input/output vector of distortion coefficients
+cameraMatrix1.
-\f$(k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6 [, s_1, s_2, s_3, s_4[, \tau_x, \tau_y]]]])\f$ of
+@param distCoeffs2 Input/output lens distortion coefficients for the second camera. See
-4, 5, 8, 12 or 14 elements. The output vector length depends on the flags.
+description for distCoeffs1.
-@param cameraMatrix2 Input/output second camera matrix. The parameter is similar to cameraMatrix1
+@param imageSize Size of the image used only to initialize the intrinsic camera matrices.
-@param distCoeffs2 Input/output lens distortion coefficients for the second camera. The parameter
+@param R Output rotation matrix. Together with the translation vector T, this matrix brings
-is similar to distCoeffs1 .
+points given in the first camera's coordinate system to points in the second camera's
-@param imageSize Size of the image used only to initialize intrinsic camera matrix.
+coordinate system. In more technical terms, the tuple of R and T performs a change of basis
-@param R Output rotation matrix between the 1st and the 2nd camera coordinate systems.
+from the first camera's coordinate system to the second camera's coordinate system. Due to its
-@param T Output translation vector between the coordinate systems of the cameras.
+duality, this tuple is equivalent to the position of the first camera with respect to the
 second camera coordinate system.
@param T Output translation vector, see description above.
@param E Output essential matrix.
@param F Output fundamental matrix.
@param perViewErrors Output vector of the RMS re-projection error estimated for each pattern view.
@ -1473,8 +1690,8 @@ is similar to distCoeffs1 .
 matrices are estimated.
 -   **CALIB_USE_INTRINSIC_GUESS** Optimize some or all of the intrinsic parameters
 according to the specified flags. Initial values are provided by the user.
-   **CALIB_USE_EXTRINSIC_GUESS** R, T contain valid initial values that are optimized further.
+-   **CALIB_USE_EXTRINSIC_GUESS** R and T contain valid initial values that are optimized further.
-Otherwise R, T are initialized to the median value of the pattern views (each dimension separately).
+Otherwise R and T are initialized to the median value of the pattern views (each dimension separately).
 -   **CALIB_FIX_PRINCIPAL_POINT** Fix the principal points during the optimization.
 -   **CALIB_FIX_FOCAL_LENGTH** Fix \f$f^{(j)}_x\f$ and \f$f^{(j)}_y\f$ .
 -   **CALIB_FIX_ASPECT_RATIO** Optimize \f$f^{(j)}_y\f$ . Fix the ratio \f$f^{(j)}_x/f^{(j)}_y\f$
@ -1505,29 +1722,49 @@ the optimization. If CALIB_USE_INTRINSIC_GUESS is set, the coefficient from the
 supplied distCoeffs matrix is used. Otherwise, it is set to 0.
@param criteria Termination criteria for the iterative optimization algorithm.
-The function estimates transformation between two cameras making a stereo pair. If you have a stereo
+The function estimates the transformation between two cameras making a stereo pair. If one computes
-camera where the relative position and orientation of two cameras is fixed, and if you computed
+the poses of an object relative to the first camera and to the second camera,
-poses of an object relative to the first camera and to the second camera, (R1, T1) and (R2, T2),
+( \f$R_1\f$,\f$T_1\f$ ) and (\f$R_2\f$,\f$T_2\f$), respectively, for a stereo camera where the
-respectively (this can be done with solvePnP ), then those poses definitely relate to each other.
+relative position and orientation between the two cameras are fixed, then those poses definitely
-This means that, given ( \f$R_1\f$,\f$T_1\f$ ), it should be possible to compute ( \f$R_2\f$,\f$T_2\f$ ). You only
+relate to each other. This means that if the relative position and orientation (\f$R\f$,\f$T\f$) of the
-need to know the position and orientation of the second camera relative to the first camera. This is
+two cameras are known, it is possible to compute (\f$R_2\f$,\f$T_2\f$) when (\f$R_1\f$,\f$T_1\f$) is
-what the described function does. It computes ( \f$R\f$,\f$T\f$ ) so that:
+given. This is what the described function does. It computes (\f$R\f$,\f$T\f$) such that:
 \f[R_2=R R_1\f]
 \f[T_2=R T_1 + T.\f]
 Therefore, one can compute the coordinate representation of a 3D point for the second camera's
 coordinate system when given the point's coordinate representation in the first camera's coordinate
 system:
 \f[\begin{bmatrix}
 X_2 \\
 Y_2 \\
 Z_2 \\
 1
 \end{bmatrix} = \begin{bmatrix}
 R & T \\
 0 & 1
 \end{bmatrix} \begin{bmatrix}
 X_1 \\
 Y_1 \\
 Z_1 \\
 1
 \end{bmatrix}.\f]
 Optionally, it computes the essential matrix E:
-\f[E= \vecthreethree{0}{-T_2}{T_1}{T_2}{0}{-T_0}{-T_1}{T_0}{0} *R\f]
+\f[E= \vecthreethree{0}{-T_2}{T_1}{T_2}{0}{-T_0}{-T_1}{T_0}{0} R\f]
-where \f$T_i\f$ are components of the translation vector \f$T\f$ : \f$T=[T_0, T_1, T_2]^T\f$ . And the function
+where \f$T_i\f$ are components of the translation vector \f$T\f$ : \f$T=[T_0, T_1, T_2]^T\f$ .
-can also compute the fundamental matrix F:
+And the function can also compute the fundamental matrix F:
 \f[F = cameraMatrix2^{-T} E cameraMatrix1^{-1}\f]
 Besides the stereo-related information, the function can also perform a full calibration of each of
-two cameras. However, due to the high dimensionality of the parameter space and noise in the input
+the two cameras. However, due to the high dimensionality of the parameter space and noise in the
-data, the function can diverge from the correct solution. If the intrinsic parameters can be
+input data, the function can diverge from the correct solution. If the intrinsic parameters can be
 estimated with high accuracy for each of the cameras individually (for example, using
 calibrateCamera ), it is recommended to do so and then pass the CALIB_FIX_INTRINSIC flag to the
 function along with the computed intrinsic parameters. Otherwise, if all the parameters are
@ -1563,15 +1800,25 @@ CV_EXPORTS_W double stereoCalibrate( InputArrayOfArrays objectPoints,
@param cameraMatrix2 Second camera matrix.
@param distCoeffs2 Second camera distortion parameters.
@param imageSize Size of the image used for stereo calibration.
-@param R Rotation matrix from the coordinate system of the first camera to the second.
+@param R Rotation matrix from the coordinate system of the first camera to the second camera,
-@param T Translation vector from the coordinate system of the first camera to the second.
+see @ref stereoCalibrate.
-@param R1 Output 3x3 rectification transform (rotation matrix) for the first camera.
+@param T Translation vector from the coordinate system of the first camera to the second camera,
-@param R2 Output 3x3 rectification transform (rotation matrix) for the second camera.
+see @ref stereoCalibrate.
@param R1 Output 3x3 rectification transform (rotation matrix) for the first camera. This matrix
 brings points given in the unrectified first camera's coordinate system to points in the rectified
 first camera's coordinate system. In more technical terms, it performs a change of basis from the
 unrectified first camera's coordinate system to the rectified first camera's coordinate system.
@param R2 Output 3x3 rectification transform (rotation matrix) for the second camera. This matrix
 brings points given in the unrectified second camera's coordinate system to points in the rectified
 second camera's coordinate system. In more technical terms, it performs a change of basis from the
 unrectified second camera's coordinate system to the rectified second camera's coordinate system.
@param P1 Output 3x4 projection matrix in the new (rectified) coordinate systems for the first
-camera.
+camera, i.e. it projects points given in the rectified first camera coordinate system into the
 rectified first camera's image.
@param P2 Output 3x4 projection matrix in the new (rectified) coordinate systems for the second
-camera.
+camera, i.e. it projects points given in the rectified first camera coordinate system into the
-@param Q Output \f$4 \times 4\f$ disparity-to-depth mapping matrix (see reprojectImageTo3D ).
+rectified second camera's image.
@param Q Output \f$4 \times 4\f$ disparity-to-depth mapping matrix (see @ref reprojectImageTo3D).
@param flags Operation flags that may be zero or CALIB_ZERO_DISPARITY . If the flag is set,
 the function makes the principal points of each camera have the same pixel coordinates in the
 rectified views. And if the flag is not set, the function may still shift the images in the
@ -1582,11 +1829,11 @@ scaling. Otherwise, the parameter should be between 0 and 1. alpha=0 means that
 images are zoomed and shifted so that only valid pixels are visible (no black areas after
 rectification). alpha=1 means that the rectified image is decimated and shifted so that all the
 pixels from the original images from the cameras are retained in the rectified images (no source
-image pixels are lost). Obviously, any intermediate value yields an intermediate result between
+image pixels are lost). Any intermediate value yields an intermediate result between
 those two extreme cases.
@param newImageSize New image resolution after rectification. The same size should be passed to
 initUndistortRectifyMap (see the stereo_calib.cpp sample in OpenCV samples directory). When (0,0)
-is passed (default), it is set to the original imageSize . Setting it to larger value can help you
+is passed (default), it is set to the original imageSize . Setting it to a larger value can help you
 preserve details in the original image, especially when there is a big radial distortion.
@param validPixROI1 Optional output rectangles inside the rectified images where all the pixels
 are valid. If alpha=0 , the ROIs cover the whole images. Otherwise, they are likely to be smaller
@ -1602,27 +1849,43 @@ as input. As output, it provides two rotation matrices and also two projection m
 coordinates. The function distinguishes the following two cases:
 -   **Horizontal stereo**: the first and the second camera views are shifted relative to each other
-    mainly along the x axis (with possible small vertical shift). In the rectified images, the
+    mainly along the x-axis (with possible small vertical shift). In the rectified images, the
    corresponding epipolar lines in the left and right cameras are horizontal and have the same
    y-coordinate. P1 and P2 look like:
-    \f[\texttt{P1} = \begin{bmatrix} f & 0 & cx_1 & 0 \\ 0 & f & cy & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\f]
+    \f[\texttt{P1} = \begin{bmatrix}
                        f & 0 & cx_1 & 0 \\
                        0 & f & cy & 0 \\
                        0 & 0 & 1 & 0
                     \end{bmatrix}\f]
-    \f[\texttt{P2} = \begin{bmatrix} f & 0 & cx_2 & T_x*f \\ 0 & f & cy & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} ,\f]
+    \f[\texttt{P2} = \begin{bmatrix}
                        f & 0 & cx_2 & T_x*f \\
                        0 & f & cy & 0 \\
                        0 & 0 & 1 & 0
                     \end{bmatrix} ,\f]
    where \f$T_x\f$ is a horizontal shift between the cameras and \f$cx_1=cx_2\f$ if
    CALIB_ZERO_DISPARITY is set.
 -   **Vertical stereo**: the first and the second camera views are shifted relative to each other
-    mainly in vertical direction (and probably a bit in the horizontal direction too). The epipolar
+    mainly in the vertical direction (and probably a bit in the horizontal direction too). The epipolar
    lines in the rectified images are vertical and have the same x-coordinate. P1 and P2 look like:
-    \f[\texttt{P1} = \begin{bmatrix} f & 0 & cx & 0 \\ 0 & f & cy_1 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\f]
+    \f[\texttt{P1} = \begin{bmatrix}
                        f & 0 & cx & 0 \\
                        0 & f & cy_1 & 0 \\
                        0 & 0 & 1 & 0
                     \end{bmatrix}\f]
-    \f[\texttt{P2} = \begin{bmatrix} f & 0 & cx & 0 \\ 0 & f & cy_2 & T_y*f \\ 0 & 0 & 1 & 0 \end{bmatrix} ,\f]
+    \f[\texttt{P2} = \begin{bmatrix}
                        f & 0 & cx & 0 \\
                        0 & f & cy_2 & T_y*f \\
                        0 & 0 & 1 & 0
                     \end{bmatrix},\f]
-    where \f$T_y\f$ is a vertical shift between the cameras and \f$cy_1=cy_2\f$ if CALIB_ZERO_DISPARITY is
+    where \f$T_y\f$ is a vertical shift between the cameras and \f$cy_1=cy_2\f$ if
-    set.
+    CALIB_ZERO_DISPARITY is set.
 As you can see, the first three columns of P1 and P2 will effectively be the new "rectified" camera
 matrices. The matrices, together with R1 and R2 , can then be passed to initUndistortRectifyMap to
@ -2029,35 +2292,47 @@ CV_EXPORTS_W Mat findEssentialMat( InputArray points1, InputArray points2,
@param R2 Another possible rotation matrix.
@param t One possible translation.
-This function decompose an essential matrix E using svd decomposition @cite HartleyZ00 . Generally 4
+This function decomposes the essential matrix E using singular value decomposition (SVD) @cite HartleyZ00. In
-possible poses exists for a given E. They are \f$[R_1, t]\f$, \f$[R_1, -t]\f$, \f$[R_2, t]\f$, \f$[R_2, -t]\f$. By
+general, four possible poses exist for the decomposition of E. They are \f$[R_1, t]\f$,
-decomposing E, you can only get the direction of the translation, so the function returns unit t.
+\f$[R_1, -t]\f$, \f$[R_2, t]\f$, \f$[R_2, -t]\f$.
 If E gives the epipolar constraint \f$[p_2; 1]^T A^{-T} E A^{-1} [p_1; 1] = 0\f$ between the image
 points \f$p_1\f$ in the first image and \f$p_2\f$ in the second image, then any of the tuples
 \f$[R_1, t]\f$, \f$[R_1, -t]\f$, \f$[R_2, t]\f$, \f$[R_2, -t]\f$ is a change of basis from the first
 camera's coordinate system to the second camera's coordinate system. However, by decomposing E, one
 can only get the direction of the translation. For this reason, the translation t is returned with
 unit length.
 */
 CV_EXPORTS_W void decomposeEssentialMat( InputArray E, OutputArray R1, OutputArray R2, OutputArray t );
-/** @brief Recover relative camera rotation and translation from an estimated essential matrix and the
+/** @brief Recovers the relative camera rotation and the translation from an estimated essential
-corresponding points in two images, using cheirality check. Returns the number of inliers which pass
+matrix and the corresponding points in two images, using cheirality check. Returns the number of
-the check.
+inliers that pass the check.
@param E The input essential matrix.
@param points1 Array of N 2D points from the first image. The point coordinates should be
 floating-point (single or double precision).
@param points2 Array of the second image points of the same size and format as points1 .
-@param cameraMatrix Camera matrix \f$K = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}\f$ .
+@param cameraMatrix Camera matrix \f$A = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}\f$ .
 Note that this function assumes that points1 and points2 are feature points from cameras with the
 same camera matrix.
-@param R Recovered relative rotation.
+@param R Output rotation matrix. Together with the translation vector, this matrix makes up a tuple
-@param t Recovered relative translation.
+that performs a change of basis from the first camera's coordinate system to the second camera's
-@param mask Input/output mask for inliers in points1 and points2.
+coordinate system. Note that, in general, t can not be used for this tuple, see the parameter
-:   If it is not empty, then it marks inliers in points1 and points2 for then given essential
+described below.
-matrix E. Only these inliers will be used to recover pose. In the output mask only inliers
+@param t Output translation vector. This vector is obtained by @ref decomposeEssentialMat and
-which pass the cheirality check.
+therefore is only known up to scale, i.e. t is the direction of the translation vector and has unit
-This function decomposes an essential matrix using decomposeEssentialMat and then verifies possible
+length.
-pose hypotheses by doing cheirality check. The cheirality check basically means that the
+@param mask Input/output mask for inliers in points1 and points2. If it is not empty, then it marks
 inliers in points1 and points2 for the given essential matrix E. Only these inliers will be used to
 recover pose. In the output mask, only the inliers which pass the cheirality check are marked.
 This function decomposes an essential matrix using @ref decomposeEssentialMat and then verifies
 possible pose hypotheses by performing a cheirality check. The cheirality check means that the
 triangulated 3D points should have positive depth. Some details can be found in @cite Nister03.
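The positive-depth condition can be sketched as follows. This is a simplified illustration under the assumption that the triangulated point X is already given in the first camera's coordinate system; it is not the implementation used by recoverPose, and `passesCheirality` is a hypothetical helper name.

```cpp
#include <array>

// Hypothetical helper types for this sketch (not OpenCV types).
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Returns true if the 3D point X (first camera's coordinate system) lies in
// front of both cameras for the pose hypothesis (R, t).
bool passesCheirality(const Mat3& R, const Vec3& t, const Vec3& X) {
    // Depth in the first camera is simply the z-coordinate of X.
    const double depth1 = X[2];
    // Depth in the second camera is the z-coordinate of R * X + t.
    const double depth2 = R[2][0] * X[0] + R[2][1] * X[1] + R[2][2] * X[2] + t[2];
    return depth1 > 0.0 && depth2 > 0.0;
}
```

Applied to each of the four hypotheses from decomposeEssentialMat, a test of this kind would typically be passed by most points for only one of them.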
-This function can be used to process output E and mask from findEssentialMat. In this scenario,
+This function can be used to process the output E and mask from @ref findEssentialMat. In this
-points1 and points2 are the same input for findEssentialMat. :
+scenario, points1 and points2 are the same inputs as for findEssentialMat:
@code
    // Example. Estimation of fundamental matrix using the RANSAC algorithm
    int point_count = 100;
@ -2089,20 +2364,24 @@ CV_EXPORTS_W int recoverPose( InputArray E, InputArray points1, InputArray point
@param points1 Array of N 2D points from the first image. The point coordinates should be
 floating-point (single or double precision).
@param points2 Array of the second image points of the same size and format as points1 .
-@param R Recovered relative rotation.
+@param R Output rotation matrix. Together with the translation vector, this matrix makes up a tuple
-@param t Recovered relative translation.
+that performs a change of basis from the first camera's coordinate system to the second camera's
 coordinate system. Note that, in general, t can not be used for this tuple, see the parameter
 description below.
@param t Output translation vector. This vector is obtained by @ref decomposeEssentialMat and
 therefore is only known up to scale, i.e. t is the direction of the translation vector and has unit
 length.
@param focal Focal length of the camera. Note that this function assumes that points1 and points2
 are feature points from cameras with same focal length and principal point.
@param pp Principal point of the camera.
-@param mask Input/output mask for inliers in points1 and points2.
+@param mask Input/output mask for inliers in points1 and points2. If it is not empty, then it marks
-:   If it is not empty, then it marks inliers in points1 and points2 for then given essential
+inliers in points1 and points2 for the given essential matrix E. Only these inliers will be used to
-matrix E. Only these inliers will be used to recover pose. In the output mask only inliers
+recover pose. In the output mask, only the inliers which pass the cheirality check are marked.
 This function differs from the one above in that it computes the camera matrix from the focal
 length and principal point:
-\f[K =
+\f[A =
 \begin{bmatrix}
 f & 0 & x_{pp}  \\
 0 & f & y_{pp}  \\
@ -2119,19 +2398,26 @@ CV_EXPORTS_W int recoverPose( InputArray E, InputArray points1, InputArray point
@param points1 Array of N 2D points from the first image. The point coordinates should be
 floating-point (single or double precision).
@param points2 Array of the second image points of the same size and format as points1.
-@param cameraMatrix Camera matrix \f$K = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}\f$ .
+@param cameraMatrix Camera matrix \f$A = \vecthreethree{f_x}{0}{c_x}{0}{f_y}{c_y}{0}{0}{1}\f$ .
 Note that this function assumes that points1 and points2 are feature points from cameras with the
 same camera matrix.
-@param R Recovered relative rotation.
+@param R Output rotation matrix. Together with the translation vector, this matrix makes up a tuple
-@param t Recovered relative translation.
+that performs a change of basis from the first camera's coordinate system to the second camera's
-@param distanceThresh threshold distance which is used to filter out far away points (i.e. infinite points).
+coordinate system. Note that, in general, t can not be used for this tuple, see the parameter
-@param mask Input/output mask for inliers in points1 and points2.
+description below.
-:   If it is not empty, then it marks inliers in points1 and points2 for then given essential
+@param t Output translation vector. This vector is obtained by @ref decomposeEssentialMat and
-matrix E. Only these inliers will be used to recover pose. In the output mask only inliers
+therefore is only known up to scale, i.e. t is the direction of the translation vector and has unit
-which pass the cheirality check.
+length.
-@param triangulatedPoints 3d points which were reconstructed by triangulation.
+@param distanceThresh threshold distance which is used to filter out far away points (i.e. infinite
- */
+points).
@param mask Input/output mask for inliers in points1 and points2. If it is not empty, then it marks
 inliers in points1 and points2 for the given essential matrix E. Only these inliers will be used to
 recover pose. In the output mask, only the inliers which pass the cheirality check are marked.
@param triangulatedPoints 3D points which were reconstructed by triangulation.
 This function differs from the one above in that it outputs the triangulated 3D points that are
 used for the cheirality check.
 */
 CV_EXPORTS_W int recoverPose( InputArray E, InputArray points1, InputArray points2,
                            InputArray cameraMatrix, OutputArray R, OutputArray t, double distanceThresh, InputOutputArray mask = noArray(),
                            OutputArray triangulatedPoints = noArray());
@ -2162,22 +2448,27 @@ Line coefficients are defined up to a scale. They are normalized so that \f$a_i^
 CV_EXPORTS_W void computeCorrespondEpilines( InputArray points, int whichImage,
                                             InputArray F, OutputArray lines );
-/** @brief Reconstructs points by triangulation.
+/** @brief This function reconstructs 3-dimensional points (in homogeneous coordinates) by using
 their observations with a stereo camera.
-@param projMatr1 3x4 projection matrix of the first camera.
+@param projMatr1 3x4 projection matrix of the first camera, i.e. this matrix projects 3D points
-@param projMatr2 3x4 projection matrix of the second camera.
+given in the world's coordinate system into the first image.
-@param projPoints1 2xN array of feature points in the first image. In case of c++ version it can
+@param projMatr2 3x4 projection matrix of the second camera, i.e. this matrix projects 3D points
-be also a vector of feature points or two-channel matrix of size 1xN or Nx1.
+given in the world's coordinate system into the second image.
-@param projPoints2 2xN array of corresponding points in the second image. In case of c++ version
+@param projPoints1 2xN array of feature points in the first image. In the case of the C++ version,
 it can also be a vector of feature points or a two-channel matrix of size 1xN or Nx1.
-@param points4D 4xN array of reconstructed points in homogeneous coordinates.
+@param projPoints2 2xN array of corresponding points in the second image. In the case of the C++
-
+version, it can also be a vector of feature points or a two-channel matrix of size 1xN or Nx1.
-The function reconstructs 3-dimensional points (in homogeneous coordinates) by using their
+@param points4D 4xN array of reconstructed points in homogeneous coordinates. These points are
-observations with a stereo camera. Projections matrices can be obtained from stereoRectify.
+returned in the world's coordinate system.
@note
   Keep in mind that all input data should be of float type in order for this function to work.
@note
   If the projection matrices from @ref stereoRectify are used, then the returned points are
   represented in the first camera's rectified coordinate system.
@sa
   reprojectImageTo3D
 */
@ -2232,15 +2523,16 @@ CV_EXPORTS_W void validateDisparity( InputOutputArray disparity, InputArray cost
 /** @brief Reprojects a disparity image to 3D space.
@param disparity Input single-channel 8-bit unsigned, 16-bit signed, 32-bit signed or 32-bit
-floating-point disparity image.
+floating-point disparity image. The values of 8-bit / 16-bit signed formats are assumed to have no
-The values of 8-bit / 16-bit signed formats are assumed to have no fractional bits.
+fractional bits. If the disparity is 16-bit signed format, as computed by @ref StereoBM or
-If the disparity is 16-bit signed format as computed by
+@ref StereoSGBM and possibly other algorithms, it should be divided by 16 (and scaled to float) before
-StereoBM/StereoSGBM/StereoBinaryBM/StereoBinarySGBM and may be other algorithms,
+being used here.
-it should be divided by 16 (and scaled to float) before being used here.
+@param _3dImage Output 3-channel floating-point image of the same size as disparity. Each element of
-@param _3dImage Output 3-channel floating-point image of the same size as disparity . Each
+_3dImage(x,y) contains 3D coordinates of the point (x,y) computed from the disparity map. If one
-element of _3dImage(x,y) contains 3D coordinates of the point (x,y) computed from the disparity
+uses Q obtained by @ref stereoRectify, then the returned points are represented in the first
-map.
+camera's rectified coordinate system.
-@param Q \f$4 \times 4\f$ perspective transformation matrix that can be obtained with stereoRectify.
+@param Q \f$4 \times 4\f$ perspective transformation matrix that can be obtained with
@ref stereoRectify.
@param handleMissingValues Indicates whether the function should handle missing values (i.e.
 points where the disparity was not computed). If handleMissingValues=true, then pixels with the
 minimal disparity that corresponds to the outliers (see StereoMatcher::compute ) are transformed
@ -2252,11 +2544,20 @@ The function transforms a single-channel disparity map to a 3-channel image repr
 surface. That is, for each pixel (x,y) and the corresponding disparity d=disparity(x,y) , it
 computes:
-\f[\begin{array}{l} [X \; Y \; Z \; W]^T =  \texttt{Q} *[x \; y \; \texttt{disparity} (x,y) \; 1]^T  \\ \texttt{\_3dImage} (x,y) = (X/W, \; Y/W, \; Z/W) \end{array}\f]
+\f[\begin{bmatrix}
 X \\
 Y \\
 Z \\
 W
 \end{bmatrix} = Q \begin{bmatrix}
 x \\
 y \\
 \texttt{disparity} (x,y) \\
 1
 \end{bmatrix}.\f]
-The matrix Q can be an arbitrary \f$4 \times 4\f$ matrix (for example, the one computed by
+@sa
-stereoRectify). To reproject a sparse set of points {(x,y,d),...} to 3D space, use
+   To reproject a sparse set of points {(x,y,d),...} to 3D space, use perspectiveTransform.
 */
 CV_EXPORTS_W void reprojectImageTo3D( InputArray disparity,
                                      OutputArray _3dImage, InputArray Q,
@ -2463,11 +2764,19 @@ Check @ref tutorial_homography "the corresponding tutorial" for more details.
@param translations Array of translation matrices.
@param normals Array of plane normal matrices.
-This function extracts relative camera motion between two views observing a planar object from the
+This function extracts relative camera motion between two views of a planar object and returns up to
-homography H induced by the plane. The intrinsic camera matrix K must also be provided. The function
+four mathematical solution tuples of rotation, translation, and plane normal. The decomposition of
-may return up to four mathematical solution sets. At least two of the solutions may further be
+the homography matrix H is described in detail in @cite Malis.
-invalidated if point correspondences are available by applying positive depth constraint (all points
+
-must be in front of the camera). The decomposition method is described in detail in @cite Malis .
+If the homography H, induced by the plane, gives the constraint
 \f[s_i \vecthree{x'_i}{y'_i}{1} \sim H \vecthree{x_i}{y_i}{1}\f] on the source image points
 \f$p_i\f$ and the destination image points \f$p'_i\f$, then the tuple of rotations[k] and
 translations[k] is a change of basis from the source camera's coordinate system to the destination
 camera's coordinate system. However, by decomposing H, one can only get the translation normalized
 by the (typically unknown) depth of the scene, i.e. its direction but with normalized length.
 If point correspondences are available, at least two solutions may further be invalidated by
 applying the positive depth constraint, i.e. all points must be in front of the camera.
 */
 CV_EXPORTS_W int decomposeHomographyMat(InputArray H,
                                        InputArray K,