Add additional information about homogeneous transformations. Add quick formulas for conversions between physical focal length, sensor size, fov and camera intrinsic params.

Souriya Trinh 2025-04-13 22:15:45 +02:00
parent 6ef5746391
commit 7f7be9bab0
2 changed files with 101 additions and 0 deletions

pics/pinhole_homogeneous_transformation.png (new binary file, 77 KiB)

@@ -383,6 +383,107 @@ R & t \\
0 & 1
\end{bmatrix} P_{h_0}.\f]
<B> Homogeneous Transformations, Object frame / Camera frame </B><br>
A change of basis, that is, computing the 3D coordinates of a point expressed in one frame in another frame, can be achieved easily using
the following notation:
\f[
\mathbf{X}_c = \hspace{0.2em}
{}^{c}\mathbf{T}_o \hspace{0.2em} \mathbf{X}_o
\f]
\f[
\begin{bmatrix}
X_c \\
Y_c \\
Z_c \\
1
\end{bmatrix} =
\begin{bmatrix}
{}^{c}\mathbf{R}_o & {}^{c}\mathbf{t}_o \\
0_{1 \times 3} & 1
\end{bmatrix}
\begin{bmatrix}
X_o \\
Y_o \\
Z_o \\
1
\end{bmatrix}
\f]
For a 3D point (\f$ \mathbf{X}_o \f$) expressed in the object frame, the homogeneous transformation matrix
\f$ {}^{c}\mathbf{T}_o \f$ allows computing the corresponding coordinates (\f$ \mathbf{X}_c \f$) in the camera frame.
This transformation matrix is composed of a 3x3 rotation matrix \f$ {}^{c}\mathbf{R}_o \f$ and a 3x1 translation vector
\f$ {}^{c}\mathbf{t}_o \f$.
The 3x1 translation vector \f$ {}^{c}\mathbf{t}_o \f$ is the position of the object frame origin in the camera frame and the
3x3 rotation matrix \f$ {}^{c}\mathbf{R}_o \f$ is the orientation of the object frame in the camera frame.
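As an illustration, a minimal NumPy sketch (the rotation and translation values below are arbitrary placeholders) that builds
\f$ {}^{c}\mathbf{T}_o \f$ and transforms a point from the object frame to the camera frame could look like:
@code{.py}
import numpy as np

# Arbitrary placeholder pose of the object frame expressed in the camera frame
c_R_o = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])       # rotation of 90 degrees about the Z-axis
c_t_o = np.array([0.1, 0.0, 0.5])       # object origin 0.5 m in front of the camera

# Build the 4x4 homogeneous transformation matrix cTo from cRo and cto
c_T_o = np.eye(4)
c_T_o[:3, :3] = c_R_o
c_T_o[:3, 3] = c_t_o

# 3D point expressed in the object frame, in homogeneous coordinates
X_o = np.array([0.2, 0.0, 0.0, 1.0])

# Corresponding homogeneous coordinates in the camera frame: Xc = cTo . Xo
X_c = c_T_o @ X_o
@endcode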
With this simple notation, it is easy to chain the transformations. For instance, the 3D coordinates of a point
expressed in the object frame can be computed in the world frame with:
\f[
\mathbf{X}_w = \hspace{0.2em}
{}^{w}\mathbf{T}_c \hspace{0.2em} {}^{c}\mathbf{T}_o \hspace{0.2em}
\mathbf{X}_o =
{}^{w}\mathbf{T}_o \hspace{0.2em} \mathbf{X}_o
\f]
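Continuing the sketch above, and with an arbitrary placeholder camera-to-world pose \f$ {}^{w}\mathbf{T}_c \f$, chaining is a plain matrix product:
@code{.py}
# Arbitrary placeholder pose of the camera frame expressed in the world frame
w_T_c = np.eye(4)
w_T_c[:3, 3] = np.array([1.0, 2.0, 0.0])

# Chaining the transformations: wTo = wTc . cTo
w_T_o = w_T_c @ c_T_o

# Coordinates of the same point, now expressed in the world frame
X_w = w_T_o @ X_o
@endcode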
Similarly, computing the inverse transformation can be done with:
\f[
\mathbf{X}_o = \hspace{0.2em}
{}^{o}\mathbf{T}_c \hspace{0.2em} \mathbf{X}_c =
\left( {}^{c}\mathbf{T}_o \right)^{-1} \hspace{0.2em} \mathbf{X}_c
\f]
The inverse of a homogeneous transformation matrix is then:
\f[
{}^{o}\mathbf{T}_c = \left( {}^{c}\mathbf{T}_o \right)^{-1} =
\begin{bmatrix}
{}^{c}\mathbf{R}^{\top}_o & - \hspace{0.2em} {}^{c}\mathbf{R}^{\top}_o \hspace{0.2em} {}^{c}\mathbf{t}_o \\
0_{1 \times 3} & 1
\end{bmatrix}
\f]
One can note that the inverse of a 3x3 rotation matrix is directly its matrix transpose.
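Continuing the same sketch, this closed-form inverse can be checked against a generic 4x4 matrix inversion:
@code{.py}
# Closed-form inverse of the homogeneous transformation cTo
o_T_c = np.eye(4)
o_T_c[:3, :3] = c_R_o.T
o_T_c[:3, 3] = -c_R_o.T @ c_t_o

# Same result as a generic 4x4 matrix inversion
assert np.allclose(o_T_c, np.linalg.inv(c_T_o))
@endcode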
![Perspective projection, from object to camera frame](pics/pinhole_homogeneous_transformation.png)
This figure summarizes the whole process. The object pose returned, for instance, by the @ref solvePnP function
or by fiducial marker pose estimation is this \f$ {}^{c}\mathbf{T}_o \f$ transformation.
The camera intrinsic matrix \f$ \mathbf{K} \f$ allows projecting a 3D point expressed in the camera frame onto the image plane,
assuming a perspective projection model (pinhole camera model). Image coordinates extracted from classical image processing functions
assume a (u,v) coordinate frame with its origin at the top-left corner of the image.
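As an illustrative sketch (the 3D/2D correspondences and the intrinsic matrix below are arbitrary placeholders), the pose returned by
@ref solvePnP can be assembled into \f$ {}^{c}\mathbf{T}_o \f$ and used to project the model points back onto the image plane:
@code{.py}
import cv2 as cv
import numpy as np

# Placeholder 3D model points (object frame, a 10 cm square) and 2D image observations
obj_pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                    [0.1, 0.1, 0.0], [0.0, 0.1, 0.0]])
img_pts = np.array([[320.0, 240.0], [400.0, 238.0],
                    [402.0, 320.0], [318.0, 322.0]])

# Placeholder intrinsic matrix K and no lens distortion
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

# solvePnP returns the object pose in the camera frame (rvec, tvec encode cTo)
ret, rvec, tvec = cv.solvePnP(obj_pts, img_pts, K, dist)

# Rodrigues rotation vector -> 3x3 rotation matrix cRo, then assemble cTo
c_R_o, _ = cv.Rodrigues(rvec)
c_T_o = np.eye(4)
c_T_o[:3, :3] = c_R_o
c_T_o[:3, 3] = tvec.ravel()

# Project the model points back onto the image plane with the pinhole model and K
proj, _ = cv.projectPoints(obj_pts, rvec, tvec, K, dist)
@endcode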
\note
- for an online video course on this topic, see for instance:
- ["3.3.1. Homogeneous Transformation Matrices", Modern Robotics, Kevin M. Lynch and Frank C. Park](https://modernrobotics.northwestern.edu/nu-gm-book-resource/3-3-1-homogeneous-transformation-matrices/)
- the 3x3 rotation matrix is composed of 9 values but describes a 3-DoF transformation
- some additional properties of the 3x3 rotation matrix (see the short numerical check after this list) are:
- \f$ \mathrm{det} \left( \mathbf{R} \right) = 1 \f$
- \f$ \mathbf{R} \mathbf{R}^{\top} = \mathbf{R}^{\top} \mathbf{R} = \mathrm{I}_{3 \times 3} \f$
- interpolating rotation can be done using the [Slerp (spherical linear interpolation)](https://en.wikipedia.org/wiki/Slerp) method
- quick conversions between the different rotation formalisms can be done using this [online tool](https://www.andre-gaschler.com/rotationconverter/)
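A minimal numerical check of these rotation matrix properties, using an arbitrary example rotation vector:
@code{.py}
import numpy as np
import cv2 as cv

# Arbitrary example rotation vector (axis-angle), converted to a 3x3 rotation matrix
R, _ = cv.Rodrigues(np.array([[0.1], [-0.2], [0.3]]))

# det(R) = 1 and R . R^T = R^T . R = I
assert np.isclose(np.linalg.det(R), 1.0)
assert np.allclose(R @ R.T, np.eye(3))
assert np.allclose(R.T @ R, np.eye(3))
@endcode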
<B> Intrinsic parameters from camera lens specifications </B><br>
When dealing with industrial cameras, the camera intrinsic matrix, or more precisely \f$ \left(f_x, f_y \right) \f$,
can be approximated from the camera and lens specifications:
\f[
f_x = \frac{f_{\text{mm}}}{\text{pixel_size_in_mm}} = \frac{f_{\text{mm}}}{\text{sensor_size_in_mm} / \text{nb_pixels}}
\f]
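For example, with hypothetical specifications (a 4 mm lens, a 6.4 mm wide sensor and an image width of 1920 pixels), a quick sketch of this computation:
@code{.py}
# Hypothetical lens / sensor specifications
f_mm = 4.0                # physical focal length of the lens, in mm
sensor_width_mm = 6.4     # sensor width, in mm
image_width_px = 1920     # number of pixels along the sensor width

# Pixel size along x, then fx expressed in pixels
pixel_size_mm = sensor_width_mm / image_width_px
f_x = f_mm / pixel_size_mm    # 1200 px with these values
@endcode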
In the same way, the physical focal length can be deduced from the angular field of view:
\f[
f_{\text{mm}} = \frac{\text{sensor_size_in_mm}}{2 \times \tan{\frac{\text{fov}}{2}}}
\f]
This latter conversion can be useful when using rendering software to mimic a physical camera device.
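A quick sketch of this conversion, assuming a hypothetical 60 degree horizontal field of view and the same 6.4 mm sensor width as above:
@code{.py}
import math

# Hypothetical horizontal field of view and sensor width
fov_deg = 60.0
sensor_width_mm = 6.4

# Physical focal length matching this field of view for the given sensor size
f_mm = sensor_width_mm / (2.0 * math.tan(math.radians(fov_deg) / 2.0))   # ~5.54 mm
@endcode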
<B> Additional references, notes </B><br>
@note
- Many functions in this module take a camera intrinsic matrix as an input parameter. Although all
functions assume the same structure of this parameter, they may name it differently. The