Appendix E
CAMERA CALIBRATION


As described in Section 11.2, a digital image is a discrete array of gray level values. The objective of camera calibration is to determine all of the parameters that are necessary to relate the pixel coordinates (r, c) to the world coordinates (x, y, z) of a point in the camera’s field of view. In other words, given the coordinates of a point P relative to the world coordinate frame, after we have calibrated the camera we will be able to compute (r, c), the image pixel coordinates for the projection of this point.

Camera calibration is a ubiquitous problem in computer vision. Numerous solution methods have been developed, many of which have been implemented in open-source software libraries (e.g., OpenCV [17] and Matlab’s Computer Vision Toolbox [26]). Here, we present an approach that is conceptually straightforward and relatively easy to implement.

E.1 The Image Plane and the Sensor Array

In order to relate digital images to the 3D world, we must first determine the relationship between the image-plane coordinates (u, v) and the pixel coordinates (r, c). We typically define the origin of the pixel array to be located at a corner of the image rather than at the center of the image. Let the pixel array coordinates of the pixel that contains the principal point be given by (or, oc). In general, the sensing elements in the camera will not be of unit size, nor will they necessarily be square. Denote by sx and sy the horizontal and vertical dimensions, respectively, of a pixel. Finally, it is often the case that the horizontal and vertical axes of the pixel array coordinate system point in opposite directions from the horizontal and vertical axes of the camera coordinate frame.1 Combining these, we obtain the following relationship between image-plane coordinates and pixel array coordinates

$$-\frac{u}{s_x} = r - o_r, \qquad -\frac{v}{s_y} = c - o_c \tag{E.1}$$

Note that the coordinates (r, c) will be integers, since they are the discrete indices into an array that is stored in computer memory. Therefore, this relationship is only an approximation. In practice, the value of (r, c) can be obtained by truncating or rounding the ratio on the left-hand side of these equations.
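As a concrete illustration, the relationship in Equation (E.1) can be coded directly. The function below is only a sketch; the parameter names follow the text, and the sample camera values (10 μm square pixels, principal point at pixel (240, 320)) are hypothetical.

```python
def image_plane_to_pixel(u, v, s_x, s_y, o_r, o_c):
    """Convert image-plane coordinates (u, v) to pixel indices (r, c)
    using Equation (E.1): -u/s_x = r - o_r and -v/s_y = c - o_c.
    The result is rounded to the nearest integer pixel index."""
    r = round(-u / s_x + o_r)
    c = round(-v / s_y + o_c)
    return r, c

# Hypothetical camera: 10 micron square pixels, principal point at (240, 320).
r, c = image_plane_to_pixel(u=0.001, v=-0.002,
                            s_x=1e-5, s_y=1e-5, o_r=240, o_c=320)
# u = 0.001 spans 100 pixels in the -r direction, so (r, c) = (140, 520).
```

Note the sign flip: positive image-plane coordinates move toward smaller pixel indices, reflecting the opposite axis directions mentioned above.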

E.2 Extrinsic Camera Parameters

In Section 11.2.1, we considered only the case in which coordinates are expressed relative to the camera frame. In typical robotics applications, tasks are expressed in terms of the world coordinate frame. If we know the position and orientation of the camera frame relative to the world coordinate frame (i.e., if we know $O^w_c$ and $R^w_c$, respectively), we can write

$$p^w = R^w_c\, p^c + O^w_c$$

or, if we know $p^w$ and wish to solve for $p^c$,

$$p^c = \left(R^w_c\right)^T p^w - \left(R^w_c\right)^T O^w_c$$

In the remainder of this section, to simplify notation, we will define

$$R = \left(R^w_c\right)^T, \qquad T = -\left(R^w_c\right)^T O^w_c$$

and we write

$$p^c = R\, p^w + T$$

Together, R and T are called the extrinsic camera parameters.

Cameras are typically mounted on tripods or on mechanical positioning units. In the latter case, a popular configuration is the pan/tilt head. A pan/tilt head has two degrees of freedom: a rotation about the world z-axis and a rotation about the pan/tilt head’s x-axis. These two degrees of freedom are analogous to those of a human head, which can easily look up or down, and can turn from side to side. In this case, the rotation matrix is given by

$$R = R_{z,\theta} R_{x,\alpha} =
\begin{bmatrix}
c_\theta & -s_\theta c_\alpha & s_\theta s_\alpha \\
s_\theta & c_\theta c_\alpha & -c_\theta s_\alpha \\
0 & s_\alpha & c_\alpha
\end{bmatrix}$$

where θ is the pan angle and α is the tilt angle. More precisely, θ is the angle between the world x-axis and the camera x-axis, about the world z-axis, while α is the angle between the world z-axis and the camera z-axis, about the camera x-axis.
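The pan/tilt rotation matrix is straightforward to construct numerically. The sketch below composes the two basic rotations as $R_{z,\theta} R_{x,\alpha}$; the function name is our own.

```python
import numpy as np

def pan_tilt_rotation(theta, alpha):
    """Rotation matrix R = R_{z,theta} @ R_{x,alpha} for a pan/tilt head:
    pan by theta about the world z-axis, then tilt by alpha about the
    head's x-axis."""
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    Rx = np.array([[1.0, 0.0,            0.0],
                   [0.0, np.cos(alpha), -np.sin(alpha)],
                   [0.0, np.sin(alpha),  np.cos(alpha)]])
    return Rz @ Rx
```

Since each factor is a rotation, the product is always orthogonal with unit determinant, which is a useful sanity check on any implementation.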

E.3 Intrinsic Camera Parameters

The mapping from 3D coordinates to pixel coordinates is obtained by combining Equations (11.4) and (E.1), which yields

$$r - o_r = -\frac{\lambda}{s_x}\frac{x}{z}, \qquad c - o_c = -\frac{\lambda}{s_y}\frac{y}{z} \tag{E.2}$$

Thus, once we know the values of the parameters λ, sx, or, sy, oc we can determine (r, c) from (x, y, z), where (x, y, z) are coordinates relative to the camera frame. In fact, we don’t need to know all of λ, sx, sy; it is sufficient to know the ratios

$$f_x = \frac{\lambda}{s_x}, \qquad f_y = \frac{\lambda}{s_y}$$

These parameters fx, or, fy, oc are known as the intrinsic camera parameters. They are constant for a given camera and do not change when the camera moves.
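Given the four intrinsic parameters, projecting a camera-frame point to pixel coordinates is a direct transcription of Equation (E.2). The sketch below rounds to integer pixel indices, as discussed after Equation (E.1); the sample intrinsic values are hypothetical.

```python
def project_to_pixel(x, y, z, f_x, f_y, o_r, o_c):
    """Project a point (x, y, z), given in camera-frame coordinates,
    to pixel coordinates (r, c) using Equation (E.2) with the ratios
    f_x = lambda/s_x and f_y = lambda/s_y."""
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    r = -f_x * x / z + o_r
    c = -f_y * y / z + o_c
    return round(r), round(c)

# Hypothetical intrinsics: f_x = f_y = 800, principal point (240, 320).
# The point (0.1, -0.2, 1.0) projects to (160, 480).
```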

E.4 Determining the Camera Parameters

Of all the camera parameters, or, oc (the image pixel coordinates of the principal point) are the easiest to determine. This can be done by using the idea of vanishing points, which was introduced in Example 11.2. The vanishing points for three mutually orthogonal sets of parallel lines in the world will define a triangle in the image. The orthocenter of this triangle (that is, the point at which the three altitudes intersect) is the image principal point. Thus, a simple way to compute the principal point is to position a cube in the workspace, find the edges of the cube in the image (this will produce the three sets of mutually orthogonal parallel lines), compute the intersections of the image lines that correspond to each set of parallel lines in the world (this will produce three points in the image), and determine the orthocenter for the resulting triangle.
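Once the three vanishing points have been found in the image, the orthocenter computation is a small linear solve: the altitude through one vertex is perpendicular to the opposite side, which gives two independent linear conditions on the orthocenter. A minimal sketch (our own helper, not from the text):

```python
import numpy as np

def orthocenter(p1, p2, p3):
    """Orthocenter of the triangle with 2D vertices p1, p2, p3.
    The altitude from p1 is perpendicular to the side p2 - p3, so the
    orthocenter h satisfies (h - p1) . (p2 - p3) = 0; the analogous
    condition from p2 gives a second equation, and we solve the 2x2 system."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    A = np.array([p2 - p3, p1 - p3])
    b = np.array([np.dot(p2 - p3, p1), np.dot(p1 - p3, p2)])
    return np.linalg.solve(A, b)
```

Applied to the triangle of vanishing points, the returned point is an estimate of the principal point $(o_r, o_c)$. For a right triangle the orthocenter is the right-angle vertex, which makes a convenient test case.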

To determine the remaining camera parameters, we construct a system of equations in terms of the known coordinates of points in the world and the pixel coordinates of their projections in the image. The unknowns in this system are the camera parameters. The first step is to acquire a data set of the form {ri, ci, xi, yi, zi} for i = 1⋅⋅⋅N, in which ri, ci are the image pixel coordinates of the projection of a point in the world with coordinates xi, yi, zi relative to the world coordinate frame. This acquisition is often done manually, for example, by placing a small bright light at known (x, y, z) coordinates in the world and then hand selecting the corresponding image point.

Once we have acquired the data set, we proceed to set up a linear system of equations. The extrinsic parameters of the camera are given by

$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}, \qquad
T = \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}$$

With respect to the camera frame, the coordinates of a point in the world are thus given by

$$\begin{aligned}
x^c &= r_{11}x + r_{12}y + r_{13}z + T_x \\
y^c &= r_{21}x + r_{22}y + r_{23}z + T_y \\
z^c &= r_{31}x + r_{32}y + r_{33}z + T_z
\end{aligned}$$

Combining these three equations with Equation (E.2) we obtain

$$r - o_r = -f_x\,\frac{r_{11}x + r_{12}y + r_{13}z + T_x}{r_{31}x + r_{32}y + r_{33}z + T_z} \tag{E.3}$$

$$c - o_c = -f_y\,\frac{r_{21}x + r_{22}y + r_{23}z + T_y}{r_{31}x + r_{32}y + r_{33}z + T_z} \tag{E.4}$$

Since we know the coordinates of the principal point, we can simplify these equations by using the coordinate transformation

$$r \leftarrow r - o_r, \qquad c \leftarrow c - o_c$$

We now write the two transformed projection equations as functions of the unknown variables $r_{ij}$, $T_x$, $T_y$, $T_z$, $f_x$, $f_y$. This is done by solving Equations (E.3) and (E.4) for $z^c$ and setting the resulting expressions equal to one another. In particular, for each data point $(r_i, c_i, x_i, y_i, z_i)$ we have

$$r_i f_y \left(r_{21}x_i + r_{22}y_i + r_{23}z_i + T_y\right) = c_i f_x \left(r_{11}x_i + r_{12}y_i + r_{13}z_i + T_x\right)$$

Defining α = fx/fy, we can rewrite this as

$$r_i \left(r_{21}x_i + r_{22}y_i + r_{23}z_i + T_y\right) = \alpha c_i \left(r_{11}x_i + r_{12}y_i + r_{13}z_i + T_x\right)$$

We can combine the N such equations into the matrix equation

$$A\mathbf{x} = 0 \tag{E.5}$$

in which

$$A = \begin{bmatrix}
r_1 x_1 & r_1 y_1 & r_1 z_1 & r_1 & -c_1 x_1 & -c_1 y_1 & -c_1 z_1 & -c_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
r_N x_N & r_N y_N & r_N z_N & r_N & -c_N x_N & -c_N y_N & -c_N z_N & -c_N
\end{bmatrix}$$

and

$$\mathbf{x} = \left(r_{21},\ r_{22},\ r_{23},\ T_y,\ \alpha r_{11},\ \alpha r_{12},\ \alpha r_{13},\ \alpha T_x\right)^T$$

If $\bar{\mathbf{x}}$ is a solution for Equation (E.5), we only know that this solution is some scalar multiple of the desired solution $\mathbf{x}$, namely,

$$\bar{\mathbf{x}} = k\mathbf{x}$$

in which k is an unknown scale factor. In order to solve for the true values of the camera parameters, we can exploit constraints that arise from the fact that $R$ is a rotation matrix. In particular, since the rows of $R$ are unit vectors,

$$\left(\bar{x}_1^2 + \bar{x}_2^2 + \bar{x}_3^2\right)^{1/2} = |k|\left(r_{21}^2 + r_{22}^2 + r_{23}^2\right)^{1/2} = |k|$$

and likewise

$$\left(\bar{x}_5^2 + \bar{x}_6^2 + \bar{x}_7^2\right)^{1/2} = \alpha|k|\left(r_{11}^2 + r_{12}^2 + r_{13}^2\right)^{1/2} = \alpha|k|$$

Note that by definition, α > 0.
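Numerically, a homogeneous system such as Equation (E.5) is commonly solved by taking the right singular vector of A associated with its smallest singular value. The following sketch assembles A from the calibration data, finds such a solution, and recovers $|k|$ and $\alpha$ using the unit-norm constraints above; the function names are our own.

```python
import numpy as np

def build_A(r, c, pts):
    """Assemble the N x 8 matrix of Equation (E.5) from the
    principal-point-corrected pixel coordinates r_i, c_i and the
    N x 3 array of world points (x_i, y_i, z_i)."""
    x, y, z = pts.T
    return np.column_stack([r*x, r*y, r*z, r, -c*x, -c*y, -c*z, -c])

def solve_homogeneous(A):
    """Return a unit-norm solution xbar of A @ x = 0 (up to scale):
    the right singular vector for the smallest singular value of A."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

def recover_scale_and_alpha(xbar):
    """Given xbar = k * (r21, r22, r23, Ty, a*r11, a*r12, a*r13, a*Tx),
    use the unit norm of the rows of R to recover |k| and a = fx/fy."""
    k_abs = np.linalg.norm(xbar[0:3])          # |k| * ||(r21, r22, r23)|| = |k|
    alpha = np.linalg.norm(xbar[4:7]) / k_abs  # alpha * |k| / |k|
    return k_abs, alpha
```

With noisy data A has no exact null vector, and the smallest-singular-value vector is the least-squares solution of unit norm.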

Our next task is to determine the sign of k. Using Equation (E.2), we see that $r x^c < 0$ (recall that we have used the coordinate transformation $r \leftarrow r - o_r$). Therefore, we choose k such that $r\left(r_{11}x + r_{12}y + r_{13}z + T_x\right) < 0$.

At this point we know the values for k, α, r21, r22, r23, r11, r12, r13, Tx, Ty, and all that remains is to determine Tz, fx, fy, since the third row of R can be determined as the vector cross product of its first two rows. Since α = fx/fy, we need only determine Tz and fx. Returning again to the projection equations, we can write

$$r = -f_x\,\frac{r_{11}x + r_{12}y + r_{13}z + T_x}{r_{31}x + r_{32}y + r_{33}z + T_z}$$

Using an approach similar to that used above to solve for the first eight parameters, we can write this as the linear system

$$\begin{bmatrix}
r_1 & r_{11}x_1 + r_{12}y_1 + r_{13}z_1 + T_x \\
\vdots & \vdots \\
r_N & r_{11}x_N + r_{12}y_N + r_{13}z_N + T_x
\end{bmatrix}
\begin{bmatrix} T_z \\ f_x \end{bmatrix} =
\begin{bmatrix}
-r_1\left(r_{31}x_1 + r_{32}y_1 + r_{33}z_1\right) \\
\vdots \\
-r_N\left(r_{31}x_N + r_{32}y_N + r_{33}z_N\right)
\end{bmatrix}$$

which can easily be solved for Tz and fx.
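This final step is an ordinary least-squares problem in the two unknowns $T_z$ and $f_x$. A minimal sketch, assuming the first and third rows of R, $T_x$, and the corrected pixel coordinates have already been computed as described above (the function name is our own):

```python
import numpy as np

def solve_tz_fx(r, pts, row1, Tx, row3):
    """Solve the final linear system for Tz and fx by least squares.
    r:    length-N array of principal-point-corrected row coordinates r_i
    pts:  N x 3 array of world points (x_i, y_i, z_i)
    row1: first row of R (r11, r12, r13); row3: third row (r31, r32, r33)."""
    xc = pts @ np.asarray(row1) + Tx      # r11*x + r12*y + r13*z + Tx
    A = np.column_stack([r, xc])          # unknowns ordered as [Tz, fx]
    b = -r * (pts @ np.asarray(row3))     # -r_i (r31*x + r32*y + r33*z)
    (Tz, fx), *_ = np.linalg.lstsq(A, b, rcond=None)
    return Tz, fx
```

As a quick check, data synthesized from known parameters (e.g., R the identity, $T_z = 2$, $f_x = 500$) should be recovered exactly, since the system is then consistent.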

Note

  1. This is an artifact of our choice to place the center of projection behind the image plane. The directions of the pixel array axes may vary, depending on the particular software drivers used to acquire digital images from the camera.