Chapter VBI-16. Depth Processing


This chapter does not appear in the book.


One of the claims to fame of the Kinect sensor is its depth processing capabilities, including the generation of depth maps. It's possible to implement similar functionality on a PC with two ordinary webcams (after they've been calibrated). The picture on the right shows the left and right images from the cameras being rectified, using the calibration information to undistort and align the pictures. Those images are then transformed into a grayscale disparity map, a 3D point cloud, and an anaglyph picture.

The disparity map indicates that the user's coffee cup is closest to the cameras, since it's colored white; the user is a light gray, and so a bit further away; and the background is a darker gray.

It's possible to click on the disparity map to retrieve depth information (in millimeters). The next image shows the complete DepthViewer GUI, with a red dot and a number marked on the map indicating that the coffee cup is 614 mm away from the camera.

[Depth Info in GUI PIC]

Unfortunately, this information isn't particularly accurate (the actual distance is nearer 900 mm), for reasons explained later.
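The depth reported by clicking the map comes from the standard stereo relation Z = f·B/d: focal length times baseline, divided by disparity. A small sketch in Python illustrates the relation and also hints at why accuracy degrades with distance. The focal length and baseline values below are invented for illustration; they are not DepthViewer's actual calibration numbers.

```python
# Sketch of the standard stereo depth relation: Z = f * B / d.
# The focal length and baseline below are invented for illustration;
# they are NOT DepthViewer's actual calibration values.

def depth_mm(focal_px, baseline_mm, disparity_px):
    """Depth (in mm) of a point, given its disparity (in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px

F = 700.0   # focal length in pixels (hypothetical)
B = 60.0    # distance between the two webcams in mm (hypothetical)

near = depth_mm(F, B, 60.0)   # large disparity --> near object
far  = depth_mm(F, B, 6.0)    # small disparity --> far object

# Because depth is inversely proportional to disparity, a one-pixel
# disparity error matters far more for distant points than near ones:
err_near = depth_mm(F, B, 59.0) - near
err_far  = depth_mm(F, B, 5.0) - far
```

Since depth varies with 1/d, a one-pixel disparity error that barely matters close up can shift a distant point by over a meter, which is one reason the clicked values should be treated as rough estimates.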

A point cloud is a 3D representation of the depth information, stored in the popular PLY data format, which allows it to be loaded (and manipulated) by various 3D tools. The image below shows two screenshots of the point cloud from the images at the top of this page, loaded into MeshLab.

[Point Cloud in MeshLab PIC]

The image on the right shows the point cloud rotated to the left so that the z-axis (the depth information) is more visible.
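A PLY file is simple enough to write by hand: an ASCII header describing the vertex elements, followed by one point per line. The following sketch writes a minimal cloud; the three points are made up, and DepthViewer's own output may include extra vertex properties (such as colors) not shown here.

```python
# Minimal ASCII PLY writer for a list of (x, y, z) points.
# A sketch only -- DepthViewer's own writer may add extra vertex
# properties (e.g. colors) beyond the three coordinates used here.

def write_ply(path, points):
    with open(path, "w") as f:
        f.write("ply\n")
        f.write("format ascii 1.0\n")
        f.write("element vertex %d\n" % len(points))
        f.write("property float x\n")
        f.write("property float y\n")
        f.write("property float z\n")
        f.write("end_header\n")
        for (x, y, z) in points:
            f.write("%f %f %f\n" % (x, y, z))

# three made-up points; a real cloud would contain thousands
write_ply("cloud.ply", [(0.0, 0.0, 614.0),
                        (1.0, 2.0, 900.0),
                        (-1.0, 0.5, 650.0)])
```

Any file written this way loads directly into MeshLab via its PLY importer.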

The anaglyph in the image at the top of this page is created by encoding the left and right rectified images using red and cyan filters and merging them into a single picture. The 3D effect becomes clear when the image is viewed through color-coded anaglyph glasses. An enlarged version of the anaglyph appears in the image below, along with an example of suitable glasses.

[Anaglyph and Glasses PIC]
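The red/cyan encoding amounts to a per-pixel channel swap: the red channel comes from the left rectified image, and the green and blue channels from the right one, so the glasses deliver a different view to each eye. Here's a sketch of that merge on tiny two-pixel "images" stored as lists of RGB tuples; the pixel values are invented, and the real program of course operates on full camera frames.

```python
# Red-cyan anaglyph merging, sketched on tiny "images" stored as
# lists of (r, g, b) tuples: red from the left image, green and
# blue from the right image. Pixel values are invented.

def make_anaglyph(left, right):
    return [(l[0], r[1], r[2]) for l, r in zip(left, right)]

# two-pixel toy images (not real camera data)
left_img  = [(200, 10, 10), (50, 60, 70)]
right_img = [(20, 100, 150), (55, 65, 80)]

anaglyph = make_anaglyph(left_img, right_img)
```

The same channel-combination idea scales directly to full images, where each pixel of the output takes red from one view and green/blue from the other.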

The quality of the disparity map, point cloud, and anaglyph depends on the undistortion and rectification mapping carried out on the left and right input images. This mapping is generated during an earlier calibration phase, when a large series of paired images is processed by DepthViewer. These image pairs are collected using a separate application, called SnapPics, which deals with the two webcams separately from the complex tasks involved in depth processing.

The calibration technique supported by OpenCV requires the user to hold a printed chessboard pattern. The next image shows one of the calibration image pairs.

[A Calibration Image Pair PIC]
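OpenCV-style calibration works by matching the chessboard corners detected in each image against the board's known 3D geometry, which is simply a flat grid of points at z = 0 spaced by the square size. This grid is easy to build; the sketch below uses a hypothetical 9x6 inner-corner board with 25 mm squares, not necessarily the board used by DepthViewer.

```python
# The 3D "object points" that OpenCV-style calibration pairs with
# detected chessboard corners: a flat grid at z = 0, spaced by the
# square size. Board dimensions and square size are hypothetical.

def chessboard_object_points(cols, rows, square_mm):
    """One (x, y, 0) point per inner corner, row by row."""
    return [(c * square_mm, r * square_mm, 0.0)
            for r in range(rows) for c in range(cols)]

pts = chessboard_object_points(9, 6, 25.0)   # 9x6 inner corners, 25 mm squares
```

Because the same physical board appears in every snapped pair, the calibration can solve for each camera's intrinsics and for the geometric relationship between the two cameras.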

In summary, depth processing consists of three stages:

  1. Multiple image pairs are snapped using the SnapPics application. At least 20 image pairs are needed, and usually many more, for the calibration process in stage 2 to produce decent results.
  2. The calibration phase performed by DepthViewer analyses the image pairs (all showing the user holding a chessboard in various poses). The result is a collection of undistortion and rectification matrices that are employed in stage 3.
  3. The depth processing phase, illustrated by Figure 1, converts a specific image pair into a disparity map, point cloud PLY file, and an anaglyph. At this stage, it's no longer necessary for the user to be holding a chessboard in the images.

I'll explain these stages in more detail during the course of this chapter. For more information on the underlying maths, I recommend chapters 11 and 12 of Learning OpenCV by Gary Bradski and Adrian Kaehler, O'Reilly 2008.




Dr. Andrew Davison
E-mail: ad@fivedots.coe.psu.ac.th