Grayscale Correlation based 3D Model Fitting for Occupant Head Detection and Tracking

Vehicle occupants can be fatally injured by airbag deployment during a crash. New collision safety technology requires classifying the occupant and tracking their position in real time in order to deploy the airbag adaptively. This paper presents a fast 3D model fitting algorithm, based on grayscale correlation of stereo disparity data, to detect and track the occupant's head position. The proposed system uses stereo vision with IR illumination for depth data acquisition. By detecting the body center line and calculating extra-near disparity, the method is shown to be robust and accurate under varying lighting conditions and occupant movement. Evaluation of the method shows over 98% correct head detection and nearly 100% correctness with head tracking.


Introduction
With the development of collision safety technology, fine-grained control of airbag deployment, which adapts deployment to the occupant's body shape, weight and position, has been studied intensively in recent years. The main purpose of a smart airbag system is to address the risk that an occupant who is too close to the airbag may be seriously injured by its deployment during a crash. The National Highway Traffic Safety Administration (NHTSA) (Federal Motor Vehicle Safety Standards, 2000) specifies different occupancy classes, including infants in rear-facing infant seats, children and small adults, as well as out-of-position zones for human occupants, for which airbag deployment has to be controlled.
Research on detecting the type and position of the occupant can be divided into three main categories based on the sensing technology: I) weight sensors on the seat that measure the pressure distribution and classify the occupant into different types (Kennedy 2006, Lasten 2006); II) electromagnetic or ultrasound sensors that detect changes in the electromagnetic field to determine occupant type and position (Seip 1999); III) computer vision sensors that directly detect the occupant's head and body position from 2D or 3D information and classify the occupant (Trivedi, 2002-2005). Categories I and II are the most common sensors on the market at the current stage of airbag control, which requires reliable classification of adults, children and rear-facing child seats. However, they are not suited to precise detection of occupant position and posture, which is vital to fine-grained control of airbag deployment.
Vision sensors provide the richest information about occupant position and posture. Depending on the number of cameras used, these studies can be further divided into two categories: monocular camera based methods and stereo vision based methods. Monocular methods typically use edges, contours and other image features to detect ellipse-like shapes for head detection. Combined with an infrared detector, a single-camera solution can also achieve satisfactory results in well-controlled environments. However, it suffers from strong shadows, hot weather and insufficient 3D information, which is necessary for functions such as out-of-position detection.
Stereo vision based methods use two co-planar cameras to calculate disparity data and detect the occupant's head position and posture. Many algorithms employ general 3D model fitting to detect ellipsoid-like 3D shapes in a range image obtained from the stereo rig. M. Trivedi (Trivedi, 2002-2005) uses shape and size constraints to prune search regions and reduce computation, which can have the serious side effect that the head region is also eliminated when it appears relatively smaller than other ellipsoid-like shapes such as waving arms and shoulders. B. Alefs (Alefs, 2004) uses depth data to recover the occupant's body surface and edge data to generate head candidates; head recognition is carried out with a large trained dataset.
To achieve real-time performance while keeping high accuracy of occupant head detection, this paper presents a fast 3D parametric model fitting algorithm based on grayscale correlation of range data. Compared with traditional 3D parametric model fitting algorithms, this method reduces the problem of searching for a 3D model in a depth image to a 2D grayscale correlation problem, which simultaneously determines all parameters of the best-fitting model. In applying the proposed algorithm to occupant head detection, this paper also proposes a body centerline segmentation method and a multi-resolution disparity generation algorithm to deal with body occlusion and extra-near disparity calculation.
In the remainder of this chapter we present a brief overview of traditional 3D parametric model fitting and our new approach based on grayscale correlation of the range image (Section 2), a detailed implementation of our approach (Section 3), and experimental results in the context of an occupant head detection system (Section 4).

3D parametric model fitting algorithm
2.1 Problem description
Given an image frame (e.g. a range image or edge image), the 3D parametric model fitting problem is to find the 3D parameters of the model (e.g. 3D position and orientation, scale factors, intrinsic parameters). Figure 1 shows an example of finding an ellipsoid in a range image. An ellipsoid has nine 3D parameters in total: 3 rotation and 3 translation parameters, plus 3 scaling factors along the X, Y and Z axes.
Research on 3D model fitting has built on earlier work in (Lowe, 1991) on generic 3D parametric model fitting. Image formation is modeled as a mapping of a 3D model into the image. Although the inverse mapping is non-linear due to the trigonometric functions of perspective projection, the resulting image changes smoothly as the parameters change. Therefore, local linearity can be assumed and iterative methods for solving non-linear equations (e.g. Newton's method) can be employed. Once a solution is found for one frame, its parameters are used as the initial values for the next frame and the fitting procedure is repeated. This traditional approach can be extremely time consuming and is not suited to the real-time requirements of occupant head detection.

www.intechopen.com

Our algorithm
Under the assumption of local linearity, we can prepare a lookup table (LUT) of possible combinations of all parameters except the 3D position (X, Y, Z), which is determined later by grayscale correlation. For ellipsoid detection, the rotation and scale parameters are used to generate the LUT. To simplify the process, only certain combinations of rotation and scale parameters are adopted, constrained by the occupant's physical position and posture. Here, 3 rotation angles {0, +45°, -45°} about the X and Z axes are combined with 3 different ellipsoid shapes. Scale factors are defined by the possible movement range of the head. The equation of the 3D ellipsoid is expressed in terms of the scale parameters and the 3D world coordinates of the ellipsoid center. The perspective projection equation (2) is used to project 3D ellipsoid surface points to 2D image coordinates. Figure 2 shows some examples of models generated from the parameter LUT.
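As a rough illustration of how such a parameter LUT might be enumerated, the following Python sketch combines the three rotation angles about the X and Z axes with three ellipsoid shapes. Several things here are assumptions rather than details from the text: the semi-axis values in `SHAPES_MM` are hypothetical, the rotation order (X first, then Z) is arbitrary, the full 3 x 3 x 3 product may be larger than the subset actually used, and the ellipsoid is written in its standard canonical form.

```python
import itertools
import math

# Angle set taken from the text; the three shapes (semi-axes in mm) are
# hypothetical placeholders, since the text does not list them.
ANGLES_DEG = (0.0, 45.0, -45.0)
SHAPES_MM = ((80.0, 110.0, 95.0), (90.0, 120.0, 100.0), (100.0, 130.0, 110.0))


def rotation_xz(rx_deg, rz_deg):
    """Return R = Rz(rz) * Rx(rx) as a 3x3 nested list (assumed order)."""
    rx, rz = math.radians(rx_deg), math.radians(rz_deg)
    cx, sx, cz, sz = math.cos(rx), math.sin(rx), math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rot_z = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return [[sum(rot_z[i][k] * rot_x[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def build_parameter_lut():
    """Enumerate every (rotation about X, rotation about Z, shape) combination."""
    lut = []
    for rx, rz, shape in itertools.product(ANGLES_DEG, ANGLES_DEG, SHAPES_MM):
        lut.append({"rx": rx, "rz": rz, "scale": shape, "R": rotation_xz(rx, rz)})
    return lut


def on_ellipsoid(point, center, scale, rot, tol=1e-9):
    """Test the canonical form sum((local_i / scale_i)^2) == 1 for a 3D point."""
    d = [p - c for p, c in zip(point, center)]
    # Rotate the offset into the ellipsoid's own frame (R^T * d).
    local = [sum(rot[k][i] * d[k] for k in range(3)) for i in range(3)]
    value = sum((v / s) ** 2 for v, s in zip(local, scale))
    return abs(value - 1.0) < tol
```

Each LUT entry fixes every parameter except the ellipsoid center, which is the quantity left for the correlation step to determine.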

System implementation details
The system is designed as a co-planar stereo camera with a constructive infrared illumination light source. The stereo rig is mounted on the center roof console near the rear-view mirror. It should generally have a baseline of a few centimeters and wide-angle lenses that cover the whole passenger cabin.

Constructive illumination lighting system
A fast stereo algorithm (Konolige, 1997) is adopted to generate the disparity map from two synchronized video sources at 30 frames per second. To overcome uneven illumination and shadows in real outdoor environments, a pulsed infrared illumination system is installed, combined with band-pass filtered lenses that cut off light at unnecessary wavelengths. A disadvantage of dense disparity algorithms based on block matching is the aperture problem, which arises from the ambiguity of one-dimensional intensity in the left and right images along the horizontal epipolar line. No disparity data can be derived for regions of uniform intensity, such as dark or overexposed regions. We tested different kinds of light patterns, and the light-dark-light cross pattern at an angle of ±45 degrees showed the best performance. Figure 3 shows an example of the disparity map without and with constructive light.

Background subtraction
To eliminate the passenger seat, door and other interior regions from the range image, background subtraction is carried out for every new frame. The background range image is generated as the average of 30 range-image frames of the empty seat. Automatic background generation will be implemented in future according to the sensor outputs for seat lateral position and reclining angle.
Post-processing includes binarization, morphological processing and blob analysis. The biggest blob that satisfies the position and area constraints is extracted as the candidate region of the occupant's body. Figure 4 shows the background subtraction results.
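The background subtraction and blob extraction described above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the system: images are lists of rows, the threshold value is chosen by the caller, blobs are 4-connected, the morphological step is omitted, and only an area constraint (no position constraint) is applied. All function names are hypothetical.

```python
from collections import deque


def average_background(frames):
    """Average a list of equally sized range images (lists of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / len(frames) for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frame, background, threshold):
    """Binarize: 1 where the frame differs from the background by more than threshold."""
    return [[1 if abs(v - b) > threshold else 0 for v, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def largest_blob(mask, min_area=1):
    """Return the pixel set of the largest 4-connected blob with area >= min_area."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            blob, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_area and len(blob) > len(best):
                best = blob
    return best
```

In a real pipeline the averaged background would come from the 30 empty-seat frames mentioned above, and the returned pixel set would define the body candidate region.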

Composition of multi-resolution disparity maps for near distance disparity
Fast stereo processing algorithms (Konolige, 1997) typically use a fixed maximum disparity value to accelerate the matching process. For example, a maximum disparity of 32 pixels limits the search distance to 32 pixels. Disparities above the maximum are omitted.
According to the basic stereo disparity equation (6), the maximum disparity determines the minimum detection distance, since the baseline b and the lens focal length f are fixed.
Figure 5 shows an example of an extra-near target for which no disparity data can be obtained.
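To make the link between maximum disparity and minimum detection distance concrete, the sketch below uses the standard rectified-stereo relation Z = b * f / d, which is presumably what the disparity equation refers to. The 64 mm baseline and 32-pixel maximum disparity are the values reported in the experimental section; the focal length in pixels (250) is a hypothetical value, since the sensor pixel pitch is not given.

```python
def depth_from_disparity(baseline_mm, focal_px, disparity_px):
    """Standard rectified-stereo relation Z = b * f / d, with Z in mm."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_mm * focal_px / disparity_px


def min_detection_distance(baseline_mm, focal_px, max_disparity_px):
    """Closest range measurable when the search stops at max_disparity_px."""
    return depth_from_disparity(baseline_mm, focal_px, max_disparity_px)


if __name__ == "__main__":
    # focal_px=250 is an assumed value used only for illustration.
    print(min_detection_distance(64.0, 250.0, 32))
```

With these assumed numbers, any target closer than the returned distance would need a disparity larger than the search limit and therefore yields no data.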
To enlarge the disparity range for extra-near target detection, we propose an algorithm that composes disparity maps of multiple resolutions. A lower-resolution stereo image pair yields a disparity map with a wider detection range, since its pixel size is larger than that of the full-resolution pair. Figure 6 shows the composition result of disparity maps generated from 160x120 and 320x240 stereo images.
Step 3. If the slope angle of line C1C2 is less than a threshold k (the occupant is in the normal seating position), the regions farther than a predefined distance from C3, the midpoint of C1C2, are simply cut off vertically. An example is shown in Figure 8(b).
Step 4. If the slope angle is larger than the threshold k (the occupant is in a leaning position), the cut-off lines are parallel to line C1C2, while keeping the predefined distances.
Step 5. The segmented foreground region is further filtered by a disparity constraint. The ideal disparity on each row i is calculated by linear interpolation. This constraint eliminates most of the outliers and other objects in front of the body ROI. An example is shown in Figure 8(c).
Step 6. The resulting image is further normalized for the 3D model fitting process described in Section 2.
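The segmentation steps can be sketched as follows. This is only an illustration under several assumptions that the text does not confirm: the per-row median is a plain median of foreground columns (standing in for Russakoff's algorithm), the slope angle is measured from the vertical image axis, the cut-off distance is a horizontal offset in pixels, the rows at 1/5 and 4/5 of the ROI height contain foreground pixels, and the ideal disparity is interpolated between reference disparities at C1 and C2. All names are hypothetical.

```python
import math


def row_median(mask_row):
    """Median column index of the foreground pixels in one row, or None."""
    cols = [c for c, v in enumerate(mask_row) if v]
    return cols[len(cols) // 2] if cols else None


def centerline_points(mask):
    """C1 and C2 as (row, col) at 1/5 and 4/5 of the ROI height."""
    h = len(mask)
    r1, r2 = h // 5, (4 * h) // 5
    return (r1, row_median(mask[r1])), (r2, row_median(mask[r2]))


def lean_angle_deg(c1, c2):
    """Angle of line C1C2 measured from the vertical image axis (assumed)."""
    return math.degrees(math.atan2(abs(c2[1] - c1[1]), c2[0] - c1[0]))


def segment_body(mask, max_offset, k_deg):
    """Keep foreground pixels within max_offset columns of the cut reference."""
    c1, c2 = centerline_points(mask)
    upright = lean_angle_deg(c1, c2) < k_deg
    mid_col = (c1[1] + c2[1]) / 2.0            # column of the midpoint C3
    slope = (c2[1] - c1[1]) / (c2[0] - c1[0])  # columns per row along C1C2
    out = []
    for r, row in enumerate(mask):
        # Upright: vertical cut around C3. Leaning: cut parallel to C1C2.
        ref = mid_col if upright else c1[1] + slope * (r - c1[0])
        out.append([v if v and abs(c - ref) <= max_offset else 0
                    for c, v in enumerate(row)])
    return out


def ideal_disparity(row, c1_row, d1, c2_row, d2):
    """Linearly interpolate the expected disparity on a given row."""
    t = (row - c1_row) / (c2_row - c1_row)
    return d1 + t * (d2 - d1)
```

Pixels whose measured disparity deviates strongly from `ideal_disparity` for their row would then be discarded as outliers; the tolerance for that comparison is another parameter the caller would have to choose.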

Experimental results
The proposed algorithm was tested with passengers of various sizes and with the different postures that occupants may adopt in normal driving situations. The stereo vision system was equipped with two gen-locked CCD cameras. Stereo images were captured by a Matrox Meteor2/MC frame grabber board and all processing was done on a Pentium IV 2.66 GHz PC. The stereo baseline is 64 mm and the lens focal length is 2.8 mm. 320x240 disparity maps were generated at 25 ms/frame with a maximum disparity of 32 pixels. A total of 16 adult testers, 12 male and 4 female, were chosen for the test. With heights ranging from 153 cm to 183 cm and weights from 50 kg to 80 kg, the testers were intended to cover the main range of adult passenger sizes. They were asked to perform all kinds of postures that could occur in real driving situations, such as reading, waving arms and drinking. Each test was captured continuously for 1500 frames.
Table 1 shows the test results for the different situations. Testers 1 to 8, who sat upright in the normal position, showed the best performance, with a correct detection rate near 100%. Testers 9 to 16, who were asked to perform different kinds of movements and postures, still showed a high detection rate of about 97.5%. The overall correct detection rate is 98.7%. Some very difficult situations, such as partially occluded targets, extra-near targets and multiple ambiguities, were also detected correctly. Figure 9 shows some examples.
False detections (<0.2%) occurred under occlusion, and missed detections (<1.2%) were mostly due to the occupant being out of position. Figure 10 shows some false examples. Tracking the head position in both the intensity image and the disparity map will greatly help to locate the head even under full occlusion. Tracking can also reduce the search area by predicting the head position. Some preliminary tests were carried out and showed very satisfactory results.

Conclusion
Occupant head detection is sensitive to variations in illumination, occupant posture and body size. To achieve real-time performance while keeping high accuracy of occupant head detection, this paper presents a fast 3D parametric model fitting algorithm based on grayscale correlation of range data. Evaluation of the method shows over 98% correct head detection. Combined with a head tracking algorithm on the intensity image and disparity map, the proposed algorithm is expected to achieve near 100% correct detection.

Figure 1. Searching a 3D model in a range image
In the projection equation, the camera's intrinsic parameters, such as the lens focal length, are obtained from preprocessing steps such as camera calibration. The rotation matrix elements $R_{11}$ to $R_{33}$ are retrieved from the parameter LUT. To match the disparity image, a normalization equation converts the range data into intensity values, where Z is the minimum distance from the ellipsoid surface points to the camera and N is the bit depth of the intensity image.

Fig. 2. Parametric models of range data
... and the maximum disparity value in the region. N is the bit depth of the disparity map. The result of our grayscale correlation algorithm gives not only the position but also the best model, which indicates the rotation and scale parameters simultaneously.
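The idea that one correlation search yields both the position and the best model can be sketched as below. The text does not state which correlation measure is used, so zero-mean normalized cross-correlation is an assumption here, as are the function names and the representation of templates as a dictionary keyed by a model identifier from the LUT.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def best_match(image, templates):
    """Slide every template over the image; return (score, model_key, row, col)."""
    best = (-2.0, None, -1, -1)
    for key, tpl in templates.items():
        th, tw = len(tpl), len(tpl[0])
        for r in range(len(image) - th + 1):
            for c in range(len(image[0]) - tw + 1):
                patch = [row[c:c + tw] for row in image[r:r + th]]
                score = ncc(patch, tpl)
                if score > best[0]:
                    best = (score, key, r, c)
    return best
```

The returned key identifies the winning LUT entry, and hence its rotation and scale parameters, while the row and column give the image position of the match. An exhaustive search like this is only practical for small images; it is meant to show the principle, not the real-time implementation.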
Fig. 3. Disparity map result without/with constructive light

Fig. 7. Examples of ellipsoid-like objects extracted from body ROI
We assume the passenger is always sitting on the seat, so that the lower part of the body's ROI is relatively stable and can be used as the reference for segmenting the ROI. The detailed steps are as follows:
Step 1. After the preprocessing steps described in the sections above, calculate the so-called horizontal median points on each row of the binary ROI image based on Russakoff's algorithm.
Step 2. Detect the upper center position C1 and lower center position C2 along the body center line, where C1 and C2 lie on the rows at 1/5 and 4/5 of the ROI height respectively, as shown in Figure 8(a).

Fig. 10. Falsely Detected Examples
False detections and missed detections generally occur within a very short period of time.

Table 1. Test results of different situations