Development of Pilot Assistance System with Stereo Vision for Robot Manipulation

The rescue robot "T-52 Enryu" has two arms for performing various kinds of work, and it can be operated either directly or remotely. For remote control in particular, a device that lets the pilot easily grasp the positions of the robot and of objects in the working environment is necessary to support the pilot and to improve working efficiency. We therefore developed a pilot assistance system that helps the pilot of Enryu by displaying, on a touch-panel monitor, the positional relation between the grippers of the arms and the work object. The system tracks points on the target objects specified by the pilot and simultaneously displays the distances to them. In this paper, we introduce the developed pilot assistance system, and the results of installing the system on the robot are described.


Parallel stereo camera system
Two kinds of camera systems and their applications were developed. One is a suite of fixed parallel cameras and a 3D motion sensor (Fig. 3(a)); the other is a suite of parallel cameras on a camera mount that allows their installation angles and positions to be adjusted (Fig. 3(b)). The former, called "A-system", was built to verify which visual applications effectively support the pilot; the latter, called "B-system", was developed to examine the installation position and an effective calibration method on the actual robot arm.
Fig. 3. (a) A-system; (b) B-system.
We selected inexpensive USB cameras (QV-400R, USB CCD cameras) and a motion sensor so that they can easily be replaced even if they break down during severe work in outdoor environments. This camera suite is mounted on a joint of the gripper as shown in Fig. 4(a) and (b), and it is connected to a laptop PC installed in the robot. The captured images are transmitted to a remote-control cockpit via wireless LAN to supply information about the surroundings of the right gripper together with the image processing results (Fig. 5).
Fig. 4. Installation of the system: (a) the system mounted on the right gripper of T-52; (b) the system mounted on the right gripper of T-53. www.intechopen.com

Camera model
The USB cameras are modeled by using the basic pinhole camera model (e.g. Tsai et al. 1987; Hartley et al. 2003), and the geometry is shown in Fig. 6, where [u_0, v_0]^T is the camera center (principal point). The numbers of pixels per unit distance in image coordinates are m_x and m_y in the x and y directions, f is the focal length, and α = f·m_x and β = f·m_y represent the focal length of the camera in terms of pixel dimensions in the x and y directions. Moreover, the parameter θ is the angle between the vertical and horizontal axes of the image plane, referred to as the skew parameter.
The projection matrix E contains 11 independent variables. In general, the solution p of Eqn. (7) is obtained as the eigenvector corresponding to the minimum eigenvalue of the matrix E^T E, which is derived by singular value decomposition. Here, the elements of p are normalized so that p_34 = 1.
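This SVD-based estimation of the projection matrix can be sketched numerically as follows. This is a generic direct linear transformation (DLT) sketch, not the authors' implementation; the function and variable names are ours.

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Stack two rows per 3D-2D correspondence into the design matrix E,
    then take the right singular vector of the smallest singular value,
    i.e. the minimum eigenvector of E^T E (needs >= 6 correspondences)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    E = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(E)
    p = Vt[-1]
    p = p / p[-1]          # normalize so that p_34 = 1 (assumes p_34 != 0)
    return p.reshape(3, 4)
```

With exact synthetic correspondences this recovers the true projection matrix up to the p_34 = 1 normalization.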

Measurement of 3D position
Next, let the projected points of X ∈ Σ_w on the left and the right camera image planes be x_left and x_right, respectively. Moreover, letting P_left and P_right be the corresponding projection matrices, the following equations are derived. When the above-mentioned expressions are brought together with respect to the variable X, the following expression is obtained.
The least-squares solution X of Eqn. (12) is obtained by using the pseudo-inverse matrix B^+ of B as follows.
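The pseudo-inverse solution above can be sketched as a standard linear triangulation (names are ours; a stand-in for the authors' formulation of Eqn. (12)).

```python
import numpy as np

def triangulate(P_left, P_right, uv_left, uv_right):
    """Linear triangulation: from u = (p1.X)/(p3.X), v = (p2.X)/(p3.X)
    collect rows (u*p3 - p1) and (v*p3 - p2) for both cameras into
    B X = b, then solve with the pseudo-inverse X = B^+ b."""
    rows, rhs = [], []
    for P, (u, v) in ((P_left, uv_left), (P_right, uv_right)):
        for coeff, row_idx in ((u, 0), (v, 1)):
            r = coeff * P[2] - P[row_idx]
            rows.append(r[:3])       # coefficients of X, Y, Z
            rhs.append(-r[3])        # constant term moved to the right side
    B = np.asarray(rows)
    b = np.asarray(rhs)
    return np.linalg.pinv(B) @ b     # least-squares 3D point
```

With noise-free projections of a known 3D point, the result matches that point exactly.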

Procedure of actual camera calibration
Distance measurement with sufficiently high accuracy is necessary over the range of motion of the arm. Namely, the important measurement distances are as follows: about 1500 [mm] for rough scene understanding; about 1000 [mm] for searching for a grasping point on the target object; about 500 [mm] for judging whether the target object can be held by the gripper. Therefore, we developed a calibration rig on which 30 target patterns are arranged as shown in Fig. 7. This calibration rig is set up at a fixed position relative to the cameras installed on the robot arm by using several tripods, and when the robot starts, it is used according to the following procedures if calibration is necessary. A comparatively high calibration accuracy can be achieved in the depth direction by using this calibration rig.

The camera calibration using the rig may be executed outdoors; therefore, the target patterns must be detected with steady accuracy in various outdoor environments. In this research, we employ the LoG (Laplacian of Gaussian) filter (Marr, 1982) to search for the positions of the target patterns, with stability and fast computation, in camera images that include noise, brightness changes, geometric transformations, etc. It is well known that the LoG filter is an isotropic band-pass filter capable of reducing low- and high-frequency noise and completely removing both constant and linearly varying intensity. The LoG filter is represented by

LoG(x, y) = -(1/(πσ^4)) [1 - (x^2 + y^2)/(2σ^2)] exp(-(x^2 + y^2)/(2σ^2)),

where σ is the standard deviation of the Gaussian.
Namely, this filter eliminates high-frequency noise and local brightness changes by the Gaussian, and eliminates low-frequency brightness changes by the Laplacian. Fig. 8 shows an example of applying this filter to an image of the calibration rig. The target patterns are detected by subpixel matching with a template image after this filtering is applied to each image, and the camera calibration is executed by the above-mentioned procedure. The installation position and angles of the cameras were decided by trial and error in consideration of the structure of the gripper of T-52, and the calibration accuracy obtained by this procedure is shown in Fig. 9. It can be understood from this figure that the calibration error was within 100 [mm] over the range of motion of the arm, and that this camera calibration procedure was able to decrease the error of target pattern extraction and to achieve a highly accurate calibration in the outdoor environment. Note that these errors include the setting error of the rig itself. To cancel these errors, we are now developing a method that extracts characteristic feature points of the gripper for automatic calibration. The camera parameters obtained by the calibrations above are used to calculate the distance to the object in the following camera systems.
Fig. 9. Calibration accuracy vs. measurement distance.
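The band-pass property described above, i.e. exact removal of constant and linearly varying intensity, can be checked with a small sketch (our own minimal discrete LoG, not the authors' implementation).

```python
import numpy as np

def log_kernel(size, sigma):
    """Discrete Laplacian-of-Gaussian kernel, forced to zero mean so the
    response to constant intensity is exactly zero; by symmetry the
    response to a linear intensity ramp also vanishes."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()              # enforce zero DC response

def filter2d(img, k):
    """Naive 'valid' 2-D filtering (correlation; the LoG is symmetric,
    so this equals convolution)."""
    n = k.shape[0]
    h, w = img.shape
    out = np.empty((h - n + 1, w - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + n, j:j + n] * k)
    return out
```

Applying this kernel to a constant image or a linear intensity ramp yields (numerically) zero everywhere, which is exactly the property the calibration relies on under outdoor brightness changes.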

System interfaces
Here, the two kinds of developed interfaces are described: one is a monitor system that efficiently provides the sensor information to the pilot, and the other is an interactive display system that tracks points set by the pilot and displays the distances to them. Their image processing algorithms differ: the former is designed to present overall information about the surroundings, whereas the latter is designed to provide the specific information desired by the pilot and to execute the processing at video rate. In the following, these systems are explained in turn.

Display of overall information on scene
The pilot understands the situation around the robot from the information of the various installed sensors and operates the robot remotely. Although it is desirable to present a great deal of information to the pilot, it is also desirable for the presentation to be intuitive and designed so as not to tire the pilot, for the sake of efficient operation. Therefore, we developed an intuitive interface system that indicates the posture of the arm relative to the ground and the distances in the surroundings of a target object. Fig. 10 shows an example of the monitor image presented to the pilot at the remote site by this system.

Gravity indicator and disparity image
A gravity indicator that shows the direction of the ground in the camera system is arranged at the upper part of the screen, and a horizontal indicator is arranged at the center of the screen. These are displayed based on information obtained from the motion sensor: the rotation angles relative to the world coordinate system are calculated by integrating the accelerations along the x-y-z axes of the camera coordinate system, and the indicators are redrawn at video rate, i.e. 30 [fps]. Moreover, a disparity image generated by a dynamic programming method (Birchfield et al. 1998) is displayed at the lower right of the screen, and the distance to the object is shown at the center. Since the computational complexity of this stereo-vision processing is comparatively high, these indicators are redrawn at a rate of 3-5 [fps]. The pilot can acquire various pieces of information from these displays without greatly shifting his glance on the monitor. Examples of the transition of the display according to the motion of the camera system are shown in Fig. 11. We see from Fig. 11 that the disparity images include noise caused by matching errors of the dynamic programming; this property of the algorithm is considered in the following.
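As an illustration of deriving orientation for such a gravity indicator, the static case can be sketched as follows. Note this is only a hypothetical stand-in: it estimates tilt from a single static accelerometer reading, whereas the system described above integrates the accelerations over time.

```python
import math

def gravity_angles(ax, ay, az):
    """Roll and pitch (radians) of the sensor frame relative to gravity,
    estimated from a static 3-axis accelerometer reading (ax, ay, az).
    Assumes the sensor is not otherwise accelerating."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch
```

With the z axis pointing straight down along gravity, both angles are zero; rotating the sensor 90 degrees about x gives a roll of π/2.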

Generation of disparity image in outdoor environments
Birchfield's method is an algorithm that generates the disparity image efficiently by using a rectified parallel stereo camera (i.e. the image planes of the two cameras are coplanar and their scanlines are parallel to the baseline). The adjustment of five parameters, namely the maximum disparity, occlusion penalty, match reward, reliability threshold, and reliability buffer factor, is required for the execution of this algorithm. It is necessary to tune these parameters by trial and error after the installation of the camera system, in consideration of the distance to the target object, the complexity of the scene, etc. Furthermore, synchronization of the white balance and the shutter timing, as well as images of the scene with stable brightness, are needed to execute this algorithm. Experimental results of this algorithm under several conditions are shown in Fig. 12. Although the disparity image could be generated efficiently when images of steady brightness were acquired in the indoor environment, much noise arose in the disparity images generated from images with "black-outs or white-outs" caused by shortage of dynamic range, or with a complex background. Namely, it was understood that although this system efficiently provides the pilot with various information, the environmental conditions under which it can be used may be limited.
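To make the rectified-pair assumption concrete, scanline matching can be sketched with simple SAD block matching. This is a deliberately simpler stand-in for Birchfield's dynamic-programming method (it has no occlusion handling or reliability parameters), shown only to illustrate how disparity arises from a rectified pair.

```python
import numpy as np

def disparity_row(left, right, row, block, max_disp):
    """Sum-of-absolute-differences block matching along one scanline of a
    rectified stereo pair: for each left-image pixel, find the horizontal
    shift d (0..max_disp) that best matches a block in the right image."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros(w, dtype=int)
    for x in range(half + max_disp, w - half):
        patch = left[row - half:row + half + 1, x - half:x + half + 1]
        costs = [np.abs(patch - right[row - half:row + half + 1,
                                      x - d - half:x - d + half + 1]).sum()
                 for d in range(max_disp + 1)]
        disp[x] = int(np.argmin(costs))
    return disp
```

On a synthetic pair where the left image is the right image shifted by a known number of pixels, the recovered disparity equals that shift; on real outdoor images, texture-poor or saturated regions produce exactly the kind of matching noise discussed above.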
Fig. 11. Examples of the transition of the display according to the motion of the camera system.

Target tracker and distance indicator
Next, we aimed to construct an interface that is executable at high speed and robust against changes in the environmental conditions. The outline of the operating flow of the developed system is as follows. First, the pilot touches the monitor to specify the target points for tracking, and small windows (e.g. 9 × 9 pixels) are generated around the set positions on the monitor. Next, the tracking and the distance measurement of the set points are executed at once, the results are reflected on the monitor in real time, and the distance is displayed on the window frame. When the pilot touches a tracking window again, the corresponding point is released. Moreover, the color of the window frame has the following meanings: green represents success of both the tracking and the measurement; blue represents that the tracked point lies within grasping distance of the gripper, i.e. nearer than 1 [m]; yellow represents failure of the stereo matching, meaning the displayed distance is untrustworthy. As mentioned above, this system can be used through simple touch-panel operations. The processing procedure of this system is described as follows.
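The color convention above reduces to a small decision rule; a hypothetical helper (names ours) might look like this:

```python
def window_color(match_ok, distance_mm):
    """Frame color for a tracking window, following the convention above:
    yellow  - stereo matching failed, distance untrustworthy
    blue    - point within grasping distance of the gripper (< 1 m)
    green   - tracking and measurement both succeeded"""
    if not match_ok:
        return "yellow"
    return "blue" if distance_mm < 1000 else "green"
```

Keeping the rule this simple is what lets the pilot read the grasping state at a glance without decoding numbers.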

Detection algorithm
Consideration of the following items is necessary for the tracking and the distance measurement of specified points in the outdoor environment: in particular, achromatic colors and complex scenes are typical at disaster sites. Therefore, we developed an algorithm that executes the tracking and the stereo matching simultaneously by using the Harris corner detector (Harris et al. 1988) and a fast feature tracking algorithm (Bouguet et al. 2000). Bouguet's algorithm is a pyramidal implementation of the Lucas-Kanade feature tracker (Lucas et al. 1981), and it generates the optical flow between two images for a given set of points with sub-pixel accuracy. The proposed image processing algorithm is executed according to the following procedures.
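The Harris detector cited above can be sketched in a few lines (our own minimal version with a box window instead of the usual Gaussian, and without non-maximum suppression):

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over a (2r+1)-square window at each pixel (clipped at borders)."""
    h, w = a.shape
    out = np.empty_like(a)
    for i in range(h):
        for j in range(w):
            out[i, j] = a[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].sum()
    return out

def harris_response(img, k=0.04, r=2):
    """Harris corner response R = det(M) - k*trace(M)^2, where M is the
    structure tensor of image gradients summed over a local window.
    R > 0 at corners, R < 0 on edges, R ~ 0 in flat regions."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = box_sum(Ix * Ix, r)
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    return (Sxx * Syy - Sxy**2) - k * (Sxx + Syy)**2
```

Because the response depends only on local gradient structure, it remains usable on the achromatic, low-contrast scenes mentioned above, where color-based detection would fail.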

Flow of target tracking and measurement
Let I_t and J_t be the images captured by the left and the right cameras at time t, and let I_t(u) (u ≡ [x, y]^T) be a pixel of I_t.
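Once a point is tracked in I_t and matched against J_t, the distance for a rectified parallel pair follows from the disparity by similar triangles, Z = f·B/d. A minimal sketch (our own helper, with focal length in pixels and baseline in millimeters):

```python
def depth_from_disparity(f_px, baseline_mm, d_px):
    """Depth of a point seen by a rectified parallel stereo pair:
    Z = f * B / d, where d = x_left - x_right is the disparity in pixels.
    A non-positive disparity means the stereo matching failed."""
    if d_px <= 0:
        raise ValueError("non-positive disparity: matching failed")
    return f_px * baseline_mm / d_px
```

For example, with f = 800 pixels and a 60 [mm] baseline, a disparity of 48 pixels corresponds to a depth of 1000 [mm], i.e. right at the 1 [m] grasping threshold used for the window color.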
Fig. 14. Execution results of the proposed algorithm for simultaneous execution of the tracking and the stereo matching. The two small windows at the center of the images show the tracking windows, and the windows at the bottom of the images show the parameters of the tracking points. The target points were not lost even though the camera system was shaken intensely for five seconds. Although the stereo matching failed in two frames (the color of the target window changed to yellow as shown in (d)), the distance measurement succeeded in almost all frames.

Experiment with robot
The results of the experiment conducted by installing the developed interface on the robot are shown in Fig. 15. These images were captured by the left camera of the B-system mounted on the joint of the gripper of T-52, and a rectangular lumber was set in front of the gripper. The tracking windows at the upper part of the image were specified by the pilot, and the number in each window shows the order in which the feature points were specified (the first and second points had been canceled). Moreover, the number at the upper part of each tracking window shows the measured distance, while the squares at the lower part of the images and the numbers inside them show the order of the feature points and their 3D positions in the robot arm coordinate system, respectively. The color of the window changes according to the distance and the state of measurement, as described in 4.2. It was found from the experiment that the interface was able to provide the pilot with the measured distances of the tracking points stably, and the operation of grasping the rectangular lumber was executed smoothly.
Fig. 15. Results of an experiment of an operation to grasp a rectangular lumber by using the B-system installed on the joint of the gripper of T-52. These images were captured by the left camera.

Conclusion
In this research, we developed two kinds of pilot assistance systems for T-52 and T-53 working in disaster districts. One is an interface that efficiently provides the pilot with information from several sensors. The other is an interface that tracks points set by the pilot and displays the distances to them; it conveys the distance through color changes that are intuitively easy to understand, and its processing speed is sufficiently fast. The pilot switches between the two proposed interfaces according to the situation: the former is suitable for searching for a target object with the camera installed on the robot arm, and the latter is suitable for operations that grasp a specific position on the target object with the arm. Basic experiments for the construction of each system were conducted, and their effectiveness was confirmed. From now on, further tests and examinations will be necessary to use these systems and interfaces efficiently at disaster scenes. Moreover, further interface development tailored to disaster scenes will be necessary to accomplish the many kinds of work required of the rescue robot. We are continuing research on these topics, and have furthermore started development of an automatic work system for the robots.