Bearing-only Simultaneous Localization and Mapping for Vision-Based Mobile Robots

To navigate successfully, a mobile robot must be able to estimate the spatial relationships of the objects of interest accurately. A SLAM (Simultaneous Localization and Mapping) system employs its sensor data to build incrementally a map of an unknown environment and to localize itself in the map simultaneously. Thanks to recent advances in computer vision and cheaper cameras, vision sensors have become popular to solve SLAM problems (Bailey, 2003; Costa et al., 2004; Davison et al., 2004; Goncavles et al., 2005; Jensfelt et al., 2006; Mouragnon et al., 2006). The proposed bearing-only SLAM system requires only a single camera which is simple and affordable for the navigation of domestic robots such as autonomous lawn-mowers and vacuum cleaners. Solutions to SLAM problems when the mobile robot is equipped with a sensor that provides both range and bearing measurements to landmarks are well developed (Leonard & Durrant-Whyte, 1991; Zunino & Christensen, 2001; Spero, 2005; Bailey & Durrant-Whyte, 2006). With a single camera, landmark bearings can be derived relatively easily from a grabbed image, however it is much more difficult to obtain accurate range estimates. Due to the low confidence in range estimates from vision data, it is desirable to solve SLAM problems with bearing only measurements. One of the fundamental tasks of a SLAM system is the estimation of the landmark positions in an unknown environment. This task is called Landmark Initialization. A typical bearingonly SLAM system requires multiple observations for landmark initialization through triangulation. With only one observation, a stereo vision can provide range measurements because its multiple cameras grab images from slightly different viewpoints. However the reliable vision range in a stereo vision is limited due to the distance between the two cameras. Several observations at different locations are required to provide a robust range estimate. Structure From Motion (SFM) is a process to construct the map of an environment with the video input from a moving camera. SFM allows a single camera to grab images at some vantage points for landmark initialization, such as a sufficient baseline and a straight movement. The requirement of SFM is well satisfied with a mobile robot, some recent works had utilized SFM to bearing-only SLAM (Goncavles et al., 2005; Jensfelt et al., 2006). Our method to bearing-only SLAM is inspired from the techniques used in both stereo vision and SFM.


Introduction
To navigate successfully, a mobile robot must be able to estimate the spatial relationships of the objects of interest accurately. A SLAM (Simultaneous Localization and Mapping) system employs its sensor data to build incrementally a map of an unknown environment and to localize itself in the map simultaneously. Thanks to recent advances in computer vision and cheaper cameras, vision sensors have become popular to solve SLAM problems (Bailey, 2003;Costa et al., 2004;Davison et al., 2004;Goncavles et al., 2005;Jensfelt et al., 2006;Mouragnon et al., 2006). The proposed bearing-only SLAM system requires only a single camera which is simple and affordable for the navigation of domestic robots such as autonomous lawn-mowers and vacuum cleaners. Solutions to SLAM problems when the mobile robot is equipped with a sensor that provides both range and bearing measurements to landmarks are well developed (Leonard & Durrant-Whyte, 1991;Zunino & Christensen, 2001;Spero, 2005;Bailey & Durrant-Whyte, 2006). With a single camera, landmark bearings can be derived relatively easily from a grabbed image, however it is much more difficult to obtain accurate range estimates. Due to the low confidence in range estimates from vision data, it is desirable to solve SLAM problems with bearing only measurements. One of the fundamental tasks of a SLAM system is the estimation of the landmark positions in an unknown environment. This task is called Landmark Initialization. A typical bearingonly SLAM system requires multiple observations for landmark initialization through triangulation. With only one observation, a stereo vision can provide range measurements because its multiple cameras grab images from slightly different viewpoints. However the reliable vision range in a stereo vision is limited due to the distance between the two cameras. Several observations at different locations are required to provide a robust range estimate. Structure From Motion (SFM) is a process to construct the map of an environment with the video input from a moving camera. SFM allows a single camera to grab images at some vantage points for landmark initialization, such as a sufficient baseline and a straight movement. The requirement of SFM is well satisfied with a mobile robot, some recent works had utilized SFM to bearing-only SLAM (Goncavles et al., 2005;Jensfelt et al., 2006). Our method to bearing-only SLAM is inspired from the techniques used in both stereo vision and SFM.
Existing approaches to bearing-only SLAM require the readings from an odometer to estimate the robot locations prior to landmark initialization. It can be argued that such approaches are not strictly bearing-only SLAM as they rely on odometric information. This chapter presents a new 2-dimensional bearing-only SLAM system that relies only on the bearing measurements from a single camera. Our proposed system does not require any other sensors like range sensors or wheel encoders. The trade-off is that it requires the robot to be able to move in a straight line for a short while to initialize the landmarks. Once the landmark positions are estimated, localization becomes easy. The induced map created by our method is only determined up to a scale factor as only bearing information is used (no range or odometry information). All the object coordinates in the map multiplied by a scale factor would not change the bearing values. The structure of this chapter is as follows. First, we introduce a direct localization method using only the bearings extracted from two panoramic views along a linear trajectory. We explain how to induce a Cartesian coordinate system with only two distinguishable landmarks. The method is then extended to landmark initialization with more landmarks in the environment. In general, vision sensors are noisy. Dealing with sensory noise is essential. Two different methods are presented to compute the spatial uncertainty of the objects: 1. A geometric method which computes the uncertainty region of each landmark as the intersection of two vision cones rooted at the observation points. 2. A probabilistic method which computes the PDFs (Probability Density Functions) of the landmark positions. Formulas are derived for computing the PDFs when an initial observation is made. The proposed SLAM system requires only a single camera, an interesting setup for domestic robots due to its low cost. It can be fitted to a wheeled robot as well as a legged robot.

Related work
The term SLAM was first introduced by Leonard and Durrant-Whyte (1991), it refers to Simultaneous Localization and Mapping. SLAM is one of the fundamental tasks in the navigation of an autonomous mobile robot. In robotic navigation, a map is a representation of the spatial relationship between the objects of interest in the environment. A map usually contains the positions of certain objects of interest, such as landmarks and obstacles. The process of a robot to determine its position in a map is called localization. GPS (Global Positioning System) is a popular localization system, in which the map is given for navigation. GPS is well suited for vehicles to navigate in a large scale outdoors environment, for instance, to navigate from city to city. For a domestic robot, however, a GPS is not accurate enough and does not work properly indoors and in some built-up areas. Further, the map of a particular environment may not be always available. A domestic robot cannot localize itself without a map. A SLAM system needs to build incrementally a map while it explores the environment and to determine its location in the map simultaneously. For localization the robot needs to know where the landmarks are, whereas to estimate the landmark positions the robot needs to know its own position with respect to the map. The problem of SLAM is considered as a "chicken-and-egg" problem (Siegwart & Nourbaksh, 2004). To predict the position of the robot, conventional SLAM systems rely on odometry. Unfortunately, the accumulation of odometric errors (due mainly to wheel slippage) makes the accuracy of the position estimates based only on odometry deteriorate rapidly. Updating the estimates with other sensory input is needed if the robot navigates for a long time. Solutions to SLAM can be found if both range and bearing measurements to landmarks are obtained through sensors (Leonard & Durrant-Whyte, 1991;Zunino & Christensen, 2001;Spero, 2005;Bailey & Durrant-Whyte, 2006). Such a sensor reading refers to a Full Observation. A full observation can be achieved by either a single sensor (i.e., a laser range finder) or a combination of two sensors (i.e., a sonar senor and a camera). Range and bearing measurements constitute a full state of the environment. The sensors which observe the full state of the environment (i.e., both range and bearing) are called range-bearing sensors. A full observation is sufficient to form an estimate, such as an uncertainty region, of a landmark position. A typical uncertainty region is a Gaussian distribution over the possible positions of a landmark. Updating an estimate can be achieved by fusing the estimates from the subsequent observations. However, a range-bearing sensor is too expensive for a domestic robot. Solving the SLAM problems with a cheaper sensor is desirable. A sensor reading with either range-only or bearing-only measurement to a landmark is called a Partial Observation. A partial observation is insufficient to completely determine a landmark position. A partial observation generates only a non-Gaussian distribution over an unbounded region for the landmark position (Bailey & Durrant-Whyte, 2006). Multiple observations from several vantage points are required to estimate the landmark position. A sensor reading obtained from a single camera constitutes only a partial observation because it provides bearing measurements but does not provide accurate range measurements. In general, a vision sensor is relatively cheaper than a range-bearing sensor. We wish to solve SLAM problems with bearing-only measurements. Next section reviews related work on vision based navigation for bearing-only SLAM.

Vision based navigation
Vision based navigation for a mobile robot had been investigated since early nineties of last century. Levitt and Lawton (1990) developed a Qualitative Navigator based on vision sensors. This navigator was able to explore the environment and to determine the relative positions of all the objects of interest. In general, an image contains very rich information for mapping the corresponding environment. A certain feature can be recognized through its specific color, shape and size. The frame rate up to 30 Hz from a video camera also enhances to SLAM, in particular to solve the data association problem. Landmark bearings can be derived from a panoramic image taken by an omni-directional vision sensor (for example, a single camera aiming at a catadioptric mirror). A panoramic image offers a 0 360 view of the environment. Because of the robustness of bearing estimates and the complete view of the environment, previous works have utilized omni-directional vision sensors in robotic navigation (Rizzi & Cassinis, 2001;Usher et al., 2003;Menegatti et al., 2004;Huang et al., 2005b). Stereo vision is another option used in robotic navigation. In addition to the bearing information, a stereo vision sensor can also measure the depth to a landmark (Murray & Jennings, 1998;Se et al., 2002;Sabe et al., 2004). A typical stereo vision sensor consists of two cameras, also known as a Binocular Vision. The disparity of the images taken from two slightly different viewpoints determines the landmark's range through triangulation. A Baseline in stereo vision is a line segment connecting the centres of two cameras' lens. Some stereo vision systems consist of three cameras, they are called Trinocular Visions. Common configuration of a trinocular vision is to put three cameras on a right angle polygonal line. A trinocular vision can achieve better results than a binocular vision because the second pair of cameras can resolve situations that are ambiguous to the first pair of cameras (Se et al., 2002;Wooden, 2006). The length of the baseline is essential in stereo vision, because it affects the precision of depth estimation and the exterior design of robotic hardware. QRIO (Sabe et al., 2004), a humanoid robot having a 5cm baseline in its binocular vision. The error of the depth measurement at a distance of 1.5m is over 80mm. The depth estimates of objects with the distances of 2m or more are omitted. LAGR (Wooden, 2006), an outdoor robot equipped with a trinocular vision of Point Grey Bumblebee. A maximum vision range of 6m was reported with a baseline of 12cm. To maximize the vision range of a stereo vision sensor, a longer baseline is required. Based on the mobility of a mobile robot, it is possible to extend the distance of any two viewpoints of a single camera (called a Monocular Vision). If a robot can move straight, the estimation of a landmark range from a monocular vision will be the same as the estimation in a stereo vision. Such approach was first proposed by Huang et al. (2005a). In this paper, a localization method with two observed bearings along a linear trajectory was presented. The method is particularly useful and accurate if the robot can move straight, i.e., the robot's yaw is toward to a specific landmark. In computer vision, Structure From Motion (SFM) refers to the process of building a 3D map of a static environment from the video input from a moving camera. This is very similar to stereo vision where a 3D map is built from two simultaneous images of the same landmark. In both cases, the same landmark is taken into multiple images and the disparities of images are used to compute the landmark location. In stereo vision, the images are taken at different viewpoints simultaneously. In SFM, due to the robot's motion, the images are taken at different viewpoints at different time steps. Visual odometry (Nister et al., 2004) employs SFM to estimate the motion of a stereo head or a single moving camera based on video data. The front end of this system is a feature tracker. Point features are matched between pairs of frames and linked into image trajectories at video rate. SFM presents significant advantages compared with a stereo vision due to the low cost of a monocular vision and the flexible baseline. However, SFM can build a map with respect to a static environment only because of the images are obtained at different time steps. Goncavles et al. (2005) presented a framework to bearing-only SLAM based on SFM from three observations. They utilized a wall corner as the landmark for guiding the robot to move straight. Three images were taken while the robot was moving toward the wall corner. Each image was taken when the robot had moved 20cm approximately. A similar work (Jensfelt et al., 2006) was using N images for landmark initialization, here N is a sufficient number to obtain an accurate estimation. To ensure a proper triangulation, the images were discarded if the robot had not moved more than 3cm (i.e., baseline under 3cm) or turned more than 1 degree (i.e., not a straight movement). Both of the approaches solve the bearing-only SLAM problem using a monocular vision. However, they require an odometer to determine robot's motion. Our method to bearing-only SLAM is similar to SFM with a monocular vision, but does not rely on odometric information.

Dealing with uncertainty
In general, vision sensors are noisy. Dealing with sensory noise is essential in robotic navigation. The uncertainty of an object location can be represented with a PDF (Probability Density Function). When a robot is initially placed in an unknown environment without any prior information, the PDF of the robot position is uniform over the whole environment. Once the robot starts to sense the environment, information gathered through the sensors is used to update the PDF. Smith and Cheeseman (1986) estimated the object locations by linking a series of observations through an approximate transformation. The transformation includes compounding and merging the uncertain spatial relationships from sensor readings. Stroupe et al. (2000) showed how to fuse a sequence of PDFs of 2-dimensional Gaussians estimated from noisy sensor readings. Robustness to sensory noise can be achieved with probabilistic methods such as Extended Kalman Filters (EKF) or Particle Filters (PF). The PF follows the Sampling Importance Resampling (SIR) algorithm, also known as the Monte Carlo Localization (MCL) algorithm in robotics (Fox et al., 1999). In PF, the number of particles is an important factor to the computing complexity. Montemerlo et al. (2003) proposed an efficient algorithm called FastSLAM based on PF with a minimized number of particles. Davison (2003) used a separate PF to estimate the distance from the observation point to the landmark with a single camera. The estimated distance is not correlated with other observations due to the limitation of the field of view. The follow-up work (Davison et al., 2004) improved the SLAM results by applying a wide-angle camera. In (Menegatti et al., 2004), omnidirectional images were employed to the image-based localization combining with MCL technique. Sim et al. (2005) solved SLAM problem with PF using a stereo vision sensor. EKF is computationally efficient for positional tracking. However, an initial estimate of Gaussian distribution over the landmark position is required. This estimate can be refined efficiently with the estimates from subsequent observations. It is important to have an initial estimate relatively close to the real solution. Many works have focused on the problem of landmark initialization. In (Bailey, 2003), previous poses of the robot were stacked in the memory to perform the landmark initialization. Once the landmarks were initialized, the batch of observations was used to refine the whole map. Costa et al. (2004) presented an iterative solution to the landmark initialization of bearing-only SLAM problem with unknown data association (i.e., all landmarks are visually identical). The authors estimated landmark positions through Gaussian PDFs that were refined as new observations arrived. Bundle adjustment is a process which adjusts iteratively the robot's pose and the landmark positions in order to obtain the optimal least squares solution. Combining EKF with bundle adjustment ensures a robust estimate. Such an optimization is usually calculated off line due to expensive in computation. In (Mouragnon et al., 2006), landmark initialization was carried out with a bundle adjustment in an incremental way, in the order of video frames. An incremental method can improve the computing efficiency compared with the usual hierarchical method. Landmark initialization based on memorizing previous measurements or iterative methods cause time delay for estimation. These methods belong to the delayed methods of landmark initialization . Some immediate initialization methods to bearing-only SLAM called undelayed methods of landmark initialization were introduced; Kwok and Dissanayake (2004) presented a multiple hypotheses approach to solve the problem in a computationally efficient manner. Each landmark was initialized in the form of multiple hypotheses distributed along the direction of the bearing measurement. The validity of each hypothesis was then evaluated based on the Sequential Probability Ratio Test (SPRT).  gave a new insight to the problem and presented a method by initializing the whole vision cone (see Figure 4(a)) that characterizes the direction of the landmark. This cone is covered by a sequence of ellipses that represent the likelihood of the landmark. Undelayed method of landmark initialization is efficient to identify the directions of all landmarks when the first bearing measurements are made. It does not state explicitly the locations of the landmarks. Further observations are still required to initialize the landmark positions. Lemaire et al. (2005) applied an undelayed initialization method to a 3D bearingonly SLAM. The landmark initialization is similar to the method proposed in (Kwok & Dissanayake, 2004) by maintaining a mixture of Gaussians. The updating process was done by comparing the likelihoods of subsequent observations. If the likelihood falls below a certain threshold then the Gaussian is removed. Once only a single Gaussian is left in the cone, the landmark is initialized and added into the map for EKF-SLAM.

Our approaches
This chapter presents two methods to compute the spatial uncertainties of the objects based solely on bearing measurements only: namely a geometric method and a probabilistic method. These methods are similar to the approach of the undelayed method of landmark initialization. Since the estimate based on a partial observation (known bearing but unknown range) is insufficient to completely determine a landmark position, a second observation from a vantage position is required to generate an explicit estimate. In the geometric method, we manipulate directly each vision cone as a polyhedron instead of a sequence of Gaussians. Each cone contains a landmark position. After a second observation in a linear trajectory, the uncertainty region (the set of possible landmark positions that are consistent with the first and second observations) of the landmark becomes the intersection of two cones rooted at the two observation points, see Figure 4(b). Depending on the difference of bearings, the intersection is either a quadrangle (four-side polygon) or an unbounded polyhedron. For each estimation, we change the bases from the local frame (the robot-based frame, denoted by R F ) into the global frame (the landmarkbased frame, denoted by L F ). The uncertainties of all objects are re-computed with respect to L F by the change of bases. A global map with the estimated positions of all objects and their associated uncertainties can be gradually built while the robot explores its environment. In the probabilistic method, a landmark position is represented by a PDF ) , ( α r p in a polar coordinate where r and α are independent. Formulas are derived for computing the PDF of landmark position when an initial observation is made. The updating of the PDF with the subsequent observations can be done by direct computing from the formulas. We select a number of sample points in the uncertainty region (computed from the geometric method) by the rejection method (Leydold, 1998). These sample points are used to represent the PDF in R F . By changing the bases from R F to L F , the PDFs of all object positions in the global frame L F can also be computed. Without range measurement, we assume the probability density of an object location is constant along the range. It is a more realistic assumption than the one made by other existing methods (Davison, 2003;Davison et al., 2004;Kwok & Dissanayake, 2004;Sola et al., 2005) which assume that the probability density of the object location is a Gaussian or a mixture of Gaussians. Indeed, if only bearing information is given, the probability that the landmark is between 4 and 5 metres should be the same as the probability that the landmark is between 5 and 6 metres. The representation with a Gaussian or a mixture of Gaussians fails in this constraint. With our PDF representation, the probability that the landmark is between 4 and 5 metres will be the same as the probability that the landmark is between 5 and 6 metres.

A direct localization method using only the bearings extracted from two panoramic views along a linear trajectory
In this section, we describe a direct method (in the sense it does not use an iterative search) based solely on vision for localizing a mobile robot in a 2-dimensional space. This method relies only on the bearings derived from two images taken along a linear trajectory. We only assume that the robot can visually identify landmarks and measure their bearings. This method does not require any other sensors (like range sensors or wheel encoders) or the prior knowledge of relative distances among the objects. This method can be adopted in a localization system which utilizes only a single camera as the sensor for navigation. Given its low cost, such a system is well suited for domestic robots such as autonomous lawnmowers and vacuum cleaners.

Method description
In order to describe our method we need to introduce some notation. The robot position at is not a single point but a whole line.
Experiments in simulation and on a real robot (see Section 3.2) indicate that the accuracy of this localization system is sensitive to the relative difference of bearings. Let e be the estimated error of landmark position. Figure 2 shows that ) sin( * ) sin( * We have

Empirical Evaluation
Our localization method was evaluated on a Khepera robot equipped with a color camera (176 x 255 resolution). The average error between the measured and actual bearings is about ± 2 degrees. In this experiment, the second landmark was placed 20 centimetres apart from the first landmark. Four different starting points were used, and 20 trials at each point were conducted. The moving distance in all cases was 30 centimetres. The moving directions were westwards parallel to the landmarks. The results are shown in Figure 3. In this figure, landmarks are denoted by stars, trajectories are shown as arrows, and the estimated positions by our localization method are displayed as scatter points. The localization error, average distances between the estimated positions and the actual positions, at positions a, b, c, and d (in Figure 3) are respectively 0.6, 1.2, 2.2, and 2.8 centimetres. The errors are small compared to the diameter of the robot (6 centimetres). Other experimental results have confirmed that the error is inversely proportional to the relative difference in bearings. When more than two landmarks are present, the localization accuracy can be further improved by fusing the estimated positions, giving more importance to the estimation returned by the pair of landmarks that has a larger relative difference in landmark bearings. In summary, we have introduced a novel effective approach for robot self-localization using only the bearings of two landmarks. This technique can be viewed as a form of stereo-vision. The method we propose is well suited for real-time system as it requires very little computation.
When more than two landmarks exist in the environment, the robot can determine the relative positions of the landmarks provided some weak visibility constraints are satisfied. This method enables a mobile robot to localize itself with only two observed bearings of two landmarks. Such a localization system will be invaluable to an indoor robot as well. As the bearings of the sides of a door frame can play the roles of the landmarks 1 L and 2 L and tell the robot exactly where it stands relative to the door. In next section, we employ this method to solve the landmark initialization problem in bearing-only SLAM.

Sensitivity Analysis to Landmark Initialization of Bearing-Only SLAM -A geometric approach
In this section, we propose a geometric method to solve the landmark initialization problem in bearing-only SLAM. The assumptions and the localization method are the same as in Section 3, with the exception that vision error is taken into consideration. The estimate of a landmark position becomes an uncertainty region instead of a single point. In particular, we show how the uncertainties of the measurements are affected by a change of frames. That is, we determine what can an observer attached to a landmark-based frame L F deduced from the information transmitted by an observer attached to the robot-based frame R F .

Method description
The notations in this section are the same as in Section 3.1. The uncertainty region of j L is  Figure 4(b) shows that a typical intersection is a 4-sided polygon. If the cones are almost parallel, their intersection can be an unbounded polyhedron. The spatial relationships in Figure 4(b) are expressed in R F . Since the robot is moving over time, the base of R F changes too. Therefore, it is necessary to change coordinate systems to express all positions in the global frame L F . Figure 5 illustrates the difficulty of expressing Bearing-only Simultaneous Localization and Mapping for Vision-Based Mobile Robots 345 the robot centred information in the global frame L F . The uncertainty on the landmarks prevents us from applying directly a change of bases. In next section, we will show how to solve this problem.  The uncertainty regions of 1 O and 2 O with respect to L F can be obtained by considering all possible hypotheses for the location of 2 L consistent with the observations. That is, we consider the set of possible coordinate vectors ) A of uncertainty regions with respect to L F (see Figure   6(b)).

Bearing-only Simultaneous Localization and Mapping for Vision-Based Mobile Robots 347
In the general case, when uncertainty exists for both 1 L and 2 L , to transfer the information from R F to L F , we consider simultaneously all the possible locations of 1 L and 2 L consistent with the observations. We hypothesize, The above constraints completely characterize 2 1 ,h h τ . For any point x , the coordinates transfer between the two frames is done with Equation (4).
In other words, the uncertainty region The computation in this example, we take the four vertices (the extreme points) k L 1 and k L 2 is not a polyhedron, in practice it can be approximated by a polyhedron. We have tested the proposed method both in simulation and on a real robot. These results are presented in next section.

Simulation
We tested the proposed method in simulation in an environment with four landmarks (at unknown positions to the localization system). The robot moves in a polygonal line around the centre with some randomness. Since we focus on landmark initialization, Figure 8 shows only the estimated positions of the landmarks.

Evaluation on a Real Robot
Our method was evaluated using a Khepera robot. The Khepera robot has a 6 centimetre diameter and is equipped with a color camera (176 x 255 resolution). A kheperaSot robot soccer playing field, 105 x 68 square centimetres, was used as the experimental arena (see Figure 15). There were four artificial landmarks in the playing field. Only one landmark was distinct from the others. The second landmark was placed 20 centimetres apart from the first landmark.
During the experiments, the robot moved in a polygonal line. Two panoramic images were taken in each straight motion. Landmark bearings were extracted from the panoramic images using a color thresholding technique. Bearings from each pair of observations were used to estimate the landmark positions. The vision error ε is limited to ±2 degrees.  We carried out another experiment to study the sensitivity of vision error ε . The top chart of Figure 10 shows the relationship between ε and the uncertainties of 3 L . The amount of the vision error ε was varied from 2 to 7 degrees. The uncertainties are proportional to ε in a linear manner. The bottom chart of Figure 10 shows that the estimated error might not decrease monotonically. This is because we assign the centroid of the uncertainty region as the estimated landmark position. In this section, we introduced a method for analyzing how uncertainty propagates when information is transferred from one observer attached to a robot-based frame to an observer attached to a landmark-based frame. The accuracy of this method was demonstrated both in simulation and on a real robot. In next section we will employ a probabilistic method to compute the uncertainties of object positions.

Sensitivity Analysis to Landmark Initialization of Bearing-Only SLAM -A probabilistic approach
In this Section, we describe a probabilistic method to solve the landmark initialization problem in bearing-only SLAM. The assumptions in this method are the same as Section 3. We characterize ) , when r and α are independent. Let β denote the measured landmark bearing. Assume that the error range for the bearing is ε ± . The landmark position is contained in the vision cone which is formed by two rays rooted at the observation point with respect to two bearings ε β − and ε β + (see Figure 11).
In Equation (7), max R and min R are the bounds of the vision range interval. We define ) (R F as the probability of the landmark being in the area can be represented as:

352
We define ) , ( ∆ Ψ R as the probability of the landmark being in the dotted area in Figure 11.
If the range r and the angle α are independent, then . From Equation (9), we derive Because of the independence of α and r , can be factored as Without loss of generality, we impose that = 1 ) ( . After factoring, Equation (8) becomes . Because of the property of the integration, we have From Equations (10) and (12), we deduce that . By making ∆ goes to zero, we obtain The equality can be re-written as After integrating both sides, we obtain From Equations (7) and (11), ξ can be calculated and thus min max can be re-written as If we use a Gaussian function for ) (α g with mean β and standard deviation σ , the PDF can be re-written as Equation (14). Figure 12 shows the PDF of ) , . Figure 12. The PDF of the landmark position following Equation (14)

Utilization of the PDF for bearing-only SLAM
We illustrate the application of the PDF for our bearing-only SLAM system. Section 5.2-A describes how the PDF can be updated with a second observation. In Section 5.2-B, we present experimental results on a real robot.

A. Updating the PDF with a second observation
When a second observation is made after a linear motion, the landmark position falls in the uncertainty region which is the intersection of two vision cones rooted at the first and the second observation points 1 O and 2 O . We denote with 1 p and 2 p the PDFs of the landmark positions computed from Equation (14)  We want to approximate p with a Gaussian distribution q . To compute the parameters of q , we generate a set S according to the PDF p by the Rejection Method (Leydold, 1998). We determine the maximum probability density max p of p by computing 2 1 p p at the intersection of two bearings. The sampling process selects uniformly a sample point s and a random number , s is rejected, otherwise s is accepted and added to S . The sampling process is repeated until enough points are accepted. Figure 13 shows the generated samples in the uncertainty regions of four landmarks.
The mean x and the covariance matrix C of q are obtained by computing the mean and the covariance matrix of S as previously done by Smith & Cheeseman (1986) and Stroupe et al. (2000). In Figure 13, the contour plots present the PDFs of landmark positions. The estimated PDFs in Figure 13 are expressed in the robot-based frame R F . Since the robot is moving over time, its frame changes too. Therefore, it is necessary to change the coordinate systems to express all the estimations in the global frame L F . We use the method introduced in Section 4 to transfer the samples in S from R F to L F . After the change of frames, the uncertainties of 1 L and 2 L are transferred to other objects. The samples of other objects are taken to approximate the PDFs of the object positions in L F . Figure 14 shows the distribution of the samples after the change of frames. The contour plots present the PDFs of the object positions in the global frame L F associated to the points ) , ( 2 1 L L .

B. Experimental Results
Our method was evaluated using a Khepera robot equipped with a colour camera (176 x 255 resolution). The Khepera robot has a 6 centimetres diameter. A KheperaSot robot soccer playing field, 105 x 68 square centimetres, was used for the experiment arena (see Figure 15). There were four artificial landmarks in the playing field. The first and second landmarks were placed at the posts of a goal, 30 centimetres apart from each other. The objective of the experiment is to evaluate the accuracy of the method by estimating the positions of the third and the fourth landmarks. At each iteration, the robot was randomly placed in the field. The robot took a panoramic image and then moved in a straight line and captured a second panoramic image. Landmark bearings were extracted from the panoramic images using colour thresholding. A total of 40 random movements were performed in this experiment. min R was set to 3 centimetres, max R was set to 120 centimetres, and the vision error ε was ± 3 degrees. Figure   16(b) shows the errors of the estimated landmark positions. For 3 L , the estimated error was reduced from approximately 9 centimetres at the first iteration to less than 1 centimetre at the last iteration. For 4 L , the error was reduced from 14 centimetres to 2.77 centimetres. The experiment shows that the estimated error of landmark position is sensitive to the relative distance with respect to 1 L and 2 L . We made another experiment to test the sensitivity of the errors of the landmark positions with respect to the different directions of the robot's moving trajectories. We let the robot move in four different directions with respect to three landmarks. In Figure 17(a), stars denote the landmark positions and arrows denote the moving trajectories. The robot repeated 10 iterations for each trajectory. The errors on 3 L in four trajectories after the tenth iteration were 2.12, 1.17, 1.51, and 13.99 centimetres respectively. The error of the fourth trajectory is large because the robot moves along a line that is close to 3 L . Therefore, the vision cones at the first and the second observations are nearly identical.
The estimation of the landmark position is more accurate when the intersection of two vision cones is small. This is the case of the second trajectory where the intersection is the smallest among all trajectories. L for the first and the third trajectories are the same, the intersecting areas of 1 L and 2 L for the first trajectory are much bigger than the areas from the third trajectory. This is the reason why the estimated error from the third trajectory is smaller than the one for the first trajectory.

Conclusion
In this chapter, we proposed a vision-based approach to bearing-only SLAM in a 2dimensional space. We assumed the environment contained several visually distinguishable landmarks. This approach is inspired from techniques used in stereo vision and Structure From Motion. Our landmark initialization method relies solely on the bearing measurements from a single camera. This method does not require information from an odometer or a range sensor. All the object positions can be estimated in a landmark-based frame. The trade-off is that this method requires the robot to be able to move in a straight line for a short while to initialize the landmarks. The proposed method is particularly accurate and useful when the robot can guide itself in a straight line by visually locking on static objects. Since the method does not rely on odometry and range information, the induced map is up to a scale factor only. In our method, the distance || || 2 1 L L − of two landmarks is taken as the measurement unit of the map. The selection of 1 L and 2 L is critical for the accuracy of the map. In Section 3.1, the mathematical derivation shows that the estimated error of a landmark position is proportional to the range of the landmark and the inverse of the relative change in landmark bearings. Choosing 1 L and 2 L with larger change in bearings produces a more accurate mapping of the environment. In the sensitivity analysis, we showed how the uncertainties of the objects' positions are affected by a change of frames. We determine how an observer attached to a landmarkbased frame L F can deduce the uncertainties in L F from the uncertainties transmitted by an observer attached to the robot-based frame R F . Each estimate of landmark uncertainties requires a pair of the observations in a straight movement. The simulation in Section 4.3 shows how the uncertainties of landmark positions are refined when the robot moves in a polygonal line. With dead reckoning, the error of the estimated robot's location increases with time because of cumulated odometric errors. In our method, we set O and 2 O by construction. In practice, the robot's movement may not be perfectly straight. However, the non-straight nature of the trajectory can be compensated by increasing the size of the confidence interval of the bearing. The induced map created by our method can be refined with EKF or PF algorithms. With EKF, the uncertainty region computed from the geometric method can be translated into a Gaussian PDF. With PF, the weights of the samples can be computed with the formulas derived in Section 5.1. Since the samples are drawn from the uncertainty region, the number of samples is minimized. The accuracy of our method was evaluated with simulations and experiments on a real robot. Experimental results demonstrate the usefulness of this approach for a bearing-only SLAM system. We are currently working on the unknown data association when all landmarks are visually identical. In future work, we will deal with the problems of object occlusion and non-planar environments. That is, the system will be extended from a 2dimensional to a 3-dimensional space.