An efficient data association approach to simultaneous localization and map building

We present an efficient integer programming (IP) based data association approach to simultaneous localization and mapping (SLAM). In this approach, the feature based SLAM data association problem is formulated as a 0-1 IP problem. The IP problem is approached by first solving a relaxed linear programming (LP) problem. Based on the optimal LP solution, a suboptimal solution to the IP problem is then obtained by applying an iterative heuristic greedy rounding (IHGR) procedure. Unlike the traditional nearest-neighbor (NN) algorithm, the proposed algorithm deals with a global matching between existing features and measurements of each scan and is more robust for an environment of high density features which is usually the case in outdoor environments. We provide a simulation study where the NN algorithm fails whereas our proposed algorithm performs satisfactorily. Experimental results also demonstrate the effectiveness and efficiency of our approach.


Introduction
A feature based approach to simultaneous localization and map building (SLAM) is to use the information obtained by sensors mounted on a vehicle to build and update a map of the environment and compute the vehicle location in that map. One of the critical problems in obtaining a robust SLAM solution is data association, i.e. relating sensor measurements to features in the map that has been built thus far (Guivant, 2000). SLAM relies on correct correspondence between data obtained from the robot sensors and the data currently stored in the map. There have been numerous approaches to data association. In stochastic mapping, the simplest method is the NN algorithm which is a classical technique in tracking problems (Bar-shalom & Fortmann, 1988). The great advantage of NN is its o(mn) computational complexity in addition to its conceptual simplicity (Here m is the number of sensor measurements and n is the number of existing features in the map). It performs satisfactorily when clutter density is low and sensor accuracy is high. However, during the process of SLAM, especially in complex outdoor environments, clutter level is usually high and the innovations in matching different observations obtained from the same vehicle position are correlated. In this situation, the NN algorithm may accept a wrong matching, which leads to divergence in state estimation. In order to improve the robustness of data association, Neira and Tardos (Neira & Tardos, 2001) presented an approach using a joint compatibility test based on the branch and bound search with a high computational cost. Juan Nieto et al. (Nieto et al., 2003) give a fast SLAM algorithm for data association by applying the multiple hypotheses tracking method in a variety of outdoor environments. The experimental complexity estimates show that if the number of features in one scan is large, these algorithms will not be fast enough for real time implementation. In other approaches, Bailey et al. consider relative distances and angles between points and lines in two laser scans and use graph theory to find the largest number of compatible pairings between the measurements and existing features (Bailey et al., 2000). The work of Lim and Leonard (Leonard & Lim, 2000) applies a hypotheses test to implement data association of the relocation in SLAM using geometric constraints. Castellanos and Tardos (Castellanos et al., 1999) use binary constraints to localize the robot with an a priori map using an interpretation tree. In these methods, geometric constraints among features are used to obtain hypotheses with pairwise compatible parings. However, pairwise compatibility doesn't guarantee joint compatibility, and additional validations are required. Data association based on a 0-1 integer programming (IP) problem for multi-sensor multi-target tracking was firstly proposed in (Morefield, 1977) and later improved in (Poore & Robertson, 1995;Poore & Robertson, 1997;Poore & Robertson, 1994). In these articles, the data association and tracking problem was formulated as a multi-dimensional assignment problem. By extending the multi-sensor and multi-target tracking problem, the data association in SLAM is formulated as a 0-1 integer programming problem in (Zhang et al., 2004b;Perera et al., 2003;Perera et al., 2004). In (Perera et al., 2003;Perera et al., 2004), the data association in SLAM is first formulated as a three dimensional assignment problem where two frames of scan data are used to establish the correspondence between the measurements and the features stored in SLAM. This algorithm can track features very reliably especially in high clutter density environments. However, three dimensional assignment problem is NP-hard (Hochbaum, 1997) and only heuristic optimization algorithms are available for an approximation solution because of the computational complexity. In (Zhang et al., 2004b), a parallel development of data association in SLAM is formulated as a two dimensional assignment problem. The chapter reports the detail of this work. In this chapter we present an efficient integer programming (IP) based data association approach to SLAM. In this approach, the feature based SLAM data association problem is formulated as a 0-1 IP problem that considers only 1 frame of scan data. Therefore, it is formulated as a two dimensional assignment problem for which many optimization algorithms such as those in (Poor & Robertson, 1994;Poor & Robertson, 1997;Miller & Franz, 1993;Miller & Franz, 1996;Storms & Spieksma, 2003) can be applied. The IP problem is approached by first solving a relaxed linear programming (LP) problem. In order to reduce the computational burden, a validation gate is applied to reduce the size of the solution space. An iterative heuristic greedy rounding (IHGR) process based on linear programming techniques (Miller & Franz, 1993;Miller & Franz, 1996;Storms & Spieksma, 2003) is then proposed to obtain a suboptimal solution to the integer programming problem. The algorithm has moderate computational requirements. Detailed simulation and experimental results show that the proposed method gives a much higher success rate of data association for environments of high density features than the NN algorithm while the cost of computation is moderately higher than the latter. Further, experiments in a real outdoor environments show that the NN algorithm leads to a diverged vehicle pose estimate whereas the proposed algorithm performs satisfactorily. As compared to other existing methods such as the JCBB algorithm, our approach has a lower computational complexity and provides a good trade-off between accuracy and computational cost. The chapter is organized as follows: Section 2 is devoted to an IP formulation for data association in SLAM. Section 3 presents an iterative heuristic greedy rounding algorithm for the IP problem. Section 4 shows some simulation and experimental results. Some conclusions are drawn in Section 5.

Problem Formulation
In this section we formulate the data association of SLAM as a 0-1 integer programming problem similar to (Perera et al., 2003;Perera et al., 2004). A mathematical framework of SLAM which is based on the extended Kalman filter (EKF) will be applied. Data association in SLAM is the decision process of associating measurements (observations) with existing features in the stochastic map. It should be noted that the term "measurements" (observations) in this chapter refers to the observed features after feature extraction rather than the raw sensor measurements. Generally, the number of the measurements obtained in each scan is not equal to the number of features whose positions are estimated by the EKF. Each measurement may either (1) belong to a previously known geometric feature or (2) be a new geometric feature or (3) be a spurious measurement (also called a false alarm). On the other hand, there also exist features that do not have associated measurements in the current scan. A dummy element is applied to denote the case of a false alarm or a new feature or a feature that does not have an associated measurement here. Assume that there are M measurements from the latest scan which are to be assigned to N existing features in the map built based on the previous scans. It is also assumed that the measurements are independent. Typically, N M ≠ . We define the binary assignment variable implies that the measurement m is not assigned to any of the existing N features, but instead, assigned to a dummy feature-false alarm or newly initialized feature. In the data association process, we make the reasonable assumption that one measurement originates from at most one feature, and one feature can produce at most one measurement. Therefore, the following constraints can be imposed to the association variables: (2)

(3)
Our goal is to match the sensor's observations with the features by providing estimates of the features' positions relative to the vehicle pose at the time of the current scan. In order to formulate the 2-D assignment problem, a generalized likelihood ratio which involves feature state estimates for the candidate associations is used to assi5ogn a cost to each association. Similarly to the multi-target tracking problem, we maximize a likelihood function LH as follows: where m z is the m-th measurement of the scan, ˆn z is the predicted relative position of the n-th feature by the EKF, S is the covariance matrix ofm n zz − , and nm E is the set of all possible assignment pairs. The likelihood ratio (,) mn zf Λ is in fact the probability that the mth measurement matches the n-th feature in the current sensor scan. In order to constitute a 2Dassignment optimization problem, instead of maximizing the product of matching probabilities, we can minimize the negative log-likelihood ratio. To this end, we define: Then, an equivalent cost function for equation (4) can be written as follows: wher. Thus, data association in SLAM can be formulated as the following 0-1 integer programming problem: where and (10) In this algorithm, two points should be noted: • If a measurement does not fall into the 3σ region of any of the existing N features (see the gating process in Subsection 3.1), we assume that it is a new feature and add it to the SLAM map directly. It will no longer be considered in the above 2D-assignment.

•
If an existing feature does not have any measurement that falls into its 3σ region, we consider that this feature is not observed in the current scan. No further matching will be carried out for this feature

The IHGR Based Data Association Algorithm
In this section, we apply an LP based algorithm to solve the data association problem formulated in the last section. The method is a combined IHGR and LP algorithm. In order to reduce the computational burden, a validation gate is applied first to reduce the above global association to several local associations.

Gating
In order to reduce the solution space, gating is first applied. Only measurements that are close enough to the predicted state of an existing feature are considered possible candidates of association with the feature. The criterion of gating is given by:

Relaxation and Rounding Technique
Many combinatorial optimization problems can be attacked with approximation algorithms that yield a feasible solution in polynomial time with cost close enough to the optimal one (Hochbaum, 1997). Such approximation algorithms can be loosely categorized as combinatorial approximation algorithms and LP based approximation algorithms. Here, we are interested in the latter category. In order to solve the IP problem, we often firstly relax it into a LP problem (Cormen et al., 2001). In general, the relaxation refers to the action of relaxing the integer requirement of a linear IP to turn it into an LP. However, the optimal solution to the LP problem in general does not coincide with the solution to the initial IP problem. One of the basic techniques which is widely exploited to derive a LP based approximation algorithm is LP-based rounding. It refers to how to construct a feasible solution for the IP from the LP (Cormen et al., 2001;Parker & Rardin 1988). Thus, the LPbased approximation algorithm first relaxes the IP problem to a LP problem, solves the LP problem and then converts the fractional optimal solution of LP to an integer solution. The heuristic algorithm applied here is based on the relaxation and rounding algorithms as described in (Miller & Franz, 1993;Miller & Franz, 1996;Storms & Spieksma, 2003). These algorithms are used for other applications. It is noted that the two dimensional assignment problem can also be solved by other optimization techniques.

IHGR Procedure
By changing the integer constraint {0,1} nm x ∈ to 01 nm x ≤≤ , the IP problem is relaxed to a LP one. The LP problem can be solved by basic LP algorithms, such as the Simplex algorithm (Vajda, 1981). If the optimal solution op x of the LP-relaxation is fully integer-valued (in this case all decision variables will have the value of either 0 or 1) then the solution op x is optimal for the 0-1 IP problem in Equation (7) (Parker & Rardin, 1988). Otherwise, we apply the IHGR procedure (see, e.g. (Miller & Franz, 1996)). Observe that the larger the decision variable nm x , the higher the probability that the m -th measurement associates with the n -th feature.
Hence, the algorithm starts by setting the maximum decision variable (with a value close to 1) to 1 and all other entries in the same row and column to zero to meet the constraints (8) and (9). Then, solve the LP problem for the remaining assignment matrix and repeat the IHGR procedure to decide the next pairing of measurements and features. The process is continued until all measurements have been assigned. In this manner, a feasible (but not necessarily optimal) solution for the original IP problem is constructed.
In the IHGR procedure, when nm x is set to 1, all variables in the column and row associated with the specific set in nm E must be set to 0. Once a variable is forced to a certain value, it is not allowed to change any more. To achieve this, all rounded variables and all implicated variables are discarded from the IHGR procedure. In this way, the IHGR will never set the value of a variable twice. This deletion of variables also applies to the initial LP solution, i.e. all variables with value 1 and all zero-valued variables implicated by them, are removed. The IHGR algorithm repeats the actions of selection, rounding and deletion until there are no variables left. The outcome will then be a feasible solution to (7). The algorithm described above can be summarized in the following 4 steps: Step 1: Relax the IP problem to a LP problem and solve the LP problem. If the solution is fully integer valued, then stop. Otherwise, go to Step 2. • Step 2: Set the maximum decision variable to 1 and other entries in the same row and column to zero.

•
Step 3: Delete the elements that have been decided, go back to step 1 for the rest of the assignment matrix.

•
Step 4: Repeat the above relaxation, selection, rounding, and deletion steps until all the elements are assigned. Observe that the IHGR can be implemented efficiently, but it is clear that there is no guarantee that this heuristic procedure yields the optimal solution to the IP problem. However, the simulations and experiments to be discussed in the following section show that the IHGR does result in acceptable feature-measurement assignments of which the achieved cost is close to the optimal cost.

Algorithm Complexity
Due to the application of the gating process that is affected by random factors, we cannot give an exact description of the complexity. However, we know that in any fixed dimension, LP can be solved in polynomial linear time (linear in the input size) (Karmarkar, 1984). For our case, the input size is MN × . Thus, we can roughly know the worst-case complexity of the proposed algorithm is Neira and Tardos (Neira & Tardos, 2001) presented a data association approach-JCBB. JCBB performs incremental construction and search of an interpretation tree of joint association hypotheses. The gating determines acceptable hypotheses and performs branch and bound pruning of the search space. The discussion in (Neira & Tardos, 2001) does not provide any theoretical bound, but gives an empirical complexity estimate of (1.53 ) , where N is the number of observed features. When N is large, the algorithm will have a high computational complexity. For example, when 30 N = , where M =30 that is the worst case.
Therefore, when the observed feature number is large (such as more than 30), our algorithm is faster than JCBB. In the simulation, we also found that when N is large, for example 30 N ≥ , JCBB is too slow to work while the algorithm proposed can work well (The example can be seen in the section of experimental results).

Simulation and Experimental Results
The algorithm presented was tested in two different environments, an artificial environment and a real outdoor environment. In these two environments, SLAM was implemented by using the data association algorithm proposed in the last section. The first test environment is established by randomly generating some features and assuming that the vehicle's trajectory is a circle whose radius is 62 meters. The robot moves at a constant speed and the heading angle changes 1 degree at each sampling instant. The environments involve 105 features which are randomly distributed in a region of 120 meters by 120 meters. The standard deviations of the simulated laser sensor are taken as 0.01 We assume that the features' positions are unknown which is the case in SLAM and match the features with the observations by using the NN and the IHGR algorithms, respectively. In order to compare the successful rates of these two data association algorithms, we change feature positions except one feature 200 times to implement the SLAM. The features' positions are randomly generated each time with uniform distribution in space. We apply the NN data association algorithm and the IHGR data association method to perform the SLAM processes, respectively. In the simulation, we only fix the position of one feature and investigate the successful rate of the matching of this feature with its observation in one scan (scan 15) during the SLAM. The data association successful rate for this feature when using IHGR algorithm is 96.5% (193/200) while it is only 81% (162/200) for the NN algorithm. When the features are closely located, the NN algorithm fails. Figures 1 and 2 show the SLAM results where the NN algorithm fails whereas the proposed IHGR algorithm performs well. In this case, the positions of 3 features in the environment are fixed and they are located at (27, 20.5), (26, 19.5) and (26.5, 19), respectively. The remaining features are randomly distributed. It can be observed from Figure 1 that the NN algorithm leads to diverged estimates of vehicle pose and feature positions. On the other hand, our IHGR method performs very well as observed in Figure 2. In fact, the vehicle's true path is almost overlapped with the estimated one. The global error between the estimated vehicle's position and the ground truth can be seen in Figure 3. A comparison on execution time between the NN algorithm and the IHGR based data association algorithm versus the number of features observed in one scan is shown in Figure  4 (the algorithms are run on Pentium IV PC, 1.7GHz) for the cases when the NN algorithm is able to give successful SLAMs. In the figure, the mean execution time means the average CPU time used for 50 Monte Carlo runs for data association algorithm. In the simulation, we extract data at each scan under different feature number and assume that the number of measurements is the same as that of the existing features (this is the most time consuming case). The result shows that our IHGR based method has moderate computational requirement and is implementable in real-time applications. Fig. 2. The mapping and vehicle path when applying IHGR data association method in the SLAM process. Fig. 3. The vehicle's position errors in global coordinates between the ground truth (generated data) and the estimated position using IHGR data association algorithm in the SLAM process.  In Table 1, we show the average global errors of the vehicle's position estimate versus the number of features observed (observations) in each scan in the same environment mentioned above. We fixed the number of observations in each SLAM process which lasted for a few thousand steps but changed the number of observations for different SLAM processes. In the table, the errors are calculated as follows: where i x E and i y E are the absolute values of the i th time step vehicle position errors in the X and Y directions, respectively, and K is the total number of steps during SLAM. It is found that the errors decrease when the number of observations increases. When the number of observations is more than 35, we found that the errors change very little. Therefore, we only show the results when the number of observations is smaller than 35. In order to examine the robustness of the IHGR based algorithm with respect to sensor noise, we also carried out the simulations under various sensor range variances. From (Zhang et al., 2004a;Zhang et al., 2003;Adams et al., 2004), we know that for any range finder, the range noise covariance can vary depending on the received signal amplitude while the angular uncertainty is relatively very small and little changed compared to the range noise. In the simulation, we fix 0.0005 θ σ = radian. The sensor used here is a LADAR (Laser Detection and Ranging) sensor, therefore, the standard deviation of the range noise r σ can be typically from 0.01 meters to 0.25 meters (Adams, 1999) for different sensors. Table 2 shows the average performance of 10 SLAM processes for the same environments mentioned earlier with 12 observations. In Subsection 3.3 we have compared the computational burden of our method with that of the JCBB algorithm (Neira & Tardos, 2001). Here we shall compare the accuracy of these two methods. Note that the JCBB algorithm which uses the branch and bound enumeration algorithm gives an optimal solution to the data association problem. In this simulation, we use 30 features in the large scale map because the speed of the JCBB algorithm becomes quite slow when the feature number is too large. From Table 3 one can see that with the IHGR based data association method a near optimal solution to SLAM has been achieved. Observe that when the number of features is 5, the same performance as the JCBB algorithm which gives the optimal solution is obtained in this experiment.

Real Outdoor Environment 4.2.1 SLAM with Artificial Beacon
In order to implement the IHGR data association algorithm during SLAM in a real environment, we firstly use the experimental data set from  obtained by Guivant and Nebot. The testing site is a car park at Sydney University. The vehicle is equipped with a GPS, a laser sensor and wheel encoders. A kinematic GPS system of 2 cm accuracy was used to evaluate the ground truth. Thus, the true navigation map was available for comparison purposes. Wheel encoders give an odometric measurement of the vehicle location. The dead reckoning sensors and laser range sensor are combined together to predict the vehicle's trajectory using the extended Kalman filter and to build up the map at the same time. In this experiment, the features used are artificial beacons. The feature detection was done by using a geometric analysis of the range measurement to obtain the most likely centers of tree trunks using intensity information. The laser scans are processed using Guivant's algorithm (Guivant & Nebot, 2000) to detect tree trunks' centers and estimate their radii.  We ran continuous SLAM for more than 5000 time steps and obtained the map shown in Figure 5 using our proposed IHGR data association method. It can be seen that the IHGR method performs well in real time. Figures 6 and 7 give the measurement (range and angle) innovation and their 95% confidence bounds during the SLAM process, respectively. The results show that the IHGR based algorithm works well during the SLAM process.
In order to check the effectiveness of the IHGR data association method, we randomly choose two scans (scan 68 and scan 87) and show the matching matrix nm x after the IP problem is solved. In scan 68, the laser sensor obtained 2 measurements and the existing feature number has accumulated to 11. As mentioned, the term "measurement" means the extracted features. As seen in Table 4, measurement 1 is associated with feature 3 and measurement 2 is associated with feature 2. The rest of the features are all undetected in this scan. In Table 5, for scan 87, measurement 1 is matched with a dummy element which means this is a new feature or false alarm. Since the probability of false alarm for the laser sensor is low, we regard the measurement as a new feature. The other measurements and features are similar to those in scan 68 which can be seen in Table 4.  In this environment, the features are sparsely distributed. By checking the experimental data, we know that the number of observations per step ranges from 1 to 5. Therefore, the environment is relatively simple as far as data association is concerned. In this case, the NN algorithm also works well. We ran SLAM with the NN algorithm and IHGR algorithm respectively and found that the computational cost is similar although the NN algorithm is slightly faster than our method. For NN data association, the whole SLAM process (involving more than 5000 time steps), where the vehicle travelled several hundred meters, took 40.9680 seconds while the IHGR algorithm took 58.1420 seconds on the same computer (the algorithms were run on Pentium IV PC, 1.7GHz, and we calculated the CPU time for each algorithm).

SLAM with Natural Features in a real Environment
In order to verify the effectiveness of the IHGR algorithm, a more complex real environment is explored using our car-like Cycab vehicle (see Figure 9). In this case, SLAM is implemented with natural features rather than artificial beacons in the last experiment. The features extracted from the complex campus environment are obtained by using the feature extraction algorithm given in (Zhang et al. 2004a). The experimental data used was obtained from a 2-D scanning range laser (SICK LMS200) mounted on the front of the vehicle. The laser returns a 180 degree planar sweep of range measurements in 0.5 degree intervals (i.e., 361 range values in an anti-clockwise order) with a range resolution of 50mm. Similar to the last experiment, the vehicle is equipped with a D-GPS, a laser sensor and encoders. The D-GPS system is used to evaluate the ground truth. Thus, the true navigation map is available for comparison purposes. The testing site is a long walk way around Hall 7 at Nanyang Technological University of Singapore. The real campus map, including this road, is shown in Figure 8. The dead reckoning sensors and laser range sensor are combined together to predict the vehicle's trajectory using the extended Kalman filter and to build up the map at the same time. During SLAM, in order to improve our map accuracy, we attempted to detect two types of features, namely the point features and circular features and used the IHGR algorithm to carry out data association. The starting and ending of the trajectory are indicated by red dots. Figure 9 shows part of the experimental environment. Figure 10 gives a typical laser scan during the run and shows the main features that are extracted from the scan data. Figure 11 shows the experimental results of SLAM. In this figure, there is a break in the GPS data. This is because the vehicle is under a building which blocks the GPS signal (the GPS data coordinate is vacant roughly between (-10, -120) and (-30, -115)). That is, no ground truth is available in this segment of road. From this figure, we can see that the estimated path is close to the GPS reading except the part of the road that is near to the break because the GPS reading is not accurate for this segment of the trajectory. Figures 12 and 13 indicate the observation innovations and their 95 % confidence bounds during the whole process which lasted for 450 seconds. These figures show that the SLAM process is consistent using the natural features. It is also important to mention that the average positional error of the entire SLAM process except the segment where the ground truth is unavailable is smaller than 0.5 meter. We also carried out the SLAM process with the NN data association algorithm. Unfortunately, the NN algorithm results in a diverged vehicle path estimate as shown in Figure 14, due to the complex environment with features of high density. In Figure 14, point A is the starting point of the SLAM process as in the previous experiment with the IHGR algorithm.

Conclusions
This chapter presented a data association algorithm for SLAM which offers a good trade-off between accuracy and computational requirements. We first formulated the data association problem in SLAM as a two dimensional assignment problem. In our work, only 1 frame of scan data is considered. The data association problem in SLAM is formulated as a two dimensional assignment problem rather than a three dimensional one which is an NP hard problem and is computationally more efficient. Further, since only one step prediction is involved, the effect of the vehicle model uncertainty is smaller as compared to the data association methods using two frame scan data. In order to obtain a fast solution, the 0-1 IP problem was firstly relaxed to an LP problem. Then we proposed to use the IHGR procedure in conjunction with basic LP algorithms to obtain a feasible solution of the data association problem. Both the simulation and experiment results demonstrated that the proposed algorithm is implementable and gives a better performance (higher successful rate) than the commonly used NN algorithm for complex (outdoor) environments with high density of features. Compared to the optimal JCBB algorithm, the proposed algorithm has lower computational complexity and is more suitable for real-time implementation.