Scaling Effects for Synchronous vs. Asynchronous Video in Multi-robot Search

Camera-guided teleoperation has long been the preferred mode for controlling remote robots, with other modes such as asynchronous control used only when unavoidable. Because controlling multiple robots places additional demands on the operator, we hypothesized that removing the forced pace for reviewing camera video might reduce workload and improve performance. In an earlier experiment participants operated four-robot teams performing a simulated urban search and rescue (USAR) task using either a conventional streaming video plus map interface or an experimental interface without streaming video but with the ability to store panoramic images on the map to be viewed at leisure. Search performance was somewhat better using the conventional interface; however, ancillary measures suggested that the asynchronous interface succeeded in reducing temporal demands for switching between robots. This raised the possibility that the asynchronous interface might perform better if teams were larger. In this experiment we evaluate the usefulness of asynchronous video for teams of 4, 8, or 12 robots. As in our earlier study, we found a slight advantage in accuracy in marking victim locations for streaming video, but overall performance was very similar.

Viewpoints for control, from (Wickens & Hollands, 1999).

… gives the illusion of being flat when viewed from a camera mounted on a platform traversing that surface. For fixed cameras the operator's ability to survey a scene is limited by the mobility of the robot and his ability to retain viewed regions of the scene in memory as the robot is manoeuvred to obtain views of adjacent regions. A pan-tilt-zoom (PTZ) camera resolves some of these problems but introduces new ones involving discrepancies between the robot's heading and the camera view that frequently lead to operational mishaps. A tethered "camera" (B, C) provides an oblique view of the scene showing both the platform and its 3D environment. A 3rd person fixed view (C) is akin to an operator's view controlling slot cars and has been shown effective in avoiding roll-overs and other teleoperation accidents (McGovern, 1990), but it cannot be used anywhere an operator's view might be obstructed, such as within buildings or in rugged terrain. The tethered view (B), in which a camera "follows" an avatar (think Mario Brothers©), is widely favored in virtual environments (Milgram, 1997; Tan et al., 2001) for its ability to show the object being controlled in relation to its environment by showing both the platform and an approximation of the scene that might be viewed from a camera mounted on it. This can be simulated for robotic platforms by mounting a camera on a flexible pole, giving the operator a partial view of his platform in the environment (Yanco & …). Because of restriction in field of view and the necessity of pointing the camera downward, however, this strategy is of little use for surveying a scene, although it can provide a view of the robot's periphery and nearby obstacles that could not be seen otherwise. The exocentric views show a two-dimensional version of the scene such as might be provided by an overhead camera and cannot be obtained from an onboard camera.
This type of "overhead" view can, however, be approximated by a map. For robots equipped with laser range finders, generating a map and localizing the robot on that map provides a method for approximating an exocentric view of the platform. If this view rotates with the robot (heading up) it is a type D plan view. If it remains fixed (North up) it is of type E. An early comparison at Sandia Laboratory between viewpoints for robot control (McGovern, 1990) investigating accidents focused on the most common of these: (A) egocentric from onboard camera and (C) 3rd person. The finding was that all accidents involving rollover occurred under egocentric control while 3rd person control led to bumping and other events resulting from obstructed or distanced views.

Multi-robot search
Remotely controlled robots for urban search and rescue (USAR) are typically equipped with both a PTZ video camera for viewing the environment and a laser range finder for building a map and localizing the robot on that map. The video feed and map are usually presented in separate windows on the user interface and are intended to be used in conjunction. While (Casper & Murphy, 2003), reporting on experiences in searching for victims at the World Trade Center, observed that it was very difficult for an operator to handle both navigation and exploration of the environment from video information alone, others found that first responders using a robot to find victims in a mock environment made little use of the generated map. (Nielsen & Goodrich, 2006), by contrast, have attempted to remedy this through an ecological interface that fuses information by embedding the video display within the map. The resulting interface takes the 2D map and extrudes the identified surfaces to derive a 3D version resembling a world filled with cubicles. The robot is located on this map with the video window placed in front of it at the location being viewed. Results showed the generated maps to be superior in assisting operators to escape from a maze. When considering such potential advantages and disadvantages of viewpoints it is important to realize that there are two, not one, important subtasks that are likely to engage operators (Tan et al., 2001). The escape task was limited to navigation, the act of explicitly moving the robot to different locations in the environment. In many applications search, the process of acquiring a specific viewpoint, or set of viewpoints, containing a particular object, may be of greater concern. Because search relies on moving a viewpoint through the environment to find and better view target objects, it is an inherently egocentric task. This is not necessarily the case for navigation, which does not need to identify objects but only to avoid them.
Search, particularly multi-robot search, presents the additional problem of assuring that areas the robot has traversed have been thoroughly searched for targets. This requirement directly conflicts with the navigation task, which requires the camera to be pointed in the direction of travel in order to detect and avoid objects and steer toward its goal. These difficulties are accentuated by the need to switch attention among robots, which may increase the likelihood that a view containing a target will be missed. In earlier studies (Wang & Lewis, 2007a; Wang & Lewis, 2007b) we have demonstrated that success in search is directly related to the frequency with which the operator shifts attention between robots and hypothesized that this might be due to victims missed while servicing other robots. Recent data, however, suggest that other effects involving situation awareness may be involved.

Asynchronous Imagery
To combat these problems of attentive sampling among cameras, incomplete coverage of searched areas, and difficulties in associating camera views with map locations, we are investigating the potential of asynchronous control techniques, previously used out of necessity in NASA applications, as a solution to multi-robot search problems. Due to limited bandwidth and communication lags in interplanetary robotics, camera views are closely planned and executed. Rather than transmitting live video and moving the camera about the scene, photographs are taken from a single spot with plans to capture as much of the surrounding scene as possible. These photographs, either taken with an omnidirectional overhead camera (the camera faces upward to a convex mirror reflecting 360°) and dewarped (Murphy, 1995; Shiroma et al., 2004) or stitched together from multiple pictures from a PTZ camera (Volpe, 1999), provide a panorama guaranteeing complete coverage of the scene from a particular point. If these points are well chosen, a collection of panoramas can cover an area to be searched with greater certainty than imagery captured with a PTZ camera during navigation. For the operator searching within a saved panorama the experience is similar to controlling a PTZ camera in the actual scene, a property that has been used to improve teleoperation in a low bandwidth, high latency application (Fiala, 2005). In our USAR application, which requires finding victims and locating them on a map, we merge map and camera views as in (Ricks, Nielsen, & Goodrich, 2004). The operator directs navigation from the map being generated, with panoramas being taken at the last waypoint of a series. The panoramas are stored and accessed through icons showing their locations on the map. The operator can find victims by asynchronously panning through these stored panoramas as time becomes available.
When a victim is spotted the operator uses landmarks from the image and corresponding points on the map to record the victim's location. By changing the task from a forced-pace one, with camera views that must be controlled and searched on multiple robots continuously, to a self-paced task in which only navigation needs to be controlled in real time, we hoped to provide a control interface that would allow more thorough search with lowered mental workload. The reductions in bandwidth and communications requirements (Bruemmer et al., 2005) are yet another advantage offered by this approach.
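As an illustration only, the storage scheme described above (panoramas taken at terminal waypoints, kept with their map locations, and reviewed when the operator has time) can be sketched as a small data structure. The names `Panorama`, `PanoramaStore`, `image_ref` and `viewed` are hypothetical and are not taken from MrCS; the oldest-first review order is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Panorama:
    """A panorama captured at the last waypoint of a series (hypothetical type)."""
    robot_id: int
    location: Tuple[float, float]  # position on the map, in metres
    image_ref: str                 # placeholder handle for the stored image
    viewed: bool = False


@dataclass
class PanoramaStore:
    """Holds panoramas so they can be reviewed at the operator's own pace."""
    panoramas: List[Panorama] = field(default_factory=list)

    def add(self, pano: Panorama) -> None:
        """Store a newly captured panorama (it would appear as a map icon)."""
        self.panoramas.append(pano)

    def next_unviewed(self) -> Optional[Panorama]:
        """Return the oldest panorama not yet reviewed and mark it viewed."""
        for pano in self.panoramas:
            if not pano.viewed:
                pano.viewed = True
                return pano
        return None

    def pending(self) -> int:
        """Number of panoramas still waiting for review."""
        return sum(1 for p in self.panoramas if not p.viewed)
```

In such a scheme nothing is lost while the operator attends to another robot: unviewed panoramas simply accumulate in the store until time becomes available.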

Pilot experiment
In a recent experiment reported in (Velagapudi et al., 2008) we compared performance for operators controlling 4 robot teams at a simulated USAR task using either streaming or asynchronous video displays. Search performance was somewhat better using the conventional interface with operators marking slightly more victims closer to their actual location at each degree of relaxation. This superiority, however, might have occurred simply because streaming video users had the opportunity to move closer to victims thereby improving their estimates of distance in marking the map. A contrasting observation was that frequency of shifting focus between robots, a practice we have previously found related to search performance (Scerri et al., 2004) was correlated with performance for streaming video participants but not for participants using asynchronous panoramas. Because operators using asynchronous video did not need to constantly switch between camera views to avoid missing victims we hypothesized that for larger team sizes where forced pace search might exceed the operator's attentional capacity asynchronous video might offer an advantage. The present experiment tests this hypothesis.

USARSim and MrCS
The experiment was conducted in the high fidelity USARSim robotic simulation environment, developed as a simulation of urban search and rescue (USAR) robots and environments and intended as a research tool for the study of human-robot interaction (HRI) and multi-robot coordination. The MrCS (Multi-robot Control System), a multi-robot communications and control infrastructure with accompanying user interface developed for experiments in multi-robot control and RoboCup competition (Wang & Lewis, 2007a), was used with appropriate modifications in both experimental conditions. MrCS provides facilities for starting and controlling robots in the simulation, displaying camera and laser output, and supporting inter-robot communication through Machinetta (Scerri et al., 2004), a distributed multiagent system. The distributed control enables us to scale robot teams from small to large. Figures 2 and 3 show the elements of the MrCS involved in this experiment. In the standard MrCS (Fig. 2) the operator selects the robot to be controlled from the colored thumbnails at the top of the screen. Robots are tasked by assigning waypoints on a heading-up map through a teleoperation widget. The current locations and paths of the robots are shown on the Map Data Viewer. In the Panorama interface thumbnails are blanked out and images are acquired at the terminal point of waypoint sequences.

… (Balakirsky et al., 2007) was selected for use in the experiment. The environment consisted of maze-like halls with many rooms and obstacles, such as chairs, desks, cabinets, and bricks. Victims were evenly distributed throughout the environments. Robots were started at different locations, leading to exploration of different but equivalent areas of the environment. A third, simpler environment was used for training. The experiment followed a between-groups design with participants searching for victims using either panorama or streaming video modes.
Participants searched over three trials beginning with 4 robots, then searching with 8, and finally 12. Robots were started from different locations within a large environment making learning from previous trials unlikely.

Participants and procedure
Twenty-nine paid participants were recruited from the University of Pittsburgh community. None had prior experience with robot control, although most were frequent computer users. Approximately a quarter of the participants reported playing computer games for more than one hour per week. After demographic data were collected, each participant read standard instructions on how to control robots via MrCS. In the following 15-20 minute training session, the participant practiced control operations for either the panorama or streaming video mode and tried to find at least one victim in the training environment under the guidance of the experimenter. Participants then began three testing sessions in which they performed the search task controlling 4, 8, and 12 robots.

Results
Data were analyzed using a repeated measures ANOVA comparing streaming video performance with that of asynchronous panoramas. On the performance measures, victims found and area covered, the groups showed nearly identical performance with victim identification peaking sharply at 8 robots accompanied by a slightly less dramatic maximum for search coverage (Fig. 4).

Fig. 4. Area Explored as a function of N robots (2 m)
The differences in precision for marking victims observed in the pilot study were found again. For victims marked within 2 m, the average number of victims found in the panorama condition was 5.36 using 4 robots and 5.50 using 8 robots, dropping back to 4.71 when using 12 robots. Participants in the streaming condition were significantly more successful at this range, F(1, 29) = 3.563, p < .028, finding 4.8, 7.07 and 4.73 victims respectively (Fig. 5). A similar advantage was found for victims marked within 1.5 m, with the average number of victims found in the panorama condition dropping to 3.64, 3.27 and 2.93, while participants in the streaming condition were more successful, F(1, 29) = 6.255, p < .0025, finding 4.067, 5.667 and 4.133 victims respectively (Fig. 6).
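The distance criterion behind these counts can be made concrete with a minimal sketch. The scoring procedure itself is not described in the text, so the rule used here (a victim counts as found if at least one operator mark lies within the given radius of its true position) is an assumption, and the function name `victims_found` and all coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) map coordinates in metres


def victims_found(victims: List[Point], marks: List[Point], radius: float) -> int:
    """Count true victims with at least one operator mark within `radius` metres."""
    found = 0
    for vx, vy in victims:
        if any(math.hypot(vx - mx, vy - my) <= radius for mx, my in marks):
            found += 1
    return found


# Hypothetical data: three victims and three marks placed by an operator.
victims = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
marks = [(1.0, 1.0), (10.0, 1.8), (30.0, 30.0)]

found_at_2m = victims_found(victims, marks, radius=2.0)
found_at_1_5m = victims_found(victims, marks, radius=1.5)
```

With these made-up positions the second victim is marked 1.8 m from its true location, so it is counted at the 2 m criterion but not at 1.5 m, which is how a tighter criterion lowers the number of victims found.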

Fig. 6. Victims Found as a function of N robots (within 1.5 m)
Fan-out (Olsen & Wood, 2004) is a model-based estimate of the number of robots an operator can control. While Fan-out was conceived as an invariant measure, operators have been observed to adjust their criteria for adequate performance to accommodate the available robots (Humphrey et al., 2006). We interpret Fan-out as a measure of attentional reserves. If Fan-out is greater than the number of robots, there are remaining reserves. If Fan-out is less than the number of robots, capacity has already been exceeded. Fan-out in the panorama condition increased with team size, from 4.1 to 7.6 to 11.1 for 4, 8 and 12 robots. Fan-out, however, was uniformly higher in the streaming video condition, F(1, 29) = 3.355, p < .034, at 4.4, 9.12 and 13.46 respectively (Fig. 7). Number of robots had a significant effect on every dependent measure collected except waypoints per mission (a mission being the sequence of waypoints the user issued to a robot ending in a final destination); the next weakest effect was for the number of switches in focus robot, F(2, 54) = 16.74, p < .0001. The streaming and panorama conditions were easily distinguished by some process measures. Both streaming and panorama operators followed the same pattern, issuing the fewest waypoints per mission when commanding 8 robots; however, panorama participants in the 8 robot condition issued observably fewer (2.96 vs. 3.16) waypoints (Fig. 8).
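The interpretation of Fan-out as attentional reserve can be illustrated with the values reported above. The sketch below simply subtracts the number of robots from the reported Fan-out; this difference is an illustrative quantity, not a measure defined in the study, and the names `attentional_reserve` and `reserves_by_condition` are hypothetical.

```python
from typing import Dict, List

ROBOTS: List[int] = [4, 8, 12]

# Fan-out values as reported in the text for each team size.
FAN_OUT: Dict[str, List[float]] = {
    "panorama": [4.1, 7.6, 11.1],
    "streaming": [4.4, 9.12, 13.46],
}


def attentional_reserve(fan_out: float, n_robots: int) -> float:
    """Positive: reserves remain. Negative: capacity is already exceeded."""
    return fan_out - n_robots


def reserves_by_condition() -> Dict[str, List[float]]:
    """Reserve for each condition at 4, 8 and 12 robots, rounded to 2 places."""
    return {
        condition: [round(attentional_reserve(f, n), 2)
                    for f, n in zip(values, ROBOTS)]
        for condition, values in FAN_OUT.items()
    }


if __name__ == "__main__":
    for condition, reserves in reserves_by_condition().items():
        print(condition, reserves)
```

Read this way, the panorama values (0.1, -0.4 and -0.9) sit at or slightly below capacity as the team grows, while the streaming values (0.4, 1.12 and 1.46) remain positive at every team size.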

Fig. 8. Waypoints issued per Mission
The closely related path length per mission measure follows a similar pattern, with no interaction but significantly shorter paths (5.07 m vs. 6.19 m) for panorama participants, F(2, 54) = 3.695, p = .065 (Fig. 9). The other measures, such as number of missions and switches between robots in focus, were by contrast nearly identical for the two groups, showing only the recurring significant effect for N robots. A similar closeness is found for NASA-TLX workload ratings, which rise together monotonically with N robots (Fig. 10).

Discussion
The most unexpected thing about these data is how similar the performance of streaming and asynchronous panorama participants was. The tasks themselves appear quite dissimilar. In the panorama condition participants direct their robots by adding waypoints to a map without getting to see the robots' environment directly. Typically they tasked robots sequentially and then went back to look at the panoramas that had been taken. Because panorama participants were unable to see the robots' surroundings except at terminal waypoints, paths needed to be shorter and contain fewer waypoints in order to maintain situation awareness and avoid missing potential victims. Despite fewer waypoints and shorter paths, panorama participants managed to cover the same area as streaming video participants within the same number of missions. Ironically, this greater efficiency may have resulted from the absence of distraction from streaming video and is consistent with (Nielsen & Goodrich, 2006) in finding maps especially useful for navigating complex environments. Examination of pauses in the streaming video condition failed to support our hypothesis that these participants would execute additional maneuvers to examine victims. Instead, streaming video participants seemed to follow the same strategy as panorama participants of directing robots to an area just inside the door of each room. This leaves panorama participants' inaccuracy in marking victims unexplained other than through a general loss of situation awareness. This explanation would hold that, lacking imagery leading up to the panorama, these participants have less context for judging victim location within the image and must rely on memory and mental transformations.
Panorama participants also showed lower Fan-out, perhaps as a result of issuing fewer waypoints for shorter paths, leading to more frequent interactions. While differences in switching focus among robots were found in our earlier study (Wang & Lewis, 2007b), the present data (figure 7) show performance to be almost identical. Our original motivation for developing a panorama mode for MrCS was to address restrictions posed by a communications server added to RoboCup Rescue competition to simulate bandwidth limitations and drop-outs due to attenuation from distance and obstacles. Although the panorama mode was designed to drastically reduce bandwidth and allow operation despite intermittent communications, our system was so effective that we decided to test it under conditions most favorable to a conventional interface. Our experiment shows that under such conditions, allowing uninterrupted, noise-free streaming video, a conventional interface leads to equal or somewhat better search performance. Furthermore, while we undertook this study to determine whether asynchronous video might prove beneficial to larger teams, we found performance to be essentially equivalent to the use of streaming video at all team sizes, with a small sacrifice of accuracy in marking victims. This surprising finding suggests that in applications that may be too bandwidth limited to support streaming video or that involve substantial lags, map-based displays with stored panoramas may provide a useful display alternative without seriously compromising performance.

Future work
The reported experiment is one of a series exploring human control over increasingly large robot teams. We are seeking to discover and develop techniques and strategies for allocating tasks among teams of humans and robots in ways that improve overall efficiency. By analogy to computational complexity we have argued that command tasks can also be classified by complexity. Some task-centric rather than platform-centric commands, such as specifying an area to be searched, would have a complexity of O(1) since they are independent of the number of unmanned vehicles (UVs). Others, such as authorizing a target or responding to a request for assistance, that involve commanding individual UVs would be O(n). Still others that require UVs to be coordinated would have higher levels of complexity and rapidly exceed human capabilities. Framing the problem this way leads to the design conclusion that commanders should be issuing task-centric commands, UV operators should be handling independent UV-specific tasks (perhaps for multiple UVs), and coordination among UVs (in accordance with the commander's intent) should be automated to as great an extent as possible. The present experiment belongs to the part of this series investigating O(n) control of multiple robots. We model robots as being controlled in a round robin fashion (Crandall et al., 2004), with additional robots imposing an additive load on the operator's cognitive resources until they are exceeded. Because O(n) tasks are independent, the number of robots can safely be increased either by adding additional operators or by increasing the autonomy of individual robots. In a recent study (Wang et al., 2009a) we showed that if operators are relieved of the need to navigate they could successfully command more than 12 UVs. Conversely, teams of operators might command teams of robots more efficiently if robots' needs for interaction could be scheduled across operators.
A recent experiment (Wang et al., 2009b) showed that without additional automation, operators commanding 24 robots were slightly more effective when controlling 12 independently. In a planned experiment we will compare these two conditions with navigation automated. In other work we are investigating both O(1) control and interaction with autonomously coordinating robots. We envision multi-robot systems requiring human input at all of these levels to provide tools that can effectively follow their commander's intent.
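The additive-load view of O(n) control described above can be sketched numerically. The sketch assumes a simple round-robin model in which each robot needs a fixed interaction time from the operator and can then be neglected for a fixed time; this formulation, the function names, and the example timings are assumptions made for illustration and are not taken from the study.

```python
def operator_utilization(n_robots: int, interaction_time: float,
                         neglect_time: float) -> float:
    """Fraction of the operator's time needed to service n robots in turn.

    Each robot adds the same load (interaction_time per cycle), so load grows
    linearly with team size; values above 1.0 mean capacity is exceeded.
    """
    if interaction_time <= 0 or neglect_time < 0:
        raise ValueError("interaction_time must be > 0 and neglect_time >= 0")
    cycle = interaction_time + neglect_time
    return n_robots * interaction_time / cycle


def max_robots(interaction_time: float, neglect_time: float) -> int:
    """Largest team the operator can service without exceeding capacity."""
    if interaction_time <= 0 or neglect_time < 0:
        raise ValueError("interaction_time must be > 0 and neglect_time >= 0")
    return int((interaction_time + neglect_time) // interaction_time)


if __name__ == "__main__":
    # Hypothetical timings: 10 s of attention per robot, 50 s of safe neglect.
    for n in (4, 8, 12):
        print(n, round(operator_utilization(n, 10, 50), 2))
    print("max robots:", max_robots(10, 50))
```

Under these hypothetical timings one operator saturates at six robots. In this model the team can grow only by adding operators or by lengthening the time a robot can be neglected, for example through greater autonomy, which mirrors the two options named in the text.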