Generating natural motion in an android by mapping human motion

One of the main aims of humanoid robotics is to develop robots that are capable of interacting naturally with people. However, to understand the essence of human interaction, it is crucial to investigate the contribution of behavior and appearance. Our group's research explores these relationships by developing androids that closely resemble human beings in both aspects. If humanlike appearance causes us to evaluate an android's behavior from a human standard, we are more likely to be cognizant of deviations from human norms. Therefore, the android's motions must closely match human performance to avoid looking strange, including such autonomic responses as the shoulder movements involved in breathing. This paper proposes a method to implement motions that look human by mapping their three-dimensional appearance from a human performer to the android and then evaluating the verisimilitude of the visible motions using a motion capture system. This approach has several advantages over current research, which has focused on copying a person's moving joint angles to a robot: (1) in an android robot with many degrees of freedom and kinematics that differs from that of a human being, it is difficult to calculate which joint angles would make the robot's posture appear similar to the human performer; and (2) the motion that we perceive is at the robot's surface, not necessarily at its joints, which are often hidden from view.


I. INTRODUCTION
Much effort in recent years has focused on the development of such mechanical-looking humanoid robots as Honda's Asimo and Sony's Qrio with the goal of partnering them with people in daily situations. Just as an industrial robot's purpose determines its appearance, a partner robot's purpose will also determine its appearance. Partner robots generally adopt a roughly humanoid appearance to facilitate communication with people, because natural interaction is the only task that requires a humanlike appearance. In other words, humanoid robots mainly have significance insofar as they can interact naturally with people. Therefore, it is necessary to discover the principles underlying natural interaction to establish a methodology for designing interactive humanoid robots.
Kanda et al. [1] have tackled this problem by evaluating how the behavior of the humanoid robot "Robovie" affects human-robot interaction. But Robovie's machine-like appearance distorts our interpretation of its behavior because of the way the complex relationship between appearance and behavior influences the interaction. Most research on interactive robots has not evaluated the effect of appearance (for exceptions, see [2], [3]), and especially not in a robot that closely resembles a person. Thus, it is not yet clear whether the most comfortable and effective human-robot communication would come from a robot that looks mechanical or human. However, we may infer a humanlike appearance is important from the fact that human beings have developed neural centers specialized for the detection and interpretation of hands and faces [4], [5], [6]. A robot that closely resembles humans in both looks and behavior may prove to be the ultimate communication device insofar as it can interact with humans the most naturally. We refer to such a device as an android to distinguish it from mechanical-looking humanoid robots. When we investigate the essence of how we recognize human beings as human, it will become clearer how to produce natural interaction. Our study tackles the appearance and behavior problem with the objective of realizing an android and having it be accepted as a human being [7].
Ideally, to generate humanlike movement, an android's kinematics should be functionally equivalent to the human musculoskeletal system. Some researchers have developed a joint system that simulates shoulder movement [8] and a muscle-tendon system to generate humanlike movement [9]. However, these systems are too bulky to be embedded in an android without compromising its humanlike appearance. Given current technology, we embed as many actuators as possible to provide many degrees of freedom insofar as this does not interfere with making the android look as human as possible [7]. Under these constraints, the main issue concerns how to move the android in a natural way so that its movement may be perceived as human.
A straightforward way to make a robot's movement more humanlike is to imitate human motion. Kashima and Isurugi [10] extracted essential properties of human arm trajectories and designed an evaluation function to generate robot arm trajectories accordingly. Another method is to copy human motion as measured by a motion capture system to a humanoid robot. Riley et al. [11] and Nakaoka et al. [12] calculated a performer's joint trajectories from the measured positions of markers attached to the body and fed them to the joints of a humanoid robot. In these studies the authors assumed the kinematics of the robot to be similar to that of a human body. However, the more complex the robot's kinematics, the more difficult it is to calculate which joint angles will make the robot's posture similar to the performer's joint angles as calculated from motion capture data. Therefore, the assumption that the two joint systems are comparable may result in visibly different motion in some cases. This is especially a risk for androids because their humanlike form makes us more sensitive to deviations from human ways of moving. Thus, slight differences could strongly influence whether the android's movement is perceived as natural or human. Furthermore, these studies did not evaluate the naturalness of robot motions. Hale et al. [13] proposed several evaluation functions to generate a joint trajectory (e.g., minimization of jerk) and evaluated the naturalness of generated humanoid robot movements according to how human subjects rated their naturalness. In the computer animation domain, researchers have tackled motion synthesis with motion capture data (e.g., [14]). However, we cannot apply their results directly; we must instead repeat their experiment with an android because the results from an android testbed could be quite different from those of a humanoid testbed. For example, Mori described a phenomenon he termed the "uncanny valley" [15], [16], which relates to the relationship between how humanlike a robot appears and a subject's perception of familiarity. According to Mori, a robot's familiarity increases with its similarity until a certain point is reached at which slight "nonhuman" imperfections cause the robot to appear repulsive (Fig. 1). This would be an issue if the similarity of androids fell into the chasm. (Mori believes mechanical-looking humanoid robots lie on the left of the first peak.) This nonmonotonic relationship can distort the evaluation proposed in existing studies. Therefore, it is necessary to develop a motion generation method in which the generated "android motion" is perceived as human.
This paper proposes a method to transfer human motion measured by a motion capture system to the android by copying changes in the positions of body surfaces. This method is called for because the android's appearance demands movements that look human, but its kinematics is sufficiently different that copying joint-angle information would not yield good results. Comparing the similarity of the android's visible movement to that of a human being enables us to develop more natural movements for the android.
In the following sections, we describe the developed android and mention the problem of motion transfer and our basic idea about the way to solve it. Then we describe the proposed method in detail and show experimental results from applying it to the android.

II. THE ANDROID
Fig. 2 shows the developed android called Repliee Q2. The android resembles an Asian woman because it is modeled after a Japanese woman. The standing height is about 160 cm. The skin is composed of a kind of silicone that has a humanlike feel and neutral temperature. The silicone skin covers the upper torso, neck, head, and forearms with clothing covering other body parts. Unlike Repliee R1 [17], [7], silicone skin does not cover the entire body so as to facilitate flexibility and a maximal range of motion. The soft skin gives the android a human look and enables natural tactile interaction. To lend realism to the android's appearance, we took a cast of a person to mold the android's skin. Forty-two highly sensitive tactile sensors composed of piezo diaphragms are mounted under the android's skin and clothes throughout the body, except for the shins, calves, and feet. Since the output value of each sensor corresponds to its deforming rate, the sensors can distinguish different kinds of touch ranging from stroking to hitting. The android is driven by air actuators that give it 42 degrees of freedom (DoFs) from the waist up. (The legs and feet are not powered.) The configuration of the DoFs is shown in Table I. The android can generate a wide range of motions and gestures as well as various kinds of micro-motions such as the shoulder movements typically caused by human breathing. The DoFs of the shoulders enable them to move up and down and backwards and forwards. Furthermore, the android can make some facial expressions and mouth shapes, as shown in Fig. 3. The compliance of the air actuators makes for a safer interaction with movements that are generally smoother. Because the android has servo controllers, it can be controlled by sending desired joint positions from a host computer. Parallel link mechanisms adopted in some parts complicate the kinematics of the android.

A. The basic idea
One method to realize humanlike motion in a humanoid robot is through imitation. Thus, we consider how to map human motion to the android. Most previous research assumes the kinematics of the human body is similar to that of the robot except for the scale. Thus, they aim to reproduce human motion by reproducing kinematic relations across time and, in particular, joint angles between links. For example, the three-dimensional locations of markers attached to the skin are measured by a motion capture system, the angles of the body's joints are calculated from these positions, and these angles are transferred to the joints of the humanoid robot. It is assumed that by using a joint angle space (which does not represent link lengths), morphological differences between the human subject and the humanoid robot can be ignored.
However, there is potential for error in calculating a joint angle from motion capture data. The joint positions are assumed to be the same between a humanoid robot and the human performer who serves as a model; however, the kinematics in fact differs. For example, the kinematics of Repliee Q2's shoulder differs significantly from that of human beings. Moreover, as human joints rotate, each joint's center of rotation changes, but joint-based approaches generally assume this is not so. These errors are perhaps more pronounced in Repliee Q2, because the android has many degrees of freedom and the shoulder has a more complex kinematics than existing humanoid robots. These errors are more problematic for an android than a mechanical-looking humanoid robot because we expect natural human motion from something that looks human and are disturbed when the motion instead looks inhuman.
To create movement that appears human, we focus on reproducing positional changes at the body's surface rather than changes in the joint angles. We then measure the postures of a person and the android using a motion capture system and find the control input to the android so that the postures of person and android become similar to each other.

B. The method to transfer human motion
We use a motion capture system to measure the postures of a human performer and the android. This system can measure the three-dimensional positions of markers attached to the surface of bodies in a global coordinate space. First, some markers are attached to the android so that all joint motions can be estimated. The reason for this will become clear later. Then the same number of markers are attached to corresponding positions on the performer's body. We must assume the android's surface morphology is not too different from the performer's.
Fig. 5. The feedback controller with and without the estimation of the android's joint angle
We use a three-layer neural network to construct a mapping from the performer's posture to the android's control input, which is the desired joint angle. The reason for the network is that it is difficult to obtain the mapping analytically. To train a neural network to map from x_h to q_a would require thousands of pairs (x_h, q_a) as training data, and the performer would need to assume the posture of the android for each pair. We avoid this prohibitively lengthy task in data collection by adopting feedback error learning (FEL) to train the neural network. Kawato et al. [18] proposed feedback error learning as a principle for learning motor control in the brain. This employs an approximate way of mapping sensory errors to motor errors that subsequently can be used to train a neural network (or other method) by supervised learning. Feedback error learning prescribes neither the type of neural network employed in the control system nor the exact layout of the control circuitry. We use it to estimate the error between the postures of the performer and the android and feed the error back to the network. Fig. 4 shows the block diagram of the control system, where the network mapping is shown as the feedforward controller. The weights of the feedforward neural network are learned by means of a feedback controller. The method has a two-degrees-of-freedom control architecture. The network tunes the feedforward controller to be the inverse model of the plant. Thus, the feedback error signal is employed as a teaching signal for learning the inverse model. If the inverse model is learned exactly, the output of the plant tracks the reference signal by feedforward control. The performer and android's marker positions are represented in their local coordinates x_h, x_a ∈ R^{3m}; the android's joint angles q_a ∈ R^n can be observed by a motion capture system and a potentiometer, where m is the number of markers and n is the number of DoFs of the android.
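The principle of feedback error learning described above can be illustrated on a toy problem. The following is a minimal sketch, not the controller used with the android: it assumes a hypothetical static linear plant `A_TOY`, a linear feedforward map `W` in place of the neural network, and a scalar proportional feedback gain `k_fb`. All names and values are illustrative. The feedback output serves as the teaching signal, so the feedforward map drifts toward the inverse model of the plant while the feedback input shrinks.

```python
import numpy as np

# Hypothetical 2-DoF static plant: output y = A_TOY @ command.
A_TOY = np.array([[1.0, 0.2],
                  [0.1, 0.8]])

def train_fel(A, steps=2000, k_fb=0.5, alpha=0.5, seed=0):
    """Feedback error learning on a static linear plant (illustrative only).

    W is the feedforward controller; the feedback output u_fb is used
    as the error signal that trains W toward the inverse of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    W = np.zeros((n, n))               # untrained feedforward controller
    fb_norms = []
    for _ in range(steps):
        r = rng.uniform(-1.0, 1.0, size=n)   # reference posture
        u_ff = W @ r                         # feedforward command
        y = A @ u_ff                         # plant response
        u_fb = k_fb * (r - y)                # approximate motor error
        W += alpha * np.outer(u_fb, r)       # feedback output as teacher
        fb_norms.append(np.linalg.norm(u_fb))
    return W, np.array(fb_norms)

W_learned, fb_history = train_fel(A_TOY)
```

In this sketch the output error is mapped to a command error by a simple gain, which plays the role of the approximate sensory-to-motor mapping; the update is exactly zero once `A_TOY @ W` equals the identity.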
The feedback controller is required to output the feedback control input ∆q_b so that the error in the marker's position ∆x_d = x_a − x_h converges to zero (Fig. 5(a)). However, it is difficult to obtain ∆q_b from ∆x_d. To overcome this, we assume the performer has roughly the same kinematics as the android and obtain the estimated joint angle q̂_h simply by calculating the Euler angles (hereafter the transformation from marker positions to joint angles is described as T). Converging q_a to q̂_h does not always produce identical postures because q̂_h is an approximate joint angle that may include transformation error (Fig. 5(b)). Then we obtain the estimated joint angle of the android q̂_a using the same transformation T and the feedback control input to converge q̂_a to q̂_h (Fig. 5(c)). This technique enables x_a to approach x_h. The feedback control input approaches zero as learning progresses, while the neural network constructs the mapping from x_h to the control input q_d. We can evaluate the apparent posture by measuring the android posture.
Fig. 6. The marker positions corresponding to each other
In this system we could have made another neural network for the mapping from x_a to q_a using only the android. As long as the android's body surfaces are reasonably close to the performer's, we can use the mapping to make the control input from x_h. Ideally, the mapping must learn every possible posture, but this is quite difficult. Therefore, it is still necessary for the system to evaluate the error in the apparent posture.

A. Experimental setting
To verify the proposed method, we conducted an experiment to transfer human motion to the android Repliee Q2. We used 21 of the android's 42 DoFs by excluding the 13 DoFs of the face, the 4 of the wrists, and the 4 of the fingers (n = 21). We used a Hawk Digital System, which can track more than 50 markers in real-time. The system is highly accurate with a measurement error of less than 1 mm. Twenty markers were attached to the performer and another 20 to the android as shown in Fig. 6 (m = 20). Because the android's waist is fixed, the markers on the waist set the frame of reference for an android-centered coordinate space. To facilitate learning, we introduce a representation of the marker position x_h, x_a as shown in Fig. 7. The effect of waist motions is removed with respect to the markers on the head. To avoid accumulating the position errors at the end of the arms, vectors connecting neighboring pairs of markers represent the positions of the markers on the arms. We used arc tangents for the transformation T, in which the joint angle is an angle between two neighboring links where a link consists of a straight line between two markers.
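The arm representation and the arc-tangent transformation can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the markers are assumed to be ordered along one arm, and the angle returned is the unsigned angle between neighboring links, whereas the actual transformation T may use signed angles about specific axes.

```python
import numpy as np

def chain_to_link_vectors(markers):
    """Represent markers ordered along an arm by the vectors between neighbors.

    markers: array of shape (k, 3). Returns shape (k - 1, 3). Using
    differences avoids accumulating position error toward the hand.
    """
    markers = np.asarray(markers, dtype=float)
    return np.diff(markers, axis=0)

def link_angle(link_a, link_b):
    """Unsigned angle (radians) between two links via the arc tangent."""
    cross = np.cross(link_a, link_b)
    return np.arctan2(np.linalg.norm(cross), np.dot(link_a, link_b))

def chain_joint_angles(markers):
    """Approximate joint angles along a marker chain (illustrative stand-in for T)."""
    links = chain_to_link_vectors(markers)
    return np.array([link_angle(links[i], links[i + 1])
                     for i in range(len(links) - 1)])

# Hypothetical shoulder, elbow, and wrist markers (meters), elbow bent at a right angle.
arm = [[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.3, 0.25, 0.0]]
elbow_angle = chain_joint_angles(arm)
```

Using `arctan2` on the cross and dot products keeps the angle well conditioned near 0 and π, where an arc cosine of the normalized dot product would lose precision.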
The feedback controller outputs ∆q_b = K∆q_d, where the gain K consists of a diagonal matrix. There are 60 nodes in the input layer (20 markers × x, y, z), 300 in the hidden layer, and 21 in the output layer (for the 21 DoFs). Using 300 units in the hidden layer provided a good balance between computational efficiency and accuracy. Using significantly fewer units resulted in too much error, while using significantly more units provided only marginally higher accuracy but at the cost of slower convergence. The error signal to the network is t = α∆q_b, where the gain α is a small number. The sampling time for capturing the marker positions and controlling the android is 60 ms. The initial weights of the feedforward network were taken from another neural network with the same structure that had previously learned the mapping from x_a to q_a. Its training data consisted of 50,000 samples (x_a and q_a) obtained by moving the android randomly.
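The layer sizes and the error signal above can be put together in a short sketch. Only the 60-300-21 layout, the diagonal gain K, and the teaching signal t = α∆q_b come from the text; the tanh hidden units, linear output units, weight initialization, plain gradient-style update, and all function names are assumptions made for illustration. Here ∆q_d is taken to be the joint-angle error expressed so that a positive value calls for a larger command, which is also an assumption.

```python
import numpy as np

N_IN, N_HID, N_OUT = 60, 300, 21   # 20 markers x 3 coordinates, hidden units, DoFs

def init_network(seed=0):
    """Random initial weights (the paper instead pre-trains on android data)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (N_HID, N_IN)),
        "b1": np.zeros(N_HID),
        "W2": rng.normal(0.0, 0.1, (N_OUT, N_HID)),
        "b2": np.zeros(N_OUT),
    }

def forward(net, x_h):
    """Feedforward command q_d for a performer posture x_h (assumed tanh hidden layer)."""
    h = np.tanh(net["W1"] @ x_h + net["b1"])
    q_d = net["W2"] @ h + net["b2"]
    return q_d, h

def fel_update(net, x_h, dq_d, K_diag, alpha):
    """One learning step; returns the total command (feedforward + feedback)."""
    dq_b = K_diag * dq_d               # diagonal gain K applied elementwise
    t = alpha * dq_b                   # teaching signal for the network
    q_d, h = forward(net, x_h)
    # Backpropagate t as the output error (computed before any weight changes).
    dh = (net["W2"].T @ t) * (1.0 - h ** 2)
    net["W2"] += np.outer(t, h)
    net["b2"] += t
    net["W1"] += np.outer(dh, x_h)
    net["b1"] += dh
    return q_d + dq_b
```

A single call with a positive error on one joint raises the feedforward output for that joint on the same posture, so the feedback controller has less to correct the next time that posture is presented.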

B. Experimental results and analysis
1) Surface similarity between the android and performer: The proposed method assumes a surface similarity between the android and the performer. However, the male performer whom the android imitates in the experiments was 15 cm taller than the woman after whom the android was modeled. To check the similarity, we measured the average distance between corresponding pairs of markers when the android and performer make each of the given postures; the value was 31 mm (see Fig. 6). The gap is small compared to the size of their bodies, but it is not negligible.
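The similarity measure used here, the average distance between corresponding markers, is straightforward to compute. The sketch below assumes marker positions are stored as arrays whose last axis holds the x, y, z coordinates, for example with shape (postures, markers, 3); the function name and the sample values are hypothetical.

```python
import numpy as np

def mean_marker_distance(x_performer, x_android):
    """Average Euclidean distance between corresponding markers.

    Both inputs must have the same shape (..., 3); the result is in the
    same unit as the inputs (e.g., mm).
    """
    xp = np.asarray(x_performer, dtype=float)
    xa = np.asarray(x_android, dtype=float)
    if xp.shape != xa.shape:
        raise ValueError("marker arrays must have the same shape")
    return float(np.linalg.norm(xp - xa, axis=-1).mean())

# Two hypothetical markers: one offset by (3, 4, 0) mm, one coincident.
performer = [[0.0, 0.0, 0.0], [100.0, 50.0, 20.0]]
android = [[3.0, 4.0, 0.0], [100.0, 50.0, 20.0]]
gap_mm = mean_marker_distance(performer, android)
```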
2) The learning of the feedforward network: To show the effect of the feedforward controller, we plot the feedback control input averaged among the joints while learning from the initial weights in Fig. 8. The abscissa denotes the time step (the sampling time is 60 ms). Although the value of the ordinate does not have a direct physical interpretation, it corresponds to a particular joint angle. The performer exhibited various fixed postures. When the performer started to make the posture at step 0, error increased rapidly because network learning had not yet converged. The control input decreases as learning progresses. This shows that the feedforward controller learned so that the feedback control input converges to zero. Fig. 9 shows the average position error of a pair of corresponding markers. The performer also gave an arbitrary fixed posture. The position errors and the feedback control input both decreased as the learning of the feedforward network converged. The result shows the feedforward network learned the mapping from the performer's posture to the android control input, which allows the android to adopt the same posture. The android's posture could not match the performer's posture when the weights of the feedforward network were left at their initial values. This is because the initial network was not given every possible posture in the pre-learning phase. The result shows the effectiveness of the method to evaluate the apparent posture.
Fig. 10. The step response of the android
3) Performance of the system at following fast movements: To investigate the performance of the system, we obtained a step response using the feedforward network after it had learned enough. The performer put his right hand on his knee and quickly raised the hand right above his head. Fig. 10 shows the height of the fingers of the performer and android. The performer started to move at step 5 and reached the final position at step 9, approximately 0.24 seconds later. In this case the delay is 26 steps or 1.56 seconds. The arm moved at roughly the maximum speed permitted by the hardware. The android arm cannot quite reach the performer's position because the performer's position was outside of the android's range of motion. Clearly, the speed of the performer's movement exceeds the android's capabilities. This experiment is an extreme case. For less extreme gestures, the delay will be much less. For example, for the sequence in Fig. 11, the delay was on average seven steps or 0.42 seconds.
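The delays quoted above follow from the 60 ms sampling time: 4 steps correspond to 0.24 s, 26 steps to 1.56 s, and 7 steps to 0.42 s. The sketch below reproduces this conversion and shows one possible way to read a delay off a step response, by comparing when each trajectory first covers a given fraction of its total change. The threshold-crossing rule, the function names, and the sample trajectories are assumptions for illustration; the text does not state how the delay was measured.

```python
import numpy as np

SAMPLING_TIME_S = 0.06   # 60 ms per step, as stated in the text

def steps_to_seconds(steps, dt=SAMPLING_TIME_S):
    """Convert a number of sampling steps to seconds."""
    return steps * dt

def first_crossing(signal, fraction=0.9):
    """Index of the first sample that covers `fraction` of the total change."""
    s = np.asarray(signal, dtype=float)
    start, end = s[0], s[-1]
    threshold = start + fraction * (end - start)
    if end >= start:
        idx = np.nonzero(s >= threshold)[0]
    else:
        idx = np.nonzero(s <= threshold)[0]
    return int(idx[0])

def response_delay_steps(performer, android, fraction=0.9):
    """Delay (in steps) between performer and android threshold crossings."""
    return first_crossing(android, fraction) - first_crossing(performer, fraction)

# Hypothetical hand-height trajectories (arbitrary units).
performer_height = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
android_height = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0]
delay_s = steps_to_seconds(response_delay_steps(performer_height, android_height))
```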
4) The generated android motion: Fig. 11 shows the performer's postures during a movement and the corresponding postures of the android. The value denotes the time step. The android followed the performer's movement with some delay (the maximum is 15 steps, that is, 0.9 seconds). The trajectories of the positions of the android's markers are considered to be similar to those of the performer, but errors still remain, and they cannot be ignored. While we can recognize that the android is making the same gesture as the performer, the quality of the movement is not the same. There are several major causes of this:
• The kinematics of the android is too complicated to represent with an ordinary neural network. To avoid this limitation, it is possible to introduce the constraint of the body's branching in the network connections. Another idea is to introduce a hierarchical representation of the mapping. A human motion can be decomposed into a dominant motion that is at least partly driven consciously and secondary motions that are mainly nonconscious (e.g., contingent movements to maintain balance, such autonomic responses as breathing). We are trying to construct a hierarchical representation of motion not only to reduce the computational complexity of learning but to make the movement appear more natural.
• The method deals with a motion as a sequence of postures; it does not precisely reproduce higher order properties of motion such as velocity and acceleration because varying delays can occur between the performer's movement and the android's imitation of it. If the performer moves very quickly, the apparent motion of the android differs. Moreover, a lack of higher order properties prevents the system from adequately compensating for the dynamic characteristics of the android and the delay of the feedforward network.
• The proposed method is limited by the speed of motion. It is necessary to consider the higher order properties of motion to overcome the restriction, although the android has absolute physical limitations such as a fixed compliance and a maximum speed that is less than that of a typical human being.
Although physical limitations cannot be overcome by any control method, there are ways of finessing them to ensure movements still look natural. For example, although the android lacks the opponent musculature of human beings, which affords a variable compliance of the joints, the wobbly appearance of such movements as rapid waving, which are high in both speed and frequency, can be overcome by slowing the movement and removing repeated closed curves in the joint angle space to eliminate lag caused by the slowed movement. If the goal is humanlike movement, one approach may be to query a database of movements that are known to be humanlike to find the one most similar to the movement made by the performer, although this begs the question of where those movements came from in the first place. Another method is to establish criteria for evaluating the naturalness of a movement [10]. This is an area for future study.

C. Required improvement and future work
In this paper we focus on reproducing positional changes at the body's surface rather than changes in the joint angles to generate the android's movement. Fig. 5(a) is a straightforward method to implement the idea. This paper has adopted the transformation T from marker positions to estimated joint angles because it is difficult to analytically derive a feedback controller that produces the control input ∆q_b only from the marker's positional error ∆x_d. We actually do not know which joints should be moved to remove a positional error at the body's surface. This relation must be learned; however, the transformation T could disturb the learning. Hence, it is not generally guaranteed that the feedback controller which converges the estimated joint angle q̂_a to q̂_h enables the marker's position x_a to approach x_h. The assumption that the android's body surfaces are reasonably close to the performer's could avoid this problem, but the feedback controller shown in Fig. 5(a) is essentially necessary for mapping the apparent motion. It is possible to find out how the joint changes relate to the movements of body surfaces by analyzing the weights of the neural network of the feedforward controller. A feedback controller could be designed to output the control input based on the error in the marker's position with the analyzed relation. Concerning the design of the feedback controller, Oyama et al. [22], [23], [24] proposed several methods for learning both feedback and feedforward controllers using neural networks. This is one potential method to obtain the feedback controller shown in Fig. 5(a). Assessment of and compensation for deformation and displacement of the human skin, which cause marker movement with respect to the underlying bone [25], are also useful in designing the feedback controller.
We have not dealt with the android's gaze and facial expressions in the experiment; however, if gaze and facial expressions are unrelated to hand gestures and body movements, the appearance is often unnatural, as we have found in our experiments. Therefore, to make the android's movement appear more natural, we have to consider a method to implement the android's eye movements and facial expressions.

V. CONCLUSION
This paper has proposed a method of implementing humanlike motions by mapping their three-dimensional appearance to the android using a motion capture system. By measuring the android's posture and comparing it to the posture of a human performer, we propose a new method to evaluate motion sequences along bodily surfaces. Unlike other approaches that focus on reducing joint angle errors, we consider how to evaluate differences in the android's apparent motion, that is, motion at its visible surfaces. The experimental results show the effectiveness of the evaluation: the method can transfer human motion. However, the method is restricted by the speed of the motion. We have to introduce a method to deal with the dynamic characteristics and physical limitations of the android. We also have to evaluate the method with different performers. We would expect to generate the most natural and accurate movements using a female performer who is about the same height as the original woman on which the android is based. Moreover, we have to evaluate the human likeness of the visible motions by the subjective impressions the android gives experimental subjects and the responses it elicits, such as eye contact [26], [27], autonomic responses, and so on. Research in these areas is in progress.

Fig. 3. Examples of motion and facial expressions
Fig. 5 panel titles: (b) Error estimation with the android's joint angle measured by the potentiometer; (c) Error estimation with the android's joint angle estimated from the android's marker position

Fig. 7. The representation of the marker positions. A marker's diameter is about 18 mm.

Fig. 8. The change of the feedback control input with learning of the network

Fig. 9. The change of the position error with learning of the network

Fig. 11. The generated android's motion compared to the performer's motion. The number represents the step.