Exploring and Understanding the High Dimensional and Sparse Image Face Space: a Self-Organized Manifold Mapping



Introduction
Face recognition has motivated several research studies in recent years owing not only to its broad applicability and inherently multidisciplinary character, but also to its important role in human relationships. Despite extensive studies on face recognition, a number of related problems have remained challenging in this research topic. It is well known that humans can outperform any computer program in the task of face recognition when artefacts are present, such as changes in pose, illumination, occlusion and aging. For instance, young children can robustly identify their parents, friends and common social groups without any previous explicit teaching or learning. Recent research in Neuroscience (Kandel et al., 2000; Bakker et al., 2008) has shed new light on how humans deal with such high-dimensional and sparse visual recognition tasks, indicating that the brain does not memorize all details of the visual stimuli (images) to perform face recognition (Brady et al., 2008). Instead, our associative memory tends to work essentially on the most expressive information (Bakker et al., 2008; Oja, 1982). In fact, theoretical models (Treves and Rolls, 1994; O'Reilly and Rudy, 2001; Norman and O'Reilly, 2003) have indicated that the ability of our memory relies on the capability of orthogonalizing (pattern separation) and completing (pattern prototyping) partial patterns in order to encode, store and recall information (O'Reilly and McClelland, 1994; Kuhl et al., 2010). Therefore, subspace learning techniques have a close biological inspiration and are plausible computational methods for exploring and understanding the human behaviour of recognizing faces. The aim of this chapter is to study the unsupervised subspace learning technique called Self-Organizing Map (SOM) (Kohonen, 1982; Kohonen, 1990) based on the principle of prototyping face image observations.
Our idea with this study is not only to seek a low-dimensional Euclidean embedding subspace of a set of face samples that describes the intrinsic similarities of the data (Kitani et al., 2006; Giraldi et al., 2008; Thomaz et al., 2009; Kitani et al., 2010), but also to explore an alternative mapping representation based on topologically constrained manifold models.
More specifically, the purpose of this work is to navigate on the locally optimal pathways composed of the SOM neurons, minimizing inappropriate mappings where the standard SOM might show significant discontinuities, and to compare such visualization procedures on the original image space to understand the most important information captured by the unsupervised model. To minimize image variations that are not necessarily related to differences between the faces, we carry out experiments on frontal face images available from two distinct public face databases that have been previously aligned using affine transformations and the directions of the eyes as a reference measure. In this way, the pixel-wise features extracted from the images correspond roughly to the same location across all subjects. In addition, in order to reduce the surrounding illumination and some image artefacts due to distinct hairstyles and adornments, all the frontal images have been cropped to the size of 193×162 pixels, had their histograms equalized and have been converted to 8-bit gray scale. Our experimental results on the two distinct face image sets show that although the standard SOM can explain the general information extracted by its neurons, its intrinsic self-organized manifolds can be better described by an algorithm based on the principle of the locally optimal pathways and the idea of navigating on the graphs composed of the standard SOM neurons.

The remainder of this chapter is organized as follows. In the next section, we briefly review some literature about perceptual and cognitive processes related to human memory and the mechanisms of pattern completion and pattern separation. Next, in the third section, we provide some background definition of SOM and briefly highlight the biological principle of organization that inspired Kohonen in the early eighties. Also, in the same section, we introduce the standard SOM algorithm based on the competitive learning rule.
The main contribution of the chapter is then presented in the subsequent subsection entitled A Self-Organized Manifold Mapping (SOMM) Algorithm. In this subsection, we describe a new algorithm that is able to make sense of the information extracted from the data, identifying and explaining the nature of the groups or clusters defined by the SOM manifolds. The two distinct public face databases used to carry out the experiments are described in the fourth section. Next, in the fifth section, we show several experimental results to demonstrate the effectiveness of the SOMM algorithm in providing an intuitive explanation of the topologically constrained manifolds modelled by SOM in well-framed face image analysis. Finally, in the last section of the chapter, we conclude this work, summarizing its main points.

Neurological and psychological aspects
Several perceptual and cognitive processes guide the task of face recognition in humans; one of the most important is memory. Humans do not memorize all the details and features received by the sensory system (Purves et al., 2001). In fact, the human brain has an outstanding capability of forgetting useless information (Brady et al., 2008; Purves et al., 2001). Basically, human memory can be divided into two groups: declarative and non-declarative memory (Purves et al., 2001). Declarative memory is related to memorizing facts and events and can be accessed for conscious recollection. Facts are information learned during a high-level cognition process, such as studying some specific subject. Events are information that one has had as a life experience, for example a birthday or a wedding. Non-declarative memories, on the other hand, are information that cannot be accessed formally.
In other words, it cannot be explained explicitly in words, and neither can how it occurs. Examples of non-declarative memory are physical skills, such as swimming or riding a bicycle, and emotional responses, such as fear or happiness. Additionally, memories are also categorized as short-term and long-term memory (Purves et al., 2001). Short-term memories have a limited capacity to hold information and consequently retain it during a short period of time (Anderson, 2005), while long-term ones tend to retain it permanently. The process that converts information into long-term memory is known as memory consolidation (Bear, Connors and Paradiso). Memory consolidation is part of our learning process and is strongly necessary, for instance, for the face-matching task (Kandel et al., 2000). The brain area responsible for storing declarative memory is called the Medial Temporal Lobe (MTL) (Bear, Connors and Paradiso). The MTL is a complex interconnected system of the brain, and one of its most important structures is the hippocampus. Recent experiments carried out on rats have shown that lesions in the hippocampus might affect the capability of learning and retaining information (Bear, Connors and Paradiso). Yet, in the past, a computational model presented by Treves & Rolls (Treves and Rolls, 1994) had already indicated that some parts of the hippocampus seem to create a sparse and orthogonalized representation of our sensory input and episodic memories. Currently, there is no doubt that the hippocampus plays an important role in encoding new episodic memories and, additionally, in preventing the risk of forgetting past memories (Kandel et al., 2000; Kuhl et al., 2010). Using high-resolution (1.5 millimetres isotropic voxels) functional Magnetic Resonance Imaging (fMRI), Bakker et al. (Bakker et al., 2008) have studied the activity in the human brain MTL area in a set of pattern visualization experiments.
The experiments consisted of presenting to each one of a total of eighteen volunteers a sequence of pictures of common objects, such as apples, toy ducks, thread balls and wall outlets. The set of pictures used is composed of 144 subsets of slightly different images of the same object, with variation essentially in pose and rotation. The authors have noticed that several brain structures of the MTL area, especially a specific area of the hippocampus named CA1, have been activated when pictures of the same object have been presented repetitively and in an interleaved way. In fact, our brain's process of retrieving information can be further described by two main mechanisms: pattern completion and pattern separation (Kuhl et al., 2010). The mechanism of pattern completion is essentially related to the problem where the incoming pattern of some sensory input and the pattern stored in memory are not exactly the same, but share some similarities. In the mechanism of pattern separation, the similarities between the incoming and stored patterns, if they exist at all, are minimal, and both patterns have, in contrast, a strong degree of dissimilarity that can be mathematically considered as non-correlated or orthogonal. This work focuses on the mechanism of pattern completion and the role of the human brain hippocampus as an associative memory to propose a new algorithm for the SOM competitive neural network proposed by Kohonen (Kohonen, 1982). Since this pioneering work, it has been argued that SOM is not only a computational approach for data mining and clustering, but also a credible framework at the functional and neural levels to create a self-organization of the input space and model the human memory activities of encoding and retrieving information.

Self-Organizing Map (SOM)
A formal definition of organization is quite complex because it depends on the context. Some crystal structures are considered highly organized due to their symmetry and structural repetition. Functions and hierarchy organize all biological structures, such as the nervous system, digestive system, circulatory system, etc. (Kandel et al., 2000). However, in both cases, the definitions of "organization" are ambiguous. For crystal structures, one finds symmetries and redundancy; a biological system, on the other hand, is organized by functions. Both definitions nonetheless have in common the sense of similarity that allows us to cluster and hierarchize input patterns. In other words, organization is an association and composition of parts to explore a whole structure or behaviour (Ashby, 1962; Atlan, 1974). According to the definition above, clustering is closely related to similarities or even dissimilarities. SOM is an unsupervised neural network developed by Kohonen (Kohonen, 1982; Kohonen, 1990) based on the biological principle of somatosensory organization. According to Kandel et al. (Kandel et al., 2000), there is a functional organization of perception and movement in the human and mammalian brain. There are also specialized areas in the brain cortex that organize information coming from sensory pathways or going to motor control. The somatosensory cortex is the area accounting for organizing stimuli coming from different sensory systems, grouping them according to their similarities. In a similar fashion, the motor cortex has surfaces dedicated to controlling parts of the body related to movement.
This organization into substructures by function is well known by neuroscientists; however, why the brain creates this organization remains unclear (Purves et al., 2001). Based on the biological principle of organization, Kohonen postulates some reasons for this organization: a) grouping similar stimuli minimizes neural wiring; b) it creates a robust and logical structure in the brain, avoiding "crosstalk"; c) from information organized by attributes, a natural manifold structure of the input patterns can emerge; and d) it reduces dimensionality by creating representations (codebook vectors) that preserve the neighbourhood relationships between input patterns. Each codebook vector, also known as a BMU (Best Matching Unit), retains the most important invariant features that represent a group of input patterns, characterizing an arguable but intuitively analogous behaviour to the pattern completion mechanism of the human brain.

The Standard SOM algorithm
SOM can be defined as an unsupervised artificial neural network that maps a nonlinear relationship between input patterns in a high-dimensional space into an ordered and smoothed mapping of the input data manifold. SOM has a competitive learning rule, but does not have an explicit convergence rule or objective function to minimize. Instead, the SOM algorithm performs a number of iterations of weight adaptation. Figure 1 illustrates a Kohonen network of 3×3 output neurons fully connected to an input layer composed of only two neurons. The network is created from a 2D lattice of 'nodes' composed of the output neurons and the input layer. Each output neuron has a specific position (x, y) in the 2D lattice and contains a vector of weights of the same dimension as the input vector. That is, if the network has m output neurons and the training set consists of n-dimensional vectors, then each of the m weight vectors is also n-dimensional.
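The competitive learning rule described above can be made concrete with a minimal illustrative implementation. The function name, the random weight initialisation and the exponential decay schedules below are our own choices for the sketch, not taken from the chapter:

```python
import math
import random

def train_som(data, rows, cols, n_iter=1000, lr0=0.5, sigma0=None, seed=0):
    """Train a 2D Kohonen SOM on `data` (a list of equal-length vectors).

    Returns a dict mapping each lattice position (i, j) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise each output neuron's weight vector randomly.
    w = {(i, j): [rng.uniform(0.0, 1.0) for _ in range(dim)]
         for i in range(rows) for j in range(cols)}
    for t in range(n_iter):
        x = rng.choice(data)
        # Exponentially decaying learning rate and neighbourhood radius.
        lr = lr0 * math.exp(-t / n_iter)
        sigma = sigma0 * math.exp(-t / n_iter)
        # Competition: find the Best Matching Unit (BMU) for the input x.
        bmu = min(w, key=lambda p: sum((a - b) ** 2 for a, b in zip(w[p], x)))
        # Cooperation: pull the BMU and its lattice neighbours towards x,
        # weighted by a Gaussian of the distance on the 2D lattice.
        for (i, j), wv in w.items():
            d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))
            for k in range(dim):
                wv[k] += lr * h * (x[k] - wv[k])
    return w
```

Because the neighbourhood function `h` decays with lattice distance, nearby neurons develop similar weight vectors, which is what produces the topologically ordered map.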

A Self-organized manifold mapping algorithm
Several studies have provided us with some insight about how to interpret the output of SOMs (Brugger et al., 2008; Bauer & Pawelzik, 1992; Kiviluoto, 1995). One of the best-known tools in this regard is the U-Matrix (Ultsch, 2003), which gives us a quantitative summary of the topological relationships between similar data samples. The result of the U-Matrix map is a complex image (coloured or monochromatic) indicating peaks and valleys that represent Euclidean distances between neighbouring neurons. Essentially, the resulting map preserves the topological distribution at the input space of the entire sample data considered. Figure 2 illustrates an example of a coloured U-Matrix map and its hexagonal 5×4 SOM, where each neuron w_ij, 0 ≤ i ≤ M−1, 0 ≤ j ≤ m−1, has been arbitrarily identified by a number. It is possible to see at least two groups of patterns in blue separated by a central chain in red. The chain of high values in the U-Matrix indicated by the reddish colours represents some prototypes that are far from both groups and probably describe some data outliers with distinct information about the dataset considered. However, to understand the relationship between the information captured in the U-Matrix and the samples, as well as to identify and explain the nature of the groups or clusters defined by the manifolds, it would be helpful to represent all the SOM neurons and their corresponding similarities and dissimilarities on the original data space. Based on the principle of the locally optimal pathway and the idea of navigating on the neurons that compose the SOM, we propose an algorithm named Self-Organized Manifold Mapping (SOMM) that seeks the pathways or manifolds described by the standard SOM. The SOMM algorithm can be described as follows:
a. Calculate the SOM composed of k neurons using the standard Kohonen algorithm.
b. Create the list A of all k neurons and calculate the Euclidean distance between every pair of neurons in A.
c. Store these distances in a symmetric k×k matrix of edge costs.
d. Create the list V of visited nodes and, starting from an arbitrary node r:
d.1. insert each visited node in V;
d.2. record the last visited edge of the graph;
d.3. seek the closest neuron s* to r such that s* does not belong to the last visited edge.
e. If s* ∈ V, a loop has been found and the current pathway is complete.
f. Otherwise, move to s* and repeat from step (d.1).
g. While there are nodes in A still not visited:
g.1. consider the whole set V as a single node of a new graph;
g.2. recompute the distances as in step (b) and seek another pathway, repeating from step (d).

A simple way to explain this algorithm is to understand the output neurons, represented by the weights w_ij computed in the SOM algorithm, as the nodes of a fully connected graph (Cormen et al., 2001; Pölzlbauer, Rauber, Dittenbach, 2005; Mayer, Rauber, 2010) in the parameter space. Each edge in this graph has a cost given by the Euclidean distance between its ends. Therefore, the k×k matrix calculated in steps (b)-(c) is a symmetric matrix holding the edge costs in the graph. More specifically, in step (d) a list V is created and in step (d.1) the algorithm inserts in V each visited node. Given a node r, step (d.3) seeks the closest neuron s* such that s* does not belong to the last visited edge of the graph. This step implements a greedy algorithm that makes the locally optimal choice at each stage, generating a locally optimal pathway that connects a subset of SOM neurons. This is necessary because the idea is to generate a pathway that crosses different clusters without losing the notion of similarity in the parameter space. If s* ∈ V, then we have a loop, like the one exemplified in Figure 3. In this case, the pathway that starts at node 1 ends in the loop 11 → 3 → 4 → 11. Step (e) completes the pathway, which in this figure is composed of the sequence shown, whereas the second path starts at node 13 and ends in the same loop.
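To make the navigation concrete, the greedy pathway construction can be sketched as follows. This is our simplified reading of the pathway-building steps only: it restarts from an arbitrary unvisited node whenever a loop closes a pathway, rather than contracting the visited set into a super-node as step (g) prescribes, and all names are illustrative:

```python
import math

def somm_pathways(weights):
    """Greedy locally-optimal pathway navigation over SOM neurons.

    `weights` maps a node id to its weight vector. Starting from an
    arbitrary node, repeatedly hop to the nearest neuron that is not on
    the last visited edge; a pathway ends when it revisits one of its
    own nodes (a loop). Restart until every neuron belongs to a pathway.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2
                             for x, y in zip(weights[a], weights[b])))

    # Symmetric matrix of pairwise Euclidean edge costs.
    nodes = sorted(weights)
    d = {(r, s): dist(r, s) for r in nodes for s in nodes if r != s}

    unvisited = set(nodes)
    pathways = []
    while unvisited:
        r = min(unvisited)            # arbitrary starting node
        path, prev = [r], None
        unvisited.discard(r)
        while True:
            # Nearest neuron not belonging to the last visited edge (prev, r).
            candidates = [s for s in nodes if s != r and s != prev]
            s_star = min(candidates, key=lambda s: d[(r, s)])
            path.append(s_star)
            if s_star in path[:-1]:   # loop detected: pathway is complete
                break
            unvisited.discard(s_star)
            prev, r = r, s_star
        pathways.append(path)
    return pathways
```

The greedy choice at each hop is what makes the pathway locally optimal: a new neuron is only visited when it is the closest one available, so each pathway traces a chain of maximally similar prototypes.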

Additionally, step (g) identifies nodes still not visited by the algorithm. Following the idea of crossing different clusters, we must allow a node r ∈ A − V to be connected with a node s ∈ V, like the node r = 12 shown in Figure 3. In terms of the algorithm, this is equivalent to considering V as a single node in a new graph (steps (g.1)-(g.2)), computing the new distances as in step (b) and seeking another pathway as before. Therefore, this novel algorithm brings the possibility of uncovering clusters not visible with the U-Matrix technique or the standard SOM approach.
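Under this reading of step (g), the cost of the edge between an unvisited node r and the super-node formed by the visited set V must be some aggregate of the pairwise costs; the chapter does not state which, so the single-linkage (minimum) choice in this hypothetical helper is our assumption:

```python
def contracted_cost(dist, visited, r):
    """Edge cost from an unvisited node r to the super-node formed by the
    visited set V, taken here as the minimum pairwise distance between r
    and any member of V (a single-linkage assumption).

    `dist` is a dict of pairwise edge costs keyed by (node, node) pairs.
    """
    return min(dist[(r, s)] for s in visited)
```

With this aggregate, the next pathway naturally attaches to the existing structure at its most similar prototype, as with node 12 in Figure 3.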

Face databases
We have used frontal images of two distinct publicly available face databases to carry out the experiments. The first database is maintained by the Department of Electrical Engineering of FEI, São Paulo, Brazil (Thomaz and Giraldi, 2010). In this dataset, the number of subjects is equal to 200 (100 men and 100 women) and each subject has two frontal images (one with a neutral or non-smiling expression and the other with a smiling facial expression), so there is a total of 400 images, with no significant differences in skin colour, to perform the high-dimensional and sparse face image analysis. The second dataset is the well-known FERET database (Phillips et al., 1998). In the FERET database, we have considered only 200 subjects (107 men and 93 women), each with two frontal images (one with a neutral or non-smiling expression and the other with a smiling facial expression), providing a total of 400 images, with significant differences in skin colour, for the experiments as well. To minimize image variations that are not necessarily related to differences between the faces, we previously aligned all the frontal face images using affine transformations and the directions of the eyes as a reference measure, so that the pixel-wise features extracted from the images correspond roughly to the same location across all subjects. Also, in order to reduce the surrounding illumination and some image artefacts owing to distinct hairstyles and adornments, all the frontal images were cropped to the size of 193×162 pixels, had their histograms equalized and were then converted to 8-bit gray scale. Figure 4 illustrates some samples of the FEI (top row) and FERET (bottom row) datasets, highlighting samples of distinct gender, age, facial expression and ethnicity.
Fig. 4. Some samples of the FEI (top row) and FERET (bottom row) frontal images used in the experiments after the pre-processing procedure that aligned, cropped and equalized all the original images to the size of 193×162 pixels.
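Of the pre-processing steps above, histogram equalization is the easiest to make concrete. The following is a minimal numpy sketch of that single step only (the affine eye-based alignment and the cropping are omitted, and the function name is ours):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image (H x W uint8 array),
    mapping the cumulative intensity distribution onto [0, 255]."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-zero CDF value
    if cdf_min == img.size:            # constant image: nothing to equalize
        return img.copy()
    # Classic equalization formula: stretch the CDF over the full range.
    lut = np.round((cdf - cdf_min) / float(img.size - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                    # apply the lookup table pixel-wise
```

Equalizing the histograms in this way spreads the intensity values over the full 8-bit range, which reduces the influence of the surrounding illumination mentioned above.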

Experimental results
All the experiments have been carried out using the well-known SOM Toolbox for Matlab created and released by CIS, Helsinki University of Technology (Vesanto, 1999). To address the memory issues related to computing the SOM on high-dimensional datasets, instead of analysing the SOMM algorithm directly on the pre-processed FEI and FERET face images, Principal Component Analysis (PCA) (Fukunaga, 1990) has been applied first to provide dimensionality reduction. However, in order to reproduce the total variability of the sample data, we have composed the PCA transformation matrix by selecting all the principal components with non-zero eigenvalues. Although some of these principal components might represent information that is not relevant to understanding the differences between the data samples, we are able to represent and further reconstruct the original images without adding any dimensionality reduction artefacts (Kitani et al., 2010). We have divided our experimental results into two parts. Firstly, we have carried out some face image analyses to understand and visualize the pathways found by the SOMM algorithm where there are subtle differences between the data samples; for this, we have used a subset of the FEI database composed of non-smiling and smiling face images of females only. Then, in the second part, we have investigated the usefulness of the SOMM algorithm in exploring and understanding the high-dimensional and sparse face image space where the differences between the samples are related not only to facial expression but also to gender, ethnicity and age. The goal of the second experiment is to pose an alternative analysis where the differences between the samples are evident, using the whole FEI and FERET datasets described in the previous section.
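This PCA step can be sketched with a thin SVD, which keeps every component with a non-zero eigenvalue without ever forming the full covariance matrix of the vectorised images; the function name and the numerical tolerance below are our choices:

```python
import numpy as np

def pca_full_rank(X):
    """Project samples onto all principal components with non-zero eigenvalues.

    X: (n_samples, n_features) array with n_features >> n_samples, as with
    vectorised face images. Returns the projection, the basis and the mean,
    so the original images can be reconstructed without dimensionality
    reduction artefacts.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD avoids the huge n_features x n_features covariance matrix.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    nonzero = S > 1e-10 * S.max()   # components with non-zero variance
    basis = Vt[nonzero]             # principal directions (rows)
    proj = Xc @ basis.T             # low-dimensional representation
    return proj, basis, mean
```

Because every component with non-zero eigenvalue is retained, `proj @ basis + mean` reconstructs the original samples exactly (up to floating-point error), matching the property claimed in the text.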
Figure 5 illustrates the standard SOM (top left), the pathways described by the SOMM algorithm (bottom left) and their corresponding visualization (top and bottom right) on the original face space, using a subset of the FEI database composed of non-smiling and smiling face images of females only. It is important to highlight that since the SOMM navigation is based on the principle of the locally optimal path, it is only possible to visit a new neuron when its distance is minimal with respect to all the other neurons previously visited. Therefore, the algorithm explicitly describes the discontinuities present in the high-dimensional face image space due to the limited number of input samples. In other words, it is possible to see that SOMM could not find a unique graph that defines a single locally optimal path from non-smiling to smiling female face images. In fact, as shown on the bottom right part of Figure 5, we can see three feasible pathways or clusters: (1) samples that describe a definite smiling facial expression; (2) samples that describe the visual differences from non-convincing to convincing smiling facial expressions; (3) samples that describe the visual differences from non-convincing to convincing non-smiling facial expressions. In the next two figures, we show the behaviour of the SOMM algorithm when navigating high-dimensional and sparse face image spaces where the differences between the samples are related not only to facial expression but also to gender, ethnicity and age. Figure 6 illustrates the standard SOM (top left), the pathways described by the SOMM algorithm (bottom left) and their corresponding visualization (top and bottom right) on the original face space, using the whole set of frontal face images of the FEI database with both gender and facial expression differences. Analogously to the previous results, three clusters have been found by the SOMM algorithm.
Despite the gender differences available in this dataset, SOM has not clearly extracted this information in its standard mapping, and neither has SOMM described it in a separate pathway or cluster. The smallest SOMM cluster, composed of 6 neurons, shows samples that describe a definite smiling facial expression with slightly more male facial traits than female ones. A similar description is valid for the second smallest SOMM cluster, composed of 8 neurons, but rather with more female facial traits. However, the largest cluster clearly shows that the most expressive information captured by SOMM has been related to changes in facial expression, no matter the gender of the subjects analysed. The last experimental results, using the FERET dataset, are presented in Figure 7. It can be seen that the main expressive information captured by SOM has been based on ethnicity and facial expression changes. The visualization of the standard SOM, illustrated on the top right part of Figure 7, shows clearly how the dataset has been generally spread along the high-dimensional face image space. When we move from top to bottom, we are able to see differences related mainly to ethnicity, no matter the facial expression or gender of the subjects. Besides, navigation on the SOM neurons from left to right highlights essentially information about changes in facial expression with minor differences related to gender and ethnicity features. However, not all these pathways are feasible owing to the discontinuities of the high-dimensional and sparse face image space. In fact, as described by the SOMM algorithm, there are only five clusters along which it is possible to move based on the principle of the locally optimal path. Therefore, although the standard SOM can explain the general information extracted by its neurons, its intrinsic self-organized manifolds have only been explicitly explained by the SOMM algorithm.

Conclusion
In this chapter, we proposed and implemented a self-organized manifold mapping algorithm that allows a better understanding of the information captured by the standard SOM neurons. The method is able not only to identify and explain the nature of the clusters defined by the SOM manifolds, but also to represent all the SOM neurons and their corresponding similarities and dissimilarities on the original data space. To describe the possible self-organized pathways for navigating the high-dimensional and sparse face image space, we constructed a neighbourhood graph on the SOM neurons based on the principle of the locally optimal path. Such a graph visualization method explicitly provides information about the number of clusters that describe the sample data under investigation, as well as the specific features extracted and explained by them. We believe that the proposed algorithm might be a powerful tool in SOM analysis, providing an intuitive explanation of the topologically constrained manifolds modelled by SOM and highlighting some perceptual properties commonly present in well-framed face image analysis, such as facial expression, ethnicity and gender.

Acknowledgment
Portions of research in this paper use subsets of the FERET database of facial images collected under the FERET program.

How to reference
In order to correctly reference this scholarly work, feel free to copy and paste the following: Edson C. Kitani, Emilio M. Hernandez, Gilson A. Giraldi and Carlos E. Thomaz (2011). Exploring and Understanding the High Dimensional and Sparse Image Face Space: a Self-Organized Manifold Mapping, New Approaches to Characterization and Recognition of Faces, Dr. Peter Corcoran (Ed.), ISBN: 978-953-307-515-0, InTech, Available from: http://www.intechopen.com/books/new-approaches-to-characterization-and-recognition-of-faces/exploring-and-understanding-the-high-dimensional-and-sparse-image-face-space-a-self-organized-manifo