Face and ECG Based Multi-Modal Biometric Authentication

A biometric system is essentially a pattern recognition system. It measures and analyses physiological characteristics of the human body, such as the face and facial features, fingerprints, eyes, retinas, irises and voice patterns, or behavioral characteristics, for enrollment, verification or identification (Bolle & Pankanti, 1998). Uni-modal biometric systems are limited in performance and accuracy, and over the last few decades multi-modal biometric systems have become very popular. The main objective of multi-biometrics is to reduce one or more of the false accept rate, the false reject rate and the failure-to-enroll rate. Face Recognition (FR) is still considered one of the most challenging problems in pattern recognition. Current FR systems try to recognize the human face in video sequences as a 3D object (Chang et al., 2003; 2005) in unconstrained conditions, in contrast to the early attempts on 2D frontal faces in controlled conditions. Despite the research effort spent, there is still no single, clearly defined solution to the problem of Face Recognition, which leaves it an open question. One of the key aspects of FR is its application, which also acts as the major driving force for research in the area. The applications range from law enforcement to human-computer interaction (HCI). The systems used in these applications fall into two major categories: systems for identification and systems for verification (Abate et al., 2007). The first group attempts to identify a person in a database of faces and to extract personal information. These systems are widely used, for instance, in police departments for identifying people in criminal records. The second group finds its main application in security, for example to gain access to a building, where the face is used as a more convenient biometric.
The more general HCI systems include not only identification or verification, but also tracking of a human in a complex environment, interpretation of human behavior and understanding of human emotions. Another biometric modality that we use in our approach is the electrocardiogram (ECG). The modern concept for ECG personal identification is to extract the signal features using transform methods rather than time-domain parameters (amplitudes, slopes, time intervals). The proper recognition of the extracted features and the combination of different biometric modalities in intelligent video surveillance systems are the novel steps that we introduce in this work.


Recognition of facial images

Framework for face recognition
In a real-case scenario, human faces often appear in scenes with a complex background rather than as a single object. In addition, their appearance varies due to different lighting conditions, changes in pose, facial expressions, etc. Thus, a reliable FR system must be robust to noise and to such variations, and must be able to work in real time. To meet these requirements we propose the framework for recognition of facial images depicted in Fig. 1. The framework consists of three stages, namely Face Detection (FD), Subspace Projection (SP) and Classification.

Fig. 1. Pictorial depiction of facial recognition framework
The purpose of the FD stage is to locate a human face in a scene and extract it as a single image. In this work, we propose a combination of two classifiers for rapid and accurate FD. The first one is faster but less precise, while the second compensates for the imprecision of the first. The second stage of the proposed framework, SP, is used for dimensionality reduction of the detected facial images, which are represented as vectors in a high-dimensional Euclidean space. To alleviate the curse of dimensionality, it is necessary to transform them from the original high-dimensional space to a low-dimensional one. The SP is based on the Principal Component Analysis (PCA) and Spectral Regression (SR) algorithms. PCA discovers the subspace which contains all vectors of facial images, and we use it mainly to remove noise. PCA also preserves Euclidean distances between vector pairs in the subspace. Based on this, further dimensionality reduction is done with the SR algorithm, which is robust with respect to variations of lighting conditions and facial expressions. Finally, we perform classification in the subspace using a Support Vector Machine classifier. In the following, the three stages of the proposed FR framework are discussed in detail.

Face detection
A combination of a rapid cascaded classifier and an accurate monolithic one is used as a two-level Face Detection algorithm. The first level is a cascade of weak classifiers based on Haar-like features, which is responsible for fast detection of face-like objects. The second level is a Convolutional Neural Network (CNN) used to filter out falsely detected faces. The cascade of weak classifiers allows face candidates to be detected very quickly. It consists of one or more weak classifiers. The input of a weak classifier is a Haar-like feature with the value (Viola & Jones, 2004):

$$\mathrm{Feat}(x) = s_w \cdot SUM_w + s_b \cdot SUM_b \qquad (1)$$

where $x$ is a sub-window of the input image, $s_w$ and $s_b$ are the weights of the whole rectangle and of its black part respectively, and $SUM_w$ and $SUM_b$ are the sums of pixels of the whole rectangle and of its black part. The output value of a weak classifier is obtained by comparing $\mathrm{Feat}(x)$ with the weak classifier's threshold $\Theta$. The cascade of weak classifiers is a linear combination of weak classifiers (Viola & Jones, 2004), in which $T$ is the number of weak classifiers and $\eta_t$ is the weight of the $t$-th weak classifier. The AdaBoost algorithm (Freund & Schapire, 1997) is used for training the cascade and for selecting the most important Haar-like features. The second level uses a Convolutional Neural Network (Lecun et al., 1998), which is more robust to variations of the input image than other known classifiers.
The output value of a neuron with a bipolar sigmoid transfer function at coordinates $(m, n)$ of the $p$-th plane of the $l$-th layer is computed from the neuron's weighted sum $WSUM$ (Kurylyak et al., 2009), where $x$ is the image of the input face candidate. In the expression for $WSUM$ (Kurylyak et al., 2009), $K$ is the number of input planes (as well as of convolutional kernels), $R$ and $C$ are the height and width of the convolutional kernel, $w^{l,p,k}_{r,c}$ is the synaptic weight with coordinates $(r, c)$ in the convolutional kernel between the $k$-th plane of the $(l-1)$-th layer and the $p$-th plane of the $l$-th layer, and $b^{l,p}$ is the bias of the neurons of the $p$-th plane of the $l$-th layer. The CNN uses a sparse structure instead of a fully-connected one, and its number of layers is reduced. In order to increase the processing speed of the neural network, the convolution and subsampling operations are performed in each plane simultaneously (Simard et al., 2003) (Fig. 2).
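As an illustration of equation (1), the following minimal sketch evaluates a two-rectangle Haar-like feature with an integral image, which is what makes the first detection level fast. The function names, the weights `s_w = 1`, `s_b = -2` and the output convention of `weak_classifier` are assumptions made for this example only; they are not taken from the cited works.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column for easy lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_feature(ii, whole, black, s_w=1.0, s_b=-2.0):
    """Feat(x) = s_w * SUM_w + s_b * SUM_b; rectangles are (top, left, h, w)."""
    return s_w * rect_sum(ii, *whole) + s_b * rect_sum(ii, *black)

def weak_classifier(feat, theta):
    # Hypothetical convention: output 1 when the feature reaches the threshold.
    return 1 if feat >= theta else 0

# Toy 4x4 sub-window: bright left half, dark right half.
window = np.array([[9, 9, 1, 1]] * 4, dtype=np.float64)
ii = integral_image(window)
feat = haar_feature(ii, whole=(0, 0, 4, 4), black=(0, 2, 4, 2))
```

With these weights the feature responds strongly to a vertical bright/dark edge, and each rectangle sum costs four lookups regardless of the rectangle size.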

Facial features extraction with subspace projection

Principal Component Analysis
Principal Component Analysis (PCA) is a very popular method for dimensionality reduction in the machine learning and statistics communities. It can be viewed as a method that learns the basis vectors spanning a linear subspace, called the principal subspace, which contains all data points. This basis is given by the eigenvectors of the data covariance matrix with non-zero eigenvalues, and the dimension of the subspace is their number. Usually this number is much smaller than the dimension of the ambient space, and dimensionality reduction is performed by projecting the data points onto that subspace. PCA can be formulated in two ways (Bishop, 2007). The first one is called the minimum-error formulation, and its target is to minimize the mean squared error between the data and its projections onto the subspace. In the second approach, called the maximum-variance formulation, the goal is to maximize the variance of the projected data. In this work we follow the latter. Let $\{x_i\}_{i=1}^{M} \subset R^N$ be a data set of measurements of a physical phenomenon, drawn from an unknown probability distribution. We seek a subspace of dimension $D \ll N$ such that the variance of the data projected onto it is maximized. The first step of PCA is the computation of the data covariance matrix:

$$C = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})(x_i - \bar{x})^T \qquad (6)$$

where $\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$ is the mean vector. The variance of the data projected onto a single direction $v \in R^N$ is then:

$$\frac{1}{M}\sum_{i=1}^{M}\left(v^T x_i - v^T \bar{x}\right)^2 = v^T C v \qquad (7)$$

Maximizing the variance means maximizing the quadratic term $v^T C v$. But when $\|v\| \to \infty$ it follows that $v^T C v \to \infty$, hence the problem has to be constrained appropriately. This is achieved by the normalization condition $v^T v = 1$. By introducing a Lagrange multiplier we reformulate the constrained problem as an unconstrained maximization:

$$\max_{v}\; v^T C v + \lambda\,(1 - v^T v) \qquad (8)$$

where $\lambda$ is the Lagrange multiplier. Taking the partial derivatives of (8) with respect to $v$ and setting them to zero, we end up with the eigenvalue problem $Cv = \lambda v$. Thus the direction of maximum variance of the projected data is the leading eigenvector, called the principal component.
Additional principal components can be found by calculating a new direction which maximizes (8) and is orthogonal to the previous ones. By arranging the eigenvalues on the diagonal of the diagonal matrix $\Lambda$ and the eigenvectors as columns of the matrix $V$, we can write the eigenvalue problem as $CV = V\Lambda$. By selecting only the $D$ eigenvectors of $V$ corresponding to the leading eigenvalues and arranging them in a matrix $\tilde{V}$, the subspace projection is defined by $\tilde{x}_i = x_i^T \tilde{V}$ for $i = 1, \ldots, M$.
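The maximum-variance derivation can be checked numerically with a short sketch. It assumes the data matrix stores one sample per row and subtracts the mean before projecting (an assumption of this example); the names `pca_fit` and `pca_project` are illustrative only.

```python
import numpy as np

def pca_fit(X, D):
    """X: (M, N) data matrix, one sample per row. Keeps the D leading eigenvectors."""
    mean = X.mean(axis=0)
    Xc = X - mean
    C = Xc.T @ Xc / X.shape[0]             # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1][:D]  # leading eigenvalues first
    return mean, eigvecs[:, order], eigvals[order]

def pca_project(X, mean, V_tilde):
    """Project (centred) samples onto the principal subspace."""
    return (X - mean) @ V_tilde

# Toy data: 200 samples in 5 dimensions with most variance along two directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 0.3, 0.2, 0.1])
mean, V_tilde, eigvals = pca_fit(X, D=2)
Z = pca_project(X, mean, V_tilde)
```

The variance of each projected coordinate equals the corresponding eigenvalue, which is exactly the statement $v^T C v = \lambda$ for a unit-norm eigenvector.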

Spectral regression framework for dimensionality reduction
Dimensionality reduction can be well interpreted in a graph embedding fashion. Such an interpretation is very intuitive and also opens the possibility of developing new approaches to the problem of pattern recognition. An advantage of graph embedding is that it unifies most dimensionality reduction methods, such as Linear Discriminant Analysis (LDA), Locally Linear Embeddings (LLE), Locality Preserving Projections (LPP) and ISOMAP, as proposed in (Yan et al., 2007). Consider a data set $\{x_i\}_{i=1}^{M} \subset R^N$ represented as points in Euclidean space. Each data point can be viewed as a vertex of an adjacency graph $\Gamma = \{X, W\}$ with edges $W$ defined under some rule. Depending on the rule, the graph edges can be a distance or a measure of similarity between data points (vertices of $\Gamma$). The graph Laplacian is defined by $L = D - W$, where $D$ is the diagonal matrix with entries $D_{ii} = \sum_j W_{ij}$. The goal of graph embedding is to find a low-dimensional representation of each vertex while preserving the similarities between vertex pairs. This is achieved by minimizing (Chung, 1997):

$$\sum_{i,j}(y_i - y_j)^2 W_{ij} = 2\, y^T L y \quad \text{subject to } y^T D y = 1 \qquad (9)$$

where $y = [y_1, y_2, \ldots, y_M]^T$ is a map of the graph vertices (data points) onto the real line. Since $L = D - W$, under this constraint the minimization is equivalent to maximizing $y^T W y$. If we select a linear map of the form $y_i = a^T x_i$, the problem (9) reduces to:

$$a^{*} = \arg\max_{a} \frac{a^T X W X^T a}{a^T X D X^T a} \qquad (10)$$

where $X = [x_1, x_2, \ldots, x_M]$ is the data matrix. This approach is called the linear extension of graph embedding, and the optimal solution of (10) is given by the generalized eigenvalue problem:

$$X W X^T a = \lambda\, X D X^T a \qquad (11)$$

With different choices of $W$, (11) can be formulated as LDA, LPP, etc. Solving (11) can be computationally intensive when $W$ is a dense matrix of very high dimension.
To overcome this issue, Cai et al. (2007) proposed a method called Spectral Regression, which casts the generalized eigenvalue problem (11) into a regression framework. The advantages of this approach are: (1) solving a regression problem is less computationally intensive than solving an eigenvalue problem; (2) the regression can be solved with a regularization term that controls the complexity and avoids overfitting; (3) by selecting the regression terms, various properties can be achieved, such as sparsity. Furthermore, if we choose a function in a Reproducing Kernel Hilbert Space, $y_i = \sum_{j=1}^{M} \alpha_j K(x_j, x_i)$, where $K(\cdot, \cdot)$ is a Mercer kernel, SR can be extended to a kernel mode. Algorithmically, SR is performed as follows (Cai, 2009):
1. Construct the weights matrix $W$:
• Set $W_{ij} = 1/l_k$ if $x_i$ and $x_j$ both belong to the $k$-th class, where $l_k$ is the number of samples in it;
• Set $W_{ij} = \delta \cdot s(i, j)$ if $x_i$ is among the $p$ nearest neighbors of $x_j$ or vice versa. The parameter $\delta$ is used to adjust the weight between supervised and unsupervised neighbor information, and $s(i, j)$ is a function evaluating the similarity between $x_i$ and $x_j$. This function can be the heat kernel function $s(i, j) = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$.
2. Responses generation: Solve the eigenvalue problem $Wy = \lambda Dy$ and select the eigenvectors $\{y_k\}_{k=1}^{K}$ corresponding to the $K$ largest eigenvalues.
3. Regression: Ridge regression solves the quadratic regression problem with a Euclidean norm penalty:

$$a_k = \arg\min_{a} \sum_{i=1}^{M}\left(a^T x_i - y_i^k\right)^2 + \gamma \|a\|_2^2 \qquad (12)$$

In Lasso regression, the regression problem is solved with an $L_1$-norm penalty:

$$a_k = \arg\min_{a} \sum_{i=1}^{M}\left(a^T x_i - y_i^k\right)^2 + \gamma \|a\|_1 \qquad (13)$$

where $y_i^k$ is the $i$-th element of $y_k$ and $\gamma \ge 0$ is the regularization parameter. In both (12) and (13) the solutions are represented by $\{a_k\}_{k=1}^{K} \subset R^N$.
4. Subspace projection: Perform dimensionality reduction by projecting onto a lower-dimensional space, $x \to z = A^T x$, where $A = [a_1, a_2, \ldots, a_K]$ is the matrix of the solution vectors from the regression step.
For the purpose of this work, we select two different regularization terms in the regression step. The first mode, called Ridge, is the standard quadratic regression (12). The dimensionality of the output space in this case is $c - 1$ and $K = c$, where $c$ is the number of classes. The second mode, called Lasso, uses the $L_1$-norm for the regularization term, which induces sparsity on $\{a_k\}_{k=1}^{K}$. We test the proposed framework for face recognition with both modes of SR.
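The four steps of SR in its Ridge mode can be sketched as follows. This is a simplified illustration and not the implementation used in the experiments: it assumes purely supervised weights (no nearest-neighbor term, so $D$ is the identity), removes the trivial constant response by centring, and uses the hypothetical names `spectral_regression_ridge` and `gamma`.

```python
import numpy as np

def spectral_regression_ridge(X, labels, gamma=0.1):
    """X: (M, N) samples in rows, labels: (M,) ints. Returns the mean and A (N, c-1)."""
    classes = np.unique(labels)
    M, N = X.shape
    # Step 1: supervised weights, W_ij = 1/l_k when x_i and x_j share class k.
    W = np.zeros((M, M))
    for k in classes:
        idx = np.flatnonzero(labels == k)
        W[np.ix_(idx, idx)] = 1.0 / idx.size
    # Step 2: responses from W y = lambda D y, solved in the symmetric form
    # D^{-1/2} W D^{-1/2} u = lambda u with y = D^{-1/2} u.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    eigvals, U = np.linalg.eigh(S)
    top = np.argsort(eigvals)[::-1][:classes.size]
    Y = U[:, top] / np.sqrt(d)[:, None]
    # Drop the trivial constant response (valid here because D = I),
    # which leaves c - 1 informative responses.
    Y = Y - Y.mean(axis=0)
    Y = np.linalg.svd(Y, full_matrices=False)[0][:, :classes.size - 1]
    # Step 3: ridge regression of the responses on the centred data.
    mean = X.mean(axis=0)
    Xc = X - mean
    A = np.linalg.solve(Xc.T @ Xc + gamma * np.eye(N), Xc.T @ Y)
    return mean, A

# Toy data: 3 classes, 10 samples each, 5 dimensions.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
centers = 3.0 * np.eye(5)[:3]
X = centers[labels] + 0.1 * rng.normal(size=(30, 5))
mean, A = spectral_regression_ridge(X, labels)
Z = (X - mean) @ A        # Step 4: subspace projection z = A^T x
```

For three classes the projected space has two dimensions, in line with the $c - 1$ output dimensionality of the Ridge mode; only a linear system is solved on the data, never the dense eigenproblem (11).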

Classification of facial images
The Support Vector Machine (SVM) is a supervised learning algorithm used for classification into two classes. The aim of SVM is to find a hyperplane that optimally separates the data. Optimally in this case means that the margin between the nearest data points and the hyperplane is maximized. Unfortunately, in real problems the data is rarely separable by a hyperplane, but it can be separated by a non-linear surface. SVM can be transformed into a non-linear classifier by applying the kernel trick (Vapnik, 1998). In this way the data is mapped implicitly into a higher-dimensional space where it can be separated by a hyperplane.
Let $\{x_i\}_{i=1}^{M} \subset R^N$ be a data set with class labels vector $y = [y_1, y_2, \ldots, y_M]$ such that $y_i \in \{-1, 1\}$, $i = 1, 2, \ldots, M$. Learning the parameters of the support vectors is performed by solving the constrained optimization problem:

$$\min_{\alpha}\; \frac{1}{2}\alpha^T Q \alpha - e^T \alpha \quad \text{subject to } y^T \alpha = 0,\; 0 \le \alpha_i \le C,\; i = 1, \ldots, M \qquad (14)$$

where $C > 0$ is an upper bound, $e$ is a vector with unit elements, $Q$ is an $M \times M$ positive semi-definite matrix defined by $Q_{ij} = y_i y_j K(x_i, x_j)$ and $K(\cdot, \cdot)$ is a Mercer kernel (Vapnik, 1998). The decision function for an unknown sample $x$ is given by:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{M} y_i \alpha_i K(x_i, x) + b\right) \qquad (15)$$

where $b$ is the bias term.

Data set
We apply the proposed framework to a part of the face image database of the Computational Vision group at the California Institute of Technology, USA (Caltech-CV-Group, 1999). The original database contains JPEG images of the faces of 19 persons, male and female, with different lighting, expressions and backgrounds. We prepare a subset of the database with 10 images per person. Next we split this subset into two groups with an equal number of images per class, i.e. 5 images per person in each group.

Experimental setup and testing
For each group of the subset we perform training, followed by testing with the remaining group, and for each run we calculate the recognition rate. The final recognition rate is calculated by averaging the rates of the runs. With this protocol we test our framework with the 'Ridge' and 'Lasso' regression settings of SR, where the latter is tested with different values of sparsity. The dimensionality of the subspace for 'Ridge' and 'Lasso' is 18 and 30 respectively. For the first one it is controlled by the number of classes, and for the second one it is determined by experiments. In Table 1

State of art
The ECG is widely used as a diagnostic tool in cardiology because of its clinical significance and its noninvasive nature. It is a recording of the electrical activity of the human heart over time. The genesis of this electrical activity is discussed in detail in (Malmivuo & Plonsey, 1995). The ECG is recorded by measuring the electrical voltage generated between pairs of electrodes attached to the human body according to defined standards. Usually more than 2 electrodes are used, and the standard for clinical electrocardiography is an ECG acquisition taken from 12 channels, thus "viewing" the electrical activity from 12 different angles. According to the electrode placement, the leads can be divided into two groups: limb leads (RA, LA, LL) and precordial (chest) leads (V1-V6) (Fig. 3). Each of these groups represents the electric field of the heart in a given plane (frontal or horizontal respectively).

Fig. 3. Electrodes placement in standard 12 channel electrocardiography
Obviously, the anatomical differences in the heart muscle among individuals can be "seen" better in the ECG when all 12 channels are used. This type of acquisition is very impractical for real-world applications, so it is important to extract appropriate features using only easily accessible leads, for example on the left and right arm. A typical "healthy" ECG waveform has three distinct regions called waves (Malmivuo & Plonsey, 1995) (Fig. 4); however, this morphology strongly depends on the leads used, the patient's condition, etc. A cardiac cycle starts with the P wave, which represents the depolarization (electrical discharge) of the heart's atria. The QRS complex is a transient signal deflection related to the depolarization of the ventricles. The cardiac cycle ends with the repolarization (electrical recharge) of the ventricles, which is seen in the ECG as the T wave. The remaining regions are referred to as baseline. Research on the use of the ECG as a modality for personal identification started at the beginning of the 21st century. The first published articles in this field concentrate mainly on proving the person-discriminative characteristics of the ECG as well as their relative time invariance. In (Biel et al., 2001) the time domain characteristics of such signals, taken from 20 subjects, are studied (Fig. 4). In the cited article the time domain features were reduced to 7 according to the results of Principal Component Analysis (PCA). It was shown experimentally that it is possible to identify a person using an ECG taken from only one lead.
A similar approach for ECG feature extraction can be seen in (Israel et al., 2005). The time domain characteristics are referred to in that article as analytic features. The signal is preprocessed by bandpass filtering. After this, the peaks of the P wave, R wave and T wave are found as local maxima in a given region. The minimum radius of curvature is used to find the onsets and offsets of these waves. The relevant features are selected using Wilks' Lambda method.
Fig. 4. Graphical representation of time domain ECG characteristics used as features for ECG personal identification (Biel et al., 2001)
The use of analytic ECG features has some disadvantages. Firstly, it requires sophisticated methods for automated ECG segmentation. Secondly, the amplitude parameters of the ECG depend not only on the unique anatomic characteristics of the human heart but also on the electrode contact, electrode placement, etc. Finally, when the signal is highly influenced by noise, automated determination of analytic features with acceptable accuracy could fail. As described above, ECG segmentation is a primary and unavoidable process for any kind of automated technique for the extraction of such features. In order to avoid the need for precise determination of the ECG wave boundaries, an original approach is described in (Plataniotis et al., 2006). Firstly, the signal is divided into subsets in such a way that each subset contains at least two complete cardiac cycles. Secondly, the normalized Autocorrelation Function (ACF) is calculated for a subset. Finally, the Discrete Cosine Transform (DCT) is applied to the ACF. About 40 of the most significant DCT coefficients serve as features for the classification process. The motivation behind using the ACF is its capability of detecting non-random patterns in the signal.
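The ACF/DCT feature chain described above can be sketched as follows. The sketch is an illustration under stated assumptions, not the procedure of (Plataniotis et al., 2006): the number of lags `max_lag`, the unnormalized DCT-II and the choice of simply keeping the first 40 coefficients are assumptions of this example.

```python
import numpy as np

def acf_dct_features(segment, max_lag, n_coeffs=40):
    """Normalized autocorrelation of a segment followed by a DCT-II of the ACF."""
    x = np.asarray(segment, dtype=np.float64)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")          # lags -(n-1) .. (n-1)
    acf = full[x.size - 1 : x.size - 1 + max_lag]   # lags 0 .. max_lag-1
    acf = acf / acf[0]                              # normalization: acf[0] = 1
    n = np.arange(max_lag)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * max_lag))  # DCT-II basis
    return acf, basis @ acf

# Toy periodic signal standing in for a subset with several cardiac cycles.
t = np.arange(1024) / 256.0
signal = np.sin(2 * np.pi * 1.2 * t)
acf, features = acf_dct_features(signal, max_lag=200)
```

No wave boundaries are needed at any point: the only inputs are a window of the signal and the number of lags to keep.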

Methods for automated ECG segmentation
ECG segmentation is a key process in automated ECG identification systems. In healthy persons the ECG has a strong cyclic recurrence. This is exploited in numerous approaches for fully automated segmentation using Hidden Markov Models (HMM) (Hughes et al., 2004; Boumbarov et al., 2009). When dealing with morphological features, it is not necessary to perform the full segmentation. Instead, some simple, fast and still effective methods can be used to extract subsets containing all waves and complexes of a complete cardiac cycle. The approach in (Velchev, 2010) proposes an identification of the R peaks based on morphological filtration and histogram analysis. The morphological filtration is an interaction between the signal $x(t)$ and a predefined simple shape $g(t)$ called the structuring element. The domains of $x(t)$ and $g(t)$ are $X$ and $G$ respectively. The morphological filtration is based on two simple operations: dilation $D(x, g)(t)$ and erosion $E(x, g)(t)$. They are defined as:

$$D(x, g)(t) = \max_{\tau \in G,\; t - \tau \in X}\left[x(t - \tau) + g(\tau)\right] \qquad (16)$$

and

$$E(x, g)(t) = \min_{\tau \in G,\; t + \tau \in X}\left[x(t + \tau) - g(\tau)\right] \qquad (17)$$

The morphological operations Close and Open are defined as follows: $Close(x, g) = E(D(x, g), g)$ and $Open(x, g) = D(E(x, g), g)$.
The operation $Open(x, g)$ smooths the convex peaks of the signal, while $Close(x, g)$ smooths the concave peaks. The morphological filter for emphasizing the QRS complexes is defined by relation (20), where $x'(t)$ is the filtered signal. The chosen structuring element is a "disk". Determining its radius is a highly empirical procedure. However, it is clear that the diameter of the "disk" should be less than $w f_s$, where $w$ is the smallest width of the P and T waves and $f_s$ is the sampling frequency. Fig. 5 shows two examples obtained according to (20) with the "disk" as structuring element. As can be seen, it is possible to choose the parameter of the structuring element in such a way that the QRS complexes remain as close as possible to the original while the P and T waves are considerably suppressed. The next procedure is to find an appropriate threshold for coarse detection of the R peaks. A histogram of the detected local maxima of the signal is calculated. In most cases this histogram is bimodal. The optimal threshold is calculated according to Otsu's method (Otsu, 1979).
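The coarse R peak detection can be illustrated with the sketch below. It departs from the text in two labeled ways: it uses a flat structuring element (a moving window) instead of the "disk", and, as a stand-in for the filter relation (20), it simply subtracts the opening from the signal so that narrow peaks are kept. The synthetic signal and all function names are assumptions of this example.

```python
import numpy as np

def dilate(x, r):
    """Flat dilation D(x, g): moving maximum over 2r+1 samples."""
    p = np.pad(x, r, mode="edge")
    return np.array([p[i:i + 2 * r + 1].max() for i in range(x.size)])

def erode(x, r):
    """Flat erosion E(x, g): moving minimum over 2r+1 samples."""
    p = np.pad(x, r, mode="edge")
    return np.array([p[i:i + 2 * r + 1].min() for i in range(x.size)])

def opening(x, r):
    return dilate(erode(x, r), r)       # Open(x, g) = D(E(x, g), g)

def local_maxima(y):
    i = np.arange(1, y.size - 1)
    return i[(y[i] > y[i - 1]) & (y[i] >= y[i + 1])]

def otsu_threshold(values, bins=64):
    """Otsu's method: maximize the between-class variance of the histogram."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                   # probability of the lower class
    m = np.cumsum(p * centers)          # cumulative first moment
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (m[-1] * w0 - m) ** 2 / (w0 * (1.0 - w0))
    k = np.argmax(np.nan_to_num(between))
    return edges[k + 1]                 # upper edge of the last lower-class bin

# Synthetic signal: narrow "R peaks" every 200 samples, wider "T waves" after them.
n = np.arange(1024)
r_true = np.arange(100, 1000, 200)
x = sum(np.exp(-0.5 * ((n - r) / 1.5) ** 2) for r in r_true)
x = x + sum(0.3 * np.exp(-0.5 * ((n - r - 60) / 10.0) ** 2) for r in r_true)

emphasized = x - opening(x, r=4)        # suppresses waves wider than the window
peaks = local_maxima(emphasized)
threshold = otsu_threshold(emphasized[peaks])
r_peaks = peaks[emphasized[peaks] > threshold]
```

The amplitudes of the local maxima form two well separated groups (residual wide waves and narrow peaks), so the histogram is bimodal and the Otsu threshold falls between them.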

Fig. 5. Results from morphological filtration of an ECG signal with structuring element "disk" with radius 4 (left) and 5 (right). The sampling frequency is 256 Hz
The last procedure in the segmentation process is to find a sample in the baseline between the T wave and the next P wave. This sample has to belong to the smoothest part of the region, and its determination does not have to be extremely precise. Let $z = [z_1, z_2, \ldots, z_{N-1}]$ be a vector whose elements are the indices of the wanted samples, where $N$ is the number of identified R peaks. Each element of $z$ is calculated from $CW^{\psi_{gauss1}}_x$ and $CW^{\psi_{gauss2}}_x$, the continuous wavelet transforms of $x$ with the first and second derivative of the Gaussian as wavelet functions. Here $s$ is the scale of the wavelet transform, $\tau$ is the translation, $r_i$ is the position of the $i$-th detected R peak, and $\Theta_2 > \Theta_1$ are time values measured from the R peak position between which the wanted sample is expected to be found. The value of $s$ should be chosen large enough for noise robustness and small enough to achieve an acceptable accuracy of identification. Fig. 6 shows an example result of the described approach. As a final result of these procedures, the ECG signal is segmented into complete cardiac cycles, with all important information in the middle of the resulting segments. Additionally, the proposed approach does not require any supervised training.
Fig. 6. An example of detection of samples belonging to the baseline between the T wave and the next P wave using a combination of wavelet transforms with the first and second derivative of the Gaussian as wavelet functions

ECG features extraction using linear and nonlinear projections in subspaces
As mentioned above, using the time domain characteristics as unique ECG features for personal identification has significant drawbacks. An alternative approach is the extraction of morphological features from the ECG. These features can be extracted from a whole cardiac cycle, so the need for full segmentation is eliminated. In this sense these features can be considered holistic. They simultaneously capture the amplitude and temporal characteristics of the ECG waves as well as their shape. In this section two approaches for holistic feature extraction are described. The first is based on linear projections in subspaces: PCA and Linear Discriminant Analysis (LDA). The second uses nonlinear versions of PCA and LDA: Kernel Principal Component Analysis (KPCA) and Generalized Discriminant Analysis (GDA). A block diagram of an ECG personal identification system based on the described features is shown in Fig. 7.
Fig. 7. Block diagram for ECG personal identification system based on features extracted using linear or nonlinear projections in subspaces

ECG features extraction using linear projections in subspaces
PCA is a statistical technique for reducing high-dimensional correlated data to a very few coefficients called principal components (Castells et al., 2007). This technique is optimal in the sense that it keeps the Mean Squared Error (MSE) between the original signal and the signal restored from the reduced set of principal components as small as possible. Let the ECG signal be automatically segmented into PQRST complexes. These complexes are aligned and arranged as rows in a matrix. In general they differ in length. The number of columns of the matrix is chosen larger than the maximal expected length of the PQRST complexes. The PQRST complexes are arranged in the matrix in such a way that the R peaks are in the middle of the rows (Fig. 8). The elements in each row before and after the copied PQRST complex are set equal to the first and last sample of the complex respectively.
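The training procedure requires a common training matrix $\tilde{X} = [x_{ij}]_{N \times M}$ built from PQRST complexes taken from all individuals. The normalized training matrix $\tilde{Z} = [z_{ij}]_{N \times M}$ is calculated element-wise according to:

$$z_{ij} = \frac{x_{ij} - \tilde{\mu}_j}{\tilde{\sigma}_j}$$

that is, the row vector $\tilde{\mu}$ is subtracted from every row of $\tilde{X}$ (in matrix form, $\tilde{X} - u\tilde{\mu}$) and the result is divided column-wise by $\tilde{\sigma}$. Here $u$ is a column vector of ones with $N$ elements, $\tilde{\mu} = [\tilde{\mu}_1, \tilde{\mu}_2, \ldots, \tilde{\mu}_M]$ is the mean vector (bias vector) and $\tilde{\sigma} = [\tilde{\sigma}_1, \tilde{\sigma}_2, \ldots, \tilde{\sigma}_M]$ is the vector of standard deviations of the columns of $\tilde{X}$. The covariance matrix $\Sigma_{\tilde{Z}} = [\sigma_{\tilde{Z},ij}]_{M \times M}$ of $\tilde{Z}$ is calculated and decomposed as:

$$\Sigma_{\tilde{Z}} V = V \Lambda$$

where $\Lambda = [\lambda_{ij}]_{M \times M}$ is a diagonal matrix whose main diagonal holds the eigenvalues of $\Sigma_{\tilde{Z}}$. The columns of $V = [\tilde{v}_{ij}]_{M \times M}$ are the eigenvectors of $\Sigma_{\tilde{Z}}$.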
The eigenvalues in $\Lambda$ and the columns of $V$ have to be rearranged in descending order of the eigenvalues. The number of principal components is equal to the original length $M$ of the input observations, so it has to be reduced. The reduced number of principal components $L$ is determined by comparing the elements of $mse(\hat{Z}) = [mse(\hat{Z})_1, mse(\hat{Z})_2, \ldots, mse(\hat{Z})_M]$ with $mse_{tr}$, where $mse(\hat{Z})$ is the vector of MSE values between the original matrix $\tilde{Z}$ and the matrix $\hat{Z}$ restored from a reduced set of principal components, and $mse_{tr}$ is a predetermined value. The elements of $mse(\hat{Z})$ are calculated according to (Sanei & Chambers, 2007). The transformation matrix $W = [w_{ij}]_{M \times L}$ is built from the rearranged columns of $V$. In the process of authentication, the features $Y$ of the authenticated individual are calculated as in (Franc & Hlavac, 2004), where $u$ is a column vector of ones with $N$ elements. Typically, the set of PQRST complexes contains a certain amount of non-specific information. This is due to the presence of noise and artifacts, or it can be caused by an arrhythmia. These non-specific PQRST complexes should be excluded from the analysis. A convenient way to do this is to use the Hotelling statistic $T^2 = [t_1^2, t_2^2, \ldots, t_N^2]$. A given element indicates how far the training sample is from the centroid of the formed cluster (Hotelling, 1931). The criterion for excluding a given PQRST complex from the analysis is $t_i^2 > t_{tr}^2$, where $t_{tr}^2$ is the threshold value.
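The linear feature extraction chain (alignment of the PQRST complexes, normalization, PCA and outlier screening) can be sketched as follows. The sketch makes several assumptions that are not taken from the text: the covariance uses the factor $1/(N-1)$, the transformation matrix keeps the leading eigenvectors, the Hotelling statistic is computed in its usual form $t_i^2 = \sum_j y_{ij}^2 / \lambda_j$, and the threshold `t2_tr` is an arbitrary illustrative value.

```python
import numpy as np

def align_complexes(complexes, r_offsets, width):
    """One complex per row with the R peak in the middle column; the margins
    are filled with the first and last sample of the complex."""
    rows = np.empty((len(complexes), width))
    mid = width // 2
    for i, (c, r) in enumerate(zip(complexes, r_offsets)):
        c = np.asarray(c, dtype=np.float64)
        start = mid - r
        rows[i, :start] = c[0]
        rows[i, start:start + c.size] = c
        rows[i, start + c.size:] = c[-1]
    return rows

def fit_pca(X, n_components):
    mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0                      # guard against constant columns
    Z = (X - mu) / sigma                         # normalized training matrix
    lam, V = np.linalg.eigh(Z.T @ Z / (X.shape[0] - 1))
    order = np.argsort(lam)[::-1][:n_components]
    return mu, sigma, V[:, order], lam[order]    # W = leading eigenvectors

def hotelling_t2(Y, lam):
    """Distance of each sample from the cluster centroid in the PCA subspace."""
    return (Y ** 2 / lam).sum(axis=1)

aligned_example = align_complexes([[1, 2, 5, 2]], [2], width=8)

# Toy training set: 50 noisy copies of a template, one of them heavily corrupted.
rng = np.random.default_rng(1)
template = np.exp(-0.5 * ((np.arange(40) - 20) / 3.0) ** 2)
complexes = [template + 0.05 * rng.normal(size=40) for _ in range(50)]
complexes[7] = complexes[7] + 1.0 * rng.normal(size=40)   # simulated artifact
X = align_complexes(complexes, [20] * 50, width=48)
mu, sigma, W, lam = fit_pca(X, n_components=5)
Y = ((X - mu) / sigma) @ W                       # features
t2 = hotelling_t2(Y, lam)
t2_tr = 3.0 * t2.mean()                          # hypothetical threshold
excluded = np.flatnonzero(t2 > t2_tr)
```

In this toy run the corrupted complex receives by far the largest $t^2$ value, which is the behavior the exclusion criterion relies on.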

ECG features extraction using nonlinear projections in subspaces
Fig. 9 gives an example of the feature distribution of five individuals using two different techniques, PCA and KPCA. As can be seen, the features extracted using PCA are not linearly separable. Despite the more complicated process, the results from KPCA are much better.
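(a) (b)
Fig. 9. Distribution of the first three principal components from ECG signals taken from 5 individuals using PCA (a) and KPCA (b)
According to (Schölkopf et al., 1998), KPCA can be considered as performing PCA in another (usually higher-dimensional) space $F$. The mapping of the original input vector (PQRST complex) into the new space is expressed as $x \to \Phi(x)$. Let $\Phi(x_1), \ldots, \Phi(x_M)$ be the images of the input vectors, where $M$ is the number of observations. An input vector is expressed as $x_j = [x_{j,1}, x_{j,2}, \ldots, x_{j,N}]$, where $N$ is the original dimensionality. The key process behind KPCA is an eigendecomposition of the covariance matrix $\Sigma_{\Phi(X)}$:

$$\lambda v = \Sigma_{\Phi(X)} v \qquad (31)$$

where $v$ is an eigenvector of $\Sigma_{\Phi(X)}$, $\lambda$ is the corresponding eigenvalue and $\Sigma_{\Phi(X)} = \frac{1}{M}\sum_{j=1}^{M}\Phi(x_j)\Phi(x_j)^T$. Taking into account the definition of $\Sigma_{\Phi(X)}$, $v$ is a linear combination of the vectors $\Phi(x_j)$, $j = 1, \ldots, M$:

$$v = \sum_{j=1}^{M} w_j \Phi(x_j) \qquad (32)$$

Using (31) and (32) we obtain:

$$\lambda \sum_{j=1}^{M} w_j \Phi(x_j) = \frac{1}{M}\sum_{j=1}^{M} w_j \sum_{i=1}^{M} \Phi(x_i)\,\langle \Phi(x_i), \Phi(x_j) \rangle \qquad (33)$$

which is equivalent to a system of $M$ equations:

$$\lambda \sum_{j=1}^{M} w_j \langle \Phi(x_k), \Phi(x_j) \rangle = \frac{1}{M}\sum_{j=1}^{M} w_j \sum_{i=1}^{M} \langle \Phi(x_k), \Phi(x_i) \rangle \langle \Phi(x_i), \Phi(x_j) \rangle, \quad k = 1, \ldots, M \qquad (34)$$

In the last expression the inner products in the new space make it possible not to deal directly with $\Phi$, but to use the kernel matrix $K = [k_{ij}]_{M \times M}$:

$$k_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j) \qquad (35)$$

where the operator $\langle \cdot, \cdot \rangle$ stands for the inner product and $k$ is the kernel function. Using the kernel matrix, (34) is transformed into:

$$M \lambda K w = K^2 w \qquad (36)$$

where $w = [w_1, w_2, \ldots, w_M]^T$. The last is equivalent to (Schölkopf et al., 1998):

$$M \lambda w = K w \qquad (37)$$

Determining $w$ for each principal component thus amounts to calculating the eigenvectors of $K$. Let $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_M$ be the full set of ordered eigenvalues of $K$, $w^1, w^2, \ldots, w^M$ the corresponding set of eigenvectors and $\lambda_p$ the first nonzero eigenvalue. According to (Schölkopf et al., 1998), $w^p, \ldots, w^M$ have to be normalized in such a way that $\langle v^k, v^k \rangle = 1$, $\forall k = p, \ldots, M$.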
Using (37), the normalization is expressed as:

$$\lambda_k \langle w^k, w^k \rangle = 1, \quad k = p, \ldots, M \qquad (38)$$

Obtaining the projection of $x$ onto the principal components subspace requires calculating the projections of $\Phi(x)$ onto the eigenvectors $v^k$, $k = p, \ldots, M$, in $F$:

$$\langle v^k, \Phi(x) \rangle = \sum_{j=1}^{M} w_j^k\, k(x_j, x) \qquad (39)$$

A traditional approach for improving the class separability is Linear Discriminant Analysis (LDA) (Theodoridis & Koutroumbas, 2006); however, if the features are not linearly separable, LDA usually fails. It is possible to use a more general approach called Generalized Discriminant Analysis (GDA) (Baudat & Anouar, 2000). Let the input matrix $X$ be composed of columns arranged according to their class membership. The covariance matrix of the class centers $\Sigma_{\bar{\Phi}(X)}$ is:

$$\Sigma_{\bar{\Phi}(X)} = \frac{1}{M}\sum_{c=1}^{C} n_c\, \bar{\Phi}(x_c)\bar{\Phi}(x_c)^T \qquad (40)$$

where $n_c$ is the number of observations belonging to the class $c$, $C$ is the number of classes, and $\bar{\Phi}(x_c) = E\{\Phi(x_c)\}$. For the covariance matrix $\Sigma_{\Phi(X)}$ the following is valid (Baudat & Anouar, 2000):

$$\Sigma_{\Phi(X)} = \frac{1}{M}\sum_{c=1}^{C}\sum_{k=1}^{n_c} \Phi(x_{c,k})\Phi(x_{c,k})^T \qquad (41)$$

where $x_{c,k}$ is the $k$-th observation from the class $c$. The goal of GDA is to maximize the between-class variance and minimize the within-class variance. This is done by the following eigendecomposition:

$$\lambda\, \Sigma_{\Phi(X)} v = \Sigma_{\bar{\Phi}(X)} v \qquad (42)$$

where $v$ is an eigenvector and $\lambda$ is the corresponding eigenvalue. The maximum eigenvalue maximizes the ratio (Baudat & Anouar, 2000):

$$\lambda = \frac{v^T \Sigma_{\bar{\Phi}(X)} v}{v^T \Sigma_{\Phi(X)} v} \qquad (43)$$

The criterion (43) in the space $F$ is:

$$\lambda = \frac{w^T K P K w}{w^T K K w} \qquad (44)$$

where $K$ is the kernel matrix, $w = [w_1, w_2, \ldots, w_M]^T$ is a vector of coefficients for which equation (32) is valid, and $P = [p_{ij}]_{M \times M} = P_1 \oplus P_2 \oplus \ldots \oplus P_c \oplus \ldots \oplus P_C$ is a block-diagonal matrix. The elements of $P_c = [p_{c,ij}]_{n_c \times n_c}$ are:

$$p_{c,ij} = \frac{1}{n_c} \qquad (45)$$

The solution of 45 can be found in (Baudat & Anouar, 2000).
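The KPCA steps, that is, building the kernel matrix, solving the eigenproblem of $K$, normalizing the coefficient vectors and projecting a sample through kernel evaluations, can be sketched as follows. The sketch uses a Gaussian kernel and explicitly centres the kernel matrix in feature space, which the derivation above takes for granted; the function names and the toy data are assumptions of this example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows of A and B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def kpca_fit(X, n_components, sigma2=1.0):
    M = X.shape[0]
    K = gaussian_kernel(X, X, sigma2)
    J = np.full((M, M), 1.0 / M)
    Kc = K - J @ K - K @ J + J @ K @ J          # centring in the feature space F
    lam, Wv = np.linalg.eigh(Kc)                # eigenvectors of the kernel matrix
    order = np.argsort(lam)[::-1][:n_components]
    lam, Wv = lam[order], Wv[:, order]
    Wv = Wv / np.sqrt(lam)                      # enforce lam_k <w^k, w^k> = 1
    return {"X": X, "K": K, "W": Wv, "lam": lam, "sigma2": sigma2}

def kpca_project(model, Xnew):
    """Projections <v^k, Phi(x)> computed only through kernel evaluations."""
    X, K, Wv = model["X"], model["K"], model["W"]
    M = X.shape[0]
    Kt = gaussian_kernel(Xnew, X, model["sigma2"])
    Jt = np.full((Xnew.shape[0], M), 1.0 / M)
    J = np.full((M, M), 1.0 / M)
    Ktc = Kt - Jt @ K - Kt @ J + Jt @ K @ J     # centre the test kernel consistently
    return Ktc @ Wv

# Toy data standing in for aligned PQRST complexes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
model = kpca_fit(X, n_components=3)
Y = kpca_project(model, X)
```

Note that the full $M \times M$ kernel matrix has to be stored here, which is the memory cost that motivates greedy variants of KPCA.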

Results and discussion

Datasets for experiments
The ECG signals for the experiments were collected from 28 individuals using our own ECG registration hardware. The sampling rate is 512 Hz and the resolution is 12 bits. The system was trained using subsets of these signals. The testing was performed two weeks later in order to verify the time invariance of the features.

Experimental results
All experiments with kernel versions of PCA and LDA were made using the Gaussian kernel function $k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$ with $\sigma^2 = 1$. In Table 2
(Biel et al., 2001) 95.0
Table 2. Accuracy of the ECG identification
As can be seen, the results using holistic features extracted with linear projections in subspaces are relatively poor. GDA outperforms all other approaches, but a significant disadvantage of this method is its computational complexity. In addition, the maximal dimensionality of the features is limited to the number of identified individuals minus one. For combining with the facial biometric modality we select the KPCA approach for feature extraction. Despite its lower performance we prefer it because there is an algorithm, called GreedyKPCA, in which the kernel matrix does not have to be stored.

Combining ECG personal identification and face recognition

4.1 Classifier combination approaches
There are different approaches for combining classifiers, depending on their output. If only class labels are available as output, voting schemes can be used for the final decision. If a posteriori probabilities are available, however, different linear combination rules can be applied. We will follow the latter approach. The strategies for combination utilize the fact that the classifier output reflects its confidence, and not the final decision. The confidence of a single classifier is represented by (Duin, 2002):

where $\omega_i$ is the $i$-th class and $i = 1, 2, \ldots, M$ is the class number. In the case of multiple classifiers, the confidence needs to be defined for each of the classifiers $j = 1, 2, \ldots, C$. The output of each classifier can be viewed as a feature vector $z_j$. Then, following the Bayesian rule for optimal classification, a sample $x$ is assigned to class $\omega_i$ if:

Using the Bayesian rule we can express Eq. (46) by the likelihood functions (Kittler et al., 1998):

where $p(z_1, z_2, \ldots, z_C)$ is the unconditional joint probability density. It can be expressed through the conditional distributions and is the same for every class, so the final decision can be calculated from the numerator alone (Kittler et al., 1998). Using only the numerator term allows various classifier combination rules to be obtained (Kittler et al., 1998):

• Min rule: According to this rule a sample $x$ is assigned to class $\omega_i$ if $\min_j P(\omega_i \mid z_j) = \max_n \min_j P(\omega_n \mid z_j)$, $j = 1, 2, \ldots, C$, $n = 1, 2, \ldots, M$.
This rule selects a classifier with least objection against a certain class.
• Product rule: This rule assigns $x$ to class $\omega_i$ if
• Sum rule: This rule can be derived from the product rule and assigns $x$ to class $\omega_i$ if
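The three rules can be sketched in a few lines of code. The sketch below assumes equal class priors, so that the product rule picks the class maximizing $\prod_j P(\omega_n \mid z_j)$ and the sum rule the class maximizing $\sum_j P(\omega_n \mid z_j)$, in the same form as the min rule stated above. With unequal priors the rules of Kittler et al. (1998) carry additional prior terms that are omitted here. The function name `combine_posteriors` and the example probabilities are hypothetical.

```python
import numpy as np

def combine_posteriors(posteriors, rule="product"):
    """Combine classifier outputs with the min, product or sum rule.

    posteriors: array-like of shape (C, M); row j holds P(omega_n | z_j)
                for classifier j over the M classes.
    Assumption: equal class priors, so no prior terms appear in the rules.
    Returns the index of the selected class and the per-class scores.
    """
    p = np.asarray(posteriors, dtype=float)
    if rule == "min":
        scores = p.min(axis=0)    # least support any classifier gives each class
    elif rule == "product":
        scores = p.prod(axis=0)   # product of the classifier posteriors
    elif rule == "sum":
        scores = p.sum(axis=0)    # sum of the classifier posteriors
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(scores)), scores

# Illustrative values only: two classifiers (e.g. ECG and face), three classes.
ecg_probs = [0.7, 0.3, 0.0]
face_probs = [0.1, 0.3, 0.6]
decisions = {rule: combine_posteriors([ecg_probs, face_probs], rule)[0]
             for rule in ("min", "product", "sum")}
```

In this toy case the min and product rules select class index 1, while the sum rule selects class index 0. The product and min rules are sensitive to a single classifier assigning a near-zero probability, whereas the sum rule is more tolerant of it.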

Experimental results
We approach the combination of the Face and ECG biometric modalities by combining the output probabilities of both classifiers using the rules specified in the previous section. For ECG identification we use the output probabilities of a Radial Basis Neural Network classifier, and for the Face Recognition framework we use the LIBSVM library (Chang & Lin, 2001) to calculate the output probabilities of an SVM classifier. For both modalities we select 19 persons for identification, with 5 samples for training and 5 samples for testing. Hence, the output of each classifier is a 19-element vector of probabilities, with 10 samples in total per person per modality. In our experiments we test all of the classifier combination rules considered, and the results are displayed in Table 3. We also compare our work with the best results achieved by Combining Attributes in (Israel et al., 2003).

Table 3. Accuracy of the ECG identification
(Israel et al., 2003)    99.0

The experimental results reveal that combining the probability outputs of the ECG identification and the Face Recognition framework with the Product Rule achieves the best results.

Conclusion
In this work an approach for personal identification based on biometric modality fusion was presented. The presented combination of classifiers is characterized by its high accuracy and it is particularly suitable for precise biometric identification in intelligent video surveillance systems.

Acknowledgement
This work was supported by the National Ministry of Education and Science of Bulgaria under contract DO02-41/2008 "Human Biometric Identification in Video Surveillance Systems", Ukrainian-Bulgarian R&D joint project.

The methods for human identity authentication based on biometrics (the physiological and behavioural characteristics of a person) have been evolving continuously and have seen significant improvement in performance and robustness over the last few years. However, most of the systems reported perform well in controlled operating scenarios, while their performance deteriorates significantly under real-world operating conditions and is far from satisfactory in terms of robustness and accuracy, vulnerability to fraud and forgery, and use of acceptable and appropriate authentication protocols. To address some of these challenges and the requirements of new and emerging applications, and for seamless diffusion of biometrics in society, there is a need for the development of novel paradigms and protocols, and improved algorithms and authentication techniques. This book volume on "Advanced Biometric Technologies" is dedicated to the work being pursued by researchers around the world in this area, and includes some of the recent findings and their applications to address the challenges and emerging requirements for biometric-based identity authentication systems. The book consists of 18 chapters and is divided into four sections, namely novel approaches, advanced algorithms, emerging applications and multimodal fusion. The book was reviewed by editors Dr. Girija Chetty and Dr. Jucheng Yang. We deeply appreciate the efforts of our guest editors: Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park and Dr. Sook Yoon, as well as a number of anonymous reviewers.