A Survey on Soft Biometrics for Human Identification

Growing security demands have shifted the focus of biometric research toward multi-biometrics. Ancillary information extracted from primary biometric traits (face and body), such as facial measurements, gender, skin color, ethnicity, and height, is called soft biometrics. It can be integrated with a primary biometric system to improve its speed and overall performance (e.g., fusing the face with facial marks), or used in a fusion framework to generate a qualitative, human-readable semantic description of a person (e.g., old African male with blue eyes) and to limit the search within a large dataset by gender and ethnicity. This chapter provides a holistic survey of soft biometrics that presents the major works, with a focus on facial soft biometrics, and discusses some of the proposed feature extraction and classification techniques together with their strengths and limitations.


Introduction
Along with the automation of modern life, security issues have become more critical. Questions arise in daily life such as "is this the right person to be allowed to access the system?", "is this the authorized person to perform such action?", and "does this person belong to this country?" [1]. Traditionally, there were two methods for answering these questions: the first is based on "what you have" (ownership factors), such as ID cards, and the second is based on "what you know" (knowledge factors), such as passwords, as shown in Figure 1. However, both can be borrowed, copied, or stolen, and users need to carry many IDs and memorize many passwords. Banks, telecommunication companies, and governments reportedly lose millions of dollars annually because of violations of their password-based and card-based security policies [2]. Biometrics is an open field of research for solving this person identification problem.
Biometrics rely on "what you are" (inherence factors) and so can natively differentiate between a permitted and an unauthorized person [3,4]. Biometric traits offer the following advantages [5]:
• They are unique for each individual.
• They cannot easily be forgotten, stolen, borrowed, shared, or observed.
• They do not vary over time and are always available.
• They cannot easily be transferred to another individual.
A biometric-based security system is almost impossible to fool. The word biometric is a compound of the Greek words bios, meaning life, and metron, meaning measure. Biometrics is sometimes defined as a research area focused on measuring and analyzing a person's unique characteristics [6] in order to identify or verify that person's identity; this is an essential daily task for a security system, which must make sure that its services are available to permitted users only [7]. Biometrics can be divided into traditional, primary, and soft biometrics. Traditional biometrics deal with physical, behavioral, and biological characteristics such as facial features, eye, signature, gait, voice, DNA, and fingerprints, as shown in Figure 2. Soft biometrics are concerned with ancillary characteristics, such as gender, ethnicity, skin color, scars, and height, that provide some information but not enough to identify a person uniquely [8,9]. A behavioral or physiological human feature must fulfill the following requirements to be used as a biometric characteristic [7,10]:
1. Universal: each person has the trait.
3. Resistance to circumvention: not easy to cheat.
4. Distinctive: can be used to differentiate between persons.
5. Permanence: the trait does not change over a period of time.
6. Collectable: the characteristic can be easily collected and measured.

Machine Learning and Biometrics
However, no single biometric feature yet satisfies all of the requirements identified above. As a result, no existing biometric system provides precise, foolproof recognition, which leaves room for improving the recognition accuracy and speed of primary biometrics by using soft biometrics.
This chapter is divided into five sections. Section 2 presents the benefits of soft biometrics, the limitations of unimodal biometric systems and how multimodal biometric systems overcome them, the need for biometric fusion to improve system performance, and the measurement of system performance. Section 3 presents a holistic survey of related work, focusing on facial soft biometrics. Section 4 discusses the challenges and limitations of soft biometrics. Section 5 concludes the work.

Soft biometrics
Soft biometrics provide ancillary information but are not fully distinctive or permanent, so on their own they cannot provide reliable person recognition. However, this ancillary information can still be used as secondary information to complement the primary biometric traits (face, iris, etc.). Soft biometric features can be classified into physique (e.g., skin color, gender, ethnic origin), clothing (e.g., color of clothes), and accessories (e.g., glasses, hat) [11].

Benefits of soft biometrics
• They can be used to improve the recognition accuracy and speed of a primary biometric system [12].
• They can be used when a primary biometric trait is difficult to collect or when the collected data are unclear because of sensor error or because the data were collected from a distance without the cooperation of the user.
• Acceptable: collecting data for identification does not require cooperation between the person and the sensor, and the traits are readily available.
• Soft facial biometrics are not expensive to compute since they can be acquired at the same time as the primary face biometric.
• Enrolling a person needs no cooperation and can be done at a distance, and the system can even be trained offline.
• Soft biometrics bridge the gap between machine and human because they have a semantic meaning that humans understand, such as "old and short African male".
• Soft biometrics do not raise privacy concerns about collecting and storing data because they provide only an ancillary description that is not fully distinctive, such as "old and short male".
• They allow filtering and indexing of a large database to limit the number of searched records according to the person's characteristics [13]; for example, the search can be restricted to female subjects.
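The filtering benefit in the last item can be illustrated with a minimal sketch. The `Record` type, the trait labels, and the `filter_gallery` helper are hypothetical names chosen for illustration; they do not come from any of the surveyed systems.

```python
from dataclasses import dataclass


@dataclass
class Record:
    subject_id: str
    gender: str      # soft trait label, e.g. "female" (hypothetical labels)
    ethnicity: str   # soft trait label


def filter_gallery(gallery, **soft_traits):
    """Keep only the records whose soft traits match every given label."""
    return [
        rec for rec in gallery
        if all(getattr(rec, name) == value for name, value in soft_traits.items())
    ]


gallery = [
    Record("s01", "female", "asian"),
    Record("s02", "male", "african"),
    Record("s03", "female", "african"),
]

# Only the filtered candidates would be passed on to the primary matcher.
candidates = filter_gallery(gallery, gender="female")
print([rec.subject_id for rec in candidates])
```

In a real system the soft trait labels would themselves come from classifiers and may be wrong, so a hard filter like this one can exclude the true match; that trade-off is an assumption to keep in mind rather than something this sketch addresses.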

Biometric system
A biometric system is essentially a pattern recognition system that uses human characteristics to identify a person. A system that uses a single trait is called unimodal, and one that uses more than one trait is called multi-biometric [14]. When developing a reliable biometric system, several concerns need to be analyzed and balanced [7]:
• Harmlessness to the users; for example, a research company reportedly placed a SIM card under the skin for authentication.
• Performance: the highest possible recognition rate and system speed, together with tolerance to the environmental factors affecting the system, stability, and time invariance.
• Acceptability: are people ready to use their biometric trait?
• Circumvention: how easily the system can be overcome or bypassed using fake traits.
• Accessibility: the system is easy to use.
Unimodal systems suffer from low-resolution data caused by the person or the sensor. Because collecting the data requires the cooperation of the user, this can lead to a high failure-to-enroll rate, limited population coverage, and a low recognition rate. It is therefore difficult to reach very high recognition rates with a unimodal system [14]. To improve the recognition rate, more than one trait must be acquired from the same sensor or from multiple sensors; however, as the recognition rate increases, so do the complexity and the processing time.
Some problems associated with unimodal biometric systems can be overcome by multi-biometric systems, which combine information obtained from multiple sources [15]. Still, such a system has two major limitations. First, the overall cost of constructing the system can be prohibitive because of the need for more high-quality sensors, large storage capacity, and computational resources. Second, the system requires a longer time for verification, which inconveniences the users [10]. Soft biometrics can decrease the cost because they use the same sensor [10]. The main steps of a biometric system are as follows [7,16], as shown in Figure 3:
• Enrollment is the first step, in which the biometric traits of the person are collected by the sensor and saved to the dataset as a template for verification and, later on, identification. Successful biometric enrollment is necessary for the next steps.
• Preprocessing enhances the stored data to obtain a high recognition rate, for example by histogram equalization to deal with illumination and by clipping the area of interest.
• Feature extraction derives a feature vector from the individual's data for identification and for matching against the stored template data.
• Template dataset: enrolling means storing biometric data in the dataset as a template against which later samples are compared. In the case of authentication, biometric data are matched against a reference template from the template database.
• Classification and matching: biometric feature data are validated against the template data in the dataset.
• Decision: the individual is rejected or accepted according to the matching similarity score and the threshold value.
A Survey on Soft Biometrics for Human Identification http://dx.doi.org/10.5772/intechopen.76021
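The preprocessing, feature extraction, matching, and decision steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the intensity histogram used as a feature vector, the histogram intersection used as a similarity score, and the threshold of 0.8 are stand-ins chosen for brevity, not the techniques of any system surveyed here.

```python
import numpy as np


def equalize_histogram(image):
    """Histogram-equalize an 8-bit grayscale image (2-D uint8 array)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-zero value of the CDF
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return image.copy()
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]                  # map every pixel through the lookup table


def extract_features(image, bins=16):
    """Toy feature vector (assumption): normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()


def match_score(probe, template):
    """Similarity in [0, 1]: histogram intersection of two normalized histograms."""
    return float(np.minimum(probe, template).sum())


def decide(score, threshold=0.8):
    """Accept when the similarity score reaches the (hypothetical) threshold."""
    return "accept" if score >= threshold else "reject"


# Enrollment: preprocess the sample and store its features as the template.
enrolled = np.array([[50, 50], [100, 150]], dtype=np.uint8)
template = extract_features(equalize_histogram(enrolled))

# Later presentation of the same sample follows the same steps, then matching.
probe = extract_features(equalize_histogram(enrolled))
print(decide(match_score(probe, template)))
```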

A biometric system can work in two modes: identification mode or verification mode. Identification mode works one to many, comparing the individual with all the templates stored in the dataset, while verification mode works one to one, comparing the individual with his or her own template stored in the dataset.
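The difference between the two modes can be sketched as follows. The cosine similarity, the threshold of 0.9, and the toy three-dimensional templates are assumptions made for illustration only.

```python
import numpy as np


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(probe, claimed_id, templates, threshold=0.9):
    """One to one: compare the probe only with the claimed identity's template."""
    return cosine_similarity(probe, templates[claimed_id]) >= threshold


def identify(probe, templates, threshold=0.9):
    """One to many: compare the probe with every template.

    Returns the best-matching identity, or None if no score reaches the threshold.
    """
    best_id, best_score = None, -1.0
    for subject_id, template in templates.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = subject_id, score
    return best_id if best_score >= threshold else None


# Hypothetical enrolled templates (feature vectors).
templates = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
probe = [0.9, 0.1, 0.2]

print(verify(probe, "alice", templates))
print(identify(probe, templates))
```

Verification costs one comparison regardless of the dataset size, whereas identification grows with the number of enrolled templates, which is why pruning the search space matters mainly in identification mode.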

Biometric fusion
Biometric data may change over time or be affected by environmental conditions. By fusing more than one trait, or the same trait from more than one source, we overcome the limitations of unimodal systems and reduce the rejection error rate, the acceptance error rate, or both, depending on the system requirements [17], as shown in Figure 4. Moreover, there is no single best biometric, since different applications require different policies; distance learning, border control, and national identity cards, for example, require a low false accept rate and a low failure-to-enroll rate. Fusion is key to increasing the recognition rate and can take place at different stages (sensor, feature extraction, classification, and decision).

Sanderson and Paliwal et al. [18] divide fusion into two categories: fusion before classification, called pre-matching, and fusion after classification, called post-matching, as shown in Figure 5:
• Pre-classification fusion [19][20][21]: before the classification level, the integration can be done in two ways:
1. Sensor level: sensor-level fusion combines raw data obtained from multiple sensors or from multiple snapshots of a biometric taken with a single sensor. Integrating raw data is difficult because it contains many unimportant features beyond the region of interest, and data collected from the sensors can suffer from noise such as nonuniform illumination. Face images collected from multiple sources with different resolutions may not be possible to integrate.
2. Feature level: feature-level fusion produces a single, information-rich feature set by fusing the different features extracted from the captured images. The feature sets therefore need to be tuned, normalized, transformed, and reduced. In practice, feature-level fusion is difficult to achieve because concatenating different features may lead to a dimensionality problem.
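A minimal sketch of feature-level fusion is given below, assuming z-score normalization, plain concatenation, and principal component analysis (computed through a singular value decomposition) as the reduction step. These particular choices, and the names `zscore` and `fuse_features`, are illustrative assumptions; the works cited above may normalize and reduce their features differently.

```python
import numpy as np


def zscore(features):
    """Normalize each column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0                 # leave constant columns unscaled
    return (features - mean) / std


def fuse_features(feature_sets, n_components):
    """Normalize each feature set, concatenate them, and reduce with PCA.

    feature_sets: list of arrays of shape (n_samples, n_features_i).
    Returns an array of shape (n_samples, n_components).
    """
    normalized = [zscore(np.asarray(f, dtype=float)) for f in feature_sets]
    fused = np.hstack(normalized)       # one long vector per sample
    centered = fused - fused.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
face_features = rng.normal(size=(10, 8))          # hypothetical face descriptor
soft_features = rng.normal(size=(10, 3)) * 100.0  # hypothetical soft traits, larger scale

fused = fuse_features([face_features, soft_features], n_components=4)
print(fused.shape)
```

Without the normalization step the soft features, which live on a much larger numeric scale in this toy data, would dominate the principal components.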
• Post-classification fusion [19][20][21][22]: the integration after classification can be divided into three types:
1. Score stage [23]: the scores are combined into a single score value, which is compared with a threshold value to make the decision. Using a threshold makes the system more flexible than a plain true or false output because the threshold can be tuned over a range to increase or decrease the false acceptance rate and the false rejection rate. However, a lower threshold decreases the false rejection rate but increases the false acceptance rate.
2. Rank stage [24]: the score values are arranged in descending order, so that the most preferred classes are placed at the top of the list and the least preferred classes at the bottom.
3. Decision stage: this stage depends entirely on the result of the score stage, and the final decision is whether the person is rejected as an impostor or accepted as genuine. Each classifier provides a hard decision. The decisions can be combined using:
• Majority voting: the decision is taken when a majority of the classifiers declare the same decision. To ensure that a decision is taken, the number of classifiers must exceed the number of classes.
• Logic operators (AND, OR): the AND operator accepts only when all the classifiers accept, which is suitable when a low false acceptance rate is required. The OR operator accepts when any classifier accepts, which is useful when a low false rejection rate is required.
• Fuzzy logic [25]: instead of a hard reject or accept, the output is a truth value between the two.
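The three post-classification stages can be sketched as follows. The min-max normalization, the weighted sum rule, and the Borda count are common textbook choices used here as assumptions; the text above does not prescribe a specific score or rank combination rule.

```python
import numpy as np


def min_max_normalize(scores):
    """Map the scores of one matcher to the range [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def weighted_sum_fusion(score_lists, weights):
    """Score stage: normalize each matcher's scores, then combine them."""
    normalized = np.array([min_max_normalize(s) for s in score_lists])
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return weights @ normalized


def borda_count(rankings):
    """Rank stage: each ranking lists candidate ids, best first."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            points[candidate] = points.get(candidate, 0) + (n - 1 - position)
    # Highest total first; ties are broken alphabetically.
    return sorted(points, key=lambda c: (-points[c], c))


def majority_vote(decisions):
    """Decision stage: accept when more than half of the classifiers accept."""
    return sum(decisions) > len(decisions) / 2


def and_rule(decisions):
    return all(decisions)   # favors a low false acceptance rate


def or_rule(decisions):
    return any(decisions)   # favors a low false rejection rate


# Two hypothetical matchers scoring the same two candidates on different scales.
fused_scores = weighted_sum_fusion([[0.2, 0.8], [10.0, 30.0]], weights=[1, 1])
print(fused_scores)

decisions = [True, True, False]
print(majority_vote(decisions), and_rule(decisions), or_rule(decisions))
```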

Performance evaluation
A biometric system needs to be evaluated and tested. The following measures are used for evaluation [1,26]:
• False acceptance rate (FAR): the rate at which people without permission are falsely accepted by the system as authorized persons, that is, the number of false acceptances divided by the number of impostor attempts (NIA).
• False rejection rate (FRR): the rate at which authorized persons are falsely rejected by the system, that is, the number of false rejections divided by the number of authorized attempts (NAA).
• Equal error rate (EER): the error rate at the operating point where the false acceptance and false rejection rates are equal; the lower the EER, the more accurate the system.
• Failure to enroll (FTE): the rate of individuals who are not able to enroll in the system.
• Failure to capture (FTC): the rate at which biometric traits are presented correctly but the system is not able to capture them correctly.
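The error rates above can be computed from two lists of similarity scores, one from authorized attempts and one from impostor attempts. The sketch below assumes that a higher score means a better match and that an attempt is accepted when its score is at least the threshold; the function names and the toy scores are illustrative.

```python
import numpy as np


def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a given threshold.

    genuine:  scores of authorized attempts (their count is NAA).
    impostor: scores of impostor attempts (their count is NIA).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = float(np.mean(impostor >= threshold))  # false acceptances / NIA
    frr = float(np.mean(genuine < threshold))    # false rejections / NAA
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold."""
    best_gap, best_eer, best_threshold = None, None, None
    for threshold in np.unique(np.concatenate([genuine, impostor])):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, float(threshold)
    return best_eer, best_threshold


genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]

print(far_frr(genuine_scores, impostor_scores, threshold=0.5))
print(equal_error_rate(genuine_scores, impostor_scores))
```

With finite score lists the two rates rarely cross exactly, so the value returned is the midpoint of FAR and FRR at the threshold where they are closest.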
The accuracy, recognition rate, and other performance measurements of a biometric system can be affected by several factors [26]:
• Environmental factors such as high temperature, steam, rain, and humidity lead to low accuracy.
• Subject-related factors: features change over time with age, and age, gender, ethnicity, and face pose all affect performance.
• User willingness: when users do not deliberately cooperate with the system, its accuracy decreases.
• Physical conditions: people who have had plastic surgery, or who do not have a hand and so cannot provide a fingerprint, may be unable to use the system.
All the measured rates are affected by the above factors, so any biometric system needs to estimate these errors and tune and normalize them according to the requirements and nature of the system.

Literature review and related work
Alphonse Bertillon first introduced the idea of a personal identification system based on biometric, morphological, and anthropometric traits, using the color of the eyes, hair, and skin, in 1896. Face recognition is lower in uniqueness than iris recognition but is more acceptable and user-friendly, and people are more willing to use it than other techniques [27]. Soft biometric traits are divided into three groups [28]:
• Global traits, such as ethnicity and sex, remain fixed for the whole life and are used for dataset indexing.
• Body features, such as height and weight, describe an individual as, for example, tall or fat.
• Head features, such as facial measurements and skin and hair color; current research is heading in this direction because this body part is rich in features.
Soft biometric traits can also be classified according to permanence and distinctiveness. In this paper, we focus on the head soft biometric features. As shown in Table 2, humans can easily be identified by their faces because faces do not change over a period of time and are widely used. According to Lin [47], face features provide different information when the image is resized, clipped, or shown from different sides.
The related works below are some of the major works, presented in timeline order from 2000 to 2017, as shown in Table 2.
Jain and Dass et al. [13], regarded as the fathers of soft biometrics, introduced soft biometric traits as ancillary information that cannot individually authenticate a person due to a lack of distinctiveness and permanence. They propose using demographic information (gender, ethnicity, and height) as soft biometrics to improve a primary fingerprint system. Experiments show that the recognition performance of the fingerprint system increased by 5% when soft biometrics were used.
Pedro and Julian et al. [28] show experimentally that soft biometrics can be used as secondary information to improve primary biometrics and that they can be acquired from a distance; fusion is performed at the score stage. Park and Jain et al. [34] use three feature extraction techniques:
• Active appearance model for extracting facial features such as the nose and eyes
• Laplacian of Gaussian
• Morphological operators
Two datasets are used to evaluate the system. The results show that using soft biometrics (ethnicity, gender, facial marks) increases the recognition rate. Soft biometric traits can be considered as an alternative when face images are occluded or partially damaged. The gender and ethnicity of a person do not change over a lifetime, so they can be used to prune the database and narrow the search list. However, while performance increased, complexity also increased, and facial mark extraction depends on the image resolution and requires a controlled environment.
Dantcheva and Velardo et al. [48] introduce two new soft biometric traits, body weight and clothes color, and report promising performance results. Dantcheva and Dugelay et al. [35] use eye, skin, and hair color traits with a cascade classifier; performance increased, with a balance between complexity and performance. However, the system suffers from illumination and pose variation and was evaluated on one dataset in a controlled environment. Soft biometric traits can be collected from a distance without user cooperation, as shown by Denman and Fookes et al. [31], who propose head and body traits and evaluate the system on the small PETS 2006 dataset; the recognition rate decreased, but the system can be used when primary data are not available. Niinuma and Jain et al. [33] propose a framework for continuous user authentication that uses clothing and skin colors fused with a password. Soft biometric traits are collected automatically every time the user logs in with his or her password. Experimental results show the effectiveness of the method for continuous user authentication. However, the system is evaluated on one dataset and suffers from illumination variation.

Table 2. List of some of these works.

Asma and Souhir et al. [40] use facial measurements and skin and hair color as soft biometric traits, with a support vector machine as the classifier, evaluated on one dataset. Results show that the equal error rate decreased and the recognition rate improved at no additional cost, since the soft biometric traits are collected by the same sensor at the time of primary biometric collection. However, the system needs to be tested on a more difficult dataset and compared with other systems. In addition, facial measurement features are very sensitive to pose and expression variation.
Nawaf and Nixon et al. [42] consider eyebrow distance and length measurements obtained from crowdsourcing. The system is evaluated on one dataset with one classifier. The recognition rate increases, but the system still needs to be tested on other datasets and compared with different classifiers. Jain and Park et al. [11] fuse the face and facial marks. Their results show that system performance increased up to 94.14%, but facial mark extraction still depends on the image resolution.
Min and Hadid et al. [37] propose traits based on facial occlusions and appearance, such as sunglasses, scarf, eye color, beard, mustache, and glasses. Experimental results show that facial occlusions affect system performance, especially when a user exploits them to avoid being recognized. However, they used one dataset for evaluation and did not compare their system with others. Chen and Huang et al. [44] define new soft biometric traits that describe people based on the type, color, and pattern of their clothes, using an RCNN body detector. However, they trained the RCNN on their own dataset, taken under a controlled environment, so the system cannot be compared with other systems, and the neural network needs more training data.
Jain, Dass, and Nandakumar et al. [10] combine gender, height, and ethnicity as soft biometric traits with fingerprint. The system performance increased by 6%. However, the soft biometric traits were not extracted automatically, and the system was evaluated on only 160 subjects. Lee, Jain, and Jin et al. [29] achieve a recognition rate of 98.6% on Web-DB, whose good-quality images were taken under a controlled environment, and 77.2% on the Michigan State Police Tattoo Database (MI-DB), using a scale-invariant feature transform (SIFT) feature extractor. Experimental results show that scars, marks, and tattoos (SMT) are more distinctive than other demographic biometrics such as ethnicity, gender, and weight for identifying a person. However, the tattoo dataset was collected under a controlled environment at booking time.
Batool, Nazre, and Sima et al. [41] report a classification accuracy of 88% for facial wrinkles as a soft biometric using the modified Hausdorff distance (MHD) algorithm. However, there is no standard dataset for evaluating the system and comparing it with others, wrinkles are extracted manually, and detecting wrinkles needs high-resolution images. Velardo, Carmelo, and Jean-Luc et al. [38] present human body measurements (anthropometry) to prune a primary biometric dataset. Their own medical dataset, collected from an Indian hospital, is used for evaluating the body measurements, and FERET data are used for face recognition. Results show that system accuracy and recognition speed increased.
Saini and Sinha et al. [36] integrate the face with facial measurements of the lips and eyes, such as the distance between the two pupils, the distance between the eyes and the lips, and the lengths of the lips and the eyes, to improve the recognition rate using Hamming, absolute difference, and biohashing distance techniques. Experimental results on the Yale dataset show that the error rate decreased. However, biohashing performs poorly when the tokenized random numbers are compromised; in addition, only one dataset is used, and the results are not compared with another system. Tiwari S, Singh A, and Singh SK et al. [39] propose an optimal framework for newborn recognition by fusing match scores from face and soft biometrics. Results on the IMS-BHU Indian hospital dataset show that soft biometrics improve the recognition rate by 5.6% over the primary biometric. However, the framework was evaluated on one dataset of high-resolution images taken under controlled pose and illumination.
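Facial measurements of the kind described above can be computed from a few landmark points. The sketch below is not the method of [36]: the landmark names are hypothetical, the landmarks are assumed to be already detected, and dividing by the inter-pupil distance is an added assumption that makes the measurements independent of image scale. The absolute difference distance is one of the three techniques named in the text.

```python
import numpy as np


def facial_measurements(landmarks):
    """Return [eye-to-lip distance, lip length], each divided by the
    inter-pupil distance.

    landmarks: dict with hypothetical keys "left_pupil", "right_pupil",
    "lip_left", and "lip_right", each an (x, y) pixel coordinate.
    """
    p = {name: np.asarray(point, dtype=float) for name, point in landmarks.items()}
    inter_pupil = np.linalg.norm(p["right_pupil"] - p["left_pupil"])
    eye_center = (p["left_pupil"] + p["right_pupil"]) / 2
    lip_center = (p["lip_left"] + p["lip_right"]) / 2
    eye_to_lip = np.linalg.norm(lip_center - eye_center)
    lip_length = np.linalg.norm(p["lip_right"] - p["lip_left"])
    return np.array([eye_to_lip, lip_length]) / inter_pupil


def absolute_difference(a, b):
    """Sum of absolute differences between two measurement vectors."""
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())


face = {
    "left_pupil": (30, 40),
    "right_pupil": (70, 40),
    "lip_left": (40, 80),
    "lip_right": (60, 80),
}
# The same face photographed at twice the resolution.
face_2x = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

print(facial_measurements(face))
print(absolute_difference(facial_measurements(face), facial_measurements(face_2x)))
```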
Jaha, Emad, and Mark et al. [43] show that clothing traits can be used to identify individuals in cases where clothing descriptions might be the only available feature. An, Chen, Kafai, Yang, and Bhanu et al. [49] aim to improve re-identification performance by re-ranking the returned results based on soft biometric attributes. Experiments on the challenging benchmark VIPeR dataset show that re-ranking improves the recognition accuracy.
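The general idea of re-ranking a returned list with soft biometric attributes can be sketched as follows. This is an illustrative scheme, not the algorithm of [49]: the linear combination, the `weight` parameter, and the attribute names are all assumptions.

```python
def rerank(candidates, probe_attributes, weight=0.3):
    """Re-rank candidates by mixing the primary score with soft attribute agreement.

    candidates: list of (subject_id, primary_score, attributes) tuples, where
    primary_score is in [0, 1] and attributes is a dict of soft trait labels.
    weight: hypothetical share of the final score given to the soft attributes.
    """
    def combined(candidate):
        _, score, attributes = candidate
        shared = [name for name in probe_attributes if name in attributes]
        if shared:
            matches = sum(attributes[name] == probe_attributes[name] for name in shared)
            agreement = matches / len(shared)
        else:
            agreement = 0.0
        return (1 - weight) * score + weight * agreement

    return sorted(candidates, key=combined, reverse=True)


# Hypothetical result list returned by a primary matcher, best score first.
candidates = [
    ("s1", 0.80, {"gender": "male", "hair": "black"}),
    ("s2", 0.75, {"gender": "female", "hair": "brown"}),
]
probe_attributes = {"gender": "female", "hair": "brown"}

print([subject_id for subject_id, _, _ in rerank(candidates, probe_attributes)])
```

Setting `weight` to zero reproduces the original ranking, so the parameter controls how much trust is placed in the soft attributes relative to the primary matcher.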

Challenges and future work
Multimodal biometric systems are used to overcome the limitations of unimodal biometric systems by collecting multiple traits from multiple sensors. However, such a system degrades performance by increasing the processing time and the number of verification steps, which inconveniences users. To develop a reliable and user-friendly biometric system, we therefore fuse soft and primary biometrics to improve the overall performance of the primary biometric system.
Soft biometrics offer nonintrusiveness and computational efficiency, which allow for fast, enrolment-free, and pose-invariant biometric analysis. However, a biometric system based on soft biometric traits alone cannot provide accurate recognition because these traits change over time and lack distinctiveness, so many challenges remain in this area. Parameters such as the fusion rules and the decision threshold must be tuned, otherwise the error rate will increase; this can be improved using fuzzy logic.
Soft biometrics are very sensitive to illumination, expression variation, and pose variation, so deep learning can be used for preprocessing and feature extraction. New soft biometric traits can also be introduced, such as the relative size of the head and body and facial distance measurements.

Conclusion
In this holistic survey on soft biometrics for user identification, we have seen that there is no single best biometric technology, since the choice depends on the application requirements. In security applications, for example, a zero false acceptance rate is needed and the false rejection rate needs to decrease, whereas civilian applications need the opposite, so any biometric system needs to find a good balance between authentication reliability and complexity. Traditional biometrics suffer from low recognition rates because they need the cooperation of the user, operate in controlled environments, and raise privacy concerns. Multi-biometrics is one solution, but such a system still suffers from computation cost and long processing steps.
Another possible solution is to use soft biometrics to increase the population coverage and decrease the system cost and complexity.