HLA Polymorphism in Anthropology

Humans are the most adaptable species among living organisms. Adaptation is the sum of all the processes that allow an organism to cope with environmental stresses, in particular climate and topography, in order to survive. All living species that have continued through time and space to the present day are endowed with innate biological adaptation. The whole gamut of biological adaptation of the human species depends on the sum total of its anatomical, physiological, immunological and genetic characteristics. In addition to biological adaptation, however, the human species also possesses cultural adaptation. Cultures are customs and traditions that help shape the human body and mind. The quest to understand the diversity of our own species and of other species through time and space has promoted the expansion of the science of biological/physical anthropology [1].


Introduction
The urge to understand the evolution of Homo sapiens through a systematic scientific search for the chronology of past events has led researchers to an objective reconstruction of the vanished past, postulating that anatomically modern humans emerged in Africa about 200,000 years ago and dispersed to all regions of the world. The accumulated evidence for this suggestion comes from several scientific disciplines: the hierarchical taxonomy of primates, which displays nested groupings; the comparative anatomy of primates, which exhibits homologies such as arboreal adaptation and the brachiating anatomy of apes and humans; comparative primate embryology, which exhibits similar ontogeny; the comparative molecular genetics of hominoid chromosomes, which shows 98% similarity between chimpanzees and humans; adaptive anatomical structures, such as a pelvis adapted to erect bipedalism and a larynx adapted for speech; vestigial structures mimicking ancient forms, such as nipples on males; and paleogeographical evidence, such as the distribution, sequence and pattern of fossils of earlier and later hominoids and the chronological sequence of ancient tools. This evidence is overwhelmingly centred on the African origin of modern humans [2][3][4][5][6][7].
Biological anthropology deals primarily with tracing biological origins by analysing changes in gene frequency in a population's gene pool over time, which lead to heritable genetic differences in subsequent generations and ultimately to the genetic diversity of the human species. In the process, scientists undertake genetic analyses to find the reasons behind the physical differences between people of various groups [1]. Such analyses assess the frequencies of variant alleles at genetic markers and compare genetic variation among populations with a view to tracing evolution. This article gives a general appraisal of genetic diversity, its causative factors and their impact on evolution [7], and discusses the molecular genetic markers used to trace past human migrations, with emphasis on the human leukocyte antigen (HLA) markers.

Genetic diversity
Humans across the world exhibit remarkable phenotypic variation, coupled with behavioural attributes. This variation is due to the combined effect of genetic and environmental factors. Researchers assess genetic variation by comparing individuals within a group and individuals in different groups (intra- and inter-population differences). Individual variation arises from recombination and from mutational events in meiosis, which lead to polymorphic alleles. Variation between groups arises from selective pressure on the genome due to differences in environment, and from the combined outcome of founder effect and genetic drift. These differences are the basis for tracing and validating migration patterns in populations [8][9][10][11][12][13][14][15]. There are four major causes of genetic diversity in populations: (1) random sampling of gametes, (2) mutation, (3) subdivision, migration and genetic exchange, and (4) natural selection.

Random sampling of gametes
In a finite population, in the absence of any selection, random sampling of gametes changes the gene frequency from that of the previous generation by chance. In real life all populations are finite. For some populations (bacteria, for example) the assumption of infinite size is a good approximation; for others it is completely unrealistic. Hardy-Weinberg (HW) equilibrium assumes that the population is infinite. When a population is finite, random genetic drift, the random fluctuation of allele frequencies that ends in fixation or loss of an allele, has a pronounced effect [10][11][12][13][14][15][16]. The Wright-Fisher model describes the random change of frequency in finite populations. It assumes a panmictic population of constant, finite size that produces an infinite number of gametes and evolves through non-overlapping discrete generations without mutation and without selection. Random sampling leading to random genetic drift is illustrated in Fig. 1. Suppose the parents are heterozygous at a locus with alleles A and a. The parents produce a large number of gametes, in which A and a each occur in a proportion of approximately 0.5. The proportion may not be exactly 0.5, since reproductive cell death may occur at any stage of gamete formation and, in females, three quarters of the meiotic products are lost as polar bodies. In Fig. 1 the population consists of 10 individuals at generation t0 (3 AA homozygotes, 4 Aa heterozygotes and 3 aa homozygotes), giving allele frequencies of A = 0.5 and a = 0.5. In each generation the population produces an infinite number of gametes, from which 10 individuals are formed by random sampling. At generation t20 (Fig. 1) the population consists of 4 AA homozygotes, 3 Aa heterozygotes and 3 aa homozygotes, so the frequencies are A = 0.55 and a = 0.45. Over generations the finite sampling process erases heterozygosity from the population. The end result of the random change is that one allele is eventually fixed, for example A = 1 and a = 0; that is, the population becomes homozygous (Fig. 1).
The rate of genetic drift is inversely related to population size: in an infinite population the loss of heterozygosity is negligible, and genetic diversity is lost very slowly in large populations. In real populations, drift is usually faster than the census size alone would suggest. One reason is that the pool of reproductive individuals is always smaller than the total population. Other reasons include individual differences in expected fertility and changes in population size. Basic principles show that if the population size fluctuates, genetic diversity (heterozygosity) is lost at a rate related to the smallest size. Two specific circumstances greatly accelerate genetic drift. The first, called a bottleneck, occurs when the population size is reduced for a protracted period and then rebounds. The second, called a founder effect, occurs when all individuals in a population trace back to a small number of founding individuals. Like selection, drift is a process of differential reproductive success; the key difference is that under drift, which individuals survive and reproduce is random and unrelated to their phenotype or genotype.
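The sampling process described above can be sketched in a few lines of Python. This is a minimal illustration under the Wright-Fisher assumptions stated in the text (constant size, random mating, no mutation, no selection); the function name, the default of 10 individuals and the random seed are illustrative choices, not values taken from Fig. 1.

```python
import random


def wright_fisher(n_individuals=10, p0=0.5, generations=200, seed=1):
    """Follow the frequency of allele A in a diploid Wright-Fisher population."""
    rng = random.Random(seed)
    n_copies = 2 * n_individuals  # each diploid individual carries two gene copies
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Every gene copy of the next generation is drawn at random from an
        # effectively infinite gamete pool in which A has frequency p.
        count_a = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count_a / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele a or allele A has been fixed
            break
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher()
    print("generations simulated:", len(traj) - 1)
    print("final frequency of A:", traj[-1])
```

Re-running the sketch with a larger `n_individuals` should show frequencies wandering much more slowly, in line with the inverse relationship between drift and population size.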

Mutation
Mutation is the ultimate source of all variation in a population. Mutations can be beneficial, neutral or harmful for the organism, but they do not arise in response to what the organism needs; in this respect mutations are random. Many mutations are functionally silent, either because they lie outside the protein-coding regions of the genome or because, although within a coding region, they do not alter the final protein product [17]. These silent mutations are used in deciphering the genetic ancestry and demography of populations. The loss of genetic variation through random genetic drift in finite populations is offset by mutations, which generate new variation. The balance between these two opposing forces is assessed using coalescent modelling [18,19].
Coalescent theory states that all copies of a gene (alleles) in a given population are ultimately inherited from a single ancestor shared by all members of the population, known as the most recent common ancestor (MRCA) [19][20][21]. If the inheritance relationships are displayed as a phylogenetic tree (termed a gene genealogy), the gene or allele of interest is said to undergo coalescence to the common ancestor, sometimes termed the coancestor to emphasize the coalescent relationship (Fig. 2). Basic coalescent theory assumes that genes do not undergo recombination and models genetic drift as a stochastic process. Because gene fixation by genetic drift is a crucial component of the theory, it is most useful when the genetic locus under study is not under natural selection. Coalescent modelling helps to understand the structure of a whole population by assessing a small sample of descendants. It allows quantities such as the expected sequence diversity, the expected number of segregating sites and the expected heterozygosity to be calculated. Although the coalescent model addresses complex issues in population genetics, it rests on a few basic components: the expected time back to the MRCA, the mutation rate and the outcome of the mutation [16][17][18][19][20][21].
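As a hedged illustration of the quantities mentioned above, the sketch below evaluates two standard expectations of the basic (Kingman) coalescent for a diploid population of constant effective size: the expected time to the MRCA of n sampled gene copies, which is the sum of 4N/(k(k-1)) generations over k = 2..n, and the expected number of segregating sites, theta times the sum of 1/i over i = 1..n-1, with theta = 4N x mutation rate. These formulas are textbook results rather than equations given in this chapter, and the function names are hypothetical.

```python
def expected_tmrca(n_samples, n_e):
    """Expected generations back to the MRCA of n sampled gene copies.

    While k lineages remain, the expected waiting time to the next
    coalescence is 4 * n_e / (k * (k - 1)) generations (diploid population).
    """
    return sum(4 * n_e / (k * (k - 1)) for k in range(2, n_samples + 1))


def expected_segregating_sites(n_samples, theta):
    """Expected number of segregating sites: theta * sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1 / i for i in range(1, n_samples))


if __name__ == "__main__":
    # Illustrative values only: 10 sampled copies, effective size 10,000.
    print(expected_tmrca(10, 10_000))
    print(expected_segregating_sites(10, theta=5.0))
```

The sum for the TMRCA simplifies to 4N(1 - 1/n), so even a small sample is expected to reach back almost as far as the whole population does.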

Subdivision, migration and genetic exchange
A collection of genetically differentiated subpopulations is referred to as population substructure [22,23]. In a large random-mating population, genes with multiple alleles obey HW equilibrium in the absence of natural selection. Migration into and out of a population affects its genetic structure [23][24][25][26]. Suppose that immigrants from a large population found a new population in a different location. After hundreds of generations the parental population and the new population may subdivide again, giving rise to two additional populations. All four populations (i.e. the parental plus the 3 sub-divided populations) then go through hundreds of generations. These populations are eventually known as meta populations, regardless of whether they remain completely isolated or are in communication with each other through the exchange of individuals.
Migration between two populations may affect genetic variation. However, the boundaries of sub-populations are difficult to identify, and in genetic analyses one may be confronted with samples of individuals drawn from one sub-population or from several. The parental and the newly formed sub-populations may differ genetically from each other. Even if each sub-population obeys HW and linkage equilibrium, the genotype frequencies observed in a pooled sample may not match those expected. This is because individuals are more likely to choose mates living nearby than to mate at random. Since individuals that live close to one another tend to be more genetically similar than those that live far apart, the impact of local mating mimics that of inbreeding within a single well-mixed population. This is known as the Wahlund effect [27].
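The heterozygote deficit in a pooled sample can be made concrete with a small numerical sketch. It assumes equal-sized subpopulations, each in HW equilibrium at a two-allele locus; the function name and the example frequencies are hypothetical.

```python
def wahlund(subpop_freqs):
    """Heterozygosity in a pooled sample of equal-sized subpopulations.

    subpop_freqs holds the frequency of allele A in each subpopulation,
    each assumed to be in Hardy-Weinberg equilibrium on its own.
    Returns (observed, expected, deficit).
    """
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    # Observed: average of the within-subpopulation heterozygosities 2pq.
    h_obs = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    # Expected if the pooled sample were one random-mating population.
    h_exp = 2 * p_bar * (1 - p_bar)
    return h_obs, h_exp, h_exp - h_obs


if __name__ == "__main__":
    # Two hypothetical subpopulations with very different allele frequencies.
    print(wahlund([0.2, 0.8]))
```

The deficit equals twice the variance of the allele frequency among subpopulations, so it vanishes only when all subpopulations share the same frequency.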
In genetic analysis, the copies of a given genetic locus coexisting in a sub-population do not always coalesce together. With reference to Fig. 3, sub-populations B and C coalesce with D before coalescing at the MRCA. If the sub-populations have high frequencies of certain alleles at a locus, the pooled population will show substantial linkage disequilibrium. If all the populations are in contact and random mating takes place, it takes considerable time to attain linkage equilibrium, because linkage disequilibrium is reduced by a factor of 1 - r per generation, where r is the recombination fraction between the two loci. For unlinked loci (r = 1/2) the linkage disequilibrium is halved each generation, whereas for closely linked loci (small r) it decays much more slowly. Continued random mating eventually results in linkage equilibrium. The genetic exchange between local sub-populations is termed gene flow.
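The decay by a factor of 1 - r per generation can be written as D_t = D_0 (1 - r)^t. The sketch below, with hypothetical function names and illustrative values, evaluates this relation and the number of generations needed for the disequilibrium to fall to a given fraction of its starting value.

```python
import math


def ld_after(d0, r, generations):
    """Linkage disequilibrium after t generations of random mating."""
    return d0 * (1 - r) ** generations


def generations_to_fraction(r, fraction):
    """Generations needed for D to fall to `fraction` of its initial value."""
    return math.log(fraction) / math.log(1 - r)


if __name__ == "__main__":
    print(ld_after(0.2, 0.5, 1))               # unlinked loci, one generation
    print(generations_to_fraction(0.01, 0.5))  # closely linked loci
```

With r = 0.5 the disequilibrium halves in a single generation, while with r = 0.01 halving takes about 69 generations, which is why remnant disequilibrium between closely linked loci is informative about population history.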

F statistics
The Wahlund effect is the observation of an excess of homozygotes, or a deficiency of heterozygotes, in a population made up of pooled subpopulations. A subpopulation fixed for a particular allele at a locus consists of individuals homozygous for that allele. Sewall Wright [11,12,28,29] introduced the fixation index F to measure this differentiation: if allele frequencies are the same in all subpopulations, then F = 0 (random population); alternatively, if the subpopulations are fixed for an allele, then F = 1 (100% homozygosity). In the absence of selection and mutation, genetic drift is the primary evolutionary force causing differentiation of the population. Mutation and migration may prevent F from reaching 1 by introducing alternative alleles. Low levels of migration lead to moderately high values of F. If drift and migration continue through many generations and reach equilibrium, F attains a constant value that can be deduced from the equation

$$F = \frac{1}{4Nm + 1}$$
where N is the effective population size (often written N_e) and m is the migration rate.
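As a small worked example, the sketch below evaluates Wright's equilibrium relation F = 1/(4Nm + 1) for a few combinations of effective population size and migration rate. The function name and the numerical values are illustrative assumptions.

```python
def equilibrium_f(n_e, m):
    """Equilibrium fixation index under drift and migration: 1 / (4*N*m + 1)."""
    return 1.0 / (4.0 * n_e * m + 1.0)


if __name__ == "__main__":
    for n_e, m in [(100, 0.0), (100, 0.0025), (100, 0.01), (100, 0.1)]:
        print(f"N_e={n_e}, m={m}: F={equilibrium_f(n_e, m):.3f}")
```

With no migration F is 1, and with one migrant per generation (N_e x m = 1) the function returns 0.2, which shows how even a low level of gene flow holds differentiation well below fixation.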

Natural selection
Natural selection is the fourth primary mechanism. It produces changes in the genetic composition of a population from one generation to the next and in due course causes evolutionary change, and its effect is seen on populations as a whole rather than on individual organisms [29,30]. Individuals in a population vary in genetic composition, and some carry genetic variants that confer greater reproductive fitness, making them more adaptable than others. In successive generations more offspring carry the better traits, which changes the frequency of those traits. Mutation often produces deleterious alleles. Selection removes deleterious alleles, thereby providing stability of biological structures; this is known as negative selection, and also as purifying selection or background selection. The elimination in the thymus of T cells that recognise self molecules is an example of negative selection operating at the cellular level. In some circumstances a favourable allele may arise by mutation and sweep through the entire population, replacing all other alleles; such selection is known as positive selection. The null allele at the Duffy blood group locus, which confers resistance to the malaria parasite in African populations, is a well-known example of positive selection [26].
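The change in allele frequency under selection can be sketched with the standard deterministic one-locus, two-allele recursion, in which the new frequency of A is p(p w_AA + q w_Aa) divided by the mean fitness. This recursion is a textbook model rather than one given in this chapter, and the fitness values below are purely hypothetical.

```python
def select(p, w_AA, w_Aa, w_aa, generations):
    """Frequency of allele A after selection in an infinite random-mating population."""
    for _ in range(generations):
        q = 1 - p
        # Mean fitness of the population under Hardy-Weinberg proportions.
        mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        # Allele A is transmitted in proportion to the fitness of its carriers.
        p = (p * p * w_AA + p * q * w_Aa) / mean_w
    return p


if __name__ == "__main__":
    # Positive selection: a rare favourable allele sweeps towards fixation.
    print(select(0.01, 1.10, 1.05, 1.00, 1000))
    # Negative (purifying) selection: a deleterious allele is removed.
    print(select(0.50, 0.90, 0.95, 1.00, 1000))
```

Setting all three fitnesses to the same value leaves the frequency unchanged, which is the neutral case in which only drift, mutation and migration alter the gene pool.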

Genetic markers in the study of genetic diversity
Genetic variation is the fundamental prerequisite for evolution. Evolution is a continuous process, and hence there must be processes that increase or decrease genetic variation. Genes mutate, producing new alleles that are the source of variation, and natural selection acts on them, furthering evolution. Genetic variation is the result of mutation and the random association of alleles. Considerable variation is present in natural populations.
The study of genetic variation helps us to understand the place and time of origin of modern humans and the pattern of their dispersal to all regions of the world. Studying migratory patterns through genetic variation requires extensive population data based on genetic markers. The application of genetic markers to assessing genetic diversity in populations has paralleled the development and design of new markers [30]. Three phases can be distinguished in the use of genetic markers [31][32][33]. In the first phase, data on genetic polymorphism came from blood grouping and other serological characteristics. In the second phase, following the advent of electrophoresis to separate variant alleles of protein markers, protein markers were applied across populations. The third phase is marked by the use of DNA markers, which have completely replaced serological and protein-based markers in studies of population genetic diversity. The following paragraphs give an overview of the utility of DNA molecular genetic markers in anthropological studies of human populations, with special emphasis on the HLA genetic markers.
DNA-based genetic diversity studies use either frequency data or direct sequence data. In general, such studies involve three important steps in deciphering the phylogeography of populations: (1) assessing inter- and intra-population differentiation; (2) ascertaining gene genealogies by constructing a phylogenetic tree; and (3) drawing inferences about the dispersal pattern. The DNA-based molecular genetic markers routinely used in studies of human population genetic diversity include mitochondrial DNA (mtDNA), Y chromosome markers, single nucleotide polymorphisms (SNPs), microsatellites and HLA markers.

Mitochondrial DNA
Human mtDNA is a double-stranded circular DNA molecule 16,569 base pairs in length. It has certain unique features, such as high copy number per cell, maternal inheritance, lack of recombination and a higher mutation rate than nuclear genes. mtDNA behaves as a haploid genome, and its maternal inheritance and haploid nature help in identifying relationships in a population. Using sequencing and RFLP-based high-resolution mapping, Cann et al [34] suggested that Africa is the likely source of the human mitochondrial gene pool and, on the basis of mtDNA sequence divergence, that the common ancestor of all surviving mtDNA types existed 140,000-200,000 years ago. Using a different method, Ingman et al [35] estimated the time to the most recent common ancestor (TMRCA) as 171 ± 50 thousand years ago (kya). Mitochondrial studies of various worldwide populations have found further evidence for the African origin hypothesis and have also estimated the TMRCA at about 100,000-200,000 years. mtDNA studies of evolution take two approaches, lineage-based (haplogroups) and population-based; the recent trend, however, is to use both to understand the prehistory of human populations.

Y chromosome
Like mtDNA, the Y chromosome is haploid, but it is paternally inherited. The human Y chromosome has now been sequenced [36]. It carries recombining segments, known as pseudoautosomal regions, at the ends of Yp and Yq. The male-specific region of the Y (MSY), previously known as the non-recombining region of the Y (NRY), consists of euchromatic and heterochromatic regions that accumulate changes through insertions, deletions, base changes (SNPs) and Alu insertion polymorphisms (YAP). The stable YAP and SNPs are together known as unique event polymorphisms (UEPs), and many of them are bi-allelic markers. The microsatellites in the MSY accumulate changes that increase or decrease the copy number of the core repeat, and they do so faster than UEPs change. The Y-linked loci in the MSY are haploid, and the non-recombining nature of this region, coupled with the changes accumulated over thousands of generations, makes them useful for delineating male lineages in populations and for studying the prehistoric migrations of human populations [37]. Extensive genetic studies of worldwide populations have ascertained male Y haplotype lineages, and their pattern of distribution suggests a recent origin in Africa, between 60 and 150 thousand years ago, for all present-day Y chromosomes [37][38][39][40][41][42].

Microsatellites
Microsatellites, also known as short tandem repeats (STRs), are arrays of tandemly repeated motifs 2-6 bp in length that are present throughout the genome. The repeat number changes through replication slippage, at a rate of the order of 10^-3, which is faster than the mutation rate of SNPs. The change most often follows the stepwise mutation model (SMM), that is, an increase or decrease of one repeat at a time, although multistep changes and point mutations have also been reported to alter the motif number. Being neutral polymorphic markers, microsatellites are used in studies of the evolutionary connections between species and populations; they are the preferred markers for high-resolution genetic mapping and are useful in inferring relationships between closely related population groups. In a pilot study, Bowcock et al [43] studied 30 dinucleotide loci in 14 aboriginal populations and constructed a phylogenetic tree in which the first split separated Africans from the rest of the populations. Goldstein [44] reanalysed the data using a new genetic distance and estimated a TMRCA of 75,000-287,000 years.

Single nucleotide polymorphism
A single nucleotide polymorphism (SNP) is defined as a single base change in a segment of DNA sequence that occurs at a frequency of more than 1% in a large population. SNPs are found in both coding and non-coding regions and are very frequent in the human genome, about 1 in 1,200 bases on average, with approximately 10 million SNPs in the genome as a whole. SNPs arise either by a base change or by the deletion or insertion of a base. A base change in a coding region either alters the protein sequence (non-synonymous mutation) or leaves it unchanged (synonymous mutation). SNPs are the major cause of genetic diversity among individuals and serve as genetic markers in large-scale genetic association studies. With advances in statistical methodology in population genetics and the availability of large-scale SNP data, these markers may facilitate the study of the prehistoric migrations and demographic history of modern humans [45][46][47].

HLA polymorphic markers
The HLA is a multigene family and spans approximately 4 megabases [48][49][50]. As of January 2012, at least 7,130 alleles at the class I and class II HLA loci had been described under the HLA nomenclature and included in the IMGT/HLA database. The IMGT/HLA consortium receives the sequences of new alleles directly from researchers, checks them and assigns an official name before publication, to avoid confusion and multiple names. The polypeptides produced by these alleles differ by one or more amino acid substitutions. The polymorphic nature of the HLA class I and class II loci makes them a useful tool for the study of human evolution.
Serological and DNA-based methods are used in HLA typing [51][52][53]. High-resolution sequence-specific primer (SSP) and sequence-based typing (SBT) methods are more appropriate, since both can identify all the alleles defined so far and SBT can also identify new alleles.

Significance of HLA diversity in evolution
The rate and number of nucleotide substitutions leading to new alleles at each of the functional HLA class I and class II loci are quite high compared with neutral markers such as mtDNA and Y chromosome markers. In addition, the class I and class II loci exhibit high heterozygosity (80-90%) and are hence good genetic markers for phylogenetic study. Some lineages, especially the DRB1 lineages, have persisted for more than 35 million years, dating back to the time of the evolutionary divergence of the hominoids (apes) from the Old World monkeys. Although the exact mechanism behind the high number of alleles and their persistence is not known, it is suggested that a high mutation rate and inter-locus genetic exchange (gene conversion), coupled with overdominant selection, result in the perpetuation of a large number of alleles at the HLA loci. Under such selection not only new alleles but also old alleles have a high selective advantage and persist [54][55][56][57][58][59][60][61][62][63][64][65][66][67].

HLA polymorphism in phylogenetics
HLA markers are codominant SNP markers, so heterozygotes can be distinguished from homozygotes and genotypes and allele frequencies can be determined directly. The class I and class II alleles in HLA are closely linked and occur together in individuals more often than expected by chance (linkage disequilibrium) [68]. Population migration and genetic drift can cause linkage disequilibrium. Under random mating, linkage disequilibrium decreases over time at a rate that depends on the recombination fraction per generation. Assuming no selection, analysis of the remnant linkage disequilibrium provides information on the number of generations that have passed since two closely related populations separated. Haplotype diversity, allele frequency variation and linkage disequilibrium in HLA markers are used to measure the amount of variation between closely related populations, and comparison of this variation is used to assess population genetic substructure. Analysing variation at the HLA class I and class II markers together increases the power to detect population substructure, because each locus carries an independent history of the population that depends on the amounts of random drift, mutation and migration that have occurred. Genetic distances based on allele frequencies are used to construct phylogenetic trees, from which the relative time that has passed since the populations existed as a single cohesive unit is inferred [69][70][71][72]. HLA allele frequencies of various populations are also used for cluster analysis by principal coordinate analysis (PCA).
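The two basic computations behind such analyses, estimating allele frequencies from codominant genotypes and turning them into a genetic distance, can be sketched as follows. The example uses gene counting and Nei's standard genetic distance for a single locus; the choice of Nei's distance is an assumption made for illustration (the studies cited may use other distances), and the genotype data are invented.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Gene counting: each diploid genotype contributes two allele copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def nei_distance(freq_x, freq_y):
    """Nei's standard genetic distance for one locus: D = -ln(Jxy / sqrt(Jx * Jy)).

    Undefined (math domain error) when the populations share no alleles.
    """
    alleles = set(freq_x) | set(freq_y)
    jxy = sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)
    jx = sum(f * f for f in freq_x.values())
    jy = sum(f * f for f in freq_y.values())
    return -math.log(jxy / math.sqrt(jx * jy))


if __name__ == "__main__":
    # Invented HLA-A genotypes for two hypothetical populations.
    pop_x = [("A*02", "A*02"), ("A*02", "A*24"), ("A*24", "A*11")]
    pop_y = [("A*11", "A*11"), ("A*02", "A*11"), ("A*24", "A*11")]
    fx, fy = allele_frequencies(pop_x), allele_frequencies(pop_y)
    print(fx, fy, nei_distance(fx, fy))
```

In practice the distance would be averaged over several loci and computed for every pair of populations, giving the distance matrix from which a tree or an ordination is built.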
Each population has a unique HLA profile with respect to the distribution of class I and class II HLA genes. This has been reported in several studies compiling data on HLA class I and class II genes from various populations of the world, and it is significant for anthropological studies. To highlight the effectiveness of HLA genetic markers in phylogenetics, a few studies of genetic relationships among populations are described here. Serjeantson et al [75] used allele frequency variation and linkage disequilibrium at the HLA-A and -B loci in 16 Pacific Island populations to trace their phylogeography and to elucidate the interrelationships and migrations among the peoples of the Pacific Islands. Shaw et al [89] used HLA-A, -B, -DR and -DQ allele and haplotype frequency data to show that the aborigines of Taiwan have a profile distinct from that of the Chinese population groups, and found the Javanese to be closely related to the Taiwan aborigines. Using HLA class I (A and B) and class II (DRB1 and DQB1) allele distributions, linkage disequilibrium and cluster analysis, it was shown that Amerindians were the first Native Americans and were already in America when the Na Dene (Athabascans, Navajo, Apache) and Eskimo-speaking peoples reached it [105]. Recently, Buhler and Sanchez-Mazas [108] used sequence-based analysis of seven HLA genes in 23,500 individuals from 200 populations across the world and reported a significant correlation between genetics and geography, in agreement with earlier HLA studies of population genetic diversity based on allele frequency data. They also concluded that geography plays a major role in shaping molecular variability among populations.

HLA genetic diversity studies and effective population size
The African replacement model of the evolution of modern humans holds that modern humans originated in Africa 200,000 years ago and that the transition from archaic to modern humans was associated with a narrow bottleneck, during which the number of individuals was small. Coalescent theory permits estimation of the effective population size (N_e); based on mtDNA and Y-STR markers, the N_e at the time of divergence of modern humans is estimated at 10^4 (10,000) individuals. In contrast, HLA-based genetic diversity studies, taking into account the large number of alleles and the persistence of HLA lineages for more than 35 million years, initially estimated the effective population size to be of the order of 10^5 (100,000) individuals, with no bottleneck [61][62][63]. This hypothesis was subsequently revised to include a bottleneck of shorter duration, with the effective population size still maintained at 10^5 [64]. However, Bergstrom et al (1998), analysing intron sequences of more than 135 contemporary human DRB1 alleles generated after the separation of hominoids from Old World monkeys, suggested that the coalescence times of alleles within these allelic lineages indicate an effective population size (N_e) similar to the mtDNA-based estimate, that is 10^4 [66].

Conclusion
The high polymorphism, tight linkage, random association of alleles and perpetuation of allelic lineages over time make HLA genetic markers an invaluable tool in unravelling the human past. The information on the amount, pattern and distribution of genetic variation at HLA markers in different populations enables us to correlate the genetic profiles of populations with their past migrations and so to determine their origins.

Fig. 1. Random selection of gametes. Illustration of random selection of gametes in a finite population, represented by 10 circles, producing an infinite number of gametes and evolving through non-overlapping discrete generations without mutation and without selection. Solid circles: homozygous AA; open circles: homozygous aa; semi-solid circles: heterozygous Aa.

Fig. 2. Coalescent model. Tracing the gene genealogy of the four sub-divided present-day populations back to the MRCA.

Fig. 3. Coalescent model in a subdivided population. Adapted from [16]. The present-day sub-divided populations (also known as meta populations), represented by A, B, C, D, E and F, trace back to a single parent population in the past. Copies of a selected genetic locus coexisting in a sub-population do not always coalesce together. For example, sub-divided populations B and C coalesce with sub-divided population D before they coalesce at the MRCA, even though D belongs to another group of sub-divided populations. Solid circles represent the unchanged parental allele at the selected locus; open circles represent the alternate allele.