Genetic Diversity and Allele Mining in Soybean Germplasm

Introduction
Soybean, Glycine max (L.) Merrill, is recognized as the most important grain legume in the world in terms of total production and international trade (Golbitz, 1995), and it is an important source of protein and oil. Soybean hybridization programmes around the world develop thousands of breeding lines and hundreds of elite cultivars every year. The development of these breeding lines has increased genetic uniformity within the species, so the genetic base of the released cultivars is rather narrow. The generation of new and improved cultivars can be enhanced by new sources of genetic variation; criteria for selecting parental stock therefore need to consider not only agronomic value but also genetic dissimilarity. That is why the evaluation of genetic variation is an important task not only for population geneticists but also for plant breeders. The study of genetic variation falls within population genetics, which focuses on analyzing, measuring and partitioning genetic variation. Genetic diversity can be analyzed using agronomic and biochemical traits and molecular marker polymorphisms. Analysis of gene marker data enables estimation of the mating system and monitoring of genetic changes caused by factors affecting the reproductive biology of a species. A key factor driving the utilization of exotic germplasm is its potential benefit. This benefit can be quite apparent for characteristics such as disease resistance or agronomic traits, but vague for yield or abiotic stress resistance.

Origin and diversification center of the soybean
Scholars generally agree that cultivated soybean (Glycine max) originated in the eastern half of North China in the eleventh century B.C. or perhaps a bit earlier (Fukuda, 1933; Singh, 2010). It is widely believed that soybean was domesticated from the annual wild soybean Glycine soja Sieb. et Zucc. Many studies based on old Chinese literature, the geographic distribution of the wild ancestral species, the levels and types of genetic diversity of soybean varieties, and archeological evidence consistently indicate that China is the center of origin and diversification of the cultivated soybean (Fukuda, 1933; Hymowitz, 1970; Zhuang, 1999). The evidence that China is the origin and main center of diversity of soybean is that (1) the distribution of G. soja in China is the most extensive in terms of the number and diversity of types; (2) China has the earliest written records of soybean cultivation, from about 4500 years ago; (3) soybean has been found in unearthed artifacts; (4) soybeans cultivated in other countries were introduced directly or indirectly from China; and (5) the word for soybean in many languages is pronounced much like the Chinese 'Shu'; for instance, it is 'soya' in England and 'soy' in the USA. Although soybean cultivation may have originated in China, scholars hold different views on the area where soybean was first domesticated. The first theory is that soybean originated in northeast China (Fukuda, 1933). This theory is based on the observations that semi-natural wild soybeans are extensively distributed in northeast China and that there are large numbers of soybean varieties with 'primitive' characteristics, such as the small black soybean germplasm that is extensively distributed in the northern provinces along the lower and middle reaches of the Yellow River. The second theory is that soybean cultivation originated in South China.
According to this theory, South China could be the origin of soybean (Wang, 1947). The evidence is the wide distribution of wild soybean in this area and the extensive presence of primitive soybean varieties such as Nidou, Maliao Dou, Xiao Huangdou and others that have (1) the short-day character, which is considered to be the initial physiological state of soybean, and (2) primitive agronomic characteristics related to yield and quality. Further evidence supporting this theory is the close genetic relatedness between cultivated soybeans in southern China and wild soybeans, based on isoenzymes, RFLP (Restriction Fragment Length Polymorphism) markers of chloroplast and mitochondrial DNA, SSR data and botanical traits (Ding et al., 2008; Guo et al., 2010). The third theory is that soybean originated in the eastern part of northern China, i.e. the lower reaches of the Yellow River (Hymowitz, 1970). The evidence is that wild and cultivated soybean have the same blooming dates at 35°N, suggesting that cultivated soybean varieties may have been derived from local wild soybean at around 35°N. In addition, the protein content of cultivated soybean is close to that of wild soybean at 34-35°N. The fourth theory states that cultivated soybeans have multiple origins (Lü, 1978). The evidence for this postulation is (1) that both South and North China have regions with early developed cultures, in which the ancients used local wild soybean as food and did not domesticate wild soybeans into cultivated ones; (2) the occurrence of wild and cultivated soybean in the same regions and their similarity in morphological characters; and (3) the successful cultivation of both wild and cultivated soybeans in different regions across China.
In addition, the geographical distribution of the short-day character of wild soybean indicates the possibility of multiple origins of cultivated soybean.

Genetic diversity of soybean germplasm based on morphological traits
Phenotypic traits are controlled by genes and affected by the environment, but large numbers of accessions can adapt to environments. Phenotypic data show more polymorphism and reveal genetic variation indirectly. Molecular data, in contrast, reveal genetic variation directly, but a small number of markers shows less polymorphism. It is very difficult to obtain molecular data with enough polymorphism to show the genetic diversity of a large number of accessions. Morphological traits are therefore suitable and practical tools for studying genetic diversity in large numbers of accessions. Variation in plant shape has always been an important means of (1) distinguishing individuals; (2) controlling seed production; and (3) identifying negative traits that affect yield, the genetic diversity centers of annual wild soybean, and soybean lines resistant to pod shatter, drought, pests or disease (Truong et al., 2005; Malik et al., 2006, 2007; Ngon et al., 2006). The soybean germplasm studied exhibited a wide range of phenotypic variation for pod number, seed number and plant yield. These studies also showed that soybean developmental stages were closely associated with agronomic traits as well as with yield and yield components (Malik et al., 2006, 2007; Ngon et al., 2006). Pod shape is one of the important descriptors for evaluating soybean genetic resources (IPGRI, 1998; USDA, 2001). Truong et al. (2005) tested the applicability of the elliptic Fourier method for evaluating the genetic diversity of pod shape in 20 soybean (Glycine max L. Merrill) genotypes. They concluded that principal component scores based on elliptic Fourier descriptors appeared to be useful quantitative parameters, not only for evaluating pod shape in a soybean breeding program but also for describing pod shape when evaluating soybean germplasm.
Genetic diversity among soybean genotypes has also been evaluated on the basis of yield-related traits (Rajanna et al., 2000; Malik et al., 2006, 2007; Ngon et al., 2006). These studies reported that differences among genotypes were highly significant for all characters and that grain yield was positively and significantly correlated with the number of pods per plant. Selection for this character had a positive direct effect on yield. However, some traits had negative direct effects on yield, such as leaf area, first pod height, days to 50% flowering, days to flowering completion, days to maturity, plant height, oil content and protein content. The study of the genetic diversity of wild soybean is invaluable for the efficient utilization, conservation and management of germplasm collections. Dong et al. (2001) statistically analyzed the agronomic traits recorded in the database of the National Germplasm Evaluation Program of China to study the geographical distribution of accessions, the genetic diversity of characters and the genetic diversity centers of annual wild soybean. The results showed that most annual wild soybeans are distributed in northeast China and that the number of accessions decreases from the northeast towards other parts of China. They proposed three genetic diversity centers for annual wild soybean in China: the northeast, the Yellow River Valley and the southeast coast. Based on these results and Vavilov's theory of crop origins, they proposed two opposing models for the formation of the three centers: either the centers are independent of each other and the annual wild soybeans in them originated separately, or the northeast center was the primary center for annual wild soybeans in China, while the Yellow River Valley center was derived from this primary center and served in turn as the origin of the southeast coast center.
The genetic variability in 131 accessions of edamame soybeans (the Japanese name for a type of vegetable soybean eaten at the immature R6 stage) was analyzed using phenotypic traits, e.g. maturity information, testa color and 100-seed weight, with the aim of breeding new edamame lines resistant to pod shatter (Mimura, 2001). The 131 accessions included 108 Japanese edamame, 11 Chinese maodou, 8 WSU breeding lines, 2 US edamame and 2 US grain soybeans. The results indicated that edamame genetic diversity was generally clustered around maturity groups and testa color. The genetic diversity among the Japanese edamame cultivars was narrow compared to that of Chinese maodou, and Japanese edamame and Chinese maodou soybeans may have different gene pools. Young (2008) identified soybean genotypes that exhibit genetic diversity in the developmental plasticity of the root system in response to water deficits, in order to enable physiological and genetic analyses of the regulatory mechanisms involved. These genotypes can tolerate drought stress, which is the major factor limiting soybean yield. The results showed substantial genetic diversity in the capacity for increased lateral root development (number and total length of roots produced) and in the responses of overall root and shoot growth under water deficit conditions. Burdon & Marshall (1981) investigated the extent of between- and within-species differences in the resistance of the four commonest species of Glycine (G. canescens, G. clandestina, G. tabacina and G. tomentella) to leaf rust caused by Phakopsora pachyrhizi. Their study showed both qualitative and quantitative resistance to leaf rust, and considerable variation in a number of disease characteristics both between and within populations of each species.

Genetic diversity in soybean germplasm based on karyological traits
Genetic diversity based on genome size among and within plant species has been well documented in the literature (Rayburn, 1990; Bennett and Leitch, 1995; Rayburn et al., 1997). The variation was pronounced in Chinese germplasm collected from diverse geographic locations. It has been attributed to environmental factors (Knight and Ackerly, 2002); to cell size, minimum generation time, cell division rate and growth rate (Edwards and Endrizzi, 1975; Bennett et al., 1983); and to polyploidy, large seed size and habit type (Bennett et al., 1998; Chung et al., 1998). Reports of genome size variation in soybean [Glycine max (L.)] have ranged from 0 to 40% (Rayburn et al., 2004). This wide range is highly reproducible and has raised doubts about the existence of intra-specific DNA variation in soybean. Rayburn et al. (2004) used flow cytometry to determine the genome size of 18 soybean lines selected on the basis of diversity of origin. They found that genome size variation between these lines was approximately 4%. This amount of DNA variation is lower than originally reported (Doerschug et al., 1978; Yamamota and Nagato, 1984; Hammatt et al., 1991; Graham et al., 1994). Doerschug et al. (1978) were the first to determine the genome size of soybean; examining 11 soybean lines, they reported over 40% variation in nuclear DNA content. Graham et al. (1994) observed 15% variation among soybean cultivars, while Rayburn et al. (1997) reported 12% variation among 90 Chinese soybean introductions. Chung et al. (1998) observed 4.6% variation in DNA content among 12 soybean strains. Yamamota and Nagato (1984) reported about 60% variation, while Hammatt et al. (1991) reported that the variation in genome size among 14 different Glycine species from different parts of the world was approximately 58%. These results indicate that estimates of variability in DNA content differ considerably between studies.
The wide variation in genome size between soybean germplasm makes these accessions good candidates for crop improvement.

Evaluation of genetic diversity in soybean germplasm at the biochemical level
Genetic markers have made possible a more accurate evaluation of the genetic and environmental components of variation. Biochemical markers, which include protein techniques and isozymes, are among the useful measures of genetic diversity. Protein techniques are practical and reliable methods for identifying cultivars and species because seed storage proteins are largely independent of environmental fluctuation (Sammour, 1992, 1999; Camps et al., 1994; Jha and Ohri, 1996). They are also less expensive than DNA markers. SDS-PAGE is one of these techniques and is widely used to describe the seed protein diversity of crop germplasm (Sammour, 2007; Sammour et al., 2007). Genetic diversity and the pattern of variation in soybean germplasm have been evaluated with seed proteins (Hirata et al., 1999; Bushehri et al., 2000; Sihag et al., 2004; Malik et al., 2009). SDS-PAGE (Bushehri et al., 2000) and discontinuous polyacrylamide slab gel electrophoresis (Sharma and Maloo, 2009) have been used very successfully to evaluate genetic diversity and to identify soybean (Glycine max) cultivars. Malik et al. (2009) evaluated the genetic variation in 92 accessions of soybean collected from five different geographical regions using the electrophoretic patterns of seed proteins. The accessions from various sources differed considerably, indicating that there is no definite relationship between genetic diversity and geographic diversity. Similar results were reported by Ghafoor et al. (2003). Based on the results of Ghafoor et al. (2003) and Malik et al. (2009), SDS-PAGE cannot be used to identify genotypes of wild soybean at the intra-specific level, because some accessions that differed on the basis of characterization and evaluation exhibited similar banding patterns. However, it might be used successfully to study inter-specific rather than intra-specific variation (Sammour, 1989; Sammour et al., 1993; Karam et al., 1999; Ghafoor et al., 2002).
2-D electrophoresis can be used to characterize genotypes that exhibit similar banding patterns (Sammour, 1985). Allozyme markers have been used in soybean to evaluate genetic diversity in accessions from diverse geographic regions (Yeeh et al., 1996; Chung et al., 2006), in natural populations of wild soybean from China, Japan and South Korea (Pei et al., 1996; Fujita et al., 1997), and in Asian soybean populations (Hymowitz & Kaizuma, 1981; Hirata et al., 1999). From an analysis of the Kunitz trypsin inhibitor (Ti) and beta-amylase isozyme (Sp1 = Amy3), Hymowitz & Kaizuma (1981) defined seven soybean germplasm pools in Asia: (1) northeast China and the USSR, (2) central and south China, (3) Korea, (4) Japan, (5) Taiwan and south Asia, (6) north India and Nepal and (7) central India. Hirata et al. (1999) compared the genetic variation at 16 isozyme loci in 781 Japanese accessions with that in 158 Korean and 94 Chinese accessions, and detected a number of region-specific alleles that discriminated Japanese from Chinese accessions. The presence of alleles specific to the Japanese population suggested that the present Japanese soybean population is not solely a subset of the Chinese population.
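
The idea of region-specific alleles can be illustrated with a small sketch. The code below is not the procedure of Hirata et al. (1999); it is a minimal Python example, using made-up allele calls at a single hypothetical locus, that computes allele frequencies per population and lists the alleles found in only one population. Treating each accession as homozygous is an assumption that is reasonable only because soybean is highly self-pollinating.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one locus in one population.

    `genotypes` holds one allele label per accession; each accession is
    treated as homozygous (an assumption of this sketch).
    """
    counts = Counter(genotypes)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def private_alleles(populations):
    """Return {population: set of alleles observed in that population only}."""
    result = {}
    for name, genotypes in populations.items():
        seen_elsewhere = set()
        for other_name, other_genotypes in populations.items():
            if other_name != name:
                seen_elsewhere.update(other_genotypes)
        result[name] = set(genotypes) - seen_elsewhere
    return result


# Hypothetical allele calls at one isozyme locus (illustrative data only).
populations = {
    "Japan": ["a", "a", "b", "c", "a"],
    "China": ["a", "b", "b", "b", "a"],
    "Korea": ["a", "b", "a", "b", "b"],
}

freqs = {name: allele_frequencies(g) for name, g in populations.items()}
private = private_alleles(populations)
print(freqs["Japan"])
print(private)
```

In this toy data set allele "c" occurs only in the Japanese sample, which is the kind of pattern that would argue against that population being a simple subset of another.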

Evaluation of genetic diversity in soybean germplasm using molecular markers

Introduction
The soybean genome consists of around 1115 Mbp, much smaller than the genomes of maize and barley but larger than the genomes of rice and Arabidopsis (Arumuganathan & Earle, 1991). Soybean is a tetraploid plant that evolved from a diploid ancestor (n=11), which underwent aneuploid loss (n=10) followed by polyploidization (n=20) and diploidization (chromosome pairing behavior) (Hymowitz, 2004). As a result of polyploidization, soybean has a significant percentage of internal duplicated regions distributed among its chromosomes (Pagel et al., 2004). Sequence diversity in cultivated soybean is relatively low compared to other species, which poses a major challenge for the improvement of this important crop. To broaden the genetic base of modern soybean cultivars efficiently, we need detailed insight into the genetic diversity of soybean germplasm. Such insight can be achieved through molecular characterization using DNA markers, which are more informative, stable and reliable than pedigree analysis and the traditionally used morphological markers. Genetic markers including RFLP, RAPD, SSR and AFLP have been used to probe the genetic differences between wild and cultivated soybeans and to study the origin and dissemination of soybeans (Brown-Guedira et al., 2000; Tian et al., 2000; Li & Nelson, 2001; Xu & Zhao, 2002; Abe et al., 2003). These studies have revealed higher levels of genetic diversity in wild soybean.

RFLP (Restriction Fragment Length Polymorphism)
This analysis exploits variation in the occurrence of restriction sites in genomic sequences that hybridize to a cloned probe. Originally, RFLP analysis required Southern blotting and hybridization, making the method fairly slow and laborious. The technique is still used to generate "anchor" markers, which many scholars use to build consensus recombinational maps, though it is now often implemented with the polymerase chain reaction (PCR) to generate the polymorphic fragments (Schulman, 2007). Chung et al. (2006) used RFLP analysis to evaluate levels of genetic diversity in USDA soybean germplasm (107 accessions) originating from six provinces in central China. They detected significant genetic differentiation among the six provinces (mean GST = 0.133). These results suggest that Chinese germplasm accessions from various regions or provinces in the USDA germplasm collection could be used to enhance the genetic diversity of US cultivars.
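
The principle behind RFLP can be sketched in a few lines of Python. The example below is an illustration only: the two sequences are invented, the recognition site GAATTC (EcoRI) is chosen for convenience, and the cut is placed at the start of the site rather than at the enzyme's true offset. It also ignores probe hybridization, which in a real assay determines which fragments are visible.

```python
def digest(sequence, site):
    """Cut `sequence` at every occurrence of `site`; return fragment lengths.

    Simplification: the cut is placed at the first base of the recognition
    site, whereas a real enzyme cuts at a fixed offset within or near it.
    """
    lengths = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        if pos > start:
            lengths.append(pos - start)
        start = pos
        pos = sequence.find(site, pos + 1)
    lengths.append(len(sequence) - start)
    return lengths


SITE = "GAATTC"  # EcoRI recognition sequence

# Two hypothetical alleles of the same region (made-up sequences).
# Allele B carries a single-base change (A -> G) inside the second site.
allele_a = "ATCG" + "GAATTC" + "TTAGC" + "GAATTC" + "CCGTA"
allele_b = "ATCG" + "GAATTC" + "TTAGC" + "GAGTTC" + "CCGTA"

fragments_a = digest(allele_a, SITE)
fragments_b = digest(allele_b, SITE)
print(fragments_a)
print(fragments_b)
```

A single nucleotide change that destroys one restriction site merges two fragments into one longer fragment, and this difference in fragment length is what is scored as a polymorphism.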

AFLP (Amplified Fragment Length Polymorphism)
AFLP is an anonymous marker method that detects restriction sites by using PCR between ligated adapters to amplify a subset of all the sites for a given enzyme pair in the genome. Like RFLP, it detects to some extent single nucleotide polymorphisms (SNPs) at restriction sites. Ude et al. (2003) used AFLP to analyze the genetic diversity within and between Asian and North American soybean cultivars. They found that the average genetic distance between the North American and Chinese cultivars was 8.5% and that between the North American and Japanese cultivars was 8.9%, but the Chinese cultivars were not completely separated from the Japanese cultivars. They also suggested that Japanese cultivars may constitute a genetically distinct source of useful genes for yield improvement.
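
Average genetic distances between groups of cultivars are typically derived from band presence/absence profiles. The sketch below shows one way this could be done in Python. The choice of Jaccard dissimilarity is an assumption made for illustration, since the coefficient used by Ude et al. (2003) is not stated here, and the band profiles are invented.

```python
def jaccard_distance(profile_a, profile_b):
    """Jaccard dissimilarity between two 0/1 band profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    pairs = list(zip(profile_a, profile_b))
    shared = sum(1 for a, b in pairs if a == 1 and b == 1)
    either = sum(1 for a, b in pairs if a == 1 or b == 1)
    if either == 0:
        return 0.0  # no bands in either profile: treat as identical
    return 1.0 - shared / either


def mean_between_group_distance(group_a, group_b):
    """Average distance over all pairs with one profile from each group."""
    distances = [jaccard_distance(p, q) for p in group_a for q in group_b]
    return sum(distances) / len(distances)


# Hypothetical AFLP profiles: 1 = band present, 0 = band absent.
north_american = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
]
chinese = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0],
]

mean_distance = mean_between_group_distance(north_american, chinese)
print(round(mean_distance, 3))
```

Because AFLP bands are dominant markers, a coefficient such as Jaccard that ignores shared absences is a common choice, but other coefficients would give different absolute values.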

RAPD (Random Amplified Polymorphic DNA)
RAPD analysis uses conserved or general primers that amplify from many anonymous sites throughout the genome. It is indeed rapid and needs only short primers of random sequence, but it suffers from low polymorphism information content (PIC), poor correlation with other marker data, and problems with reproducibility due to the low annealing temperatures used in the reactions. The genetic diversity in wild soybean populations from the Far East region of Russia was analyzed using RAPD markers (Seitova et al., 2004). The results suggest that (1) genetically different groups of wild soybean are actively developing, (2) the level of polymorphism was significantly higher than in cultivated soybean and (3) geographically isolated subpopulations showed the maximum distance from the main population of wild soybean. A high level of polymorphism between wild and cultivated soybean accessions was also reported by Kanazawa et al. (1998) in their study of soybean accessions from the Far East using RAPD profiles of mitochondrial and chloroplast DNA. Xu & Gai (2003) and Pham Thi Be Tu et al. (2003) confirmed the results of Kanazawa et al. (1998) and Seitova et al. (2004) regarding the high genetic variation between wild and cultivated soybean accessions. They also found that the diversity of G. soja was higher than that of G. max, and that environmental factors may play important roles in soybean evolution. Furthermore, they showed that accessions within each species tend to form subclusters that agree with their geographical origins, demonstrating that extensive geographical genetic differentiation exists in both species. Geographical differentiation therefore appears to play a key role in the genetic differentiation of both wild and cultivated soybeans.
The relationship between geographical differentiation and genetic diversity also appeared in the work of Chen & Nelson (2005), who identified significant genetic differences between soybean accessions collected from different provinces in China. Their data provided strong evidence that primitive cultivars of China were generally genetically isolated in relatively small geographical areas. Similar results were obtained by Li & Nelson (2001) in their study of soybean accessions from 8 provinces in China using a core set of RAPD primers with high polymorphism in soybean (Thompson et al., 1998). In contrast, Brown-Guedira et al. (2000) did not find an association between origin and RAPD markers among soybean lines of more modern origin. It is likely that these genotypes have been dispersed by human intervention from their areas of actual origin. The relationship between genetic differentiation and origin of 120 soybean accessions from Japan, South Korea and China was evaluated with RAPDs (Li & Nelson, 2001). The Japanese and South Korean populations were more similar to each other, whereas both were genetically distinct from the Chinese population, suggesting that the South Korean and Japanese gene pools might have been derived from relatively few introductions from China. A comparison of the genetic diversity of ancestral cultivars of the North American (18) and Chinese (32) soybean germplasm pools using RAPD markers showed that the North American ancestors have a slightly lower level of genetic diversity. Cluster analyses generally separated the two gene pools. In particular, great genetic variability was detected between the ancestors of northern U.S. and Canadian soybeans and the Chinese ancestors. Chowdhury et al. (2002) used RAPD markers to examine the level of genetic similarity among forty-eight soybean cultivars introduced into Thailand. They found a high level of genetic similarity between these cultivars.
Cluster analysis of the data classified the 48 cultivars into four groups at a similarity level of 0.57, even though the cultivars are morphologically or geographically very close. By comparing agronomic performance with the RAPD dendrogram, a total of 11 cultivars were identified that can be useful to soybean breeders in Thailand who want to use genetically diverse introductions in soybean improvement. Baránek et al. (2002) used RAPD to evaluate the genetic diversity within 19 soybean genotypes included in the Czech National Collection of Soybean Genotypes. The polymorphism among the studied genotypes was 46%. These results enable the selection of genetically distinct individuals. Such information may be useful to breeders wishing to use genetically diverse introductions in soybean improvement.
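
Two summary statistics mentioned above, the percentage of polymorphic bands and the polymorphism information content (PIC), can be computed directly from a band scoring matrix. The following Python sketch uses invented data. The PIC formula for dominant markers, 2f(1 - f) with f the frequency of band presence, is one common convention and is an assumption here, not necessarily the one used in the studies cited.

```python
def band_column(band_matrix, j):
    """Return the scores (0/1) of band j across all genotypes."""
    return [row[j] for row in band_matrix]


def percent_polymorphic(band_matrix):
    """Percentage of bands that are present in some genotypes and absent in others."""
    n_bands = len(band_matrix[0])
    polymorphic = 0
    for j in range(n_bands):
        column = band_column(band_matrix, j)
        if 0 < sum(column) < len(column):
            polymorphic += 1
    return 100.0 * polymorphic / n_bands


def dominant_pic(band_matrix, j):
    """PIC of band j for a dominant marker, taken as 2 * f * (1 - f)."""
    column = band_column(band_matrix, j)
    f = sum(column) / len(column)
    return 2 * f * (1 - f)


# Hypothetical RAPD scores: rows are genotypes, columns are bands.
bands = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 1],
]

polymorphism = percent_polymorphic(bands)
pic_values = [dominant_pic(bands, j) for j in range(len(bands[0]))]
print(polymorphism)
print(pic_values)
```

Under this convention the PIC of a dominant band cannot exceed 0.5, which is one way to see why RAPD markers are described as having low information content compared with multi-allelic markers.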

SSRs (Simple sequence repeats)
SSR markers have been widely applied in studies of the genetic diversity of soybean germplasm (Abe et al., 2003; Wang et al., 2006; Fu et al., 2007; Li et al., 2008; Wang & Takahata, 2007; Wang et al., 2008; Yoon et al., 2009). The advantages of SSRs over other types of molecular markers are that they are abundant, have a high level of polymorphism, are codominant, can be easily detected with PCR and typically have a known position in the genome. High levels of polymorphism at SSR loci have been reported for both the number of alleles per locus and the gene diversity (Diwan & Cregan, 1997; Abe et al., 2003; Wang et al., 2006; Fu et al., 2007; Wang et al., 2010). Wang et al. (2010) used 40 SSR primer pairs to study genetic variability in 40 soybean accessions of cultivars, landraces and wild soybeans collected from China. Their results indicated that wild soybeans and landraces possessed greater allelic diversity than cultivars and might contain alleles not present in the cultivars, which strengthens the case for their further conservation and utilization. The UPGMA (Unweighted Pair Group Method with Arithmetic Mean) results also showed that wild soybean had more abundant genetic diversity than the cultivars. A total of 2,758 accessions of Korean soybean landraces were profiled and evaluated for genetic structure using six SSR loci (Yoon et al., 2009). Genetic structure was evaluated for three different Korean Glycine max collections and for groups distinguished by traditional usage: sauce soybean (SA), sprouted soybean (SP), soybean for cooking with rice (SCR) and others. Nei's average genetic diversity ranged from 0.68 to 0.70 across the three collections and from 0.64 to 0.69 across the usage groups. The average between-group differentiation (Gst) was 0.9% among collections and 4.1% among the usage groups.
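
Nei's gene diversity and the between-group differentiation Gst can be computed from allele frequencies at each locus. The sketch below is a minimal single-locus illustration in Python, not the analysis pipeline of Yoon et al. (2009). It assumes equal weighting of groups, treats each accession as contributing one allele, and uses made-up SSR allele sizes.

```python
from collections import Counter


def allele_freqs(alleles):
    """Return {allele: frequency} for a list of allele calls."""
    counts = Counter(alleles)
    n = len(alleles)
    return {allele: c / n for allele, c in counts.items()}


def gene_diversity(freqs):
    """Nei's gene diversity: H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(groups):
    """Return (Hs, Ht, Gst) for one locus.

    Hs is the mean within-group diversity, Ht the diversity computed from
    allele frequencies averaged over groups (equal weights, an assumption),
    and Gst = (Ht - Hs) / Ht.
    """
    group_freqs = [allele_freqs(a) for a in groups.values()]
    k = len(group_freqs)
    hs = sum(gene_diversity(f) for f in group_freqs) / k
    all_alleles = set().union(*group_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in group_freqs) / k
                  for a in all_alleles}
    ht = gene_diversity(mean_freqs)
    if ht == 0:
        return hs, ht, 0.0  # locus is monomorphic overall
    return hs, ht, (ht - hs) / ht


# Hypothetical SSR allele sizes (bp) at one locus for two usage groups.
groups = {
    "SA": [150, 150, 156, 162],
    "SP": [150, 156, 156, 162],
}

hs, ht, gst_value = gst(groups)
print(hs, ht, gst_value)
```

A Gst close to zero, as in this toy example, means that almost all of the diversity is found within groups rather than between them.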
The similar average diversity among the three collections implies that their genetic backgrounds are quite similar or that the collections contain a large number of duplicate accessions (Yoon et al., 2009). Selecting from the four groups classified by usage may be a useful way to choose accessions for developing a Korean soybean landrace core collection at the RDA gene bank. Hudcovicová et al. (2003) analyzed allelic profiles at 18 SSR loci in 67 soybean genotypes of various origins. Only six of the SSR markers were needed to differentiate all 67 genotypes from one another. Guan et al. (2010) used 46 SSR loci to investigate the genetic relationship between 205 Chinese soybean accessions representing seven different soybean ecotypes and 39 Japanese soybean accessions from various regions. Cluster analysis with UPGMA separated the Chinese accessions from the Japanese accessions, suggesting that soybeans in these two countries form different gene pools. The study also showed that (1) accessions from China have more genetic diversity than those from Japan, (2) the studied germplasm was divided into three distinct groups, "corresponding to Japanese soybean, Northern China soybean, Southern China soybean and a mixed group in which most accessions were from central China", and (3) Japanese accessions were more closely related to the Chinese northeast spring and southern spring ecotypes. This study provides useful insights for the further utilization of Japanese soybean in Chinese soybean breeding. Abe et al. (2003) analyzed allelic profiles at 20 SSR loci in 131 accessions introduced from 14 Asian countries.
UPGMA cluster analysis clearly separated the Japanese from the Chinese accessions, suggesting that the Japanese and Chinese populations form different germplasm pools. It also showed that Korean accessions were distributed in both germplasm pools, whereas most of the accessions from south/central and southeast Asia were derived from the Chinese pool; that genetic diversity in the southeast and south/central Asian populations was relatively high; and that these populations lacked region-specific clusters. The relatively high genetic diversity and the absence of region-specific clusters in the southeast and south/central Asian populations suggest that soybean has been introduced into these areas repeatedly and independently from the diverse Chinese germplasm pool. The two germplasm pools can therefore be used as exotic genetic resources to enlarge the genetic bases of the respective Asian soybean populations. Chotiyarnwong et al. (2007) evaluated the genetic diversity of 160 Thai indigenous and recommended soybean varieties by examining the length polymorphism of alleles at 18 SSR loci from different linkage groups. UPGMA cluster analysis and principal component analysis (PCA) separated the Thai indigenous varieties from the recommended varieties. However, the genetic differentiation between the indigenous and recommended varieties was small. Shi et al. (2010) performed genetic diversity and association analysis among 105 food-grade soybean genotypes using 65 simple sequence repeat (SSR) markers distributed on the 20 soybean chromosomes. Based on the SSR marker data, the 105 soybean genotypes were divided into four clusters with six sub-groups. Thirteen SSR markers distributed on 11 chromosomes were significantly associated with oil content, and 19 SSR markers distributed on 14 chromosomes were significantly associated with protein content. Twelve of the SSR markers were associated with both protein and oil QTL.
A negative correlation was obtained between protein and oil content. Mimura et al. (2007) investigated SSR diversity in 130 vegetable soybean accessions including 107 from Japan, 10 from China and 12 from the United States. Eighteen of the 130 accessions were outliers, and the rest were grouped into nine clusters. The majority of food-grade soybean cultivars have been released in Japan and South Korea because of market availability and demand. However, the genetic diversity of South Korean food-grade soybean remains unreported (Mimura et al., 2007). Nguyen et al. (2007) used 20 genomic SSR and 10 EST-SSR markers to explore the genetic diversity in soybean accessions from different regions of the world. The thirty SSR primer pairs were selected on the basis of their distribution over the 20 genetic linkage groups of soybean, their trinucleotide repeat unit and their polymorphism information content. All analyzed loci were polymorphic. A low correlation between genomic SSR and EST-SSR data was observed, so both genomic SSR and EST-SSR markers are required for an appropriate analysis of genetic diversity in soybean. The authors observed high genetic diversity, which allowed the formation of five groups and several subgroups. They also observed a moderate relationship between genetic divergence and the geographic origin of accessions. Xie et al. (2005) analyzed the genetic diversity of 158 Chinese summer soybean accessions from the primary core collection of G. max using 67 SSR loci. The Huanghuai and Southern summer germplasm differed in specific alleles, allelic frequencies and pairwise genetic similarities. UPGMA cluster analysis based on the similarity data clearly separated the Huanghuai from the Southern summer soybean accessions, suggesting that they represent different gene pools.
The data indicated that Chinese Huanghuai and Southern summer soybean germplasm can be exchanged to enlarge the genetic basis for developing elite summer soybean cultivars. Most diversity studies on cultivated soybean published to date have focused on North American (Brown-Guedira et al., 2000; Narvel et al., 2000; Fu et al., 2007), Asian (Abe et al., 2003; Xie et al., 2005; Wang et al., 2006; Li et al., 2008; Wang et al., 2008; Yoon et al., 2009) and South American (Bonato et al., 2006) soybean germplasm. In several studies, only a few genotypes of European origin were represented among the germplasm studied (Brown-Guedira et al., 2000; Narvel et al., 2000; Fu et al., 2007; Hwang et al., 2008). Baranek et al. (2002) evaluated the genetic diversity of 19 Glycine max accessions from the Czech National Collection using RAPD markers. Recently, Tavaud-Pirra et al. (2009) evaluated the SSR diversity of 350 cultivated soybean genotypes, including 185 accessions from the INRA soybean collection originating from various European countries and 32 cultivars and recent breeding lines representing the genetic improvement of soybean in Western Europe from 1950 to 2000. They found the genetic diversity of European accessions to be comparable with that of the Asian accessions from the INRA collection, whereas the genetic diversity observed in European breeding lines was significantly lower. Breeding material and registered soybean cultivars in southeast European countries are strongly linked to Western breeding programs, primarily in the USA and Canada. There is little reliable information regarding the source of germplasm introduction, its pedigree and the breeding schemes applied. Consequently, use of these genotypes in making crosses to develop further breeding cycles can result in an insufficient level of genetic variability. 
Assessing the genetic diversity of this germplasm at the genomic DNA level would complement the knowledge on the European soybean gene pool and facilitate the utilization of resources from southeastern Europe by soybean breeders. Ristova et al. (2010) therefore assessed the genetic diversity and relationships of 23 soybean genotypes representing several independent breeding sources from southeastern Europe and five plant introductions from Western Europe and Canada using 20 SSR markers. Cluster analysis clearly separated all genotypes from each other, assigning them to three major clusters, which largely corresponded to their origin. The clustering results were mainly in accordance with the known pedigrees.
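Several of the SSR studies above select or compare markers by their polymorphism information content (PIC). As an illustration only, the following minimal Python sketch computes allele frequencies, gene diversity (expected heterozygosity) and PIC for a single locus, using the widely used definition PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (i < j). The function names and the allele sizes are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Return {allele: frequency} for one locus, ignoring missing calls (None)."""
    called = [a for a in alleles if a is not None]
    counts = Counter(called)
    total = len(called)
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - sum(x * x for x in p) - cross


# Hypothetical allele sizes (bp) at one SSR locus scored in eight accessions;
# None marks a missing genotype call.
locus = [150, 150, 154, 154, 158, 158, 150, None]
freqs = allele_frequencies(locus)
print(round(gene_diversity(freqs), 3), round(pic(freqs), 3))
```

Because PIC subtracts an extra term from the gene diversity, it is never larger than the gene diversity at the same locus; both rise as more alleles occur at more even frequencies.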

EST (Expressed Sequence Tags)
Functional molecular markers, such as those developed from ESTs, represent coding sequences. They therefore allow direct access to the population diversity in genes of agronomic interest and facilitate the association between genotype and phenotype. Nelson and Shoemaker (2006) identified approximately 45,000 potential gene sequences (pHaps) from EST sequences of Williams/Williams 82, an inbred genotype of soybean (Glycine max L. Merr.), using a redundancy criterion to identify reproducible sequence differences between related genes within gene families. Analysis of these sequences revealed that single base substitutions and single base indels are the most frequently observed forms of sequence variation between genes within families in the dataset. Genomic sequencing of selected loci indicated that intron-like intervening sequences are numerous and are approximately 220 bp in length. Functional annotation of the gene sequences indicated that functional classifications are not randomly distributed among gene families containing few or many genes. The identification of potential gene sequences (pHaps) from soybean gives a picture of the genomic history of the organism and makes it possible to observe the evolutionary fates of gene copies in this highly duplicated genome.

Concept
Exploitation of gene banks for efficient utilization depends on the knowledge of genetic diversity, in general, and allelic diversity at candidate gene(s) of interest, in particular. Hence, allele mining seems to be a promising approach for characterizing the genetic diversity or allelic/genic diversity among the accessions of a collection in terms of its utility for improving a target trait (Kaur et al., 2008). The availability of sequence and sequence variation that affects the plant phenotype is of utmost importance for the utilization of genetic resources in crop improvement (Graner, 2006). The existing allelic diversity in any crop species is caused by mutations, the evolutionary driving force (Kumar et al., 2010). Mutations create new alleles or cause variations in existing alleles and allelic combinations. They take place in coding and non-coding regions of the genome either as single nucleotide polymorphisms (SNPs) or as insertions and deletions (InDels). As far as is known, there is no published literature for soybean on the effect of mutations in the 5′ UTR (including the promoter), introns and 3′ UTR on transcript synthesis and accumulation, which in turn alter trait expression. In coding regions, a mutation may have a tremendous effect on the phenotype by altering the structure and/or function of the encoded protein. For example, the AtAHASL protein encoded by csr1-2 differs from the native AtAHASL protein by one amino acid substitution of a serine with an asparagine at residue 653 (S653N), which results in tolerance to imidazolinone-containing herbicides. Apart from the altered herbicide binding, the protein retains its biological function in the plant. Soybean line CV127 is tolerant to herbicides that contain imidazolinone. Another example is the mutations in soybean microsomal omega-3 fatty acid desaturase genes, which reduced the linolenic acid concentration in soybean seeds (Bilyeu et al., 2005). 
In addition, several studies have reported disease and pest resistance alleles in soybean germplasm. These include the soybean aphid [Aphis glycines Matsumura (Hemiptera: Aphididae)] resistance gene Rag1 from the germplasm collection (Kim et al., 2010); the brown stem rot resistance genes Rbs1 and Rbs3 from soybean lines L78-4049 and PI 437833, and PI 84946-2 (Eathington et al., 1995; Klos et al., 2000); the soybean cyst nematode (SCN) resistance genes rhg1 and Rhg4 from soybean lines PI 88788, PI 437654, Peking, PI 90763 and PI 209332; and the sudden death syndrome (SDS) resistance genes Rfs1, rfs2 and rft from soybean line PI 437654 (Meksem et al., 2001).

Approaches
Two major approaches are available for the identification of sequence polymorphisms for a given gene in the naturally occurring populations: (1) modified Targeting Induced Local Lesions in Genomes (TILLING) procedure and (2) sequencing based allele mining.

TILLING approach
In the TILLING approach, the polymorphisms (more specifically point mutations) resulting from induced mutations in a target gene can be identified by heteroduplex analysis (Till et al., 2003). This technique provides a means to determine the extent of variation arising from artificially induced mutations. EcoTilling, in contrast, provides a means to determine the extent of natural variation in selected genes in the primary and secondary crop gene pools (Henikoff, 2006; Kumar et al., 2010). Like TILLING, it relies on the enzymatic cleavage of heteroduplexed DNA, formed due to a single nucleotide mismatch in sequence between the reference and test genotypes, with a single-strand-specific nuclease under specific conditions, followed by detection through Li-Cor genotypers. At a point mutation, the nuclease cleaves the heteroduplex to produce two cleaved products whose sizes together equal the size of the full-length product. The presence, type and location of the point mutation or SNP are then confirmed by sequencing the amplicon from the test genotype that carries the mutation.
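The relationship between the position of a mismatch and the sizes of the two cleaved products can be sketched in a few lines of Python. This is a simplified illustration, not a model of any particular nuclease or genotyping platform: it assumes a single mismatch, assumes the cut falls immediately after the mismatched base, and uses made-up 30 bp sequences and hypothetical function names.

```python
def mismatch_positions(reference, test):
    """Return 0-based positions where two aligned, equal-length amplicons differ."""
    if len(reference) != len(test):
        raise ValueError("amplicons must be aligned to the same length")
    return [i for i, (r, t) in enumerate(zip(reference, test)) if r != t]


def cleavage_fragments(amplicon_length, mismatch_index):
    """Return the sizes of the two products of a cut at one mismatch.

    Simplifying assumption: the nuclease cuts immediately after the
    mismatched base, so the two sizes sum to the full-length product.
    """
    left = mismatch_index + 1
    right = amplicon_length - left
    return left, right


# Hypothetical 30 bp amplicons: the test genotype carries one SNP (T -> C).
reference = "ATGGCTAGCTTAGGCTAACGTTAGCCTAGG"
test = "ATGGCTAGCTTAGGCTAACGTCAGCCTAGG"

for pos in mismatch_positions(reference, test):
    left, right = cleavage_fragments(len(reference), pos)
    print(f"mismatch at base {pos + 1}: fragments of {left} bp and {right} bp")
```

In practice the fragment sizes read from the gel only locate the polymorphism approximately, which is why the amplicon is subsequently sequenced.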

Sequencing-based allele mining
This technique involves amplification of alleles in diverse genotypes through PCR, followed by identification of nucleotide variation by DNA sequencing. Sequencing-based allele mining helps to analyze individuals for haplotype structure and diversity, which can inform genetic association studies in plants. Unlike EcoTilling, sequencing-based allele mining does not require sophisticated equipment or involve tedious steps, but it does incur high sequencing costs (Kumar et al., 2010).
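Once the alleles of a candidate gene have been sequenced and aligned, haplotype structure can be summarized computationally. The sketch below, in Python, is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, each accession is homozygous (one sequence per accession), and the accession names and sequences are invented. It reduces each sequence to its variable sites and computes haplotype diversity as H = n/(n-1) × (1 - Σp_i²), a common estimator.

```python
from collections import Counter


def variable_sites(aligned):
    """Return 0-based alignment columns at which more than one base occurs."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


def haplotypes(aligned_by_accession):
    """Map each accession to its haplotype (bases at the variable sites only)."""
    sites = variable_sites(list(aligned_by_accession.values()))
    return {name: "".join(seq[i] for i in sites)
            for name, seq in aligned_by_accession.items()}


def haplotype_diversity(hap_by_accession):
    """Haplotype diversity: n / (n - 1) * (1 - sum(p_i^2))."""
    n = len(hap_by_accession)
    if n < 2:
        raise ValueError("at least two accessions are required")
    counts = Counter(hap_by_accession.values())
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


# Hypothetical aligned sequences of one candidate gene fragment.
alignment = {
    "acc1": "ATGCCGTA",
    "acc2": "ATGCCGTA",
    "acc3": "ATGTCGTA",
    "acc4": "ATGTCGAA",
    "acc5": "ATGCCGTA",
}
haps = haplotypes(alignment)
print(haps, round(haplotype_diversity(haps), 3))
```

Heterozygous or phased diploid data would need one sequence per chromosome copy rather than one per accession; that extension is left out here for brevity.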

Applications
Allele mining can be effectively and efficiently used for (1) discovery of superior alleles, through 'mining' the gene of interest from diverse genetic resources, (2) providing insight into the molecular basis of novel trait variations and identifying the nucleotide sequence changes associated with superior alleles, (3) studying the rate of evolution of alleles, allelic similarity/dissimilarity at a candidate gene and allelic synteny with other members of the family, (4) paving the way for molecular discrimination among related species through development of allele-specific molecular markers, and (5) facilitating introgression of novel alleles through Marker Assisted Selection (MAS) or deployment through Genetic Engineering (GE). Allele mining can also potentially be employed to identify nucleotide variation at a candidate gene associated with phenotypic variation for a trait. In this way, the frequency, type and extent of occurrence of new haplotypes and the resulting phenotypic changes can be evaluated.

Challenges
The genetic resources collections, which are held collectively in various gene banks, harbour a wealth of undisclosed allelic variants. The challenge now is how to efficiently identify the useful variation in these collections and exploit it in crop improvement. The stumbling blocks to making use of these collections are (1) selection of genotypes, (2) handling of genomic resources, (3) demarcation of the promoter region, (4) characterization of the regulatory region, and (5) high sequencing costs. The selection of germplasm to be 'mined' is one of the greatest challenges facing researchers because of the huge size of the genetic resources collections. To overcome the aforementioned challenges, we must (1) narrow down the core collection to a manageable size while maintaining the variability, (2) refine phenotyping protocols to increase the efficiency of allele mining, (3) exploit the developments in allele mining, association genetics and comparative genomics by combining expertise from several disciplines, including molecular genetics, statistics and bioinformatics, (4) develop cheaper and faster sequencing platforms for high-throughput detection of allelic variations, and (5) develop flexible computational tools to manage genetic resources, select desirable alleles, analyze the functional nucleotide diversity to predict specific nucleotide changes responsible for altered function, accurately predict the core promoter region based on the representation/over-representation of consensus regulatory motifs, and obtain a snapshot of the regulatory elements, which can be further examined through suitable experiments.
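The first of these steps, reducing a collection to a manageable subset while retaining its variability, can be illustrated with a simple greedy heuristic. The Python sketch below is only one possible approach and is not a method taken from the studies cited in this chapter: at each step it adds the accession that contributes the most alleles not yet captured. The accession names, loci and allele sizes are hypothetical.

```python
def greedy_core(allele_sets, size):
    """Choose up to `size` accessions by greedy allele capture.

    allele_sets maps accession name -> set of (locus, allele) pairs.
    Ties are broken alphabetically by accession name.
    Returns (list of chosen accessions, set of captured alleles).
    """
    remaining = dict(allele_sets)
    captured = set()
    core = []
    while remaining and len(core) < size:
        # Accession contributing the most alleles not yet in the core.
        best = max(sorted(remaining),
                   key=lambda name: len(remaining[name] - captured))
        if not remaining[best] - captured:
            break  # no accession adds anything new
        core.append(best)
        captured |= remaining.pop(best)
    return core, captured


# Hypothetical marker data for four accessions at three loci.
collection = {
    "acc_A": {("L1", 150), ("L2", 200)},
    "acc_B": {("L1", 150), ("L2", 204)},
    "acc_C": {("L1", 154), ("L2", 204), ("L3", 310)},
    "acc_D": {("L1", 150), ("L2", 200), ("L3", 300)},
}
core, captured = greedy_core(collection, size=2)
total = set().union(*collection.values())
print(core, f"{len(captured)}/{len(total)} alleles captured")
```

A greedy choice of this kind is fast and easy to inspect, but it does not guarantee the best possible subset and it ignores allele frequencies and phenotypic data, which a real core-collection strategy would normally take into account.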

Conclusion
Soybean oil is used in many food, industrial and fuel products, whereas soybean meal is incorporated into animal feed. The variation in the quality and quantity of these products depends fundamentally on the genetic diversity of soybean germplasm. The genetic diversity in soybean germplasm evolved through the dispersal of the cultivated soybean domesticated by Chinese farmers. Many factors affect the dispersal of soybean, including regional adaptation and selection. Morphological, cellular, biochemical (protein and isozyme) and molecular markers have been widely used to study the genetic diversity of cultivated soybean and its wild relatives. These analyses were carried out to meet a wide range of objectives, from simply testing the usefulness of a particular marker system to identifying exotic germplasm accessions that can expand the genetic diversity of the elite germplasm pool and so permit genetic improvement for increased soybean yield. Exploitation of soybean germplasm for efficient utilization depends on the knowledge of genetic diversity, in general, and allelic diversity at candidate gene(s) of interest, in particular. Many beneficial alleles from the vast soybean genetic resources existing worldwide have been derived from cultivated germplasm. However, a significant portion of beneficial alleles still resides in wild soybean germplasm. Considerable attention has recently focused on allele mining (gene polymorphisms) and its potential use to alter protein function in ways that might prove biologically important. Increasing numbers of polymorphisms are also being identified in the regulatory and non-coding regions of genes. Therefore, allele mining is a promising approach to dissect naturally occurring allelic variation at candidate genes controlling key agronomic traits, with potential applications in crop improvement programs. 
Allele mining can be effectively used for the discovery of superior alleles through 'mining' the gene of interest from soybean germplasm. It can also provide insight into the molecular basis of novel trait variations and identify the nucleotide sequence changes associated with superior alleles. In addition, the rate of evolution of alleles, allelic similarity/dissimilarity at a candidate gene and allelic synteny with other members of the family can be studied. Allele mining may also pave the way for molecular discrimination among related species within the genus Glycine, for the development of allele-specific molecular markers, and for introgression of novel alleles through Marker Assisted Selection or deployment through genetic engineering. The allele mining approaches and the challenges associated with them were also discussed.