Biological Identifications Through DNA Barcodes

Although much biological research depends upon species diagnoses, taxonomic expertise is collapsing. We are convinced that the sole prospect for a sustainable identification capability lies in the construction of systems that employ DNA sequences as taxon 'barcodes'. It was established previously that the mitochondrial gene cytochrome c oxidase I (COI) can serve as the core of a global bio-identification system for animals. New tools have recently been developed to serve as complementary markers for COI DNA barcoding.


Introduction
Although much biological research depends upon species diagnoses, taxonomic expertise is collapsing. We are convinced that the sole prospect for a sustainable identification capability lies in the construction of systems that employ DNA sequences as taxon 'barcodes'. It was established previously that the mitochondrial gene cytochrome c oxidase I (COI) can serve as the core of a global bio-identification system for animals. New tools have recently been developed to serve as complementary markers for COI DNA barcoding.
Species identification is essential in food quality control procedures and for the detection and identification of animal material in food samples. Recent food scares (e.g. avian flu and swine flu), malpractice by some food producers and religious considerations have greatly heightened public awareness of the composition of food products. However, because labels do not sufficiently guarantee the true contents of a product, it is necessary to identify and/or authenticate the components of processed food, thus protecting both consumers and producers from illegal substitutions [1]. In addition, trade in endangered species has contributed to severe depletion of biodiversity.
Numerous analytical methods that rely on protein analysis have been developed for species identification, such as electrophoresis techniques [2], immunoassays [3] and liquid chromatography [4]. However, these methods are of limited use in species identification. The progress of molecular biology introduced a new approach, which is based on nucleotide sequence diversity among species in particular regions of DNA [5][6][7]. The nucleotide regions chosen for species identification have varied among researchers. Within vertebrates, the cytochrome b (cyt b) gene in the mitochondrial DNA (mtDNA) has been studied from multiple viewpoints, including the nucleotide diversity among species [6] and the availability of nucleotide sequence data for reference [5]. Many of the other regions studied are also located in the mtDNA. The coding regions for 12S and 16S ribosomal RNA [8][9][10] and the noncoding D-loop region [7,11,12] have shown their potential as targets for species tests.
Although central to much biological research, the identification of species is often difficult. DNA sequencing, with key sequences serving as a 'barcode' pattern, has therefore been proposed as a technology that might expedite species identification [13].
DNA barcoding promises fast, accurate species identifications by focusing analysis on a short standardized segment of the genome [14]. Several studies have now established that sequence diversity in a 650-bp fragment of the mitochondrial gene cytochrome c oxidase I (cox1; also referred to as COI) provides strong species-level resolution for varied animal groups including birds [15], fishes [16] and Lepidoptera [17].
Besides the cox1 gene, other mitochondrial markers have also been widely sequenced across vertebrates, either for their utility in phylogenetics or to complement cox1 in DNA barcoding.
In amphibians the 16S ribosomal RNA gene (16S) has been suggested as a complementary DNA barcoding marker [18]. Another protein coding gene, cytochrome b, has also been suggested as a marker to determine species boundaries [19,20].
An attempt was made to present a phylogenetic systematic framework for an improved barcoder as well as a taxonomic framework for interweaving classical taxonomy with the goals of 'DNA barcoding' [21]. Another study showed that DNA arrays and DNA barcodes are valuable molecular methods for biodiversity monitoring programs [22]. In this chapter we introduce the use of specific fragments of mitochondrial ribosomal RNA from Egyptian buffalo as a perfect barcode for the identification of closely related species. We also extend this study to the identification of distantly related species [23][24]. Our studies were further extended to chickens and to small organisms such as mites, which were studied with both nuclear and mitochondrial markers. Identification of these mites is very important for biological control programs.
All these methods could be used in a global bio-identification system or in the development of forensic science.

DNA purification
Genomic DNA was extracted from the peripheral blood of Egyptian buffaloes and chickens using a standard commercial kit (Pure-gene Genomic DNA Purification Kit) as recommended by the manufacturer (www.gentra.com). For mites, genomic DNA was extracted using the Capture Column kit method, and total DNA was purified using the Generation DNA purification system.

Primers used for amplification of specific fragments from mites
Two target DNA fragments of the predatory mite A. swirskii were PCR-amplified and sequenced: a fragment in the central part of the mitochondrial cytochrome oxidase subunit I gene (COI) and a fragment of the nuclear ribosomal internal transcribed spacers (ITS) [25][26]. The COI primers, which were designed specifically for tetranychid mites, were 5'-TGATTTTTTGGTCACCCAGAAG-3' and 5'-TACAGCTCCTATAGATAAAAC-3'.
The ITS region was amplified using the primers 5'-AGAGGAAGTAAAAGTCGTAACAAG-3' for the 3' end of the 18S rDNA and 5'-ATATGCTTAAATTCAGGGGG-3' for the 5' end of the 28S rDNA.
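As a quick sanity check on the primers listed above, the following minimal Python sketch reports the length, GC fraction, reverse complement and a rough melting temperature for each one. The dictionary keys are labels chosen here for illustration only, and the melting temperature uses the simple Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only an approximation for short oligonucleotides and is not the method used in the original work.

```python
# Primer sequences as printed in the text (5'->3').
# The dictionary keys are illustrative labels, not names used by the authors.
PRIMERS = {
    "COI_primer_1": "TGATTTTTTGGTCACCCAGAAG",
    "COI_primer_2": "TACAGCTCCTATAGATAAAAC",
    "ITS_18S_side": "AGAGGAAGTAAAAGTCGTAACAAG",
    "ITS_28S_side": "ATATGCTTAAATTCAGGGGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C).

    This is a rule of thumb for short primers, used here as an assumption.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2),
              wallace_tm(seq), reverse_complement(seq))
```

Running the script prints one line per primer; the values are only a first screen and do not replace an empirical check of the annealing temperature.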

16S primers
PCR amplification and direct sequencing
With two universal primers (sense, 5'-GTGCAAAGGTAGCATAATCA-3' and antisense, 5'-TGTCCTGATCCAACATCGAG-3') directed toward conserved regions [24], the polymerase chain reaction was used to amplify homologous segments of mitochondrial 16S rRNA from four animal species belonging to the family Bovidae: river buffalo, cattle, sheep and goat.

The amplification reaction
The amplification reaction used for the D-loop fragment was also used, with slight modifications to the temperature cycling, in the other experiments, according to the conditions of each experiment.
The amplification reaction was carried out in a 25 μl reaction mixture consisting of 1.25 units of Taq polymerase (DyNAzyme), 1X enzyme buffer supplied by the manufacturer (1X is 10 mM Tris-HCl, pH 8.8 at 25 °C, 1.5 mM MgCl2, 50 mM KCl and 0.1% Triton X-100), 1 μM each of the forward and reverse primers, 0.2 mM dNTPs and 100 ng of DNA. The reaction mixture was overlaid with sterile mineral oil and run in an MJ Research PTC-100 thermocycler. The temperature cycling was as follows: 30 cycles of 45 seconds at 94 °C, 1 minute at 58 °C and 1 minute at 71 °C, followed by a final extension at 71 °C for 5 minutes. All PCR amplifications included a negative control reaction that lacked template DNA. No product was seen in any negative control. Small quantities of the reaction products (5 μl each) were electrophoresed with an appropriate size marker on 1.5% agarose in 1X Tris-acetate-EDTA (TAE) buffer.
After electrophoresis, the gels were stained with ethidium bromide and examined under a UV lamp at a wavelength of 312 nm to verify amplification of the chosen fragment. The PCR products were purified using the QIAquick PCR purification kit (Qiagen, Inc.), and the purified products were used in the subsequent sequencing reactions. Sequencing was performed on an Applied Biosystems 310 genetic analyzer using the Big Dye terminator cycle sequencing ready reaction mixture according to the manufacturer's instructions (Applied Biosystems).
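The cycling program and primer concentration given above can be written down as data, which makes it easy to check the programmed hold time and the amount of primer per reaction. This is a minimal Python sketch: the step names (denaturation, annealing, extension) are conventional labels assumed here, the function names are illustrative, and ramp times between temperatures are not included.

```python
# Cycling conditions as stated in the text. The step names are conventional
# labels added for readability; the text gives only temperatures and times.
CYCLES = 30
CYCLE_STEPS = [
    ("denaturation", 94, 45),  # (label, temperature in deg C, hold in seconds)
    ("annealing", 58, 60),
    ("extension", 71, 60),
]
FINAL_EXTENSION = ("final extension", 71, 5 * 60)


def total_hold_seconds(cycles, steps, final_step):
    """Sum of programmed hold times, excluding ramping between steps."""
    per_cycle = sum(seconds for _label, _temp, seconds in steps)
    return cycles * per_cycle + final_step[2]


def picomoles(conc_micromolar, volume_microlitres):
    """Amount of solute in pmol, using 1 uM = 1 pmol per microlitre."""
    return conc_micromolar * volume_microlitres


if __name__ == "__main__":
    seconds = total_hold_seconds(CYCLES, CYCLE_STEPS, FINAL_EXTENSION)
    print("programmed hold time (min):", seconds / 60)
    print("each primer per 25 ul reaction (pmol):", picomoles(1, 25))
```

With the stated conditions, the hold times add up to 5250 seconds (87.5 minutes), and 1 μM of primer in 25 μl corresponds to 25 pmol of each primer.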

Sequence analysis and multiple sequence alignment
Pairwise sequence alignments were carried out using NCBI BLASTN version 2.2.5 and PSI-BLAST. Multiple sequence alignments were done using the MUSCLE 3.6 software and CLUSTALW (1.82). Analysis, manipulation, conservation plots, positional entropy plots and conserved region analysis were done using the BIOEDIT package. Variable sites were extracted from the multiple sequence alignment using the MEGA 3.1 package [12].
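To illustrate what variable-site extraction and a positional entropy plot compute, here is a minimal Python sketch that works on a list of already aligned, equal-length sequences. It is not the BIOEDIT or MEGA implementation. The function names, the choice to treat a gap ('-') as a character state, the choice to ignore 'n' when deciding whether a site is variable, and the use of base-2 logarithms are all assumptions made for this example.

```python
import math
from collections import Counter


def columns(alignment):
    """Return the alignment as a list of columns (tuples of characters)."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return list(zip(*alignment))


def variable_sites(alignment, ignore="nN?"):
    """Return 1-based positions where more than one state is observed.

    Assumption: gaps ('-') count as a state; characters in `ignore` do not.
    """
    sites = []
    for position, column in enumerate(columns(alignment), start=1):
        states = {c.lower() for c in column if c not in ignore}
        if len(states) > 1:
            sites.append(position)
    return sites


def positional_entropy(alignment):
    """Return the Shannon entropy (in bits) of every alignment column."""
    entropies = []
    for column in columns(alignment):
        counts = Counter(c.lower() for c in column)
        n = len(column)
        entropies.append(sum(-(k / n) * math.log2(k / n)
                             for k in counts.values()))
    return entropies


if __name__ == "__main__":
    toy = ["acgt", "acga", "a-gt"]  # toy alignment, not data from the study
    print(variable_sites(toy))
    print(positional_entropy(toy))
```

A fully conserved column has an entropy of zero, and the entropy rises as the column becomes more mixed, which is why peaks in such a plot mark the variable region.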

Phylogenetic analysis
Phylogenetic model selection was done using the FINDMODEL server available from the HCV LANL database (http://hcv.lanl.gov/content/hcv-db/findmodel/). A Bayesian phylogenetic tree was constructed by the Markov chain Monte Carlo (MCMC) method as implemented in the MRBAYES 3.1 package, using the Hasegawa-Kishino-Yano plus gamma (HKY+G) substitution model with an invariant four-category gamma distribution among sites. A 50% consensus tree was generated, and the analysis was repeated twice. A maximum parsimony tree was constructed using MEGA version 4, with 1000 bootstrap replicates to assess reliability.

The mean overall, within-group and between-group genetic distances were calculated using the MEGA 4.0 software [12].
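The kind of calculation behind mean within-group and between-group distances can be sketched in a few lines of Python. The text does not state which distance model was used for these means; the sketch below assumes the Kimura two-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. It also assumes that sites with gaps or ambiguous bases are skipped pair by pair. The function names are illustrative and this is not MEGA's code.

```python
import math
from itertools import combinations, product

PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Assumption: sites where either sequence is not a/c/g/t are skipped.
    Raises ValueError for saturated pairs where the formula is undefined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.lower(), seq2.lower()):
        if x not in "acgt" or y not in "acgt":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def mean_within(group):
    """Mean pairwise distance inside one group (needs two or more sequences)."""
    pairs = list(combinations(group, 2))
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1, group2):
    """Mean distance over all pairs with one sequence from each group."""
    pairs = list(product(group1, group2))
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, `mean_between(cow_seqs, buffalo_seqs)` would give the between-group mean for two hypothetical lists of aligned sequences, while `mean_within(cow_seqs)` gives the within-group mean for one of them.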

Results
Our experience in the field of molecular identification, or DNA barcoding, gained through a series of published research papers, is presented in this section. Selected results are shown here with some illustrative figures and tables; the complete information can be found in the full published papers listed in the publication section.
The positional entropy plot of the D-loop was produced for the buffalo and cow sequences. The Bayesian phylogenetic trees of cow and buffalo sequences were constructed using the MRBAYES software (Figure 2), and the maximum parsimony tree was constructed using the Kimura two-parameter model and the close-neighbor-interchange method of the MEGA 3.1 software package (Figure 3). Table 1 shows the substitution events detected in complete D-loop sequences from multiple sequence alignments between cows and buffaloes.
PCR amplification of chicken mitochondrial D-loop fragments was carried out, and the phylogenetic tree constructed between the Egyptian and GenBank database chicken samples is represented in Figure 5. The polymorphic sites and their positions are shown in Table 2.
The results of the molecular analysis of the mite samples are given in Table 3.

Sample                  Accession   Group     Variable sites
(label not recovered)                         c c c a c c a t a c c a g t c a t a c c - a g - a g
Sample 4                EU924214              c c c a c c a t a c c a g t c a t a c c - a g - a g
Sample 2                EU924215              a a t a t c a t a - c a g t c a t a c c - a g - a g
N. swirskii             EU310505              a a t - t g t - - - t - t t t t - t - t t t t t a t
Sample 5                EU924216    Group 2   n a t - t g t - - - t a g a t t - c c t t a t g g t
Sample 6                EU924217              n n n - t g t - - - t a g a t a - c - t t a t - g t

Table 3. The variable sites (a = adenine, c = cytosine, g = guanine, t = thymine, - = deletion and n = not detected) detected in a fragment of the nuclear ITS region of six samples of A. swirskii collected from citrus and grapes in the Nile delta of Egypt.

Multiple alignment of the homologous 16S rRNA sequences obtained from the GenBank database with the reference sequence showed that the entire 16S rRNA fragment (422 bp in size) contains more than 57 variable sites (from base no. 21 to base no. 323) flanked by the two conserved regions. The bases outside this variable region are completely conserved in the four species (Figure 10 and Table 5). From these variable sites, 25 specific nucleotides (from base no. 21 to base no. 308) that gave clear, significant results in both types of alignment comparison (pairwise and multiple sequence alignment programs) were chosen as a reference for the identification of unknown species. The amplified fragment was also shorter by one nucleotide (421 bp) in goat and by two nucleotides (420 bp) in both cattle and sheep.
The specific variable sites detected between the Egyptian buffalo 16S rRNA gene fragment and those of the other three species studied proved to be a good marker for identifying the four species. The detected variable sites are classified in Fig. 10 and Table 5.
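The idea of using a fixed set of species-specific nucleotides as a reference for identifying an unknown sample can be sketched as a simple lookup. In this minimal Python example the species labels, positions and bases are hypothetical placeholders, not the 25 diagnostic nucleotides of Table 5. The function names are illustrative, and the query is assumed to be already aligned to the reference coordinates.

```python
# HYPOTHETICAL diagnostic profiles: 1-based alignment position -> expected base.
# These labels, positions and bases are placeholders for illustration only;
# they are NOT the diagnostic nucleotides reported in Table 5.
REFERENCE_PROFILES = {
    "species_A": {2: "c", 5: "g", 8: "c"},
    "species_B": {2: "t", 5: "g", 8: "c"},
    "species_C": {2: "c", 5: "a", 8: "t"},
}


def score_profiles(aligned_query, profiles):
    """Return, per species, the fraction of diagnostic sites the query matches."""
    scores = {}
    for species, sites in profiles.items():
        matches = sum(
            1
            for position, base in sites.items()
            if position <= len(aligned_query)
            and aligned_query[position - 1].lower() == base
        )
        scores[species] = matches / len(sites)
    return scores


def identify(aligned_query, profiles, min_score=1.0):
    """Return the best-matching species, or None if it scores below min_score."""
    scores = score_profiles(aligned_query, profiles)
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


if __name__ == "__main__":
    print(identify("acaagaacaa", REFERENCE_PROFILES))  # toy query sequence
```

Requiring a perfect match (`min_score=1.0`) is a deliberately strict default for this sketch; a real workflow would need a threshold justified by the observed within-species variation.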

DNA barcoding, genome evolution & phylogenetic trees
The ability of molecular trees to encompass both short and long periods of time is based on the observation that different genes evolve at different rates. The DNA specifying ribosomal RNA (rRNA) changes relatively slowly, so comparisons of DNA sequences in these genes are useful for investigating relationships between taxa that diverged hundreds of millions of years ago. Studies of the genes for rRNA have shown, for example, that fungi are more closely related to animals than to green plants, something that certainly could not have been deduced from morphological comparisons alone.
In contrast, the DNA in mitochondria (mtDNA) evolves relatively rapidly and can be used to investigate more recent evolutionary events.
The methodology used in DNA barcoding has been straightforward. Sequences of the barcoding region are obtained from various individuals. The resulting sequence data are then used to construct a phylogenetic tree using a distance-based 'neighbour-joining' method. In such a tree, similar, putatively related individuals are clustered together. The term 'DNA barcode' seems to imply that each species is characterized by a unique sequence, but there is of course considerable genetic variation within each species as well as between species. However, genetic distances between species are usually greater than those within species, so the phylogenetic tree is characterized by clusters of closely related individuals, and each cluster is assumed to represent a separate species.
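The assumption that distances between species exceed those within species can be checked directly on a set of labelled barcode sequences. The minimal Python sketch below does not build a neighbour-joining tree; it only compares the largest within-species distance with the smallest between-species distance, using the uncorrected proportion of differing sites (p-distance) as a simplifying assumption. The function names and the toy sequences are illustrative and are not data from this study.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)


def barcode_gap(specimens):
    """Return (max within-species distance, min between-species distance).

    `specimens` is a list of (species_label, aligned_sequence) pairs.
    Clusters separate cleanly when the first value is below the second.
    """
    within, between = [], []
    for (label1, seq1), (label2, seq2) in combinations(specimens, 2):
        distance = p_distance(seq1, seq2)
        (within if label1 == label2 else between).append(distance)
    return max(within, default=0.0), min(between, default=float("inf"))


if __name__ == "__main__":
    toy = [  # toy sequences for illustration only
        ("A", "acgtacgt"),
        ("A", "acgtacga"),
        ("B", "ttgtacgt"),
        ("B", "ttgtacgg"),
    ]
    max_within, min_between = barcode_gap(toy)
    print(max_within, min_between, max_within < min_between)
```

When the largest within-species distance is not smaller than the smallest between-species distance, the clusters overlap and a cluster can no longer be assumed to represent a single species.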
An evolutionary tree (or phylogenetic tree) is a branching diagram that represents the evolutionary history of a group of organisms. For example, we might use morphological and genetic data to work out a phylogenetic tree of animals. Such a tree can provide a huge amount of information. For any particular group of animals, our tree could identify the ancestors and closest relatives of the group. If we traced the history of animals all the way back, we could use the tree to help us answer questions such as: What did the earliest animals look like? What features did they pass on to all their descendants? Phylogenetic trees also have great practical value. The same techniques we use to reconstruct evolutionary history have been used in forensics, where phylogenetic trees have helped solve criminal cases, and in epidemiology, where trees have been used to estimate when and where diseases such as AIDS originated.
Now that we can compare entire genomes, including our own, some interesting facts have emerged. As you may have heard, the genomes of humans and chimpanzees are strikingly similar. An even more remarkable fact is that homologous genes are widespread and can extend over huge evolutionary distances. While the genes of humans and mice are certainly not identical, 99% of them are detectably homologous. And 50% of human genes are homologous with those of yeast.
It is not a coincidence that DNA barcoding has developed in concert with genomics-based investigations.
DNA barcoding (a tool for rapid species identification based on DNA sequences) and genomics (which compares entire genome structure and expression) share an emphasis on large scale genetic data acquisition that offers new answers to questions previously beyond the reach of traditional disciplines. DNA barcodes consist of a standardized short sequence of DNA (400-800 bp) that in principle should be easily generated and characterized for all species on the planet (1). A massive on-line digital library of barcodes will serve as a standard to which the DNA barcode sequence of an unidentified sample from the forest, garden, or market can be matched. Similar to genomics, which has accelerated the process of recognizing novel genes and comparing gene function, DNA barcoding will allow users to efficiently recognize known species and speed the discovery of species yet to be found in nature. DNA barcoding aims to use the information of one or a few gene regions to identify all species of life, whereas genomics, the inverse of barcoding, describes in one (e.g., humans) or a few selected species the function and interactions across all genes.
To be practical as a DNA barcode a gene region must satisfy three criteria: (i) contain significant species-level genetic variability and divergence, (ii) possess conserved flanking sites for developing universal PCR primers for wide taxonomic application, and (iii) have a short sequence length so as to facilitate current capabilities of DNA extraction and amplification. A short DNA sequence of 600 bp in the mitochondrial gene for cytochrome c oxidase subunit 1 (CO1) has been accepted as a practical, standardized species-level barcode for animals (see www.barcoding.si.edu). The inability of CO1 to work as a barcode in plants set off a race among botanists to find a more appropriate marker. A number of candidate gene regions have been suggested as possible barcodes for plants, but none have been widely accepted by the taxonomic community. This lack of consensus is in part due to the limitations inherent in a plastid marker relative to plant CO1, and also because a quantitative context for selecting a gene region as a barcode for plants has not been offered. Several factors must be considered and weighted in selecting a plant DNA barcode: (i) universal PCR amplification, (ii) range of taxonomic diversity, (iii) power of species differentiation, and (iv) bioinformatics analysis and application.

Molecular genetics reveals evolutionary relationships
Evolution results from the accumulation of inherited changes in populations. Because DNA is the molecule of heredity, evolutionary changes must be reflected in changes in DNA.
Systematists have long known that comparing DNA within a group of species would be a powerful method for inferring evolutionary relationships, but for most of the history of systematics, direct access to genetic information was nothing more than a dream. Today, however, DNA sequencing (determining the sequence of nucleotides in a segment of DNA) is comparatively cheap, easy and widely available. The polymerase chain reaction (PCR) allows systematists to easily accumulate large samples of DNA from organisms, and automated machinery makes sequence determination a comparatively simple task.

Direct benefits of DNA barcoding undoubtedly include
i. make the outputs of systematics available to the largest possible community of end-users by providing standardized and high-tech identification tools, e.g. for biomedicine (parasites and vectors), agriculture (pests), environmental assays and customs (trade in endangered species);
ii. relieve the enormous burden of identifications from taxonomists, so they can focus on more pertinent duties such as delimiting taxa, resolving their relationships and discovering and describing new species;
iii. pair up various life stages of the same species (e.g. seedlings, larvae);
iv. provide a bio-literacy tool for the general public.
Perhaps another advantage of DNA barcoding is that it will also facilitate basic biodiversity inventories. Indeed, from the premises of molecular phylogenetics to assembling the tree of life, DNA sequences in environmental sampling and reconstruction of phylogenetic trees to place sequences into an evolutionary context have been used in several inventories of cryptic biodiversity (e.g. soil bacteria or marine/freshwater micro-organisms).
New 'Genetic Bar Code' Technique Establishes Ability to Derive DNA Information from RNA
Science Daily (Apr. 8, 2012): Researchers from Mount Sinai School of Medicine have developed a method to derive enough DNA information from non-DNA sources, such as RNA, to clearly identify individuals whose biological data are stored in massive research repositories. The approach may raise questions regarding the ability to protect individual identity when high-dimensional data are collected for research purposes.
A paper introducing the technique appears in the April 8 online edition of Nature Genetics.
DNA contains the genetic instructions used in the development and functioning of every living cell. RNA acts as a messenger that relays genetic information in the cell so that the great majority of processes needed for tissue to function properly can be carried out.
To date, access to databases with DNA information has been restricted and protected as it has long been considered the sole genetic fingerprint for every individual. However, vast amounts of RNA data have been made publicly available via a number of databases in the United States and Europe. These databases contain thousands of genomic studies from around the world.
In this study, the authors developed a technique whereby a person's DNA could be inferred from RNA data using gene-expression levels monitored in any of a number of tissues. In contrast, most studies involving DNA and RNA begin with DNA sequences and then seek to associate expression patterns with changes in DNA between individuals in a population. This is the first time that going from RNA levels to DNA sequence has been described.
"By observing RNA levels in a given tissue, we can infer a genotypic barcode that uniquely tags an individual in ways that enables matching the individual to an independently derived DNA sample. Not only can genotypic barcodes be deduced from RNA, but RNA levels in some tissue can inform not only individual characteristics like age and sex, but on diseases such as Alzheimer's and cancer, as well as the risks of developing those diseases."