An Evolutionary Biology Approach to Understanding Neurological Disorders

Many common human neurological disorders, including epilepsy, Alzheimer’s disease, Parkinson’s disease, autism spectrum disorders, and schizophrenia, show complex heritability and genetics. While studies of single-gene diseases typically provide a more straightforward opportunity to understand the underlying molecular mechanisms of disease, complex diseases are more common and inherently more difficult to study. Nonetheless, researchers have begun to make dramatic inroads into the study of complex human diseases, including many neurological disorders, in the post-human genome sequence era. This is largely due to new technologies and resources that are promoting our understanding of protein structure and function, thereby facilitating the association of disease phenotypes with genetic loci. The Online Mendelian Inheritance in Man (OMIM) database lists those genes implicated in human disease, and this highlights progress made in this field, where around 10% of human genes have a known disease association (Amberger et al., 2009).


Introduction
In the first few sections of this paper, we highlight the differences and similarities between simple and complex human genetic disorders, and key methods to study these disorders. We emphasize the key role comparative and evolutionary biology techniques play in increasing our understanding of the pathophysiology of complex human disorders, including in the assessment of the functional traits of gene products implicated in human disease. Several human neurological disorders are used to illustrate the power of this methodology. In the last sections of this paper, the significance and implications of comparative and evolutionary biology data are highlighted using schizophrenia and autism as specific examples. The surprising recent links between neurological disorders and cancer are discussed in the final section. We conclude that exploration of the evolutionary history of human genes, and comparison of protein structure, helps us understand how and why human neurological disorders originated, influences the choice of appropriate animal
Another factor complicating the phenotype of Mendelian disorders is the finding that heterozygotes for some recessively inherited Mendelian disorders, who show no symptoms of the homozygous phenotype, are at risk of an apparently unrelated disorder (Sidransky, 2006; Sriram et al., 2005). For example, patients who are heterozygous for the gene deficient in Gaucher disease are at an increased risk of neurodegenerative synucleinopathies, such as Parkinson disease (Sidransky, 2006). An additional complication arises in patients who show clinical symptoms consistent with a single-gene defect in a metabolic pathway, yet do not have a complete deficiency in any one enzyme and instead have multiple partial defects. This phenomenon is referred to as synergistic heterozygosity (Vockley et al., 2000).
Finally, while some genetic disorders are largely polygenic and complex in nature, a subset is inherited in a classical Mendelian manner (see Fig. 1). For example, with Alzheimer disease (ALZ) and Parkinson disease (PKD), a subset of cases (prefixed by the term 'familial') is inherited in a Mendelian manner. With PKD, around 5% of cases are due to mutations in one of several specific genes with either autosomal dominant or recessive inheritance patterns (Gasser, 2009; Lesage & Brice, 2009; Shulman et al., 2011), but PKD-associated genes with a more modest penetrance are now beginning to be identified (International Parkinson Disease Genomics Consortium, 2011; Liu et al., 2011; Shulman et al., 2011). With ALZ, around 0.1% of cases are inherited in an autosomal dominant manner, while one APOE allele, present in 2% of Caucasian populations, has recently been reclassified from 'risk gene' status to being considered moderately penetrant with semi-dominant inheritance (Blennow et al., 2006; Genin et al., 2011). Nonetheless, ALZ in most patients is influenced by a combination of multiple genetic risk factors and protective alleles (Sherva & Farrer, 201; Waring & Rosenberg, 2008). Therefore, Mendelian disorders have more in common with multi-factorial diseases than originally thought, and both are affected by genetic background and environmental conditions. Furthermore, the rare Mendelian forms of common complex disorders are providing key insights about the pathogenesis of many complex diseases by highlighting cellular pathways perturbed in the disease state (discussed further in Section 4.4), and this is leading to testable hypotheses about disease etiology (Peltonen et al., 2006). Complex genetic disorders are discussed next, emphasizing the importance of evolutionary and comparative biology, while the relevance of these areas of research to multifunctional genes will be discussed further in Section 4.

Complex genetic disorders
While most Mendelian disorders are rare, there are over 7000 such disorders, and so they collectively affect hundreds of millions of people worldwide (Amberger et al., 2009). By contrast, most of the common disorders of children and adults are complex diseases, and a single highly-penetrant gene is not causative of the disease phenotype (see Fig. 1). Indeed, the causes of such disorders are usually heterogeneous, and a combination of effects from more than one gene, combined with non-genetic factors (environment), play a role in disease development (Davey Smith et al., 2005). Such disorders in children include mental retardation, autism spectrum disorders, attention deficit/hyperactivity disorder, and cancer. In adults, common complex disorders include schizophrenia, bipolar disorder, diabetes, coronary heart disease, hypertension, obesity, and cancer. The complex, multigenic nature of these diseases has made them inherently more difficult to study. However, in the next section of this paper, we will discuss the key methods used to determine the genetic underpinnings of common complex disorders. Understanding the etiology of these multifactorial diseases is essential for the development of effective means of treatment and/or prevention.

Studying complex human genetic disorders
Complex disorders often cluster in families without clearly demonstrating Mendelian inheritance patterns. This makes it difficult to determine the genetic versus non-genetic contribution to the disease phenotype, and to calculate the heritable component of the disorder. Below we discuss methods used to establish the heritability of complex human disorders, the generation of the genetic variation that underpins these disorders, and how to establish which genes are responsible for complex human disease.

Heritability of complex human diseases
Heritability is usually defined as the proportion of total phenotypic variation that can be attributed to genetic variability (Lee et al., 2011; Visscher et al., 2008). While the influence of environment on phenotype makes heritability difficult to measure accurately in some cases (Ober & Vercelli, 2011), methods of obtaining unbiased estimates of heritability from various types of pedigree data are well established for both continuous phenotypes and complex human disorders (Lee et al., 2011; Visscher et al., 2008). Furthermore, animal models are invaluable in dissecting aspects of genetic and environmental interactions that are more difficult to assess in human studies (Complex Trait Consortium, 2004); this is discussed further in Section 4.5.
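The definition above can be written as a ratio of variance components. The following Python sketch is illustrative only: it assumes phenotypic variance splits cleanly into a genetic and an environmental component, with no gene-environment interaction or covariance. It also includes Falconer's classical twin-based estimator, a standard textbook formula that is not one of the pedigree methods cited in the text. The function names and numeric inputs are hypothetical.

```python
def broad_sense_heritability(genetic_variance: float,
                             environmental_variance: float) -> float:
    """H^2 = V_G / V_P, assuming V_P = V_G + V_E (no interaction terms)."""
    total = genetic_variance + environmental_variance
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_variance / total


def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate from twin correlations: h^2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are phenotypic correlations in monozygotic and
    dizygotic twin pairs, respectively.
    """
    return 2.0 * (r_mz - r_dz)


if __name__ == "__main__":
    # Hypothetical variance components and twin correlations.
    print(broad_sense_heritability(6.0, 4.0))
    print(falconer_heritability(0.75, 0.5))
```

Both estimators depend on the simplifying assumptions stated above, which is one reason heritability estimates for complex disorders can shift as methods improve.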
Recent data suggest that the heritable component of many human diseases has previously been underestimated (Lee et al., 2011). This has been due to limitations of the methodology employed, as well as other factors, such as evidence demonstrating that monozygotic twins are less genetically similar than once thought (Zwijnenburg et al., 2010). A further example is that of PKD, which was long considered a non-hereditary disorder (Shulman et al., 2011; Westerlund et al., 2010). Despite extensive efforts to find environmental risk factors for the disease, genetic variants now stand out as the major causative factor (Shulman et al., 2011; Westerlund et al., 2010; Wirdefeldt et al., 2011). This shift of focus away from environmental toxins, towards genetic contributions, is now leading to rapid progress in understanding PKD and in guiding the development of the next generation of therapeutics (Shulman et al., 2011).
While it is clear that genetics underpins the pathophysiology of complex human disorders, the genetic alleles contributing to the disease phenotype are not always inherited. De novo mutations are increasingly being implicated in human disease and, by definition, these mutations are not present in the biological parents of the affected individual. Nonetheless, depending on the severity of the phenotype, and on any effects on fitness, these novel mutations may be transmitted to subsequent generations. Indeed, the rise of techniques such as intracytoplasmic sperm injection (ICSI) can facilitate transmission of de novo mutations even if they lead to infertility (Jiang et al., 1999). De novo human germline mutations vary widely, and can include duplications or deletions of various sizes, as well as alterations in the number of chromosomes (Arnheim & Calabrese, 2009). The frequency of de novo mutations in germ-line cells increases with parental age, and such mutations can also be caused by environmental factors such as radiation exposure (Sasaki, 2006), genotoxic chemicals (Phillips & Arlt, 2009), or congenital viral infections (Ansari & Mason, 1977; Fortunato & Spector, 2003; Nusbacher et al., 1967; Vijaya-Lakshmi et al., 1999).
While de novo mutations typically occur in gametes, and are often defined as such, new mutations can also occur in the precursors of germ cells, leading to germline mosaicism (Arnheim & Calabrese, 2009), or post-fertilization, during embryonic/foetal development and in somatic cells (Lupski, 2010). Indeed, new mutations can develop at any time, with cancer being the best-known example of a genetic disorder caused by somatic cell mutagenesis, while Proteus syndrome is a recently-identified disease linked to somatic mosaicism (Lindhurst et al., 2011). Although mutations in the genome of somatic cells cannot be passed on to future generations, they may have detrimental effects. De novo mutations are best-studied in cancer (see Section 6.2.3), but a contributing role of somatic de novo mutations, such as those occurring during brain development, to neurological disorders (see Section 6.1.2) has not been explored.
Mutation frequencies also vary widely across the genome, and often concentrate at certain positions or 'hotspots' (see also Sections 3.2, 6.1.2 & 6.2.3), which have structural and functional features affecting mutagenesis (Ananda et al., 2011; Arnheim & Calabrese, 2009; Carvalho et al., 2010; Rogozin & Pavlov, 2003). For example, CpG context elevates the mutation rate by an order of magnitude (Schmidt et al., 2008), non-B DNA structures induced by palindromic AT-rich repeats facilitate recurrent translocations on chromosomes 11 and 22 at positions 11q23 and 22q11 (Kurahashi et al., 2006), while interspersed repetitive elements such as Alu, LINE, long-terminal repeats, and simple tandem repeats are frequently observed at breakpoints in the 9q34.3 subtelomere region (Yatsenko et al., 2009). However, at any given point, multiple mechanisms are acting, making prediction of mutational site and frequency difficult (Arnheim & Calabrese, 2009).
Despite de novo mutations most typically being deleterious, rather than neutral or advantageous, their very existence is evidence for ongoing adaptive evolution. Therefore, genetic disorders can be considered 'side-effects' or manifestations of the fundamental mechanisms that provide the genetic variation necessary for evolution to occur. The interaction between the evolutionary past of the human genome and human genetic disease is discussed next.

Human evolution and genetic disorders
Duplicated regions of DNA play a key role in the evolution of novel gene functions (Conant & Wolfe, 2008; Lynch & Conery, 2000; Ohno, 1970), but are also a source of genetic instability, leading to mutations implicated in both rare and common human genetic disorders (Marques-Bonet & Eichler, 2009). It is therefore relevant to explore the origins of human duplicated sequences. Current evidence indicates that many segmental duplications occurred in the hominid lineage and, more specifically, in the common ancestor of African great apes (chimpanzee, gorilla, humans) after divergence from the Ponginae or Asian great ape (orangutan) lineage (Bailey & Eichler, 2006; Carvalho et al., 2010; Koszul & Fischer, 2009; Marques-Bonet & Eichler, 2009). A considerable portion of duplicated human sequences have also been found to correspond to expanded gene families, some of which show signatures of positive selection (Marques-Bonet & Eichler, 2009). Duplications specific to the Homo sapiens lineage have also been detected, and include duplications in gene families implicated in neurotransmission, and these may play a role in higher-order brain function in humans (Han et al., 2009). Different classes of repetitive DNA sequence have been identified (Bao & Eddy, 2002). Some, such as LINEs (e.g. L1 family) or SINEs (e.g. Alu family), are long and short retrotransposable elements, found interspersed throughout the genome. Others are concentrated in certain regions, such as centromeres and telomere-adjacent sequences. These latter regions are also sites of increased genomic instability, which are associated with disease-causing chromosomal breakpoints (Stankiewicz & Lupski, 2002). Alu elements have propagated to more than one million copies in primate genomes, and likewise contribute to human genomic diversity (Batzer & Deininger, 2002). Indeed, one in 50 individuals will carry a de novo L1 insertion, and one in 20 individuals a de novo Alu insertion (Collier & Largaespada, 2007).
The active nature of many human retrotransposons is therefore linked to disease-causing somatic and germline mutations (Collier & Largaespada, 2007; Wallace et al., 1991; Oldridge et al., 1999; Claverie-Martin et al., 2003). Repeats may also contribute to DNA secondary structures that are more prone to breakage (Yatsenko et al., 2009). One novel aspect of our increased understanding of the role of repetitive DNA sequence in de novo mutations, and our ability to detect such sequences, is that this information can now be used to predict rearrangements that will contribute to genomic disorders (Carvalho et al., 2010; Ou et al., 2011; Sharp et al., 2006). Therefore, while duplicated sequences in primate genomes predispose apes and humans to extensive genetic diversity and biological innovation, the downside is that many de novo genomic changes are mediated by recombination events between these duplications. This characteristic of hominids, and Homo sapiens in particular, makes humans particularly susceptible to genomic rearrangements. These rearrangements, in turn, then play a major role in human genetic disease pathogenesis (Inoue & Lupski, 2002; Marques-Bonet & Eichler, 2009). The evolutionary history of some specific genomic rearrangements is discussed next.

Evolutionary history of specific human disease mutations
Using an evolutionary perspective, we can use comparative genomic analyses to calculate the age of appearance of segmental duplications mediating specific disease-causing mutations. Such analyses have revealed that the segmental duplication flanking the Charcot-Marie-Tooth disease region on chromosome 17 (at position 17p12) has an origin in the hominoid ancestor after the divergence of chimpanzees and humans, those flanking the DiGeorge syndrome region on chromosome 22 (22q11.2) expanded after the divergence of hominoids from Old World monkeys, the duplications flanking the Angelman/Prader-Willi region on chromosome 15 (15q11-q13) began to expand before the divergence of the Old World monkeys, while the Smith-Magenis syndrome segmental duplications (17p11.2) date back to after the divergence of New World monkeys (Marques-Bonet & Eichler, 2009). These, and other similar data, have demonstrated that the predisposing genomic features contributing to many genomic disorders have emerged within the last 25 million years (Marques-Bonet & Eichler, 2009).
Using similar methodology, the evolutionary history of human genetic disorders in which Alu elements are implicated has also been dated. For example, Alu elements mediating lipoprotein lipase deficiency (LPL gene) are found in human, ape, and monkey genomes; those Alu elements implicated in Lesch-Nyhan syndrome mutations (HPRT gene) are common to human, chimpanzee, and gorilla; while those implicated in ApoB deficiency are restricted to human and great ape genomes (Martinez et al., 2001). Of note, while Alu elements acted as 'selfish DNA' when they first inserted into primate genomes, many have subsequently gained regulatory function, a process known as exaptation (Hasler & Strub, 2006). Therefore, despite contributing to the pathogenesis of human genetic disorders, Alu elements are thought to have played a role in the divergence of primates, and to have contributed to the regulatory and developmental complexity in primate lineages (Hasler & Strub, 2006). Alu-dependent human genetic diseases therefore also date to primate lineages. Methods used to predict the different phylogenetic ages of genetic disorders instigated by DNA repeats and duplications can therefore be used to both predict and explain the different susceptibility of various primate species to genetic diseases (Martinez et al., 2001; Marques-Bonet & Eichler, 2009).

Identifying and characterizing genetic determinants of human disease
In the last few decades there has been rapid progress in human disease gene identification, due to recombinant DNA technologies, genome sequencing and analysis methods (Strachan & Read, 2010). With the vast range of resources now available, identification of novel disease genes is currently occurring on a weekly, if not daily, basis. There is no standard procedure for gene identification, however, and identifying the genes responsible for human disease requires information about both gene position and biological function. Functional data is proving a bottleneck for progress in understanding complex diseases in particular and, as outlined below, our understanding of evolutionary biology is of great benefit to studies aimed at identifying and characterizing the genetic determinants of human disease phenotypes.
Genetic linkage- and association-based analyses have been very successful in identifying rare genetic variants with highly penetrant effects, such as those causing Mendelian diseases, and have also been used to investigate the genetics of complex disorders (Altshuler et al., 2008; Jordi, 2000; Ku et al., 2011). More recently, techniques such as genotyping arrays and next-generation DNA sequencing are facilitating the identification of mutations causative of the many as-yet-uncharacterised Mendelian disorders, and of the genetic variation contributing to complex genetic disorders (Kingsley, 2011; Kuhlenbäumer et al., 2011; Roberts et al., 2010).
Two key issues have been emerging from these recent genetic studies. The first is the finding that genome-wide association studies (GWAS) have not been particularly effective in identifying complex genetic disorder risk genes, and the numbers and impact of identified genetic risk factors have been 'disappointing' (Davey Smith et al., 2005; Manolio et al., 2010; Gandhi & Wood, 2010). If sample numbers are sufficiently great, however, GWAS may be better placed to unambiguously identify risk or protective loci for complex diseases (Sullivan, 2001; Wray et al., 2008). Secondly, there are great ongoing difficulties in differentiating disease-causing mutations from rare benign variants (Kuhlenbäumer et al., 2011). These difficulties not only highlight the need for greater 'power' in GWAS, but also the need for follow-through on genetic findings and the application of many aspects of what is referred to as 'integrative genomics' (Giallourakis et al., 2005). Below, we outline some of the varied ways in which our understanding of evolutionary biology is of vital importance to leveraging information obtained from genetic studies. The resulting information can provide key insights into the biological function of uncharacterised genes, can be used to predict candidate disease genes and detrimental mutations, and can provide valuable information about biological pathways altered in different disease states, which may lead to novel therapeutics.
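The point about sample size and 'power' can be illustrated with a basic case-control allelic association test. The Python sketch below is a minimal example and not the analysis used in any cited study: it computes the Pearson chi-square statistic for a 2x2 table of allele counts and shows that, for the same modest difference in allele frequency, the statistic grows in proportion to the number of individuals sampled. The allele frequencies and cohort sizes are hypothetical.

```python
def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> float:
    """Pearson chi-square (1 degree of freedom) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one allele")
    return n * (a * d - b * c) ** 2 / denominator


def expected_allele_counts(n_individuals: int, alt_frequency: float):
    """Rounded (alt, ref) allele counts for n diploid individuals."""
    alleles = 2 * n_individuals
    alt = round(alleles * alt_frequency)
    return alt, alleles - alt


def statistic_for_cohort(n_per_group: int,
                         case_freq: float = 0.22,
                         control_freq: float = 0.20) -> float:
    """Chi-square for equally sized case and control groups (hypothetical frequencies)."""
    case_alt, case_ref = expected_allele_counts(n_per_group, case_freq)
    ctrl_alt, ctrl_ref = expected_allele_counts(n_per_group, control_freq)
    return allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref)


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(n, round(statistic_for_cohort(n), 2))
```

With these hypothetical frequencies the statistic increases tenfold when the cohort is ten times larger, which is why small-effect loci may only become detectable in very large studies.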

Validating candidate disease genes
GWAS and linkage studies generate large sets of potential disease genes. However, it remains difficult to identify the most likely disease-related genes. Various computational methods for disease gene identification have been described (Oti et al., 2011; Tiffin et al., 2006), and many of these have as their basis data from the field of evolutionary biology. Many software tools apply some of this type of information to genetic datasets, and can be used to determine which genes are the most likely to be involved in the disease in question. The Gene Prioritization Portal website provides an up-to-date summary of web-based candidate disease gene prioritization and prediction tools (Tranchevent et al., 2010). However, most gene prediction tools were designed to study Mendelian disorders, and not for the analysis of complex genetic disorders. The exceptions are the web-based tool, CANDID, specifically designed to prioritize genes implicated in complex human genetic traits (Hutz et al., 2008), and CAESAR, which is not web-based (Gaulton et al., 2007). Tools applicable to complex disorders can be expected to expand over the coming years, due to the large amount of data that will be generated using new genome analysis methods.
A major drawback for the prediction and prioritization of disease genes is that most candidate gene identification tools are reliant on how well-characterized each human gene is, and whether its molecular and cellular functions are known. Bear in mind, then, that: (i) over 98% of all gene ontology (GO) annotations are computationally inferred, have not been curated, and are considered by the GO consortium to be potentially unreliable (du Plessis et al., 2011); (ii) errors in the sequence databases affect at least 1 in 6 sequences (Lagerstrom et al., 2006; Slater & Bishop, 2006; Haitina et al., 2009; Bishop, unpublished data), and this may affect the output from tools such as PROSPECTR (Adie et al., 2005) that use sequence features to rank genes in order of their likelihood of involvement in disease; (iii) while high-throughput protein-protein interaction detection studies have great potential for increasing our understanding of complex genetic disorders, the paucity and unreliability of available data currently limit the power of these approaches (Chen et al., 2008; Chua & Wong, 2008); and (iv) many domains in proteins are of unknown function. Tools such as SUSPECTS (Adie et al., 2006), which rely on detecting shared domains, annotation, and patterns of expression, are clearly limited by the incompleteness and inaccuracy of these data. Therefore the gaps in our knowledge about gene product function are greatly hampering our understanding of complex genetic disease. Computational techniques, many anchored by evolutionary understanding, are helping direct and accelerate our understanding of the biological function of human genes, and the effect of human gene mutation and variation. These techniques are discussed next.

Gene age-based candidate gene prioritization
The first systematic study comparing the sequence characteristics of human disease genes (listed in OMIM) with genes not known to be involved in disease, found a subset of sequence-based features to be significantly different between the two sets of genes (Adie et al., 2005). These researchers created a web-based tool called PROSPECTR based on those features, which enriches lists for disease genes (Adie et al., 2005). Relevant DNA characteristics, more common in disease genes, include larger gene length and the presence of a mouse homologue (Adie et al., 2005). However, one limitation of this, and other, disease gene prediction algorithms is that they are developed on the basis of known disease genes, and many disease genes remain unidentified (reviewed by Ropers, 2007).
While PROSPECTR examines whether murid orthologues of a gene exist (Adie et al., 2005), the evolutionary history of human 'disease genes' has subsequently been explored in greater depth. A comprehensive study on the evolutionary 'age' of genes mutated in human diseases compared to those not implicated, revealed that human disease genes are more likely to be 'old' genes (Domazet-Loso & Tautz, 2008). These so-called 'old genes' are classified on the basis that they have orthologues in urochordates and/or more anciently-diverging lineages, and contrast with 'recent genes' where orthologues are restricted to chordate lineages. The over-representation of human disease genes among old genes is even more pronounced among those genes with tissue-specific expression profiles (Nagaraj et al., 2010). The evolutionary history of Mendelian disease genes, compared to genes implicated in complex disease, has also been examined, and more recently-evolving genes also divided into 'middle-aged' and 'young' gene categories (Cai et al., 2009). This study found that Mendelian disease genes tend to be older than non-disease genes, while complex disease genes are typically middle-aged (Cai et al., 2009). Therefore, despite not being evolutionarily ancient, most complex disease genes originated during the emergence of vertebrates, and are not human- or primate-specific.
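The age classification described above can be sketched as a simple rule: a gene is dated by the most anciently diverging lineage in which an orthologue is detected. The Python example below is a simplified illustration, not the procedure of the cited studies. The outgroup list, its ordering, the cutoff, and the gene names are all assumptions made for the example, and orthologue detection itself is taken as given.

```python
# Outgroups ordered from most anciently diverging to most recently diverging
# relative to human. This coarse ordering is an assumption for illustration.
OUTGROUPS = ["bacteria", "yeast", "nematode", "urochordate",
             "zebrafish", "mouse", "chimpanzee"]

# Following the text: genes with orthologues in urochordates or in more
# anciently diverging lineages are treated as 'old'.
OLD_CUTOFF = "urochordate"


def oldest_outgroup(orthologue_taxa):
    """Return the most anciently diverging outgroup with a detected orthologue."""
    for taxon in OUTGROUPS:
        if taxon in orthologue_taxa:
            return taxon
    return None


def classify_gene_age(orthologue_taxa):
    """Classify a gene as 'old' or 'recent' from the taxa holding orthologues."""
    oldest = oldest_outgroup(orthologue_taxa)
    if oldest is None:
        return "recent"
    if OUTGROUPS.index(oldest) <= OUTGROUPS.index(OLD_CUTOFF):
        return "old"
    return "recent"


if __name__ == "__main__":
    # Hypothetical genes and orthologue presence calls.
    genes = {
        "GENE_A": {"yeast", "nematode", "mouse", "chimpanzee"},
        "GENE_B": {"zebrafish", "mouse", "chimpanzee"},
        "GENE_C": {"chimpanzee"},
    }
    for name, taxa in genes.items():
        print(name, classify_gene_age(taxa))
```

A finer scheme, such as the 'middle-aged' and 'young' categories mentioned above, would need additional cutoffs along the same ordered list.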

Determining human disease gene function
The function of many human gene products is unknown or very poorly understood, and this greatly hampers progress in all studies on complex human genetic disorders. Below we discuss key methods used to predict and understand gene function, where a thorough understanding of evolutionary biology is imperative.

Sequence-based approaches to predicting gene function
The main method for predicting the function of a gene product in the absence of experimental data is termed 'homology-based transfer' (Friedberg, 2006; Sleator & Walsh, 2010). This approach is based on the detection of significant amino acid sequence similarity to a protein(s) of known function using programmes such as BLAST (Altschul et al., 1997). As sequence similarity suggests a common evolutionary origin, the function of the known protein is then transferred to the query protein. This method is not foolproof, however, and exceptions have been described at both ends of the similarity scale (reviewed by Sleator & Walsh, 2010). Understanding which residues are essential for protein function can be important in evaluating the relevance of similarities detected between proteins, such as those in conserved motifs. Furthermore, there are many proteins where homology-based prediction cannot be used. Therefore, more recently, non-homology based computational approaches have begun to emerge (reviewed by Sleator & Walsh, 2010). These methods are based on a combination of sequence, structural prediction methods, evolutionary history, biochemical properties, and genetic and genomic knowledge.
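The logic of homology-based transfer can be shown with a toy example. The Python sketch below is a deliberately simplified stand-in for a real similarity search such as BLAST: it assumes the sequences are already aligned, scores them by percent identity, and transfers the annotation of the best hit only if identity exceeds a threshold. The sequences, annotations, protein names, and the 40% threshold are all hypothetical choices for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues over aligned, non-gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap positions
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


# Hypothetical annotated proteins: name -> (aligned sequence, function).
ANNOTATED = {
    "PROT_1": ("MKTAYLAKQR", "kinase (hypothetical)"),
    "PROT_2": ("MSTGHIPKNR", "transporter (hypothetical)"),
}


def transfer_annotation(query: str, annotated=ANNOTATED, min_identity: float = 0.4):
    """Return (function, identity) of the best hit, or (None, identity) if too weak."""
    best_function = None
    best_identity = 0.0
    for sequence, function in annotated.values():
        identity = percent_identity(query, sequence)
        if identity > best_identity:
            best_function, best_identity = function, identity
    if best_identity < min_identity:
        return None, best_identity
    return best_function, best_identity


if __name__ == "__main__":
    print(transfer_annotation("MKTAYIAKQR"))
    print(transfer_annotation("GGGGGGGGGG"))
```

As the text notes, a high score does not guarantee shared function and a low score does not exclude it, so any threshold of this kind is a heuristic.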

Predicting the effect of mutation on gene function
A major problem in the search for disease-causing mutations is the fact that some of them are difficult to recognize. Tools to evaluate the functional impact of mis-sense mutations are limited by the few solved protein tertiary structures, and by the limitations of software to predict effects of mutations on protein domain function and/or protein conformation, much of which incorporates evolutionary conservation data (reviewed by Ropers, 2007). Complicating matters further, even silent mutations have been found to be pathogenically relevant (Kimchi-Sarfaty et al., 2006; Pagani et al., 2005). Furthermore, non-coding mutations may not be examined or detected and, even if they are, it is currently even more challenging to predict whether these have functional effects. For example, intronic changes may alter the splicing pattern (Richards et al., 2007; Lenski et al., 2007) and promoter mutations may affect gene expression levels (Almeida et al., 2006; Borck et al., 2006). Unlike the situation found with Mendelian disease loci, an increasing proportion of the loci being associated with complex disorders are being found outside protein-coding regions of the genome (Pomerantz et al., 2010). The 1000 Genomes Project will sequence the complete genomes of more than 1,000 humans, and will provide valuable information about variants normally present in the human population (Marth et al., 2011).
Genetic mutations implicated in human disease are often mis-sense/nonsense mutations, or involve small duplications/deletions in coding regions. However, recent sequencing of multiple human exomes (coding regions of the genome) suggests that these types of mutation are actually quite common and that such changes are frequently benign (Ng et al., 2008; Ng et al., 2009). One way of distinguishing disease-causing mutations, at least in Mendelian disease genes, is that they occur more frequently at evolutionarily well-conserved amino acid residues than at non-conserved ones, and these changes are expected to have a more severe impact on the function of the resulting protein. By contrast, the distribution of mutations, such as non-synonymous single nucleotide polymorphisms (nsSNPs or cSNPs), contributing to complex human diseases is often difficult to distinguish from the distribution for "normal" human variation (Thomas & Kejariwal, 2004). These results indicate that individual SNPs implicated in complex genetic disorders will have more subtle effects on function in isolation. This observation further suggests a disease architecture involving the concerted contribution of multiple genetic loci, each with a small individual effect. Indeed, this explanation is suggested to be the basis of a large proportion of common complex disorders, and is known as the common-variant common-disease (CVCD) model of complex genetic disease (Grady et al., 2003; Visscher et al., 2011). However, as discussed in Section 2.1 above, rare genetic changes can also be causative of a subset of common disorders, and this is described as the rare-variant common-disease (RVCD) model. There is no cut-off point between these two models (see Fig. 1), and there is a broad variety in both the frequency of disease gene variation and in the penetrance of a given genetic change, which together combine to cause a given disease phenotype (Grady et al., 2003; Visscher et al., 2011).
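The contrast between many small-effect loci and a single larger-effect variant can be made concrete with an additive score. The Python sketch below assumes a purely additive model in which each risk allele contributes a fixed amount to an overall score. The locus names and effect sizes are hypothetical, and real disease architectures need not be additive.

```python
def additive_risk_score(genotype: dict, effects: dict) -> float:
    """Sum per-allele effects weighted by risk-allele count (0, 1 or 2) per locus."""
    return sum(effect * genotype.get(locus, 0) for locus, effect in effects.items())


# Hypothetical architecture: 20 common loci with small effects, plus one
# rare locus with a large effect.
COMMON_SMALL_EFFECTS = {f"locus_{i}": 0.05 for i in range(1, 21)}
RARE_LARGE_EFFECT = {"rare_locus": 1.0}
ALL_EFFECTS = {**COMMON_SMALL_EFFECTS, **RARE_LARGE_EFFECT}


if __name__ == "__main__":
    # Individual carrying one risk allele at each of the 20 common loci.
    polygenic_carrier = {locus: 1 for locus in COMMON_SMALL_EFFECTS}
    # Individual carrying only the rare large-effect allele.
    rare_carrier = {"rare_locus": 1}
    print(additive_risk_score(polygenic_carrier, ALL_EFFECTS))
    print(additive_risk_score(rare_carrier, ALL_EFFECTS))
```

Under these assumed effect sizes the two individuals reach a similar score by different routes, which mirrors the continuum between the two models described above.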
Computational prediction of the effects of genetic variation utilizes evolutionary conservation of the resultant gene product, in combination with predictions of the changes to its physicochemical properties (Mooney, 2005;Ng & Henikoff, 2006;Ng et al., 2008;Tarpey et al., 2009). Computer-based tools are also used to predict conserved domains and motifs, and can be used to determine whether nsSNPs or other genetic changes are likely to affect protein function. Examples of such databases include PROSITE (Hulo et al., 2008), BLOCKS (Henikoff et al., 2000), and PRINTS (Attwood et al., 2003). Variation in the sequence of orthologues with conserved function can also be used to indicate the amino acid variation possible in a domain, or motif, which is still predicted to maintain some degree of functionality.
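The idea that variation among orthologues indicates which positions tolerate change can be illustrated with a minimal sketch. It scores each column of a multiple sequence alignment by Shannon entropy, so that invariant positions score 1 and highly variable positions score lower. This is only one simple conservation measure and is not the method used by any of the tools cited above; the function names and the toy alignment are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return a conservation score in [0, 1] for one alignment column.

    Score = 1 - H / log2(20), where H is the Shannon entropy (in bits) of
    the residues in the column. 1.0 means the position is invariant.
    Gap characters ("-") are ignored; an all-gap column scores 0.0.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


def conservation_profile(alignment):
    """Score every column of a list of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]


# Hypothetical toy alignment of four orthologues (invented sequences).
toy_alignment = [
    "MKTAY",
    "MKSAY",
    "MKTGY",
    "MRTAF",
]
profile = conservation_profile(toy_alignment)

for position, score in enumerate(profile, start=1):
    print(f"position {position}: conservation {score:.2f}")
```

Under this scheme, a substitution at a position with a score near 1 would be flagged as more likely to affect function than one at a variable position, which mirrors the reasoning applied to Mendelian disease mutations above.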
Nonetheless, while sequence-based approaches provide a good basis for predicting the function of genes of unknown function, in many cases there may be little or no sequence similarity between an unknown gene product and any characterized gene product. Fortunately, due to evolutionary constraints, there is often still significant structural similarity between an uncharacterized protein and a characterized protein, and this can be a useful indicator of function (Shatsky et al., 2008;Todd et al., 2001;Watson et al., 2005). Therefore, recent developments are aimed at combining both sequence and structural information to increase the likelihood of a functional prediction (Laskowski et al., 2005;Pierri et al., 2010;Skolnick & Brylinski, 2009). However, there is much room for improvement to the currently available approaches for the prediction of protein function, as only ~1% of proteins on the UniProt database have experimentally-supported function, ~65% have some functional annotation, and over one-third are uncharacterized or have no predicted function (Barrell et al., 2009;Erdin et al., 2011;Goldsmith-Fischman & Honig, 2003;Laskowski et al., 2003;Magrane & Consortium, 2011). This may be particularly important for disease genes, as many intrinsically-disordered proteins, lacking stable secondary and tertiary structures, are being found associated with many complex human diseases, including cancer, diabetes, neurodegenerative diseases, and cardiovascular disease (Uversky, 2009;Uversky et al., 2008, 2009;Wang et al., 2011).

Evolutionary pedigree and co-expression to leverage functional prediction
Expression data can also provide information relevant to genetic disorders. For example, it is useful to know whether candidate genes are expressed in the tissue affected by the disease. Furthermore, genes involved in similar cellular processes are also more likely to be co-transcribed, hence the 'guilt-by-association' algorithm (Walker et al., 1999;Oliver, 2000). In addition to the 'power' that can be added to the analysis of co-expression datasets, a better understanding of the evolutionary history of human genes may lead to novel ways to interpret sequence data and predict protein function (Eng et al., 2009;Thornton & DeSalle, 2000). For example, proteins with a similar evolutionary pedigree, and emerging at the same 'point' in the evolutionary history of the Eukaryota, are assumed to have evolved in parallel. This, in turn, indicates they are more likely to have a common function (Eisenberg et al., 2000). The 'guilt-by-association' principle greatly informs our understanding of protein-protein interaction networks, as it is known that most cellular functions are carried out by networks of interacting proteins (Qiu & Noble, 2008). Understanding protein-protein interaction networks is also informing computational approaches aimed at understanding the role of pathways affected in complex genetic diseases, and this is discussed next.
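The 'guilt-by-association' principle for co-expression can be sketched as follows. Candidate genes are ranked by the mean Pearson correlation of their expression profile with the profiles of genes already implicated in a disease. This is a simplified illustration rather than the procedure of the cited studies; the gene names, expression values, and function names are all invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles.

    Returns 0.0 when either profile has no variance, since the
    correlation is undefined in that case.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(expression, known_genes, candidates):
    """Rank candidates by mean correlation with known disease genes."""
    scores = {}
    for gene in candidates:
        corrs = [pearson(expression[gene], expression[k]) for k in known_genes]
        scores[gene] = sum(corrs) / len(corrs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression levels across five tissues (invented values).
expression = {
    "KNOWN_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "KNOWN_B": [2.0, 4.1, 6.2, 7.9, 10.0],
    "CAND_1": [1.1, 2.1, 2.9, 4.2, 5.1],   # tracks the known genes
    "CAND_2": [5.0, 4.0, 3.0, 2.0, 1.0],   # anti-correlated
    "CAND_3": [3.0, 3.0, 3.0, 3.0, 3.0],   # uniformly expressed
}
ranking = rank_candidates(
    expression, ["KNOWN_A", "KNOWN_B"], ["CAND_1", "CAND_2", "CAND_3"]
)

for gene, score in ranking:
    print(f"{gene}: mean correlation {score:+.2f}")
```

In practice such rankings would be computed over many samples and corrected for multiple testing; the sketch only shows the core scoring step.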

Network studies to leverage functional prediction
Network-based models incorporating protein-protein interaction (PPI) data are a relatively new way for studying disease-related genes. Nonetheless, this approach has already been proved to be effective for the identification of complex disease genes, including those involved in colon cancer (Nibbe et al., 2009). Therefore, analyzing the functional networks of human genes is providing a key framework for prioritizing candidate disease genes. Such analyses may also lead to the identification of key cellular pathways common to complex diseases that may be amenable to therapeutic intervention.
Based on large- and small-scale PPI studies, preliminary protein-protein interaction networks are being created (e.g. Fig. 2). These can be accessed and viewed using a number of web tools, such as STRING and BioGRID (Han, 2008). Proteins sharing a particular functional category cluster in the same location of PPI networks, and are referred to as functional modules, and placement of proteins in PPI networks can be used to inform protein function prediction studies (Dziembowski & Seraphin, 2004;Yook et al., 2004;Makino & Gojobori, 2006). Understanding networks of PPIs can be used to predict additional genes that, when mutated, may cause the same disease as that associated with mutations in interacting partners (McGary et al., 2010), and can also explain why so many different mutated genes can cause the same or similar complex disease (Bill & Geschwind, 2009;Bourgeron, 2009;Crespi et al., 2010;Gilman et al., 2011;Guilmatre et al., 2009). Related to the network-based nature of many gene products is the concept that some genes, referred to as 'hub' genes, encode proteins that interact with multiple other proteins (Fig. 2). Multifunctional genes may impact on numerous pathways, while different mutations may cause different diseases (Gillis & Pavlidis, 2011). The encoded hub proteins have multi-functional cellular roles, and are typically annotated with multiple GO categories (Gillis & Pavlidis, 2011). The multi-functional nature of many proteins also contributes to phenotypic pleiotropy, where mutation(s) in a single gene can affect multiple phenotypic traits. Genes leading to pleiotropic phenotypes tend to be more evolutionarily conserved and are more likely to have essential functions (Eisenberg & Levanon, 2003;Feldman et al., 2008;Fraser et al., 2002;Gandhi et al., 2006;Goh et al., 2007;Jeong et al., 2001;Saeed & Deane, 2006).
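Two of the network ideas described here, hub identification and the prediction of candidate genes from interacting partners, can be shown in a small sketch. It builds an undirected network from a list of pairwise interactions, reports proteins whose number of partners meets a chosen threshold, and ranks the remaining proteins by how many known disease proteins they bind. The protein names, the interaction list, and the degree threshold are hypothetical; real analyses would draw interactions from resources such as STRING or BioGRID and would need to account for false-positive and false-negative interactions.

```python
from collections import Counter, defaultdict


def build_network(interactions):
    """Build an undirected PPI network as {protein: set of partners}."""
    network = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            network[a].add(b)
            network[b].add(a)
    return network


def find_hubs(network, min_degree):
    """Return proteins with at least min_degree partners, most connected first."""
    hubs = [p for p, partners in network.items() if len(partners) >= min_degree]
    return sorted(hubs, key=lambda p: (-len(network[p]), p))


def rank_neighbours(network, disease_proteins):
    """Rank non-disease proteins by how many known disease proteins they bind."""
    counts = Counter()
    for protein in disease_proteins:
        for partner in network.get(protein, ()):
            if partner not in disease_proteins:
                counts[partner] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pairwise interactions (invented for illustration).
toy_interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P1", "P5"), ("P2", "P3"), ("P4", "P6"),
]
network = build_network(toy_interactions)

# The degree threshold for calling a protein a hub is an arbitrary assumption.
hubs = find_hubs(network, min_degree=3)

# Suppose P2 and P4 are already implicated in the same disease.
candidates = rank_neighbours(network, {"P2", "P4"})

print("hubs:", hubs)
print("candidates:", candidates)
```

Here P1 is both the only hub and the top-ranked candidate, because it binds both known disease proteins.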
Overall, human disease-causing genes have an intermediate essentiality, being less essential than housekeeping genes, but more essential than non-disease genes (Feldman et al., 2008;Goh et al., 2007;Liao & Zhang, 2008;Tu et al., 2006). These findings are, to some extent, commonsense, as mutation of a housekeeping gene would be expected to lead to embryonic lethality.
As gene products that are peripheral in protein-protein interacting networks are known to have a higher evolutionary rate than hub proteins, these data also suggest human disease genes differ in rates of evolutionary change compared to non-disease genes. However, current results on this topic are inconsistent, with some studies indicating disease genes are evolving more slowly than non-disease genes (Blekhman et al., 2008;Tu et al., 2006), and other studies suggesting disease genes are evolving faster than non-disease genes (Huang et al., 2004;Smith & Eyre-Walker, 2003). There are a number of confounding factors contributing to these inconsistencies in establishing the rate of evolution of human disease genes. Comparing proteins based on PPI network interactions, genes encoding proteins that are part of 'modules' tend to be more conserved, evolutionarily old, and ubiquitously expressed. By contrast, genes encoding proteins outside modules are less well-conserved, evolutionarily younger, and enriched with at least some degree of tissue-specific expression (Dezso et al., 2008). Therefore, there appear to be different classes of disease genes, with different evolutionary patterns and different PPI patterns, and this may relate to the evolutionary 'age' of the disease gene (Nagaraj et al., 2010). Characterization of the differing traits or classes of disease genes may contribute to our understanding of the different functional 'style' of disease genes. It also appears that Mendelian disease genes and complex disease genes have different evolutionary profiles (see Section 3.2.3) and, of course, these studies are limited by our current knowledge of disease genes, which remains incomplete.
Networks of PPIs are also informing efforts aimed at understanding whether human evolution is heading towards, or away from, susceptibility to a particular disease. One crucial aspect of this discussion is the presence of alleles that increase the risk of one disorder, while simultaneously decreasing the risk for another. For example, multiple sclerosis and rheumatoid arthritis, or ankylosing spondylitis and multiple sclerosis, are negatively correlated (Sirota et al., 2009). By contrast, other alleles can increase the susceptibility to more than one complex disorder. This leads to multiple sclerosis and autoimmune thyroid disease, or type 1 diabetes and coeliac disease, commonly co-occurring (reviewed by Sirota et al., 2009). While these studies indicate the complexities of disease susceptibility, tools are being developed to leverage PPI data to predict such disease interactions (Chen et al., 2008;Chua & Wong, 2008).
Overall, PPI networks are crucial for the development of testable hypotheses regarding the underlying pathogenic mechanisms of many complex genetic disorders. However, PPI networks are incomplete and include both false-positive and false-negative interactions (Chen et al., 2008;Gandhi et al., 2006). Development of a reliable and complete human PPI network will provide an invaluable framework to study the contribution of multiple genes to complex genetic disorders, and will require both computational and experimental data to achieve accuracy and completeness.

Role of animal models
Animal research, including animal models of disease, has been responsible, at least in part, for every major medical advance made during the last century (Müller & Grossniklaus, 2010). The reason why model organisms are able to contribute so effectively to our understanding of human diseases lies in the high degree of molecular conservation found between metazoan species, and in the conserved nature of protein-protein, and other, networks (Gandhi et al., 2006). Indeed, even bacteria, plants, protists, and fungi are being exploited to explore differing aspects of biology relevant to human disease (Annesley & Fisher, 2009;Ilievska et al., 2011;McGary et al., 2010;Spradling et al., 2006).
As discussed above, a huge number of susceptibility alleles for a range of complex human genetic disorders have now been identified, but the function of many of these genes is poorly understood. Therefore, while risk-associated loci are being successfully identified (Easton et al., 2007;Hindorff et al., 2009;Jia et al., 2009), these findings are rarely followed up and the contribution of the allele to the molecular basis of disease rarely evaluated (McCarthy & Hirschhorn, 2008). That this failure is leading to a bottleneck in our understanding of complex disorders was highlighted by a recent Nature Genetics editorial, which suggested that significant investment in functional characterization of risk loci is needed (Axton, 2010). There are a number of ways to investigate the molecular and cellular function of a gene and its alleles, and these include in vitro studies, cell culture systems, and the use of whole animal models. These tools can be used to test hypotheses gained from thorough in silico studies. In this section we will discuss the role of animal models, as this is of greatest relevance to the topic of this paper. Although in vitro and cell culture studies can be of great benefit, ultimately good animal models provide the best biological models for complex disease.
High-throughput phenotypic screens of RNAi knockdowns in Caenorhabditis elegans and Drosophila melanogaster often provide the first inkling of the biological function of an uncharacterized human gene (Buckingham et al., 2004). Efforts are also underway to systematically knock out all the genes in the mouse genome to facilitate phenotypic and functional screening (Guan et al., 2010). Animal models also provide a vital system for dissecting the contributions of genetic, environmental and developmental components to the etiology of complex human disorders, and for evaluating novel therapeutics, which can be achieved in no other way (Complex Trait Consortium, 2004;Iwata et al., 2010).
Many recent advances relevant to studying complex disorders have also been made. This includes the development of a well-defined collection of recombinant inbred mice with different genetic backgrounds (Complex Trait Consortium, 2004). Studies from many organisms indicate that the phenotype of some gene knockouts only becomes apparent upon inactivation of another gene (Barbaric et al., 2007), indicating genetic background can be very important. An alternative, but related, approach to investigating human disease mechanisms, is to study animals that already have a disease-related phenotype of interest. These orthologous phenotypes, or phenologs, can be used to predict novel genes associated with a disease. This approach has been used to predict genes for angiogenesis, breast cancer, autism spectrum disorder, and Waardenburg syndrome, among others, in many diverse model organisms (Gilby, 2008;McGary et al., 2010;Pearson et al., 2011). Finally, while models were previously limited to studying one variant/gene at a time, efforts are now being made to investigate the cooperative interactions of multiple genes. For example, quintuple knockout mice have been used to study the role of multiple immune system genes in asthma (Dahlin et al., 2011). Therefore, new resources and tools are being developed to study complex diseases more effectively in model organisms.
However, care must be taken in extrapolating data from animal models to the human situation, as no model organism can exactly reproduce the disease of another. It is as important to understand the differences as it is to highlight the similarities between the animal model and the human disease. The ability to gain a thorough understanding of the key differences and similarities between species will minimize misinterpretation of data gleaned from model organisms, and lead to the improved use of animal models to both understand and develop treatments for human genetic disorders. The importance of understanding the differences between human and animal gene expression and physiology will be discussed below, using autism spectrum disorder as an example.

Human evolution and disease susceptibility
Evolutionary analysis has been applied to many aspects of human disease. As discussed in Section 3.3.2 above, one of the earliest findings revealed that Mendelian disease-associated nsSNPs are more frequently found in conserved amino acid positions, and these positions can be conserved even in more distantly-related proteins, while complex disease-associated SNPs are frequently not (Miller & Kumar, 2001;Ng & Henikoff, 2002;Thomas et al., 2003;Thomas & Kejariwal, 2004). Some other aspects of evolutionary biology, increasing our understanding of human genetic disease, are discussed next.
Over the last 100,000 years, humans have adapted to many changes in environment as they moved out of Africa and modified both diet and lifestyle, factors which influence the incidence of common genetic variants by positive selection of those alleles that prove advantageous (Sabeti et al., 2006). This theory is supported by the finding that complex disease-associated gene variants show heterogeneity in allelic frequency among different human populations, leading to a non-homogeneous world-wide distribution of disease alleles (Ioannidis et al., 2004). These findings have implications for GWAS, as disease variants differ between human populations (e.g. between Hispanics and those of African descent, Asian descent or European descent), and will affect the reproducibility of results from genetic studies of complex disease depending on the ethnic mix studied (Marigorta et al., 2011). Therefore, the recent evolutionary history of humankind affects the present global patterns of susceptibility to disease. The classic example of this is the mutation in the Hemoglobin B gene, which has undergone positive selection in African populations as it promotes malaria resistance, while simultaneously being causative of sickle cell anaemia (Currat et al., 2002;Williams, 2006). Another example is a mutation in a regulatory region of the lactase gene (LCT) that mediates adult tolerance to lactose. Evidence suggests this particular variant was selected in parts of Europe after the domestication of cattle (reviewed by Sabeti et al., 2006). However, identifying and understanding traits that have been targets of selection is a challenging task. It took forty years of effort, by a succession of researchers, to unravel the association between malaria and the sickle cell mutation and, even now, there is still work to be done on understanding exactly how the sickle-cell state inhibits malaria pathogenesis (Sabeti et al., 2006).
A recent study examined Mendelian-disease genes and found these genes are under widespread negative ('purifying') selection (Blekhman et al., 2008). By contrast, in this study (Blekhman et al., 2008), genes contributing to an increased risk of complex genetic disease showed few signs of evolutionary conservation, and may be targets of both positive and purifying selection. This latter conclusion was supported by a subsequent study by Corona and colleagues (2010). In their study of genes contributing to seven complex genetic diseases, only genes affecting three diseases showed signs of recent positive selection: those increasing susceptibility to Crohn disease, rheumatoid arthritis, and diabetes (Corona et al., 2010). Of note, alleles decreasing susceptibility to Crohn disease also showed signs of positive selection (Corona et al., 2010). Overall, they found evidence for an evolutionary trajectory towards a decreasing risk of Crohn disease, but an increasing risk of type 1 diabetes. Therefore, we may need to 'think outside the box' about why some complex disorders occur in the human population, and look not only for the disadvantages of a disease-causing mutation, but also consider unthought-of selective advantages. Furthermore, an allele increasing susceptibility to disease in the modern era may have increased fitness in an earlier human environment. For example, rheumatoid arthritis susceptibility alleles are thought to enhance resistance to tuberculosis (Mobley, 2004;Rothschild et al., 1992), while the type 1 diabetes risk gene IFIH1 helps protect against enterovirus infection (Nejentsev et al., 2009). Therefore, for polygenic disorders, not only can the same polymorphisms contribute to more than one disease, some alleles may increase the risk for one disorder while simultaneously decreasing the risk for another (Sirota et al., 2009).
Progress in our understanding of these complexities will be facilitated by an increased understanding of the function of disease genes and risk alleles, and rigorous studies examining natural selection in human disease-risk genes.

Applications of evolutionary biology to the study of human neurological disorders
Below, we illustrate how evolutionary biology-based analyses are providing crucial information that greatly affects our understanding of neurodevelopmental disorders, and how they reveal a surprising link to cancer.

Neurodevelopmental disorders
Both schizophrenia (SCZ) and autism spectrum disorder (ASD) are considered common neurodevelopmental disorders (~1% of the population affected), which are typically diagnosed in early adulthood (SCZ) or childhood (ASD) (Bale et al., 2010;Costa e Silva, 2008;Lewis & Levitt, 2002;Owen et al., 2005). Both ASD and SCZ are behaviourally-based diagnoses, with the diagnostic criteria outlined in DSM-IV (American Psychiatric Association, 1994) and ICD-10 (World Health Organization, 1993). Briefly, SCZ is diagnosed on the presence of a collection of positive symptoms, negative symptoms, and cognitive deficits, while ASD is diagnosed on the basis of a triad of behavioural manifestations: social deficits, impaired communication skills, and repetitive behaviours and/or restricted interests (see Fig. 3). Both ASD and SCZ are considered complex genetic disorders, with heritability estimates of around 80% for both (Ronald & Hoekstra, 2011;Sullivan et al., 2003). While some genetic disorders with Mendelian inheritance lead to syndromic forms of ASD (i.e. a phenotype of which ASD is typically one part), and while some alleles of intermediate penetrance appear to contribute, a large proportion of ASD cases fit the CVCD model (Eapen, 2011). As predicted by the CVCD model, parents of children with ASD share a subset of phenotypic traits, without having the 'full' ASD phenotype (Bernier et al., 2011;Robinson et al., 2011). A further 5-10% of ASD is caused by de novo mutations (reviewed by Eapen, 2011), and this is discussed further below. For SCZ, the CVCD model of inheritance also dominates, as individuals with SCZ are less likely to pass on their genes to the next generation (Crow, 2011). Both ASD and SCZ are large, active areas of scientific research, and below we highlight those aspects where evolutionary biology is relevant.
Certain structural features make some chromosomal regions more prone to rearrangements, such as deletion, duplication or translocation (Smith et al., 2010), and this is discussed further in Section 6.2.3 below. For example, the 16p11.2 microdeletion linked to ASD is mediated by segmental duplications (Kumar et al., 2007), while deletions at 15q11-13 occur during meiosis and are caused by a number of repeated DNA elements in this chromosomal region. The 15q duplication syndrome is the result of repetitive sequences mediating unequal but homologous recombination and, of note, also causes ASD (Chamberlain & Lalande, 2010). This indicates that duplication or deletion of some genes can cause an overlapping phenotype. However, this is not a universal phenomenon. Deletion of 16p11.2 leads to a phenotype of ASD with macrocephaly, while duplication leads to SCZ and microcephaly (Brunetti-Pierri et al., 2008;McCarthy et al., 2009;Shinawi et al., 2010). A further example is duplication of 1q21.1, which is implicated in ASD with macrocephaly. Deletion of this region causes SCZ and microcephaly (Dumas & Sikela, 2009;Crespi et al., 2010). This is not restricted to neurodevelopmental disorders, as the most common locus affected in Charcot-Marie-Tooth (CMT) neuropathy, PMP22, when deleted causes a different neurological phenotype to that associated with gene gain (Chance, 2006). Duplication of PMP22 leads to CMT type 1A, while PMP22 deletion leads to a disease known as hereditary neuropathy with liability to pressure palsies (HNPP). Therefore, CNV gain, versus CNV loss, at identical loci can mediate similar or distinct phenotypes.

Phylostratigraphy of neurodevelopmental genes
As might be expected for neurodevelopmental disorders, many ASD- and SCZ-implicated gene products participate in protein-protein interaction networks implicated in neuron function (Bourgeron, 2009;Bill & Geschwind, 2009;Gilman et al., 2011;Sun et al., 2010;Torkamani et al., 2010;Voineagu et al., 2011). These genetic and PPI studies are supported by pathological findings, as structural alterations of dendritic spines are associated with both SCZ and ASD (reviewed by Penzes et al., 2011).
In addition to furthering our understanding of the evolution of human disease, model organisms play a major part in developing an understanding of the etiology of human genetic disorders. As discussed above, model organisms play a key role in characterisation of the normal and abnormal functions of risk genes (Aitman et al., 2011). Therefore, it is important to determine whether the genes implicated in SCZ and ASD are conserved in model organisms.
Phylostratigraphy is a term applied to the application of phylogenetic methods to evaluate the evolutionary origin of disease genes and/or the origin of genes contributing to major evolutionary adaptations (Domazet-Loso et al., 2007;Domazet-Loso & Tautz, 2008). Such an approach has not yet been applied to any neurodevelopmental disorder. However, some data are available relating to the conservation of key PPI networks implicated in SCZ and ASD. Of relevance to SCZ and ASD, as well as other neurological disorders, is evidence that the core components of the nervous system and immune system are conserved in vertebrates. Indeed, the core components of the synapse are found in cnidarians, which form primitive nerve networks, and evolved around 680 million years ago (Galliot et al., 2009;Grimmelikhuijzen & Westfall, 1995). Furthermore, many synaptic genes are found in sponges, the oldest-surviving metazoan phyletic lineage, which actually lack synapses (Kosik et al., 2008;Srivastava et al., 2010). These data suggest many of the genes involved in neurological disorders have a more ancient evolutionary origin than previously thought.
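The core step of a phylostratigraphic analysis, as we understand it, is to assign each gene to the oldest phylogenetic level (phylostratum) at which a homologue can be detected. The following minimal sketch assumes that homology searches have already been run, and that each hit has been labelled with the most inclusive clade that the hit species shares with humans. The list of strata is a simplified, illustrative subset, and the gene names and hit sets are invented; none of this reproduces the published pipeline of Domazet-Loso and colleagues.

```python
# Phylostrata along the lineage leading to humans, ordered from the
# oldest to the youngest. This is a simplified, illustrative subset.
PHYLOSTRATA = [
    "Cellular organisms",
    "Eukaryota",
    "Metazoa",
    "Eumetazoa",
    "Bilateria",
    "Vertebrata",
    "Mammalia",
    "Primates",
]


def assign_phylostratum(strata_with_homologues):
    """Return the oldest phylostratum in which a homologue was detected.

    strata_with_homologues is a set of stratum names, one for each species
    in which a homologue was found. Returns None if there are no hits.
    Absence from an intermediate stratum (e.g. through secondary gene loss
    in one lineage) does not change the result.
    """
    for stratum in PHYLOSTRATA:  # scan from oldest to youngest
        if stratum in strata_with_homologues:
            return stratum
    return None


# Hypothetical homology-search results (invented for illustration).
hits = {
    "GENE_X": {"Eumetazoa", "Vertebrata", "Mammalia"},  # no bilaterian outgroup hit
    "GENE_Y": {"Vertebrata", "Mammalia", "Primates"},
    "GENE_Z": set(),                                    # no detectable homologue
}
gene_ages = {gene: assign_phylostratum(found) for gene, found in hits.items()}

for gene, stratum in gene_ages.items():
    print(f"{gene}: {stratum}")
```

A gene assigned to an old stratum but missing from a particular model organism, as for the hypothetical GENE_X here, would be a poor candidate for functional study in that organism, which is the kind of consideration raised for model choice later in this paper.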
A contributing role for the adaptive and innate immune systems in ASD or SCZ etiology has also been suggested by some studies (Ashwood et al., 2006;Muller et al., 2000;Sun et al., 2010;Voineagu et al., 2011). This role would need to be studied in vertebrate models (possessing adaptive and innate immune systems), as non-vertebrate species only have an innate immune system. Nonetheless, a comparison of phenotypes between vertebrates and non-vertebrates may provide vital information about the relative contributions of immune dysfunction to the ASD or SCZ phenotype. A similar logic has been used to explore novel cell-death pathways, where regulated cell death processes are examined in species lacking classical caspase enzymes (Degterev & Yuan, 2008;Guisti et al., 2010;Smirlis & Soteriadou, 2011). A thorough, systematic analysis of differences in key disease networks is also required to gain the best insights into the strengths and weaknesses of each specific model organism. As the genomes of all currently used model organisms have been sequenced, accurate network analyses and disease network analyses are now imperative if we are to understand disease evolution and the limitations and/or benefits of various model organisms for disease.

Evolution of common neurodevelopmental disorders
A number of hypotheses regarding the evolution of autistic and schizotypal traits have been proposed, promoting controversy and debate. For example, it has been proposed that the intense focus and repetitive behaviours of ASD may have been beneficial to hunters and gatherers (Reser, 2007). This hypothesis suggests that, subsequent to the ascendance of agriculture and the development of complex community-based lifestyles, these alleles have become increasingly disadvantageous, and the common alleles responsible for ASD traits only remain in the human gene pool because they previously had adaptive function (Reser, 2011). Alleles contributing to SCZ have likewise been hypothesized to have prior adaptive function, and contributed to a fitness advantage in the ancestral human environment, such as providing physiological and behavioural characteristics increasing survival in conditions of nutritional paucity and stress (Reser, 2007). The clinical ASD or SCZ diagnosis would then ensue when individuals with subclinical phenotypes mate (Del Giudice et al., 2010), following the CVCD model.
Potential evolutionary advantages afforded by sub-clinical phenotypes of other psychiatric disorders have likewise been proposed, based on data indicating that a large proportion of these disorders are due to the inheritance of multiple copies of low-risk gene variants, which are present in the parents and siblings of affected individuals (Bernier et al., 2011;Hoffman & Stat, 2010;Robinson et al., 2011). Providing more direct evidence for these hypotheses is difficult. Indeed, the debate over the amount of adaptive evolution occurring in the genome itself is far from resolved (Amos & Bryant, 2011;Eyre-Walker, 2006). Furthermore, little evidence supports adaptive evolution in genes linked to brain development (Voight et al., 2006). Finally, as the genetic and molecular pathways underpinning both ASD and SCZ remain very poorly understood, this adds to these difficulties.
Inverse comorbidity, a concept introduced in Section 5 using the example of selection for the sickle cell anaemia mutation because it protects carriers from malaria (Currat et al., 2002;Williams, 2006), may also be relevant to ASD and SCZ. Recent data supporting immune system differences in individuals with ASD or SCZ (Cohly & Panja, 2005;Müller & Schwarz, 2010) suggest that other selective pressures may be involved for risk genes for these disorders. Of relevance, there is much stronger evidence for adaptive evolution in genes with known immune-system function (reviewed by Eyre-Walker, 2006). Another possible selective pressure is found in recent studies supporting a lower than expected occurrence of cancer in patients with SCZ (Tabares-Seisdedos et al., 2011), and cancer will be discussed further later in this paper. Understanding the molecular pathways underpinning these disorders will help us clarify these issues further. However, these emerging findings suggest the selective pressure on ASD or SCZ disease alleles may not be related to the behavioural phenotype.

Model organisms for neurodevelopmental disease
Many professionals were initially sceptical whether phylogenetically 'lower' species could be successfully used to study the molecular and cellular mechanisms of human brain disorders. Surprisingly, many animal species are emerging as excellent model systems for such disorders, and their tractable nature, plus the ability to control for environment and genetic background, has already led to far-reaching advances in our understanding of many neurological diseases (Chesselet, 2005;Shah et al., 2010;Tayebati, 2006). For example, many recent reviews discuss how animal models are proving their usefulness in improving our understanding of the pathophysiology of SCZ and in the development of novel therapeutic strategies (Arguello et al., 2010;Feifel & Shilling, 2010;Lazar et al., 2011;Powell, 2010;Young et al., 2010). A greater understanding of the genetics underlying SCZ will also inform the development of future animal models. Likewise, for ASD, there are a wide variety of animal models available, which are becoming increasingly well characterized (Patterson, 2011;Tordjman et al., 2007). These include naturally-bred rodents (Gilby, 2008;Pearson et al., 2010) and transgenic mouse models (Minshew & McFadden, 2011;Robertson & Feng, 2011). Some wide-reaching findings have already been made based on results from these model systems. For example, multiple studies indicate neurodevelopmental disorders may be treatable, even in adults (Ey et al., 2011;Silva & Ehninger, 2009). Furthermore, rodent models are replicating some of the co-morbidities associated with an ASD diagnosis, such as immune abnormalities (Heo et al., 2011) and epilepsy (Gilby, 2008;Peñagarikano et al., 2001).
Despite the progress and potential that animal models of disease provide, caution must be exercised with the interpretation of data from such model systems, particularly when considering disorders affecting the central nervous system (CNS). Factors to consider when evaluating animal model data include the developmental trajectories unique to humans. The most dramatic of these is in the timing of maturation and pruning of the CNS during childhood (reviewed by Dean, 2009). Regulation of gene expression differs between animal species, with differences in microRNAs (Berezikov et al., 2006), DNA methylation patterns (Enard et al., 2004) and, as discussed next, mRNA splicing, also being detected.
Understanding the role of alternative isoforms is vitally important, as aberrant gene splicing is emerging as a key contributor to a variety of neurological diseases (Anthony & Gallo, 2010) and cancers (Ward & Cooper, 2010). Alternative splicing is considered a major mechanism underpinning metazoan biological complexity, including the increasingly complex brain function of metazoa, with neurons having specific systems for regulating mRNA splicing and generating brain-specific isoforms (Ule & Darnell, 2007). Indeed, increasing numbers of splice site mutations are being implicated in the etiology of ASD and SCZ (Glatt et al., 2011). However, differences in splicing occur, even between closely-related species such as humans and chimpanzees (Calarco et al., 2007; Blekhman et al., 2010; Lin et al., 2010). Therefore, conservation of splice variation may be relevant to disease etiology, and differing splicing profiles should be assessed in animal models of disease. Current best-practice guidelines for preclinical studies of neurological disease in animal models have recently been published (Shineman et al., 2011).

Evolution of the Deleted In Autism 1 gene
DIA1 (Deleted In Autism 1) is implicated in an autosomal recessive form of ASD (Morrow et al., 2008; Aziz et al., 2011a). An evolutionary biology-based approach to understanding the role of this gene in ASD has illustrated the importance of many of the principles outlined above. While DIA1 is conserved from cnidaria to humans, it is not detected in nematodes, suggesting C. elegans is not a suitable model in which to study the cellular role of this gene. Strikingly, phylogenetic-based analyses identified a related gene in humans, DIA1R, which localizes to the X chromosome (Aziz et al., 2011b). DIA1R is vertebrate-specific and, as with DIA1, is implicated in ASD (Aziz et al., 2011a, 2011b). Of possible relevance to the ASD phenotype, DIA1R has been 'lost' in fish of a solitary nature, while those retaining the gene are 'social' schooling fish (Aziz et al., 2011b). Of further relevance to the use of animal models, DIA1R was found to be X-inactivated in mouse, but not in humans (Aziz et al., 2011b; Yang et al., 2010), and splicing may be species-specific. Indeed, we have preliminary evidence for brain-specific splicing of DIA1 that is primate-specific, and not found in other vertebrate lineages (Aziz & Bishop, unpublished data). This type of evolutionary-based evidence facilitates an educated approach to the choice of model organism for functional studies, and highlights issues that may arise from studies in mice or fish.

Cancer
Natural selection and evolution are dependent on genetic variability and the occurrence of new mutations in the germ-line. However, mutation occurs in both germinal and somatic cell lineages and, over a human lifespan, somatic mutations accumulate and may lead to cancer (Greaves, 2007). Therefore, as with other genetic disorders, carcinogenesis is another manifestation of the biological processes on which evolution depends. Cancer is considered a probabilistic disease, and is inevitable in long-lived organisms such as humans, where the lifetime risk is around one in three (Greaves, 2007; Simpson & Camargo, 1998). In most, but not all, cases this is due to multiple genetic changes accumulating in cells (Maffini et al., 2004; Stratton et al., 2009; Touw & Erkeland, 2007).
Genes contributing to cancer are often divided into two broad groups: caretakers and gatekeepers (Kinzler & Vogelstein, 1997; Macleod, 2000; Michor et al., 2004; Russo et al., 2006). Mutations affecting caretaker genes promote neoplasia indirectly, by increasing genetic instability and mutation rates, which leads to defects in many genes including gatekeeper genes. Gatekeeper genes have roles in cell differentiation, growth and/or death, and mutations affecting them directly promote cancer progression. Gatekeeper genes may be further divided into oncogenes and tumour-suppressor genes. Well-known caretaker genes include BRCA1, BRCA2, ATM and FANCA; gatekeeper genes include TSC1, TSC2, Rb, NF1, NF2 and PTEN (see Fig. 3); while some genes are multifunctional and can act as gatekeepers and/or caretakers, including p53 and ARF (Dominguez-Brauer et al., 2010; Rubbi & Milner, 2005; Russo et al., 2006).

Cancer and neurological disorders
One surprising recent finding is the emerging link between a number of human disorders and cancer. Recent data indicate a lower than expected occurrence of cancer in patients with SCZ, Down syndrome, Parkinson disease, Alzheimer disease and multiple sclerosis (Tabares-Seisdedos et al., 2011). What is striking is that most of the disorders found to protect against cancer, and which lead to an inverse cancer morbidity in humans, are neurological disorders. This is thought to occur due to the genetic and molecular connections between cancer and these complex human diseases (Tabares-Seisdedos et al., 2011). Of greater concern is a higher incidence of certain cancers in patients with ASD (Crespi, 2011). This is likewise due to shared molecular pathways (Crespi, 2011), and ASD also shares genetic connections with SCZ and Alzheimer disease (Crespi et al., 2010; Sokol et al., 2011). Indeed, some lines of evidence suggest that ASD and SCZ involve diametric etiology (Crespi et al., 2010). Exploring these links will have far-reaching implications.

Evolution of cancer genes
A link between cancer formation and the evolution of multicellularity was predicted, and this hypothesis was recently explored (Domazet-Loso & Tautz, 2010). Using a phylostratigraphic strategy, Domazet-Loso and Tautz (2010) found different evolutionary origins for gatekeeper, compared to caretaker, human cancer genes. Genes with a caretaker function have an evolutionary origin in the first cellular organisms, and are also found in bacteria and/or archaea (Domazet-Loso & Tautz, 2010). By contrast, only genes with gatekeeper functionality correspond to the origin of metazoa, and are detected in Porifera and/or Cnidaria and subsequently diverging phyla (see Fig. 4). For example, genes encoding the ESCRT (endosomal sorting complex required for transport) proteins are essential for the downregulation of many cell-surface signalling molecules (reviewed by Ilievska et al., 2011). Not only do many of the genes encoding ESCRT subunits appear to have caretaker functions, with mutations being linked to both cancer and neurological disorders, but these genes also have an origin pre-dating the metazoan lineage (reviewed by Ilievska et al., 2011). Model organisms, such as the amoeba, Dictyostelium discoideum (Annesley & Fisher, 2009; Williams RS et al., 2006), can therefore be used to study the fundamental molecular roles of these proteins (Blanc et al., 2009; Mattei et al., 2006). Overall, the evolutionary age of cancer genes parallels their role in human cancer etiology (Fig. 4).
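The core idea of a phylostratigraphic assignment can be illustrated with a minimal sketch. It assumes that homolog searches have already been run, so that each gene comes with the set of phylostrata (nested clades along the human lineage) in which a homolog was detected; the gene is then dated to the oldest of these. The stratum labels, the function name assign_phylostratum and the two toy genes are hypothetical and chosen for illustration only; they are not the strata, software or data of Domazet-Loso and Tautz (2010).

```python
# Phylostrata along the human lineage, ordered oldest to youngest.
# These labels are illustrative assumptions, not the strata of the cited study.
PHYLOSTRATA = [
    "cellular organisms",
    "Eukaryota",
    "Metazoa",
    "Bilateria",
    "Vertebrata",
    "Mammalia",
]
RANK = {name: i for i, name in enumerate(PHYLOSTRATA)}


def assign_phylostratum(strata_with_homologs):
    """Return the oldest phylostratum in which a homolog was detected."""
    if not strata_with_homologs:
        raise ValueError("at least one phylostratum is required")
    # The lowest rank corresponds to the most ancient stratum.
    return min(strata_with_homologs, key=RANK.__getitem__)


# Hypothetical homolog-detection results for two toy genes (not real data).
toy_hits = {
    "toy_caretaker": {"cellular organisms", "Eukaryota", "Metazoa", "Vertebrata"},
    "toy_gatekeeper": {"Metazoa", "Bilateria", "Vertebrata", "Mammalia"},
}

gene_age = {gene: assign_phylostratum(hits) for gene, hits in toy_hits.items()}
```

In this toy input, the caretaker-like gene is dated to the first cellular organisms and the gatekeeper-like gene to the origin of metazoa, mirroring the pattern described above. A real analysis would additionally need to address the sensitivity of the underlying homology searches, which this sketch ignores.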

Heritability of cancer and co-heritability with neurological disease
Most tumour gene mutations can be inherited as well as acquired, although inherited cancer syndromes are quite rare. The cancer syndromes are inherited in a dominant Mendelian manner and are often associated with developmental defects and benign tumours (Knudson, 1971, 1993; Ponder, 2001). Single genes and deletions of multiple genes can both cause syndromes of which cancer is one part. For example, deletion of multiple genes at position 11p13 is clinically associated with WAGR syndrome, and patients have a collection of symptoms including mental retardation, aniridia and kidney tumours (Fischbach et al., 2005). Mutations in TSC1 or TSC2 cause tuberous sclerosis, a disease characterised by nonmalignant hamartoma formation in many organs, with epilepsy, ASD and/or mental retardation being frequent comorbidities (de Vries, 2010), while PTEN mutations are identified in human cancers, and also in the germline of patients with hamartoma tumour-related syndromes (PHTSs). In addition to hamartomas, ASD is a common comorbidity in individuals with PTEN mutations (Rodríguez-Escudero et al., 2011). Molecular network analyses (e.g. Fig. 2) are being used to establish the molecular pathways leading to the multiple phenotypes caused by mutation of a single 'cancer gene' (de Vries, 2010; Rodríguez-Escudero et al., 2011). Genetic and functional studies are also being used, with current data indicating that different mutations in the PTEN gene can cause cancer alone, or cancer with ASD (Rodríguez-Escudero et al., 2011).

Recurrent genome rearrangements and cancer
Somatic-cell and germ-line chromosome rearrangements are both implicated in human variation and disease, including the development of cancers (Hanahan & Weinberg, 2000; Kidd et al., 2008; Inoue & Lupski, 2002; Stankiewicz & Lupski, 2002). On an evolutionary timescale, chromosome rearrangements have also played a key role in the divergence of species and differences in chromosomal arrangements between species (Drosophila 12 Genomes Consortium, 2007; Kehrer-Sawatzki & Cooper, 2007; Pevzner & Tesler, 2003; Peng et al., 2006). Furthermore, the evolutionary chromosome breakpoints and chromosome breakpoints found in diseases, including cancer and developmental disorders, overlap (Darai-Ramqvist et al., 2008; Lindsay et al., 2006; Murphy et al., 2005). Therefore, understanding the mechanisms of chromosome breakage and rearrangement is an active field of biological research.
Since that time, the most highly-fragile spots in the human genome have been mapped (Smith CL et al., 2010). The fragile-site model leads to various testable hypotheses as to the nature of these hot spots, and factors such as DNA sequence and chromosome structure were proposed. Since then, evolutionary breakpoints have been shown, for example, to be gene-rich regions with significantly more segmental duplications and/or repetitive elements than expected, which may facilitate homologous recombination (Bailey et al., 2004; Bulazel et al., 2007; Everts-van der Wind et al., 2004; Kehrer-Sawatzki & Cooper, 2008; Murphy et al., 2005; Schibler et al., 2006). These regions also encompass genes with higher densities of copy number variation and SNPs (Larkin et al., 2009). Understanding chromosomal 'hotspots' will not only inform our understanding of evolutionary biology, but will also increase our understanding of the etiology of human genetic disorders caused by chromosomal rearrangements.

Conclusion
Genetic disorders are an inescapable component of evolution. However, comparative and evolutionary biology methodologies also play an important role in developing and improving our comprehension of many aspects of complex human genetic disorders including schizophrenia and autism spectrum disorder. Thorough in silico analyses of disease genes, the encoded proteins, their structure and function, are fundamental tools for evaluating differences and similarities between human genes and genetic pathways, and for comparing these with their equivalents in animal models. Such explorations of the evolutionary conservation of human genes, proteins, protein structures, and resulting cellular networks enable us to: (i) understand how and why human diseases originated; (ii) predict disease genes; (iii) best predict the impact of genetic variation on cellular function; (iv) choose appropriate animal models for human diseases; (v) better interpret data obtained in studies using model organisms; and (vi) evaluate more accurately the validity of therapeutics. Therefore, comparative and evolutionary biology is of major relevance to our understanding of, and the development of treatments for, many human neurological disorders.

Acknowledgment
Thanks to all members of the NEB and PRF laboratories for helpful discussions. We apologize to those authors whose work we have been unable to cite directly, due to space limitations. AA was supported by the Malaysian Ministry of Higher Education and the Islamic Science University of Malaysia. JI was supported by an Australian postgraduate award. This work was supported by grants from the Australian Research Council and the Thyne Reid Memorial Trust.

References
Adie