The Prediction and Analysis of Inter- and Intra-Species Protein-Protein Interaction

Protein-protein interactions (PPIs) are essential to cellular processes. Recent developments in high-throughput technologies have uncovered vast numbers of PPIs. However, the experimental evidence is mostly for intra-species interactions of model organisms, especially human. Studies of non-human organisms and inter-species PPIs are few. For organisms such as Arabidopsis thaliana, the 5990 experimentally detected PPIs are estimated to be less than 3% of the entire A. thaliana interactome (M. Lin et al., 2011). The accuracy of high-throughput PPI experiments is also doubtful (Mrowka et al., 2001; Sprinzak et al., 2003; von Mering et al., 2002). To address these issues, several computational methods have been developed to evaluate and predict PPIs. This chapter focuses on direct PPIs, which involve physical interactions between proteins. It provides a brief overview of the reliability of high-throughput PPI detection technologies and discusses the strengths and weaknesses of important computational methods for PPI prediction and evaluation. The major repositories which store, evaluate, and analyse both detected and predicted PPIs are also introduced.



Experimental detection of protein interactions
Until the past decade, PPIs were identified by time-consuming and labour-intensive methods, such as low-throughput (small-scale) yeast two-hybrid (Y2H) assays. The development of high-throughput technologies brought the study of PPIs to an -omics level. Of all the technologies used for PPI detection, high-throughput Y2H is the most mature and the most commonly used. However, it is also one of the least accurate techniques, with an estimated false-positive rate of about 50% (Parrish et al., 2006; von Mering et al., 2002). The error rates of the other high- and medium-throughput technologies are summarized in Table 1 (Mrowka et al., 2001; Parrish et al., 2006; Sprinzak et al., 2003; von Mering et al., 2002).
In the past few years, bimolecular fluorescence complementation (BiFC) has become one of the popular in vivo technologies, as it has a medium throughput and reasonable cost, is technically straightforward, and provides information on the subcellular localization of proteins. The drawback of BiFC is its occasional false positives caused by non-specific interactions and background fluorescence. The split-luciferase system has an extremely low background, but does not disclose the subcellular localization of interactions. Protoplast Y2H is similar to Y2H, but is technically more challenging as it has to be performed in nuclei or protoplasts. The split-ubiquitin system (SUS) has high rates of false positives and background signals. The in vitro technologies are less favourable, as the reactions do not occur in cellular environments and do not reveal the cellular localization of proteins.

Table 1. Columns: Technology, Throughput, Accuracy, References.

Computational prediction and evaluation of protein interactions
As listed in Table 2, the methods for PPI prediction and evaluation can be classified into five categories based on the type of information required for analysis: (1) protein sequences, (2) Gene Ontology (GO), (3) gene expression profiles, (4) topology of the interaction network, and (5) experimental data.

Table 2 (rows recoverable from the source). Category: information used (references)
Shared Gene Ontology annotation: protein functions (De Bodt et al., 2009; Jain & Bader, 2010; Wu et al., 2006); protein localization (De Bodt et al., 2009; Jain & Bader, 2010)
Topology analysis: distance between proteins in a PPI network (Dyer et al., 2007)
Experimental data: cited literature (text mining) (Jaeger et al., 2008); detected PPI datasets (von Mering et al., 2002)

Each method for PPI prediction and evaluation is built on an assumption that a certain property is more likely to occur between interacting proteins than between non-interacting ones. These methods are often combined, usually in a Bayesian network (Huttenhower & Troyanskaya, 2006; Jansen et al., 2003; Lee et al., 2006; N. Lin et al., 2004; McDowall et al., 2009; Patil & Nakamura, 2005; Wang et al., 2009; Xu et al., 2011). Some properties are more relevant to protein interactions than others. When different methods are combined, the statistical confidence estimated by each method can be weighted according to the confidence level of the corresponding assumption.
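As a rough illustration of how independent lines of evidence can be combined, the sketch below applies the naive Bayes rule, in which the posterior odds of an interaction equal the prior odds multiplied by the likelihood ratio of each evidence type. The function name, the evidence labels, the likelihood ratios, and the prior odds are all hypothetical values chosen for illustration; they are not taken from any of the cited studies.

```python
from math import prod


def combine_evidence(prior_odds, likelihood_ratios):
    """Naive Bayes combination of independent evidence.

    posterior odds = prior odds * product of likelihood ratios,
    then converted from odds to a probability.
    """
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios: how much more often each property is
# seen among true interactions than among non-interacting pairs.
evidence = {"interolog": 8.0, "co_expression": 2.5, "shared_go": 4.0}

# Hypothetical prior: one true interaction per 600 random protein pairs.
prior_odds = 1 / 600

probability = combine_evidence(prior_odds, evidence.values())
print(f"posterior probability of interaction: {probability:.3f}")
```

A larger likelihood ratio gives an evidence type more influence on the final score, which is one simple way to weight methods by the strength of their underlying assumptions.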

Interologs
Homologous proteins often conserve similar functions and PPIs across different organisms, especially in phylogenetically closely related species (Hirsh & Sharan, 2007). These conserved PPIs are designated as interologs.
The interolog method assumes that if a pair of proteins, A and B, interact, and there are two other proteins, A' and B', such that A' is homologous to A and B' is homologous to B, then A' and B' potentially form an interolog of A and B. Interologs can occur among different species or within the same species (Mika & Rost, 2006). The conventional interolog method identifies homologous proteins by comparing global sequences. For proteins in which only partial sequences are similar, sequence signatures may be compared instead of full sequences (Sprinzak & Margalit, 2001). Structurally similar proteins may also have similar protein interactions, but predicting PPIs by identifying proteins with similar structures is impeded by the limited structural information available (Ogmen et al., 2005). The interologous relationship between the two pairs of proteins, one pair predicted and one pair detected, can be evaluated by functions such as the s score (He et al., 2008). Homologous genes of model organisms can be identified using BLAST or retrieved from the HomoloGene database, which automatically identifies and collects homologs from fully sequenced genomes (Sayers et al., 2011). The interolog method has been used frequently. The first human PPI network, the Arabidopsis PPI network, and the rice blast fungus PPI network are a few examples constructed from predicted interologs (Geisler-Lee et al., 2007; He et al., 2008; Lehner & Fraser, 2004). Unfortunately, prediction of plant PPIs through a comparative interactome approach is challenged by the unique biology of plants, which involves PPIs not commonly found in the other model organisms. Fewer than 50% of A. thaliana proteins have been found to have orthologs in the more extensively studied organisms such as yeast, Caenorhabditis elegans, fruit fly, or human (Gollery et al., 2006).
Furthermore, the interolog method does not differentiate the functionally significant amino acid residues from the others, and it neglects the residue-specific requirements for interaction specificity and affinity (Uhrig & Hulskamp, 2006). For highly homologous members of protein families, the interolog method can therefore be prone to errors.
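The transfer step of the interolog method can be sketched in a few lines. The example below assumes that homology has already been established elsewhere (for instance by BLAST) and is supplied as a plain mapping; the function name and all protein identifiers are hypothetical. It shows only the mapping logic, not any scoring of the predicted pairs.

```python
def predict_interologs(known_ppis, homologs):
    """Transfer known interactions to a target species.

    known_ppis: iterable of (A, B) pairs detected in the source species.
    homologs:   dict mapping a source protein to the set of its
                homologs in the target species.
    Returns a set of unordered target pairs {A', B'}.
    """
    predicted = set()
    for a, b in known_ppis:
        for a_prime in homologs.get(a, ()):
            for b_prime in homologs.get(b, ()):
                # frozenset makes the pair unordered: {A', B'} == {B', A'}
                predicted.add(frozenset((a_prime, b_prime)))
    return predicted


# Hypothetical input: one detected yeast interaction and its plant homologs.
known = [("yeastA", "yeastB")]
homolog_map = {"yeastA": {"plantA1", "plantA2"}, "yeastB": {"plantB1"}}

for pair in sorted(predict_interologs(known, homolog_map), key=sorted):
    print(sorted(pair))
```

Because every homolog of A is paired with every homolog of B, large protein families produce many candidate pairs, which is where the error-proneness noted above comes from.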

Phylogenetic relationship
Interacting proteins have been observed to have topologically similar phylogenetic trees for the corresponding protein families, presumably due to the co-evolution of cooperating proteins (Fryxell, 1996; Goh et al., 2000; Pages et al., 1997). Based on this observation, the phylogenetic similarity method was proposed. To construct and compare the phylogenetic trees, the sequences of the two potentially interacting protein families are first aligned. Secondly, evolutionary distance matrices are calculated from the phylogenetic trees, one for each protein family. Finally, the Pearson correlation coefficient between the two distance matrices is calculated as an indication of the likelihood of interaction. Partial protein sequences can be used to construct the phylogenetic trees; for example, poorly conserved sequences have been removed to improve the performance of prediction (Kann et al., 2007). A similar approach is the phylogenetic profile method. A phylogenetic profile records the presence and absence of a protein across all species. Because proteins involved in the same biological process presumably co-evolve, proteins with similar phylogenetic profiles are more likely to interact. The profiles can be compared by Hamming distance. Although this approach is powerful, it can be applied only to organisms which have been fully sequenced (Frishman, 2009). Additionally, there might be complications with essential proteins which are present in all organisms (Frishman, 2009). As second-generation sequencing (SGS) technologies rapidly accumulate full genome sequences of non-model organisms, this method is expected to become more favorable.
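Both comparisons described above reduce to short calculations once the inputs are available. The sketch below assumes that the two distance matrices have already been computed and list the same species in the same order, and that phylogenetic profiles are given as 0/1 lists over a shared species list. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def tree_similarity(dist_a, dist_b):
    """Correlate the upper triangles of two species-by-species
    distance matrices (same species order in both)."""
    n = len(dist_a)
    upper_a = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    upper_b = [dist_b[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(upper_a, upper_b)


def hamming(profile_a, profile_b):
    """Number of species in which exactly one of the two proteins is present."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Toy distance matrices for two protein families over three species.
family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(tree_similarity(family_a, family_b))

# Toy presence/absence profiles over four species.
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))
```

A correlation near 1 suggests similar evolutionary histories, and a small Hamming distance suggests that the two proteins tend to be gained and lost together.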

Gene fusion and neighboring
Genes which are separate in the genome of one organism can be fused into a single gene in another organism. Fused genes almost always encode functionally related and physically interacting proteins (Enright et al., 1999; Marcotte et al., 1999). Fusion events might accelerate the formation of protein complexes by increasing the chance of correct physical contact between interacting sites. Similarly, in bacteria, genes which are consistently located in the same operon across many species are likely to express functionally related, and often physically interacting, proteins (Dandekar et al., 1998; Overbeek et al., 1999; Tamames et al., 1997).
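One simple way to look for fusion events is to search for a composite protein in a second genome to which two different query proteins align in non-overlapping regions. The sketch below assumes that the alignment hits have already been produced by a sequence search and are supplied as tuples; the tuple layout, the function name, and the identifiers are hypothetical, and real analyses apply additional filters that are not shown here.

```python
from itertools import combinations


def fusion_candidates(hits):
    """Find query pairs that match separate regions of one composite protein.

    hits: iterable of (query, composite, start, end), where start/end are
          the aligned coordinates on the composite protein.
    Returns a set of (frozenset({query1, query2}), composite).
    """
    by_composite = {}
    for query, composite, start, end in hits:
        by_composite.setdefault(composite, []).append((query, start, end))

    candidates = set()
    for composite, regions in by_composite.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            non_overlapping = e1 < s2 or e2 < s1
            if q1 != q2 and non_overlapping:
                candidates.add((frozenset((q1, q2)), composite))
    return candidates


# Hypothetical hits: A and B cover different parts of the composite AB,
# while C overlaps both of them and is therefore not reported.
hits = [("A", "AB", 1, 200), ("B", "AB", 220, 450), ("C", "AB", 150, 300)]
print(fusion_candidates(hits))
```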

Domain-domain interactions
Just like protein interactions, domain interactions can be predicted by sequence homology between two pairs of interacting domains, by investigating the evolutionary traits of domains, or by identifying conserved neighboring relationships between domains (Frishman, 2009).
Interacting proteins are also more likely to contain domains which have been detected or predicted to interact (Ng et al., 2003).
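The domain-based check can be sketched as a lookup: given the domains annotated on each protein and a set of domain pairs already detected or predicted to interact, list the domain pairs that support the protein pair. The function name and the domain labels below are hypothetical placeholders rather than entries from a real domain database.

```python
def supporting_domain_pairs(domains_a, domains_b, known_ddis):
    """Return the known interacting domain pairs spanning two proteins.

    domains_a, domains_b: iterables of domain identifiers on each protein.
    known_ddis: set of frozenset({domain1, domain2}) known to interact.
    """
    return {
        frozenset((da, db))
        for da in domains_a
        for db in domains_b
        if frozenset((da, db)) in known_ddis
    }


# Hypothetical domain-domain interaction set and protein annotations.
known_ddis = {frozenset(("D1", "D2")), frozenset(("D3", "D3"))}
protein_a = ["D1", "D4"]
protein_b = ["D2", "D5"]

support = supporting_domain_pairs(protein_a, protein_b, known_ddis)
print(len(support), "supporting domain pair(s)")
```

The number of supporting pairs can then serve as one input among several when ranking candidate PPIs.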

Co-expression
Interacting proteins are assumed to have similar expression patterns (Dyer et al., 2007). The co-expression correlation coefficients of seven model animals, including human, mouse, chicken, zebrafish, fruit fly, and the nematode Caenorhabditis elegans, are recorded in COXPRESdb (Obayashi et al., 2008). The co-expression correlation coefficients of A. thaliana and many other flowering plants are recorded in ATTED-II. High-throughput expression data are mostly available from the Gene Expression Omnibus (GEO), or from TAIR for A. thaliana experiments (Garcia-Hernandez et al., 2002; Sayers et al., 2011).

Gene Ontology (GO)
Interacting proteins presumably participate in related biological processes and share similar cellular localization (Dyer et al., 2007; Shin et al., 2009). The GO project annotates the cellular components in which a protein is located and the biological processes in which it participates. The annotations are built from structured and controlled vocabularies. The semantic similarities between the GO terms assigned to two proteins are often used to evaluate the confidence levels of proposed PPIs (De Bodt et al., 2009; Jain & Bader, 2010).
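To give a feel for what a GO-based similarity looks like, the sketch below expands each protein's annotated terms with all of their ancestors and then takes the Jaccard index of the two expanded sets. This is a deliberately simple stand-in and is not the semantic similarity measure used in the cited studies. The term identifiers and the tiny parent map are hypothetical, not real GO terms.

```python
def with_ancestors(terms, parents):
    """Expand a set of terms with every ancestor reachable via `parents`."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def go_similarity(terms_a, terms_b, parents):
    """Jaccard index of the ancestor-expanded term sets (0.0 to 1.0)."""
    set_a = with_ancestors(terms_a, parents)
    set_b = with_ancestors(terms_b, parents)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical mini-ontology: child term -> list of parent terms.
parents = {"T:c": ["T:b"], "T:d": ["T:b"], "T:b": ["T:a"]}

# Two sibling terms share two of four terms after expansion.
print(go_similarity({"T:c"}, {"T:d"}, parents))
```

Expanding to ancestors means that two proteins annotated with different but closely related terms still receive a non-zero score.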

Topology
As more and more PPIs are revealed, PPI networks can be constructed and analyzed by topology theories. It has been proposed that two proteins which interact with the same protein should have a shorter path between them on the PPI network (Dyer et al., 2007). It has also been proposed that interacting proteins might share more neighboring proteins on a PPI network (J. Chen et al., 2006;Chua et al., 2006).
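Both topological measures mentioned above can be computed directly from an edge list. The sketch below builds an undirected graph as a dictionary of neighbor sets, counts shared neighbors, and finds the shortest path length by breadth-first search. The function names and the toy network are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected PPI network as {protein: set of neighbors}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shared_neighbors(graph, a, b):
    """Proteins that interact with both a and b."""
    return (graph.get(a, set()) & graph.get(b, set())) - {a, b}


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Toy network: A and B both interact with C.
network = build_graph([("A", "C"), ("B", "C"), ("B", "D"), ("D", "E")])
print(shared_neighbors(network, "A", "B"))
print(shortest_path_length(network, "A", "B"))
```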

Text mining
Protein interactions which have been reported repeatedly in the peer-reviewed literature might be more trustworthy than those which have been reported rarely or only once (Jaeger et al., 2008). However, it must be noted that proteins with functions of greater interest, such as roles in disease mechanisms, have been studied more intensively and are therefore documented more frequently. PubMed and GeneRIF are common sources of text mining material. Automated data gathering (e.g. text mining via natural language processing or biomedical language processing) is not as reliable as manual curation. Manual curation is not 100% correct either, owing to human errors and inconsistent curation standards.

Experimental detections
PPIs detected by low-throughput technologies are generally considered error-free. For the medium- to high-throughput technologies, the reliability of the results varies, as listed in Table 1. In vivo experiments are usually more accurate than in vitro experiments, because they are conducted in cellular environments. Interactions supported by more than one method are generally believed to be more reliable (von Mering et al., 2002). More reliable PPI datasets are assumed to have more intersections with other datasets and higher average numbers of documented protein interactions (Shin et al., 2009). Reliable PPI datasets should also contain a greater proportion of interactions which have interacting domain pairs (He et al., 2008).
In silico protein docking is another approach which could be used for predicting protein interactions; however, it is impractical for high-throughput prediction because of the extremely large amount of computation required and the lack of detected or predicted structures for most proteins.

Protein interaction databases
More than 30 PPI databases have been published, and most are available online (Fischer et al., 2005). Table 3 lists the frequently referenced databases. The contents of these databases often overlap and are integrated to create larger non-redundant databases. These collections of PPIs can be used as the foundation for predicting PPIs and evaluating their reliability. MINT is one of the few repositories which provide confidence scores for experimentally detected PPIs. It uses the number and types of experiments in which a PPI is detected to estimate the confidence of the data.
HitPredict is a repository which contains filtered high-confidence PPIs of nine model species from IntAct, BioGRID, and HPRD. When calculating the confidence of PPIs, HitPredict considers (1) the types of experiments which detected the PPIs, (2) the co-expression correlation coefficients of the proteins, (3) shared GO terms of the proteins, (4) the presence of interologs in the same organism, and (5) domain-domain interactions between the proteins. These five criteria are combined in naïve Bayesian networks to give confidence scores.
STRING is one of the largest and most comprehensive PPI repositories. It evaluates PPIs using multiple criteria, including (1) the probability of finding the interacting proteins on the same KEGG pathway, (2) co-mentioning of gene/protein names in PubMed abstracts, (3) co-expression / co-regulation of proteins, (4) presence of interologs, and (5) [...]
For A. thaliana, Geisler-Lee et al. (2007) predict PPIs by identifying interologs. AtPIN (Brandao et al., 2009) calculates confidence scores of PPIs based on (1) the detected or predicted co-localization of interacting proteins and (2) the number of neighboring proteins shared by the interacting proteins on the PPI network. It also provides the score calculated by AtPID. AtPID combines indirect evidence of interactions, including interologs, phylogenetic profiles, domain interactions, co-expression profiles, shared protein functions, protein co-localisation, and gene fusion, in naïve Bayesian networks to predict and evaluate the PPIs of A. thaliana (Cui et al., 2008; Li et al., 2011). TAIR is a multi-tasking project which participates in a broad range of A. thaliana research.
Data on protein interactions between hosts and pathogens are scarce. PIG integrates the manually curated human-pathogen PPIs from four databases, BIND, IntAct, REACTOME, and MINT, in one platform for searching, visualization, and analysis of PPI networks. Hyperlinks to the UniProt database, Gene Ontology, InterProScan, and PubMed are provided under each protein entry in the user interface for convenient reference. Similar to PIG, HPIDB integrates several host-pathogen PPI databases, including BIND, IntAct, REACTOME, MINT, GENERIF, and PIG. However, unlike PIG, the PPIs in HPIDB are not limited to human hosts. Although the majority of the data are for human (22386 PPIs), HPIDB also contains host-pathogen PPIs for mouse (147 PPIs), A. thaliana (99 PPIs), rat (53 PPIs), cattle (30 PPIs), and chicken (19 PPIs).
A few repositories collect genes which are involved in host-pathogen interactions, but do not contain data on physical protein interactions. PHIDIAS is a centralized repository for host-pathogen interactions. It collects information for 98 pathogens of two hosts, human and mouse (Xiang et al., 2007). PHI-Base contains information for 405 fungal, oomycete, and bacterial genes which participate in pathogenicity, virulence, and induction of disease resistance (Winnenburg et al., 2006). 176 of these genes are from animal pathogens, 227 from plant pathogens, and 3 from pathogens of fungi. PathoPlant contains A. thaliana genes which are responsible for defense against pathogens (Bulow et al., 2007).

Identification of drug targets within human-pathogen interactions network
The evolutionary history of humans has never been separate from that of pathogens. Viruses, bacteria, fungi, and nematodes have all played critical roles in shaping the human race. Recent advances in metagenomics and human microbiome research suggest that commensal microorganisms have significant influences on the metabolism, immune systems, general wellbeing, and even behaviour patterns of animal hosts. Despite enormous efforts in preventing, diagnosing, and treating infectious diseases, pathogens still impose a heavy burden and socio-economic impact on humans. The development of vaccines and drugs has helped to diminish several devastating diseases; however, emerging diseases caused by novel or previously unknown pathogens continue to lead to unexpected outbreaks. To address current and future threats posed by pathogens, it is necessary to understand human-pathogen interactions at the molecular level. Viruses require host factors for recognition, entry, replication, and release. Their gene products form dense interaction networks with host proteins. Most bacteria, fungi, and nematodes, on the other hand, proliferate outside human cells and interact with host cells through extracellular signals and receptors. The following sections of this chapter review previous work on high-throughput characterization of human-pathogen interactions, most of which has focused on human-virus interactions.

Human-virus interactions
High-throughput characterization of intra-species interactions was the focus of early PPI studies. Inter-species interactions still constitute a minor part of most interactome databases. Since 2007, several studies on high-throughput human-virus interactions and host factor characterization have been published, including those for Epstein-Barr virus (EBV), hepatitis C virus (HCV), and influenza virus. Among these sparse inter-species interactions, those between human immunodeficiency virus 1 (HIV-1) and human are the most abundant, owing to the research effort devoted to this notorious virus. These datasets are summarized in Table 4. Among them, the HIV-1 human protein interaction database is so far the most comprehensive in terms of the number of recorded interactions and annotations. The number of human-virus PPIs can be estimated from the number of human-HIV PPIs, which presumably have not been fully uncovered, and the number of human viruses. Even a severely under-estimated count of 200 to 1,000 human virus species implies at least 1 to 5 million human-virus PPIs yet to be discovered. Although only a small number of human-virus PPIs have been detected or predicted, these data are a starting point for research into viral disease mechanisms and treatments.

HIV-1 interactions
The HIV-1, Human Protein Interaction Database (Pinney et al., 2009) was compiled by the National Institute of Allergy and Infectious Diseases (NIAID) and is hosted by NCBI. Interaction data in this database were collected from the published literature. Unlike other interaction data, entries in this dataset are associated with detailed annotations, including a list of PubMed IDs for references, short phrases describing the interactions, and text excerpted from the source literature. Interactions in this database are not limited to those revealed by conventional Y2H or co-immunoprecipitation; 70 interactions were annotated with details. For example, the statement "HIV retropepsin cleaves human actin" is supported by four publications and accompanied by descriptions of the HIV retropepsin and human actin. Occasionally, the text from the source literature provides additional information. In the case of "HIV retropepsin cleaves human alpha-2-macroglobulin precursor", the GeneRIF text states "the cleavage site of alpha 2-Macroglobulin by HIV-1 protease is the Phe684-Tyr685 bond", which identifies the interaction (cleavage) site. Interaction types include cleavage, binding, regulation/modulation, and post-translational modifications. Analysis of this database found 21 HIV gene products interacting with 1433 human proteins. The top 10 HIV and human proteins which participate in the most HIV-human interactions are listed in Table 5. By simply counting the numbers of PPIs in Table 5, critical host factors in HIV infection can be identified. The C-C chemokine receptor type 5 (CCR5) variants have been implicated in HIV resistance and immunity (Blanpain et al., 2002). Stem cell-based gene therapy has successfully "cured" HIV with this genetic variant in early phase clinical trials (Symonds et al., 2010).
Some host factors are also involved in various other processes and diseases, such as tumour necrosis factor (TNF), which regulates cell proliferation and apoptosis and has been implicated in cancer.
Table 5. HIV and human proteins that participate in the largest numbers of human-HIV interactions (surviving row: Cyclin-dependent protein kinase 9, 30).
Table 5 also suggests that Tat could be a potential drug target. The crystal structure of Tat in complex with cyclin-dependent protein kinase 9 (CDK9) and cyclin T1 has been solved (Tahirov et al., 2010). The complex structure (PDB ID: 3MI9) reveals that most of Tat is in physical contact with cyclin T1 and that only a small loop contacts CDK9. This structural information provides valuable insights for the design of Tat inhibitors.
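The ranking behind a table such as Table 5 amounts to counting, for each protein, the number of distinct partners it has across the species boundary. A minimal sketch follows; the function name is hypothetical and the short interaction list is a toy input for illustration, not an extract from the database.

```python
from collections import Counter


def rank_by_interactions(interactions, top_n=10):
    """Rank viral and human proteins by number of distinct partners.

    interactions: iterable of (viral_protein, human_protein) pairs.
    Returns (viral_ranking, human_ranking), each a list of
    (protein, partner_count) sorted from most to fewest partners.
    """
    unique_pairs = set(interactions)  # ignore repeated reports of a pair
    viral_counts = Counter(viral for viral, _ in unique_pairs)
    human_counts = Counter(human for _, human in unique_pairs)
    return viral_counts.most_common(top_n), human_counts.most_common(top_n)


# Toy input; the repeated (Tat, CDK9) pair is counted once.
toy = [
    ("Tat", "CDK9"), ("Tat", "CCNT1"), ("Tat", "CDK9"),
    ("Nef", "CD4"), ("gp120", "CD4"), ("gp120", "CCR5"),
]
viral_ranking, human_ranking = rank_by_interactions(toy)
print(viral_ranking)
print(human_ranking)
```

Counting distinct partners rather than raw records keeps heavily cited interactions from inflating a protein's rank.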

Epstein-Barr virus interactions
Epstein-Barr virus (EBV) infects human epithelial cells and is implicated in various types of cancer, such as Burkitt's lymphoma and nasopharyngeal carcinoma. The interactions among EBV proteins, and between EBV and human proteins, have been characterized using the Y2H method (Calderwood et al., 2007). Overall, 43 EBV-EBV and 173 human-EBV interactions have been validated with experimental evidence. Network analysis reveals that most EBV-EBV interactions take place among conserved "core" proteins; these interactions may therefore be responsible for the general infection and replication of herpesviruses. On the other hand, most human proteins targeted by EBV are proposed to be "hub" proteins, which participate in more human-human interactions and may have crucial roles in the underlying biological processes. The EBV protein which targets the most human proteins is BFLF2, with 21 interaction partners. BFLF2 interacts with BFRF1 and changes its cellular localization (Gonnella et al., 2005). Deletion of BFLF2 also impairs viral DNA packaging (Granato et al., 2008). The most targeted human proteins are HOMER3 and GRN. HOMER3, which binds to numerous receptors, is involved in diverse biological functions such as neuronal signalling and T-cell activation. Granulin (GRN) is a secreted glycosylated peptide which regulates cell growth and is implicated in wound healing and tumorigenesis. BFLF2 interacts with both HOMER3 and GRN; however, the functional implications of these interactions are not clear.

Hepatitis C virus interactions
Hepatitis C virus (HCV) is the pathogen which causes chronic hepatitis C infection. Infection with HCV may lead to cirrhosis and hepatocarcinoma if not properly treated with antiviral drugs or interferon. Unfortunately, current HCV treatments are expensive and can have severe adverse effects. A human-HCV interaction map would help us to understand the mechanisms of HCV infection and its chronic nature. The human-HCV interaction network consists of 481 HCV-human interactions (de Chassey et al., 2008). Among these interactions, 314 were determined by Y2H experiments, and the others were identified from literature reviews. The most connected HCV proteins include NS3, NS5A, and CORE. Human proteins targeted by the most HCV proteins include nuclear receptor subfamily 4, group A, member 1 (NR4A1), homeobox D8 (HOXD8), and SET domain containing 2 (SETD2). NR4A1 is a nuclear transcription factor which is highly expressed in adrenal cortex, lung, and prostate; however, its expression level in liver is low. HOXD8 is important for development; its deletion leads to limb deformation. The expression level of HOXD8 is highest in kidney. SETD2 is a histone methyltransferase and also contains a transcription activation domain. Recently, SETD2 has been identified as a tumour suppressor gene (Duns et al., 2010). However, the roles of HCV-SETD2 interactions in tumorigenesis remain elusive. Analyses of the EBV-human and HCV-human interaction networks found that viral proteins tend to interact with "hubs" in human protein-protein interaction networks. Among the human proteins targeted by HCV, three KEGG pathways were significantly enriched: the insulin signalling pathway, the TGF signalling pathway, and the Jak-STAT signalling pathway (de Chassey et al., 2008). In addition, the "focal adhesion" pathway has been identified as a novel pathway targeted by HCV.
In our own analysis, which used bootstrap to estimate the statistical significance of the numbers of HCV-targeted genes, we have also found that the "focal adhesion" and "ECM-receptor interaction" pathways may be perturbed by HCV infection.
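The details of the resampling procedure are not given in the text, so the following is only a sketch of one plausible scheme, and it should not be read as the analysis actually performed. It draws random gene sets of the same size as the virus-targeted set from a background gene list (without replacement, which makes it a resampling test rather than a bootstrap in the strict sense) and asks how often a random set overlaps the pathway at least as much as the observed targets do. The function name, parameters, and gene identifiers are hypothetical.

```python
import random


def resampling_p_value(targets, pathway, background, n_iter=2000, seed=0):
    """Empirical p-value for the overlap between targets and a pathway.

    targets:    genes targeted by the virus.
    pathway:    genes annotated to the pathway of interest.
    background: all genes that could have been targeted.
    Returns (observed_overlap, p_value).
    """
    rng = random.Random(seed)        # fixed seed for reproducibility
    targets, pathway = set(targets), set(pathway)
    pool = sorted(background)        # list form required by rng.sample
    observed = len(targets & pathway)

    at_least_as_extreme = 0
    for _ in range(n_iter):
        sample = rng.sample(pool, len(targets))
        if sum(gene in pathway for gene in sample) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_iter + 1)


# Hypothetical data: 100 background genes, a 10-gene pathway,
# and 8 targeted genes that all fall inside the pathway.
background = [f"g{i}" for i in range(100)]
pathway = background[:10]
targets = background[:8]
print(resampling_p_value(targets, pathway, background))
```

A small p-value indicates that the observed number of targeted pathway genes is unlikely under random targeting of the background set.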

Influenza virus host factors
Influenza A virus causes recurrent epidemics. The high transmission rate of influenza virus makes it one of the greatest threats to public health, especially when long-diminished strains re-emerge or new strains appear. The rapid evolution of the virus makes it difficult to predict and prepare seasonal vaccines. Drug-resistant strains also challenge our ability to treat and control the disease. The identification of host factors required by influenza virus may contribute to the prevention and treatment of infection. Host factors involved in early-stage influenza virus replication have been characterized by genome-wide RNA interference (RNAi) screening (Konig et al., 2010). Unlike those found in Y2H experiments, host factors identified by RNAi do not necessarily interact with viral proteins directly. Nevertheless, these findings imply that viral diseases may be treated by regulating some of these host factors. One example is an inhibitor of the host factor CAMK2B, which impedes viral growth and may be developed into a new antiviral drug.

Human-bacteria interactions
Bacterial cells can reproduce without the cellular machinery of their hosts. Studies on human-bacteria interactions have thus focused on cellular-level interactions. So far, only limited effort has been devoted to the identification of human-bacteria interactions at the molecular level. The interactions between human and three pathogenic bacteria, Bacillus anthracis, Francisella tularensis, and Yersinia pestis, have been characterized using high-throughput Y2H experiments (Dyer et al., 2010). The reported dataset includes 3,073, 1,383, and 4,059 interactions between human and B. anthracis, F. tularensis, and Y. pestis, respectively. The topology of these human-bacteria and human protein-protein interaction networks revealed that many bacterial proteins target "hubs" in human PPI networks. Specifically, several host defence pathways have been identified, including innate immunity and inflammation. Comparative analysis of the three human-pathogen interaction networks also confirmed these findings. Several methods have been used to identify conserved protein interaction modules (CPIM), and they indicate that these bacteria may interfere with host innate immune responses, including antigen binding and processing and several immune response pathways. Analysis of these interactions faces some obstacles. Large proportions of the bacterial proteins which interact with human proteins (285/943, 66/349, and 630/1,218 proteins for B. anthracis, F. tularensis, and Y. pestis, respectively) are putative, hypothetical, or uncharacterized. Without sufficient functional annotation, interpretation of these interactions can be superficial. Furthermore, bacterial proteins and human proteins are confined within the membranes of their respective cells, and only certain types of proteins can be exported or internalized by host cells. Thus, the annotation or prediction of protein subcellular localization is an important task for the interpretation and refinement of human-bacteria interaction networks.

Systematic analysis of host-pathogen interactions
Human-pathogen interactions have been collected and analyzed for their network properties (Dyer et al., 2008). Pooling human-pathogen interactions should enable the identification of common targets and biological processes perturbed by pathogens. A total of 10,477 interactions between human and 190 pathogen strains have been collected from several public databases. Networks for human-bacteria and human-virus interactions have been constructed separately. Most of the interactions were human-virus interactions, notably human-HIV interactions. Special attention has been paid to human proteins which interact with multiple pathogen groups. Such proteins are believed to be the common targets of these pathogens, and may highlight critical events during pathogen invasion. The analysis of human proteins targeted by multiple viral pathogens has revealed that viruses perturb host cells mainly through controlling the cell cycle, regulating apoptosis, and transporting viral particles across membranes (Dyer et al., 2008). Human-bacteria interactions, on the other hand, perturb Gene Ontology processes such as "immune system process" and "immune response". It is notable that many of these perturbed pathways are linked to inflammation and cancer, suggesting multiple roles of pathogens in various diseases.
Another analysis of the human-virus interaction network also highlighted the mechanisms of non-infectious diseases (Navratil et al., 2011). In total, 2,099 manually curated interactions of 416 viral proteins from 110 species were collected. This human-virus interaction network was integrated with the human PPI network. Disease gene annotations from OMIM were evaluated for their associations with viral proteins. Links between viruses and auto-immune diseases, including type 1 diabetes, were found. A comparison between the human-virus interaction network and the human type I interferon network also revealed that viruses attack the host at multiple levels, from receptors to transcription factors (Navratil et al., 2010). We have also performed a similar analysis with human-virus interactions collected from NCBI interactions, IntAct (Aranda et al., 2009), and other sources. The association between KEGG (Kanehisa et al., 2010) disease pathways and human-virus interactions has been analyzed. Several KEGG pathways have been identified with high significance, including "systemic lupus erythematosus", "pathways in cancer", "chemokine signalling pathway", "focal adhesion", and "T cell receptor signalling pathway". These findings are consistent with the studies described in previous sections, all of which indicate that pathogens gain their foothold in host cells by modulating host defence mechanisms. At the same time, inflammation, autoimmune diseases, and cancers may arise as a result of these modulations.
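Identifying human proteins targeted by multiple pathogen groups is, at its core, a grouping operation over the pooled interaction list. The sketch below assumes each record has already been reduced to a human protein and a pathogen group label; the function name, the protein identifiers, and the group labels are hypothetical.

```python
from collections import defaultdict


def common_targets(interactions, min_groups=2):
    """Human proteins that interact with at least `min_groups` pathogen groups.

    interactions: iterable of (human_protein, pathogen_group) pairs.
    Returns {human_protein: set of pathogen groups targeting it}.
    """
    groups_by_protein = defaultdict(set)
    for human_protein, pathogen_group in interactions:
        groups_by_protein[human_protein].add(pathogen_group)
    return {
        protein: groups
        for protein, groups in groups_by_protein.items()
        if len(groups) >= min_groups
    }


# Hypothetical pooled interactions.
pooled = [
    ("H1", "virus_group_1"), ("H1", "virus_group_2"),
    ("H2", "virus_group_1"), ("H2", "virus_group_1"),
    ("H3", "virus_group_2"), ("H3", "bacteria_group_1"),
]
print(sorted(common_targets(pooled)))
```

The resulting protein set is what would then be passed to a functional enrichment analysis to find the processes that multiple pathogens perturb.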

Conclusion
At present, confident PPI data are scarce, especially for non-human organisms and inter-species interactions. The prediction of PPIs, as well as the evaluation of the accuracy of detected and predicted PPIs, are important topics which require further advances in methodology, tools, and data generation. It is believed that, in the coming years, as second-generation sequencing (SGS) rapidly discloses full genome sequences and accumulates high-throughput expression data, more and more inter- and intra-species PPI networks will be constructed, not only for model organisms, but also for crops, biofuel-producing algae and bacteria, hosts and their pathogens, and symbiotic organisms. Whereas some "microarray" or "bioinformatics" scientists among us may have been criticized as doing "cataloging research", the majority of us believe that we are sincerely exploring new scientific and technological systems to benefit human health, human food and animal feed production, and environmental protection. Indeed, we are humbled by the complexity, extent, and beauty of cross-talk in various biological systems; on the other hand, we are becoming more educated and are able to start addressing honestly and skillfully the various important issues concerning translational medicine, global agriculture, and the environment. The two volumes of this book present a series of high-quality research and review articles in a timely fashion for this emerging research field of our scientific community.