HIV-1 Selectively Integrates Into Host DNA In Vitro


Summary
The biochemistry of retroviral integration selectivity is not fully understood. We modified the previously reported in vitro integration reaction protocol and developed a novel reaction system with higher efficiency. We used a DNA target composed of a repeat sequence DNA, 5'-(GTCCCTTCCCAGT)6(ACTGGGAAGGGAC)6-3', that was ligated into a circular plasmid. Target DNA was reacted with a pre-integration (PI) complex that was formed by incubation of the end cDNA of the HIV-1 genome and recombinant integrase. It was confirmed that integration selectively occurred in the middle segment of the repeat sequence. On the other hand, both frequency and selectivity of integration markedly decreased when target sequences were used in which CAGT bases in the middle position of the original target sequence were deleted. Moreover, upon incubation with a combination of these deleted DNAs and the original sequence, the integration efficiency and selectivity towards the original target sequence were significantly reduced, which indicated interference effects by the deleted sequence DNAs. Efficiency and selectivity were also found to vary with changes in the manganese dichloride concentration of the reaction buffer, probably due to induction of fluctuation in the secondary structure of the substrate DNA. Such fluctuation may generate structural isomers that are favorable for selective integration into the target sequence DNA. In conclusion, there is considerable selectivity in HIV-1 integration into the specified target sequence. The present in vitro integration system will therefore be useful for monitoring viral integration activity or for testing of integrase inhibitors.

Background
Retroviral integration into host DNA is a critical step in the viral life cycle. Once integrated, the proviral genome is stably duplicated along with the host cellular DNA and is transmitted to the daughter cells. Retroviruses can thus serve as powerful tools for the integration of foreign genes into a host genome (Fig. 1), and a murine leukemia retrovirus (MLV) vector can be used to examine the function of introduced genes and the development of induced pluripotent stem (iPS) cells [1]. However, integrated retroviral genomes also have the potential to cause unexpected transformation through up-regulation of target genes by retroviral promoter elements located at the long terminal repeat (LTR) following integration [2].
After the HIV-1 RNA genome has been transcribed into double-stranded DNA, the viral protein integrase binds to the termini of the viral DNA in a tetrameric fashion and creates overhanging 5'-ends by removing two nucleotides from the 3'-ends (3'-end processing). The HIV-1 DNA and the host cell DNA are ligated by synthesis of phosphodiester bonds between the terminal nucleotides of the viral 3'-ends and the overhanging 5'-ends of the host chromosome. The non-homologous 5'-ends from the viral DNA are removed by integrase. Finally, the gaps are filled by host cellular repair proteins, which recognize single-strand breaks. A five-base repeat is observed in the flanking sequence after the gap repair reaction.
Although integration events have long been considered to be random, several recent findings have shown that integration of MLV and of HIV-1 is detected more frequently in actively transcribed genes [3,4] or in promoter regions [5]. Previous statistical studies have also demonstrated that weak palindromic sequences are a common feature of the sites targeted for retroviral integration [6,7]. Similar target preferences have also been reported for human T cell leukemia retrovirus type I (HTLV-I) integration sites [8].
Because of these findings, we investigated the biophysical mechanisms underlying in vitro integration. In the present study, we aimed to establish an in vitro integration assay using retroviral cDNA and integrase. Yoshinaga et al. previously reported the development of an in vitro integration assay using recombinant HIV-1 integrase, and short viral and target DNA sequences [9]. Using this method, they successfully detected retroviral cDNA-target DNA complexes in vitro and reported that the dinucleotide motif 5'-CA that is located at the proviral genome termini was essential for HIV-1 integration. Yoshinaga et al. called these dinucleotides the integration signal sequence. We modified Yoshinaga's method of in vitro integration in order to identify the precise HIV-1 integration sites using a target DNA that corresponds to an actual gene sequence.
In our previous study, we identified a common MLV integration site within the signal transducer and activator of transcription 5a (Stat5a) gene in MLV-induced spontaneous murine lymphoma in an inbred strain of mice, SL/Kh [10]. This was the first report of an MLV integration target sequence. It has also been previously demonstrated that the Stat5a gene represents one of the common integration sites of MLV (Mouse Retrovirus Tagged Cancer Gene Database (MRTCGD), http://rtcgd.ncifcrf.gov/cgi-bin/mm7/easy_search.cgi) [11]. The encoded STAT5A protein is a transcription factor that is known to play an essential role in the development of myelo- and lympho-proliferative disease [12,13]. In the current study, we modified this Stat5a gene sequence for use as a target for HIV-1 integration in vitro. This target gene contains a 5'-CA-rich sequence, which may provide a useful clue for the preparation of target DNA sequences, because terminal CA dinucleotide motifs are shared by MLV and HIV-1, as well as by HTLV-I proviruses.

In vitro integration assay used in previous studies
In many previous in vitro integration protocols, the double-stranded DNA of the HIV-1 3'-LTR proviral end alone and the substrate DNA are mixed in an appropriate buffer containing MnCl2 [14] (Figs. 2 and 3). Subsequently, a PCR reaction using a primer set targeted to the proviral DNA and the target DNA amplifies a DNA segment that includes the viral-host DNA junction. In some protocols, even the viral DNA itself is used as the target DNA. Thus, previous studies paid little attention to the target sequence. It is commonly known that integrase binds to the proviral DNA in a regular tetrameric fashion. By analogy with dimeric transcription factors, which bind palindromic sequence motifs such as E-box and GAS elements, an integrase oligomer should favor certain sequence motifs.
Fig. 2. Scheme of the previously described incubation with recombinant integrase
The 3' end of the HIV-1 LTR sequence is shown. Red letters indicate the conserved dinucleotide motif. Incubation of the 3' end of the HIV-1 LTR DNA with integrase results in removal of the terminal 5'-GT dinucleotide. The resulting exposed hydroxyl group then attacks the target DNA.

5'-AAAATCTAGCA-3'
3'-TTTTAGATCGTCA-5'

A nick is introduced into the host DNA, which is then attached to the CA dinucleotide at the HIV-1 DNA end. Red letters indicate the conserved dinucleotide motif. Arrows indicate the PCR primer set and the direction of DNA polymerization.
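The 3'-end processing step described for these sequences can be sketched in a few lines. The sketch assumes that the lower strand printed in the figure is full length, so that its complement gives the unprocessed upper strand; the function names are illustrative and are not part of any published protocol.

```python
# Illustrative sketch of 3'-end processing; function names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def process_3prime_end(strand: str) -> str:
    """Remove every nucleotide 3' of the last CA dinucleotide (strand written 5'->3')."""
    last_ca = strand.rfind("CA")
    if last_ca == -1:
        raise ValueError("no CA dinucleotide found in strand")
    return strand[: last_ca + 2]


# Lower strand exactly as printed in the figure, written 3'->5'.
LOWER_3_TO_5 = "TTTTAGATCGTCA"

# Assumption: the lower strand is full length, so its complement is the
# upper strand before processing.
lower_5_to_3 = LOWER_3_TO_5[::-1]
unprocessed_upper = reverse_complement(lower_5_to_3)
processed_upper = process_3prime_end(unprocessed_upper)
removed = unprocessed_upper[len(processed_upper):]

print("unprocessed upper strand:", unprocessed_upper)
print("processed upper strand:  ", processed_upper)
print("removed dinucleotide:    ", removed)
```

Under this assumption, the processed strand matches the upper strand shown in the figure, and the removed dinucleotide is the GT mentioned in the caption.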

Preparation of target sequences for in vitro integration
We recently reported the target sequence of MLV integration. We developed an inbred strain of mice that suffers from spontaneous B cell lymphoma caused by MLV integration. MLV integration into Stat5a was identified in 25% of the lymphoma genomes [15,16] (Fig. 4). As depicted in Fig. 4, the hot spot of integration included a 5'-CA-rich sequence as well as a palindromic motif. Downward-facing arrows in the figure indicate the MLV integration sites. The abundance of 5'-CA dinucleotides in the integration hot spots provided us with a hint for the preparation of target DNA for HIV integration, because these motifs are shared by the genome ends of MLV and HIV-1 (Fig. 5). We hypothesized that HIV-1 DNA also favors such a 5'-CA-rich sequence motif.
A box indicates the integration signal sequence reported by Yoshinaga et al. [9].

Modified in vitro integration assay
We then prepared a target sequence for HIV-1 integration. A repeat sequence was prepared in order to enhance integration efficiency. We used the repeat sequence 5'-(GTCCCTTCCCAGT)6(ACTGGGAAGGGAC)6-3', or a modification of this sequence, which was ligated into a circular plasmid. The sequence within parentheses is the unit of the repeat. This target sequence includes the 5'-CA dinucleotide motif, and includes the 5'-AC found at the HIV-1 DNA termini (Fig. 5). 5'-CAGT and 5'-ACTG in the repeat units are also present in the HIV-1 proviral genome ends. This target DNA was reacted with a pre-integration (PI) complex formed from the proviral LTR DNA and recombinant integrase. Figures 6 and 7 show our scheme of in vitro integration as well as the sequences of the HIV-1 proviral 5'- and 3'-ends. Following incubation of the proviral LTR sequence DNAs with recombinant integrase, the resultant pre-integration complexes were reacted with the target DNA. PCR amplification was performed and the integration sites were analyzed by direct sequencing. Unlike previously reported protocols, we used both 5'- and 3'-LTR sequences in our protocol. Such a target sequence unit was expected to interact directly with the HIV-1 DNA ends through the complementary sequences present in the target DNA. Complementarity between HIV-1 DNA and host DNA is shown in Fig. 8.
In the substrate DNA (the target sequence plus the circular plasmid), the red segment represents the 144-bp target DNA, and the black line represents the remainder of the circular plasmid DNA used for ligation. Following incubation of the proviral LTR sequence DNAs with integrase, the resultant pre-integration complexes were reacted with the substrate DNA. Red letters in the HIV-1 cDNA represent the LTR termini. PCR amplification was performed using primers corresponding to regions in the proviral ends and a plasmid region. The integration sites were analyzed by direct sequencing. "Q" in the PCR product indicates the junction between the provirus and the target DNA, and "R" and "S" represent the 3'-ends of the primers within the HIV-1 DNA and the plasmid DNA, respectively [14].
The top sequences show the termini of the HIV-1 provirus. The bottom sequence indicates the target sequence and highlights the dinucleotide motif CA/TG (red), and the AC bases (yellow) that are also present at the HIV-1 DNA termini. We prepared a repeat sequence in order to enhance integration efficiency. The sequence shown in parentheses is the unit of the repeat. Our protocol differs from previous protocols in that we used both 5'- and 3'-LTR sequences rather than a single 3'-LTR DNA.
The repeat sequence unit in the target sequence was expected to directly interact with HIV-1 DNA. The dotted lines indicate complementarity between target and viral DNA. The dsDNA sequence at the bottom indicates the 5'- and 3'-ends of the proviral HIV-1 DNA.

Reaction protocol
The detailed reaction protocol that we used is as follows. First, 75 ng of the U5'-LTR cDNA sequence of HIV-1, (+) 5'-TGT GTG CCC GTC TGT TGT GTG ACT CTG GTA ACT AGA GAT CCT CAG ACC TTT TTG GTA GTG TGG AAA ATC TCT ... After incubation, the double-stranded (ds) 5'-LTR DNA was combined with the ds 3'-LTR DNA for 1 h at 30 °C, and the LTR DNA was then further incubated with the target DNA for 1 h at 30 °C. As controls, ds 5'-LTR DNA and ds 3'-LTR DNA were also individually incubated with the target DNA. For control target DNAs, we synthesized four random 144-bp sequences, which were designed using a random number generator, and we ligated these sequences into circular DNA in the same manner as described below for the target DNA. In order to prevent non-specific reactions at the target DNA sequence, we ligated the target sequence DNA into circular plasmid DNA (Invitrogen pCR2.1 TOPO vector) and used this entire DNA as the target DNA for the assay (Fig. 6). The proportion of LTRs and target DNAs was optimized to prevent both non-specific reactions and integration due to an excess of LTRs. The DNA reacted in the buffer was purified using a QIAquick column (QIAGEN GmbH, Germany). PCR amplification was then performed using retroviral primers: the HIV-1 U5'-LTR primer, 5'-GTG TGC CCG TCT GTT GTG TGA CTC TGG-3', or the HIV-1 U3'-LTR primer, 5'-CTG GGA AGT AGC CTT GTG TGT TAT AG-3', and a TOPO vector primer, 5'-TCA CTC ATG GTT ATG GCA GC-3', whose first nucleotide corresponds to nucleotide position 2222 in the TOPO-pCR2.1 plasmid (Invitrogen, Carlsbad, CA). Amplicon copy number was quantified following identification of the HIV-1-substrate DNA junction [14]. Figures 9 and 10 show the percentage of viral DNA integration into the target sequence or into random sequences of the same length. Four different random sequences of the same length were used as controls.
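As a quick consistency check on the sequences quoted in this protocol, the U5'-LTR primer can be located within the U5'-LTR fragment. The sketch uses only the portion of the LTR sequence that is printed in the text, which is truncated; the variable and function names are illustrative.

```python
# Consistency check: does the U5'-LTR primer anneal within the printed
# (truncated) U5'-LTR fragment? Names are illustrative only.

def clean(seq: str) -> str:
    """Strip the spacing used in the text and normalize to upper case."""
    return "".join(seq.split()).upper()


# U5'-LTR (+) strand fragment as printed in the protocol (truncated in the text).
U5_LTR_FRAGMENT = clean(
    "TGT GTG CCC GTC TGT TGT GTG ACT CTG GTA ACT AGA "
    "GAT CCT CAG ACC TTT TTG GTA GTG TGG AAA ATC TCT"
)

# HIV-1 U5'-LTR primer as printed in the protocol.
U5_PRIMER = clean("GTG TGC CCG TCT GTT GTG TGA CTC TGG")

# str.find returns the 0-based offset of the first match, or -1 if absent.
offset = U5_LTR_FRAGMENT.find(U5_PRIMER)

if offset >= 0:
    print(f"primer matches the LTR fragment starting at position {offset + 1}")
else:
    print("primer not found in the printed LTR fragment")
```

The primer has the same sense as the (+) strand, so a plain substring search is sufficient here; a primer designed against the opposite strand would need to be reverse-complemented first.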
The horizontal blue line shows the percentage of integration expected if integration occurred uniformly across the substrate DNA, that is, the target DNA plus the circular plasmid. These data indicate that the percentage of integration into the target sequence was significantly higher than that into random sequences. Also, when a target sequence was used in which the middle 5'-CA and 5'-GT nucleotides were deleted, the integration efficiency was significantly decreased. Thus, local nucleotide motifs within the target sequence affect integration efficiency.
The percentage of PCR product copies derived from viral DNA that had integrated into the target sequence or into random sequences is plotted against the total number of PCR product copies, including the PCR products derived from integration into the remainder of the plasmid DNA sequence. The horizontal line shows the ratio of these PCR products expected if integration occurred uniformly in the 4-kb substrate DNA.
The left arrow represents the ratio (~2.3%) of integration into the target DNA to integration into the control that would be expected if integration occurred uniformly in the substrate DNA, that is, the target sequence DNA plus the circular plasmid into which it was ligated. The integration efficiency was significantly decreased when CA and GT were removed from the middle region of the repeat. Thus, local nucleotide motifs affect integration efficiency (****P < 0.001).
Figure 11 shows the in vitro integration site in the target repeat sequence DNA. The entire target sequence is shown in this figure. The vertical axis indicates the percentage of PCR amplicons derived from the integration of individual LTR units. Integration efficiency was significantly higher when both the 5'- and the 3'-LTR DNA were used than when either LTR DNA was used alone. The use of both 5'- and 3'-LTR DNA is one of the unique points of our protocol, since previous protocols used a single 3'-LTR DNA (Figs. 7 and 11).

Fig. 11. In vitro integration site in the target repeat sequence DNA (figure labels: 5'-LTR, 3'-LTR, target DNA)
The vertical axis indicates the percentage of the PCR amplicons derived from proviral DNA integrating into individual units. The entire target sequence is shown. The integration efficiency was significantly higher when both 5'- and 3'-LTR DNA were used rather than when a single LTR DNA was used. The use of both 5'- and 3'-LTR DNA is one of the unique points of our protocol. x, GTGGAGGGCAGT; y, ACTGCCCCCAC. (***P < 0.001)
Interestingly, we found that the middle segment of the target sequence was more favorable for integration, even though the same sequence units were repeated in the target sequence.
To explain this observation, we considered the possibility that a structural factor may contribute to selective integration into the middle segment. If a single strand of the target DNA appeared locally through unwinding of the double-stranded target DNA, a long hairpin or cruciform structure could form at the target sequence site. If the target sequence DNA is open, or unwound, the top of such a secondary structure would probably be favorable for integration. DNA folding thermodynamic analysis was performed to determine the secondary structure of the target DNA, and a hairpin structure was indeed predicted by this analysis (Fig. 12).
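The palindromic character of the target repeat, and the position at which a hairpin tip would fall, can be illustrated with a short sketch. This is not the thermodynamic folding analysis behind Fig. 12: it only counts Watson-Crick pairs when one strand of the repeat, taken literally from the notation 5'-(GTCCCTTCCCAGT)6(ACTGGGAAGGGAC)6-3', folds back on itself at its midpoint. Loop size and free energy are ignored, and the function names are illustrative.

```python
# Illustrative only: simple base-pair counting, not a free-energy calculation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAIRS = {"AT", "TA", "GC", "CG"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def centered_stem_length(strand: str) -> int:
    """Count consecutive Watson-Crick pairs, moving outward from the midpoint,
    when a single strand folds back on itself (loop size is ignored)."""
    half = len(strand) // 2
    left, right = half - 1, len(strand) - half
    stem = 0
    while left >= 0 and strand[left] + strand[right] in PAIRS:
        stem += 1
        left -= 1
        right += 1
    return stem


UNIT = "GTCCCTTCCCAGT"   # first repeat unit, as given in the text
REPEATS = 6

second_unit = reverse_complement(UNIT)
target = UNIT * REPEATS + second_unit * REPEATS

is_inverted_repeat = reverse_complement(target) == target
stem = centered_stem_length(target)

print("second repeat unit:", second_unit)
print("target equals its own reverse complement:", is_inverted_repeat)
print("pairs in a hairpin centered on the middle:", stem, "of", len(target) // 2)
```

Because the second block of repeats is the reverse complement of the first, either strand can in principle pair with itself along its whole length. The tip of such a hairpin would lie in the middle segment, which is at least consistent with the preference for the middle segment described above.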

Decoy effect of modified target sequences
We prepared two modified DNA target sequences in which the 5'-CA and 5'-GT were removed from the repeat unit at the middle site, termed modified sequence I and II, respectively (Fig. 13). PCR analysis of in vitro integration into modified sequence I or II revealed significant reductions in the number of copies of the PCR products compared to integration into the unmodified sequence. In addition, integration selectivity was not evident when using the modified DNA sequences (P < 0.05). We next mixed substrate DNA containing the target sequence with substrate DNA containing modified sequence I or II in equal amounts, and examined the number of PCR product copies that originated from integration into the non-modified target sequence. Integration into the original, nonmodified target sequence of the substrate DNA was significantly reduced when this DNA was mixed with the modified sequences (Fig. 14).
Two modified DNA sequences were prepared in which CA and GT were removed from the repeat unit at the middle site, termed modified sequence I and modified sequence II respectively. Red letters represent the TG/CA motifs. Yellow letters represent the GT/AC motifs that are observed in the HIV-1 proviral genome.

Fig. 14. In vitro integration using modified sequence I or II
The results showed significant reductions in the number of copies of PCR products derived from integrated DNA. In addition, integration selectivity was evidently suppressed when using the modified DNA targets (left graph, *P < 0.05). Substrate DNA containing the target sequence was then mixed with substrate DNA containing modified sequence I or II in equal amounts, and the percentage of PCR product copies originating from integration into the original target sequence was determined. Integration into the original target sequence was significantly reduced when this target was mixed with the modified sequences (right graph, *P < 0.05).

Biochemistry of the integrase: DNA structure fluctuation enhances selective integration
We digested circular DNA with HIV-1 integrase in a buffer containing various concentrations of manganese dichloride and measured the band intensity of linearized DNA following electrophoresis. The relative band intensity increased when the concentration of MnCl2 in the reaction buffer exceeded 40 mM (Fig. 15). This result raised the question of how such fluctuation in DNA structure influences the selectivity of in vitro HIV-1 integration. Furthermore, the percentage of integration into the target sequence DNA was found to increase significantly when the concentration of MnCl2 exceeded 40 mM. The ratio of the PCR product number derived from integration into the target sequence DNA was also found to increase significantly when the MnCl2 concentration exceeded 40 mM (Fig. 16).
In conclusion, such fluctuation may generate a favorable conformation of target DNA for integration of the HIV-1 LTR [16,17].

Conclusion
In conclusion, selective HIV-1 integration was demonstrated in vitro in this study. The factors that determine this selectivity are (i) a sequence motif, including CAGT, and (ii) a structural factor, namely fluctuation in DNA structure that can be induced by a high concentration of MnCl2 [16,17]. The findings shown in Figs. 9 and 10 indicate that the percentage of integration into the target sequence was significantly greater than that into the random and deleted sequences. Moreover, the entire repeat sequence or its secondary structure may be a target of integration.
In particular, we found that sequences similar to the target DNA sequence interfere with integration (Fig. 14). Thus, a modified DNA can act as a decoy for the target DNA. In the present study, integration efficiency and selectivity were highly sensitive to the MnCl2 concentration in the reaction buffer. Both increased significantly when the MnCl2 concentration was increased from 30 mM to 40 mM.
Fluctuations in the electrophoretic mobility of the substrate DNA also increased. These results suggest that there is a threshold concentration of MnCl2 for in vitro integration, probably because MnCl2 induces instability of secondary structure, so that a phase transition of the host DNA strand may occur. The target DNA probably cannot adopt the specified stable conformation below 40 mM MnCl2. Based on these data as well as the data shown in Fig. 15, we propose that there are close correlations between structural changes in substrate DNA and integration selectivity and efficiency (Figs. 16 and 17). We have used MnCl2 for studies of in vitro integration because this salt is more appropriate than other salts for the generation of in vivo integration. However, during in vivo integration into the host genome, numerous DNA binding proteins and metal ions regulate the reaction in a complex manner. Therefore, the present data cannot be immediately applied to in vivo systems, and further investigation using cell culture systems is necessary. Nevertheless, this in vitro integration assay is expected to facilitate understanding of the pathogenicity of HIV-1.

Fig. 17. A model of integration
The top of the secondary structure may be favorable for integration when the target DNA sequence is open or unwound by protein binding upstream of the target DNA sequence.

Acknowledgments
This work was supported by a Grant-in-Aid for Cancer Research from the Ministry of Education, Culture, Sports, Science, and Technology, Japan, and a Grant for Strategic Research on Cancer from the Ministry of Health, Labor, and Welfare, Japan (No. 72602-010-A03-0001) (http://www.jsps.go.jp/j-grantsinaid/index.html). The funders had no role in study design, data collection or analysis, in the decision to publish, or in preparation of the manuscript. The method described in this manuscript has been registered as "Nucleic acid having retroviral integration target-activity," patent number 4631084 in Japan (GenBank, DD323298). In relation to this patent, we also declare no conflict of interest. We are grateful to Masakazu Hatanaka and Tomokazu Yoshinaga for their helpful advice and insightful comments regarding this manuscript. In particular, we are also grateful to Dr. Tasuku Honjo and Dr. Hiroshi Hiai (Kyoto University) for their review of this study and for providing critical advice, and to Miss Hiroko Saito for her excellent technical support.
