Structure and Dynamics of Proteins from Nuclear Magnetic Resonance Spectroscopy

Introduction
Nuclear Magnetic Resonance (NMR) spectroscopy has become an indispensable tool in the characterization of structure and dynamics of biological macromolecules such as proteins. In recent years, NMR spectroscopy has contributed to our understanding of protein biophysics, especially when considering time-averaged dynamical events spanning various time scales. It has also played an important role in structural genomics and protein structure initiatives. The main focus of this chapter is to introduce the NMR phenomenon by providing a broad description of NMR spectroscopy and its contribution to the characterization of structure and dynamics of proteins.

Traditional experimental approaches to structure determination of proteins
Traditional experimental approaches to structure determination include X-ray crystallography and solution-state NMR spectroscopy. Presently, X-ray crystallography continues to be the most widely applied technique for the structural characterization of proteins and protein complexes at atomic resolution (Sali et al., 2003), as evidenced by the Protein Data Bank (PDB), which currently reports approximately 87% of its structures as having been determined by this method (Berman et al., 2000). Crystallography begins with the expression and purification of the protein or proteins of interest. The subsequent step is to produce crystals of sufficient quality (diffracting to 2.5 Å or better) in order to obtain high-resolution data for structure determination (Liu & Hsu, 2005; Sali et al., 2003). Typical X-ray diffraction experiments require only a small single crystal (of a few micrometers), which is placed in the path of X-rays from a source and causes them to scatter, or diffract (Ooi, 2010). The diffracted X-rays are then recorded by a detector (Ooi, 2010). The way in which X-rays are diffracted depends on the structure of the crystal; therefore, the resulting diffraction pattern is unique to each structure (Ooi, 2010). Data collected by the detector are then processed to create a visual image of the information. This allows the types and arrangements of atoms, molecules and/or ions, as well as bond lengths and angles within the crystal, to be inferred directly (Ooi, 2010). Crystallization is often regarded as a slow and resource-intensive step (Liu & Hsu, 2005). Moreover, since crystallization conditions cannot be predetermined, it is often necessary to screen a wide range of conditions related to pH, salt, protein concentration, and cofactors (Liu & Hsu, 2005; Sali et al., 2003).
Although recent technologies allowing for the use of smaller sample volumes have led to the automation of high-throughput crystallization, most structures will still require a great deal of time (sometimes as long as days or weeks) in order to produce a high-resolution structure (Abola et al., 2000; Ooi, 2010; Sali et al., 2003). This includes the iterative process of refinement, in which the molecular model is continually compared to the experimental data utilizing statistical methods (Ooi, 2010). Except in the rare case of well-ordered crystals from rigid molecules, disorder (which is commonly modeled as part of the refinement process) is a common phenomenon that occurs when some of the atoms in the structure adopt different orientations within different unit cells in the crystal (Ooi, 2010). Disorder may take the form of discrete conformational sub-states for side chains or surface loops, or even small changes in the orientation of entire molecules throughout the crystal (Adams et al., 2003; Wilson & Brunger, 2000). In addition, crystal structures of multi-component systems and membrane proteins remain limited, as these targets are refractory to structure determination (Liu & Hsu, 2005). Despite these limitations, X-ray crystallography represents a mature approach (Adams et al., 2003) and continues to be regarded as the 'gold standard' for structure determination (Sali et al., 2003).
The field of NMR, on the other hand, is still relatively young and constantly evolving (Markley et al., 2003). In fact, only about 12% of the total structures deposited in the PDB have been determined by NMR (Berman et al., 2000). Since NMR does not require crystals in order to produce a three-dimensional structure, samples appropriate for structure determination can be identified relatively quickly (Liu & Hsu, 2005). In addition, NMR experiments can be conducted in aqueous solutions under conditions that are physiologically similar to those in which the protein normally functions (Liu & Hsu, 2005; Montelione et al., 2000). Protein NMR methods have advanced to the point that small to medium sized protein domain structures may be determined rather routinely (Markley et al., 2003). Conventional protein NMR experiments first require successful expression and sample preparation (purification, isotopic labeling, sample concentration and stability) (Christendat et al., 2000; Markley et al., 2003). NMR methods resolve signals from ¹H, ¹⁵N, and ¹³C nuclei of a protein and assign them to specific nuclei in the structure of the molecule (Markley et al., 2003). The assigned chemical shifts then provide reliable information that reveals the secondary structure of the protein (Markley et al., 2003; Wishart & Nip, 1998; Wishart & Sykes, 1994; Wishart et al., 1991, 1992). Similar to X-ray crystallography, refinement continues iteratively until a self-consistent set of experimental constraints produces a collection of structures that also satisfies standard covalent geometry and steric overlap considerations (Markley et al., 2003). Additional structural restraints may be acquired by evaluating data from one or more different classes of NMR experiments.
For example, NOE spectra provide ¹H-¹H distance constraints; three-bond J coupling experiments specify torsion angle restraints; and residual dipolar couplings from partially ordered proteins provide both distance and spatial constraints for pairs of coupled nuclei (Markley et al., 2003). NMR spectroscopy is capable of supporting structural studies of small proteins that display any of the following characteristics: partial disorder, multiple stable conformations, weak interactions with important ligands or cofactors, or a reluctance to crystallize (Markley et al., 2003). Moreover, it is a method that can also reveal critical information with regard to overall protein folding, the existence of multiple folded conformations, protein-ligand or protein-protein interactions, and even local dynamics (Markley et al., 2003). In fact, the clear advantage of NMR methods is that they are able to report on the timescales of transitions (from picoseconds to seconds) at atomic resolution under steady-state conditions (Henzler-Wildman & Kern, 2007).

A brief introduction to NMR spectroscopy
NMR spectroscopy is a technique that relies on the magnetic precession that is observed when the nuclei of certain atoms are immersed in a magnetic field (Berg et al., 2002; Hornak, 1997). A limited number of isotopes display this property; the isotopes most commonly used in experiments with proteins and nucleic acids include hydrogen-1 (¹H), carbon-13 (¹³C), nitrogen-15 (¹⁵N), and phosphorus-31 (³¹P) (Berg et al., 2002). Protons, electrons, and neutrons possess quantum spin, which is described in multiples of ½ and can be either positive or negative (Hornak, 1997). For simplicity's sake we discuss NMR concepts using the hydrogen nucleus, which contains one proton, as an example. The spinning of a proton produces a magnetic moment, which can take on one of two orientations or spin states (referred to as α and β) when a magnetic field is applied (Berg et al., 2002). The energy difference between the two states is proportional to the strength of the magnetic field (Berg et al., 2002). Because the α state is aligned with the field, it has a slightly lower energy and is therefore slightly more populated (Berg et al., 2002). By providing a pulse of electromagnetic radiation that corresponds to the energy difference between the α and β states, a spinning proton in the α state can be raised to the β, or excited, state, allowing a resonance to be acquired (Berg et al., 2002). A resonance spectrum can be obtained for any molecule either by varying the frequency of the electromagnetic radiation while the magnetic field stays constant, or by changing the magnetic field while the frequency of electromagnetic radiation remains constant (Berg et al., 2002). These properties can then be used to analyze the chemical environment of the hydrogen nucleus (Berg et al., 2002). The flow of electrons around a magnetic nucleus produces a small magnetic field which opposes the externally applied field (Berg et al., 2002).
The electron density around each nucleus in a molecule varies as a result of the type of nuclei and bonds in the molecule (Hornak, 1997). Nuclei in different environments will therefore resonate at slightly different field strengths or radiation frequencies (Berg et al., 2002). Nuclei of a perturbed sample absorb electromagnetic radiation at a frequency which can be measured (Berg et al., 2002). The difference between the resonance frequency of the nucleus and that of a standard, relative to the standard, is referred to as the chemical shift and is reported in parts per million (ppm, symbolized by δ), with values for protons usually between 0 and 9 (Berg et al., 2002; Hornak, 1997). For example, when using the water-soluble derivative of tetramethylsilane (TMS) as the standard compound, a CH₃ proton usually exhibits a chemical shift of about 1 ppm, compared to about 7 ppm for an aromatic proton (Berg et al., 2002). The manner in which chemical shifts are calculated allows NMR spectra obtained using spectrometers at differing field strengths to be compared (Hornak, 1997). Fig. 1 provides an example of a one-dimensional NMR spectrum of galactose penta-acetate (C₁₆H₂₂O₁₁) with chemical shifts for the hydrogens clearly resolved. Nuclei experiencing the same chemical shift are referred to as equivalent, while those experiencing different environments or having different chemical shifts are considered nonequivalent (Hornak, 1997). Nuclei which are close to one another have an influence on each other's magnetic field; this effect, referred to as J coupling, is observable in the NMR spectrum when the nuclei are nonequivalent and are separated by three bonds or fewer (Hornak, 1997).
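The chemical shift definition above can be made concrete with a short sketch. The function names and the sample numbers below are illustrative assumptions rather than values from this chapter; the ¹H gyromagnetic ratio divided by 2π is taken as approximately 42.577 MHz per tesla.

```python
import math

# Assumed constant: 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla.
GAMMA_BAR_1H_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """Resonance frequency (MHz) of a nucleus in a static field of b0_tesla."""
    return gamma_bar * b0_tesla


def chemical_shift_ppm(nu_sample_hz, nu_ref_hz):
    """Chemical shift: frequency difference relative to the standard, in ppm."""
    return (nu_sample_hz - nu_ref_hz) / nu_ref_hz * 1e6


def offset_hz(delta_ppm, spectrometer_mhz):
    """Frequency offset (Hz) from the standard for a given shift and spectrometer."""
    return delta_ppm * spectrometer_mhz


if __name__ == "__main__":
    # The resonance frequency scales linearly with the applied field.
    print(larmor_frequency_mhz(14.1))
    # A hypothetical aromatic proton at 7 ppm sits at different absolute
    # offsets on 400 MHz and 600 MHz instruments ...
    for mhz in (400.0, 600.0):
        print(mhz, offset_hz(7.0, mhz))
    # ... but converting back to ppm gives the same field-independent value.
    print(chemical_shift_ppm(600e6 + 4200.0, 600e6))
```

Because the shift is a ratio, the same nucleus reports the same δ on either instrument, which is what allows spectra recorded at different field strengths to be compared.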
Utilizing the one-dimensional NMR technique, it is possible to resolve most protons for a few proteins; using the obtained information, we may then deduce changes to a particular chemical group under different conditions (Berg et al., 2002). However, in instances where one-dimensional NMR spectra are far too complex for interpretation due to overlapping signals (refer to Fig. 2), the introduction of additional spectral dimensions is not only helpful but necessary for resolving individual resonances in larger proteins.
In order to analyze NMR data, it is important to establish which chemical shift corresponds to which atom. This task is often referred to as resonance assignment, and depends on the protein being isotopically labeled. Standard methods usually begin with a two-dimensional ¹H-¹⁵N Heteronuclear Single Quantum Coherence (2D HSQC) experiment, which serves as the initial reference spectrum for signal identification (Markley et al., 2003). In this experiment, magnetization is transferred from the hydrogen to the attached ¹⁵N via J coupling; the chemical shift is then evolved on the nitrogen, and the magnetization is transferred back to the hydrogen for detection (Cavanagh et al., 1995). This particular experiment, illustrated in Fig. 3, reveals all ¹H-¹⁵N correlations, which are mainly backbone amide groups (Cavanagh et al., 1995). From this experiment, one can determine whether other experiments would be useful before spending the time and resources required for their implementation. In cases where significant degeneracy is present in the 2D HSQC, three-dimensional spectra (such as HNCO or HNCA) may prove useful in resolving overlapping spin systems (Markley et al., 2003). The HNCO experiment can be used to predict secondary structure, and does so by correlating the amide ¹H and ¹⁵N chemical shifts of one residue with the ¹³CO chemical shift of the preceding residue (Grzesiek & Bax, 1993; Kay et al., 1990; Muhandiram & Kay, 1994).
Here, magnetization is passed from ¹H to ¹⁵N and then to the ¹³CO by way of the ¹⁵N-¹³CO J coupling, and is then passed back via ¹⁵N to ¹H for detection (refer to Fig. 4a); the chemical shift is evolved on all three nuclei, which results in a three-dimensional spectrum (Grzesiek & Bax, 1993; Kay et al., 1990; Muhandiram & Kay, 1994). HNCA, on the other hand, correlates the intraresidue ¹³Cα chemical shift with the amide ¹H and ¹⁵N shifts (Farmer et al., 1992; Grzesiek & Bax, 1993; Kay et al., 1990). For this particular experiment, magnetization is passed from ¹H to ¹⁵N and then to ¹³Cα via the ¹⁵N-¹³Cα J coupling, and is then passed back to ¹⁵N and ¹H for detection, as demonstrated in Fig. 4b (Farmer et al., 1992; Grzesiek & Bax, 1993; Kay et al., 1990). In addition, this experiment provides sequential connectivities by transferring the magnetic coherence from ¹⁵N to the ¹³Cα of the previous amino acid (Cavanagh et al., 1995). Because the amide nitrogen is coupled to both the Cα of its own residue and that of the preceding residue, peaks for both Cα atoms will be visible in the spectrum; the peaks with greater intensity usually correspond to the Cα atoms that are directly bonded to the amide nitrogen (Farmer et al., 1992; Grzesiek & Bax, 1993; Kay et al., 1990). The chemical shift is then evolved for ¹H, ¹⁵N and ¹³Cα, resulting in a three-dimensional spectrum (Farmer et al., 1992; Grzesiek & Bax, 1993; Kay et al., 1990).
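The sequential connectivity provided by HNCA can be illustrated with a toy matching routine. This is only a sketch under stated assumptions: the `SpinSystem` type, the 0.1 ppm tolerance, and all chemical shift values are hypothetical, and real assignment software must also handle missing peaks, overlap, and ambiguous matches.

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    """One amide (1H, 15N) strip from an HNCA spectrum (hypothetical type)."""
    name: str
    ca_intra: float  # stronger peak: Calpha of the same residue, in ppm
    ca_prev: float   # weaker peak: Calpha of the preceding residue, in ppm


def find_predecessors(systems, tol_ppm=0.1):
    """For each spin system, list the systems whose intraresidue Calpha shift
    matches its preceding-residue Calpha shift within tol_ppm."""
    links = {}
    for s in systems:
        links[s.name] = [
            p.name
            for p in systems
            if p is not s and abs(p.ca_intra - s.ca_prev) <= tol_ppm
        ]
    return links


if __name__ == "__main__":
    # Hypothetical shifts chosen so that A precedes B and B precedes C.
    strips = [
        SpinSystem("A", ca_intra=58.20, ca_prev=54.90),
        SpinSystem("B", ca_intra=62.10, ca_prev=58.25),
        SpinSystem("C", ca_intra=45.30, ca_prev=62.05),
    ]
    for name, preds in find_predecessors(strips).items():
        print(name, "is preceded by", preds)
```

Chaining such matches yields stretches of sequentially linked spin systems, which are then mapped onto the known amino acid sequence.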

A brief discussion of Nuclear Overhauser Effect (NOE) and its implication in modern NMR spectroscopy
Even more structural information can be acquired by examining how the spins of different protons affect neighboring protons (Berg et al., 2002). This is possible through the Nuclear Overhauser Effect (NOE), an interaction between nuclei whose strength is proportional to the inverse sixth power of the distance between them (Berg et al., 2002). The distance between two nuclei can therefore be estimated from the intensity of the corresponding peak. By inducing a transient magnetization in a sample through a radiofrequency pulse, it is possible to alter the spin of one nucleus and examine the effect on the spin of a neighboring nucleus (Berg et al., 2002). The NOE differs from J coupling in that it identifies pairs of protons that are in close proximity in the protein's three-dimensional structure, even if they are not close together in the primary sequence (Berg et al., 2002). J coupling, on the other hand, is only observed when atoms are connected by 2 to 3 covalent bonds, as mentioned in section three. The two-dimensional spectrum acquired by Nuclear Overhauser Enhancement SpectroscopY (NOESY) graphically displays pairs of protons that are close to one another within the three-dimensional structure of the protein (Berg et al., 2002). As long as nuclei are within ~5 Å, magnetization is transferred from an excited nucleus to an unexcited one (Berg et al., 2002). Fig. 5 provides an example of a one-dimensional NOESY spectrum. The diagonal corresponds to a one-dimensional spectrum, whereas the off-diagonal peaks identify the pairs of protons that are within 5 Å of each other (Fig. 5). Similarly, in a two-dimensional NOESY spectrum, off-diagonal peaks reveal short proton-proton distances (refer to Fig. 6).
Fig. 5. Example of a one-dimensional NOESY spectrum. The five diagonal peaks correspond to the five protons in the image to the left. The peaks above the diagonal and the symmetrically related one below reveal that proton H18 is close to proton H3.
Three-dimensional protein structures may be calculated nearly uniquely if a sufficient number of distance constraints are applied, and are reconstructed such that proton pairs identified from NOESY spectra are close to one another in the three-dimensional structure (Berg et al., 2002). Families of related structures, rather than a single structure, may be generated for three reasons: not enough constraints may be experimentally accessible to fully describe the structure; distances obtained from NOESY analysis are only approximate; and experimental observations are made not on a single molecule but on a number of molecules in solution, which may have slightly different structures at any given moment (Berg et al., 2002). It is important to note that the efficient and accurate assignment of NOEs for structure determination is highly dependent upon the completeness and precision of the chemical shift assignments (Markley et al., 2003).
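The inverse sixth power relationship can be turned into a rough distance estimate by calibrating against a proton pair of known separation. The sketch below assumes an isolated pair of spins and a hypothetical reference peak; the function name and all numerical values are illustrative and not taken from this chapter.

```python
import math


def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance from a NOESY cross-peak intensity.

    Assumes intensity is proportional to r**-6 (isolated spin pair), so that
    r = r_ref * (I_ref / I) ** (1/6). Distances carry the units of ref_distance.
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance <= 0:
        raise ValueError("intensities and reference distance must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical calibration: a reference pair 2.5 angstroms apart gives a
    # cross-peak of intensity 64 (arbitrary units).
    for peak in (64.0, 8.0, 1.0):
        print(peak, round(noe_distance(peak, 64.0, 2.5), 2))
```

The steep distance dependence means that a 64-fold drop in intensity corresponds to only a doubling of the distance, which is one reason NOE-derived distances are usually treated as approximate upper bounds rather than exact values.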

NMR approaches for the study of internal dynamics
The investigation of protein dynamics relies mostly upon the use of NMR techniques. This is because biological functions span a range of timescales to which various NMR experiments are sensitive. Here we briefly introduce a number of NMR methods, and summarize for each the timescales over which they are capable of acquiring experimental data (refer to Fig. 7).
Real-Time NMR (RT NMR) encompasses slower dynamic processes in the seconds range (Kleckner & Foster, 2011) and was originally developed as a method to follow protein folding, combining the availability of high-resolution data with kinetics experiments to allow for detailed examination of protein structure during different steps of the folding process (Zeeb & Balbach, 2004). The experiment consists of physically initiating a process of interest and subsequently acquiring a sequence of NMR spectra, which typically demonstrate a progressive weakening of an initial set of signals along with a gradual strengthening of a new set of signals, the result of time-dependent changes between differing conformations and/or local structures (Kleckner & Foster, 2011).
EXchange SpectroscopY (EXSY) is used to quantify exchange kinetics for dynamic processes on the 10 millisecond to 5 second timescale (Kleckner & Foster, 2011) and encompasses slow conformational changes such as domain movements (Key et al., 2009) and ligand binding (Demers & Mittermaier, 2009). In this experiment, kinetic information is obtained from a quantitative analysis of the magnetization transfer and spectral broadening that results from the exchange between bound and free states in a partially ligand-saturated sample (Demers & Mittermaier, 2009).
Lineshape Analysis is another approach that reports on exchange events taking place roughly between 10 milliseconds and 0.1 seconds (Kleckner & Foster, 2011). In this experiment, chemical exchange processes are identified by characteristic changes of the NMR line shape (Jeener et al., 1979). These changes are typically observed in spectra acquired along a titration coordinate (in the form of ligand concentration, temperature or pH, for example), allowing for the observation of the incremental effect of that coordinate upon the NMR observables (Kleckner & Foster, 2011).

Carr-Purcell-Meiboom-Gill Relaxation Dispersion (CPMG RD) is a technique used to obtain kinetic, thermodynamic, and structural information and applies to exchange processes occurring in the 0.3 to 10 millisecond time frame (Kleckner & Foster, 2011; Loria et al., 2008). The purpose of this particular experiment is to apply a series of spin-echo pulses to transverse magnetization during a relaxation delay in order to refocus exchange broadening (Kleckner & Foster, 2011).
Rotating Frame Relaxation Dispersion (RF RD) may be used to study exchange processes that occur within the 20 to 100 microsecond range (Kleckner & Foster, 2011; Loria et al., 2008). This particular experiment is very similar to CPMG; the only difference is in the range of the spin-echo pulse frequencies used (25-1200 Hz for CPMG and 1-50 kHz for RF RD). This allows RF RD to study exchange events via the same principles as CPMG, but on a faster timescale (Kleckner & Foster, 2011).
Paramagnetic Relaxation Enhancement (PRE) is a method which allows for the study of protein dynamics on the microsecond timescale and is most appropriate for the examination of non-specific interactions and complexes between binding partners (Clore & Iwahara, 2009; Kleckner & Foster, 2011). This approach relies on the magnetic dipole interaction between a nucleus and an unpaired electron (Kleckner & Foster, 2011).
Nuclear Spin Relaxation (NSR) may be used to study the details of protein dynamics on fast timescales (picoseconds to nanoseconds) and is made possible by the presence of NMR probes throughout the molecule (Case, 2001). This experiment is based on the weak coupling between spin variables and molecular motion, which manifests as a much slower relaxation of the spins that can be readily studied (Case, 2001).
Full characterization of inter-molecular dynamics has been limited to NMR spectroscopic study of the protein (or complex of proteins) of interest and is typically performed in separate steps, with the protein's structure determined first under the assumption of rigidity, and its motion characterized later. Although structure determination protocols based on the assumption of molecular rigidity will produce a single structure, the degree of similarity between the static model of a protein structure and the many conformations of the dynamic model is not always clear and is poorly investigated. Another source of data capable of reporting on both structure and dynamics on timescales ranging from picoseconds to microseconds is that of Residual Dipolar Couplings or RDCs, which we discuss in further detail in the section that follows.
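The timescale windows quoted in this section can be collected into a small lookup for choosing candidate experiments. This sketch includes only the methods for which explicit numerical limits are given above; treating those limits as hard boundaries is a simplifying assumption, and the names `WINDOWS_S` and `methods_for` are illustrative.

```python
# Exchange timescale windows in seconds, as quoted in the text above.
WINDOWS_S = {
    "EXSY": (10e-3, 5.0),
    "Lineshape analysis": (10e-3, 0.1),
    "CPMG RD": (0.3e-3, 10e-3),
    "RF RD": (20e-6, 100e-6),
}


def methods_for(timescale_s):
    """Return the names of methods whose quoted window contains timescale_s."""
    return sorted(
        name
        for name, (low, high) in WINDOWS_S.items()
        if low <= timescale_s <= high
    )


if __name__ == "__main__":
    for t in (50e-6, 1e-3, 50e-3, 1.0):
        print(f"{t:g} s ->", methods_for(t))
```

In practice the windows overlap and depend on field strength and on the system under study, so such a table is a starting point rather than a decision rule.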

Emerging methods in simultaneous study of structure and dynamics of proteins
Recent advances in NMR spectroscopy have enabled the acquisition and analysis of data other than the traditional distance constraints. These new sources of data include Residual Dipolar Couplings (RDC), Pseudo Contact Shifts (PCS), and Paramagnetic Relaxation Enhancement (PRE), which provide orientational restraints as well as distance restraints. The introduction of orientational restraints has produced a paradigm shift in structure determination that has necessitated alternative approaches to the analysis of NMR data. In this section, the utility of orientational restraints is discussed with reference to the software development track that has been the subject of additional investigations. The focus of this section is on RDC data.
Historically, the use of RDCs has been limited by two factors: data acquisition, and data analysis. The introduction of a variety of alignment media, combined with advances in instrumentation and data acquisition have mitigated the experimental limitations in obtaining RDCs. The major bottleneck in utilization of RDC data in recent years has been attributed to a lack of powerful and yet user-friendly RDC analysis tools capable of extracting the pertinent information embedded within this complex source of data.

Residual Dipolar Couplings (RDCs)
Residual dipolar couplings (RDCs) were observed as early as 1963 (Saupe & Englert, 1963) in nematic environments. A number of recent applications (Al-Hashimi & Patel, 2002; Bax et al., 2001; Blackledge, 2005; de Alba & Tjandra, 2002; Prestegard et al., 2000; Tolman, 2001; Zhou et al., 1999) have renewed interest in their application to a broad spectrum of biomolecules. RDCs arise from the interaction of two magnetically active nuclei in the presence of the external magnetic field of an NMR instrument (Bax et al., 2001; Prestegard et al., 2000; Tjandra et al., 1996; Tolman et al., 1995). This interaction is normally reduced to zero, due to the isotropic tumbling of molecules in their aqueous environment. The introduction of partial order to the molecular alignment, by minutely limiting their isotropic tumbling, restores the RDC observable. This partial order can be introduced by the magnetic anisotropy of the molecule itself (Prestegard et al., 2000), by a liquid crystalline aqueous solution (Prestegard & Kishore, 2001), or by the incorporation of artificial tags with magnetic susceptibility anisotropy, such as lanthanides (Nitz et al., 2004). RDCs are measured relatively easily and represent an abundant source of highly precise information, such as the relative orientations of different inter-nuclear vectors within a molecule. Equation 1 describes the time-averaged observable of the RDC interaction between a pair of spin ½ nuclei.
Here, D_ij denotes the residual dipolar coupling in units of Hz between nuclei i and j, γ_i and γ_j are the nuclear gyromagnetic ratios of the two interacting nuclei, r is the internuclear distance (assumed fixed for directly bonded atoms) and θ_ij(t) is the time-dependent angle of the internuclear vector with respect to the external magnetic field. The angle brackets signify the time average of the quantity.
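The expression referred to as Equation 1 does not appear in the text as given here. A commonly used form that is consistent with the symbols defined above is sketched below. The permeability of free space μ_0 and Planck's constant h are symbols assumed for this sketch, and the overall sign and the grouping of constants differ between conventions in the literature.

```latex
D_{ij} = \frac{\mu_0 \, \gamma_i \gamma_j \, h}{(2\pi r)^3}
         \left\langle \frac{3\cos^2\theta_{ij}(t) - 1}{2} \right\rangle
\tag{1}
```

Under isotropic tumbling the average of 3cos²θ - 1 over all orientations is zero, which is why the coupling vanishes unless partial alignment is introduced.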

Molecular frame, alignment frame and order tensor
Proper understanding and interpretation of the Order Tensor Matrix (OTM) is central to the study of structure and dynamics of biological macromolecules from orientational restraints, and therefore requires a brief discussion. Upon successful determination of a structure, its atomic coordinates are described within some arbitrarily selected coordinate system. Since this structure is independent of any rotation or displacement within a given frame, the selection of a coordinate system is inconsequential. This arbitrary coordinate system is denoted as the "molecular frame" (MF). On the other hand, since RDC data are capable of describing the preferred alignment of a molecule, a more descriptive frame can be selected in which the atomic coordinates of the molecule of interest are described in the appropriate orientation. This more descriptive frame is defined as the "principal alignment frame" (PAF). Rotation of the molecule within this frame is consequential in the representation of its order tensor while any translation in space is not. Alignment properties of a molecule can be described in the form of a Saupe order tensor matrix (Saupe & Englert, 1963; Valafar & Prestegard, 2004). Reformulation of Equation 1 in matrix form collects and defines the Saupe order tensor matrix (or OTM for short), represented by S in Equation 2. The entity v in this equation represents the Cartesian coordinates of the normalized vector that describes the relationship between a pair of interacting nuclei. Jacobi transformation (Press et al., 2002) of this symmetric and traceless matrix can separate two important information contents of the molecular alignment, as shown in Equation 3. In this equation, a 3×3 order tensor represented by the elements s_ij is decomposed to produce the diagonal form of the order tensor matrix and a corresponding rotation matrix denoted by R.
The three elements of the resulting diagonal matrix (S_xx, S_yy and S_zz, also referred to as the order parameters) represent the strength of alignment along each of the principal axes x, y and z within the PAF. Comparison of the order parameters obtained from different regions of a macromolecular complex can provide information regarding their rigidity with respect to each other. Analysis of the R matrix in turn can provide the preferred direction of orientation with reference to the starting molecular frame. The preferred alignment can be identified through the decomposition of the rotation matrix R (shown in Equation 3) into three distinct rotational operators about the z, y and z axes of the PAF. These three rotations, denoted by α, β, and γ, fully define the orientational relationship between the arbitrary MF and the PAF, and can be used to assemble molecular complexes. In summary, an order tensor encapsulates five independent pieces of information (α, β, γ, S_yy and S_zz, with S_xx = -S_yy - S_zz). Careful study of the order tensor matrix can provide the preferred alignment of the molecule with respect to the molecular frame (α, β, γ) and the strength of alignment (S_yy, S_zz) along each of the principal axes of alignment within the PAF. When RDC data are assigned to specific locations in a given structure, the elements of the order tensor can be obtained (Blackledge, 2005; Clore et al., 1998; Dosset et al., 2001; Losonczi et al., 1999; Valafar & Prestegard, 2004). Conversely, given a structure and the elements of the order tensor, theoretical RDC data can be calculated easily.
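The expressions referred to as Equations 2 and 3 do not appear in the text as given here. Forms consistent with the description above are sketched below. The symbol D_max, which gathers the constants that multiply the angular average in the RDC expression, is an assumption of this sketch, as is the sign convention it carries.

```latex
D_{ij} = D_{\max}\; v^{T} S \, v,
\qquad
v = \begin{pmatrix} x \\ y \\ z \end{pmatrix},
\quad
S = \begin{pmatrix}
      s_{xx} & s_{xy} & s_{xz} \\
      s_{xy} & s_{yy} & s_{yz} \\
      s_{xz} & s_{yz} & s_{zz}
    \end{pmatrix}
\tag{2}

S = R \begin{pmatrix}
        S_{xx} & 0 & 0 \\
        0 & S_{yy} & 0 \\
        0 & 0 & S_{zz}
      \end{pmatrix} R^{T},
\qquad
S_{xx} + S_{yy} + S_{zz} = 0
\tag{3}
```

Because S is symmetric and traceless it has five independent elements, matching the five pieces of information (three rotation angles and two order parameters) described in the text.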
The attainment of an order tensor is an important requisite step in extracting structural information from RDC data.
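The forward calculation mentioned above, obtaining theoretical RDCs from a structure and an order tensor, can be sketched in a few lines. Several choices here are assumptions made for illustration: the rotation is composed as R = Rz(α)·Ry(β)·Rz(γ), couplings are returned in units of a prefactor `d_max` that defaults to 1, and the order parameters are hypothetical. This is not the implementation used by any of the cited packages.

```python
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def zyz_rotation(alpha, beta, gamma):
    """Assumed convention: R = Rz(alpha) * Ry(beta) * Rz(gamma), in radians."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))


def order_tensor(alpha, beta, gamma, s_yy, s_zz):
    """Build S = R * diag(Sxx, Syy, Szz) * R^T with Sxx = -Syy - Szz."""
    s_xx = -s_yy - s_zz
    diag = [[s_xx, 0.0, 0.0], [0.0, s_yy, 0.0], [0.0, 0.0, s_zz]]
    r = zyz_rotation(alpha, beta, gamma)
    return matmul(r, matmul(diag, transpose(r)))


def rdc(s, vector, d_max=1.0):
    """Back-calculate D = d_max * v^T S v for an internuclear vector.

    The vector is normalized here, so only its direction matters.
    """
    norm = math.sqrt(sum(c * c for c in vector))
    v = [c / norm for c in vector]
    return d_max * sum(v[i] * s[i][j] * v[j]
                       for i in range(3) for j in range(3))


if __name__ == "__main__":
    # Hypothetical alignment: three rotation angles and two order parameters.
    S = order_tensor(0.3, 1.1, -0.7, s_yy=2e-4, s_zz=-5e-4)
    for bond in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.3, -0.5, 0.8)]:
        print(bond, rdc(S, bond))
```

The reverse problem, estimating the five order tensor elements from measured RDCs and a known structure, is linear in the elements of S and is typically solved by least squares; the diagonalization described in the text then yields the order parameters and the rotation.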

Structure determination by RDCs
Structure determination approaches from RDC data utilize the rotational component of order tensor matrices in order to assemble a protein from rigid structural elements. The rigid structural elements can consist of units as small as peptide planes (Bernado & Blackledge, 2004a; Bryson et al., 2008) or as large as individual structural domains (Delaglio et al., 2000; Fowler et al., 2000). Nearly all of the existing NMR data analysis packages, such as Xplor-NIH and CNS (Brunger, 2007), have been modified to incorporate orientational restraints as part of their analysis. However, the shift in paradigm from distance-based to orientation-based structure determination has necessitated the development of dedicated analysis tools. A number of such software packages have been introduced in recent years, such as REDCAT (Valafar & Prestegard, 2004), REDCRAFT (Bryson et al., 2008) and others (Delaglio et al., 2000; Fowler et al., 2000).
The prospect of structure determination of macromolecules from orientational restraints has many advantages. First, a carefully selected set of RDC data originating from the backbone of a protein can be used to directly investigate structural parameters such as the backbone torsion angles. For example, backbone N-H and C -H RDCs can be used to directly restrain the backbone structure of a protein (Bryson et al., 2008;Marassi & Opella, 2002, 2003Prestegard et al., 2005;Tian et al., 2001b;Valafar et al., 2005;Wang & Donald, 2004). However, in order to address degeneracies (Al-Hashimi et al., 2000b) and variable sensitivity of RDC data (Bryson et al., 2008), it is necessary to acquire orientational restraints from two or more independent alignment media. Therefore it can be argued that in theory, the structure of any protein can be determined with as little as two RDC data per residue from two alignment media. The main reason for this reduction in the data requirement is based on independent investigation of backbone structure from investigation of the side chains. This significant reduction in the amount of required data is of paramount importance in reducing the cost (temporal and financial) of structure determination, and extending the applicability of NMR spectroscopy to challenging proteins such as membrane proteins. Nearly 30% of the human proteome is predicted to consist of membrane proteins. Despite their functional importance and frequency of occurrence, only ~100 unique membrane proteins have been structurally characterized and included in the PDB database (Berman et al., 2000). Their low level of inclusion is because they neither crystallize for X-ray crystallography nor produce the conventional NOE data (12-15 NOEs per residue) through NMR spectroscopy that has been required for successful structure determination. 
On the other hand, acquisition of two or three RDC data per residue for membrane proteins or large deuterated proteins is feasible with today's technology. Furthermore, because of the direct relationship between the RDC data and backbone conformation of proteins, it is easy to theoretically support the sufficiency of two or three RDC data points per residue for meaningful structure determination.
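
The relationship between an RDC and the orientation of its internuclear vector can be illustrated with a short numerical sketch. It assumes the commonly used form D = Dmax · vᵀSv, where S is a traceless order tensor and v is a unit vector expressed in the same frame. The tensor values, the Dmax magnitude and the function names below are illustrative assumptions, not values taken from this chapter or from any particular software package. The sketch also shows why a single alignment medium leaves orientations degenerate: distinct vectors can report the same RDC.

```python
import math


def back_calculate_rdc(order_tensor, vector, d_max):
    """Return D = d_max * v^T S v for the unit vector v along `vector`."""
    norm = math.sqrt(sum(c * c for c in vector))
    v = [c / norm for c in vector]
    return d_max * sum(
        order_tensor[i][j] * v[i] * v[j] for i in range(3) for j in range(3)
    )


def unit_vector(theta_deg, phi_deg):
    """Unit vector from a polar angle theta and an azimuth phi, in degrees."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return [math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t)]


# Hypothetical traceless, axially symmetric order tensor in its principal frame.
S_ZZ = 1.0e-3
ORDER_TENSOR = [
    [-S_ZZ / 2, 0.0, 0.0],
    [0.0, -S_ZZ / 2, 0.0],
    [0.0, 0.0, S_ZZ],
]
D_MAX = 24350.0  # Hz; assumed nominal magnitude for an N-H vector

along_z = back_calculate_rdc(ORDER_TENSOR, unit_vector(0, 0), D_MAX)
on_cone = back_calculate_rdc(ORDER_TENSOR, unit_vector(30, 0), D_MAX)
same_cone = back_calculate_rdc(ORDER_TENSOR, unit_vector(30, 137), D_MAX)
mirror_cone = back_calculate_rdc(ORDER_TENSOR, unit_vector(150, 0), D_MAX)

for label, value in [
    ("along z", along_z),
    ("theta=30, phi=0", on_cone),
    ("theta=30, phi=137", same_cone),
    ("theta=150, phi=0", mirror_cone),
]:
    print(f"{label:>18}: {value:.3f} Hz")
```

In this axially symmetric example, every vector on the cone at 30° and on its mirror image at 150° yields the same coupling, so one RDC alone cannot select a unique orientation. Data from a second, independent alignment medium supply a different tensor and therefore a different set of cones, which is one way to view the need for two or more media.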

Simultaneous study of structure and dynamics
The study of internal dynamics of macromolecules has been one of the long-standing challenges in structural and molecular biology. While the importance of elucidating the structure of pharmaceutically targeted proteins is widely accepted, the importance of understanding their associated internal dynamics is less widely recognized. This neglect is due, in part, to a lack of experimental methods capable of probing dynamics on biologically relevant timescales. Although various techniques for the study of internal dynamics exist (Henzler-Wildman & Kern, 2007), they usually apply to faster timescales or provide little information regarding conformational changes for slower dynamics.
Traditionally, full characterization of intramolecular dynamics by NMR spectroscopy is separated from structure elucidation, increasing the cost of these studies. Furthermore, it is conceptually difficult to separate structure from dynamics, since the two are intimately related and any attempt at structure elucidation that disregards the dynamics (or vice versa) can produce faulty results. A structure determined from data perturbed by internal dynamics is likely to be compromised, and relying on a false structure to study internal dynamics is likely to produce an inaccurate model of the internal motion. Disregarding internal dynamics during the course of structure determination may have catastrophic effects. Fig. 8 demonstrates the effects of structure determination of a mobile terminal helix while disregarding internal dynamics (Bryson et al., 2008). Although the entire helix maintains its secondary structure, the structure recovered by traditional methods does not bear any resemblance to the actual structure (Fig. 8, green).
Fig. 8. The structure of a mobile terminal helix (red) that has been determined by conventional approaches (green).
As discussed in section 6.3, proper analysis of RDC data can reveal information regarding the relative stability of various internal regions of a protein. This information can be obtained by comparing the order parameters that are obtained independently from each suspect region. Similar order parameters reported from different regions of a protein imply internal rigidity, while dissimilar order parameters can be interpreted as the presence of internal dynamics. In this regard, RDCs prove to be exceptional probes for the inspection of dynamics in biomolecules on timescales that are biologically relevant (Bouvignies et al., 2005). Therefore, proper investigation of structure and dynamics of proteins from RDC data should theoretically be possible. However, proper treatment of RDCs requires analysis software packages that are specifically designed for this purpose. Study of structure and dynamics from RDCs with traditional software packages such as Xplor-NIH or CNS (Brunger, 2007) can be very daunting.
Here we utilize the membrane-bound form of the bacteriophage Pf1 protein (mbPf1) to illustrate the point. The structure of the mbPf1 protein consists of two helices, a longer transmembrane helix and a short amphipathic helix. A two-state jump model of motion has been applied to the amphipathic helix of this protein, and appropriately averaged RDC data have been generated to reflect the effect of the modeled internal dynamics. Fig. 9 provides an illustration of the two states of the amphipathic helix in red and green. An attempt to recover the structure of this protein from RDC data, after insisting on helical secondary structural elements and using conventional analysis methods, produces the ensemble of structures shown in cyan in Fig. 9a. A recently introduced RDC analysis method named REDCRAFT has demonstrated the possibility of successful study of structure and dynamics of proteins. Application of REDCRAFT analysis to the two-state jump problem successfully recovers the structure and orientation of the two states, as shown in cyan in Fig. 9b. Note that the reconstructed states in Fig. 9b exhibit less than 1.3Å bb-rmsd.
Fig. 9. Structure of the membrane-bound form of the bacteriophage Pf1 coat protein with a hypothetical and simulated model of dynamics. The two states of the amphipathic helix are illustrated in red and green. Attempts to recover the structure of this protein from RDC data using conventional analysis methods produce an ensemble of cyan structures (a), while REDCRAFT successfully recovers the structure and orientation of the two states in cyan (b).
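
The idea of comparing order parameters between regions can be sketched numerically. The example below is not the REDCRAFT algorithm. It is a minimal illustration under stated assumptions: a fragment jumps between two equally populated orientations, the jump is fast enough that the fragment reports a population-weighted average of the order tensor seen in its own frame, and the motion does not alter the overall alignment. The tensor values, the 60° jump and all function names are hypothetical. The generalized degree of order (GDO), sqrt(2/3 · Σ S_ij²), is used as the scalar order parameter.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_y(deg):
    """Rotation matrix about the y axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def tensor_in_fragment_frame(s_mol, rot):
    """Express the molecular order tensor in a fragment frame rotated by `rot`."""
    return matmul(transpose(rot), matmul(s_mol, rot))


def two_state_average(s_mol, rot_a, rot_b, pop_a=0.5):
    """Population-weighted average of the tensor seen by a jumping fragment."""
    s_a = tensor_in_fragment_frame(s_mol, rot_a)
    s_b = tensor_in_fragment_frame(s_mol, rot_b)
    return [
        [pop_a * s_a[i][j] + (1.0 - pop_a) * s_b[i][j] for j in range(3)]
        for i in range(3)
    ]


def gdo(s):
    """Generalized degree of order of a 3x3 order tensor."""
    return math.sqrt(2.0 / 3.0 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))


# Hypothetical molecular order tensor (traceless, axially symmetric).
S_MOL = [[-0.5e-3, 0.0, 0.0], [0.0, -0.5e-3, 0.0], [0.0, 0.0, 1.0e-3]]

# A rigid fragment sees the same tensor, merely rotated into its own frame.
gdo_rigid = gdo(tensor_in_fragment_frame(S_MOL, rot_y(25)))

# A mobile fragment jumping between two orientations 60 degrees apart.
gdo_mobile = gdo(two_state_average(S_MOL, rot_y(0), rot_y(60)))

print(f"GDO rigid fragment:  {gdo_rigid:.3e}")
print(f"GDO mobile fragment: {gdo_mobile:.3e}")
```

Because rotation alone does not change the GDO, a rigid fragment reports the same value as the rest of the molecule, while the averaged tensor of the jumping fragment reports a smaller one. A mismatch of this kind between independently determined order parameters is the signature of internal dynamics described above.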

A comparison of NOE versus RDC based approaches to study structure and dynamics of proteins
Over the past few years, the utility of residual dipolar couplings (RDCs) for structure determination has increased sharply. This growth can be attributed to their distinct advantages over the traditional distance constraints (Bryson et al., 2008; Prestegard et al., 2005; Valafar et al., 2005). Generally, RDC data are more precise, easier to measure, and capable of providing both structural and dynamic information. The direct relationship between a carefully selected set of RDC data and structural parameters, such as backbone torsion angles, is another notable advantage of RDC data over NOE data. Given the alignment of an unknown protein, a single RDC datum can limit the orientation of the corresponding vector to within two symmetrical cones, as illustrated by Tjandra and Bax (Tjandra & Bax, 1997; Tjandra et al., 1996). In addition, the number of NOEs required for an unambiguous recovery of a structure depends heavily on the structural complexity of the protein, which is unknown prior to structure determination. Fig. 10 illustrates the tertiary structure of two proteins (3LAY and 1A1Z) of similar sizes, which clearly require disparate amounts of NOE data for successful description of their structures. Even the assembly of their secondary structural elements will require a different number of distance constraints. The lack of an understanding of the amount of data that is required will have a direct impact on the cost and success of protein structure characterization. RDC data, on the other hand, are more suitable for theoretically understanding data requirements, independent of structural complexity. As mentioned previously, strategically collected data can directly constrain a related torsion angle. Therefore, it is no surprise that 2-3 RDC data points per residue should suffice for successful determination of a protein's backbone structure, regardless of the structural complexity. A priori knowledge of data requirements is helpful for many reasons; in particular, it allows the completeness of the acquired data to be established prior to analysis.
Fig. 10. Two helical proteins, 3LAY (a) and 1A1Z (b), of nearly the same size and different structural complexity.
Another advantage in using RDCs is that the data collected from one portion of a structure can act as a constraint on any other part of the structure, since all measurements are made with respect to a global point of reference (the common order tensor of the molecule). RDC data collected from the N-terminus of a protein must report the same order tensor as that described by the C-terminus of that protein. This underlying, global relationship between all RDCs significantly enhances their efficacy as global structural constraints. Furthermore, any discrepancy in the assumption regarding the rigidity of a protein can at least be evaluated. This is not the case with NOE data. For example, referring to Fig. 10, once the helical fragments are folded with the measured short-range NOEs, those NOEs do not impose any structural restraints on other parts of the molecule.
Piecewise structure determination from NOE data is not always possible, and is often very unlikely, since NOE interactions are normally observed between two atoms that may be anywhere along the backbone of a protein. Therefore, the structure of the entire protein, including that of the side chains, needs to be addressed simultaneously. Simultaneous investigation of the entire protein leads to an exponentially expanding search space that is riddled with many local minima. Although simulated annealing approaches can in theory resolve entrapment in local minima, in practice this requires a large number of redundant structure determination sessions in the hope of discovering a more suitable structure. The use of RDCs enables the construction of a protein's backbone structure incrementally, through the addition of one amino acid at a time. This progressive strategy is computationally more convenient and allows the direct investigation of the backbone with reduced risk of entrapment in local minima. Addition of the side chains can take place after structure determination of the backbone, thereby benefiting from a significant reduction in the complexity of the solution space.
Fig. 11 provides direct evidence of the functional importance of RDC data in high-resolution structure determination. Here we have utilized the solution-state structure of the Ubiquitin/UIM fusion protein (2KDI) to generate precise NOE and RDC (backbone N-H and Cα-Hα) data using typical order tensors. The computed data were then corrupted through the addition of uniformly distributed noise. The original 2KDI structure was also used to generate 5000 random derivative structures with 0-23Å of deviation, as measured over the backbone atoms with respect to 2KDI. Fig. 11 illustrates the fitness to the simulated data of the 5000 randomly generated structures versus their backbone deviation (bb-rmsd) from the 2KDI structure. Fitness to the simulated data (RDC in blue and NOE in magenta) is plotted on the vertical axis, while bb-rmsd to the high-resolution structure is plotted on the horizontal axis. This figure can be used to ascertain the information content of NOEs versus RDCs in guiding any protein folding strategy. Several conclusions can be drawn from Fig. 11. First, the figure suggests that backbone N-H and Cα-Hα RDCs are sufficient to obtain a protein structure. Second, Fig. 11 suggests that NOEs tend to plateau (lose sensitivity) as the calculated structure approaches the actual structure, while RDCs become more sensitive. Therefore, NOEs may be indiscriminate probes when operating in the range of 0-3.5Å from the actual structure. In contrast, the use of RDC data may very well provide structures within less than 1.0Å of the actual structure. This observation is in agreement with the community-wide consensus that X-ray structures fit RDC data better than NOE-based NMR structures of proteins do. The final conclusion is that RDCs may be an indispensable source of data in high-resolution structure determination by NMR spectroscopy.
Fig. 11. NOE and RDC fitness of 5000 structures generated randomly from a known structure versus their backbone rmsd to the actual structure.
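
The text does not specify how fitness was scored, so the following is only a sketch of one plausible scheme, under assumptions that are ours: RDC fitness is taken as the root-mean-square deviation between back-calculated and noise-corrupted RDCs, and NOE fitness as the root-mean-square violation of upper distance bounds (a flat-bottomed penalty). All numerical values and function names are hypothetical. Noise is drawn from a uniform distribution, as in the simulation described above.

```python
import math
import random


def add_uniform_noise(values, amplitude, rng):
    """Corrupt each value with noise drawn uniformly from [-amplitude, amplitude]."""
    return [v + rng.uniform(-amplitude, amplitude) for v in values]


def rdc_rmsd(back_calculated, measured):
    """RMS deviation (Hz) between back-calculated and measured RDCs."""
    if len(back_calculated) != len(measured):
        raise ValueError("RDC lists must have the same length")
    total = sum((c - m) ** 2 for c, m in zip(back_calculated, measured))
    return math.sqrt(total / len(measured))


def noe_violation_rms(distances, upper_bounds):
    """RMS violation of upper distance bounds; satisfied restraints add zero."""
    if len(distances) != len(upper_bounds):
        raise ValueError("distance and bound lists must have the same length")
    total = sum(max(0.0, d - u) ** 2 for d, u in zip(distances, upper_bounds))
    return math.sqrt(total / len(upper_bounds))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical back-calculated RDCs (Hz) for five backbone vectors.
ideal_rdcs = [12.4, -7.9, 3.1, 18.6, -15.2]
measured_rdcs = add_uniform_noise(ideal_rdcs, 1.0, rng)
rdc_score = rdc_rmsd(ideal_rdcs, measured_rdcs)

# Hypothetical inter-proton distances (Å) scored against 5 Å upper bounds.
noe_score = noe_violation_rms([3.2, 4.8, 6.0], [5.0, 5.0, 5.0])

print(f"RDC fitness (RMSD, Hz): {rdc_score:.3f}")
print(f"NOE fitness (RMS violation, Å): {noe_score:.3f}")
```

Under a penalty of this flat-bottomed form, any distance already inside its bound contributes nothing to the score, whereas every RDC contributes a residual that changes with vector orientation. Whether this accounts for the plateau seen in Fig. 11 depends on the scoring actually used there, which is not stated.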

The role of NMR spectroscopy in the era of Computational Biology
Structure determination utilizing traditional NMR techniques relies on the measurement of Nuclear Overhauser Effects (NOEs) and scalar couplings in order to derive distance and torsion angle constraints, respectively (Montelione et al., 2000). Although NOE constraints will continue to be important for high-throughput structure determination, the measurement of residual dipolar couplings (RDCs) will prove valuable in structural genomics efforts (Montelione et al., 2000). By providing new structural information in qualitative form (Montelione et al., 2000), RDC experiments result in orientational constraints that are complementary to the distance-based constraints available through NOEs (Prestegard, 1998). Exceptionally challenging proteins, such as membrane-bound or membrane-associated proteins and glycosylated proteins, are refractory to structure determination by traditional approaches (NMR and X-ray crystallography). This can be attributed either to an insufficient number of the required conventional (distance or NOE) constraints in the case of NMR spectroscopy, or to the failure to produce a diffraction-quality crystal for X-ray crystallography. RDCs are a new type of data that have been anticipated to be instrumental in the structural characterization of large proteins, membrane proteins or homo-multimeric protein complexes, to name a few. This is in part due to their rich information content. RDCs also possess the potential for integrating structure determination by NMR spectroscopy (Bryson et al., 2008; Park et al., 2009; Prestegard et al., 2005; Tian et al., 2001b; Valafar et al., 2005), X-ray crystallography (Ulmer et al., 2003; Valafar & Prestegard, 2003), and computational modeling (Raman et al., 2010b; Valafar & Prestegard, 2003) methods into one unified approach for structural elucidation of biological macromolecules. Because RDCs may be used to characterize the structure and dynamics of challenging proteins, they present a viable, cost-effective method with the benefit of producing rapid, comprehensive and automated results (Al-Hashimi et al., 2002b; Bailor et al., 2007; Liu et al., 2010; Park et al., 2009; Prestegard et al., 2000; Tian et al., 2001a; Wang et al., 2007).
Another important development to consider is the automated analysis of NMR data. Many of the interactive tasks related to spectral analysis that are currently performed by experts could, in principle, be performed more efficiently using computational systems (Montelione et al., 2000). This has in fact been demonstrated with proteins ranging from 50 to 200 residues in length (Moseley & Montelione, 1999).

Concluding remarks
Here we conclude by summarizing some of the limitations related to modern NMR spectroscopy and briefly describe a method which may help to mitigate the limitations of NMR spectroscopy with respect to large molecules and macromolecular assemblies.

Limitations of modern NMR spectroscopy
As with other methods of structure determination, the accuracy of protein structures determined by NMR is dependent upon the extent and quality of the data that can be obtained (Liu & Hsu, 2005; Montelione et al., 2000). NMR spectroscopy is considered relatively insensitive, typically requiring samples of about 1 mM protein concentration. This prevents studies of proteins with very low solubilities and thereby limits certain experimental designs (Montelione et al., 2000). Limitations such as these impose constraints on pulse sequence design and sample stability (Montelione et al., 2000). Although multiple samples may be utilized for the process of structure determination, each sample must be stable (with regard to precipitation, aggregation, and other types of degradation) for a period ranging from days to weeks (Montelione et al., 2000). Furthermore, manual analyses of these multiple NMR datasets are not only laborious and time-consuming, but also require expertise (Montelione et al., 2000). Although recent developments hold great promise in reducing the time needed for structure determination through automated analysis of NMR assignments and 3-dimensional structure (Moseley & Montelione, 1999), general methods for automated analysis of side chain resonance assignment are not yet well developed (Liu & Hsu, 2005; Montelione et al., 2000).
Yet another limitation of NMR analysis is that the density of constraints is occasionally insufficient for accurate structural analysis (Montelione et al., 2000). Moreover, general methods for cross-validation similar to the free R-factor (a statistical measurement used in crystallographic studies for evaluating how well a structure model fits the diffraction data) are not currently available (Montelione et al., 2000). With regard to RDCs, current limitations include the efficient identification of alignment media (Montelione et al., 2000) and the available methods for data extraction and analysis (Jung & Lee, 2004). The major challenge of NMR spectroscopy, however, is in reducing the amount of time spent on data collection for structure determination (Liu & Hsu, 2005). The construction of new high-field magnets for enhanced sensitivity exemplifies technological advancements that are of particular interest, yet it is the performance of the NMR probe (used to detect NMR signals) on which the sensitivity of the acquired NMR data depends (Montelione et al., 2000). Sensitivity can be improved through the introduction of new probes, and also through advancements in partial deuteration, which improves signal-to-noise ratios as a result of sharper lines and longer transverse relaxation times (Gardner & Kay, 1998; Montelione et al., 2000). Transverse Relaxation-Optimized SpectroscopY (TROSY) is another novel technique that may provide significant sensitivity for large proteins by selecting slowly relaxing NMR transitions (Montelione et al., 2000; Pervushin et al., 1997; Wider & Wüthrich, 1999; Wüthrich, 1998).
Finally, the proper combination of various sources of data can be very beneficial in overcoming some of the fundamental challenges in NMR spectroscopy. Based on the results shown in Fig. 11, it can be concluded that NOE data are much more effective in guiding the protein structure from an extended state to near its native conformation. However, NOE data alone seem to lose structural sensitivity around 3Å from the native structure. RDCs, on the other hand, seem to provide the needed sensitivity as the structure determination converges toward the native conformation. It is therefore reasonable to speculate that the most effective approach to protein folding from NMR data consists of initial rounds of structure determination by NOE data, followed by structure refinement guided by RDC data in the absence of NOE constraints.

Contribution of TROSY in mitigating limitations of NMR spectroscopy
Many biologically relevant macromolecules and macromolecular complexes are simply too large for traditional NMR spectroscopy studies, with molecular masses beyond the practical range (Fernandez & Wider, 2003). In fact, conventional NMR-based techniques encounter two main problems in the solution-state study of large molecules and macromolecular assemblies: (1) the large number of acquired resonances causes signals to overlap, making spectral analysis very difficult; and (2) NMR signals of larger molecules relax faster, which results in line broadening, poor spectral sensitivity, and eventually no observable NMR signals (Fernandez & Wider, 2003). TROSY is a technique that has profoundly extended the size limit of macromolecules that can be investigated by NMR, making the analysis of molecular systems of up to 1,000 kDa possible (Riek et al., 2002). The application of this technique has made a wide range of novel applications possible (Fernandez & Wider, 2003) by providing much better sensitivity and line width for large proteins through reduced transverse nuclear spin relaxation during chemical shift evolution (Pervushin et al., 1997). This makes the critical step of resonance assignment possible and allows backbone assignment and secondary structure to be obtained, as first demonstrated with a homo-octameric protein of 110 kDa (Salzmann et al., 2000). In addition, applications of TROSY to side chain resonance assignments have been demonstrated (Hilty et al., 2002). TROSY has even been incorporated into experimental studies that focus on the dynamics of macromolecules (Zhu et al., 2000). Moreover, TROSY has proven successful in determining structures for some of the most difficult of proteins, namely membrane proteins (Fernandez et al., 2001a, 2001b).
Because TROSY also provides a wide range of NMR measurements with regard to the functional properties of larger macromolecular complexes, it demonstrates great potential for providing clues to the physiological roles of novel proteins and may prove beneficial for drug discovery (Fernandez & Wider, 2003). Other practical applications of TROSY include: the discovery of scalar spin-spin couplings across hydrogen bonds (Cordier & Grzesiek, 1999; Dingley & Grzesiek, 1998; Pervushin et al., 1998), which can be utilized for structure refinement; the measurement of dipolar couplings in large molecules, to determine much larger 3D structures by NMR (Evenäs et al., 2001; Lerche et al., 1999; Yang et al., 1999); and increasing the sensitivity of some triple-resonance experiments for 13C- and 15N-labeled nucleic acids, so as to extend their range of applicability to even larger oligonucleotides (Brutscher & Simorre, 2001; Fiala et al., 2000; Riek et al., 2001; Simon et al., 2001).
In short, the many applications of TROSY, and the data made available by its experiments, will contribute significantly to solving both present and future biological problems related to the structure and function of large and complex biological molecules.