Documentation

Introduction to the PED

The structural characterization of intrinsically disordered proteins (IDPs) is inherently more difficult than that of proteins with more rigid structures. Structural information on a single conformation is often insufficient to give insight into the functions carried out by IDPs, so these proteins are better described as ensembles of a large number of conformations. These ensembles are calculated from experimental data, coming mainly from Nuclear Magnetic Resonance (NMR) and Small-angle X-ray Scattering (SAXS) measurements, and provide valuable functional information that would otherwise be difficult to obtain using classical approaches. Meaningful ensembles may also be constructed using Molecular Dynamics (MD) simulations, so PED hosts such ensembles too; experimental validation is, however, advised for these models.

The goal of the PED project is to provide a platform for the IDP community where ensembles and their corresponding primary data can be stored and used as benchmarking datasets to facilitate the development of new ensemble calculation methods.

PED is a joint initiative of international groups working on the structural characterization of IDPs. The developer and administrator of PED is Mihaly Varadi in the PDBe group of EMBL-EBI, Hinxton, Cambridge.

More about the people involved...

Caution is advised:

Structural ensembles are based on experimental data, and represent the state-of-the-art in the description of IDP structures, but it is important to keep in mind that ensembles are models, and due to the ambiguous nature of the experimental data and the high degree of freedom in the conformational sampling of IDPs, different models can explain the data equally well. Structural characterization of IDPs is still an emerging field, and the accuracy of the methods is bound to improve.

More about intrinsically disordered proteins...
More about ensemble calculation...

User guide

  1. Browsing
  2. Searching
  3. Entries
  4. Visualizing

Browsing

To browse PED, use the "browse" button on the navigation bar at the top of the screen. The database can be browsed by entries, proteins, data types (NMR, SAXS, both or MD only), authors or publications. Each browsing option leads to a table listing all the entries together with the relevant information. These tables can be re-sorted by clicking on a column header (e.g. "PED ID" or "data type").

Searching

Users may search for entries in the PED by typing a query string in the text field (red arrow) and clicking on the 'Submit' button. Optionally, the type of the string can be specified with the drop-down menu (yellow arrow) to the right of the text field (for example, to search specifically by UniProt ID), and the type of experimental data can be selected using the radio buttons under the text field (green arrow).

Using either the basic or the advanced search function directs the user to a window displaying the matching entries. An entry screen can be opened by clicking either on the figure or on the title of the accession at the top (yellow arrows).

By clicking on the check boxes (red arrow) next to the titles or clicking on the 'Select all' button and then pressing the 'Download selected' button (green arrow), the user will be directed to a new window where a list of files can be found allowing the download of the experimental data, the structural ensembles (PDB format), the sequence files (FASTA format) or the complete data archives.

Entries

The accession screen displays all the information that is available for a specific entry. Users may use the quick links to directly access the experimental data, the PDB files, the sequences or the complete, compressed entry archives (yellow arrow). The 'Description of the entry' and 'Sample images' sections are expanded by default; users can view other sections by clicking the 'Show/Hide' button found at the top right of each section (green arrow).

At the top of the 'Description of the entry' section, five boxes indicate whether the ensemble was generated from a random pool or by MD simulation, whether it relies on SAXS or NMR data, and whether it was experimentally validated. A green box means the property is 'true' for the given entry; a red box means it is 'false'.

This section displays the authors and provides a link to the original publication of the entry, along with the release date of the data in the PED. A brief summary of the entry and a description of the modelling procedure are also provided here.

The 'Sample images' section displays three representative conformers from the radius of gyration (Rg) spectrum of the ensembles. From left to right, these are the conformer with the lowest Rg value, the conformer closest to the ensemble average, and the most extended conformer. These figures are links to a JSmol applet that can be used for visualizing the conformers (red arrow).
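
The choice of the three representative conformers can be illustrated with a short Python sketch. It is not the code used by PED: it assumes an unweighted Rg computed over whatever atom coordinates are supplied, and the function names `radius_of_gyration` and `pick_representatives` are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) coordinates."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)

def pick_representatives(rg_values):
    """Return indices of (lowest Rg, closest to mean Rg, highest Rg)."""
    mean_rg = sum(rg_values) / len(rg_values)
    indices = range(len(rg_values))
    compact = min(indices, key=lambda i: rg_values[i])
    average = min(indices, key=lambda i: abs(rg_values[i] - mean_rg))
    extended = max(indices, key=lambda i: rg_values[i])
    return compact, average, extended

# Toy example: made-up Rg values (in Angstrom) for a five-conformer ensemble.
rgs = [18.2, 25.0, 31.7, 22.4, 27.9]
representatives = pick_representatives(rgs)
print(representatives)
```

In practice the Rg values would first be computed with `radius_of_gyration` from the coordinates of each model in the ensemble's PDB file.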

The protein information section stores the protein sequences, the species of origin, the expression system and the cross-references to other databases.

The SAXS, NMR and MD sections can be used to view relevant experimental parameters and settings, as well as to download the data archives. In the case of SAXS data, specific plots can also be examined, such as the normalized Kratky plot, the P(r) distance distribution plot, the Guinier plot, and the scattering curve itself. In the case of NMR data, a link to the BMRB NMR database is provided (if applicable).

Finally, the bottom section provides a list of the software used for the modelling and data processing procedure, along with links to the sources of the software.
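
The Guinier and normalized Kratky plots are simple transformations of the scattering curve. The sketch below assumes the standard Guinier approximation, ln I(q) = ln I(0) - (q Rg)^2 / 3, which holds only at low angles, and the dimensionless Kratky form (q Rg)^2 I(q) / I(0) plotted against q Rg. The function names are illustrative and are not part of PED or of any SAXS package.

```python
import math

def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0).

    Assumes a negative slope, as expected for well-behaved data.
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return math.sqrt(-3.0 * slope), math.exp(intercept)

def normalized_kratky(q, intensity, rg, i0):
    """Return (q*Rg, (q*Rg)^2 * I(q) / I(0)) pairs for a dimensionless Kratky plot."""
    return [(qi * rg, (qi * rg) ** 2 * ii / i0) for qi, ii in zip(q, intensity)]

# Synthetic low-angle data generated from the Guinier law itself
# (Rg = 30 Angstrom, I(0) = 100), so the fit should recover both values.
q = [0.005 * k for k in range(1, 8)]  # in 1/Angstrom, so q*Rg <= 1.05
intensity = [100.0 * math.exp(-((qi * 30.0) ** 2) / 3.0) for qi in q]
rg, i0 = guinier_fit(q, intensity)
kratky = normalized_kratky(q, intensity, rg, i0)
```

For real data the q range used for the Guinier fit has to be chosen with care, since the approximation breaks down at larger q Rg.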

Visualizing

By clicking on any of the figures in the image gallery, users are directed to the JSmol accession window. Here, each conformer of all the ensembles can be displayed in a fully customizable JSmol applet.

Users can also view the Rg (radius of gyration) distribution of each ensemble to get an overall idea of the size of the conformers. The Rg distribution of each ensemble is plotted on the right side of the screen. By clicking on the 'Show List' button, users can access the unique conformers and display them by clicking on the 'display' circle. Pressing the 'up' link next to the 'display' circle (marked by the red arrow) returns the user to the top of the window.


What is an IDP?

Introduction to intrinsically disordered proteins

Intrinsically disordered proteins (IDPs) are defined by the lack of a single, stable tertiary structure under physiological conditions [1-2]. These proteins may have multiple conformations that are separated by low energy barriers, so their structure constantly fluctuates between these different states. IDPs have challenged the classical structure-function paradigm, because their function often arises from transitions between disordered states and a few folded, partner-bound states.

Disordered regions have been found in proteins involved in DNA and RNA binding, transcription, translation, cell-cycle regulation and membrane fusion, and often in proteins linked to pathology, such as those involved in amyloid formation. These regions may function as entropic chains (such as flexible linkers between folded domains) or by transient (such as in post-translational modification) or permanent (such as scaffolds or effectors) partner binding [3]. Upon binding, some IDPs gain a stable folded structure (i.e. folding upon binding), while others retain their flexibility, forming a "fuzzy" complex [4].

Structural disorder can be inferred from the primary sequence, as disordered regions feature specific disorder-promoting amino acids, such as Glycine, Proline and charged residues, while certain order-promoting amino acids, such as hydrophobic residues, are depleted in the sequence. The disorder of IDPs can also be inferred from residues missing in X-ray structures, Kratky-plots from SAXS measurements and a variety of NMR experiments.
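
As a rough illustration of this compositional bias, the following Python sketch counts the fraction of disorder-promoting and hydrophobic residues in a sequence. It is not a disorder predictor. The disorder-promoting set follows the text (Gly, Pro and the charged residues Asp, Glu, Lys, Arg); the hydrophobic set is an assumption, since definitions vary, and the function name and the toy sequence are made up.

```python
DISORDER_PROMOTING = set("GPDEKR")  # Gly, Pro and charged residues (per the text)
HYDROPHOBIC = set("VILMFWY")        # assumed hydrophobic/aromatic set; definitions vary

def composition_fractions(sequence):
    """Return (fraction disorder-promoting, fraction hydrophobic) for a sequence."""
    seq = sequence.upper()
    n = len(seq)
    promoting = sum(1 for aa in seq if aa in DISORDER_PROMOTING) / n
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC) / n
    return promoting, hydrophobic

# Made-up 16-residue sequence, rich in charged residues, Gly and Pro.
toy_sequence = "MSEEKPGGKDEPRSGE"
fractions = composition_fractions(toy_sequence)
print(fractions)
```

A high first value combined with a low second value is consistent with disorder, but it should be treated only as a hint to be checked against experimental evidence.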

Structural description of IDPs is not feasible using high-resolution techniques, such as X-ray crystallography, but Small-angle X-ray scattering (SAXS) coupled with Nuclear Magnetic Resonance (NMR) experiments measuring Residual Dipolar Couplings (RDCs), Paramagnetic Relaxation Enhancement (PRE) and J-couplings yield meaningful information on the shape and size distribution, long-range contacts and backbone flexibility of the disordered protein in solution [5-6]. This information can be used to describe the structure of an IDP as an ensemble of conformations.

References
  1. Dyson HJ, Wright PE (March 2005). "Intrinsically unstructured proteins and their functions". Nat. Rev. Mol. Cell Biol. 6 (3): 197-208
  2. Tompa P (2002) Intrinsically unstructured proteins. Trends Biochem Sci 27, 527-33
  3. Dunker AK, Silman I, Uversky VN, Sussman JL (December 2008). "Function and structure of inherently disordered proteins". Curr. Opin. Struct. Biol. 18 (6): 756-64
  4. Tompa, P. & Fuxreiter, M. (Jan 2008) "Fuzzy complexes: polymorphism and structural disorder in protein-protein interactions". Trends Biochem Sci 33 (1): 2-8
  5. Schneider R, Huang JR, Yao M, Communie G, Ozenne V, Mollica L, Salmon L, Jensen MR, Blackledge M. (2012) Towards a robust description of intrinsic protein disorder using nuclear magnetic resonance spectroscopy. Mol Biosyst. 2012 Jan;8(1):58-68. doi: 10.1039/c1mb05291h. Epub 2011 Aug 26.
  6. Bernado P, Svergun DI. (2012) Structural analysis of intrinsically disordered proteins by small-angle X-ray scattering. Mol Biosyst. 2012 Jan;8(1):151-67

What is an ensemble?

Introduction to structural ensembles

The sheer number of conformers that coexist in solution for a flexible system precludes their description with X-ray crystallography. However, Nuclear Magnetic Resonance (NMR) and Small-angle X-ray Scattering (SAXS) experiments can provide geometric constraints that can be used to select an ensemble of conformations that approximately describes the data on the flexible system in solution [1-3]. There are two broad approaches to generating disordered-state ensembles that fit experimental data [4]. The first, called ensemble-restrained MD, drives molecular dynamics (MD) simulations so that a set of structures fits the data. The second, more recent approach pre-generates conformations and selects subsets that fit the data. Although the latter approach is described in more detail here, PED also hosts ensembles from MD simulations, whether experimentally constrained or unconstrained. However, experimental validation is always advised.

Generating the starting pool

The procedure of ensemble calculation starts with generating a vast starting pool of conformations. These conformations can be completely randomized or may already be constrained by experimental or theoretical data such as phi/psi angles or secondary structure propensities. Some programs commonly used for this step are Flexible-Meccano (FM) [5], Ensemble Optimization Method (EOM) and TRaDES [6-7]. MD simulations may also be used to provide a starting pool. The generated conformers may need to be completed; for example, FM conformers lack side-chains, which have to be modelled afterwards with software such as SCCOMP [8] or SCWRL [9].

Back-calculated data

After generating the starting pool, a variety of data types are back-calculated from the conformers to enable comparison with experimental data. For SAXS data, the software CRYSOL [10] calculates a scattering curve for each conformer; the individual intensities are then averaged over the conformers. For NMR data, FM can estimate Residual Dipolar Coupling (RDC), Paramagnetic Relaxation Enhancement (PRE) and J-coupling values for the generated conformer pools; alternatively, ENSEMBLE [11] can be used. ENSEMBLE utilizes CRYSOL for SAXS data, HYDROPRO [12] for NMR-derived Rh data, ShiftX [13] for chemical shift data, a local-alignment approach [14] for RDCs, and internal scripts for solvent accessibility, PREs, J-couplings, R2 relaxation rates and NOEs.
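
The averaging step and the comparison with experiment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any program named above: conformers are weighted equally, all curves share the same q grid, and agreement is scored with a reduced chi-squared after a least-squares scale factor is applied to the calculated curve. The function names and the numbers are made up.

```python
def ensemble_average(curves):
    """Point-by-point mean of equally weighted per-conformer curves."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]

def chi_squared(calc, exp, sigma):
    """Reduced chi^2 between exp and calc after least-squares scaling of calc."""
    scale = (sum(c * e / s ** 2 for c, e, s in zip(calc, exp, sigma))
             / sum(c * c / s ** 2 for c, s in zip(calc, sigma)))
    residual = sum(((scale * c - e) / s) ** 2 for c, e, s in zip(calc, exp, sigma))
    return residual / (len(exp) - 1)

# Toy data: two back-calculated curves on a three-point q grid.
curves = [[10.0, 8.0, 4.0], [6.0, 4.0, 2.0]]
averaged = ensemble_average(curves)
# Toy "experimental" curve that is exactly twice the average, with unit errors,
# so the fit is perfect once the scale factor is applied.
experiment = [16.0, 12.0, 6.0]
sigma = [1.0, 1.0, 1.0]
fit_quality = chi_squared(averaged, experiment, sigma)
```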

Fitting to the experimental data

The aim of the ensemble calculation is to select a subset of conformers whose back-calculated values fit the actual experimental data coming from the SAXS and NMR measurements. The software Gajoe, part of EOM, selects the subset of conformers whose averaged theoretical SAXS curve best fits the experimental curve. The program ASTEROIDS carries out a similar selection based on NMR parameters. ENSEMBLE can similarly select a subset of conformers on the basis of SAXS and a variety of different NMR data. The size of the final ensembles may range from only a few to hundreds of conformers; beyond a certain threshold, increasing the ensemble size may not improve the fit any further, and the "final" ensembles are not unique.
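
The idea of subset selection can be illustrated with a deliberately simple greedy search in Python. This is not the algorithm used by Gajoe, ASTEROIDS or ENSEMBLE; it is a stand-in chosen for brevity, it assumes equally weighted conformers and curves already on a common scale, and all names and numbers are hypothetical.

```python
def mean_curve(curves):
    """Point-by-point mean of equally weighted curves."""
    return [sum(values) / len(curves) for values in zip(*curves)]

def chi2(calc, exp, sigma):
    """Reduced chi^2 without rescaling (curves assumed to be on the same scale)."""
    return sum(((c - e) / s) ** 2 for c, e, s in zip(calc, exp, sigma)) / (len(exp) - 1)

def greedy_select(pool, exp, sigma, max_size=50):
    """Add one conformer at a time while the fit of the averaged curve improves."""
    selected, best = [], float("inf")
    while len(selected) < max_size:
        candidates = [i for i in range(len(pool)) if i not in selected]
        if not candidates:
            break
        score, index = min(
            (chi2(mean_curve([pool[j] for j in selected + [i]]), exp, sigma), i)
            for i in candidates
        )
        if score >= best:  # no further improvement: stop growing the ensemble
            break
        selected.append(index)
        best = score
    return selected, best

# Toy pool of four back-calculated curves; the "experimental" curve is the
# average of conformers 0 and 2.
pool = [[10.0, 6.0, 2.0], [4.0, 3.0, 1.0], [6.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
experiment = [8.0, 4.0, 2.0]
sigma = [1.0, 1.0, 1.0]
selected, best = greedy_select(pool, experiment, sigma)
print(selected, best)
```

A greedy search can stall in a local optimum, and different subsets may reach a comparable fit, which is one way to see why the selected ensembles are not unique.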

Important!

As the final structural ensembles are models, there are several caveats to accepting them as true representations of structural reality. First, the quality of the final ensemble depends strongly on the quality of the experimental data. Aggregation, degradation or purity issues can severely affect the reliability of the ensemble. With techniques such as SAXS, experiments always yield results, even in the case of severe errors. The data therefore have to be carefully examined and controlled. Furthermore, even in an ideal case it is currently impossible to describe the system with a single structural ensemble. Determining an ensemble is an ill-posed problem, and pools of different conformers may fit the experimental data equally well (i.e. there is no unique solution). One purpose of the PED is to scrutinize these calculated ensembles for various measures of information content and quality. Even with all their shortcomings, structural ensembles represent our current best approach for describing the structure of IDPs.

References
  1. Schneider R, Huang JR, Yao M, Communie G, Ozenne V, Mollica L, Salmon L, Jensen MR, Blackledge M. (2012) Towards a robust description of intrinsic protein disorder using nuclear magnetic resonance spectroscopy. Mol Biosyst. 2012 Jan;8(1):58-68. doi: 10.1039/c1mb05291h. Epub 2011 Aug 26.
  2. Bernado P, Svergun DI. (2012) Structural analysis of intrinsically disordered proteins by small-angle X-ray scattering. Mol Biosyst. 2012 Jan;8(1):151-67
  3. Koch MH, Vachette P, Svergun DI. (2003) Small-angle scattering: a view on the properties, structures and structural changes of biological macromolecules in solution. Q Rev Biophys. 2003 May;36(2):147-227.
  4. Fisher CK, Stultz CM. (2011) Constructing ensembles for intrinsically disordered proteins. Curr Opin Struct Biol. 2011 Jun;21(3):426-31.
  5. Valery Ozenne; Frederic Bauer; Loic Salmon; Jie-rong Huang; Malene Ringkjobing Jensen; Stephane Segard; Pau Bernado; Celine Charavay; Martin Blackledge (2012) Flexible-meccano: a tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables, Bioinformatics 2012 28: 1463-1470
  6. Feldman, Howard J. and Hogue, Christopher W.V. (2000) A Fast Method to Sample Real Protein Conformational Space. Proteins: Structure, Function and Genetics, 39, 112-131.
  7. Feldman, Howard J. and Hogue Christopher W.V. (2002) Probabilistic Sampling of Protein Conformations: New Hope for Brute Force? Proteins: Structure, Function and Genetics, 46, 8-23.
  8. Eyal,E., Najmanovich,R., McConkey,B.J., Edelman,M. and Sobolev,V. (2004) Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. J. Comput. Chem., 25, 712-724.
  9. A. A. Canutescu, A. A. Shelenkov, and R. L. Dunbrack, Jr. A graph theory algorithm for protein side-chain prediction. Protein Science 12, 2001-2014 (2003).
  10. Svergun D.I., Barberato C. and Koch M.H.J. (1995) CRYSOL - a Program to Evaluate X-ray Solution Scattering of Biological Macromolecules from Atomic Coordinates J. Appl. Cryst. , 28, 768-773.
  11. M. Krzeminski and J.D. Forman-Kay. Characterization of disordered proteins with ENSEMBLE, Bioinformatics, 29(3):398-399 (2013).
  12. Garcia De La Torre, J.; Huertas, M. L.; Carrasco, B. Calculation of hydrodynamic properties of globular proteins from their atomic-level structure. Biophys J 2000, 78, 719-30.
  13. Neal, S.; Nip, A. M.; Zhang, H.; Wishart, D. S. Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J Biomol NMR 2003, 26, 215-40.
  14. Marsh, J. A.; Baker, J. M.; Tollinger, M.; Forman-Kay, J. D. Calculation of residual dipolar couplings from disordered state ensembles using local alignment. J Am Chem Soc 2008, 130, 7804-5.

Calculation tools

Software for the various steps of ensemble modelling

Analyzing experimental data

Tools to analyze experimental data (NMR or SAXS).

Software          Usage
NMRPipe package   NMR data processing and analysis
CNS               NMR data processing and analysis
ATSAS package     SAXS data processing and analysis

Sampling protein conformational space

Software to generate random or semi-random conformer libraries.

Software          Usage
Flexible-meccano  Ensemble pool generator
TRaDES            Trajectory Directed Ensemble Sampling

Back-calculating data

Software to calculate theoretical parameters from ensembles of conformations.

Software          Usage
ATSAS package     SAXS data processing and analysis
ENSEMBLE          Multiple data-type fitting
ShiftX            Chemical shifts calculation
HYDROPRO          Hydrodynamic properties calculation

Validating structures

Tools for validating conformers.

Software          Usage
PROCHECK          Structure validation