Protein folding

Source: Wikipedia, the free encyclopedia.
Protein before and after folding
Results of protein folding

Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional.[1]

The folding of many proteins begins even during the translation of the polypeptide chain. The amino acids interact with each other to produce a well-defined three-dimensional structure, known as the protein's

The correct three-dimensional structure is essential to function, although some parts of functional proteins

antibodies for certain protein structures.[5]

Denaturation of proteins is a process of transition from a folded to an unfolded state. It happens in cooking, burns, proteinopathies, and other contexts. Residual structure present, if any, in the supposedly unfolded state may form a folding initiation site and guide the subsequent folding reactions.[6]

The duration of the folding process varies dramatically depending on the protein of interest. When studied outside the cell, the slowest folding proteins require many minutes or hours to fold, primarily due to proline isomerization, and must pass through a number of intermediate states, like checkpoints, before the process is complete.[7] On the other hand, very small single-domain proteins with lengths of up to a hundred amino acids typically fold in a single step.[8] Time scales of milliseconds are the norm, and the fastest known protein folding reactions are complete within a few microseconds.[9] The folding time scale of a protein depends on its size, contact order, and circuit topology.[10]
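As an illustration of one of these determinants, relative contact order can be computed as the average sequence separation between contacting residues, divided by the chain length. The sketch below assumes the contacts have already been identified (for example with a distance cutoff, which is not shown). The function name and the toy contact lists are hypothetical.

```python
def relative_contact_order(contacts, n_residues):
    """Average sequence separation of contacting residues, divided by chain length.

    contacts   -- iterable of (i, j) residue index pairs that are in contact
    n_residues -- total number of residues in the chain
    """
    contacts = list(contacts)
    if not contacts or n_residues <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (len(contacts) * n_residues)


# Hypothetical 10-residue chains: one with only local contacts,
# one with contacts between residues far apart in the sequence.
local_contacts = [(1, 4), (2, 5), (3, 6)]
distant_contacts = [(1, 10), (2, 9), (3, 8)]

print(relative_contact_order(local_contacts, 10))
print(relative_contact_order(distant_contacts, 10))
```

A chain dominated by local contacts gives a lower value than one dominated by long-range contacts.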

Understanding and simulating the protein folding process has been an important challenge for computational biology since the late 1960s.

Process of protein folding

Primary structure

The primary structure of a protein, its linear amino-acid sequence, determines its native conformation.[11] The specific amino acid residues and their position in the polypeptide chain are the determining factors for which portions of the protein fold closely together and form its three-dimensional conformation. The amino acid composition is not as important as the sequence.[12] The essential fact of folding, however, remains that the amino acid sequence of each protein contains the information that specifies both the native structure and the pathway to attain that state. This is not to say that nearly identical amino acid sequences always fold similarly.[13] Conformations differ based on environmental factors as well; similar proteins fold differently based on where they are found.

Secondary structure

The alpha helix spiral formation
Beta pleated sheet displaying hydrogen bonding within the backbone

Formation of a secondary structure is the first step in the folding process that a protein takes to assume its native structure. Characteristic of secondary structure are the structures known as

backbone to form a spiral shape (refer to figure on the right).[12] The β pleated sheet is a structure that forms with the backbone bending over itself to form the hydrogen bonds (as displayed in the figure to the left). The hydrogen bonds are between the amide hydrogen and carbonyl oxygen of the peptide bond. β pleated sheets can be anti-parallel or parallel; the hydrogen bonds are more stable in the anti-parallel β sheet, which forms them at the ideal 180 degree angle, than in parallel sheets, where the hydrogen bonds are slanted.[12]

Tertiary structure

α-Helices and β-sheets are commonly amphipathic, meaning they have a hydrophilic and a hydrophobic portion. This property helps in forming the tertiary structure of a protein, in which folding occurs so that the hydrophilic sides face the aqueous environment surrounding the protein and the hydrophobic sides face the hydrophobic core of the protein.

disulfide bridges formed between two cysteine residues. These non-covalent and covalent contacts take a specific topological arrangement in a native structure of a protein. Tertiary structure of a protein involves a single polypeptide chain; however, additional interactions of folded polypeptide chains give rise to quaternary structure formation.[16]

Quaternary structure

Tertiary structure may give way to the formation of quaternary structure in some proteins, which usually involves the "assembly" or "coassembly" of subunits that have already folded; in other words, multiple polypeptide chains could interact to form a fully functional quaternary protein.[12]

Driving forces of protein folding

All forms of protein structure summarized

Folding is a

co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion of the protein is still being synthesized by the ribosome; however, a protein molecule may fold spontaneously during or after biosynthesis.[18] While these macromolecules may be regarded as "folding themselves", the process also depends on the solvent (water or lipid bilayer),[19] the concentration of salts, the pH, the temperature, the possible presence of cofactors and of molecular chaperones.

The folding of a protein is limited by the restricted bending angles, or conformations, that its backbone can adopt. The allowed angles are described with a two-dimensional plot known as the Ramachandran plot, whose axes are the backbone rotation angles phi and psi.[20]
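Each point on such a plot is a (phi, psi) pair for one residue. As a minimal sketch, the two angles can be computed as dihedral angles from backbone atom coordinates: phi from the atoms C(i-1), N(i), CA(i), C(i), and psi from N(i), CA(i), C(i), N(i+1). The function names are hypothetical and the coordinates are assumed to be given as (x, y, z) tuples. Reading coordinates from a structure file is not shown.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five backbone atom positions."""
    phi = dihedral_degrees(c_prev, n, ca, c)
    psi = dihedral_degrees(n, ca, c, n_next)
    return phi, psi
```

Collecting the (phi, psi) pairs of all residues and plotting psi against phi gives the scatter that the Ramachandran plot displays.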

Hydrophobic effect

Hydrophobic collapse. In the compact fold (to the right), the hydrophobic amino acids (shown as black spheres) collapse toward the center to become shielded from aqueous environment.

Protein folding must be thermodynamically favorable within a cell in order for it to be a spontaneous reaction. Since protein folding is known to be a spontaneous reaction, it must have a negative Gibbs free energy change. Gibbs free energy in protein folding is directly related to enthalpy and entropy.[12] For a negative delta G to arise and for protein folding to become thermodynamically favorable, either enthalpy, entropy, or both terms must be favorable.
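The relation between these quantities is the standard thermodynamic expression for the Gibbs free energy change at constant temperature and pressure. The symbols are conventional notation, not taken from the cited source:

```latex
\Delta G = \Delta H - T\,\Delta S
```

Folding is spontaneous when ΔG is negative. A negative enthalpy change ΔH, a positive entropy change ΔS, or both can therefore drive the process, with the entropy term weighted by the absolute temperature T.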

Entropy is decreased as the water molecules become more orderly near the hydrophobic solute.

Minimizing the number of hydrophobic side-chains exposed to water is an important driving force behind the folding process.

amphiphilic molecule containing a large hydrophobic region.[23] The strength of hydrogen bonds depends on their environment; thus, H-bonds enveloped in a hydrophobic core contribute more than H-bonds exposed to the aqueous environment to the stability of the native state.[24]

In proteins with globular folds, hydrophobic amino acids tend to be interspersed along the primary sequence, rather than randomly distributed or clustered together.[25][26] However, proteins that have recently been born de novo, which tend to be intrinsically disordered,[27][28] show the opposite pattern of hydrophobic amino acid clustering along the primary sequence.[29]

Chaperones

Example of a small eukaryotic heat shock protein

disulfide bonds or interconversion between cis and trans stereoisomers of peptide group.[31] Chaperones are shown to be critical in the process of protein folding in vivo because they provide the protein with the aid needed to assume its proper alignments and conformations efficiently enough to become "biologically relevant".[32] This means that the polypeptide chain could theoretically fold into its native structure without the aid of chaperones, as demonstrated by protein folding experiments conducted in vitro;[32] however, this process proves to be too inefficient or too slow to exist in biological systems; therefore, chaperones are necessary for protein folding in vivo. Along with their role in aiding native structure formation, chaperones are shown to be involved in various roles such as protein transport and degradation, and they even allow denatured proteins exposed to certain external denaturant factors an opportunity to refold into their correct native structures.[33]

A fully denatured protein lacks both tertiary and secondary structure, and exists as a so-called

solutes, extremes of pH, mechanical forces, and the presence of chemical denaturants can contribute to protein denaturation, as well. These individual factors are categorized together as stresses. Chaperones are shown to exist in increasing concentrations during times of cellular stress and help the proper folding of emerging proteins as well as denatured or misfolded ones.[30]

Under some conditions proteins will not fold into their biochemically functional forms. Temperatures above or below the range that cells tend to live in will cause

hyperthermophilic bacteria have been found that grow at temperatures as high as 122 °C,[39] which of course requires that their full complement of vital proteins and protein assemblies be stable at that temperature or above.

The bacterium

bacteriophage T4, and the phage encoded gp31 protein (P17313) appears to be structurally and functionally homologous to E. coli chaperone protein GroES and able to substitute for it in the assembly of bacteriophage T4 virus particles during infection.[40] Like GroES, gp31 forms a stable complex with GroEL chaperonin that is absolutely necessary for the folding and assembly in vivo of the bacteriophage T4 major capsid protein gp23.[40]

Fold switching

Some proteins have multiple native structures, and change their fold based on some external factors. For example, the KaiB protein switches fold throughout the day, acting as a clock for cyanobacteria. It has been estimated that around 0.5–4% of PDB (Protein Data Bank) proteins switch folds.[41]

Protein misfolding and neurodegenerative disease

A protein is considered to be misfolded if it cannot achieve its normal native state. This can be due to mutations in the amino acid sequence or a disruption of the normal folding process by external factors.[42] The misfolded protein typically contains β-sheets that are organized in a supramolecular arrangement known as a cross-β structure. These β-sheet-rich assemblies are very stable, very insoluble, and generally resistant to proteolysis.[43] The structural stability of these fibrillar assemblies is caused by extensive interactions between the protein monomers, formed by backbone hydrogen bonds between their β-strands.[43] The misfolding of proteins can trigger the further misfolding and accumulation of other proteins into aggregates or oligomers. The increased levels of aggregated proteins in the cell lead to formation of amyloid-like structures which can cause degenerative disorders and cell death.[42] Amyloids are highly insoluble fibrillar structures that contain intermolecular hydrogen bonds and are made from converted protein aggregates.[42] Therefore, the proteasome pathway may not be efficient enough to degrade the misfolded proteins prior to aggregation. Misfolded proteins can interact with one another and form structured aggregates and gain toxicity through intermolecular interactions.[42]

Aggregated proteins are associated with

pharmaceutical chaperones to fold mutated proteins to render them functional.

Experimental techniques for studying protein folding

While inferences about protein folding can be made through mutation studies, typically, experimental techniques for studying protein folding rely on the gradual unfolding or folding of proteins and observing conformational changes using standard non-crystallographic techniques.

X-ray crystallography

Steps of X-ray crystallography

multiple isomorphous replacement use the presence of a heavy metal ion to diffract the X-rays in a more predictable manner, reducing the number of variables involved and resolving the phase problem.[47]

Fluorescence spectroscopy

Fluorescence spectroscopy is a highly sensitive method for studying the folding state of proteins. Three amino acids, phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp), have intrinsic fluorescence properties, but only Tyr and Trp are used experimentally because their quantum yields are high enough to give good fluorescence signals. Both Trp and Tyr are excited by a wavelength of 280 nm, whereas only Trp is excited by a wavelength of 295 nm. Because of their aromatic character, Trp and Tyr residues are often found fully or partially buried in the hydrophobic core of proteins, at the interface between two protein domains, or at the interface between subunits of oligomeric proteins. In this apolar environment, they have high quantum yields and therefore high fluorescence intensities. Upon disruption of the protein's tertiary or quaternary structure, these side chains become more exposed to the hydrophilic environment of the solvent, and their quantum yields decrease, leading to low fluorescence intensities. For Trp residues, the wavelength of their maximal fluorescence emission also depends on their environment.

Fluorescence spectroscopy can be used to characterize the

stopped flow, to measure protein folding kinetics,[54] generate a chevron plot and derive a Phi value analysis.
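A chevron plot shows the logarithm of the observed relaxation rate constant against denaturant concentration. The sketch below assumes a simple two-state model in which the logarithms of the folding and unfolding rate constants vary linearly with denaturant concentration, so that the observed rate is their sum. The function names and all parameter values are hypothetical and are not taken from any measurement.

```python
import math


def chevron_ln_kobs(denaturant, kf_water, ku_water, m_f, m_u):
    """ln(k_obs) at a given denaturant concentration for a two-state folder.

    kf_water, ku_water -- folding and unfolding rate constants in water (1/s)
    m_f, m_u           -- slopes of ln(k_f) and ln(k_u) versus denaturant (1/M)
    """
    k_f = kf_water * math.exp(-m_f * denaturant)  # folding slows with denaturant
    k_u = ku_water * math.exp(m_u * denaturant)   # unfolding speeds up
    return math.log(k_f + k_u)


def midpoint(kf_water, ku_water, m_f, m_u):
    """Denaturant concentration at which k_f equals k_u."""
    return math.log(kf_water / ku_water) / (m_f + m_u)


# Hypothetical parameters, for illustration only.
params = dict(kf_water=1000.0, ku_water=0.01, m_f=1.5, m_u=0.5)
for conc in range(0, 9):
    print(conc, round(chevron_ln_kobs(float(conc), **params), 2))
print("midpoint (M):", round(midpoint(**params), 2))
```

Plotting the printed values gives the characteristic V shape: a folding limb that falls with denaturant concentration and an unfolding limb that rises.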

Circular dichroism

stopped flow to measure protein folding kinetics and to generate chevron plots.

Vibrational circular dichroism of proteins

The more recent developments of

FT-IR data for protein solutions in heavy water (D2O), or quantum computations.

Protein nuclear magnetic resonance spectroscopy

Protein nuclear magnetic resonance (NMR) is able to collect protein structural data by applying a magnetic field to samples of concentrated protein. In NMR, depending on the chemical environment, certain nuclei will absorb specific radio-frequencies.[55][56] Because protein structural changes operate on a time scale from ns to ms, NMR is especially equipped to study intermediate structures in timescales of ps to s.[57] Some of the main techniques for studying protein structure and non-folding protein structural changes include COSY, TOCSY, HSQC, time relaxation (T1 & T2), and NOE.[55] NOE is especially useful because magnetization transfers can be observed between spatially proximal hydrogens.[55] Different NMR experiments have varying degrees of timescale sensitivity that are appropriate for different protein structural changes. NOE can pick up bond vibrations or side chain rotations; however, NOE is too sensitive to pick up protein folding because folding occurs on a larger timescale.[57]

Timescale of protein structural changes matched with NMR experiments. For protein folding, CPMG Relaxation Dispersion (CPMG RD) and chemical exchange saturation transfer (CEST) collect data in the appropriate timescale.

Because protein folding takes place at rates of about 50 to 3000 s⁻¹, CPMG relaxation dispersion and chemical exchange saturation transfer have become some of the primary techniques for NMR analysis of folding.[56] In addition, both techniques are used to uncover excited intermediate states in the protein folding landscape.[58] To do this, CPMG relaxation dispersion takes advantage of the spin echo phenomenon. This technique exposes the target nuclei to a 90° pulse followed by one or more 180° pulses.[59] As the nuclei refocus, a broad distribution indicates that the target nucleus is involved in an intermediate excited state. Relaxation dispersion plots provide information on the thermodynamics and kinetics of exchange between the excited and ground states.[59][58] Saturation transfer measures changes in signal from the ground state as excited states become perturbed. It uses weak radio frequency irradiation to saturate the excited state of a particular nucleus, which transfers its saturation to the ground state.[56] This signal is amplified by decreasing the magnetization (and the signal) of the ground state.[56][58]

The main limitations of NMR are that its resolution decreases for proteins larger than 25 kDa and that it is not as detailed as X-ray crystallography.[56] Additionally, protein NMR analysis is quite difficult and can propose multiple solutions from the same NMR spectrum.[55]

In a study focused on the folding of SOD1, a protein involved in amyotrophic lateral sclerosis, excited intermediates were studied with relaxation dispersion and saturation transfer.[60] SOD1 had previously been tied to many disease-causing mutants that were assumed to be involved in protein aggregation; however, the mechanism was still unknown. Relaxation dispersion and saturation transfer experiments uncovered many excited intermediate states involved in misfolding in the SOD1 mutants.[60]

Dual-polarization interferometry

conformation by determining the overall size of a monolayer of the protein and its density in real time at sub-Angstrom resolution,[61] although real-time measurement of the kinetics of protein folding is limited to processes that occur slower than ~10 Hz. Similar to circular dichroism, the stimulus for folding can be a denaturant or temperature.

Studies of folding with high time resolution

The study of protein folding has been greatly advanced in recent years by the development of fast, time-resolved techniques. Experimenters rapidly trigger the folding of a sample of unfolded protein and observe the resulting

and Lars Konermann.

Proteolysis

Single-molecule force spectroscopy

Single-molecule techniques such as optical tweezers and AFM have been used to understand protein folding mechanisms of isolated proteins as well as proteins with chaperones.[65] Optical tweezers have been used to stretch single protein molecules from their C- and N-termini and unfold them to allow study of the subsequent refolding.[66] The technique allows one to measure folding rates at the single-molecule level; for example, optical tweezers have been recently applied to study folding and unfolding of proteins involved in blood coagulation. von Willebrand factor (vWF) is a protein with an essential role in the blood clot formation process. It was discovered, using single-molecule optical tweezers measurements, that calcium-bound vWF acts as a shear force sensor in the blood. Shear force leads to unfolding of the A2 domain of vWF, whose refolding rate is dramatically enhanced in the presence of calcium.[67] Recently, it was also shown that the simple src SH3 domain accesses multiple unfolding pathways under force.[68]

Biotin painting

Biotin painting enables condition-specific cellular snapshots of (un)folded proteins. Biotin 'painting' shows a bias towards predicted intrinsically disordered proteins.[69]

Computational studies of protein folding

Computational studies of protein folding include three main aspects related to the prediction of protein stability, kinetics, and structure. A 2013 review summarizes the available computational methods for protein folding.[70]

Levinthal's paradox

In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. An estimate of 3^300 or 10^143 was made in one of his papers.[71] Levinthal's paradox is a thought experiment based on the observation that if a protein were folded by sequential sampling of all possible conformations, it would take an astronomical amount of time to do so, even if the conformations were sampled at a rapid rate (on the nanosecond or picosecond scale).[72] Based upon the observation that proteins fold much faster than this, Levinthal then proposed that a random conformational search does not occur, and the protein must, therefore, fold through a series of meta-stable intermediate states.
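The arithmetic behind the paradox can be checked directly. The sketch below takes the estimate of 3^300 conformations from the text and assumes, for illustration, one conformation sampled per picosecond; the comparison with an age of the universe of roughly 1.38 × 10^10 years is added here and is not part of the cited argument.

```python
import math

n_conformations = 3 ** 300            # estimate quoted in the text
samples_per_second = 1e12             # assumed rate: one conformation per picosecond
seconds_per_year = 365.25 * 24 * 3600
age_of_universe_years = 1.38e10       # approximate value, for comparison only

# Work in log10 space so the huge numbers stay readable.
log10_conformations = math.log10(n_conformations)
log10_years = (log10_conformations
               - math.log10(samples_per_second)
               - math.log10(seconds_per_year))

print(f"conformations ~ 10^{log10_conformations:.1f}")
print(f"exhaustive search ~ 10^{log10_years:.1f} years")
print(f"age of universe ~ 10^{math.log10(age_of_universe_years):.1f} years")
```

Even at this sampling rate, an exhaustive search would exceed the age of the universe by more than a hundred orders of magnitude, which is why a purely random search is ruled out.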

Energy landscape of protein folding

The energy funnel by which an unfolded polypeptide chain assumes its native structure

The

Peter Wolynes, proteins follow the principle of minimal frustration, meaning that naturally evolved proteins have optimized their folding energy landscapes,[73] and that nature has chosen amino acid sequences so that the folded state of the protein is sufficiently stable. In addition, the acquisition of the folded state had to become a sufficiently fast process. Even though nature has reduced the level of frustration in proteins, some degree of it remains, as can be observed in the presence of local minima in the energy landscape of proteins.

A consequence of these evolutionarily selected sequences is that proteins are generally thought to have globally "funneled energy landscapes" (a term coined by José Onuchic)[74] that are largely directed toward the native state. This "folding funnel" landscape allows the protein to fold to the native state through any of a large number of pathways and intermediates, rather than being restricted to a single mechanism. The theory is supported by both computational simulations of model proteins and experimental studies,[73] and it has been used to improve methods for protein structure prediction and design.[73] The description of protein folding by the leveling free-energy landscape is also consistent with the 2nd law of thermodynamics.[75] Physically, thinking of landscapes in terms of visualizable potential or total energy surfaces simply with maxima, saddle points, minima, and funnels, rather like geographic landscapes, is perhaps a little misleading. The relevant description is really a high-dimensional phase space in which manifolds might take a variety of more complicated topological forms.[76]

The unfolded polypeptide chain begins at the top of the funnel where it may assume the largest number of unfolded variations and is in its highest energy state. Energy landscapes such as these indicate that there are a large number of initial possibilities, but only a single native state is possible; however, they do not reveal the numerous folding pathways that are possible. Different molecules of the same protein may follow marginally different folding pathways, seeking different lower energy intermediates, as long as the same native structure is reached.[77] Different pathways may have different frequencies of utilization depending on the thermodynamic favorability of each pathway. This means that if one pathway is found to be more thermodynamically favorable than another, it is likely to be used more frequently in the pursuit of the native structure.[77] As the protein begins to fold and assume its various conformations, it always seeks a more thermodynamically favorable structure than before and thus continues through the energy funnel. Formation of secondary structures is a strong indication of increased stability within the protein, and only one combination of secondary structures assumed by the polypeptide backbone will have the lowest energy and therefore be present in the native state of the protein.[77] Among the first structures to form once the polypeptide begins to fold are alpha helices and beta turns, where alpha helices can form in as little as 100 nanoseconds and beta turns in 1 microsecond.[30]

There exists a saddle point in the energy funnel landscape where the transition state for a particular protein is found.[30] The transition state in the energy funnel diagram is the conformation that must be assumed by every molecule of that protein in order to finally assume the native structure. No protein may assume the native structure without first passing through the transition state.[30] The transition state can be referred to as a variant or premature form of the native state rather than just another intermediary step.[78] The folding of the transition state is shown to be rate-determining, and even though it exists in a higher energy state than the native fold, it greatly resembles the native structure. Within the transition state, there exists a nucleus around which the protein is able to fold, formed by a process referred to as "nucleation condensation" where the structure begins to collapse onto the nucleus.[78]

Modeling of protein folding

Markov state models, like the one diagrammed here, to model the possible shapes and folding pathways a protein can take as it condenses from its initial randomly coiled state (left) into its native 3D structure (right).
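A Markov state model of this kind can be sketched, in its simplest form, as a set of discrete conformational states with fixed probabilities of moving between them in one time step. The three states and all transition probabilities below are hypothetical, chosen only to show how a population that starts fully unfolded is propagated step by step. Models built from real simulations have many more states and estimate these probabilities from trajectory data.

```python
STATES = ("unfolded", "intermediate", "native")

# Hypothetical row-stochastic transition matrix:
# TRANSITIONS[i][j] is the probability of moving from state i to state j per step.
TRANSITIONS = [
    [0.900, 0.090, 0.010],
    [0.050, 0.800, 0.150],
    [0.001, 0.009, 0.990],
]


def step(populations, transitions):
    """Propagate state populations by one time step."""
    n = len(populations)
    return [sum(populations[i] * transitions[i][j] for i in range(n))
            for j in range(n)]


def propagate(populations, transitions, n_steps):
    """Propagate state populations by n_steps time steps."""
    for _ in range(n_steps):
        populations = step(populations, transitions)
    return populations


start = [1.0, 0.0, 0.0]  # every molecule begins unfolded
for n_steps in (0, 10, 100, 1000):
    populations = propagate(start, TRANSITIONS, n_steps)
    summary = ", ".join(f"{name}={p:.3f}" for name, p in zip(STATES, populations))
    print(f"after {n_steps} steps: {summary}")
```

After enough steps the populations stop changing, which corresponds to the equilibrium distribution over the model's states.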

De novo or ab initio techniques for computational protein structure prediction can be used for simulating various aspects of protein folding. Molecular dynamics (MD) was used in simulations of protein folding and dynamics in silico.[79] The first equilibrium folding simulations were done using an implicit solvent model and umbrella sampling.[80] Because of computational cost, ab initio MD folding simulations with explicit water are limited to peptides and very small proteins.[81][82] MD simulations of larger proteins remain restricted to dynamics of the experimental structure or its high-temperature unfolding. Long-time folding processes (beyond about 1 millisecond), like folding of small-size proteins (about 50 residues) or larger, can be accessed using coarse-grained models.[83][84][85]

Several large-scale computational projects, such as Rosetta@home,[86] Folding@home[87] and Foldit,[88] target protein folding.

Long continuous-trajectory simulations have been performed on

ASICs and interconnects by D. E. Shaw Research. The longest published result of a simulation performed using Anton is a 2.936 millisecond simulation of NTL9 at 355 K.[89] The simulations are currently able to unfold and refold small (<150 amino acid residues) proteins and predict how mutations affect folding kinetics and stability.[90]

In 2020 a team of researchers that used

DeepMind placed first in CASP.[91] The team achieved a level of accuracy much higher than any other group.[92] It scored above 90 for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures the degree to which a computationally predicted structure is similar to the experimentally determined structure, with 100 being a complete match, within the distance cutoff used for calculating GDT.[93]
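As a rough illustration of how such a score is formed, the commonly reported GDT_TS value averages the percentage of residues whose predicted position lies within 1, 2, 4 and 8 Å of the experimental position. The sketch below is a simplification: it assumes the two structures have already been superimposed and uses that single superposition for every cutoff, whereas the actual test searches for the superposition that maximizes each percentage. The function name and coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0 to 100) for two pre-superimposed coordinate lists.

    predicted, reference -- equal-length lists of (x, y, z) positions in angstroms,
                            one per residue (for example, C-alpha atoms)
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [sum(d <= cutoff for d in distances) / len(distances)
                 for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: residues displaced by 0, 1.5, 3 and 10 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(predicted, reference))
print(gdt_ts(reference, reference))
```

A prediction identical to the reference scores 100, and residues displaced by more than 8 Å contribute nothing to the score.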

AlphaFold's protein structure prediction results at CASP were described as "transformational" and "astounding".

protein folding problem to be considered solved.[96] Nevertheless, it is considered a significant achievement in computational biology[93] and great progress towards a decades-old grand challenge of biology.[94]

References

  1. .
  2. .
  3. .
  4. ^ .
  5. .
  6. .
  7. .
  8. .
  9. .
  10. .
  11. .
  12. ^ .
  13. .
  14. .
  15. ^ .
  16. ^ "Protein Structure". Scitable. Nature Education. Retrieved 2016-11-26.
  17. . Retrieved 2016-11-26.
  18. .
  19. .
  20. ^ Al-Karadaghi S. "Torsion Angles and the Ramachandran Plot in Protein Structures". www.proteinstructures.com. Retrieved 2016-11-26.
  21. S2CID 20021399.
  22. .
  23. .
  24. .
  25. .
  26. .
  27. .
  28. .
  29. .
  30. ^ .
  31. ^ .
  32. ^ .
  33. .
  34. .
  35. .
  36. .
  37. .
  38. .
  39. .
  40. ^ .
  41. .
  42. ^ .
  43. ^ .
  44. .
  45. .
  46. .
  47. ^ a b Cowtan K (2001). "Phase Problem in X-ray Crystallography, and Its Solution" (PDF). Encyclopedia of Life Sciences. Macmillan Publishers Ltd, Nature Publishing Group. Retrieved November 3, 2016.
  48. .
  49. .
  50. ^ .
  51. .
  52. .
  53. .
  54. .
  55. ^ .
  56. ^ .
  57. ^ .
  58. ^ .
  59. ^ .
  60. ^ .
  61. .
  62. .
  63. .
  64. .
  65. .
  66. .
  67. .
  68. .
  69. .
  70. .
  71. ^ "Structural Biochemistry/Proteins/Protein Folding - Wikibooks, open books for an open world". en.wikibooks.org. Retrieved 2016-11-05.
  72. (PDF) on 2009-09-02.
  73. ^ .
  74. .
  75. .
  76. .
  77. ^ .
  78. ^ .
  79. .
  80. .
  81. ^ Jones D. "Fragment-based Protein Folding Simulations". University College London.
  82. ^ "Protein folding" (by Molecular Dynamics).
  83. PMID 27333362.
  84. .
  85. .
  86. ^ "Rosetta@home". boinc.bakerlab.org. Retrieved 14 March 2023.
  87. ^ "The Folding@home Consortium (FAHC) – Folding@home". Retrieved 14 March 2023.
  88. ^ "Foldit". fold.it. Retrieved 14 March 2023.
  89. S2CID 27988268.
  90. .
  91. ^ Shead, Sam (2020-11-30). "DeepMind solves 50-year-old 'grand challenge' with protein folding A.I." CNBC. Retrieved 2020-11-30.
  92. S2CID 247206999. Retrieved 25 March 2022.
  93. ^ Science, 30 November 2020.
  94. ^ .
  95. ^ @MoAlQuraishi (November 30, 2020). "CASP14 #s just came out and they're astounding" (Tweet) – via Twitter.
  96. ^ Ball, Philip (9 December 2020). "Behind the screens of AlphaFold". Chemistry World.
