Protein folding
Protein folding is the
The folding of many proteins begins even during the translation of the polypeptide chain. The amino acids interact with each other to produce a well-defined three-dimensional structure, known as the protein's
The correct three-dimensional structure is essential to function, although some parts of functional proteins
Denaturation of proteins is a process of transition from a folded to an unfolded state. It happens in cooking, burns, proteinopathies, and other contexts. Residual structure present, if any, in the supposedly unfolded state may form a folding initiation site and guide the subsequent folding reactions.[6]
The duration of the folding process varies dramatically depending on the protein of interest. When studied outside the cell, the slowest folding proteins require many minutes or hours to fold, primarily due to proline isomerization, and must pass through a number of intermediate states, like checkpoints, before the process is complete.[7] On the other hand, very small single-domain proteins with lengths of up to a hundred amino acids typically fold in a single step.[8] Time scales of milliseconds are the norm, and the fastest known protein folding reactions are complete within a few microseconds.[9] The folding time scale of a protein depends on its size, contact order, and circuit topology.[10]
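As an illustration of one of the quantities named above, the following minimal sketch computes a relative contact order, assuming the commonly used definition: the average sequence separation between contacting residues, divided by the chain length. The function name, the contact lists, and the chain length of 20 are hypothetical values chosen for illustration; how contacts are identified in a real structure is not specified here.

```python
def relative_contact_order(contacts, n_residues):
    """Average sequence separation of contacting residue pairs,
    normalized by chain length (assumed definition, see lead-in).

    contacts   -- iterable of (i, j) residue-index pairs that are in contact
    n_residues -- total number of residues in the chain
    """
    contacts = list(contacts)
    if not contacts or n_residues <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * n_residues)


# Hypothetical 20-residue chains: one dominated by local contacts
# (helix-like), one dominated by long-range contacts.
local_contacts = [(1, 4), (2, 5), (3, 6)]
long_range_contacts = [(1, 20), (2, 19), (3, 18)]

rco_local = relative_contact_order(local_contacts, 20)
rco_long_range = relative_contact_order(long_range_contacts, 20)
```

In this toy comparison the chain with long-range contacts has the larger value, which is the sense in which contact order summarizes how far apart in sequence the interacting residues are.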
Understanding and simulating the protein folding process has been an important challenge for computational biology since the late 1960s.
Process of protein folding
Primary structure
The primary structure of a protein, its linear amino-acid sequence, determines its native conformation.[11] The specific amino acid residues and their position in the polypeptide chain are the determining factors for which portions of the protein fold closely together and form its three-dimensional conformation. The amino acid composition is not as important as the sequence.[12] The essential fact of folding, however, remains that the amino acid sequence of each protein contains the information that specifies both the native structure and the pathway to attain that state. This is not to say that nearly identical amino acid sequences always fold similarly.[13] Conformations differ based on environmental factors as well; similar proteins fold differently based on where they are found.
Secondary structure
Formation of a secondary structure is the first step in the folding process that a protein takes to assume its native structure. Characteristic of secondary structure are the structures known as
Tertiary structure
The α-helices and β-sheets are commonly amphipathic, meaning they have a hydrophilic and a hydrophobic portion. This property helps in forming the tertiary structure of a protein, in which folding occurs so that the hydrophilic sides face the aqueous environment surrounding the protein and the hydrophobic sides face the hydrophobic core of the protein.
Quaternary structure
Tertiary structure may give way to the formation of quaternary structure in some proteins, which usually involves the "assembly" or "coassembly" of subunits that have already folded; in other words, multiple polypeptide chains could interact to form a fully functional quaternary protein.[12]
Driving forces of protein folding
Folding is a
Proteins will have limitations on their folding abilities by the restricted bending angles or conformations that are possible. These allowable angles of protein folding are described with a two-dimensional plot known as the Ramachandran plot, depicted with psi and phi angles of allowable rotation.[20]
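The phi and psi values plotted on a Ramachandran plot are dihedral (torsion) angles, each defined by four consecutive backbone atoms. The sketch below shows one standard way to compute a dihedral angle from four 3D points using only the standard library. The function name and the test coordinates are illustrative assumptions, not taken from any particular structure or software package.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis.

    For phi, the four atoms are C(i-1), N(i), CA(i), C(i);
    for psi, they are N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative geometries (not real protein coordinates).
cis = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
perpendicular = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                                 (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
trans = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
```

Applying such a function to every residue of a structure gives the (phi, psi) pairs that populate the plot; which regions count as "allowed" depends on the reference data used and is not encoded in this sketch.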
Hydrophobic effect
Protein folding must be thermodynamically favorable within a cell in order for it to be a spontaneous reaction. Because protein folding is known to be spontaneous, it must have a negative Gibbs free energy change. Gibbs free energy in protein folding is directly related to enthalpy and entropy.[12] For a negative delta G to arise and for protein folding to become thermodynamically favorable, either enthalpy, entropy, or both terms must be favorable.
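The relationship between these terms is the standard expression ΔG = ΔH − TΔS. The sketch below evaluates it for a hypothetical protein and, as an added assumption not made in the text, converts ΔG into a folded fraction using a simple two-state (folded/unfolded) equilibrium. The enthalpy and entropy values are invented for illustration only.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def gibbs_free_energy(delta_h, delta_s, temperature):
    """Delta G = Delta H - T * Delta S (kJ/mol, kJ/(mol*K), K)."""
    return delta_h - temperature * delta_s


def fraction_folded(delta_g_folding, temperature):
    """Folded fraction under an assumed two-state model:
    K = exp(-Delta G / (R * T)), fraction = K / (1 + K)."""
    k_eq = math.exp(-delta_g_folding / (R * temperature))
    return k_eq / (1.0 + k_eq)


# Hypothetical folding parameters: favorable enthalpy, unfavorable entropy.
DELTA_H = -200.0  # kJ/mol (illustrative)
DELTA_S = -0.6    # kJ/(mol*K) (illustrative)

dg_298 = gibbs_free_energy(DELTA_H, DELTA_S, 298.0)  # negative: folding favorable
dg_350 = gibbs_free_energy(DELTA_H, DELTA_S, 350.0)  # positive: unfolding favored

folded_298 = fraction_folded(dg_298, 298.0)
folded_350 = fraction_folded(dg_350, 350.0)
```

With these invented numbers the sign of ΔG flips between 298 K and 350 K, which illustrates how a favorable enthalpy term can be outweighed by the entropy term as temperature rises.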
Minimizing the number of hydrophobic side-chains exposed to water is an important driving force behind the folding process.
In proteins with globular folds, hydrophobic amino acids tend to be interspersed along the primary sequence, rather than randomly distributed or clustered together.[25][26] However, proteins that have recently been born de novo, which tend to be intrinsically disordered,[27][28] show the opposite pattern of hydrophobic amino acid clustering along the primary sequence.[29]
Chaperones
A fully denatured protein lacks both tertiary and secondary structure, and exists as a so-called
Under some conditions proteins will not fold into their biochemically functional forms. Temperatures above or below the range that cells tend to live in will cause
The bacterium
Fold switching
Some proteins have multiple native structures, and change their fold based on some external factors. For example, the KaiB protein switches fold throughout the day, acting as a clock for cyanobacteria. It has been estimated that around 0.5–4% of PDB (Protein Data Bank) proteins switch folds.[41]
Protein misfolding and neurodegenerative disease
A protein is considered to be
Aggregated proteins are associated with
Experimental techniques for studying protein folding
While inferences about protein folding can be made through mutation studies, typically, experimental techniques for studying protein folding rely on the gradual unfolding or folding of proteins and observing conformational changes using standard non-crystallographic techniques.
X-ray crystallography
Fluorescence spectroscopy
Fluorescence spectroscopy is a highly sensitive method for studying the folding state of proteins. Three amino acids, phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp), have intrinsic fluorescence properties, but only Tyr and Trp are used experimentally because their quantum yields are high enough to give good fluorescence signals. Both Trp and Tyr are excited by a wavelength of 280 nm, whereas only Trp is excited by a wavelength of 295 nm. Because of their aromatic character, Trp and Tyr residues are often found fully or partially buried in the hydrophobic core of proteins, at the interface between two protein domains, or at the interface between subunits of oligomeric proteins. In this apolar environment, they have high quantum yields and therefore high fluorescence intensities. Upon disruption of the protein's tertiary or quaternary structure, these side chains become more exposed to the hydrophilic environment of the solvent, and their quantum yields decrease, leading to low fluorescence intensities. For Trp residues, the wavelength of their maximal fluorescence emission also depends on their environment.
Fluorescence spectroscopy can be used to characterize the
Circular dichroism
Vibrational circular dichroism of proteins
The more recent developments of
Protein nuclear magnetic resonance spectroscopy
Protein nuclear magnetic resonance (NMR) collects protein structural data by applying a magnetic field to samples of concentrated protein. In NMR, depending on the chemical environment, certain nuclei will absorb specific radio frequencies.[55][56] Because protein structural changes operate on a time scale from ns to ms, NMR is especially well equipped to study intermediate structures on timescales of ps to s.[57] Some of the main techniques for studying protein structure and non-folding protein structural changes include COSY, TOCSY, HSQC, time relaxation (T1 and T2), and NOE.[55] NOE is especially useful because magnetization transfers can be observed between spatially proximal hydrogens.[55] Different NMR experiments have varying degrees of timescale sensitivity that are appropriate for different protein structural changes. NOE can pick up bond vibrations or side-chain rotations; however, NOE is too sensitive to pick up protein folding, because folding occurs on a longer timescale.[57]
Because protein folding takes place at rates of about 50 to 3000 s⁻¹, CPMG relaxation dispersion and chemical exchange saturation transfer have become some of the primary techniques for NMR analysis of folding.[56] In addition, both techniques are used to uncover excited intermediate states in the protein folding landscape.[58] To do this, CPMG relaxation dispersion takes advantage of the spin echo phenomenon. This technique exposes the target nuclei to a 90° pulse followed by one or more 180° pulses.[59] As the nuclei refocus, a broad distribution indicates that the target nuclei are involved in an intermediate excited state. Relaxation dispersion plots provide information on the thermodynamics and kinetics of exchange between the excited and ground states.[59][58] Saturation transfer measures changes in signal from the ground state as excited states become perturbed. It uses weak radio-frequency irradiation to saturate the excited state of a particular nucleus, which transfers its saturation to the ground state.[56] This signal is amplified by decreasing the magnetization (and the signal) of the ground state.[56][58]
The main limitations of NMR are that its resolution decreases for proteins larger than 25 kDa and that it is not as detailed as X-ray crystallography.[56] Additionally, protein NMR analysis is quite difficult and can yield multiple solutions from the same NMR spectrum.[55]
In a study focused on the folding of an
Dual-polarization interferometry
Studies of folding with high time resolution
The study of protein folding has been greatly advanced in recent years by the development of fast, time-resolved techniques. Experimenters rapidly trigger the folding of a sample of unfolded protein and observe the resulting
Proteolysis
Single-molecule force spectroscopy
Single-molecule techniques such as optical tweezers and AFM have been used to understand protein folding mechanisms of isolated proteins as well as proteins with chaperones.[65] Optical tweezers have been used to stretch single protein molecules from their C- and N-termini and unfold them to allow study of the subsequent refolding.[66] The technique allows one to measure folding rates at the single-molecule level; for example, optical tweezers have recently been applied to study the folding and unfolding of proteins involved in blood coagulation. von Willebrand factor (vWF) is a protein with an essential role in the blood clot formation process. Single-molecule optical tweezers measurements showed that calcium-bound vWF acts as a shear force sensor in the blood. Shear force leads to unfolding of the A2 domain of vWF, whose refolding rate is dramatically enhanced in the presence of calcium.[67] Recently, it was also shown that the simple src SH3 domain accesses multiple unfolding pathways under force.[68]
Biotin painting
Biotin painting enables condition-specific cellular snapshots of (un)folded proteins. Biotin painting shows a bias towards predicted intrinsically disordered proteins.[69]
Computational studies of protein folding
Computational studies of protein folding include three main aspects related to the prediction of protein stability, kinetics, and structure. A 2013 review summarizes the available computational methods for protein folding.[70]
Levinthal's paradox
In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. An estimate of 3³⁰⁰ or 10¹⁴³ was made in one of his papers.[71] Levinthal's paradox is a thought experiment based on the observation that if a protein were folded by sequential sampling of all possible conformations, it would take an astronomical amount of time to do so, even if the conformations were sampled at a rapid rate (on the nanosecond or picosecond scale).[72] Based upon the observation that proteins fold much faster than this, Levinthal then proposed that a random conformational search does not occur, and the protein must, therefore, fold through a series of meta-stable intermediate states.
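The arithmetic behind the thought experiment can be checked directly. The sketch below takes the estimate of 3 to the power 300 conformations from the text and assumes, purely for illustration, a sampling rate of one conformation per picosecond; the age-of-the-universe figure is an approximate value added for comparison.

```python
import math

# Estimate from the text: 3**300 possible conformations.
log10_conformations = 300 * math.log10(3)

# Assumed sampling rate: one conformation per picosecond (illustrative).
SAMPLES_PER_SECOND = 1e12
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate, for comparison only

# Work in log10 space to keep the numbers readable.
log10_seconds = log10_conformations - math.log10(SAMPLES_PER_SECOND)
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
log10_universe_ages = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"conformations       ~ 10^{log10_conformations:.1f}")
print(f"exhaustive search   ~ 10^{log10_years:.1f} years")
print(f"in universe ages    ~ 10^{log10_universe_ages:.1f}")
```

Even at this assumed picosecond sampling rate, an exhaustive search would exceed the age of the universe by more than a hundred orders of magnitude, which is the gap the paradox highlights.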
Energy landscape of protein folding
The
A consequence of these evolutionarily selected sequences is that proteins are generally thought to have globally "funneled energy landscapes" (a term coined by José Onuchic)[74] that are largely directed toward the native state. This "folding funnel" landscape allows the protein to fold to the native state through any of a large number of pathways and intermediates, rather than being restricted to a single mechanism. The theory is supported by both computational simulations of model proteins and experimental studies,[73] and it has been used to improve methods for protein structure prediction and design.[73] The description of protein folding by the leveling free-energy landscape is also consistent with the 2nd law of thermodynamics.[75] Physically, thinking of landscapes in terms of visualizable potential or total energy surfaces simply with maxima, saddle points, minima, and funnels, rather like geographic landscapes, is perhaps a little misleading. The relevant description is really a high-dimensional phase space in which manifolds might take a variety of more complicated topological forms.[76]
The unfolded polypeptide chain begins at the top of the funnel where it may assume the largest number of unfolded variations and is in its highest energy state. Energy landscapes such as these indicate that there are a large number of initial possibilities, but only a single native state is possible; however, it does not reveal the numerous folding pathways that are possible. A different molecule of the same exact protein may be able to follow marginally different folding pathways, seeking different lower energy intermediates, as long as the same native structure is reached.[77] Different pathways may have different frequencies of utilization depending on the thermodynamic favorability of each pathway. This means that if one pathway is found to be more thermodynamically favorable than another, it is likely to be used more frequently in the pursuit of the native structure.[77] As the protein begins to fold and assume its various conformations, it always seeks a more thermodynamically favorable structure than before and thus continues through the energy funnel. Formation of secondary structures is a strong indication of increased stability within the protein, and only one combination of secondary structures assumed by the polypeptide backbone will have the lowest energy and therefore be present in the native state of the protein.[77] Among the first structures to form once the polypeptide begins to fold are alpha helices and beta turns, where alpha helices can form in as little as 100 nanoseconds and beta turns in 1 microsecond.[30]
There exists a saddle point in the energy funnel landscape where the transition state for a particular protein is found.[30] The transition state in the energy funnel diagram is the conformation that every molecule of that protein must assume in order to reach the native structure. No protein may assume the native structure without first passing through the transition state.[30] The transition state can be referred to as a variant or premature form of the native state rather than just another intermediary step.[78] The folding of the transition state is shown to be rate-determining, and even though it exists in a higher energy state than the native fold, it greatly resembles the native structure. Within the transition state, there exists a nucleus around which the protein is able to fold, formed by a process referred to as "nucleation condensation" where the structure begins to collapse onto the nucleus.[78]
Modeling of protein folding
De novo or ab initio techniques for computational protein structure prediction can be used for simulating various aspects of protein folding. Molecular dynamics (MD) has been used in simulations of protein folding and dynamics in silico.[79] The first equilibrium folding simulations were done using an implicit solvent model and umbrella sampling.[80] Because of computational cost, ab initio MD folding simulations with explicit water are limited to peptides and very small proteins.[81][82] MD simulations of larger proteins remain restricted to dynamics of the experimental structure or its high-temperature unfolding. Long-time folding processes (beyond about 1 millisecond), like folding of small-size proteins (about 50 residues) or larger, can be accessed using coarse-grained models.[83][84][85]
Several large-scale computational projects, such as Rosetta@home,[86] Folding@home[87] and Foldit,[88] target protein folding.
Long continuous-trajectory simulations have been performed on
In 2020 a team of researchers that used
AlphaFold's protein structure prediction results at CASP were described as "transformational" and "astounding".
See also
- Anfinsen's dogma
- Chevron plot
- Denaturation midpoint
- Downhill folding
- Folding (chemistry)
- Phi value analysis
- Potential energy of protein
- Protein dynamics
- Protein misfolding cyclic amplification
- Protein structure prediction software
- Proteopathy
- Time-resolved mass spectrometry
References
- ISBN 978-0-8153-3218-3.
- PMID 4565129.
- ISBN 978-0-7167-4684-3.
- S2CID 6451881.
- ISBN 978-0-8153-4454-4.
- PMID 33142107.
- PMID 2197986.
- PMID 9710577.
- PMID 15102453.
- S2CID 237583577.
- PMID 4124164.
- ISBN 978-1-118-91840-1.
- PMID 17609385.
- PMID 17075053.
- ISBN 978-0-7167-3268-6.
- "Protein Structure". Scitable. Nature Education. Retrieved 2016-11-26.
- ISBN 978-0-471-39387-0. Retrieved 2016-11-26.
- PMID 21111607.
- PMID 10921869.
- Al-Karadaghi S. "Torsion Angles and the Ramachnadran Plot in Protein Structures". www.proteinstructures.com. Retrieved 2016-11-26.
- S2CID 20021399.
- S2CID 27113763.
- PMID 653353.
- S2CID 4315026.
- PMID 11053106.
- PMID 8790365.
- PMID 28642936.
- PMID 30026186.
- PMID 30692195.
- S2CID 1036192.
- S2CID 4347271.
- S2CID 4337671.
- PMID 23746257.
- S2CID 24066207.
- PMID 15943899.
- PMID 20643079.
- PMID 10601015.
- PMID 16716593.
- PMID 18664583.
- PMID 9556522.
- PMID 29784778.
- S2CID 23370420.
- PMID 16473510.
- S2CID 30829998.
- S2CID 23797549.
- PMID 16359163.
- Cowtan K (2001). "Phase Problem in X-ray Crystallography, and Its Solution" (PDF). Encyclopedia of Life Sciences. Macmillan Publishers Ltd, Nature Publishing Group. Retrieved November 3, 2016.
- ISBN 978-0-387-33746-3.
- PMID 14573942.
- PMID 26607240.
- PMID 16087653.
- PMID 9660761.
- PMID 22640394.
- PMID 16683754.
- PMID 2266107.
- PMID 28552172.
- PMID 23954103.
- PMID 22554188.
- PMID 19289032.
- PMID 27791136.
- ISBN 978-0-470-01905-4.
- PMID 11575938.
- PMID 23056252.
- S2CID 21364478.
- PMID 24001118.
- PMID 23784721.
- PMID 21750539.
- PMID 22949695.
- doi:10.1101/274761.
- PMID 24187909.
- "Structural Biochemistry/Proteins/Protein Folding - Wikibooks, open books for an open world". en.wikibooks.org. Retrieved 2016-11-05.
- doi:10.1051/jcp/1968650044. Archived from the original (PDF) on 2009-09-02.
- S2CID 13838095.
- PMID 1528885.
- .
- PMID 19121702.
- S2CID 5756068.
- PMID 10677494.
- PMID 23266569.
- PMID 9826519.
- Jones D. "Fragment-based Protein Folding Simulations". University College London.
- "Protein folding" (by Molecular Dynamics).
- PMID 27333362.
- PMID 17636132.
- PMID 23045636.
- "Rosetta@home". boinc.bakerlab.org. Retrieved 14 March 2023.
- "The Folding@home Consortium (FAHC) – Folding@home". Retrieved 14 March 2023.
- "Foldit". fold.it. Retrieved 14 March 2023.
- S2CID 27988268.
- PMID 20974152.
- Shead, Sam (2020-11-30). "DeepMind solves 50-year-old 'grand challenge' with protein folding A.I." CNBC. Retrieved 2020-11-30.
- S2CID 247206999. Retrieved 25 March 2022.
- Science, 30 November 2020
- S2CID 227243204.
- @MoAlQuraishi (November 30, 2020). "CASP14 #s just came out and they're astounding" (Tweet) – via Twitter.
- Ball, Philip (9 December 2020). "Behind the screens of AlphaFold". Chemistry World.