Protein secondary structure

Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains.^[1] The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically form spontaneously as an intermediate before the protein folds into its three-dimensional tertiary structure.

Secondary structure is formally defined by the pattern of hydrogen bonds between the amide hydrogen and carbonyl oxygen atoms in the peptide backbone. Secondary structure may alternatively be defined based on the regular pattern of backbone dihedral angles in a particular region of the Ramachandran plot, regardless of whether it has the correct hydrogen bonds.

The concept of secondary structure was first introduced by Kaj Ulrik Linderstrøm-Lang at Stanford in 1952.^[2]^[3] Other types of biopolymers such as nucleic acids also possess characteristic secondary structures.

Types

Structural features of the three major forms of protein helices^[4]^[5]
Geometry attribute	α-helix	3₁₀ helix	π-helix
Residues per turn	3.6	3.0	4.4
Translation per residue	1.5 Å (0.15 nm)	2.0 Å (0.20 nm)	1.1 Å (0.11 nm)
Radius of helix	2.3 Å (0.23 nm)	1.9 Å (0.19 nm)	2.8 Å (0.28 nm)
Pitch	5.4 Å (0.54 nm)	6.0 Å (0.60 nm)	4.8 Å (0.48 nm)
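The pitch row is not independent of the others: pitch is the number of residues per turn multiplied by the translation (rise) per residue. A minimal Python sketch that recomputes the pitch column from the tabulated values follows; the dictionary simply copies the table, and the names `HELIX_GEOMETRY` and `helix_pitch` are illustrative.

```python
# Values copied from the table: (residues per turn, translation per residue in Å)
HELIX_GEOMETRY = {
    "alpha": (3.6, 1.5),
    "3_10": (3.0, 2.0),
    "pi": (4.4, 1.1),
}


def helix_pitch(residues_per_turn: float, rise_per_residue: float) -> float:
    """Pitch (Å per full turn) = residues per turn × axial rise per residue."""
    return residues_per_turn * rise_per_residue


for name, (n, rise) in HELIX_GEOMETRY.items():
    print(f"{name}: pitch ≈ {helix_pitch(n, rise):.1f} Å")
```

This prints 5.4, 6.0 and 4.8 Å (4.4 × 1.1 = 4.84, rounded), matching the Pitch row.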

The most common secondary structures are alpha helices and beta sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically favorable hydrogen-bonding patterns but are rarely observed in natural proteins except at the ends of α helices due to unfavorable backbone packing in the center of the helix. Other extended structures such as the polyproline helix and alpha sheet are rare in native state proteins but are often hypothesized as important protein folding intermediates. Tight turns and loose, flexible loops link the more "regular" secondary structure elements. The random coil is not a true secondary structure, but is the class of conformations that indicate an absence of regular secondary structure.

Different amino acids vary in their propensity to form the various secondary structure elements. Methionine, alanine, leucine, glutamate and lysine ("MALEK" in amino-acid 1-letter codes) all have especially high helix-forming propensities; by contrast, the large aromatic residues (tryptophan, tyrosine and phenylalanine) and C^β-branched amino acids (isoleucine, valine, and threonine) prefer to adopt β-strand conformations. However, these preferences are not strong enough to produce a reliable method of predicting secondary structure from sequence alone.
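To make the residue groups above concrete, the sketch below tags each residue of a one-letter sequence as helix-favoring (the MALEK set) or strand-favoring (the aromatic and C^β-branched residues named in the text). It is only a lookup of those two sets, not a predictor, which is consistent with the point that such preferences alone are too weak for reliable prediction. The function name `preference_profile` and the output symbols are illustrative choices.

```python
# Residue sets taken from the text above (one-letter codes).
HELIX_FAVORING = set("MALEK")               # Met, Ala, Leu, Glu, Lys
STRAND_FAVORING = set("WYF") | set("IVT")   # aromatic + Cβ-branched residues


def preference_profile(sequence: str) -> str:
    """Return one symbol per residue: 'h' helix-favoring, 'b' strand-favoring, '.' neither."""
    symbols = []
    for aa in sequence.upper():
        if aa in HELIX_FAVORING:
            symbols.append("h")
        elif aa in STRAND_FAVORING:
            symbols.append("b")
        else:
            symbols.append(".")
    return "".join(symbols)


print(preference_profile("MALEKGVIYTW"))
```

For the example sequence this prints `hhhhh.bbbbb`; glycine belongs to neither group and is marked with a dot.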

Low-frequency collective vibrations are thought to be sensitive to local rigidity within proteins, revealing beta structures to be generically more rigid than α-helical or disordered proteins.^[6]^[7] Neutron scattering measurements have directly connected the spectral feature at ~1 THz to collective motions of the secondary structure of the beta-barrel protein GFP.^[8]

Hydrogen bonding patterns in secondary structures may be significantly distorted, which makes automatic determination of secondary structure difficult. There are several methods for formally defining protein secondary structure (e.g., DSSP,^[9] DEFINE,^[10] STRIDE,^[11] ScrewFit,^[12] SST^[13]).

DSSP classification

Distribution obtained from the non-redundant pdb_select dataset (March 2006), with secondary structure assigned by DSSP and the 8 conformational states reduced to 3 (H = HGI, E = EB, C = STC). The plot shows mixtures of (Gaussian) distributions, resulting in part from the reduction of DSSP states.

DSSP (Dictionary of Protein Secondary Structure) is commonly used to describe protein secondary structure with single-letter codes. The secondary structure is assigned based on hydrogen-bonding patterns such as those initially proposed by Pauling et al. in 1951 (before any protein structure had ever been experimentally determined). DSSP defines eight types of secondary structure:

G = 3-turn helix (3₁₀ helix). Minimum length 3 residues.
H = 4-turn helix (α helix). Minimum length 4 residues.
I = 5-turn helix (π helix). Minimum length 5 residues.
T = hydrogen-bonded turn (3, 4 or 5 turn).
E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimum length 2 residues.
B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation).
S = bend (the only non-hydrogen-bond based assignment).
C = coil (residues which are not in any of the above conformations).

'Coil' is often encoded as ' ' (space), C (coil) or '–' (dash). The helices (G, H and I) and sheet conformations are all required to have a minimum length: at least two adjacent residues in the primary structure must form the same hydrogen-bonding pattern. If the helix or sheet hydrogen-bonding pattern is too short, the residues are designated T or B, respectively. Other protein secondary structure assignment categories exist (sharp turns, Omega loops, etc.), but they are less frequently used.
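Many analyses and predictors work with three states rather than DSSP's eight. The sketch below applies the reduction given in the figure caption earlier in this section (H = HGI, E = EB, C = STC). The function name is illustrative, and treating ' ' and '-' as coil reflects the alternative coil encodings mentioned above (using an ASCII hyphen for the dash is an assumption). Other reduction schemes are possible; this is just one.

```python
# 8-state -> 3-state reduction from the figure caption: H = HGI, E = EB, C = STC.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",   # all helix types -> helix
    "E": "E", "B": "E",             # strands and isolated bridges -> strand
    "S": "C", "T": "C", "C": "C",   # bends, turns, coil -> coil
    " ": "C", "-": "C",             # alternative coil encodings
}


def reduce_dssp(states: str) -> str:
    """Map a DSSP 8-state string to H/E/C; unknown codes raise KeyError."""
    return "".join(DSSP_TO_3STATE[s] for s in states)


print(reduce_dssp("CCHHHHGGGTTEEEEBSC"))
```

The example prints `CCHHHHHHHCCEEEEECC`: the 3₁₀ residues (G) join the helix, and the isolated bridge (B) joins the strand.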

Secondary structure is defined by hydrogen bonding, so the exact definition of a hydrogen bond is critical. The standard hydrogen-bond definition for secondary structure is that of DSSP, which is a purely electrostatic model. It assigns charges of ±q₁ ≈ 0.42e to the carbonyl carbon and oxygen, respectively, and charges of ±q₂ ≈ 0.20e to the amide hydrogen and nitrogen, respectively. The electrostatic energy is

$$E = q_1 q_2 \left( \frac{1}{r_{\mathrm{ON}}} + \frac{1}{r_{\mathrm{CH}}} - \frac{1}{r_{\mathrm{OH}}} - \frac{1}{r_{\mathrm{CN}}} \right) \cdot 332 \text{ kcal/mol}.$$

According to DSSP, a hydrogen bond exists if and only if E is less than −0.5 kcal/mol (−2.1 kJ/mol). Although the DSSP formula is a relatively crude approximation of the physical hydrogen-bond energy, it is generally accepted as a tool for defining secondary structure.
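The energy formula and the −0.5 kcal/mol cutoff translate directly into code. The following is a minimal sketch, assuming Cartesian coordinates in ångströms (the unit for which the factor 332 yields kcal/mol) and assuming the amide hydrogen position is already known; many structure files lack explicit hydrogens, so a full implementation would have to place them. The coordinates below form an idealized, linear N–H···O=C arrangement chosen for illustration, not taken from a real protein, and the function names are hypothetical.

```python
import math

Q1 = 0.42   # partial charge (e): +q1 on carbonyl C, -q1 on carbonyl O
Q2 = 0.20   # partial charge (e): +q2 on amide H, -q2 on amide N
F = 332.0   # factor giving kcal/mol when distances are in Å


def dssp_hbond_energy(c, o, n, h):
    """Electrostatic N-H...O=C energy (kcal/mol) from the DSSP formula.

    c, o: carbonyl carbon and oxygen; n, h: amide nitrogen and hydrogen.
    Each is an (x, y, z) tuple in Å.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h, cutoff=-0.5):
    """DSSP criterion: a hydrogen bond exists iff E < -0.5 kcal/mol."""
    return dssp_hbond_energy(c, o, n, h) < cutoff


# Idealized linear geometry: N-H 1.0 Å, H...O 1.9 Å, C=O 1.23 Å (illustrative)
n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
o, c = (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(dssp_hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

For this geometry the energy is about −2.9 kcal/mol, well below the cutoff; moving the carbonyl group 10 Å further away brings it close to zero, and the bond is no longer counted.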

SST^[13] classification

SST is a Bayesian method to assign secondary structure to protein coordinate data using the Shannon information criterion of Minimum Message Length (MML) inference, which links the inference of secondary structure to lossless data compression. SST accurately delineates any protein chain into regions associated with the following assignment types:^[14]

E = (Extended) strand of a β-pleated sheet
G = Right-handed 3₁₀ helix
H = Right-handed α-helix
I = Right-handed π-helix
g = Left-handed 3₁₀ helix
h = Left-handed α-helix
i = Left-handed π-helix
3 = 3₁₀-like Turn
4 = α-like Turn
5 = π-like Turn
T = Unspecified Turn
C = Coil
- = Unassigned residue

SST detects π and 3₁₀ helical caps to standard α-helices and automatically assembles the various extended strands into consistent β-pleated sheets. It provides a readable output of dissected secondary structural elements and a corresponding PyMOL-loadable script to visualize the assigned secondary structural elements individually.
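Turning a per-residue assignment string into a list of discrete elements, like the dissected output described above, amounts to finding runs of identical codes. The sketch below is a hypothetical helper, not SST's actual output format. It works on any string of single-character codes (SST or DSSP) and skips runs whose code marks coil or unassigned residues. The example shows a short 3₁₀ segment (G) capping an α-helix (H).

```python
from itertools import groupby


def segments(assignment: str, skip=("C", "-")):
    """Split a per-residue assignment into (code, start, end) runs.

    Indices are 0-based and `end` is inclusive; runs whose code is in
    `skip` (here coil and unassigned) are omitted from the result.
    """
    result = []
    pos = 0
    for code, run in groupby(assignment):
        length = len(list(run))
        if code not in skip:
            result.append((code, pos, pos + length - 1))
        pos += length
    return result


print(segments("CCGGGHHHHHHHCCEEEECC"))
```

This prints `[('G', 2, 4), ('H', 5, 11), ('E', 14, 17)]`.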

Experimental determination

The rough secondary-structure content of a biopolymer (e.g., "this protein is 40% α-helix and 20% β-sheet") can be estimated spectroscopically.^[15] For proteins, a common method is far-ultraviolet (far-UV, 170–250 nm) circular dichroism. A pronounced double minimum at 208 and 222 nm indicates α-helical structure, whereas a single minimum at 204 nm or 217 nm reflects random-coil or β-sheet structure, respectively. A less common method is infrared spectroscopy, which detects differences in the bond oscillations of amide groups due to hydrogen bonding. Finally, secondary-structure contents may be estimated accurately using the chemical shifts of an initially unassigned NMR spectrum.^[16]
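The rule of thumb for far-UV CD spectra can be expressed as a check on the positions of the spectral minima. The toy sketch below applies it to a synthetic spectrum built from two Gaussian dips. The ±3 nm tolerance, the Gaussian shapes and the function names are all assumptions made for illustration. Quantitative secondary-structure estimates from CD usually come from fitting against reference spectra rather than from a rule like this.

```python
import math


def local_minima(wavelengths, signal):
    """Wavelengths at which the signal is strictly lower than both neighbors."""
    return [
        wavelengths[i]
        for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]
    ]


def classify_cd(wavelengths, signal, tol=3.0):
    """Apply the minima-position rule of thumb to a far-UV CD spectrum."""
    minima = local_minima(wavelengths, signal)

    def near(target):
        return any(abs(m - target) <= tol for m in minima)

    if near(208) and near(222):
        return "alpha-helix"
    if near(217):
        return "beta-sheet"
    if near(204):
        return "random coil"
    return "unclassified"


# Synthetic spectrum: two Gaussian dips at 208 and 222 nm (arbitrary units)
wl = list(range(190, 251))
spectrum = [-math.exp(-(w - 208) ** 2 / 50) - math.exp(-(w - 222) ** 2 / 50) for w in wl]
print(classify_cd(wl, spectrum))
```

A single dip at 217 nm or at 204 nm would instead be classified as β-sheet or random coil, respectively.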

Prediction

Predicting protein tertiary structure from only its amino acid sequence is a very challenging problem (see protein structure prediction), but using the simpler secondary structure definitions is more tractable.

Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. The first widely used techniques to predict protein secondary structure from the amino acid sequence were the Chou–Fasman method^[17]^[18]^[19] and the GOR method.^[20] Although such methods claimed to achieve ~60% accuracy in predicting which of the three states (helix/sheet/coil) a residue adopts, blind computing assessments later showed that the actual accuracy was much lower.^[21]
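Accuracy figures such as "~60%" refer to the per-residue fraction of correctly predicted three-state labels, commonly called Q3. A minimal sketch, assuming the prediction and the observed (e.g., DSSP-derived) assignment are aligned H/E/C strings of equal length; the example strings are made up for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label (H/E/C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up example strings (17 residues each)
observed = "CC" + "H" * 6 + "CC" + "E" * 5 + "CC"
predicted = "C" + "H" * 5 + "C" * 4 + "E" * 3 + "C" * 4
print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")
```

Here 12 of the 17 residues agree, so Q3 ≈ 0.71.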

A significant increase in accuracy (to nearly 80%) was made by exploiting neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.
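Machine-learning predictors of this kind typically classify each residue from a window of its neighbors. The sketch below shows one such input encoding: each residue becomes a one-hot vector over a window of 2 × `half_width` + 1 positions, padded at the chain ends. The half-width of 7, the padding symbol and the function name are assumptions made for illustration. Real predictors generally feed richer per-position information than a single sequence; this sketch shows only the windowing idea.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "-"  # '-' pads positions beyond the chain ends


def window_features(sequence: str, half_width: int = 7):
    """One-hot encode a window of 2*half_width + 1 residues centered on each position."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    width = 2 * half_width + 1
    padded = "-" * half_width + sequence.upper() + "-" * half_width
    features = []
    for center in range(len(sequence)):
        vector = [0] * (len(ALPHABET) * width)
        window = padded[center:center + width]
        for slot, aa in enumerate(window):
            # Unknown residue codes (e.g. 'X') are encoded like padding.
            vector[slot * len(ALPHABET) + index.get(aa, index["-"])] = 1
        features.append(vector)
    return features


features = window_features("MALEKG", half_width=2)
print(len(features), len(features[0]))
```

Each residue yields one vector (here 6 vectors of length 21 × 5 = 105); a classifier then maps each vector to H/E/C scores, and the spread of those scores is one possible basis for a per-position confidence value.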

Secondary-structure prediction methods were evaluated by the Critical Assessment of protein Structure Prediction (CASP) experiments and continuously benchmarked, e.g. by

Psipred, SAM,^[24] PORTER,^[25] PROF,^[26] and SABLE.^[27] The chief area for improvement appears to be the prediction of β-strands; residues confidently predicted as β-strand are likely to be so, but the methods are apt to overlook some β-strand segments (false negatives). There is likely an upper limit of ~90% prediction accuracy overall, due to the idiosyncrasies of the standard method (DSSP) for assigning secondary-structure classes (helix/strand/coil) to PDB structures, against which the predictions are benchmarked.^[28]
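The β-strand behavior described above (strand predictions that are usually right, but strands that are often missed) corresponds to high precision and low recall for the E state. A minimal sketch, assuming aligned H/E/C strings; the example is made up so that one of two strands is missed entirely.

```python
def state_precision_recall(predicted: str, observed: str, state: str = "E"):
    """Precision and recall of one secondary-structure state over aligned strings."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    true_pos = sum(p == state and o == state for p, o in zip(predicted, observed))
    n_pred = predicted.count(state)
    n_obs = observed.count(state)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_obs if n_obs else 0.0
    return precision, recall


# Made-up example: two observed strands, only the first one predicted
observed = "C" + "E" * 5 + "C" * 4 + "E" * 5 + "C" * 2
predicted = "C" + "E" * 5 + "C" * 11
print(state_precision_recall(predicted, observed))
```

This prints `(1.0, 0.5)`: every predicted strand residue is correct, but half of the observed strand residues are false negatives.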

Accurate secondary-structure prediction is a key element in the prediction of tertiary structure, in all but the simplest (homology modeling) cases. For example, a confidently predicted pattern of six secondary structure elements βαββαβ is the signature of a ferredoxin fold.^[29]
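Checking for a signature such as βαββαβ means collapsing a per-residue prediction into the order of its helices and strands. The sketch below does this for a three-state string. The minimum run length of 3 (used to ignore very short fragments), the function name and the example prediction are assumptions made for illustration. Actual fold recognition also depends on lengths, spatial arrangement and confidence, not just element order.

```python
from itertools import groupby

SYMBOL = {"H": "α", "E": "β"}  # coil ('C') runs are dropped


def element_pattern(three_state: str, min_length: int = 3) -> str:
    """Collapse an H/E/C string into the order of its helices (α) and strands (β)."""
    pattern = []
    for state, run in groupby(three_state):
        if state in SYMBOL and len(list(run)) >= min_length:
            pattern.append(SYMBOL[state])
    return "".join(pattern)


# Made-up prediction: strand, helix, strand, strand, helix, strand
prediction = "CCEEEECCHHHHHHHCCEEEECCEEEECCHHHHHHHCCEEEEC"
print(element_pattern(prediction))
print("βαββαβ" in element_pattern(prediction))
```

For this example the collapsed pattern is `βαββαβ`, so the membership test is true.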

Applications

Both protein and nucleic acid secondary structures can be used to aid in multiple sequence alignment. These alignments can be made more accurate by the inclusion of secondary structure information in addition to simple sequence information. This is sometimes less useful in RNA because base pairing is much more highly conserved than sequence. Distant relationships between proteins whose primary structures are unalignable can sometimes be found by secondary structure.^[22]

It has been shown that α-helices are more stable, more robust to mutations, and more designable than β-strands in natural proteins,^[30] so designing functional all-α proteins is likely to be easier than designing proteins with both helices and strands; this has recently been confirmed experimentally.^[31]

References

[1] PMID 18429251.
[2] ASIN B0007J31SC.
[3] PMID 9144781. He had already introduced the concepts of the primary, secondary, and tertiary structure of proteins in the third Lane Lecture (Linderstram-Lang, 1952).
[4] Bottomley S (2004). "Interactive Protein Structure Tutorial". Archived from the original on March 1, 2011. Retrieved January 9, 2011.
[5] OCLC 4498269.
[6] PMID 26029761.
[7] PMID 24940784.
[8] PMID 24209864.
[9] S2CID 29185760.
[10] S2CID 29126855.
[11] S2CID 17487756. Archived from the original (PDF) on 2010-06-13.
[12] PMID 23151634.
[13] PMID 22689785.
[14] "SST web server". Retrieved 17 April 2018.
[15] PMID 10625503.
[16] PMID 14668443.
[17] PMID 4358940.
[18] PMID 354496.
[19] PMID 364941.
[20] PMID 642007.
[21] S2CID 41477827.
[22] PMID 15320732.
[23] PMID 20221928.
[24] PMID 19483096.
[25] PMID 15585524.
[26] PMID 24799431.
[27] S2CID 13267624.
[28] PMID 15987894.
[29] S2CID 823339. Since the fold definition should include only the core secondary structural elements that are present in the majority of homologs, we define the thioredoxin-like fold as a two-layer α/β sandwich with the βαβββα secondary-structure pattern.
[30] PMID 27935949.
[31] PMID 28706065.

Further reading

Branden C, Tooze J (1999). Introduction to protein structure (2nd ed.). New York: Garland Science. ISBN 978-0815323051.

PMID 16578412. (The original beta-sheet conformation article.)

PMID 14816373. (alpha- and pi-helix conformations, since they predicted that 3₁₀ helices would not be possible.)

External links

NetSurfP – Secondary Structure and Surface Accessibility predictor

PROF

ScrewFit

PSSpred – A multiple neural network training program for protein secondary structure prediction

Genesilico metaserver – Metaserver that runs over 20 different secondary structure predictors with one click

SST webserver: An information-theoretic (compression-based) secondary structural assignment.

