Protein structure

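Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Structural biology employs techniques such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the structure of proteins.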

Protein structures range in size from tens to several thousand amino acids.

Protein complexes can be formed from protein subunits. For example, many thousands of actin molecules assemble into a microfilament.

A protein usually undergoes reversible structural changes in performing its biological function. The alternative structures of the same protein are referred to as different conformations, and transitions between them are called conformational changes.

Levels of protein structure

There are four distinct levels of protein structure.

Primary structure

The primary structure of a protein is the sequence of amino acids in its polypeptide chain. When a peptide bond forms between two amino acids, a water molecule is lost, and therefore proteins are made up of amino acid residues. Post-translational modifications such as phosphorylations and glycosylations are usually also considered a part of the primary structure, and cannot be read from the gene. For example, insulin is composed of 51 amino acids in 2 chains. The A chain has 21 amino acids, and the B chain has 30 amino acids.
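The loss of one water molecule per peptide bond can be illustrated with a small mass calculation. The following is a minimal sketch, not a reference implementation: the function name `peptide_mass` is hypothetical, and the three amino acid masses are approximate average values included only for illustration.

```python
# Approximate average molecular masses of free amino acids in g/mol.
# Illustrative values for glycine (G), alanine (A) and serine (S) only.
AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.015  # g/mol; one water molecule is lost per peptide bond


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino acid masses minus lost water."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    peptide_bonds = len(sequence) - 1  # n residues are joined by n - 1 bonds
    free_mass = sum(AMINO_ACID_MASS[residue] for residue in sequence)
    return free_mass - peptide_bonds * WATER_MASS


tripeptide_mass = peptide_mass("GAG")  # Gly-Ala-Gly, two peptide bonds
```

By the same counting, a chain of n residues contains n - 1 peptide bonds, so the residue masses in a protein are each lighter than the corresponding free amino acid.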

Secondary structure

Secondary structure refers to regular local sub-structures of the polypeptide backbone, chiefly the α-helix and the β-sheet, which are defined by patterns of hydrogen bonds between the main-chain peptide groups. They have a regular geometry, being constrained to specific values of the dihedral angles ψ and φ on the Ramachandran plot. Both the α-helix and the β-sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some parts of the protein are ordered but do not form any regular structures. They should not be confused with random coil, an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "supersecondary unit".^[6]
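The dihedral angles plotted on a Ramachandran plot are each defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four points. The function name `dihedral_degrees` and the helper names are hypothetical, and the coordinates in the usage lines are made up to exercise simple geometries rather than taken from a real protein.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up test geometries: eclipsed (cis) and opposite (trans) arrangements.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters because the Ramachandran plot distinguishes positive from negative φ and ψ values.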

Tertiary structure

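Tertiary structure is the three-dimensional structure of a single polypeptide chain. It is stabilized by interactions such as salt bridges, hydrogen bonds, the tight packing of side chains, and disulfide bonds. Disulfide bonds are extremely rare in cytosolic proteins, since the cytosol (intracellular fluid) is generally a reducing environment.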

Quaternary structure

Quaternary structure is the three-dimensional structure consisting of the aggregation of two or more individual polypeptide chains (subunits) that operate as a single functional unit (a multimer). A multimer is called a dimer if it contains two subunits, a trimer if it contains three subunits, a tetramer if it contains four subunits, a pentamer if it contains five subunits, and so forth. The subunits are frequently related to one another by symmetry operations, such as a 2-fold axis in a dimer. Multimers made up of identical subunits are referred to with a prefix of "homo-" and those made up of different subunits are referred to with a prefix of "hetero-"; for example, the two alpha and two beta chains of hemoglobin form a heterotetramer.
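The naming rule for multimers can be written out as a short function. This is a minimal sketch: the name `describe_multimer` is hypothetical, subunits are identified by arbitrary labels, and the fallback for assemblies larger than a pentamer is an assumption made only to keep the function total.

```python
SUBUNIT_COUNT_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer", 5: "pentamer"}


def describe_multimer(subunit_types):
    """Name a multimer from a list of subunit labels, one label per chain."""
    count = len(subunit_types)
    if count < 2:
        raise ValueError("a multimer needs at least two subunits")
    # Identical subunits give "homo-", any mixture gives "hetero-".
    prefix = "homo" if len(set(subunit_types)) == 1 else "hetero"
    if count in SUBUNIT_COUNT_NAMES:
        return prefix + SUBUNIT_COUNT_NAMES[count]
    return f"{prefix}-{count}-mer"  # assumed generic fallback for larger assemblies


hemoglobin = describe_multimer(["alpha", "alpha", "beta", "beta"])
```

With two alpha and two beta chains as input, the function returns the same term used for hemoglobin in the text.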

Domains, motifs, and folds in protein structure

Proteins are frequently described as consisting of several structural units. These units include domains, motifs, and folds. Although a very large number of different proteins are expressed in eukaryotic systems, there are many fewer different domains, structural motifs and folds.

Structural domain

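A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Because domains are independently stable, they can be "swapped" by genetic engineering between one protein and another to make chimera proteins. A conserved combination of several domains that occurs in different proteins, such as the protein tyrosine phosphatase domain and C2 domain pair, has been called a "superdomain" that may evolve as a single unit.^[8]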

Structural and sequence motifs

Structural and sequence motifs are short segments of protein three-dimensional structure or amino acid sequence that are found in a large number of different proteins.

Supersecondary structure

Tertiary protein structures can have multiple secondary elements on the same polypeptide chain. The supersecondary structure refers to a specific combination of secondary structure elements, such as β-α-β units or a helix-turn-helix motif. Some of them may also be referred to as structural motifs.

Protein fold

A protein fold refers to the general protein architecture, like a helix bundle, β-barrel, Rossmann fold or different "folds" provided in the Structural Classification of Proteins database.^[9] A related concept is protein topology.

Protein dynamics and conformational ensembles

Proteins are not static objects, but rather populate ensembles of conformational states. Their motions have been linked to functionally relevant phenomena such as allostery via protein domain dynamics.^[13]

Proteins are often thought of as relatively stable tertiary structures that experience conformational changes after being affected by interactions with other proteins or as a part of enzymatic activity. However, proteins may have varying degrees of stability, and some of the less stable variants are intrinsically disordered proteins. These proteins exist and function in a relatively 'disordered' state lacking a stable tertiary structure. As a result, they are difficult to describe by a single fixed tertiary structure. Conformational ensembles have been devised as a way to provide a more accurate and 'dynamic' representation of the conformational state of intrinsically disordered proteins.^[15]^[14]

Protein ensemble files are a representation of a protein that can be considered to have a flexible structure. Creating these files requires determining which of the various theoretically possible protein conformations actually exist. One approach is to apply computational algorithms to the protein data to determine the most likely set of conformations for an ensemble file. Methods for preparing data for the Protein Ensemble Database fall into two general methodologies: pool-based and molecular dynamics (MD) approaches (diagrammed in the figure).

The pool-based approach uses the protein's amino acid sequence to create a massive pool of random conformations. Further computation then derives a set of theoretical parameters for each conformation from its structure. Conformational subsets from this pool whose average theoretical parameters closely match known experimental data for the protein are selected.

The alternative molecular dynamics approach takes multiple random conformations at a time and subjects all of them to restraints derived from experimental data (e.g. known distances between atoms). Only conformations that remain within the limits set by the experimental data are accepted. This approach often applies large amounts of experimental data to the conformations, which is very computationally demanding.^[14]
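The selection step of the pool-based approach can be sketched in a few lines. This is a toy illustration under loud assumptions: each conformation is reduced to a single made-up theoretical parameter, the experimental value is invented, and a plain random search stands in for whatever selection algorithm a real pipeline uses. The function name `select_ensemble` is hypothetical.

```python
import random


def select_ensemble(pool_params, experimental_value, subset_size,
                    n_trials=2000, seed=0):
    """Pick the subset whose mean parameter best matches the experiment.

    pool_params holds one back-calculated parameter per conformation.
    Random search is a simplification chosen for brevity.
    """
    rng = random.Random(seed)
    best_subset, best_error = None, float("inf")
    for _ in range(n_trials):
        subset = rng.sample(range(len(pool_params)), subset_size)
        mean = sum(pool_params[i] for i in subset) / subset_size
        error = abs(mean - experimental_value)
        if error < best_error:
            best_subset, best_error = sorted(subset), error
    return best_subset, best_error


# Made-up pool: 500 conformations with one theoretical parameter each.
pool_rng = random.Random(42)
pool = [pool_rng.uniform(10.0, 40.0) for _ in range(500)]
ensemble, error = select_ensemble(pool, experimental_value=25.0, subset_size=10)
```

The key point the sketch preserves is that agreement is judged on the subset average, not on any single conformation, so individual members of the selected ensemble may each deviate widely from the experimental value.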

Conformational ensembles have been generated for a number of highly dynamic and partially unfolded proteins, such as Sic1/Cdc4,^[16] p15 PAF,^[17] MKK7,^[18] Beta-synuclein^[19] and P27.^[20]

Protein folding

As it is translated, a polypeptide exits the ribosome mostly as a random coil and folds into its native state.^[21]^[22] The final structure of the protein chain is generally assumed to be determined by its amino acid sequence (Anfinsen's dogma).^[23]

Protein stability

Thermodynamic stability of proteins represents the free energy difference between the folded and unfolded protein states. This free energy difference is very sensitive to temperature, hence a change in temperature may result in unfolding or denaturation. Protein denaturation may result in loss of function and loss of the native state. The free energy of stabilization of soluble globular proteins typically does not exceed 50 kJ/mol.^[citation needed] Given the large number of hydrogen bonds that stabilize secondary structures, and the stabilization of the inner core through hydrophobic interactions, the free energy of stabilization emerges as a small difference between large numbers.^[24]
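To see what a given free energy of stabilization means for the population of folded molecules, one can assume a simple two-state equilibrium between folded and unfolded forms. The two-state model is an assumption introduced here for illustration, as is the function name `fraction_folded`. The unfolding free energy is taken as an input, although in practice it varies with temperature.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def fraction_folded(delta_g_unfolding_kj_mol, temperature_k):
    """Folded fraction in a two-state model.

    delta_g_unfolding_kj_mol is G(unfolded) - G(folded); positive values
    mean the folded state is the more stable one.
    """
    # Equilibrium constant for unfolding, K = [unfolded] / [folded].
    k_unfold = math.exp(-delta_g_unfolding_kj_mol / (GAS_CONSTANT * temperature_k))
    return 1.0 / (1.0 + k_unfold)


marginal = fraction_folded(0.0, 298.15)   # equal populations
stable = fraction_folded(50.0, 298.15)    # upper bound quoted in the text
```

Because the dependence is exponential, a stabilization free energy of only a few kJ/mol already shifts the population strongly toward the folded state, and a comparably small change in free energy can shift it back.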

Protein structure determination

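Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. Cryo-electron microscopy is particularly valuable for very large protein complexes such as virus coat proteins and amyloid fibers.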

General secondary structure composition can be determined via circular dichroism. Limited proteolysis approaches, such as fast parallel proteolysis (FASTpp), can probe the structured fraction and its stability without the need for purification.^[30] Once a protein's structure has been experimentally determined, further detailed studies can be done computationally, using molecular dynamics simulations of that structure.^[31]

Protein structure databases

A protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, giving the biological community access to the experimental data in a useful way. Data in protein structure databases often include 3D coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by X-ray crystallography. Most entries, whether proteins or specific structure determinations of a protein, also contain sequence information, and some databases even provide means for performing sequence-based queries. Nevertheless, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology such as structure-based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.^[32]
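As an illustration of the kind of data such an entry holds, the sketch below reads unit cell parameters and atomic coordinates from text in the legacy fixed-column PDB file format. The choice of that format is an assumption made for this example, since the text above does not name one. The names `UnitCell` and `parse_pdb_text` are hypothetical, and the sample records are made up and generated with fixed-width formatting so the columns line up.

```python
from dataclasses import dataclass


@dataclass
class UnitCell:
    a: float      # cell edge lengths in angstroms
    b: float
    c: float
    alpha: float  # cell angles in degrees
    beta: float
    gamma: float


def parse_pdb_text(text):
    """Return (unit_cell_or_None, [(x, y, z), ...]) from PDB-format text."""
    unit_cell = None
    coordinates = []
    for line in text.splitlines():
        record = line[:6]
        if record == "CRYST1":
            unit_cell = UnitCell(
                float(line[6:15]), float(line[15:24]), float(line[24:33]),
                float(line[33:40]), float(line[40:47]), float(line[47:54]))
        elif record in ("ATOM  ", "HETATM"):
            # x, y, z occupy columns 31-38, 39-46 and 47-54 (1-based).
            coordinates.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return unit_cell, coordinates


# Made-up sample records, not taken from any real database entry.
sample = "\n".join([
    "CRYST1{:9.3f}{:9.3f}{:9.3f}{:7.2f}{:7.2f}{:7.2f}".format(
        50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        1, " N", "MET", "A", 1, 11.104, 6.134, -6.504),
])
cell, atoms = parse_pdb_text(sample)
```

Slicing by column rather than splitting on whitespace is deliberate: in a fixed-column format, adjacent numeric fields can touch when values are large or negative.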

Structural classifications of proteins

Protein structures can be grouped based on their structural similarity, topological class, or a common evolutionary origin. Structural similarity can be used to group proteins together into protein superfamilies.^[36] If shared structure is significant but the fraction shared is small, the fragment shared may be the consequence of a more dramatic evolutionary event such as horizontal gene transfer, and joining proteins sharing these fragments into protein superfamilies is no longer justified.^[35] Topology of a protein can be used to classify proteins as well. Knot theory and circuit topology are two topology frameworks developed for classification of protein folds based on chain crossing and intrachain contacts respectively.
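In circuit topology, the arrangement of any two intrachain contacts is commonly described as being in series, in parallel, or crossed. The sketch below classifies a pair of contacts on that basis. The three relation names are not given in the text above and are stated here as background, the function name `contact_relation` is hypothetical, and contacts that share a residue are left out of scope.

```python
def contact_relation(contact_a, contact_b):
    """Classify two intrachain contacts given as (i, j) residue index pairs."""
    first, second = sorted((tuple(sorted(contact_a)), tuple(sorted(contact_b))))
    (i, j), (k, l) = first, second
    if len({i, j, k, l}) < 4:
        raise ValueError("contacts sharing a site are not handled in this sketch")
    if j < k:
        return "series"    # i < j < k < l: one contact closes before the next opens
    if l < j:
        return "parallel"  # i < k < l < j: one contact is nested inside the other
    return "cross"         # i < k < j < l: the contacts interleave


relations = [
    contact_relation((1, 5), (8, 12)),
    contact_relation((1, 20), (5, 10)),
    contact_relation((1, 10), (5, 15)),
]
```

Sorting both the residues within each contact and the two contacts themselves makes the result independent of the order in which the contacts are supplied.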

Computational prediction of protein structure

The generation of a protein sequence is much easier than the determination of a protein structure. However, the structure of a protein gives much more insight into the function of the protein than its sequence. Therefore, a number of methods for the computational prediction of protein structure from its sequence have been developed.^[37] Ab initio prediction methods use just the sequence of the protein. Threading and homology modeling methods can build a 3-D model for a protein of unknown structure from experimental structures of evolutionarily related proteins, called a protein family.

References

[1] ISBN 978-1-305-68645-8.

[2] PMID 15951512.

[3] PMID 14886310.

[4] PMID 13658959.

[5] PMID 14816373.

[6] S2CID 29904865.

[7] PMID 19059267.

[8] PMID 25694109.

[9] S2CID 7147867. Archived from the original on 5 January 2013.

[10] PMID 21570668.

[11] Fraser JS, Clarkson MW, Degnan SC, Erion R, Kern D, Alber T (December 2009). "Hidden alternative structures of proline isomerase essential for catalysis". Nature. 462 (7273): 669–673. PMID 19956261.

[12] OCLC 690489261.

[13] PMID 18365235. ISSN 1432-119X.

[14] PMID 26301226.

[15] Protein Ensemble Database.

[16] PMID 20399186.

[17] PMID 24559989.

[18] PMID 25737554.

[19] PMID 25389903.

[20] PMID 16214166.

[21] PMID 21111607.

[22] ISBN 978-0-8153-3218-3.

[23] PMID 4565129.

[24] PMID 1969647.

[25] S2CID 4162786.

[26] "PDB Statistics". 1 October 2022.

[27] PMID 3541539.

[28] PMID 22356513.

[29] PMID 17981904.

[30] PMID 23056252.

[31] PMID 28637405.

[32] S2CID 45184564.

[33] PMID 7723011. Archived from the original (PDF) on 26 April 2012.

[34] PMID 9309224.

[35] PMID 19325884.

[36] PMID 20457744.

[37] PMID 18436442.

Further reading

50 Years of Protein Structure Determination Timeline, HTML version, National Institute of General Medical Sciences (NIH). Archived 29 October 2018.

External links


Protein Structure drugdesign.org
