DNA binding site

DNA binding sites are a type of

restriction enzymes, site-specific recombinases (see site-specific recombination) and methyltransferases.^[1]

DNA binding sites can thus be defined as short DNA sequences (typically 4 to 30 base pairs long, but up to 200 bp for recombination sites) that are specifically bound by one or more DNA-binding proteins or protein complexes. Some binding sites have been reported to have the potential to undergo fast evolutionary change.^[2]

Types of DNA binding sites

DNA binding sites can be categorized according to their biological function. Thus, we can distinguish between transcription factor-binding sites, restriction sites and recombination sites. Some authors have proposed that binding sites could also be classified according to their most convenient mode of representation.

consensus sequences, and they are typically represented using position-specific frequency matrices (PSFM), which are often graphically depicted using sequence logos. This argument, however, is partly arbitrary. Restriction enzymes, like transcription factors, yield a gradual, though sharp, range of affinities for different sites^[4] and are thus also best represented by PSFM. Site-specific recombinases likewise show a varied range of affinities for different target sites.^[5]^[6]

History and main experimental techniques

The existence of something akin to DNA binding sites was suspected from the experiments on the biology of the

Microscale Thermophoresis^[14] is used.

Databases

Due to the diverse nature of the experimental techniques used in determining binding sites and to the patchy coverage of most organisms and transcription factors, there is no central database (akin to

false positive rates are often associated with in-silico motif discovery / site search methods), there has been no systematic effort to computationally annotate these features in sequenced genomes.

There are, however, several private and public databases devoted to compilation of experimentally reported, and sometimes computationally predicted, binding sites for different transcription factors in different organisms. Below is a non-exhaustive table of available databases:

Name	Organisms	Source	Access	URL
PlantRegMap	165 plant species (e.g., Arabidopsis thaliana, Oryza sativa, Zea mays, etc.)	Expert curation and projection	Public	[1]
JASPAR	Vertebrates, Plants, Fungi, Flies, and Worms	Expert curation with literature support	Public	[2]
CIS-BP	All Eukaryotes	Experimentally derived motifs and predictions	Public	[3]
CollecTF	Prokaryotes	Literature curation	Public	[4]
RegPrecise	Prokaryotes	Expert curation	Public	[5]
RegTransBase	Prokaryotes	Expert/literature curation	Public	[6]
RegulonDB	Escherichia coli	Expert curation	Public	[7] Archived 2017-05-07 at the Wayback Machine
PRODORIC	Prokaryotes	Expert curation	Public	[8] Archived 2007-05-16 at the Wayback Machine
TRANSFAC	Mammals	Expert/literature curation	Public/Private	[9] Archived 2008-10-23 at the Wayback Machine
TRED	Human, Mouse, Rat	Computer predictions, manual curation	Public	[10]
DBSD	Drosophila species	Literature/Expert curation	Public	[11]
HOCOMOCO	Human, Mouse	Literature/Expert curation	Public	[12],[13]
MethMotif	Human, Mouse	Expert curation	Public	[14] Archived 2019-10-29 at the Wayback Machine

Representation of DNA binding sites

A collection of DNA binding sites, typically referred to as a DNA binding motif, can be represented by a

Information Theory,^[17] leading to its graphical representation as a sequence logo.

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
A	1	0	1	5	32	5	35	23	34	14	43	13	34	4	52	3
C	50	1	0	1	5	6	0	4	4	13	3	8	17	51	2	0
G	0	0	54	15	5	5	12	2	7	1	1	3	1	0	1	52
T	5	55	1	35	14	40	9	27	11	28	9	32	4	1	1	1
Sum	56	56	56	56	56	56	56	56	56	56	56	56	56	56	56	56

PSFM for the transcriptional repressor LexA, as derived from 56 LexA-binding sites stored in Prodoric. Relative frequencies are obtained by dividing the counts in each cell by the total count (56).
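The conversion from counts to relative frequencies described in the caption can be sketched in a few lines of Python. The example below is a minimal illustration, not a reference implementation: the function names (relative_frequencies, information_content, consensus) are hypothetical, the counts are copied from the LexA table, and the information content is computed as 2 minus the Shannon entropy per position, which assumes a uniform background (0.25 per base) and applies no small-sample correction.

```python
import math

BASES = "ACGT"

# Counts copied from the LexA table (one list per base, 16 positions).
COUNTS = {
    "A": [1, 0, 1, 5, 32, 5, 35, 23, 34, 14, 43, 13, 34, 4, 52, 3],
    "C": [50, 1, 0, 1, 5, 6, 0, 4, 4, 13, 3, 8, 17, 51, 2, 0],
    "G": [0, 0, 54, 15, 5, 5, 12, 2, 7, 1, 1, 3, 1, 0, 1, 52],
    "T": [5, 55, 1, 35, 14, 40, 9, 27, 11, 28, 9, 32, 4, 1, 1, 1],
}


def relative_frequencies(counts):
    """Divide each count by the column total (56 for every LexA column)."""
    length = len(counts["A"])
    totals = [sum(counts[b][i] for b in BASES) for i in range(length)]
    return {b: [counts[b][i] / totals[i] for i in range(length)] for b in BASES}


def information_content(freqs):
    """Per-position information in bits: 2 - entropy.

    Assumption: uniform background, no small-sample correction.
    """
    length = len(freqs["A"])
    bits = []
    for i in range(length):
        column = [freqs[b][i] for b in BASES]
        entropy = -sum(f * math.log2(f) for f in column if f > 0)
        bits.append(2.0 - entropy)
    return bits


def consensus(counts):
    """Most frequent base at each position."""
    length = len(counts["A"])
    return "".join(max(BASES, key=lambda b: counts[b][i]) for i in range(length))


freqs = relative_frequencies(COUNTS)
bits = information_content(freqs)
for position, value in enumerate(bits, start=1):
    print(f"position {position:2d}: {value:.2f} bits")
print("consensus:", consensus(COUNTS))
```

The per-position values are the column heights that a sequence logo would draw. Strongly conserved columns such as position 2 approach 2 bits, while variable columns in the middle of the site carry much less information.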

Computational search and discovery of binding sites

In

artificial neural networks.^[3]^[19]^[20] A plethora of algorithms is also available for sequence motif discovery. These methods rely on the hypothesis that a set of sequences share a binding motif for functional reasons. Binding motif discovery methods can be divided roughly into enumerative, deterministic and stochastic.^[21] MEME^[22] and Consensus^[23] are classical examples of deterministic optimization, while the Gibbs sampler^[24] is the conventional implementation of a purely stochastic method for DNA binding motif discovery. Another instance of this class of methods is SeSiMCMC,^[25] which focuses on weak TFBS with symmetry. While enumerative methods often resort to regular expression representation of binding sites, PSFM and their formal treatment under Information Theory methods are the representation of choice for both deterministic and stochastic methods. Hybrid methods, such as ChIPMunk,^[26] which combines greedy optimization with subsampling, also use PSFM. Recent advances in sequencing have led to the introduction of comparative genomics approaches to DNA binding motif discovery, as exemplified by PhyloGibbs.^[27]^[28]
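As an illustration of how a PSFM can be used for site search, the following Python sketch converts the LexA counts into a log-odds scoring matrix and slides it along a sequence. It is a simplified example under stated assumptions, not the method of any tool cited above: the function names (log_odds_matrix, scan), the pseudocount of 1, the uniform background of 0.25, and the score threshold are all illustrative choices, and only the given strand is scanned.

```python
import math

BASES = "ACGT"

# Counts copied from the LexA table (one list per base, 16 positions).
COUNTS = {
    "A": [1, 0, 1, 5, 32, 5, 35, 23, 34, 14, 43, 13, 34, 4, 52, 3],
    "C": [50, 1, 0, 1, 5, 6, 0, 4, 4, 13, 3, 8, 17, 51, 2, 0],
    "G": [0, 0, 54, 15, 5, 5, 12, 2, 7, 1, 1, 3, 1, 0, 1, 52],
    "T": [5, 55, 1, 35, 14, 40, 9, 27, 11, 28, 9, 32, 4, 1, 1, 1],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Build one {base: log2(frequency / background)} dict per position.

    The pseudocount (an assumed value) keeps zero counts from giving -inf.
    """
    length = len(counts["A"])
    matrix = []
    for i in range(length):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(matrix[i][ch] for i, ch in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


matrix = log_odds_matrix(COUNTS)
# Toy sequence: the most frequent base per column, flanked by filler bases.
toy_sequence = "GGGG" + "CTGTATATATATACAG" + "CCCC"
for start, window, score in scan(toy_sequence, matrix, threshold=10.0):
    print(start, window, round(score, 2))
```

In practice the threshold strongly affects the false positive rate, and a genome-scale search would normally also score the reverse complement of each window.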

More complex methods for binding site search and motif discovery rely on base stacking and other interactions between DNA bases, but due to the small sample sizes typically available for binding sites in DNA, their efficiency is still not completely harnessed. An example of such a tool is ULPB.^[29]

References

1. PMID 15178741.
2. S2CID 21535866.
3. PMID 10812473.
4. PMID 9210460.
5. PMID 10781547.
6. ISBN 978-0-387-23919-4.
7. PMID 14145311.
8. S2CID 19804795.
9. PMID 4587255.
10. S2CID 4204720.
11. PMID 1055366.
12. PMID 17053094.
13. S2CID 42489892. "A hot road to new drugs". Phys.org. February 24, 2010.
14. PMID 20981028.
15. PMID 15130839.
16. PMID 11861919.
17. PMID 3525846.
18. PMID 19210776.
19. PMID 7784221.
20. PMID 2014171.
21. PMID 18566768.
22. S2CID 205157795.
23. PMID 2919167.
24. S2CID 3040614.
25. PMID 15728117.
26. PMID 20736340.
27. PMID 18047721.
28. PMID 16477324.
29. PMID 20439311.

External links

ENCODE threads Explorer Transcription factor motifs in Nature

Manually Curated TF Binding Motifs for 157 plant species Archived 2016-10-19 at the Wayback Machine

