Gene regulatory network
A gene (or genetic) regulatory network (GRN) is a collection of molecular regulators that interact with each other and with other substances in the cell to govern the
The regulator can be
In single-celled organisms, regulatory networks respond to the external environment, optimising the cell at a given time for survival in this environment. Thus a yeast cell, finding itself in a sugar solution, will turn on genes to make enzymes that process the sugar to alcohol.[2] This process, which we associate with wine-making, is how the yeast cell makes its living, gaining energy to multiply, which under normal circumstances would enhance its survival prospects.
In multicellular animals the same principle has been put in the service of gene cascades that control body-shape.
Overview
At one level, biological cells can be thought of as "partially mixed bags" of biological chemicals – in the discussion of gene regulatory networks, these chemicals are mostly the messenger RNAs (mRNAs) and proteins that arise from gene expression. These mRNAs and proteins interact with each other with various degrees of specificity. Some diffuse around the cell. Others are bound to cell membranes, interacting with molecules in the environment. Still others pass through cell membranes and mediate long-range signals to other cells in a multicellular organism. These molecules and their interactions make up a gene regulatory network. A typical gene regulatory network looks something like this:
The nodes of this network can represent genes, proteins, mRNAs, protein/protein complexes or cellular processes. Nodes that are depicted as lying along vertical lines are associated with the cell/environment interfaces, while the others are free-floating and can diffuse. Edges between nodes represent interactions between the nodes, which can correspond to individual molecular reactions between DNA, mRNA, miRNA and proteins, or to molecular processes through which the products of one gene affect those of another. Because experimentally obtained information is often lacking, some reactions are not modeled at such a fine level of detail. These interactions can be inductive (usually represented by arrowheads or the + sign), with an increase in the concentration of one leading to an increase in the other; inhibitory (represented with filled circles, blunt arrows or the minus sign), with an increase in one leading to a decrease in the other; or dual, when the regulator can activate or inhibit the target node depending on the circumstances. The nodes can regulate themselves directly or indirectly, creating feedback loops, which form cyclic chains of dependencies in the topological network. The network structure is an abstraction of the system's molecular or chemical dynamics, describing the many ways in which one substance affects all the others to which it is connected. In practice, such GRNs are inferred from the biological literature on a given system and represent a distillation of the collective knowledge about a set of related biochemical reactions. To speed up the manual curation of GRNs, some recent efforts use text mining, curated databases, network inference from massive data, model checking and other information extraction technologies for this purpose.[4]
Genes can be viewed as nodes in the network, with input being proteins such as
Mathematical models of GRNs have been developed to capture the behavior of the system being modeled, and in some cases to generate predictions that correspond with experimental observations. In other cases, models have made accurate novel predictions that can be tested experimentally, thus suggesting new experimental approaches that might not otherwise be considered when designing a laboratory protocol. Modeling techniques include ordinary differential equations (ODEs), Boolean networks, Petri nets, Bayesian networks, graphical Gaussian network models, stochastic models, and process calculi.[6] Conversely, techniques have been proposed for generating models of GRNs that best explain a set of time series observations. It has been shown that ChIP-seq signals of histone modifications are more strongly correlated with transcription factor motifs at promoters than RNA levels are.[7] Hence it is proposed that time-series histone modification ChIP-seq could provide more reliable inference of gene regulatory networks than methods based on expression levels.
Structure and evolution
Global feature
Gene regulatory networks are generally thought to be made up of a few highly connected
There are primarily two ways that networks can evolve, both of which can occur simultaneously. The first is that network topology can change: nodes (genes) can be added or removed, or parts of the network (modules) can be expressed in different contexts. The Drosophila Hippo signaling pathway provides a good example. The Hippo signaling pathway controls both mitotic growth and post-mitotic cellular differentiation.[11] It was found that the network in which the Hippo signaling pathway operates differs between these two functions, which in turn changes the behavior of the pathway. This suggests that the Hippo signaling pathway operates as a conserved regulatory module that can be used for multiple functions depending on context.[11] Thus, changing network topology can allow a conserved module to serve multiple functions and alter the final output of the network. The second way networks can evolve is by changing the strength of interactions between nodes, such as how strongly a transcription factor binds to a cis-regulatory element. Such variation in the strength of network edges has been shown to underlie between-species variation in vulva cell fate patterning of Caenorhabditis worms.[12]
Local feature
Another widely cited characteristic of gene regulatory networks is their abundance of certain repetitive sub-networks known as network motifs. Network motifs can be regarded as repetitive topological patterns that emerge when a big network is divided into small blocks. Previous analyses found several types of motifs that appear more often in gene regulatory networks than in randomly generated networks.[13][14][15] One such motif is the feed-forward loop, which consists of three nodes. This motif is the most abundant among all possible motifs made up of three nodes, as shown in the gene regulatory networks of fly, nematode, and human.[15]
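As a rough illustration of what counting a three-node motif involves, the following minimal sketch counts feed-forward loops in a small directed graph. It assumes the usual definition of the motif (edges x→y, y→z and x→z); the function name `count_feed_forward_loops` and the toy edge list are hypothetical, and the sketch does not perform the comparison against randomly generated networks that the cited analyses rely on.

```python
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    # Brute force over all ordered triples; fine for toy graphs only.
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set:
            count += 1
    return count

# Hypothetical toy network: X regulates Z both directly and through Y.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffl_count = count_feed_forward_loops(toy_edges)
```

Judging enrichment would additionally require counting the same motif in an ensemble of randomized networks with comparable degree distributions, which is outside the scope of this sketch.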
The enriched motifs have been proposed to follow
On the other hand, some researchers hypothesize that the enrichment of network motifs is non-adaptive.[23] In other words, gene regulatory networks can evolve to a similar structure without specific selection on the proposed input-output behavior. Support for this hypothesis often comes from computational simulations. For example, fluctuations in the abundance of feed-forward loops in a model that simulates the evolution of gene regulatory networks by randomly rewiring nodes may suggest that the enrichment of feed-forward loops is a side-effect of evolution.[24] In another model of gene regulatory network evolution, the ratio of the frequencies of gene duplication and gene deletion strongly influences network topology: certain ratios lead to the enrichment of feed-forward loops and create networks that show features of hierarchical scale-free networks. De novo evolution of coherent type 1 feed-forward loops has been demonstrated computationally in response to selection for their hypothesized function of filtering out a short spurious signal, supporting adaptive evolution, but for non-idealized noise, a dynamics-based system of feed-forward regulation with a different topology was instead favored.[25]
Bacterial regulatory networks
Regulatory networks allow bacteria to adapt to almost every environmental niche on earth.[26][27] Bacteria use a network of interactions among diverse types of molecules, including DNA, RNA, proteins and metabolites, to regulate gene expression. In bacteria, the principal function of regulatory networks is to control the response to environmental changes, for example nutritional status and environmental stress.[28] A complex organization of networks permits the microorganism to coordinate and integrate multiple environmental signals.[26]
One example of stress is when the environment suddenly becomes poor in nutrients. This triggers a complex adaptation process in bacteria such as E. coli. After this environmental change, thousands of genes change expression level. However, these changes are predictable from the topology and logic of the gene network[29] that is reported in RegulonDB. Specifically, on average, the response strength of a gene was predictable from the difference between the numbers of activating and repressing input transcription factors of that gene.[29]
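The quantity mentioned above, the number of activating inputs minus the number of repressing inputs, is straightforward to compute from a signed edge list. The sketch below is only an illustration: the function name `net_regulatory_input`, the `(regulator, target, sign)` tuple format and the gene names are all hypothetical, and the code computes the difference itself rather than reproducing the cited prediction of response strength.

```python
from collections import defaultdict

def net_regulatory_input(signed_edges):
    """Return, per target gene, (# activating inputs) - (# repressing inputs).

    Each edge is (regulator, target, sign) with sign +1 for activation
    and -1 for repression.
    """
    score = defaultdict(int)
    for regulator, target, sign in signed_edges:
        if sign not in (1, -1):
            raise ValueError(f"sign must be +1 or -1, got {sign!r}")
        score[target] += sign
    return dict(score)

# Hypothetical toy network; these names do not refer to real E. coli genes.
toy_edges = [
    ("tf1", "geneA", +1),
    ("tf2", "geneA", +1),
    ("tf3", "geneA", -1),
    ("tf1", "geneB", -1),
]
net_input = net_regulatory_input(toy_edges)
```

In this toy example `geneA` has two activators and one repressor, so its net input is positive, while `geneB` has a single repressor and a negative net input.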
Modelling
Coupled ordinary differential equations
It is common to model such a network with a set of coupled ordinary differential equations (ODEs) or SDEs, describing the reaction kinetics of the constituent parts. Suppose that our regulatory network has $N$ nodes, and let $S_1(t), S_2(t), \ldots, S_N(t)$ represent the concentrations of the $N$ corresponding substances at time $t$. Then the temporal evolution of the system can be described approximately by
$$\frac{dS_j}{dt} = f_j(S_1, S_2, \ldots, S_N)$$
where the functions $f_j$ express the dependence of $S_j$ on the concentrations of other substances present in the cell. The functions $f_j$ are ultimately derived from basic
By solving for the fixed point of the system:
$$\frac{dS_j}{dt} = 0$$
for all $j$, one obtains (possibly several) concentration profiles of proteins and mRNAs that are theoretically sustainable (though not necessarily
Boolean network
The following example illustrates how a Boolean network can model a GRN together with its gene products (the outputs) and the substances from the environment that affect it (the inputs). Stuart Kauffman was amongst the first biologists to use the metaphor of Boolean networks to model genetic regulatory networks.[31][32]
- Each gene, each input, and each output is represented by a node in a directed graph in which there is an arrow from one node to another if and only if there is a causal link between the two nodes.
- Each node in the graph can be in one of two states: on or off.
- For a gene, "on" corresponds to the gene being expressed; for inputs and outputs, "on" corresponds to the substance being present.
- Time is viewed as proceeding in discrete steps. At each step, the new state of a node is a Boolean function of the prior states of the nodes with arrows pointing towards it.
The validity of the model can be tested by comparing simulation results with time series observations. A partial validation of a Boolean network model can also come from testing the predicted existence of a yet unknown regulatory connection between two particular transcription factors that each are nodes of the model.[33]
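The update scheme listed above (two-state nodes, discrete time, each new state a Boolean function of the prior states of the node's inputs) can be sketched in a few lines. The network below is hypothetical and chosen only for illustration: an environmental input `sugar` switches on `gene`, whose product `enzyme` appears one step later and induces a `repressor` that feeds back on the gene. The names `step`, `simulate` and `rules` are likewise assumptions, not part of any cited model.

```python
def step(state, rules):
    """Synchronous update: each new state depends only on the prior states."""
    return {node: bool(rule(state)) for node, rule in rules.items()}

def simulate(state, rules, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory

# Hypothetical rules; each lambda reads the previous state `s`.
rules = {
    "sugar": lambda s: s["sugar"],                        # input held constant
    "gene": lambda s: s["sugar"] and not s["repressor"],  # on unless repressed
    "enzyme": lambda s: s["gene"],                        # output follows gene
    "repressor": lambda s: s["enzyme"],                   # negative feedback
}
initial = {"sugar": True, "gene": False, "enzyme": False, "repressor": False}
trajectory = simulate(initial, rules, 6)
```

With these rules the gene switches on at the first step, is shut off again once the repressor has accumulated, and the network returns to its initial state after six steps, illustrating how negative feedback produces a cyclic attractor in a Boolean model. Simulated trajectories of this kind are what would be compared against time series observations.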
Continuous networks
Continuous network models of GRNs are an extension of the Boolean networks described above. Nodes still represent genes and connections between them regulatory influences on gene expression. Genes in biological systems display a continuous range of activity levels and it has been argued that using a continuous representation captures several properties of gene regulatory networks not present in the Boolean model.
Stochastic gene networks
Recent experimental results[40]
Since some processes, such as gene transcription, involve many reactions and could not be correctly modeled as an instantaneous reaction in a single step, it was proposed to model these reactions as single step multiple delayed reactions in order to account for the time it takes for the entire process to be complete.[47]
From here, a set of reactions was proposed[48] that allows generating GRNs. These are then simulated using a modified version of the Gillespie algorithm that can simulate multiple time-delayed reactions (chemical reactions where each of the products is given a time delay that determines when it will be released into the system as a "finished product").
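To make the idea of delayed products concrete, the following minimal sketch implements one common way of adding a wait list to the Gillespie algorithm. It is not the algorithm of the cited work or of any particular simulator: the single hypothetical reaction consumes a free promoter `Pro` at rate `k_init` and schedules the promoter and an `RNA` product for release after fixed delays. All names and parameter values are assumptions made for illustration.

```python
import heapq
import random

def delayed_ssa(k_init, delay_promoter, delay_rna, t_end, seed=0):
    """Simulate Pro -> Pro(delay_promoter) + RNA(delay_rna) with a wait list."""
    rng = random.Random(seed)
    t = 0.0
    counts = {"Pro": 1, "RNA": 0}
    waiting = []      # min-heap of (release_time, species)
    initiations = 0
    while True:
        propensity = k_init * counts["Pro"]
        # Exponential waiting time to the next initiation, if one is possible.
        dt = rng.expovariate(propensity) if propensity > 0 else float("inf")
        next_release = waiting[0][0] if waiting else float("inf")
        if min(t + dt, next_release) > t_end:
            break
        if next_release <= t + dt:
            # A delayed product is released first; the sampled dt is discarded
            # and redrawn, which is valid because exponentials are memoryless.
            t, species = heapq.heappop(waiting)
            counts[species] += 1
        else:
            # The reaction fires: consume the promoter, schedule both products.
            t += dt
            counts["Pro"] -= 1
            initiations += 1
            heapq.heappush(waiting, (t + delay_promoter, "Pro"))
            heapq.heappush(waiting, (t + delay_rna, "RNA"))
    return counts, initiations, waiting

counts, initiations, waiting = delayed_ssa(1.0, 0.5, 2.0, 100.0, seed=1)
```

Because the promoter is unavailable until its own delay has elapsed, the delay limits how often transcription can initiate, which an instantaneous single-step reaction would not capture.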
For example, basic transcription of a gene can be represented by the following single-step reaction (RNAP is the RNA polymerase, RBS is the RNA ribosome binding site, and Pro i is the promoter region of gene i):
Furthermore, there seems to be a trade-off between the noise in gene expression, the speed with which genes can switch, and the metabolic cost associated with their functioning. More specifically, for any given level of metabolic cost, there is an optimal trade-off between noise and processing speed, and increasing the metabolic cost leads to better speed-noise trade-offs.[49][50][51]
One work proposed a simulator (SGNSim, Stochastic Gene Networks Simulator)[52] that can model GRNs in which transcription and translation are modeled as multiple time-delayed events, and whose dynamics are driven by a stochastic simulation algorithm (SSA) able to deal with multiple time-delayed events. The time delays can be drawn from several distributions and the reaction rates from complex functions or from physical parameters. SGNSim can generate ensembles of GRNs within a set of user-defined parameters, such as topology. It can also be used to model specific GRNs and systems of chemical reactions. Genetic perturbations such as gene deletions, gene over-expression, insertions and frame-shift mutations can also be modeled.
The GRN is created from a graph with the desired topology, imposing in-degree and out-degree distributions. Gene promoter activities are affected by the expression products of other genes, which act as inputs in the form of monomers or combined into multimers, and are set as direct or indirect. Next, each direct input is assigned to an operator site, and different transcription factors can be allowed, or not, to compete for the same operator site, while indirect inputs are given a target. Finally, a function is assigned to each gene, defining the gene's response to a combination of transcription factors (promoter state). The transfer functions (that is, how genes respond to a combination of inputs) can be assigned to each combination of promoter states as desired.
In other recent work, multiscale models of gene regulatory networks have been developed that focus on synthetic biology applications. Simulations have been used that model all biomolecular interactions in transcription, translation, regulation, and induction of gene regulatory networks, guiding the design of synthetic systems.[53]
Prediction
Other work has focused on predicting the gene expression levels in a gene regulatory network. The approaches used to model gene regulatory networks have been constrained to be interpretable and, as a result, are generally simplified versions of the network. For example, Boolean networks have been used because of their simplicity and ability to handle noisy data, but they lose information by representing genes in binary form. Likewise, artificial neural networks have been used without a hidden layer so that they can be interpreted, losing the ability to model higher-order correlations in the data. A model that is not constrained to be interpretable can be more accurate. Being able to predict gene expression more accurately provides a way to explore how drugs affect a system of genes and to find which genes are interrelated in a process. This has been encouraged by the DREAM competition,[54] which seeks the best prediction algorithms.[55] Some other recent work has used artificial neural networks with a hidden layer.[56]
Applications
Multiple sclerosis
There are three classes of multiple sclerosis: relapsing-remitting (RRMS), primary progressive (PPMS) and secondary progressive (SPMS). Gene regulatory networks (GRNs) play a vital role in understanding the disease mechanism across these three classes.[57]
See also
- Body plan
- Cis-regulatory module
- GeneNetwork (database)
- Morphogen
- Operon
- Synexpression
- Systems biology
- Weighted gene co-expression network analysis
References
- PMID 8930119.
- S2CID 4841222.
- PMID 15809445.
- Leitner F, Krallinger M, Tripathi S, Kuiper M, Lægreid A, Valencia A (July 2013). "Mining cis-regulatory transcription networks from literature". Proceedings of BioLINK SIG 2013: 5–12.
- PMID 28186191.
- PMID 27641093.
- PMID 23770639.
- S2CID 10950726.
- S2CID 8612268.
- PMID 18682703.
- PMID 23989952.
- PMID 21458263.
- S2CID 2180121.
- S2CID 4841222.
- PMID 25164757.
- S2CID 959172.
- PMID 14530388.
- PMID 20005851.
- PMID 16406067.
- PMID 14607112.
- PMID 20005849.
- PMID 24278038.
- S2CID 11839414.
- PMID 16840361.
- PMID 31160574.
- ISBN 978-1-908230-03-4.
- ISBN 978-1-908230-08-9.
- ISBN 978-1-908230-04-1.
- PMID 35748858.
- S2CID 12809260.
- ISBN 978-0-19-505811-6.
- PMID 5803332.
- PMID 25398016.
- PMID 11395518.
- S2CID 8664677.
- Schilstra MJ, Bolouri H (2 January 2002). "Modelling the Regulation of Gene Expression in Genetic Regulatory Networks". Biocomputation group, University of Hertfordshire. Archived from the original on 13 October 2007.
- CiteSeerX 10.1.1.72.5016.
- CiteSeerX 10.1.1.71.8768.
- Knabe JF, Schilstra MJ, Nehaniv CL (2008). "Evolution and Morphogenesis of Differentiated Multicellular Organisms: Autonomously Generated Diffusion Gradients for Positional Information" (PDF). Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems. MIT Press.
- S2CID 10845628.
- S2CID 4347106.
- PMID 9691025.
- PMID 16179466.
- S2CID 41632754.
- S2CID 345059.
- .
- S2CID 21456299.
- S2CID 6629364.
- PMID 20007173.
- PMID 21256918.
- S2CID 14274912.
- PMID 17267430.
- PMID 17986347.
- "The DREAM Project". Columbia University Center for Multiscale Analysis Genomic and Cellular Networks (MAGNet).
- PMID 20169069.
- Smith MR, Clement M, Martinez T, Snell Q (2010). "Time Series Gene Expression Prediction using Neural Networks with Hidden Layers" (PDF). Proceedings of the 7th Biotechnology and Bioinformatics Symposium (BIOT 2010). pp. 67–69.
- PMID 31484947.
Further reading
- Bolouri H, ISBN 978-0-262-02481-5.
- Kauffman SA (March 1969). "Metabolic stability and epigenesis in randomly constructed genetic nets". Journal of Theoretical Biology. 22 (3): 437–467. PMID 5803332.
External links
- Plant Transcription Factor Database and Plant Transcriptional Regulation Data and Analysis Platform
- Open source web service for GRN analysis
- BIB: Yeast Biological Interaction Browser
- Graphical Gaussian models for genome data – Inference of gene association networks with GGMs
- A bibliography on learning causal networks of gene interactions – regularly updated, contains hundreds of links to papers from bioinformatics, statistics, machine learning.
- https://web.archive.org/web/20060907074456/http://mips.gsf.de/proj/biorel/ BIOREL is a web-based resource for quantitative estimation of the gene network bias in relation to available database information about gene activity/function/properties/associations/interactio.
- Evolving Biological Clocks using Genetic Regulatory Networks – Information page with model source code and Java applet.
- Engineered Gene Networks
- Tutorial: Genetic Algorithms and their Application to the Artificial Evolution of Genetic Regulatory Networks
- BEN: a web-based resource for exploring the connections between genes, diseases, and other biomedical entities
- Global protein-protein interaction and gene regulation network of Arabidopsis thaliana Archived 16 March 2016 at the Wayback Machine