Graph neural network

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.^[1]^[2]^[3]^[4]^[5]

Figure: Basic building blocks of a graph neural network (GNN): (1) permutation equivariant layer, (2) local pooling layer, (3) global pooling (or readout) layer. Colors indicate features.

In the more general subject of "geometric

words or tokens in a passage of natural language text.

The key design element of GNNs is the use of pairwise message passing, such that graph nodes iteratively update their representations by exchanging information with their neighbors. Since their inception, several different GNN architectures have been proposed,^[2]^[3]^[7]^[8]^[9] which implement different flavors of message passing,^[6]^[10] starting with recursive^[2] or convolutional constructive^[3] approaches. As of 2022, whether it is possible to define GNN architectures "going beyond" message passing, or if every GNN can be built on message passing over suitably defined graphs, is an open research question.^[11]

Relevant application domains for GNNs include NP-hard combinatorial optimization problems.^[19]

Several open source libraries implementing graph neural networks are available, such as PyTorch Geometric^[20] (PyTorch), TensorFlow GNN^[21] (TensorFlow), jraph^[22] (Google JAX), and GraphNeuralNetworks.jl^[23]/GeometricFlux.jl^[24] (Julia, Flux).

Architecture

The architecture of a generic GNN implements the following fundamental layers:^[6]

Permutation equivariant: a permutation equivariant layer maps a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via pairwise message passing between graph nodes.^[6]^[11] Intuitively, in a message passing layer, nodes update their representations by aggregating the messages received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
Local pooling: a local pooling layer coarsens the graph via downsampling. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in convolutional neural networks. Examples include k-nearest neighbours pooling, top-k pooling,^[25] and self-attention pooling.^[26]
Global pooling: a global pooling layer, also known as a readout layer, provides a fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output.^[27] Examples include element-wise sum, mean or maximum.
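The permutation invariance required of a readout layer can be checked numerically. The following is a minimal sketch, assuming NumPy and toy random features; the function name `global_pool` is hypothetical and not taken from any library.

```python
import numpy as np

def global_pool(X, mode="sum"):
    """Readout: collapse an (n_nodes, n_features) matrix into one feature vector."""
    if mode == "sum":
        return X.sum(axis=0)
    if mode == "mean":
        return X.mean(axis=0)
    if mode == "max":
        return X.max(axis=0)
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 nodes with 3 features each (toy data)
perm = rng.permutation(5)        # an arbitrary relabelling of the nodes

# The readout is the same whichever order the nodes are listed in.
readout_invariant = all(
    np.allclose(global_pool(X, m), global_pool(X[perm], m))
    for m in ("sum", "mean", "max")
)
```

The output has one entry per feature regardless of how many nodes the graph contains, which is what makes it a fixed-size graph representation.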

It has been demonstrated that GNNs cannot be more expressive than the Weisfeiler–Leman graph isomorphism test.^[28]^[29] In practice, this means that there exist different graph structures (e.g., molecules with the same atoms but different bonds) that cannot be distinguished by GNNs. More powerful GNNs operating on higher-dimension geometries such as simplicial complexes can be designed.^[30]^[31]^[10] As of 2022, whether or not future architectures will overcome the message passing primitive is an open research question.^[11]
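A pair of graphs that neighbourhood aggregation cannot tell apart can be illustrated with colour refinement (the 1-dimensional Weisfeiler-Leman test), which is known to upper-bound the distinguishing power of message passing GNNs. The sketch below is an illustrative assumption rather than a reference implementation; the function `wl_colours` and the example graphs are hypothetical.

```python
from collections import Counter

def wl_colours(adj, rounds=3):
    """Colour refinement; adj maps each node to the list of its neighbours.
    Returns the multiset of node colours after the given number of rounds."""
    colours = {u: 0 for u in adj}              # every node starts with the same colour
    for _ in range(rounds):
        # new colour = (own colour, sorted multiset of the neighbours' colours)
        colours = {u: (colours[u], tuple(sorted(colours[v] for v in adj[u])))
                   for u in adj}
    return Counter(colours.values())

# Two non-isomorphic graphs in which every node has exactly two neighbours
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A graph with a different degree profile, for contrast
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

indistinguishable = wl_colours(hexagon) == wl_colours(two_triangles)
distinguishable = wl_colours(hexagon) != wl_colours(path)
```

The hexagon and the two disjoint triangles receive identical colour multisets, so a message passing network with uniform initial node features would also map them to the same representation.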

Message passing layers

Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph. Formally, they can be expressed as message passing neural networks (MPNNs).^[6]

Let $G=(V,E)$ be a graph, where $V$ is the node set and $E$ is the edge set. Let $N_{u}$ be the neighbourhood of some node $u\in V$. Additionally, let $\mathbf {x} _{u}$ be the features of node $u\in V$, and $\mathbf {e} _{uv}$ be the features of edge $(u,v)\in E$. An MPNN layer can be expressed as follows:^[6]

$\mathbf {h} _{u}=\phi \left(\mathbf {x} _{u},\bigoplus _{v\in N_{u}}\psi (\mathbf {x} _{u},\mathbf {x} _{v},\mathbf {e} _{uv})\right)$

where $\phi$ and $\psi$ are differentiable functions (e.g., artificial neural networks), and $\bigoplus$ is a permutation-invariant aggregation operator that can accept an arbitrary number of inputs (e.g., element-wise sum, mean, or max). In particular, $\phi$ and $\psi$ are referred to as update and message functions, respectively. Intuitively, in an MPNN computational block, graph nodes update their representations by aggregating the messages received from their neighbours.

The outputs of one or more MPNN layers are node representations $\mathbf {h} _{u}$ for each node $u\in V$ in the graph. Node representations can be employed for any downstream task, such as node/graph classification or edge prediction.
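The MPNN layer described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the message function $\psi$ and the update function $\phi$ are each taken to be a single linear map followed by tanh, the aggregation is a sum, and all names (`mpnn_layer`, `W_msg`, `W_upd`) are hypothetical.

```python
import numpy as np

def mpnn_layer(X, E, neighbours, W_msg, W_upd):
    """One message passing step with sum aggregation.

    X: (n, d) node features.
    E: dict mapping (u, v) to the edge feature vector e_uv.
    neighbours: dict mapping u to the list of its neighbours N_u.
    """
    n, d_out = X.shape[0], W_upd.shape[1]
    H = np.zeros((n, d_out))
    for u in range(n):
        # psi: one message per neighbour v, computed from (x_u, x_v, e_uv)
        msgs = [np.tanh(np.concatenate([X[u], X[v], E[(u, v)]]) @ W_msg)
                for v in neighbours[u]]
        # aggregation: sum over neighbours (zero vector for an isolated node)
        agg = np.sum(msgs, axis=0) if msgs else np.zeros(W_msg.shape[1])
        # phi: update from x_u and the aggregated message
        H[u] = np.tanh(np.concatenate([X[u], agg]) @ W_upd)
    return H

rng = np.random.default_rng(0)
d, d_e, d_m, d_out = 2, 1, 4, 3
X = rng.normal(size=(3, d))                       # path graph 0 - 1 - 2
neighbours = {0: [1], 1: [0, 2], 2: [1]}
E = {(u, v): np.ones(d_e) for u in neighbours for v in neighbours[u]}
W_msg = rng.normal(size=(2 * d + d_e, d_m))
W_upd = rng.normal(size=(d + d_m, d_out))

H = mpnn_layer(X, E, neighbours, W_msg, W_upd)

# Listing the neighbours of node 1 in a different order gives the same output.
H_reordered = mpnn_layer(X, E, {0: [1], 1: [2, 0], 2: [1]}, W_msg, W_upd)
```

Because the aggregation is a sum, the result does not depend on the order in which neighbours are visited, which is the property the aggregation operator must have.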

Graph nodes in an MPNN update their representation by aggregating information from their immediate neighbours. As such, stacking $n$ MPNN layers means that one node will be able to communicate with nodes that are at most $n$ "hops" away. In principle, to ensure that every node receives information from every other node, one would need to stack a number of MPNN layers equal to the graph diameter. However, stacking many MPNN layers may cause issues such as oversmoothing^[32] and oversquashing.^[33] Oversmoothing refers to the issue of node representations becoming indistinguishable. Oversquashing refers to the bottleneck that is created by squeezing long-range dependencies into fixed-size representations. Countermeasures such as skip connections^[8]^[34] (as in residual neural networks), gated update rules^[35] and jumping knowledge^[36] can mitigate oversmoothing. Modifying the final layer to be a fully-adjacent layer, i.e., by considering the graph as a complete graph, can mitigate oversquashing in problems where long-range dependencies are required.^[33]
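The relation between the number of stacked layers and the number of hops can be made concrete on a small example. The sketch below, assuming NumPy and a hypothetical helper `receptive_field`, tracks which nodes can influence which after a given number of aggregation rounds on a path graph with diameter 4.

```python
import numpy as np

def receptive_field(A, n_layers):
    """Boolean (n, n) matrix whose entry [u, v] is True if node v can influence
    node u after n_layers rounds of neighbour aggregation (u itself included)."""
    n = A.shape[0]
    step = ((A + np.eye(n)) > 0).astype(int)   # one hop, plus the node itself
    reach = np.eye(n, dtype=int)
    for _ in range(n_layers):
        reach = ((reach @ step) > 0).astype(int)
    return reach.astype(bool)

# Path graph 0 - 1 - 2 - 3 - 4 (diameter 4)
A = np.zeros((5, 5), dtype=int)
for u in range(4):
    A[u, u + 1] = A[u + 1, u] = 1

reach_2 = receptive_field(A, 2)   # node 0 sees nodes 0, 1 and 2 only
reach_3 = receptive_field(A, 3)   # node 0 still cannot see node 4
reach_4 = receptive_field(A, 4)   # every node sees every other node
```

Only when the number of rounds reaches the diameter does every entry become True, matching the statement that a number of layers equal to the graph diameter is needed for full communication.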

Other "flavours" of MPNN have been developed in the literature,^[6] such as graph convolutional networks^[7] and graph attention networks,^[9] whose definitions can be expressed in terms of the MPNN formalism.

Graph convolutional network

The graph convolutional network (GCN) was first introduced by Thomas Kipf and Max Welling in 2017.^[7]

A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be understood as a generalization of convolutional neural networks to graph-structured data.

The formal expression of a GCN layer reads as follows:

$\mathbf {H} =\sigma \left({\tilde {\mathbf {D} }}^{-{\frac {1}{2}}}{\tilde {\mathbf {A} }}{\tilde {\mathbf {D} }}^{-{\frac {1}{2}}}\mathbf {X} \mathbf {\Theta } \right)$

where $\mathbf {H}$ is the matrix of node representations $\mathbf {h} _{u}$, $\mathbf {X}$ is the matrix of node features $\mathbf {x} _{u}$, $\sigma (\cdot )$ is an activation function (e.g., ReLU), ${\tilde {\mathbf {A} }}$ is the graph adjacency matrix with the addition of self-loops, ${\tilde {\mathbf {D} }}$ is the graph degree matrix with the addition of self-loops, and $\mathbf {\Theta }$ is a matrix of trainable parameters.

In particular, let $\mathbf {A}$ be the graph adjacency matrix: then, one can define ${\tilde {\mathbf {A} }}=\mathbf {A} +\mathbf {I}$ and ${\tilde {\mathbf {D} }}_{ii}=\sum _{j\in V}{\tilde {A}}_{ij}$, where $\mathbf {I}$ denotes the identity matrix. This normalization ensures that the eigenvalues of ${\tilde {\mathbf {D} }}^{-{\frac {1}{2}}}{\tilde {\mathbf {A} }}{\tilde {\mathbf {D} }}^{-{\frac {1}{2}}}$ are bounded in the range $(-1,1]$, avoiding numerical instabilities and exploding/vanishing gradients.

A limitation of GCNs is that they do not allow multidimensional edge features $\mathbf {e} _{uv}$.^[7] It is however possible to associate scalar weights $w_{uv}$ to each edge by imposing $A_{uv}=w_{uv}$, i.e., by setting each nonzero entry in the adjacency matrix equal to the weight of the corresponding edge.
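The GCN layer above translates directly into matrix operations. The following is a minimal NumPy sketch, assuming ReLU as the activation and a small weighted graph whose adjacency entries hold the scalar edge weights; the function name `gcn_layer` and the toy values are hypothetical.

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """Compute ReLU(D~^{-1/2} A~ D~^{-1/2} X Theta) with A~ = A + I.
    Also returns the normalised propagation matrix for inspection."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    deg = A_tilde.sum(axis=1)                     # D~_ii = sum_j A~_ij
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ X @ Theta, 0.0), A_hat

# Weighted path graph 0 - 1 - 2, with A_uv set to the edge weight w_uv
A = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))        # 3 nodes, 4 input features
Theta = rng.normal(size=(4, 2))    # stands in for the trainable parameters

H, A_hat = gcn_layer(A, X, Theta)
spectrum = np.linalg.eigvalsh(A_hat)   # A_hat is symmetric
```

Inspecting `spectrum` shows that the largest eigenvalue of the normalised matrix is 1 and that no eigenvalue exceeds 1 in magnitude, so repeated propagation does not blow up the features.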

Graph attention network

The graph attention network (GAT) was introduced by Petar Veličković et al. in 2018.^[9]

A graph attention network combines a graph neural network with an attention layer. The attention layer lets the network focus on the most relevant information in the data instead of weighting all of the data equally.

A multi-head GAT layer can be expressed as follows:

$\mathbf {h} _{u}={\overset {K}{\underset {k=1}{\Big \Vert }}}\sigma \left(\sum _{v\in N_{u}}\alpha _{uv}\mathbf {W} ^{k}\mathbf {x} _{v}\right)$

where $K$ is the number of attention heads, ${\Big \Vert }$ denotes vector concatenation, $\sigma (\cdot )$ is an activation function (e.g., ReLU), $\alpha _{uv}$ are attention coefficients, and $\mathbf {W} ^{k}$ is a matrix of trainable parameters for the $k$-th attention head.

For the final GAT layer, the outputs from each attention head are averaged before the application of the activation function. Formally, the final GAT layer can be written as:

$\mathbf {h} _{u}=\sigma \left({\frac {1}{K}}\sum _{k=1}^{K}\sum _{v\in N_{u}}\alpha _{uv}\mathbf {W} ^{k}\mathbf {x} _{v}\right)$

Attention in machine learning is a technique that mimics cognitive attention. In the context of learning on graphs, the attention coefficient $\alpha _{uv}$ measures how important node $v\in V$ is to node $u\in V$.

Normalized attention coefficients are computed as follows:

$\alpha _{uv}={\frac {\exp({\text{LeakyReLU}}\left(\mathbf {a} ^{T}[\mathbf {W} \mathbf {x} _{u}\Vert \mathbf {W} \mathbf {x} _{v}\Vert \mathbf {e} _{uv}]\right))}{\sum _{z\in N_{u}}\exp({\text{LeakyReLU}}\left(\mathbf {a} ^{T}[\mathbf {W} \mathbf {x} _{u}\Vert \mathbf {W} \mathbf {x} _{z}\Vert \mathbf {e} _{uz}]\right))}}$

where $\mathbf {a}$ is a vector of learnable weights, $\cdot ^{T}$ indicates transposition, and ${\text{LeakyReLU}}$ is a modified ReLU activation function. Attention coefficients are normalized to make them easily comparable across different nodes.^[9]

A GCN can be seen as a special case of a GAT where attention coefficients are not learnable, but fixed and equal to the edge weights $w_{uv}$.
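The attention mechanism can be sketched as a softmax over each node's neighbourhood followed by a weighted sum. The code below is a simplified illustration, assuming NumPy, ReLU as the activation, a LeakyReLU slope of 0.2, neighbourhoods that include the node itself, and no edge features; scores are computed from the layer inputs, and the names `gat_head` and `gat_layer` are hypothetical.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_head(X, neighbours, W, a):
    """One attention head. X: (n, d_in), W: (d_out, d_in), a: (2 * d_out,).
    Returns the aggregated features (before the activation) and the
    normalised attention coefficients alpha[u][v]."""
    Z = X @ W.T                                    # row v holds W x_v
    out = np.zeros((X.shape[0], W.shape[0]))
    alpha = {}
    for u in range(X.shape[0]):
        nbrs = neighbours[u]
        # unnormalised scores LeakyReLU(a^T [W x_u || W x_v])
        scores = np.array([leaky_relu(a @ np.concatenate([Z[u], Z[v]]))
                           for v in nbrs])
        weights = np.exp(scores - scores.max())    # numerically stable softmax
        weights /= weights.sum()
        alpha[u] = dict(zip(nbrs, weights))
        out[u] = weights @ Z[nbrs]                 # sum_v alpha_uv W x_v
    return out, alpha

def gat_layer(X, neighbours, Ws, As):
    """Multi-head layer: apply ReLU to each head, then concatenate the heads."""
    heads = [np.maximum(gat_head(X, neighbours, W, a)[0], 0.0)
             for W, a in zip(Ws, As)]
    return np.concatenate(heads, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}  # path graph with self-loops
K, d_out = 2, 3
Ws = [rng.normal(size=(d_out, 4)) for _ in range(K)]
As = [rng.normal(size=2 * d_out) for _ in range(K)]

H = gat_layer(X, neighbours, Ws, As)               # shape (3, K * d_out)
_, alpha = gat_head(X, neighbours, Ws[0], As[0])
```

For every node the coefficients in `alpha[u]` sum to one, which is the normalisation that makes them comparable across nodes with different numbers of neighbours.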

Gated graph sequence neural network

The gated graph sequence neural network (GGS-NN) was introduced by Yujia Li et al. in 2015.^[35] The GGS-NN extends the GNN formulation by Scarselli et al.^[2] to output sequences. The message passing framework is implemented as an update rule to a gated recurrent unit (GRU) cell.

A GGS-NN can be expressed as follows:

$\mathbf {h} _{u}^{(0)}=\mathbf {x} _{u}\,\Vert \,\mathbf {0}$

$\mathbf {m} _{u}^{(l+1)}=\sum _{v\in N_{u}}\mathbf {\Theta } \mathbf {h} _{v}^{(l)}$

$\mathbf {h} _{u}^{(l+1)}={\text{GRU}}(\mathbf {m} _{u}^{(l+1)},\mathbf {h} _{u}^{(l)})$

where $\Vert$ denotes vector concatenation, $\mathbf {0}$ is a vector of zeros, $\mathbf {\Theta }$ is a matrix of learnable parameters, ${\text{GRU}}$ is a GRU cell, and $l$ denotes the sequence index. In a GGS-NN, the node representations are regarded as the hidden states of a GRU cell. The initial node features $\mathbf {x} _{u}$ are zero-padded up to the hidden state dimension of the GRU cell. The same GRU cell is used for updating representations for each node.
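The recurrence above can be sketched with a hand-written GRU cell. This is a minimal sketch under several assumptions: NumPy only, a standard GRU formulation without bias terms, an unweighted adjacency matrix to express the sum over neighbours, and hypothetical names (`gru_cell`, `ggsnn`, the parameter dictionary keys). It covers only the propagation of node states, not the generation of output sequences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(M, H, p):
    """Standard GRU update applied row-wise: M are inputs, H previous states."""
    z = sigmoid(M @ p["Wz"] + H @ p["Uz"])               # update gate
    r = sigmoid(M @ p["Wr"] + H @ p["Ur"])               # reset gate
    H_cand = np.tanh(M @ p["Wh"] + (r * H) @ p["Uh"])    # candidate state
    return (1.0 - z) * H + z * H_cand

def ggsnn(X, A, Theta, p, n_steps):
    n, d_in = X.shape
    d_hid = Theta.shape[0]
    H = np.hstack([X, np.zeros((n, d_hid - d_in))])      # h^(0) = x || 0
    for _ in range(n_steps):
        M = A @ (H @ Theta.T)      # m_u = sum over neighbours v of Theta h_v
        H = gru_cell(M, H, p)      # the same cell is shared by all nodes
    return H

rng = np.random.default_rng(0)
n, d_in, d_hid = 4, 2, 5
X = rng.normal(size=(n, d_in))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)                # path graph
Theta = 0.1 * rng.normal(size=(d_hid, d_hid))
params = {name: 0.1 * rng.normal(size=(d_hid, d_hid))
          for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

H0 = ggsnn(X, A, Theta, params, n_steps=0)   # just the zero-padded features
H3 = ggsnn(X, A, Theta, params, n_steps=3)
```

With zero steps the function returns the zero-padded input features, and each further step mixes in information from one more hop through the shared GRU cell.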

Local pooling layers

Local pooling layers coarsen the graph via downsampling. We present here several learnable local pooling strategies that have been proposed.^[27] In each case, the input is the initial graph, represented by a matrix $\mathbf {X}$ of node features and the graph adjacency matrix $\mathbf {A}$. The output is the new matrix $\mathbf {X} '$ of node features and the new graph adjacency matrix $\mathbf {A} '$.

Top-k pooling

We first set

$\mathbf {y} ={\frac {\mathbf {X} \mathbf {p} }{\Vert \mathbf {p} \Vert }}$

where $\mathbf {p}$ is a learnable projection vector. The projection vector $\mathbf {p}$ computes a scalar projection value for each graph node.

The top-k pooling layer^[25] can then be formalised as follows:

$\mathbf {X} '=(\mathbf {X} \odot {\text{sigmoid}}(\mathbf {y} ))_{\mathbf {i} }$

$\mathbf {A} '=\mathbf {A} _{\mathbf {i} ,\mathbf {i} }$

where $\mathbf {i} ={\text{top}}_{k}(\mathbf {y} )$ is the subset of nodes with the top-k highest projection scores, $\odot$ denotes element-wise matrix multiplication, and ${\text{sigmoid}}(\cdot )$ is the sigmoid function. In other words, the nodes with the top-k highest projection scores are retained in the new adjacency matrix $\mathbf {A} '$. The ${\text{sigmoid}}(\cdot )$ operation makes the projection vector $\mathbf {p}$ trainable by backpropagation, which otherwise would produce discrete outputs.^[25]
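The top-k pooling step can be sketched as a projection, a selection and a gating. The code below is an illustrative NumPy sketch with a hand-picked projection vector so that the selection is easy to follow; the function name `top_k_pool` and the toy graph are hypothetical.

```python
import numpy as np

def top_k_pool(X, A, p, k):
    y = X @ p / np.linalg.norm(p)        # scalar projection score per node
    idx = np.argsort(-y)[:k]             # indices of the k highest scores
    gate = 1.0 / (1.0 + np.exp(-y))      # sigmoid(y), keeps p in the gradient path
    X_new = (X * gate[:, None])[idx]     # gate the features, keep selected rows
    A_new = A[np.ix_(idx, idx)]          # adjacency restricted to the kept nodes
    return X_new, A_new, idx

X = np.array([[3.0, 0.0],
              [1.0, 0.0],
              [2.0, 0.0],
              [0.0, 0.0]])
# Edges: 0-1, 1-2, 2-3 and 0-2
A = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
p = np.array([1.0, 0.0])                 # stands in for the learnable projection

X_new, A_new, idx = top_k_pool(X, A, p, k=2)
```

Here the scores are 3, 1, 2 and 0, so nodes 0 and 2 are kept, and the pooled graph retains the single edge between them.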

Self-attention pooling

We first set

$\mathbf {y} ={\text{GNN}}(\mathbf {X} ,\mathbf {A} )$

where ${\text{GNN}}$ is a generic permutation equivariant GNN layer (e.g., GCN, GAT, MPNN).

The self-attention pooling layer^[26] can then be formalised as follows:

$\mathbf {X} '=(\mathbf {X} \odot \mathbf {y} )_{\mathbf {i} }$

$\mathbf {A} '=\mathbf {A} _{\mathbf {i} ,\mathbf {i} }$

where $\mathbf {i} ={\text{top}}_{k}(\mathbf {y} )$ is the subset of nodes with the top-k highest self-attention scores, and $\odot$ denotes element-wise matrix multiplication.

The self-attention pooling layer can be seen as an extension of the top-k pooling layer. Differently from top-k pooling, the self-attention scores computed in self-attention pooling account both for the graph features and the graph topology.
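The dependence of the scores on topology can be seen by scoring the same features on two different graphs. The sketch below assumes NumPy and uses a GCN-style layer with a single output and a tanh activation as the scoring GNN, which is one possible choice among many; the names `gcn_scores` and `self_attention_pool` are hypothetical.

```python
import numpy as np

def gcn_scores(X, A, theta):
    """Score each node with a one-output GCN-style layer (tanh is an assumption)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.tanh(A_hat @ X @ theta)        # shape (n,)

def self_attention_pool(X, A, theta, k):
    y = gcn_scores(X, A, theta)
    idx = np.argsort(-y)[:k]                 # nodes with the k highest scores
    X_new = (X * y[:, None])[idx]            # scale features by their scores
    A_new = A[np.ix_(idx, idx)]
    return X_new, A_new, idx

X = np.array([[1.0], [0.0], [0.0], [0.0]])   # only node 0 carries a signal
theta = np.array([1.0])                      # stands in for learnable weights

A_path = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
A_star = np.array([[0, 1, 1, 1],
                   [1, 0, 0, 0],
                   [1, 0, 0, 0],
                   [1, 0, 0, 0]], dtype=float)

y_path = gcn_scores(X, A_path, theta)
y_star = gcn_scores(X, A_star, theta)
X_new, A_new, idx = self_attention_pool(X, A_path, theta, k=2)
```

The node features are identical in both cases, yet `y_path` and `y_star` differ because the scoring layer propagates them along different edges; a projection-only score as in top-k pooling would not change.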

Applications

Protein folding

Graph neural networks are one of the main building blocks of AlphaFold, an artificial intelligence program developed by Google's DeepMind for solving the protein folding problem in biology. AlphaFold achieved first place in several CASP competitions.^[37]^[38]^[36]

Social networks

social relations and item relations.^[39]^[13]

Combinatorial optimization

GNNs are used as fundamental building blocks for several combinatorial optimization algorithms.^[40] Examples include computing shortest paths or Eulerian circuits for a given graph,^[35] deriving chip placements superior or competitive to handcrafted human solutions,^[41] and improving expert-designed branching rules in branch and bound.^[42]

Cyber security

When viewed as a graph, a network of computers can be analyzed with GNNs for anomaly detection. Anomalies within provenance graphs often correlate to malicious activity within the network. GNNs have been used to identify these anomalies on individual nodes^[43] and within paths^[44] to detect malicious processes, or on the edge level^[45] to detect lateral movement.

References

[1] Wu, Lingfei; Cui, Peng; Pei, Jian; Zhao, Liang (2022). "Graph Neural Networks: Foundations, Frontiers, and Applications". Springer Singapore: 725.
[2] S2CID 206756462.
[3] S2CID 17486263.
[4] ISSN 2476-0757.
[5] S2CID 239678898.
[6] arXiv:2104.13478 [cs.LG].
[7] S2CID 206756462.
[8] arXiv:1706.02216 – via Stanford.
[9] arXiv:1710.10903 [stat.ML].
[10] arXiv:2206.00606.
[11] arXiv:2202.11097 [cs.LG].
[12] S2CID 206756462.
[13] S2CID 46949657.
[14] "Stanford Large Network Dataset Collection". snap.stanford.edu. Retrieved 2021-07-05.
[15] doi:10.1101/2023.11.29.569114.
[16] arXiv:1704.01212.
[17] PMID 30746086.
[18] S2CID 88518244.
[19] arXiv:1810.10659.
[20] arXiv:1903.02428 [cs.LG].
[21] "Tensorflow GNN". GitHub. Retrieved 30 June 2022.
[22] "jraph". GitHub. Retrieved 30 June 2022.
[23] Lucibello, Carlo (2021). "GraphNeuralNetworks.jl". Retrieved 2023-09-21.
[24] FluxML/GeometricFlux.jl, FluxML, 2024-01-31, retrieved 2024-02-03.
[25] arXiv:1905.05178 [cs.LG].
[26] arXiv:1904.08082 [cs.LG].
[27] arXiv:2204.07321 [cs.LG].
[28] arXiv:1101.5211 [math.CO].
[29] arXiv:1810.00826 [cs.LG].
[30] arXiv:2103.03212 [cs.LG].
[31] Grady, Leo; Polimeni, Jonathan (2011). Discrete Calculus: Applied Analysis on Graphs for Computational Science (PDF). Springer.
[32] S2CID 202539008.
[33] arXiv:2006.05205 [cs.LG].
[34] arXiv:2105.04550 [cs.LG].
[35] arXiv:1511.05493 [cs.LG].
[36] arXiv:1806.03536 [cs.LG].
[37] Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 30 November 2020.
[38] "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
[39] S2CID 67769538.
[40] arXiv:2102.09544 [cs.LG].
[41] S2CID 235395490.
[42] arXiv:1906.01629 [cs.LG].
[43] S2CID 243847506.
[44] S2CID 211267791.
[45] S2CID 248221601.

External links

https://distill.pub/2021/gnn-intro/
