Graph Attention Network Explained

What Kind of Model Is It?

A GAT computes each node's features not through a standard convolution but as an attention-weighted sum of the features of its neighboring nodes.

---> In short, edge representations are simply expressed as attention weights

It can handle neighborhoods of different sizes while (implicitly) assigning different importance to different nodes within a neighborhood. It does not rely on knowing the entire graph structure in advance, which addresses many of the theoretical issues of conventional spectral-based approaches. Because each edge is represented by a single attention weight and no costly matrix operation (such as inversion) is required, the computation is also efficient.

Objective

Understand GATs (Graph Attention Networks), which have been frequently used among GNNs in recent years.

Prerequisites

Paper: Graph Attention Networks (Veličković et al., ICLR 2018)

Background

In graphs, edge information is important in addition to nodes. However, creating latent representations for edges is computationally expensive. -> Represent edges simply using attention weights.

Method

The overview diagram of GATs is shown below.

arch

The node update equation is expressed as follows:

node_update
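For reference, the single-head update can be written out as below. This follows the notation of the ICLR 2018 paper, which may differ slightly from the figure: \mathcal{N}_i is the neighborhood of node i, \alpha_{ij} is the attention weight on the edge from j to i, \mathbf{W} is a weight matrix shared by all nodes, and \sigma is a nonlinearity.

```latex
\vec{h}'_i = \sigma\!\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij} \, \mathbf{W} \vec{h}_j \right)
```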

In practice, Multi-Head Attention (with K heads) is computed (|| denotes concatenation):

multi_head
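Written out in the paper's notation (which may differ slightly from the figure), each of the K heads has its own weight matrix \mathbf{W}^{k} and its own attention weights \alpha_{ij}^{k}, and the head outputs are concatenated:

```latex
\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\!\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \, \mathbf{W}^{k} \vec{h}_j \right)
```

If each head produces F' features, the concatenated output has K F' features per node. In the paper, the final (prediction) layer averages the heads instead of concatenating them.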

Breaking It Down Further

arch

entire_arch

Interpretation of each equation:

(1) Transform the features of node i through a linear layer

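(2) Concatenate the transformed features of adjacent nodes i and j (|| in the equation denotes concatenation), then compute the unnormalized attention score (energy) with a linear layer followed by LeakyReLU
This is additive attention, as opposed to dot-product attention

(3) Normalize the scores over each node's neighbors with softmax to obtain the attention weights

(4) The features of node i at layer (l+1) are the sum of the transformed neighbor features from (1), weighted by the attention weights from (3), passed through a nonlinearity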
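Steps (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a dense boolean adjacency matrix, adds self-loops so every node attends to itself, uses ELU as the output nonlinearity, and uses a LeakyReLU slope of 0.2. The names `gat_layer`, `multi_head_gat`, `leaky_relu`, and `elu` are hypothetical helpers introduced here.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    return np.where(x > 0, x, negative_slope * x)

def elu(x):
    # ELU is an assumed choice for the output nonlinearity sigma.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def gat_layer(h, adj, W, a, activation=elu):
    """Single-head GAT layer (sketch).

    h:   (N, F_in) node features
    adj: (N, N) adjacency, adj[i, j] nonzero if j is a neighbor of i
    W:   (F_in, F_out) shared weight matrix
    a:   (2 * F_out,) attention vector
    Returns the updated features (N, F_out) and the attention matrix (N, N).
    """
    n, f_out = h.shape[0], W.shape[1]
    # Assumption: each node is included in its own neighborhood (self-loop).
    adj = adj.astype(bool) | np.eye(n, dtype=bool)

    # (1) z_i = W h_i
    z = h @ W
    # (2) e_ij = LeakyReLU(a^T [z_i || z_j]).
    # The dot product with a concatenation splits into two per-node terms.
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    # (3) softmax over neighbors only: non-edges get a score of -inf.
    e = np.where(adj, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)
    # (4) h_i' = sigma(sum_j alpha_ij z_j)
    return activation(alpha @ z), alpha

def multi_head_gat(h, adj, Ws, attn_vecs):
    """Run K independent heads and concatenate their outputs."""
    heads = [gat_layer(h, adj, W, a)[0] for W, a in zip(Ws, attn_vecs)]
    return np.concatenate(heads, axis=1)

# Toy graph: node 1 is connected to nodes 0, 2, and 3.
rng = np.random.default_rng(0)
N, F_in, F_out, K = 4, 3, 2, 3
h = rng.normal(size=(N, F_in))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=bool)
Ws = [rng.normal(size=(F_in, F_out)) for _ in range(K)]
attn_vecs = [rng.normal(size=2 * F_out) for _ in range(K)]

out, alpha = gat_layer(h, adj, Ws[0], attn_vecs[0])
multi = multi_head_gat(h, adj, Ws, attn_vecs)
```

Each row of `alpha` sums to 1 and is zero for non-neighbors, so nodes with different numbers of neighbors are handled by the same code path. A practical implementation would typically work on an edge list rather than a dense N x N matrix.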

Experiments

Transductive Learning

The graph used for training and testing is the same (e.g., predicting the labels of nodes that already exist in the graph but whose labels are unknown)

  • Three citation network benchmark datasets were used (Cora, Citeseer, Pubmed)

Inductive Learning

The graph used for training and testing may differ (e.g., predicting new edges or nodes, evaluation on different graphs)

  • A dataset representing protein-protein interactions (PPI dataset)

Generally, inductive settings require stronger generalization performance
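The difference between the two settings is mostly a difference in how the data is split. The sketch below illustrates this with random toy graphs. It does not use the actual benchmark datasets, and `random_graph` is a hypothetical helper introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_graph(num_nodes, num_classes=3, p=0.3):
    """Hypothetical helper: random symmetric adjacency and random node labels."""
    upper = np.triu(rng.random((num_nodes, num_nodes)) < p, k=1)
    adj = upper | upper.T
    labels = rng.integers(0, num_classes, size=num_nodes)
    return adj, labels

# Transductive: a single graph. All nodes and edges are visible during
# training, but the loss is computed only on nodes selected by train_mask.
adj, labels = random_graph(10)
perm = rng.permutation(10)
train_mask = np.zeros(10, dtype=bool)
train_mask[perm[:6]] = True
test_mask = ~train_mask  # same graph, labels hidden during training

# Inductive: the split is over whole graphs. Test graphs (their nodes and
# edges) are never seen during training.
graphs = [random_graph(n) for n in (8, 12, 9)]
train_graphs, test_graphs = graphs[:2], graphs[2:]
```

In the transductive case the model can still use the features and edges of test nodes during training; in the inductive case it cannot, which is why the inductive setting demands more generalization.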

entire_arch

  • Visualization examples

The visualization shows that the simple idea of representing strong connections between nodes with large attention weights works as intended

visualization

  • Visualization of node classification per epoch (as a side note)

training_vis

References

https://www.slideshare.net/takahirokubo7792/graph-attention-network