Graph-structured data (GSD): data sets in which data samples (nodes) exhibit inter-connectivity or relationships (edges).
We describe such data with graphs.
Graphs are defined by a set of vertices and the edges connecting them.
A simple, undirected graph with $n=4$ vertices (nodes), each with $C = 1$ feature, connected by $E = 4$ edges.
Need a way to describe connectivity...
Indices run over graph vertices: $i \in [1,n]$.
The adjacency matrix $A$ defines the connectivity of the graph:
$$ A_{ij} = \begin{cases} 1 & \text{if edge from vertex } i \text{ to } j \\ 0 & \text{otherwise} \end{cases} $$
The degree matrix $D$ is diagonal, giving the number of edges at each vertex:
$$ D_{ii} = \sum_j A_{ij} $$
# Adjacency matrix A, Degree matrix D
import numpy as np

A = np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
              [0., 1., 0., 1.], [0., 1., 1., 0.]])
D = np.diag(A.sum(axis=1))  # D_ii = sum_j A_ij

print(' A \t\t D')
for j in range(A.shape[1]):
    print('{} {}'.format(A[:, j], D[:, j]))
print(' Symmetric \t Diagonal')
 A 		 D
[0. 1. 0. 0.] [1. 0. 0. 0.]
[1. 0. 1. 1.] [0. 3. 0. 0.]
[0. 1. 0. 1.] [0. 0. 2. 0.]
[0. 1. 1. 0.] [0. 0. 0. 2.]
 Symmetric 	 Diagonal
Can we build a predictive model on a GSD set in which only a fraction of the nodes are labeled?
One solution incorporates $A$ via a regularization term in the objective function:
$$ \mathcal{L} = \underbrace{\mathcal{L}_0}_{\text{Supervised}} + \underbrace{\lambda \, \mathcal{L}_{\text{reg}}}_{\text{Unsupervised}} $$
Supervised Component: $\mathcal{L}_0$, the loss evaluated on the labeled nodes.
Unsupervised Component: $\mathcal{L}_{\text{reg}}$, a graph-based regularizer that penalizes differing predictions on connected nodes.
Incorporating $A$ via regularization:
$$ \begin{align} \mathcal{L} &= \mathcal{L}_0 + \lambda \, \mathcal{L}_{\text{reg}} \\ \\ &= \mathcal{L}_0 + \lambda \ \sum_{i,j} A_{ij} \ || \ f(X_i) \ - \ f(X_j) \ ||^2 \end{align} $$
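The regularizer is the graph Laplacian quadratic form in disguise: $\sum_{i,j} A_{ij} \, \| f(X_i) - f(X_j) \|^2 = 2 \, f(X)^T (D - A) \, f(X)$. A minimal numpy sketch checking this on the example graph above, where the node outputs `f_X` are hypothetical values chosen for illustration:

```python
import numpy as np

# Example graph from above (n = 4, E = 4)
A = np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
              [0., 1., 0., 1.], [0., 1., 1., 0.]])
D = np.diag(A.sum(axis=1))

# Hypothetical model outputs f(X_i), one scalar per node
f_X = np.array([0.9, 0.8, 0.1, 0.2])

# Direct evaluation of the regularizer: sum_ij A_ij ||f(X_i) - f(X_j)||^2
L_reg = sum(A[i, j] * (f_X[i] - f_X[j]) ** 2
            for i in range(4) for j in range(4))

# Equivalent quadratic form with the graph Laplacian L = D - A
L_quad = 2.0 * f_X @ (D - A) @ f_X
assert np.isclose(L_reg, L_quad)
```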
Two main contributions:
Embed $A$ inside the neural network structure:
$$ f(X) \rightarrow f(X,A) $$
Consider a convolution parametrized by the characteristic spectrum of the graph structure:

$$ \text{Conv} \left( X \right)= U \ g_{\theta} (\Lambda) \ U^{T} X $$

where $L_{\text{norm}} = U \Lambda U^{T}$ is the eigendecomposition of the normalized graph Laplacian (defined below) and $g_{\theta}(\Lambda)$ is a learnable filter acting on its eigenvalues.
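A minimal numpy sketch of this spectral convolution on the example graph; the low-pass filter $g_{\theta}(\lambda) = e^{-\lambda}$ and the feature values are illustrative choices, not from the paper:

```python
import numpy as np

A = np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
              [0., 1., 0., 1.], [0., 1., 1., 0.]])
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)

# Normalized Laplacian and its eigendecomposition L_norm = U diag(lam) U^T
L_norm = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt
lam, U = np.linalg.eigh(L_norm)

X = np.array([[1.], [2.], [3.], [4.]])  # n x C node features (C = 1)

# Illustrative spectral filter, e.g. g(lam) = exp(-lam)
g = np.exp(-lam)

# Conv(X) = U g(Lambda) U^T X: project onto the graph Fourier basis,
# scale each frequency component, project back
conv_X = U @ np.diag(g) @ U.T @ X
```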
Potential issues: computing the eigendecomposition is expensive for large graphs, and each multiplication with $U$ costs $\mathcal{O}(n^2)$.
Solution: approximate $g_{\theta}$ by a $k$-th-order polynomial $g_{\theta'} \approx g_{\theta}$:

$$ \text{Conv} \left( X \right) \approx U \ \left( \theta'_0 + \theta'_1 \ \Lambda \ + \cdots + \ \theta'_k \ \Lambda^k \right) \ U^{T} \ X $$

Since $U \Lambda^m U^{T} = L_{\text{norm}}^m$, the polynomial filter can be evaluated directly on the Laplacian, with no eigendecomposition required:

$$ g_{\theta'}(\Lambda) \rightarrow g_{\theta'}(L_{\text{norm}}) = \theta'_0 + \theta'_1 \ L_{\text{norm}} \ + \cdots + \ \theta'_k \ L_{\text{norm}}^k $$
The complexity of evaluating $g_{\theta'}$ then scales linearly with the number of edges, $\mathcal{O}(k \ E) < \mathcal{O}(n^2)$, since multiplying by the sparse $L_{\text{norm}}$ only touches existing edges.
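To make the scaling concrete, here is a minimal sketch that evaluates the polynomial filter with sparse matrix products, so each additional power of $L_{\text{norm}}$ costs one $\mathcal{O}(E)$ multiplication; the coefficients `theta` are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, identity

A = csr_matrix(np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
                         [0., 1., 0., 1.], [0., 1., 1., 0.]]))
deg = np.asarray(A.sum(axis=1)).ravel()
L_norm = identity(4) - diags(deg ** -0.5) @ A @ diags(deg ** -0.5)

X = np.array([[1.], [2.], [3.], [4.]])
theta = [0.5, 0.3, 0.2]  # hypothetical coefficients theta'_0..theta'_2 (k = 2)

# Accumulate theta_m * L_norm^m X, reusing T = L_norm @ T so each
# additional power costs a single sparse matrix-vector product
out, T = np.zeros_like(X), X.copy()
for th in theta:
    out += th * T
    T = L_norm @ T
```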
Let's limit to $k=1$ and, following Kipf & Welling, tie the two remaining coefficients to a single weight $w$:

$$ \text{Conv} \left( X \right) \approx w \ \left( \ I + D^{-1/2} \ A \ D^{-1/2} \ \right) \ X $$
The renormalization trick $\tilde{A} = A + I$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, avoids numerical instabilities (vanishing/exploding gradients) under repeated convolutions and gives:
$$ Z = \tilde{D}^{-1/2} \ \tilde{A} \ \tilde{D}^{-1/2} \ X \ W $$
Full propagation rule at layer $l$, with activation function $\sigma(\cdot)$:
$$ H^{(l+1)} = \sigma \left( \tilde{D}^{-1/2} \ \tilde{A} \ \tilde{D}^{-1/2} \ H^{(l)} \ W^{(l)} \right) $$
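Putting the pieces together, a minimal numpy sketch of one such layer; the ReLU activation, random weights, and feature values are illustrative assumptions:

```python
import numpy as np

def gcn_layer(H, W, A):
    """One propagation step: sigma(D~^{-1/2} A~ D~^{-1/2} H W)."""
    A_tilde = A + np.eye(A.shape[0])           # renormalization: add self-loops
    D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)      # sigma = ReLU

# Example: the 4-node graph above, C = 1 input feature, 2 hidden units
A = np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
              [0., 1., 0., 1.], [0., 1., 1., 0.]])
X = np.array([[1.], [2.], [3.], [4.]])
W0 = np.random.default_rng(0).normal(size=(1, 2))  # random initial weights
H1 = gcn_layer(X, W0, A)                           # H^(1), shape (4, 2)
```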
Image source: Node Classification by Graph Convolutional Network by Graham Ganssle
Semi-supervised classification on karate club network
Model:
Video source: Graph Convolutional Networks by Thomas Kipf
Authors tested on GSD sets with different numbers of nodes, edges, classes, and label rates.
Mean training time over 100 epochs; a factor $\sim 2$ speed-up.
Authors developed a model that:
- scales linearly with the number of edges
- outperforms comparable models
- needs only two to three layers to perform well

Remaining limitations:
- Memory: full-batch training requires holding the entire graph in memory
- Non-simple graphs: directed edges and edge features are not natively supported
- Locality: each layer propagates information only between immediate neighbors
For a simple graph, the Laplacian is given by $L = D - A$, with entries
$$ L_{ij} = \begin{cases} \text{deg} \, v_i & \text{ if } i=j \\ -1 & \text{ if } i \neq j \text{ and } v_i \text{ adjacent to } v_j \\ 0 & \text{ otherwise} \end{cases} $$
The symmetric, normalized Laplacian $L_{\text{norm}}$ is given by
$$ L_{\text{norm}} = D^{-1/2} \, L \, D^{-1/2} = I - D^{-1/2} \, A \, D^{-1/2} \, , $$
$$ L^{\text{norm}}_{ij} = \begin{cases} 1 &\text{ if } i=j \text{ and deg} \, v_i \neq 0 \\ -\left( \text{deg } v_i \, \text{deg } v_j \right)^{-1/2} & \text{ if } i \neq j \text{ and } v_i \text{ adjacent to } v_j \\ 0 & \text{ otherwise} \end{cases} $$
The resulting spectrum of eigenvalues is real and non-negative: $\lambda^{\text{norm}}_i \in \mathbb{R}_{\geq 0} \ \ , \ \forall i$
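This is easy to verify numerically on the example graph; a quick check, assuming no isolated vertices (so $D^{-1/2}$ exists):

```python
import numpy as np

A = np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
              [0., 1., 0., 1.], [0., 1., 1., 0.]])
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L_norm = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt

lam = np.linalg.eigvalsh(L_norm)   # real eigenvalues, ascending order
assert np.all(lam >= -1e-12)       # non-negative up to floating-point error
```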
$$ h_i^{(l+1)} = \sigma \left( \sum_{j \in \mathcal{N}_i \cup \{i\}} \ \frac{1}{c_{ij}} \ h_{j}^{(l)} \ W^{(l)}\right) $$

Here $\mathcal{N}_i$ is the set of neighbors of vertex $i$ and $c_{ij} = \sqrt{\tilde{d}_i \, \tilde{d}_j}$ is a normalization constant, with $\tilde{d}_i = |\mathcal{N}_i| + 1$ the degree counted with the self-loop.
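With that choice of $c_{ij}$, the node-wise rule reproduces the matrix form above (before the nonlinearity); a minimal sketch checking the equivalence, with placeholder features and weights:

```python
import numpy as np

A = np.array([[0., 1., 0., 0.], [1., 0., 1., 1.],
              [0., 1., 0., 1.], [0., 1., 1., 0.]])
A_tilde = A + np.eye(4)          # neighborhoods including self: N_i U {i}
d = A_tilde.sum(axis=1)          # degrees counted with self-loops

H = np.array([[1.], [2.], [3.], [4.]])  # placeholder H^(l)
W = np.array([[0.5]])                   # placeholder W^(l)

# Node-wise rule: sum over j in N_i U {i}, weighted by 1/c_ij
H_node = np.zeros_like(H)
for i in range(4):
    for j in range(4):
        if A_tilde[i, j]:
            H_node[i] += (H[j] @ W) / np.sqrt(d[i] * d[j])

# Matrix form: D~^{-1/2} A~ D~^{-1/2} H W
D_inv_sqrt = np.diag(d ** -0.5)
H_mat = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W
assert np.allclose(H_node, H_mat)
```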
Interpretation: the propagation rule is a parametrized, differentiable generalization of the 1-dimensional Weisfeiler-Lehman (WL) algorithm.