Representation Learning on Networks¶
Why Is It Hard?¶
The modern deep learning toolbox is designed for simple sequences or grids.
But networks are far more complex! They have complex topological structure (i.e., no spatial locality as in grids).
Node embeddings¶
The goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.
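Concretely, we learn an embedding vector \(z_u\) for each node \(u\) such that \(\text{similarity}(u, v) \approx z_u^\top z_v\).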
Setup¶
Assume we have a graph G:
- V is the vertex set.
- A is the adjacency matrix (assume binary).
No node features or extra information is used!
“Shallow” Encoding¶
The simplest encoding approach: each node is assigned a unique embedding vector (a minimal sketch follows the list below).
- We will focus on shallow encoding in this section...
- After the break we will discuss more encoders based on deep neural networks.
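As a rough illustration (not from the original notes), shallow encoding amounts to an embedding lookup: the encoder is simply a matrix with one row per node. The variable names and dimensions below are illustrative assumptions.

```python
import numpy as np

num_nodes = 5          # |V| (illustrative)
embedding_dim = 16     # chosen embedding size (illustrative)

# Shallow encoder: one learnable embedding vector per node.
Z = np.random.randn(num_nodes, embedding_dim)

def encode(node_id):
    """ENC(v) is simply a lookup of node v's row in Z."""
    return Z[node_id]

# Embedding-space similarity via dot product.
u, v = 0, 3
similarity = encode(u) @ encode(v)
print(similarity)
```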
How to Define Node Similarity?¶
The key distinction between “shallow” methods is how they define node similarity.
E.g., should two nodes have similar embeddings if they:

- are connected?
- share neighbors?
- have similar “structural roles”?
Adjacency-based Similarity¶
- Similarity function is just the edge weight between u and v in the original network.
- Intuition: dot products between node embeddings approximate edge existence.
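One common way to formalize this (an illustrative sketch, not necessarily the exact objective from the original notes) is to minimize \(\sum_{u,v} (z_u^\top z_v - A_{u,v})^2\) over all node pairs:

```python
import numpy as np

# Toy binary adjacency matrix (illustrative).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

num_nodes, embedding_dim = A.shape[0], 8
Z = np.random.randn(num_nodes, embedding_dim) * 0.1

def adjacency_loss(Z, A):
    """Sum of squared differences between z_u . z_v and A[u, v] over all pairs."""
    return np.sum((Z @ Z.T - A) ** 2)

print(adjacency_loss(Z, A))
```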
Drawbacks¶
- \(O(|V|^2)\) runtime
- \(O(|V|)\) parameters
- Only considers direct, local connections
Multihop Similarity¶
Idea: Consider k-hop node neighbors
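As an illustrative aside (not from the original notes), one way to read “k-hop” similarity is through powers of the adjacency matrix: for an unweighted graph, \((A^k)_{u,v}\) counts the walks of length k between u and v.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=int)

k = 2
A_k = np.linalg.matrix_power(A, k)   # (A^k)[u, v] = number of length-k walks from u to v
print(A_k)
```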
Issues¶
- Expensive: generally \(O(|V|^2)\), since we need to iterate over all pairs of nodes.
- Brittle: must hand-design deterministic node similarity measures.
- Massive parameter space: \(O(|V|)\) parameters
Random Walk Methods¶
- Estimate the probability of visiting node \(v\) on a random walk starting from node \(u\) using some random walk strategy \(R\).
- Optimize embeddings to encode these random walk statistics.
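A minimal sketch of simulating such walks, assuming the simplest strategy \(R\) (pick a uniformly random neighbor at each step); the example graph and walk length are illustrative choices.

```python
import random
import networkx as nx

G = nx.karate_club_graph()  # small example graph (illustrative choice)

def random_walk(G, start, length):
    """Simulate one uniform random walk of the given length starting at `start`."""
    walk = [start]
    for _ in range(length):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:          # dead end: stop early
            break
        walk.append(random.choice(neighbors))
    return walk

print(random_walk(G, start=0, length=10))
```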
Why Random Walks?¶
- Expressivity: flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information.
- Efficiency: do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks.
Random Walk Optimization¶
- Run short random walks starting from each node on the graph using some strategy \(R\).
- For each node \(u\), collect \(N_R(u)\), the multiset of nodes visited on random walks starting from \(u\).
- Optimize embeddings according to:

\[
L = \sum_{u \in V} \sum_{v \in N_R(u)} -\log\big(P(v \mid z_u)\big)
\]

(1) Sum over all nodes \(u\)
(2) Sum over nodes \(v\) seen on random walks starting from \(u\)
(3) Predicted probability of \(u\) and \(v\) co-occurring on a random walk

Optimizing random walk embeddings = finding embeddings \(z_u\) that minimize \(L\).
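A rough end-to-end sketch of this optimization (not the DeepWalk/node2vec implementation itself): generate walks, collect co-occurring pairs, and minimize the softmax loss above by stochastic gradient descent. The hyperparameters, the separate “context” embedding table (a word2vec-style choice that keeps the gradients simple), and the karate-club graph are all illustrative assumptions.

```python
import random
import numpy as np
import networkx as nx

# Illustrative hyperparameters.
EMBED_DIM, WALK_LEN, WALKS_PER_NODE, LR, EPOCHS = 16, 10, 5, 0.05, 30

G = nx.karate_club_graph()               # small example graph
n = G.number_of_nodes()

def random_walk(G, start, length):
    walk = [start]
    for _ in range(length):
        nbrs = list(G.neighbors(walk[-1]))
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return walk

# Collect co-occurring (u, v) pairs: each v belongs to N_R(u).
pairs = []
for u in G.nodes():
    for _ in range(WALKS_PER_NODE):
        pairs += [(u, v) for v in random_walk(G, u, WALK_LEN)[1:]]

# "Center" (Z) and "context" (C) embedding tables; the learned node
# embeddings are the rows of Z.
Z = np.random.randn(n, EMBED_DIM) * 0.1
C = np.random.randn(n, EMBED_DIM) * 0.1

for epoch in range(EPOCHS):
    random.shuffle(pairs)
    loss = 0.0
    for u, v in pairs:
        scores = C @ Z[u]                           # z_u . c_n for every node n
        scores -= scores.max()                      # numerical stability
        p = np.exp(scores) / np.exp(scores).sum()   # softmax: P(. | z_u)
        loss += -np.log(p[v] + 1e-12)

        grad_u = C.T @ p - C[v]                     # d(-log P(v|z_u)) / d z_u
        grad_C = np.outer(p, Z[u])                  # d(-log P(v|z_u)) / d c_n
        grad_C[v] -= Z[u]

        Z[u] -= LR * grad_u
        C -= LR * grad_C
    if epoch % 10 == 0:
        print(f"epoch {epoch}: loss {loss:.1f}")
```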
Graph neural networks¶
From Shallow to Deep¶
Limitations of shallow encoding:

- \(O(|V|)\) parameters are needed: there is no parameter sharing and every node has its own unique embedding vector.
- Inherently “transductive”: it is impossible to generate embeddings for nodes that were not seen during training.
- Do not incorporate node features: many graphs have features that we can and should leverage.
Setup¶
Assume we have a graph G:
- V is the vertex set.
- A is the adjacency matrix (assume binary).
- \(X \in \mathbb{R}^{m \times |V|}\) is a matrix of node features, e.g.:
  - Categorical attributes, text, image data (e.g., profile information in a social network)
  - Node degrees, clustering coefficients, etc.
  - Indicator vectors (i.e., one-hot encoding of each node)
Neighborhood aggregation¶
Generate node embeddings based on local neighborhoods: each node aggregates information from its neighbors.
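A rough sketch of a single mean-aggregation layer, one common way to do this; the weight matrices, nonlinearity, and dimensions below are illustrative assumptions rather than a specific architecture from the notes.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
n = G.number_of_nodes()
A = nx.to_numpy_array(G)                       # adjacency matrix
X = np.eye(n)                                  # one-hot node features (|V| x |V|)

d_in, d_out = n, 16
W_self  = np.random.randn(d_in, d_out) * 0.1   # transforms the node's own features
W_neigh = np.random.randn(d_in, d_out) * 0.1   # transforms the aggregated neighbor features

def neighborhood_aggregation_layer(A, H):
    """One layer: average neighbor embeddings, combine with the node's own, apply ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1                          # avoid division by zero for isolated nodes
    neigh_mean = (A @ H) / deg                 # mean of neighbors' current embeddings
    return np.maximum(0, H @ W_self + neigh_mean @ W_neigh)

H1 = neighborhood_aggregation_layer(A, X)
print(H1.shape)   # (|V|, d_out)
```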
NetworkX¶
!pip install networkx
Requirement already satisfied: networkx in /usr/local/lib/python3.6/dist-packages (2.5)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from networkx) (4.4.2)
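As a quick illustrative example of working with NetworkX (the example graph below is an assumption, not from the original notes):

```python
import networkx as nx

G = nx.karate_club_graph()                     # a classic small example graph
print(G.number_of_nodes(), G.number_of_edges())
print(list(G.neighbors(0))[:5])                # a few neighbors of node 0
A = nx.to_numpy_array(G)                       # adjacency matrix as a NumPy array
print(A.shape)
```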