01 Why Go Beyond Graphs?
Graph neural networks (GNNs) have been spectacularly successful: molecular property prediction, social network analysis, physics simulation, recommendation systems. But graphs encode only pairwise relationships — an edge connects exactly two nodes. Many real-world phenomena involve multi-way interactions that are not naturally decomposable into pairs.
Consider heat conduction through a chip package. A graph can model pairwise thermal contacts between adjacent layers, but a volumetric region of material — a BEOL tile, say — is fundamentally a higher-dimensional object. Its thermal behavior depends on the collective interaction of all boundary faces, not just pairwise contacts. Topological deep learning provides the mathematical machinery to assign features to, and pass messages between, objects of any dimension.
Physics analogy: In electromagnetism, you already work with objects of different dimension — charges live on points (0-cells), currents flow along paths (1-cells), flux passes through surfaces (2-cells), and charge density fills volumes (3-cells). The de Rham complex formalizes this hierarchy. Topological deep learning is, in a precise sense, the neural network version of that same idea.
02 Graphs as Combinatorial Objects
Before building up, let's establish the language carefully, starting with the formal definition of a graph.
A graph $G = (V, E)$ is a pair where $V$ is a finite set (vertices) and $E \subseteq \{\{v, w\} : v, w \in V, \, v \neq w\}$ is a set of unordered pairs (edges).
Key point: every vertex has rank 0, and every edge has rank 1. This rank corresponds to dimension — a vertex is a 0-dimensional object, an edge is 1-dimensional. A graph has exactly two ranks. The entire enterprise of topological deep learning is about allowing more ranks.
Adjacency and Incidence
Two vertices $v, w$ are adjacent (written $v \sim w$) if $\{v,w\} \in E$. A vertex $v$ is incident to an edge $e$ if $v \in e$. These two relations — adjacency (same rank) and incidence (cross rank) — are the only structural relationships in a graph. The adjacency matrix $\mathbf{A} \in \{0,1\}^{|V| \times |V|}$ and the incidence matrix $\mathbf{B}_1 \in \{-1,0,1\}^{|V| \times |E|}$ encode these.
The Signed Incidence Matrix: Why the ±1?
You'll notice the incidence matrix $\mathbf{B}_1$ contains $+1$ and $-1$ entries, not just 0/1. This is not an arbitrary convention — it's what turns the incidence matrix into a discrete gradient operator, connecting graph theory to the calculus you already know.
A graph is inherently unordered — edge $\{v_0, v_1\}$ has no preferred direction. But to do linear algebra (and especially to connect to calculus-like operators), we arbitrarily assign an orientation to each edge: pick one vertex as the "tail" (source) and the other as the "head" (target), like drawing an arrow.
For an edge $e$ oriented as $v_{\text{tail}} \to v_{\text{head}}$:
$$(\mathbf{B}_1)_{v,e} = \begin{cases} +1 & \text{if } v \text{ is the tail (source) of } e \\ -1 & \text{if } v \text{ is the head (target) of } e \\ 0 & \text{if } v \notin e \end{cases}$$

Why bother with signs? Because they make $\mathbf{B}_1^\top$ act as a discrete gradient. If $f$ is a scalar function on vertices (a 0-cochain — think temperature at nodes), then $\mathbf{B}_1^\top f$ gives the difference along each edge:

$$(\mathbf{B}_1^\top f)_e = f(v_{\text{tail}}) - f(v_{\text{head}})$$
This is the key payoff: the graph Laplacian emerges naturally as

$$\mathbf{L}_0 = \mathbf{B}_1 \mathbf{B}_1^\top,$$

the discrete gradient composed with its adjoint — a divergence of a gradient, exactly as $\nabla \cdot \nabla = \nabla^2$.
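As a concrete check, here is the signed incidence matrix of the Figure 2.1 graph in NumPy, applied as a discrete gradient. The orientation (each edge pointing from lower to higher vertex index) is an arbitrary choice made for this sketch:

```python
import numpy as np

# Signed incidence matrix B1 for the example graph:
# vertices 0-3; edges a={0,1}, b={0,2}, c={1,2}, d={0,3}, e={2,3},
# each (arbitrarily) oriented tail -> head from lower to higher index.
B1 = np.array([
    [ 1,  1,  0,  1,  0],   # vertex 0: tail of a, b, d
    [-1,  0,  1,  0,  0],   # vertex 1: head of a, tail of c
    [ 0, -1, -1,  0,  1],   # vertex 2: head of b, c; tail of e
    [ 0,  0,  0, -1, -1],   # vertex 3: head of d, e
])

f = np.array([10.0, 20.0, 30.0, 40.0])   # temperature at each vertex

grad_f = B1.T @ f    # one entry per edge: f(tail) - f(head)
L0 = B1 @ B1.T       # the graph Laplacian, directly from incidence

print(grad_f)        # [-10. -20. -10. -30. -10.]
print(L0)            # degrees on the diagonal, -1 between neighbors
```

Note that `L0` comes out with degrees 3, 2, 3, 2 on the diagonal without ever computing degrees explicitly — the incidence matrix alone suffices.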
The Degree Matrix, the Laplacian, and Why We Care
Three matrices — the adjacency matrix $\mathbf{A}$, the degree matrix $\mathbf{D}$, and the incidence matrix $\mathbf{B}_1$ — are connected by a beautiful identity that turns graph structure into a diffusion operator. Let's define each precisely and then verify the identity by hand on our example.
The degree matrix $\mathbf{D} \in \mathbb{R}^{|V| \times |V|}$ is a diagonal matrix where each diagonal entry is the degree (number of incident edges) of that vertex:
$$D_{vv} = \deg(v) = |\{e \in E : v \in e\}|, \qquad D_{vw} = 0 \;\text{ for } v \neq w$$

It counts, for each vertex, how many connections it has. All off-diagonal entries are zero.
For the graph in Figure 2.1: vertex 0 touches edges $a, b, d$ (degree 3); vertex 1 touches $a, c$ (degree 2); vertex 2 touches $b, c, e$ (degree 3); vertex 3 touches $d, e$ (degree 2). So:

$$\mathbf{D} = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$
Multiplying out $\mathbf{B}_1\mathbf{B}_1^\top$ for this graph reproduces $\mathbf{D} - \mathbf{A}$ entry by entry. This isn't a coincidence — the identity $\mathbf{L}_0 = \mathbf{B}_1\mathbf{B}_1^\top = \mathbf{D} - \mathbf{A}$ holds for any graph, and the proof is short:
Entry $(v,w)$ of $\mathbf{B}_1\mathbf{B}_1^\top$ is the dot product of rows $v$ and $w$ of $\mathbf{B}_1$.
If $v = w$: You're dotting row $v$ with itself. Each edge incident to $v$ contributes $(\pm 1)^2 = 1$, everything else contributes $0$. Sum = $\deg(v) = D_{vv}$. ✓
If $v \sim w$ (adjacent): They share exactly one edge $e$. In column $e$, one has $+1$ and the other $-1$ (tail vs head). Product = $-1$. In all other columns, at most one of them is nonzero, so product = $0$. Sum = $-1 = 0 - A_{vw} = (D-A)_{vw}$. ✓
If $v \not\sim w$ (not adjacent): No edge contains both. In every column, at least one of them is $0$. Sum = $0 = 0 - 0 = (D-A)_{vw}$. ✓
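The three cases can also be confirmed mechanically. A minimal NumPy check on the example graph (with the low-to-high edge orientation, one arbitrary choice among many):

```python
import numpy as np

# Example graph: edges a={0,1}, b={0,2}, c={1,2}, d={0,3}, e={2,3},
# oriented low -> high index (+1 at the tail, -1 at the head).
B1 = np.array([
    [ 1,  1,  0,  1,  0],
    [-1,  0,  1,  0,  0],
    [ 0, -1, -1,  0,  1],
    [ 0,  0,  0, -1, -1],
])

A = np.array([       # adjacency: 1 where two vertices share an edge
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
])
D = np.diag(A.sum(axis=1))    # degree matrix: diag(3, 2, 3, 2)

print(np.array_equal(B1 @ B1.T, D - A))   # True: the identity holds entrywise
```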
Why the Laplacian Matters: Discrete Diffusion
So $\mathbf{L}_0 = \mathbf{D} - \mathbf{A}$ has degrees on the diagonal and $-1$ wherever two vertices share an edge. Why is this the right operator for physics?
Apply $\mathbf{L}_0$ to a temperature vector $\mathbf{T}$. The result at vertex $v$ is:

$$(\mathbf{L}_0 \mathbf{T})_v = \deg(v)\, T_v - \sum_{w \sim v} T_w = \sum_{w \sim v} (T_v - T_w)$$
Read that last expression: the Laplacian at $v$ sums up all the temperature differences between $v$ and its neighbors. This is exactly the discrete version of $\nabla^2 T$ — the divergence of the gradient — which governs heat diffusion. The equivalence $\deg(v) \cdot T_v - \sum T_w = \sum(T_v - T_w)$ is why $\mathbf{D} - \mathbf{A}$ is the right combination: the degree term and the adjacency term conspire to produce pairwise differences.
This gives three reasons why the Laplacian is the central object:
1. Diffusion: The heat equation $\dot{T} = -\alpha \mathbf{L}_0 T$ says temperature evolves by flowing from hot to cold — directly from the "sum of differences" structure.
2. Conservation: $\mathbf{L}_0$ is positive semi-definite (since $\mathbf{L}_0 = \mathbf{B}_1\mathbf{B}_1^\top$ is a Gram matrix) and its row sums are zero ($\mathbf{L}_0 \mathbf{1} = 0$). This means total "heat" is conserved under diffusion — a physical requirement.
3. Spectral structure: The eigenvalues $0 = \lambda_0 \leq \lambda_1 \leq \cdots$ measure connectivity. $\lambda_1 = 0$ iff the graph is disconnected. The eigenvectors form a Fourier-like basis for graph signals — the foundation of spectral GNNs (ChebNet, GCN).
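All three properties can be seen in a few lines. The sketch below runs explicit-Euler integration of the heat equation on the example Laplacian; the step size and iteration count are illustrative choices, not prescribed by the text:

```python
import numpy as np

# Laplacian of the example graph (degrees 3, 2, 3, 2).
L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)

# Explicit Euler integration of the heat equation dT/dt = -alpha * L0 @ T.
alpha, dt = 0.1, 0.1
T = np.array([100.0, 0.0, 0.0, 0.0])   # all heat starts at vertex 0
for _ in range(500):
    T = T - alpha * dt * (L0 @ T)

print(T)         # ~[25. 25. 25. 25.]: heat spreads until uniform
print(T.sum())   # ~100: total heat conserved, since L0 @ ones = 0

# Positive semi-definiteness: no eigenvalue is negative.
print(np.linalg.eigvalsh(L0).min() >= -1e-12)   # True
```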
Without the signs in $\mathbf{B}_1$, you'd get $\mathbf{D} + \mathbf{A}$ instead — which sums neighbor values rather than computing differences. That operator doesn't conserve anything, isn't positive semi-definite, and has no connection to diffusion. The signs are what make the Laplacian a second-order difference operator (discrete $\nabla^2$).
The choice of orientation is arbitrary. Flipping an edge's direction just negates that column of $\mathbf{B}_1$. All derived quantities — the Laplacian $\mathbf{L}_0 = \mathbf{B}_1\mathbf{B}_1^\top$, Hodge Laplacians, message-passing operators — involve products like $\mathbf{B}^\top\mathbf{B}$ where the signs cancel, so they are invariant to the orientation choice. The orientation is a bookkeeping device, not a physical assumption.
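A quick numerical check of this invariance, using the example graph's incidence matrix with the low-to-high orientation as one arbitrary starting choice:

```python
import numpy as np

# Incidence matrix of the example graph (edges a, b, c, d, e as columns).
B1 = np.array([
    [ 1,  1,  0,  1,  0],
    [-1,  0,  1,  0,  0],
    [ 0, -1, -1,  0,  1],
    [ 0,  0,  0, -1, -1],
])

B1_flipped = B1.copy()
B1_flipped[:, 2] *= -1    # reverse the orientation of edge c = {1, 2}

# The Laplacian is identical: the flipped signs cancel in B1 @ B1.T.
print(np.array_equal(B1 @ B1.T, B1_flipped @ B1_flipped.T))   # True
```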
Spectral Structure: Eigenvalues and Eigenvectors of $\mathbf{L}_0$
The three properties above — diffusion, conservation, spectral structure — all flow from the eigendecomposition of $\mathbf{L}_0$. Let's work through this completely for our example, starting from the definition of diagonalization.
Review: Matrix Diagonalization
A real symmetric matrix $\mathbf{M} \in \mathbb{R}^{n \times n}$ can always be decomposed as:

$$\mathbf{M} = \mathbf{P}\, \boldsymbol{\Lambda}\, \mathbf{P}^\top$$
Here $\boldsymbol{\Lambda}$ is diagonal (eigenvalues on the diagonal), and $\mathbf{P}$ is an orthogonal matrix whose columns are the corresponding eigenvectors: $\mathbf{P} = [\mathbf{v}_0 \mid \mathbf{v}_1 \mid \cdots \mid \mathbf{v}_{n-1}]$. Orthogonal means $\mathbf{P}^\top\mathbf{P} = \mathbf{I}$ — the eigenvectors form an orthonormal basis.
The defining property of each eigenpair is:

$$\mathbf{M} \mathbf{v}_i = \lambda_i \mathbf{v}_i$$
Applying $\mathbf{M}$ to eigenvector $\mathbf{v}_i$ just scales it by $\lambda_i$ — it doesn't change direction. The eigenvalue $\lambda_i$ is the scaling factor.
Step 1. Find eigenvalues by solving the characteristic equation: $\det(\mathbf{M} - \lambda\mathbf{I}) = 0$. This is a degree-$n$ polynomial in $\lambda$; its roots are the eigenvalues.
Step 2. For each eigenvalue $\lambda_i$, find eigenvectors by solving the homogeneous system $(\mathbf{M} - \lambda_i\mathbf{I})\mathbf{v} = \mathbf{0}$. The solution space (the null space or kernel) gives the eigenvectors.
Step 3. Normalize: $\hat{\mathbf{v}}_i = \mathbf{v}_i / \|\mathbf{v}_i\|$. For a symmetric matrix, eigenvectors from distinct eigenvalues are automatically orthogonal. Within a repeated eigenvalue, use Gram-Schmidt.
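In practice, all three steps collapse into one library call. A sketch on a small symmetric matrix (chosen here purely to illustrate the API):

```python
import numpy as np

# np.linalg.eigh handles real symmetric matrices: it returns eigenvalues
# in ascending order and orthonormal eigenvectors as the columns of P.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, P = np.linalg.eigh(M)

print(lam)                                     # [1. 3.]
print(np.allclose(P @ np.diag(lam) @ P.T, M))  # True: M = P Lambda P^T
print(np.allclose(P.T @ P, np.eye(2)))         # True: orthonormal basis
```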
Step 1: The Characteristic Polynomial of Our $\mathbf{L}_0$
We need $\det(\mathbf{L}_0 - \lambda\mathbf{I}) = 0$ for:

$$\mathbf{L}_0 = \mathbf{D} - \mathbf{A} = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & -1 & 0 \\ -1 & -1 & 3 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix}$$
Before grinding through a $4 \times 4$ determinant, we can use two shortcuts:
Shortcut 1 — $\lambda = 0$ is always an eigenvalue. Since $\mathbf{L}_0 \mathbf{1} = \mathbf{0}$ (row sums are zero), the all-ones vector $\mathbf{1} = (1,1,1,1)^\top$ is an eigenvector with eigenvalue $0$. For a connected graph, $\lambda_0 = 0$ has multiplicity exactly 1.
Shortcut 2 — The trace gives the sum. $\text{tr}(\mathbf{L}_0) = 3 + 2 + 3 + 2 = 10 = \sum_i \lambda_i$.
For our graph, the characteristic polynomial factors as:

$$\det(\mathbf{L}_0 - \lambda\mathbf{I}) = \lambda\,(\lambda - 2)\,(\lambda - 4)^2 = 0$$
giving eigenvalues $\lambda_0 = 0$, $\lambda_1 = 2$, $\lambda_2 = \lambda_3 = 4$ (the last has algebraic multiplicity 2). Check: $0 + 2 + 4 + 4 = 10$ ✓.
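Both shortcuts and the final answer are easy to confirm numerically on the example Laplacian:

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)

lam = np.linalg.eigvalsh(L0)                 # eigenvalues in ascending order
print(np.round(lam, 6))                      # ~[0. 2. 4. 4.]
print(np.isclose(lam.sum(), np.trace(L0)))   # True: eigenvalues sum to the trace
```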
Step 2: Finding the Eigenvectors
For each eigenvalue, we solve $(\mathbf{L}_0 - \lambda_i \mathbf{I})\mathbf{v} = \mathbf{0}$:

- $\lambda_0 = 0$: the row sums of $\mathbf{L}_0$ are zero, so the constant vector $\mathbf{v}_0 = (1, 1, 1, 1)^\top$ spans the null space.
- $\lambda_1 = 2$: the system forces $v_0 = v_2 = 0$ and $v_1 = -v_3$, giving $\mathbf{v}_1 = (0, 1, 0, -1)^\top$.
- $\lambda_2 = \lambda_3 = 4$: the null space of $\mathbf{L}_0 - 4\mathbf{I}$ is two-dimensional; an orthogonal choice of basis is $\mathbf{v}_2 = (1, 0, -1, 0)^\top$ and $\mathbf{v}_3 = (1, -1, 1, -1)^\top$.
Step 3: The Diagonalization
Assembling the (normalized) eigenvectors as columns of $\mathbf{P}$ and eigenvalues into $\boldsymbol{\Lambda}$:

$$\mathbf{P} = \begin{pmatrix} \tfrac{1}{2} & 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{2} \end{pmatrix}, \qquad \boldsymbol{\Lambda} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \qquad \mathbf{L}_0 = \mathbf{P}\, \boldsymbol{\Lambda}\, \mathbf{P}^\top$$
Because $\mathbf{P}$ is orthogonal ($\mathbf{P}^\top = \mathbf{P}^{-1}$), this gives us two equivalent readings:
Forward: $\mathbf{L}_0 = \sum_{i=0}^{3} \lambda_i \, \hat{\mathbf{v}}_i \hat{\mathbf{v}}_i^\top$ — the Laplacian is a weighted sum of projection matrices, one per eigenvector.
Inverse: $\mathbf{P}^\top \mathbf{L}_0 \mathbf{P} = \boldsymbol{\Lambda}$ — in the eigenvector basis, $\mathbf{L}_0$ acts by pure scaling (no mixing).
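Both readings can be checked directly. A sketch (note that `eigh` may pick a different orthonormal basis within the repeated eigenvalue 4, which changes nothing below):

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)
lam, P = np.linalg.eigh(L0)

# Forward reading: L0 as an eigenvalue-weighted sum of rank-1 projections.
recon = sum(l * np.outer(v, v) for l, v in zip(lam, P.T))
print(np.allclose(recon, L0))                     # True

# Inverse reading: in the eigenbasis, L0 is purely diagonal.
print(np.allclose(P.T @ L0 @ P, np.diag(lam)))    # True
```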
Physical Interpretation: Vibration Modes of the Graph
The eigenvectors are the fundamental modes of oscillation on the graph — the exact analogue of Fourier modes on a continuous domain. Low eigenvalues correspond to slow, smooth variation; high eigenvalues correspond to rapid oscillation.
The Graph Fourier Transform
Just as the classical Fourier transform decomposes a signal into sinusoidal modes of increasing frequency, the graph Fourier transform (GFT) decomposes a graph signal into eigenvector modes of increasing eigenvalue:

$$\hat{f}_i = \hat{\mathbf{v}}_i^\top \mathbf{f} \;\;\text{(analysis)}, \qquad \mathbf{f} = \sum_i \hat{f}_i\, \hat{\mathbf{v}}_i \;\;\text{(synthesis)}$$

or, in matrix form, $\hat{\mathbf{f}} = \mathbf{P}^\top \mathbf{f}$ and $\mathbf{f} = \mathbf{P}\hat{\mathbf{f}}$.
The spectral coefficient $\hat{f}_i = \hat{\mathbf{v}}_i^\top \mathbf{f}$ measures "how much of mode $i$ is present in the signal $\mathbf{f}$." Low-index coefficients capture smooth, global trends; high-index coefficients capture sharp, local variations.
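To see this concretely on the example graph, compare the spectral coefficients of a smooth signal and a maximally oscillating one (the signal values themselves are made up for illustration):

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)
lam, P = np.linalg.eigh(L0)   # modes ordered by increasing eigenvalue

f_smooth = np.array([1.0, 1.1, 0.9, 1.0])    # nearly constant across the graph
f_rough  = np.array([1.0, -1.0, 1.0, -1.0])  # a pure eigenvalue-4 mode

f_hat_smooth = P.T @ f_smooth   # spectral coefficients f_hat[i] = v_i^T f
f_hat_rough  = P.T @ f_rough

print(np.round(f_hat_smooth, 3))   # almost all energy in mode 0 (lambda = 0)
print(np.round(f_hat_rough, 3))    # all energy in the eigenvalue-4 modes
```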
Why This Matters for Neural Networks
Spectral GNNs — ChebNet, GCN, and their descendants — define graph convolution as filtering in the spectral domain. A convolution filter $g_\theta$ applied to a signal $\mathbf{f}$ on the graph is:

$$g_\theta(\mathbf{L}_0)\, \mathbf{f} = \mathbf{P}\, g_\theta(\boldsymbol{\Lambda})\, \mathbf{P}^\top \mathbf{f}$$

— transform to the spectral domain, scale each mode by $g_\theta(\lambda_i)$, transform back.
The function $g_\theta(\lambda)$ is a learnable filter that scales each frequency independently — just like a low-pass or band-pass filter in signal processing, but with a learned frequency response. In GCN (Kipf & Welling, 2017), the filter is, up to a learned scale, $g_\theta(\lambda) = 1 - \lambda/\lambda_{\max}$, which attenuates high frequencies — a smoothing operation.
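A minimal sketch of spectral filtering on the example Laplacian. The function `g` below implements the low-pass filter shape just described, with no learned parameters — a toy filter, not a trained model:

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)
lam, P = np.linalg.eigh(L0)

def spectral_filter(f, g):
    """Scale each spectral mode of the vertex signal f by g(lambda)."""
    return P @ (g(lam) * (P.T @ f))

g = lambda l: 1.0 - l / lam.max()   # low-pass: g(0) = 1, g(lambda_max) = 0

f_rough = np.array([1.0, -1.0, 1.0, -1.0])  # a pure eigenvalue-4 mode
print(spectral_filter(f_rough, g))          # ~[0 0 0 0]: top frequency removed

f_const = np.ones(4)                        # the eigenvalue-0 mode
print(spectral_filter(f_const, g))          # ~[1 1 1 1]: passes through unchanged
```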
The spectral bridge to topology: The graph Laplacian $\mathbf{L}_0$ has eigenvalues and eigenvectors that define a spectral basis for vertex signals. The Hodge Laplacian $\mathbf{L}_k$ (Section 9) does exactly the same for rank-$k$ cochains — it provides eigenvalues (frequencies) and eigenvectors (modes) for signals on edges, faces, and volumes. Spectral topological neural networks use the Hodge eigenvectors as their Fourier basis at each rank, enabling spectral filtering of higher-order signals. This is the mathematical pipeline: Laplacian → eigendecomposition → Fourier basis → learnable spectral filters.
The Discrete de Rham Complex
This sign structure extends to higher ranks. The boundary matrix $\mathbf{B}_{1,2}$ (edges → faces) encodes oriented boundaries of 2-cells and acts as a discrete curl. The full chain of boundary operators forms the discrete de Rham complex:

$$C^0 \xrightarrow{\;\mathbf{B}_1^\top\;} C^1 \xrightarrow{\;\mathbf{B}_{1,2}\;} C^2 \xrightarrow{\;\cdots\;} C^k$$

with the defining property that consecutive maps compose to zero, $\mathbf{B}_{1,2}\, \mathbf{B}_1^\top = \mathbf{0}$ — the discrete analogue of $\nabla \times \nabla f = 0$.
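The composition-to-zero property can be checked on the smallest interesting complex — a single filled triangle. The matrices below are built by hand for this hypothetical example, following the same +1-tail/−1-head convention used for $\mathbf{B}_1$ above:

```python
import numpy as np

# One filled triangle: vertices {0,1,2}; edges e0: 0->1, e1: 1->2, e2: 0->2.
B1 = np.array([
    [ 1,  0,  1],   # vertex 0: tail of e0, e2
    [-1,  1,  0],   # vertex 1: head of e0, tail of e1
    [ 0, -1, -1],   # vertex 2: head of e1, e2
])
# Face traversed 0 -> 1 -> 2 -> 0: e0 and e1 agree with the traversal, e2 opposes.
B2 = np.array([[1], [1], [-1]])

grad = B1.T   # vertex signals -> edge signals (discrete gradient)
curl = B2.T   # edge signals  -> face signals (discrete curl)

# Curl of a gradient vanishes identically -- the defining property of a complex.
print(curl @ grad)   # [[0 0 0]]
```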
What's Missing? Message Passing
The spectral perspective tells us that graph convolution is spectral filtering through the Laplacian eigenbasis. But in practice, GNNs work by message passing: each vertex aggregates features from its neighbors, transforms them, and updates its own state. In the next chapter, we'll deconstruct a full GCN layer step by step — with a complete numerical example on this same graph — showing exactly how the Laplacian, the adjacency matrix, and the spectral theory connect to the concrete operations a GNN performs.