01 Why Go Beyond Graphs?
Graph neural networks (GNNs) have been spectacularly successful: molecular property prediction, social network analysis, physics simulation, recommendation systems. But graphs encode only pairwise relationships — an edge connects exactly two nodes. Many real-world phenomena involve multi-way interactions that are not naturally decomposable into pairs.
Consider heat conduction through a chip package. A graph can model pairwise thermal contacts between adjacent layers, but a volumetric region of material — a BEOL tile, say — is fundamentally a higher-dimensional object. Its thermal behavior depends on the collective interaction of all boundary faces, not just pairwise contacts. Topological deep learning provides the mathematical machinery to assign features to, and pass messages between, objects of any dimension.
Physics analogy: In electromagnetism, you already work with objects of different dimension — charges live on points (0-cells), currents flow along paths (1-cells), flux passes through surfaces (2-cells), and charge density fills volumes (3-cells). The de Rham complex formalizes this hierarchy. Topological deep learning is, in a precise sense, the neural network version of that same idea.
02 Graphs as Combinatorial Objects
Before building up, let's establish the language carefully, starting with the formal definition of a graph.
A graph $G = (V, E)$ is a pair where $V$ is a finite set (vertices) and $E \subseteq \{\{v, w\} : v, w \in V, \, v \neq w\}$ is a set of unordered pairs (edges).
Key point: every vertex has rank 0, and every edge has rank 1. This rank corresponds to dimension — a vertex is a 0-dimensional object, an edge is 1-dimensional. A graph has exactly two ranks. The entire enterprise of topological deep learning is about allowing more ranks.
Adjacency and Incidence
Two vertices $v, w$ are adjacent (written $v \sim w$) if $\{v,w\} \in E$. A vertex $v$ is incident to an edge $e$ if $v \in e$. These two relations — adjacency (same rank) and incidence (cross rank) — are the only structural relationships in a graph. The adjacency matrix $\mathbf{A} \in \{0,1\}^{|V| \times |V|}$ and the incidence matrix $\mathbf{B}_1 \in \{-1,0,1\}^{|V| \times |E|}$ encode these.
The Signed Incidence Matrix: Why the ±1?
You'll notice the incidence matrix $\mathbf{B}_1$ contains $+1$ and $-1$ entries, not just 0/1. This is not an arbitrary convention — it's what turns the incidence matrix into a discrete gradient operator, connecting graph theory to the calculus you already know.
A graph is inherently unordered — edge $\{v_0, v_1\}$ has no preferred direction. But to do linear algebra (and especially to connect to calculus-like operators), we arbitrarily assign an orientation to each edge: pick one vertex as the "tail" (source) and the other as the "head" (target), like drawing an arrow.
For an edge $e$ oriented as $v_{\text{tail}} \to v_{\text{head}}$:
$$(\mathbf{B}_1)_{v,e} = \begin{cases} +1 & \text{if } v \text{ is the tail (source) of } e \\ -1 & \text{if } v \text{ is the head (target) of } e \\ 0 & \text{if } v \notin e \end{cases}$$

Why bother with signs? Because they make $\mathbf{B}_1^\top$ act as a discrete gradient. If $f$ is a scalar function on vertices (a 0-cochain — think temperature at nodes), then $\mathbf{B}_1^\top f$ gives the difference along each edge:

$$(\mathbf{B}_1^\top f)_e = f(v_{\text{tail}}) - f(v_{\text{head}})$$
This is the key payoff: the graph Laplacian emerges naturally as

$$\mathbf{L}_0 = \mathbf{B}_1 \mathbf{B}_1^\top,$$

the discrete gradient composed with its adjoint — a divergence of a gradient, exactly as $\nabla \cdot \nabla = \nabla^2$.
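As a concrete check, here is the signed incidence matrix of the Figure 2.1 graph in NumPy, applied as a discrete gradient. The orientation (each edge pointing from lower to higher vertex index) is an arbitrary choice made for this sketch:

```python
import numpy as np

# Signed incidence matrix B1 for the example graph:
# vertices 0-3; edges a={0,1}, b={0,2}, c={1,2}, d={0,3}, e={2,3},
# each (arbitrarily) oriented tail -> head from lower to higher index.
B1 = np.array([
    [ 1,  1,  0,  1,  0],   # vertex 0: tail of a, b, d
    [-1,  0,  1,  0,  0],   # vertex 1: head of a, tail of c
    [ 0, -1, -1,  0,  1],   # vertex 2: head of b, c; tail of e
    [ 0,  0,  0, -1, -1],   # vertex 3: head of d, e
])

f = np.array([10.0, 20.0, 30.0, 40.0])   # temperature at each vertex

grad_f = B1.T @ f    # one entry per edge: f(tail) - f(head)
L0 = B1 @ B1.T       # the graph Laplacian, directly from incidence

print(grad_f)        # [-10. -20. -10. -30. -10.]
print(L0)            # degrees on the diagonal, -1 between neighbors
```

Note that `L0` comes out with degrees 3, 2, 3, 2 on the diagonal without ever computing degrees explicitly — the incidence matrix alone suffices.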
The Degree Matrix, the Laplacian, and Why We Care
Three matrices — the adjacency matrix $\mathbf{A}$, the degree matrix $\mathbf{D}$, and the incidence matrix $\mathbf{B}_1$ — are connected by a beautiful identity that turns graph structure into a diffusion operator. Let's define each precisely and then verify the identity by hand on our example.
The degree matrix $\mathbf{D} \in \mathbb{R}^{|V| \times |V|}$ is a diagonal matrix where each diagonal entry is the degree (number of incident edges) of that vertex:
$$D_{vv} = \deg(v) = |\{e \in E : v \in e\}|, \qquad D_{vw} = 0 \;\text{ for } v \neq w$$

It counts, for each vertex, how many connections it has. All off-diagonal entries are zero.
For the graph in Figure 2.1: vertex 0 touches edges $a, b, d$ (degree 3); vertex 1 touches $a, c$ (degree 2); vertex 2 touches $b, c, e$ (degree 3); vertex 3 touches $d, e$ (degree 2). So:

$$\mathbf{D} = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$
Multiplying out $\mathbf{B}_1\mathbf{B}_1^\top$ for this graph reproduces $\mathbf{D} - \mathbf{A}$ entry by entry. This isn't a coincidence — the identity $\mathbf{L}_0 = \mathbf{B}_1\mathbf{B}_1^\top = \mathbf{D} - \mathbf{A}$ holds for any graph, and the proof is short:
Entry $(v,w)$ of $\mathbf{B}_1\mathbf{B}_1^\top$ is the dot product of rows $v$ and $w$ of $\mathbf{B}_1$.
If $v = w$: You're dotting row $v$ with itself. Each edge incident to $v$ contributes $(\pm 1)^2 = 1$, everything else contributes $0$. Sum = $\deg(v) = D_{vv}$. ✓
If $v \sim w$ (adjacent): They share exactly one edge $e$. In column $e$, one has $+1$ and the other $-1$ (tail vs head). Product = $-1$. In all other columns, at most one of them is nonzero, so product = $0$. Sum = $-1 = 0 - A_{vw} = (D-A)_{vw}$. ✓
If $v \not\sim w$ (not adjacent): No edge contains both. In every column, at least one of them is $0$. Sum = $0 = 0 - 0 = (D-A)_{vw}$. ✓
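The three cases can also be confirmed mechanically. A minimal NumPy check on the example graph (with the low-to-high edge orientation, one arbitrary choice among many):

```python
import numpy as np

# Example graph: edges a={0,1}, b={0,2}, c={1,2}, d={0,3}, e={2,3},
# oriented low -> high index (+1 at the tail, -1 at the head).
B1 = np.array([
    [ 1,  1,  0,  1,  0],
    [-1,  0,  1,  0,  0],
    [ 0, -1, -1,  0,  1],
    [ 0,  0,  0, -1, -1],
])

A = np.array([       # adjacency: 1 where two vertices share an edge
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
])
D = np.diag(A.sum(axis=1))    # degree matrix: diag(3, 2, 3, 2)

print(np.array_equal(B1 @ B1.T, D - A))   # True: the identity holds entrywise
```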
Why the Laplacian Matters: Discrete Diffusion
So $\mathbf{L}_0 = \mathbf{D} - \mathbf{A}$ has degrees on the diagonal and $-1$ wherever two vertices share an edge. Why is this the right operator for physics?
Apply $\mathbf{L}_0$ to a temperature vector $\mathbf{T}$. The result at vertex $v$ is:

$$(\mathbf{L}_0 \mathbf{T})_v = \deg(v)\, T_v - \sum_{w \sim v} T_w = \sum_{w \sim v} (T_v - T_w)$$
Read that last expression: the Laplacian at $v$ sums up all the temperature differences between $v$ and its neighbors. This is exactly the discrete version of $\nabla^2 T$ — the divergence of the gradient — which governs heat diffusion. The equivalence $\deg(v) \cdot T_v - \sum T_w = \sum(T_v - T_w)$ is why $\mathbf{D} - \mathbf{A}$ is the right combination: the degree term and the adjacency term conspire to produce pairwise differences.
This gives three reasons why the Laplacian is the central object:
1. Diffusion: The heat equation $\dot{T} = -\alpha \mathbf{L}_0 T$ says temperature evolves by flowing from hot to cold — directly from the "sum of differences" structure.
2. Conservation: $\mathbf{L}_0$ is positive semi-definite (since $\mathbf{L}_0 = \mathbf{B}_1\mathbf{B}_1^\top$ is a Gram matrix) and its row sums are zero ($\mathbf{L}_0 \mathbf{1} = 0$). This means total "heat" is conserved under diffusion — a physical requirement.
3. Spectral structure: The eigenvalues $0 = \lambda_0 \leq \lambda_1 \leq \cdots$ measure connectivity. $\lambda_1 = 0$ iff the graph is disconnected. The eigenvectors form a Fourier-like basis for graph signals — the foundation of spectral GNNs (ChebNet, GCN).
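All three properties can be seen in a few lines. The sketch below runs explicit-Euler integration of the heat equation on the example Laplacian; the step size and iteration count are illustrative choices, not prescribed by the text:

```python
import numpy as np

# Laplacian of the example graph (degrees 3, 2, 3, 2).
L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)

# Explicit Euler integration of the heat equation dT/dt = -alpha * L0 @ T.
alpha, dt = 0.1, 0.1
T = np.array([100.0, 0.0, 0.0, 0.0])   # all heat starts at vertex 0
for _ in range(500):
    T = T - alpha * dt * (L0 @ T)

print(T)         # ~[25. 25. 25. 25.]: heat spreads until uniform
print(T.sum())   # ~100: total heat conserved, since L0 @ ones = 0

# Positive semi-definiteness: no eigenvalue is negative.
print(np.linalg.eigvalsh(L0).min() >= -1e-12)   # True
```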
Without the signs in $\mathbf{B}_1$, you'd get $\mathbf{D} + \mathbf{A}$ instead — which sums neighbor values rather than computing differences. That operator doesn't conserve anything, isn't positive semi-definite, and has no connection to diffusion. The signs are what make the Laplacian a second-order difference operator (discrete $\nabla^2$).
The choice of orientation is arbitrary. Flipping an edge's direction just negates that column of $\mathbf{B}_1$. All derived quantities — the Laplacian $\mathbf{L}_0 = \mathbf{B}_1\mathbf{B}_1^\top$, Hodge Laplacians, message-passing operators — involve products like $\mathbf{B}^\top\mathbf{B}$ where the signs cancel, so they are invariant to the orientation choice. The orientation is a bookkeeping device, not a physical assumption.
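A quick numerical check of this invariance, using the example graph's incidence matrix with the low-to-high orientation as one arbitrary starting choice:

```python
import numpy as np

# Incidence matrix of the example graph (edges a, b, c, d, e as columns).
B1 = np.array([
    [ 1,  1,  0,  1,  0],
    [-1,  0,  1,  0,  0],
    [ 0, -1, -1,  0,  1],
    [ 0,  0,  0, -1, -1],
])

B1_flipped = B1.copy()
B1_flipped[:, 2] *= -1    # reverse the orientation of edge c = {1, 2}

# The Laplacian is identical: the flipped signs cancel in B1 @ B1.T.
print(np.array_equal(B1 @ B1.T, B1_flipped @ B1_flipped.T))   # True
```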
Spectral Structure: Eigenvalues and Eigenvectors of $\mathbf{L}_0$
The three properties above — diffusion, conservation, spectral structure — all flow from the eigendecomposition of $\mathbf{L}_0$. Let's work through this completely for our example, starting from the definition of diagonalization.
Review: Matrix Diagonalization
A real symmetric matrix $\mathbf{M} \in \mathbb{R}^{n \times n}$ can always be decomposed as:

$$\mathbf{M} = \mathbf{P}\, \boldsymbol{\Lambda}\, \mathbf{P}^\top$$
Here $\boldsymbol{\Lambda}$ is diagonal (eigenvalues on the diagonal), and $\mathbf{P}$ is an orthogonal matrix whose columns are the corresponding eigenvectors: $\mathbf{P} = [\mathbf{v}_0 \mid \mathbf{v}_1 \mid \cdots \mid \mathbf{v}_{n-1}]$. Orthogonal means $\mathbf{P}^\top\mathbf{P} = \mathbf{I}$ — the eigenvectors form an orthonormal basis.
The defining property of each eigenpair is:

$$\mathbf{M} \mathbf{v}_i = \lambda_i \mathbf{v}_i$$
Applying $\mathbf{M}$ to eigenvector $\mathbf{v}_i$ just scales it by $\lambda_i$ — it doesn't change direction. The eigenvalue $\lambda_i$ is the scaling factor.
Step 1. Find eigenvalues by solving the characteristic equation: $\det(\mathbf{M} - \lambda\mathbf{I}) = 0$. This is a degree-$n$ polynomial in $\lambda$; its roots are the eigenvalues.
Step 2. For each eigenvalue $\lambda_i$, find eigenvectors by solving the homogeneous system $(\mathbf{M} - \lambda_i\mathbf{I})\mathbf{v} = \mathbf{0}$. The solution space (the null space or kernel) gives the eigenvectors.
Step 3. Normalize: $\hat{\mathbf{v}}_i = \mathbf{v}_i / \|\mathbf{v}_i\|$. For a symmetric matrix, eigenvectors from distinct eigenvalues are automatically orthogonal. Within a repeated eigenvalue, use Gram-Schmidt.
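In practice, all three steps collapse into one library call. A sketch on a small symmetric matrix (chosen here purely to illustrate the API):

```python
import numpy as np

# np.linalg.eigh handles real symmetric matrices: it returns eigenvalues
# in ascending order and orthonormal eigenvectors as the columns of P.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, P = np.linalg.eigh(M)

print(lam)                                     # [1. 3.]
print(np.allclose(P @ np.diag(lam) @ P.T, M))  # True: M = P Lambda P^T
print(np.allclose(P.T @ P, np.eye(2)))         # True: orthonormal basis
```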
Step 1: The Characteristic Polynomial of Our $\mathbf{L}_0$
We need $\det(\mathbf{L}_0 - \lambda\mathbf{I}) = 0$ for:

$$\mathbf{L}_0 = \mathbf{D} - \mathbf{A} = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & -1 & 0 \\ -1 & -1 & 3 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix}$$
Before grinding through a $4 \times 4$ determinant, we can use two shortcuts:
Shortcut 1 — $\lambda = 0$ is always an eigenvalue. Since $\mathbf{L}_0 \mathbf{1} = \mathbf{0}$ (row sums are zero), the all-ones vector $\mathbf{1} = (1,1,1,1)^\top$ is an eigenvector with eigenvalue $0$. For a connected graph, $\lambda_0 = 0$ has multiplicity exactly 1.
Shortcut 2 — The trace gives the sum. $\text{tr}(\mathbf{L}_0) = 3 + 2 + 3 + 2 = 10 = \sum_i \lambda_i$.
For our graph, the characteristic polynomial factors as:

$$\det(\mathbf{L}_0 - \lambda\mathbf{I}) = \lambda\,(\lambda - 2)\,(\lambda - 4)^2 = 0$$
giving eigenvalues $\lambda_0 = 0$, $\lambda_1 = 2$, $\lambda_2 = \lambda_3 = 4$ (the last has algebraic multiplicity 2). Check: $0 + 2 + 4 + 4 = 10$ ✓.
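Both shortcuts and the final answer are easy to confirm numerically on the example Laplacian:

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)

lam = np.linalg.eigvalsh(L0)                 # eigenvalues in ascending order
print(np.round(lam, 6))                      # ~[0. 2. 4. 4.]
print(np.isclose(lam.sum(), np.trace(L0)))   # True: eigenvalues sum to the trace
```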
Step 2: Finding the Eigenvectors
For each eigenvalue, we solve $(\mathbf{L}_0 - \lambda_i \mathbf{I})\mathbf{v} = \mathbf{0}$:

- $\lambda_0 = 0$: the row sums of $\mathbf{L}_0$ are zero, so the constant vector $\mathbf{v}_0 = (1, 1, 1, 1)^\top$ spans the null space.
- $\lambda_1 = 2$: the system forces $v_0 = v_2 = 0$ and $v_1 = -v_3$, giving $\mathbf{v}_1 = (0, 1, 0, -1)^\top$.
- $\lambda_2 = \lambda_3 = 4$: the null space of $\mathbf{L}_0 - 4\mathbf{I}$ is two-dimensional; an orthogonal choice of basis is $\mathbf{v}_2 = (1, 0, -1, 0)^\top$ and $\mathbf{v}_3 = (1, -1, 1, -1)^\top$.
Step 3: The Diagonalization
Assembling the (normalized) eigenvectors as columns of $\mathbf{P}$ and eigenvalues into $\boldsymbol{\Lambda}$:

$$\mathbf{P} = \begin{pmatrix} \tfrac{1}{2} & 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{2} \end{pmatrix}, \qquad \boldsymbol{\Lambda} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \qquad \mathbf{L}_0 = \mathbf{P}\, \boldsymbol{\Lambda}\, \mathbf{P}^\top$$
Because $\mathbf{P}$ is orthogonal ($\mathbf{P}^\top = \mathbf{P}^{-1}$), this gives us two equivalent readings:
Forward: $\mathbf{L}_0 = \sum_{i=0}^{3} \lambda_i \, \hat{\mathbf{v}}_i \hat{\mathbf{v}}_i^\top$ — the Laplacian is a weighted sum of projection matrices, one per eigenvector.
Inverse: $\mathbf{P}^\top \mathbf{L}_0 \mathbf{P} = \boldsymbol{\Lambda}$ — in the eigenvector basis, $\mathbf{L}_0$ acts by pure scaling (no mixing).
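Both readings can be checked directly. A sketch (note that `eigh` may pick a different orthonormal basis within the repeated eigenvalue 4, which changes nothing below):

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)
lam, P = np.linalg.eigh(L0)

# Forward reading: L0 as an eigenvalue-weighted sum of rank-1 projections.
recon = sum(l * np.outer(v, v) for l, v in zip(lam, P.T))
print(np.allclose(recon, L0))                     # True

# Inverse reading: in the eigenbasis, L0 is purely diagonal.
print(np.allclose(P.T @ L0 @ P, np.diag(lam)))    # True
```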
Physical Interpretation: Vibration Modes of the Graph
The eigenvectors are the fundamental modes of oscillation on the graph — the exact analogue of Fourier modes on a continuous domain. Low eigenvalues correspond to slow, smooth variation; high eigenvalues correspond to rapid oscillation.
The Graph Fourier Transform
Just as the classical Fourier transform decomposes a signal into sinusoidal modes of increasing frequency, the graph Fourier transform (GFT) decomposes a graph signal into eigenvector modes of increasing eigenvalue:

$$\hat{f}_i = \hat{\mathbf{v}}_i^\top \mathbf{f} \;\;\text{(analysis)}, \qquad \mathbf{f} = \sum_i \hat{f}_i\, \hat{\mathbf{v}}_i \;\;\text{(synthesis)}$$

or, in matrix form, $\hat{\mathbf{f}} = \mathbf{P}^\top \mathbf{f}$ and $\mathbf{f} = \mathbf{P}\hat{\mathbf{f}}$.
The spectral coefficient $\hat{f}_i = \hat{\mathbf{v}}_i^\top \mathbf{f}$ measures "how much of mode $i$ is present in the signal $\mathbf{f}$." Low-index coefficients capture smooth, global trends; high-index coefficients capture sharp, local variations.
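To see this concretely on the example graph, compare the spectral coefficients of a smooth signal and a maximally oscillating one (the signal values themselves are made up for illustration):

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)
lam, P = np.linalg.eigh(L0)   # modes ordered by increasing eigenvalue

f_smooth = np.array([1.0, 1.1, 0.9, 1.0])    # nearly constant across the graph
f_rough  = np.array([1.0, -1.0, 1.0, -1.0])  # a pure eigenvalue-4 mode

f_hat_smooth = P.T @ f_smooth   # spectral coefficients f_hat[i] = v_i^T f
f_hat_rough  = P.T @ f_rough

print(np.round(f_hat_smooth, 3))   # almost all energy in mode 0 (lambda = 0)
print(np.round(f_hat_rough, 3))    # all energy in the eigenvalue-4 modes
```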
Why This Matters for Neural Networks
Spectral GNNs — ChebNet, GCN, and their descendants — define graph convolution as filtering in the spectral domain. A convolution filter $g_\theta$ applied to a signal $\mathbf{f}$ on the graph is:

$$g_\theta(\mathbf{L}_0)\, \mathbf{f} = \mathbf{P}\, g_\theta(\boldsymbol{\Lambda})\, \mathbf{P}^\top \mathbf{f}$$

— transform to the spectral domain, scale each mode by $g_\theta(\lambda_i)$, transform back.
The function $g_\theta(\lambda)$ is a learnable filter that scales each frequency independently — just like a low-pass or band-pass filter in signal processing, but with a learned frequency response. In GCN (Kipf & Welling, 2017), the filter is, up to a learned scale, $g_\theta(\lambda) = 1 - \lambda/\lambda_{\max}$, which attenuates high frequencies — a smoothing operation.
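A minimal sketch of spectral filtering on the example Laplacian. The function `g` below implements the low-pass filter shape just described, with no learned parameters — a toy filter, not a trained model:

```python
import numpy as np

L0 = np.array([
    [ 3, -1, -1, -1],
    [-1,  2, -1,  0],
    [-1, -1,  3, -1],
    [-1,  0, -1,  2],
], dtype=float)
lam, P = np.linalg.eigh(L0)

def spectral_filter(f, g):
    """Scale each spectral mode of the vertex signal f by g(lambda)."""
    return P @ (g(lam) * (P.T @ f))

g = lambda l: 1.0 - l / lam.max()   # low-pass: g(0) = 1, g(lambda_max) = 0

f_rough = np.array([1.0, -1.0, 1.0, -1.0])  # a pure eigenvalue-4 mode
print(spectral_filter(f_rough, g))          # ~[0 0 0 0]: top frequency removed

f_const = np.ones(4)                        # the eigenvalue-0 mode
print(spectral_filter(f_const, g))          # ~[1 1 1 1]: passes through unchanged
```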
The spectral bridge to topology: The graph Laplacian $\mathbf{L}_0$ has eigenvalues and eigenvectors that define a spectral basis for vertex signals. The Hodge Laplacian $\mathbf{L}_k$ (Section 9) does exactly the same for rank-$k$ cochains — it provides eigenvalues (frequencies) and eigenvectors (modes) for signals on edges, faces, and volumes. Spectral topological neural networks use the Hodge eigenvectors as their Fourier basis at each rank, enabling spectral filtering of higher-order signals. This is the mathematical pipeline: Laplacian → eigendecomposition → Fourier basis → learnable spectral filters.
The Discrete de Rham Complex
This sign structure extends to higher ranks. The boundary matrix $\mathbf{B}_{1,2}$ (edges → faces) encodes oriented boundaries of 2-cells and acts as a discrete curl. The full chain of boundary operators forms the discrete de Rham complex:

$$C^0 \xrightarrow{\;\mathbf{B}_1^\top\;} C^1 \xrightarrow{\;\mathbf{B}_{1,2}\;} C^2 \xrightarrow{\;\cdots\;} C^k$$

with the defining property that consecutive maps compose to zero, $\mathbf{B}_{1,2}\, \mathbf{B}_1^\top = \mathbf{0}$ — the discrete analogue of $\nabla \times \nabla f = 0$.
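The composition-to-zero property can be checked on the smallest interesting complex — a single filled triangle. The matrices below are built by hand for this hypothetical example, following the same +1-tail/−1-head convention used for $\mathbf{B}_1$ above:

```python
import numpy as np

# One filled triangle: vertices {0,1,2}; edges e0: 0->1, e1: 1->2, e2: 0->2.
B1 = np.array([
    [ 1,  0,  1],   # vertex 0: tail of e0, e2
    [-1,  1,  0],   # vertex 1: head of e0, tail of e1
    [ 0, -1, -1],   # vertex 2: head of e1, e2
])
# Face traversed 0 -> 1 -> 2 -> 0: e0 and e1 agree with the traversal, e2 opposes.
B2 = np.array([[1], [1], [-1]])

grad = B1.T   # vertex signals -> edge signals (discrete gradient)
curl = B2.T   # edge signals  -> face signals (discrete curl)

# Curl of a gradient vanishes identically -- the defining property of a complex.
print(curl @ grad)   # [[0 0 0]]
```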
What's Missing? Message Passing
The spectral perspective tells us that graph convolution is spectral filtering through the Laplacian eigenbasis. But in practice, GNNs work by message passing: each vertex aggregates features from its neighbors, transforms them, and updates its own state. In the next chapter, we'll deconstruct a full GCN layer step by step — with a complete numerical example on this same graph — showing exactly how the Laplacian, the adjacency matrix, and the spectral theory connect to the concrete operations a GNN performs.