28 The Complete Analogy Table
The following table is the main reference for the course — every row records a concept from the $\ZN$ setting and its graph counterpart.
The rows where the analogy breaks are as informative as the rows where it holds.
| Concept | ZN (ring) | General Graph |
|---|---|---|
| Domain | $N$ points equally spaced on a circle | $N$ vertices, arbitrary topology |
| Shift / adjacency | $S$ (directed cyclic shift) | $A$ (adjacency matrix) |
| Symmetric shift | $A = S + S^*$ (forward + backward shift) | $A$ (already symmetric for undirected graphs) |
| Smoothness operator | $L = 2I - (S + S^*)$ | $L = D - A$ |
| Fourier basis | DFT vectors $w^{(m)}_k = \frac{1}{\sqrt{N}}\rho_m^k$, where $\rho_m = e^{i2\pi m/N}$ | Eigenvectors $u_k$ of $L$ |
| Forward transform | $\hat{x} = Wx$ (DFT) | $\hat{x} = U^T x$ (GFT) |
| Inverse transform | $x = W^{-1}\hat{x}$ (IDFT) | $x = U\hat{x}$ (IGFT) |
| Convolution (algebraic) | Polynomial in $S$: $\sum_k h_k S^k$ | Polynomial in $L$: $\sum_k \theta_k L^k$ |
| Convolution (spectral) | Diagonal in the DFT basis | Diagonal in the $L$-eigenbasis |
| Convolution (group-theoretic) | Commutes with all shifts $S^k$ | — (no analog: no transitive group) |
| Frequency | DFT index $k$ (or $m$) | Eigenvalue $\lambda_k$ of $L$ |
| Low frequency | $k \approx 0$ (smooth, slowly varying) | Small $\lambda$ (smooth on graph) |
| High frequency | $k \approx N/2$ (oscillatory) | Large $\lambda$ (oscillatory on graph) |
| Spectral filtering | $\hat{y}_k = \hat{h}_k \cdot \hat{x}_k$ | $\hat{y}_k = \hat{h}_k \cdot \hat{x}_k$ |
| Filter locality | Kernel width (number of terms in polynomial) | Polynomial degree (number of hops) |
| Eigenvalue structure | All distinct (roots of unity) | May repeat (real symmetric matrix) |
| Eigenbasis uniqueness | Unique (determined by distinct eigenvalues) | Non-unique when eigenvalues repeat |
The spectral filtering row is identical on both sides. This is the core of the analogy. Regardless of whether the underlying domain is a ring or an arbitrary graph, the filtering recipe is the same — transform to the eigenbasis, multiply pointwise, transform back.
We can express this parallel in a single equation that captures the entire course:
$$ y = W^{-1}\operatorname{diag}(\hat{h})\, W x \qquad\longleftrightarrow\qquad y = U \operatorname{diag}(\hat{h})\, U^T x $$
In both cases, the eigenbasis plays the role of a "frequency domain" in which convolution becomes pointwise multiplication. On $\ZN$, the eigenbasis is the DFT matrix $W$, determined by the group structure and distinct eigenvalues. On a graph, the eigenbasis is the matrix $U$ of Laplacian eigenvectors — the graph Fourier transform. The formula is the same; only the basis changes.
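To make this concrete, here is a small NumPy check (illustrative, not from the course text): on a ring, a filter built by the graph recipe — eigendecompose $L$, scale pointwise in the eigenbasis, transform back — commutes with the cyclic shift, i.e. it really is a circular convolution.

```python
import numpy as np

N = 8
# Cyclic shift S and ring Laplacian L = 2I - (S + S^T), as in the Z_N column.
S = np.roll(np.eye(N), 1, axis=0)   # (Sx)_i = x_{i-1}
L = 2 * np.eye(N) - (S + S.T)

# Graph-side recipe: eigendecompose L, filter pointwise in the eigenbasis.
lam, U = np.linalg.eigh(L)
h_hat = np.exp(-lam)                # any spectral response h(lambda)
H = U @ np.diag(h_hat) @ U.T        # the filter as an operator, H = h(L)

# Because the domain is a ring, the spectrally built filter is a genuine
# circular convolution: it commutes with the shift S.
assert np.allclose(H @ S, S @ H)
```

The check passes for any spectral response `h_hat` that is a function of $\lambda$: a function of $L$ commutes with everything $L$ commutes with, and on the ring $L$ commutes with $S$.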
The rows where the analogy breaks are the most informative. The "group-theoretic convolution" row has no graph analog. The "eigenvalue structure" row differs (distinct vs. possibly repeated). The "eigenbasis uniqueness" row differs. These three breakdowns are not accidents — they are logically connected, and they trace a single thread: the absence of group structure on general graphs.
29 What We Discovered
This course showed that $\ZN$ convolution theory motivates the constructions used on graphs, but the correspondence between them is partial. The spectral and algebraic structures carry over; the group-theoretic underpinning does not. What the graph literature calls "convolution" is modeled on the $\ZN$ case — it is not a unique, canonical extension of it.
What the course covered
Part I: The $\ZN$ story (Chapters 1–3). Signals on a ring, the cyclic shift $S$, and three characterizations of convolution that all coincide: (i) operators that commute with all shifts (group-theoretic), (ii) polynomials in $S$ (algebraic), (iii) operators diagonal in the DFT basis (spectral). The DFT emerged as the unique eigenbasis of $S$, forced by the simultaneous diagonalization theorem (Bamieh 2018). The distinct eigenvalues of $S$ (roots of unity) guaranteed uniqueness.
Part II: The graph story (Chapters 4–6). Replacing $S$ with the graph Laplacian $L$, the group structure was lost — an irregular graph has no translation symmetry. With it, the group-theoretic characterization died. Two characterizations survived: polynomials in $L$ (algebraic) and operators diagonal in the $L$-eigenbasis (spectral). These were adopted as the definition of graph convolution.
$$ \underbrace{y = W^{-1}\operatorname{diag}(\hat{h})\, W x}_{\text{$\ZN$: theorem}} \qquad\longleftrightarrow\qquad \underbrace{y = U \operatorname{diag}(\hat{h})\, U^T x}_{\text{graph: definition}} $$
The left side is a theorem — circulant matrices are exactly those diagonalized by the DFT. The right side is a definition, chosen because of the structural parallel. $W$ and $U$ play analogous roles, but the right side is not forced by the same algebraic argument that forces the left.
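The left-side theorem is easy to verify numerically. A NumPy sketch (the toy setup is mine): build an arbitrary polynomial in $S$ and check that the unitary DFT matrix diagonalizes it.

```python
import numpy as np

N = 8
S = np.roll(np.eye(N), 1, axis=0)   # cyclic shift
# Unitary DFT matrix: W[k, m] = exp(i 2 pi k m / N) / sqrt(N)
W = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)

# A circulant filter: an arbitrary polynomial in S.
rng = np.random.default_rng(0)
h = rng.standard_normal(4)
H = sum(c * np.linalg.matrix_power(S, k) for k, c in enumerate(h))

# Theorem: the DFT diagonalizes every circulant (every polynomial in S).
D = W.conj().T @ H @ W
assert np.allclose(D - np.diag(np.diag(D)), 0, atol=1e-10)
```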
What was gained
Most treatments of graph convolutions begin with $L$ and simply define filtering as $U\operatorname{diag}(\hat{h})U^T x$, without explaining where that formula comes from. By starting from $\ZN$, we saw that this definition is not arbitrary — it is modeled on a structure that is uniquely determined in the classical setting. The definition is motivated rather than imposed.
What was lost
Three things, all connected:
- Group structure. No transitive group acts on the vertices of an irregular graph. "Translation" does not apply.
- Eigenbasis uniqueness. When $L$ has repeated eigenvalues, any rotation within a repeated eigenspace gives an equally valid "graph Fourier transform."
- The third characterization. On $\ZN$, group-theoretic, algebraic, and spectral characterizations define the same set of operators. On graphs, only the algebraic and spectral characterizations survive.
These losses are the price of modeling irregular, non-homogeneous connectivity. A ring is one of the simplest possible graph structures. General graphs can model molecules, social networks, meshes, and brain connectomes — but the canonical symmetries that made $\ZN$ so tractable are gone.
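The eigenbasis non-uniqueness is concrete enough to demonstrate. A small NumPy sketch (the complete graph $K_4$ is just an illustrative choice): rotating within a repeated eigenspace yields a second, equally valid "graph Fourier basis."

```python
import numpy as np

# Complete graph K4: Laplacian eigenvalue 4 with multiplicity 3.
N = 4
A = np.ones((N, N)) - np.eye(N)
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)          # one valid "graph Fourier basis"
assert np.allclose(lam, [0, 4, 4, 4])

# Rotate within the repeated eigenspace: still an orthonormal eigenbasis.
rng = np.random.default_rng(1)
Q = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # random 3x3 orthogonal matrix
U2 = U.copy()
U2[:, 1:] = U[:, 1:] @ Q

assert np.allclose(U2.T @ U2, np.eye(N))           # orthonormal
assert np.allclose(U2.T @ L @ U2, np.diag(lam))    # diagonalizes L equally well
```

Both `U` and `U2` diagonalize $L$, yet they assign different "Fourier coefficients" to the same signal — there is no canonical choice.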
A note on the word "convolution"
On $\ZN$, "convolution" has a precise group-theoretic meaning: it is the operation that commutes with all group translations. The three characterizations proved that the same set of operators can equivalently be described as polynomials in $S$ or as spectrally diagonal operators. The word "convolution" names all three at once, because they coincide.
On graphs, we kept the word but the strict definition it refers to is gone. What the literature calls "graph convolution" is really the spectral/polynomial construction — operators diagonal in the $L$-eigenbasis, or equivalently polynomials in $L$. This construction is genuinely useful and well-motivated by the $\ZN$ parallel. But it is not uniquely determined in the way circular convolution is: the choice of $L$ over $A$ (or a normalized variant) is a modeling decision, not a theorem; and the non-unique eigenbasis means that "the graph Fourier transform" is a convention, not a canonical object. The literature is not always upfront about this.
The honest summary: the spectral filtering recipe (Eq. 45) has the same form on $\ZN$ and on graphs. On $\ZN$, this form is derived from group structure. On graphs, it is adopted by analogy. Two of three characterizations survive; one dies; the eigenbasis may not be unique. The analogy is productive — it motivates all of spectral graph theory and the architectures built on it — but it is an analogy, not an identity.
30 Modern GNN Variants in Spectral Context
With the spectral framework, we can place the major GNN architectures along a spectrum from "purely spectral" to "non-spectral." This is not a survey — it is a guide to reading the literature through the lens of what we have covered. For each architecture, we ask: what does the filter look like in spectral terms?
GCN (Kipf & Welling, 2017). The Graph Convolutional Network uses a degree-1 polynomial in the renormalized Laplacian $\tilde{L} = I - \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, where $\tilde{A} = A + I$ adds self-loops. The layer update is:
$$ H^{(\ell+1)} = \sigma\!\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\, H^{(\ell)}\, W^{(\ell)}\right) $$

In spectral terms, this is a very restrictive filter. A degree-1 polynomial has only two free parameters (the filter is constrained to be a linear function of $\lambda$), so the GCN can only learn spectral responses that vary linearly across the spectrum — it cannot realize band-pass or any other non-monotonic spectral shape. Despite this limitation, the GCN launched the modern GNN era — the simplicity of a single-hop aggregation with shared weights proved remarkably effective for semi-supervised node classification.
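A minimal NumPy sketch of one GCN layer (the toy graph, random weights, and function name are illustrative, not from the paper):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: relu(D~^-1/2 A~ D~^-1/2 H W)."""
    A_t = A + np.eye(A.shape[0])            # add self-loops: A~ = A + I
    d_inv_sqrt = 1.0 / np.sqrt(A_t.sum(axis=1))
    P = d_inv_sqrt[:, None] * A_t * d_inv_sqrt[None, :]   # D~^-1/2 A~ D~^-1/2
    return np.maximum(P @ H @ W, 0.0)       # ReLU nonlinearity

# Toy 4-vertex path graph, 3 input features, 2 output features.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
rng = np.random.default_rng(0)
H0 = rng.standard_normal((4, 3))
W0 = rng.standard_normal((3, 2))
H1 = gcn_layer(A, H0, W0)
assert H1.shape == (4, 2) and (H1 >= 0).all()
```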
ChebNet (Defferrard, Bresson & Vandergheynst, 2016). ChebNet is historically the first practical spectral GNN. It uses a degree-$K$ Chebyshev polynomial approximation:
$$ g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{\Lambda}) $$

where $T_k$ are Chebyshev polynomials and $\tilde{\Lambda} = 2\Lambda/\lambda_{\max} - I$ rescales the eigenvalues to $[-1, 1]$. In spectral terms, ChebNet has $K+1$ free parameters controlling a polynomial fit to the spectrum. Unlike GCN, it can learn localized filters with arbitrary spectral shape (up to degree $K$). The Chebyshev basis is chosen for numerical stability — it avoids the ill-conditioning of the monomial basis $\{I, L, L^2, \ldots\}$ at high degrees. Crucially, a degree-$K$ polynomial in $L$ corresponds to a filter with $K$-hop spatial support: the output at vertex $i$ depends only on vertices within $K$ graph hops.
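The Chebyshev recurrence $T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L})$ lets the filter be applied without any eigendecomposition. A NumPy sketch (function name and toy graph are mine) that cross-checks the recurrence against the explicit spectral form:

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max):
    """Apply sum_k theta_k T_k(L~) x via the Chebyshev recurrence (no eigendecomposition)."""
    L_t = 2.0 * L / lam_max - np.eye(L.shape[0])   # rescale spectrum to [-1, 1]
    Tx_prev, Tx = x, L_t @ x                       # T_0 x, T_1 x
    y = theta[0] * Tx_prev + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_prev, Tx = Tx, 2.0 * L_t @ Tx - Tx_prev
        y = y + theta[k] * Tx
    return y

# Path graph on 5 vertices.
A = np.diag(np.ones(4), 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
theta = rng.standard_normal(4)                     # degree K = 3

# Cross-check: T_k(cos phi) = cos(k phi) gives the explicit spectral response.
lam_t = 2.0 * lam / lam.max() - 1.0
g = sum(t * np.cos(k * np.arccos(np.clip(lam_t, -1, 1))) for k, t in enumerate(theta))
y_spectral = U @ (g * (U.T @ x))
assert np.allclose(cheb_filter(L, x, theta, lam.max()), y_spectral)
```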
GAT (Velickovic et al., 2018). The Graph Attention Network computes attention-weighted aggregation:
$$ h_i^{(\ell+1)} = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W h_j^{(\ell)}\right) $$

where the attention weights $\alpha_{ij}$ are computed from the features of both vertices $i$ and $j$. In spectral terms, GAT is not a spectral method. The key difference is that the attention weights are data-dependent — the "filter" changes for every input signal. In our framework, a spectral filter $\hat{h}$ is fixed (or learned once and applied to all inputs). GAT breaks this by making the effective filter a function of the input features. This is analogous to the difference between a linear time-invariant (LTI) filter and a nonlinear, time-varying one. GAT is more expressive precisely because it abandons the LTI constraint.
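A single-head, simplified NumPy sketch of GAT-style attention (not the full architecture; names and toy data are illustrative). The point is that the attention matrix — the effective filter — changes with the input:

```python
import numpy as np

def gat_attention(A, H, W, a):
    """GAT-style weights: alpha_ij = softmax_j LeakyReLU(a^T [W h_i || W h_j]),
    taken over j in N(i) plus the self-loop."""
    Z = H @ W
    d = Z.shape[1]
    e = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]   # a^T [z_i || z_j]
    e = np.where(e > 0, e, 0.2 * e)                   # LeakyReLU, slope 0.2
    mask = (A + np.eye(A.shape[0])) > 0               # neighbors + self
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)              # stable softmax
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=1, keepdims=True)

A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2)); a = rng.standard_normal(4)
alpha1 = gat_attention(A, rng.standard_normal((4, 3)), W, a)
alpha2 = gat_attention(A, rng.standard_normal((4, 3)), W, a)
assert np.allclose(alpha1.sum(axis=1), 1.0)
assert not np.allclose(alpha1, alpha2)   # the "filter" depends on the input signal
```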
GraphSAGE (Hamilton, Ying & Leskovec, 2017). GraphSAGE addresses scalability by sampling a fixed-size neighborhood at each hop, rather than aggregating over the full neighborhood. The aggregation step is:
$$ h_i^{(\ell+1)} = \sigma\!\left(W \cdot \text{CONCAT}\!\left(h_i^{(\ell)},\; \text{AGG}\!\left(\{h_j^{(\ell)} : j \in \mathcal{S}(i)\}\right)\right)\right) $$

where $\mathcal{S}(i)$ is a sampled subset of the neighbors, and AGG can be mean, LSTM, or max-pooling. Spectrally, the mean aggregator is a 1-hop polynomial filter (like GCN), but the sampling introduces a stochastic approximation. The LSTM and max-pool aggregators are non-spectral — they are nonlinear functions of the neighbor features.
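A hedged NumPy sketch of the mean aggregator with neighbor sampling (function name and toy data are mine):

```python
import numpy as np

def sage_layer(A, H, W, num_samples, rng):
    """GraphSAGE mean aggregator: concat self features with the mean
    over a sampled subset of neighbors, then a linear map + ReLU."""
    n = A.shape[0]
    out = np.zeros((n, H.shape[1] * 2))
    for i in range(n):
        nbrs = np.flatnonzero(A[i])
        sampled = rng.choice(nbrs, size=min(num_samples, len(nbrs)), replace=False)
        out[i] = np.concatenate([H[i], H[sampled].mean(axis=0)])
    return np.maximum(out @ W, 0.0)

A = np.array([[0,1,1,1],[1,0,1,0],[1,1,0,1],[1,0,1,0]], float)
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))
W = rng.standard_normal((6, 2))   # input dim doubles from the concatenation
H1 = sage_layer(A, H, W, num_samples=2, rng=rng)
assert H1.shape == (4, 2)
```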
GIN (Xu, Hu, Leskovec & Jegelka, 2019). The Graph Isomorphism Network is designed to be maximally expressive within the message-passing framework. Its update rule is:
$$ h_i^{(\ell+1)} = \text{MLP}^{(\ell)}\!\left((1+\varepsilon)\, h_i^{(\ell)} + \sum_{j \in \mathcal{N}(i)} h_j^{(\ell)}\right) $$
The term $(1+\varepsilon)\, h_i + \sum_{j \in \mathcal{N}(i)} h_j$ is a special case of the message-passing framework. In matrix form, the linear part (before the MLP) is $(1+\varepsilon)I + A$ applied to the feature matrix — a degree-1 polynomial in $A$. The parameter $\varepsilon$ (learnable or fixed) controls the relative weight of the self-signal versus the neighbor aggregate. GIN proves that this update, followed by a sufficiently expressive MLP, is as powerful as the Weisfeiler-Leman graph isomorphism test for distinguishing non-isomorphic graphs.
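A minimal NumPy sketch of the GIN update (the MLP weights and toy graph are illustrative):

```python
import numpy as np

def gin_layer(A, H, eps, mlp):
    """GIN update: MLP applied to ((1+eps) I + A) H — a degree-1 polynomial in A."""
    return mlp(((1 + eps) * np.eye(A.shape[0]) + A) @ H)

# A tiny 2-layer MLP as the per-node transform (illustrative weights).
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 8)), rng.standard_normal((8, 3))
mlp = lambda X: np.maximum(X @ W1, 0.0) @ W2

A = np.array([[0,1,0],[1,0,1],[0,1,0]], float)   # path on 3 vertices
H = rng.standard_normal((3, 3))
H1 = gin_layer(A, H, eps=0.1, mlp=mlp)
assert H1.shape == (3, 3)
```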
| Architecture | Filter type | Spectral freedom | Spatial locality |
|---|---|---|---|
| ChebNet | Degree-$K$ polynomial in $L$ | $K+1$ parameters | $K$ hops |
| GCN | Degree-1 polynomial in $\tilde{L}$ | 2 parameters (linear in $\lambda$) | 1 hop |
| GIN | Degree-1 polynomial in $A$ + MLP | 2 spectral + nonlinear MLP | 1 hop per layer |
| GAT | Data-dependent (attention) | Input-dependent (not LTI) | 1 hop per layer |
| GraphSAGE | 1-hop + sampling | Stochastic approximation | 1 hop (sampled) |
The table reveals a clear pattern. The earliest spectral methods (ChebNet) offered the most control over the spectral response but required computing Laplacian eigenvectors or Chebyshev recurrences. Later architectures (GCN, GIN, GraphSAGE, GAT) traded spectral precision for simplicity and scalability, using only 1-hop operations that avoid any eigendecomposition. The field has largely moved from spectral to spatial, but the spectral viewpoint remains essential for understanding why these architectures work — and where they are limited.
A common misconception: "Stacking $K$ layers of a 1-hop GCN gives a degree-$K$ polynomial filter." This is only true if there are no nonlinearities between layers. With ReLU activations between layers, the composition is a nonlinear function of $L$, not a polynomial. Each layer is a degree-1 polynomial, but the composition through nonlinearities creates a much richer function class — one that cannot be described by any single polynomial in $L$.
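This is easy to check numerically. A NumPy sketch (toy ring graph, illustrative filter coefficients): two linear 1-hop layers compose exactly to a degree-2 polynomial in $L$, but inserting a ReLU breaks the identity:

```python
import numpy as np

N = 6
A = np.roll(np.eye(N), 1, 0) + np.roll(np.eye(N), -1, 0)   # ring graph
L = 2 * np.eye(N) - A
x = np.array([1., -2., 3., -4., 5., -6.])

# A linear 1-hop layer: the filter I + 0.5 L.
f = lambda v: v + 0.5 * (L @ v)

# Two linear layers compose to the degree-2 polynomial (I + 0.5 L)^2.
y_linear = f(f(x))
y_poly = (np.eye(N) + L + 0.25 * L @ L) @ x
assert np.allclose(y_linear, y_poly)

# With a ReLU in between, the composition is no longer that polynomial.
y_relu = f(np.maximum(f(x), 0.0))
assert not np.allclose(y_relu, y_poly)
```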
31 Beyond Graphs
Graphs model pairwise relationships: an edge connects two vertices. But many real-world systems have higher-order structure that pairwise edges cannot capture. Three friends who always appear together at meetings form a three-way relationship that is more than the sum of three pairwise friendships. A triangular face in a mesh carries geometric information (orientation, area) beyond what its three edges encode. To model such systems, we need mathematical objects that go beyond graphs.
Simplicial complexes extend graphs by adding higher-dimensional cells. A 0-simplex is a vertex, a 1-simplex is an edge, a 2-simplex is a filled triangle, a 3-simplex is a filled tetrahedron, and so on. The crucial additional structure is the boundary operator $\partial_k$, which maps each $k$-simplex to its oriented $(k-1)$-boundary. For a triangle with vertices $(v_0, v_1, v_2)$, the boundary is the oriented sum of its three edges: $\partial_2[v_0, v_1, v_2] = [v_1, v_2] - [v_0, v_2] + [v_0, v_1]$.
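A quick NumPy check of the boundary operators for a single filled triangle (matrix layout is my convention: rows index the lower cells, columns the upper cells), verifying the standard identity that the boundary of a boundary is zero, $B_1 B_2 = 0$:

```python
import numpy as np

# Filled triangle on v0, v1, v2; edges oriented low -> high: [v0,v1], [v0,v2], [v1,v2].
# B1: vertex-edge incidence; boundary of edge [vi, vj] is vj - vi.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], float)   # columns: edges 01, 02, 12
# B2: edge-face incidence; boundary_2 [v0,v1,v2] = [v1,v2] - [v0,v2] + [v0,v1].
B2 = np.array([[ 1],
               [-1],
               [ 1]], float)

# Fundamental identity of boundary operators: boundary of a boundary is zero.
assert np.allclose(B1 @ B2, 0)
```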
Cell complexes relax the simplicial constraint. In a simplicial complex, every face of a simplex must also belong to the complex (a filled triangle forces all three edges and all three vertices to be present). Cell complexes allow more flexible topology — cells of arbitrary shape, without the inclusion constraint on faces.
The connection to our spectral framework is through the Hodge Laplacian. Just as the graph Laplacian $L_0 = D - A$ operates on signals living on vertices (0-cells), the Hodge Laplacian $L_k$ operates on signals living on $k$-cells (edges, faces, and higher). It is defined using the boundary operators, represented as matrices $B_k$:
$$ L_k = B_k^T B_k + B_{k+1} B_{k+1}^T $$
This decomposition has the following structure. The first term $B_k^T B_k$ is the lower Laplacian (or "gradient" component) — it measures how a $k$-signal varies relative to the $(k-1)$-cells below it. The second term $B_{k+1} B_{k+1}^T$ is the upper Laplacian (or "curl" component) — it measures how a $k$-signal circulates around the $(k+1)$-cells above it.
For $k=0$ (vertex signals), the boundary matrix $B_0$ does not exist (or is zero), and $B_1$ is the vertex-edge incidence matrix. So $L_0 = B_1 B_1^T$, which is exactly the graph Laplacian $L = D - A$. The Hodge Laplacian at rank 0 reduces to the familiar graph Laplacian.
For $k=1$ (edge signals), $L_1 = B_1^T B_1 + B_2 B_2^T$. The eigenvectors of $L_1$ define a "Fourier transform for edge signals" — they are the natural frequency decomposition for flows on a network. Low-frequency edge signals are smooth flows; high-frequency edge signals are rapidly oscillating.
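A NumPy sketch for the filled triangle (the smallest complex where both terms appear; the matrices are my encoding of the standard orientation convention), checking that $L_0 = B_1 B_1^T$ recovers $D - A$ and assembling $L_1$:

```python
import numpy as np

# Filled triangle: vertices 0,1,2; edges 01, 02, 12 (oriented low -> high); one face.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], float)
B2 = np.array([[1], [-1], [1]], float)

# Rank 0: the Hodge Laplacian reduces to the familiar graph Laplacian D - A.
A = np.ones((3, 3)) - np.eye(3)
L0 = B1 @ B1.T
assert np.allclose(L0, np.diag(A.sum(axis=1)) - A)

# Rank 1: lower ("gradient") plus upper ("curl") parts act on edge signals.
L1 = B1.T @ B1 + B2 @ B2.T
assert np.allclose(L1, L1.T)   # symmetric, so it has an orthonormal eigenbasis
```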
The spectral theory carries over to every rank. The eigenvectors of $L_k$ form an orthonormal basis for $k$-cell signals, and filtering can be defined as polynomial operations in $L_k$ or as spectral multipliers in the $L_k$-eigenbasis. The entire framework we developed for graphs — polynomial filters, spectral convolutions, the trade-off between spectral freedom and spatial locality — applies to signals on edges, faces, and higher-dimensional cells.
A deep result from algebraic topology, the Hodge decomposition, states that the kernel of $L_k$ (the "harmonic" $k$-signals) captures the $k$th homology of the complex — its $k$-dimensional "holes." Harmonic 0-signals correspond to connected components. Harmonic 1-signals correspond to independent cycles. Harmonic 2-signals correspond to enclosed cavities. This topological information is invisible to standard graph neural networks, which operate only on vertex signals with $L_0$.
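A NumPy sketch of this correspondence on the triangle (same illustrative setup as above): the hollow triangle has exactly one harmonic edge signal — its one independent cycle — and filling the face removes it:

```python
import numpy as np

B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], float)
B2 = np.array([[1], [-1], [1]], float)

def ker_dim(M, tol=1e-9):
    """Dimension of the kernel of a symmetric PSD matrix, via its eigenvalues."""
    return int(np.sum(np.linalg.eigvalsh(M) < tol))

# Hollow triangle (edges only): one harmonic edge signal = one independent cycle.
assert ker_dim(B1.T @ B1) == 1

# Filling the face kills the cycle: no harmonic edge signals remain.
assert ker_dim(B1.T @ B1 + B2 @ B2.T) == 0

# And ker L0 counts connected components (here, one).
assert ker_dim(B1 @ B1.T) == 1
```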
For a full treatment of simplicial and cell complexes, Hodge theory, and higher-order message passing, see the Topological Deep Learning course on this site. That course builds the boundary operators, constructs all the Hodge Laplacians, proves the Hodge decomposition, and develops the higher-order message-passing architectures (SNN, CWN, CAN) that generalize GNNs to simplicial and cell complexes.
32 Further Reading
This course traced a single thread: from circular convolution on $\ZN$ to spectral convolution on graphs. The following works are the primary sources for the ideas developed here, along with the key papers that extend them. Each entry includes a note on how it connects to our narrative.
- Bamieh (2018) — "A Tutorial on Matrix Functions and their use in the Discovery of the DFT." The foundation of Part I. Bamieh shows that the DFT is a consequence of simultaneous diagonalization. Contains the complete proofs of the commutant theorem, the three characterizations, and the derivation of the DFT from algebraic structure. Our Chapters 1–3 follow this paper closely.
- Hammond, Vandergheynst & Gribonval (2011) — "Wavelets on Graphs via Spectral Graph Theory." Foundational paper for graph signal processing. Introduces spectral graph wavelets using polynomial approximation of spectral multipliers — the direct mathematical ancestor of ChebNet. Established the principle that graph filters should be defined spectrally and approximated by polynomials, which is the central idea of our Chapters 5–6.
- Defferrard, Bresson & Vandergheynst (2016) — "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering." ChebNet: the bridge from spectral graph theory to practical neural networks. Combines Hammond et al.'s polynomial spectral filters with learnable parameters and neural network training. The Chebyshev polynomial basis avoids eigendecomposition entirely, making spectral filtering scalable. This is the key paper for understanding how spectral theory becomes computational practice.
- Kipf & Welling (2017) — "Semi-Supervised Classification with Graph Convolutional Networks." The GCN — a degree-1 simplification of ChebNet that launched the modern GNN era. By restricting to a single Chebyshev polynomial and adding the renormalization trick ($\tilde{A} = A + I$, symmetric normalization), Kipf and Welling produced a model simple enough for widespread adoption. The paper demonstrates that even the crudest spectral filter (linear in $\lambda$) can achieve strong results when stacked with nonlinearities.
- Bronstein, Bruna, Cohen & Velickovic (2021) — "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges." The "5G" blueprint paper. Places everything in this course within a unified framework of symmetry and invariance. Standard CNNs, GNNs, Transformers, and equivariant networks are all instances of a single principle: exploiting the symmetry group of the domain. Our $\ZN$ story is the special case where the symmetry group is cyclic; graphs are the case where symmetry is reduced to the automorphism group.
- Distill.pub (2021) — "Understanding Convolutions on Graphs." A visual introduction to graph convolutions with interactive diagrams. This course goes deeper mathematically — Bamieh's discovery approach, the three characterizations, the role of repeated eigenvalues, the lost group-theoretic characterization — but the Distill article is a good complement for building geometric intuition, particularly the spectral filtering visualizations.
- Hajij et al. (2022) — "Topological Deep Learning: Going Beyond Graph Data." The paper that our Topological Deep Learning course provides background for. Takes the Hodge Laplacian from Section 31 further, defining message passing on simplicial and cell complexes. If the spectral framework of this course felt natural on graphs, Hajij et al. show how to push it to higher-order structures where boundary operators replace the adjacency matrix.
Additional references by topic:
On graph signal processing foundations: Shuman, Narang, Frossard, Ortega & Vandergheynst (2013), "The Emerging Field of Signal Processing on Graphs," provides a comprehensive survey of spectral graph theory for signal processing, including windowed graph Fourier transforms and uncertainty principles on graphs.
On the expressivity of GNNs: Xu, Hu, Leskovec & Jegelka (2019), "How Powerful are Graph Neural Networks?" (the GIN paper) proves that message-passing GNNs are at most as powerful as the Weisfeiler-Leman isomorphism test, and designs GIN to achieve this upper bound. Morris et al. (2019), "Weisfeiler and Leman Go Neural," independently reach similar conclusions.
On attention mechanisms for graphs: Velickovic et al. (2018), "Graph Attention Networks," introduces data-dependent edge weights via attention. Brody, Alon & Yahav (2022), "How Attentive are Graph Attention Networks?" identifies a subtle issue in the original GAT attention mechanism and proposes GATv2 with strictly more expressive dynamic attention.
On scalable graph learning: Hamilton, Ying & Leskovec (2017), "Inductive Representation Learning on Large Graphs" (GraphSAGE), introduces neighborhood sampling for mini-batch training on graphs with millions of nodes. Cluster-GCN (Chiang et al., 2019) and GraphSAINT (Zeng et al., 2020) offer alternative scalable training strategies.