§13 Symmetrizing the Shift: A = S + S*
Before jumping to arbitrary graphs, let us see what happens when we stay on the ring but drop directionality. On the directed cycle, the shift $S$ sends vertex $k$ to vertex $k+1 \pmod{N}$. The adjacency matrix of the undirected cycle (the ring graph) connects each vertex to both its neighbors:
$$ A \;=\; S + S^*, \qquad (Ax)_k \;=\; x_{(k-1) \bmod N} + x_{(k+1) \bmod N}. $$
Here $S^*$ is the conjugate transpose of $S$, which shifts in the opposite direction: $(S^*x)_k = x_{k+1 \bmod N}$ (or equivalently, $S^* = S^{N-1}$). The sum $A = S + S^*$ is a real symmetric matrix, which immediately makes it Hermitian with real eigenvalues.
From §7–8, $S^*$ has eigenvalues $\rho_m = e^{i2\pi m/N}$ (Eq. 11) and $S$ has eigenvalues $\rho_m^{-1} = e^{-i2\pi m/N}$ (since $Sw^{(m)} = \rho_m^{-1} w^{(m)}$, as derived in §8). Both share the same eigenvectors $w^{(m)}$, so the eigenvalues of $A = S + S^*$ are simply the sum:
$$ \lambda_m \;=\; \rho_m + \rho_m^{-1} \;=\; e^{i2\pi m/N} + e^{-i2\pi m/N} \;=\; 2\cos\!\left(\frac{2\pi m}{N}\right), \qquad m = 0, 1, \ldots, N-1. $$
Now comes the critical observation: eigenvalues collapse in pairs. Since $\cos(2\pi m/N) = \cos(2\pi(N-m)/N)$, we have $\lambda_m = \lambda_{N-m}$ for all $m$. Concretely, for $N = 8$:
- $\lambda_0 = 2\cos(0) = 2$ — non-degenerate
- $\lambda_1 = \lambda_7 = 2\cos(\pi/4) = \sqrt{2}$ — degenerate pair
- $\lambda_2 = \lambda_6 = 2\cos(\pi/2) = 0$ — degenerate pair
- $\lambda_3 = \lambda_5 = 2\cos(3\pi/4) = -\sqrt{2}$ — degenerate pair
- $\lambda_4 = 2\cos(\pi) = -2$ — non-degenerate
From 8 distinct eigenvalues of $S$, we are down to only 5 distinct eigenvalues of $A$. The three degenerate pairs each form a 2-dimensional eigenspace in which any orthonormal basis works. The DFT basis is one valid choice, but it is no longer the only one.
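The collapse from 8 to 5 distinct eigenvalues is easy to confirm numerically. A minimal sketch (numpy assumed as the library choice; $S$ is built as a permutation matrix):

```python
import numpy as np

N = 8
S = np.roll(np.eye(N), 1, axis=0)     # directed shift: (Sx)_k = x_{(k-1) mod N}
A = S + S.T                           # undirected ring: S* = S^T for a real permutation

evals = np.sort(np.linalg.eigvalsh(A))
expected = np.sort(2 * np.cos(2 * np.pi * np.arange(N) / N))
assert np.allclose(evals, expected)

# 8 eigenvalues, but only 5 distinct values: {-2, -sqrt(2), 0, sqrt(2), 2}
assert len(np.unique(np.round(evals, 8))) == 5
```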
Comparing the three polynomial algebras
Before leaving the ring, it is instructive to compare what each operator generates.
Polynomials in $S$ are exactly the circulant matrices — i.e., all circular convolutions. Since $S$ has $N$ distinct eigenvalues, degree-$(N{-}1)$ polynomials span the full $N$-dimensional circulant algebra. Every spectral multiplier $\hat{h} \in \C^N$ is realizable.
Polynomials in $S^*$ are the cross-correlations. But on $\ZN$, $S^* = S^{N-1}$ (shifting backward by one is the same as shifting forward by $N-1$). So any polynomial in $S^*$ is already a polynomial in $S$, and vice versa. The two algebras are identical — convolutions and cross-correlations are the same set of operators on the ring.
Polynomials in $A = S + S^*$ are a strict subspace. Because $\lambda_m(A) = \lambda_{N-m}(A)$, a polynomial $p(A)$ must assign the same spectral value to frequencies $m$ and $N - m$: the filter is forced to be symmetric in frequency, $\hat{h}_m = \hat{h}_{N-m}$. For $N = 8$, there are 5 distinct eigenvalues, so the polynomial algebra is 5-dimensional inside the 8-dimensional circulant algebra. The three “missing dimensions” are exactly the antisymmetric filters that can distinguish a frequency from its conjugate — they require the directed shift $S$ to build.
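The dimension count (5 of 8) can be checked directly by stacking vectorized powers of each operator and computing the rank; a sketch with numpy (assumed):

```python
import numpy as np

N = 8
S = np.roll(np.eye(N), 1, axis=0)     # directed shift on Z_8
A = S + S.T                           # symmetrized shift

def poly_algebra_dim(M):
    # Rank of {I, M, ..., M^{N-1}} viewed as vectors = dimension of the
    # algebra of polynomials in M (= number of distinct eigenvalues of M).
    powers = np.stack([np.linalg.matrix_power(M, k).ravel() for k in range(N)])
    return np.linalg.matrix_rank(powers)

print(poly_algebra_dim(S))   # 8: the full circulant algebra
print(poly_algebra_dim(A))   # 5: symmetric-in-frequency filters only
```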
Physically, this makes sense: $A$ treats left-shifts and right-shifts identically, so it cannot produce filters that favor one direction over the other. Symmetrizing the operator symmetrizes the filter space. The table below summarizes the consequences:
| Property | S (directed) | A = S + S* (undirected) |
|---|---|---|
| Asymmetric filters | ✓ | ✗ ($\hat{h}_m = \hat{h}_{N-m}$ forced) |
| Unique eigenbasis | ✓ | ✗ (degenerate eigenspaces) |
| Commutant = poly algebra | ✓ | ✗ (commutant larger) |
| Locality | ✓ | ✓ |
| Parameter sharing | ✓ | ✓ |
Going from $S$ to $A$ is the cost of dropping directionality. The directed shift $S$ has $N$ distinct eigenvalues, so its commutant is exactly the set of polynomials in $S$ (the circulant matrices). The undirected adjacency $A = S + S^*$ has repeated eigenvalues, so its commutant is strictly larger than the polynomial algebra. There exist matrices that commute with $A$ but are not polynomials in $A$.
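A concrete witness is easy to exhibit on the ring: $B = S - S^*$ is a circulant, so it commutes with $A$, but its spectral values $-2i\sin(2\pi m/N)$ are antisymmetric in $m$, so no polynomial in $A$ can reproduce it. A numerical sketch (numpy assumed):

```python
import numpy as np

N = 8
S = np.roll(np.eye(N), 1, axis=0)
A = S + S.T
B = S - S.T                           # antisymmetric (directed) difference operator

assert np.allclose(A @ B, B @ A)      # B commutes with A ...

# ... yet the best approximation to B inside span{I, A, ..., A^7} is zero:
# B's spectral values are odd in m, while polynomials in A are even in m.
basis = np.stack([np.linalg.matrix_power(A, k).ravel() for k in range(N)], axis=1)
coef, *_ = np.linalg.lstsq(basis, B.ravel(), rcond=None)
residual = np.linalg.norm(basis @ coef - B.ravel())
assert np.isclose(residual, np.linalg.norm(B))   # no polynomial part at all
```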
This is a preview of what will happen on general graphs, but in a controlled setting where we can still compute everything explicitly. The ring graph is the bridge between the algebraically perfect world of $\ZN$ and the messy world of irregular graphs.
§14 The Adjacency Matrix of a General Graph
Now we leave the ring entirely. Our running example for the rest of the course is a small graph that exhibits all the phenomena we care about: community structure, a bottleneck edge, and repeated eigenvalues.
Vertices: $V = \{0, 1, 2, 3, 4, 5\}$. Edges: $E = \big\{\{0,1\}, \{0,2\}, \{1,2\}, \{2,3\}, \{3,4\}, \{3,5\}, \{4,5\}\big\}$.
Two triangles — $(0,1,2)$ and $(3,4,5)$ — connected by the bridge edge $\{2,3\}$.
The adjacency matrix encodes the edge structure:
$$ A_{ij} \;=\; \begin{cases} 1 & \text{if } \{i,j\} \in E, \\ 0 & \text{otherwise.} \end{cases} $$
For our running example, the $6 \times 6$ adjacency matrix is:
$$ A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{pmatrix} $$
Notice the block structure: the upper-left $3 \times 3$ block and the lower-right $3 \times 3$ block encode the two triangles, while the 1s at positions $(2,3)$ and $(3,2)$ encode the bridge edge. This graph has:
- No group structure. Vertices 2 and 3 have degree 3 (three neighbors each), while vertices 0, 1, 4, 5 have degree 2. No group can act transitively on the vertices because the degrees are not uniform.
- No canonical shift. Which vertex is “next” after vertex 0? Vertex 1? Vertex 2? There is no natural ordering. The notion of “shifting a signal by one position” is meaningless here.
Physical picture: Think of each vertex as a sensor, and each edge as a physical connection (a wire, a pipe, a bond). On the ring, every sensor was identical — the same number of connections, arranged symmetrically. On this graph, the bridge vertices 2 and 3 are structurally distinct from the peripheral vertices. There is no way to “translate” the network so that vertex 0 maps to vertex 2 while preserving all connections.
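The running example is small enough to build by hand; a sketch constructing $A$ from the edge list (numpy assumed):

```python
import numpy as np

# Two triangles (0,1,2) and (3,4,5) joined by the bridge {2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1             # undirected graph: symmetric adjacency

print(A)
print(A.sum(axis=1))                  # degrees: [2 2 3 3 2 2]
```

The degree vector confirms the point made above: vertices 2 and 3 (the bridge endpoints) have degree 3, the rest degree 2.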
§15 The Graph Laplacian
On $\ZN$, we used the shift operator $S$ as our fundamental matrix. On general graphs, the preferred operator is the graph Laplacian, which carries richer spectral information than the adjacency matrix alone.
The degree matrix $D$ is the diagonal matrix of vertex degrees:
$$ D = \operatorname{diag}(d_0, d_1, \ldots, d_{N-1}), \qquad d_i = \sum_{j} A_{ij} $$
For our running example: $D = \operatorname{diag}(2, 2, 3, 3, 2, 2)$.
The Laplacian $L = D - A$ is the difference between the degree matrix and the adjacency matrix. Each diagonal entry $L_{ii} = d_i$ equals the degree of vertex $i$, and each off-diagonal entry $L_{ij} = -A_{ij}$ is $-1$ if $i$ and $j$ are neighbors, and $0$ otherwise. For our running example:
$$ L = \begin{pmatrix} 2 & -1 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 \\ -1 & -1 & 3 & -1 & 0 & 0 \\ 0 & 0 & -1 & 3 & -1 & -1 \\ 0 & 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & -1 & 2 \end{pmatrix} $$
Why prefer $L$ over $A$? There are several deep reasons:
1. The Laplacian is positive semi-definite. All eigenvalues of $L$ are $\geq 0$. This follows from the quadratic form: for any signal $x \in \R^N$,
$$ x^T L x \;=\; \sum_{\{i,j\}\in E}(x_i - x_j)^2 \tag{26} $$
which measures the total smoothness of the signal $x$ over the graph. A signal is smooth (small $x^TLx$) when neighboring vertices have similar values, and rough (large $x^TLx$) when they differ. Since the right-hand side of Eq. 26 is a sum of squares, $x^TLx \geq 0$ for all $x$, which immediately proves that $L = D - A$ is positive semi-definite. The zero eigenvalue corresponds to the constant eigenvector $\mathbf{1} = (1,1,\ldots,1)^T$, for which every difference $x_i - x_j$ vanishes.

Pointwise, $(Lx)_i$ measures how much the value at vertex $i$ differs from the values at its neighbors: if $x$ is constant across the neighborhood of $i$, then $(Lx)_i = 0$, and the more $x_i$ deviates from its neighbors, the larger $(Lx)_i$.
2. Combinatorial meaning. The eigenvalues of $L$ encode fundamental graph properties. The number of zero eigenvalues equals the number of connected components. The smallest nonzero eigenvalue $\lambda_1$ (the algebraic connectivity or Fiedler value) quantifies how well-connected the graph is. The Cheeger inequality relates $\lambda_1$ to the minimum edge cut.
3. Discrete analog of $\nabla^2$. On a regular grid, the graph Laplacian reduces (up to sign) to the standard finite-difference Laplacian. The formula $(Lx)_i = \sum_{j \sim i}(x_i - x_j) = d_i\, x_i - \sum_{j \sim i} x_j$ is precisely a discrete divergence of a discrete gradient — the combinatorial analog of the continuous $-\nabla^2$ (the sign flip is why $L$ is positive semi-definite).
4. Physical intuition. $(Lx)_i$ measures how much $x_i$ differs from the average of its neighbors: $(Lx)_i = d_i\bigl(x_i - \frac{1}{d_i}\sum_{j \sim i} x_j\bigr)$. If you think of $x$ as a temperature field, $Lx$ gives the “thermal stress” at each vertex — the tendency for heat to flow away from (or toward) that vertex.
Heat equation on graphs: The equation $\dot{x}(t) = -Lx(t)$ is the discrete heat equation. Heat flows from hot vertices to cold neighbors along edges. The eigenvectors of $L$ are the natural “modes” of this diffusion, and the eigenvalues are the decay rates. Low-frequency modes (small $\lambda$) decay slowly; high-frequency modes (large $\lambda$) decay quickly. This is the graph-theoretic analog of Fourier modes in the continuous heat equation.
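Both claims above — the edge-sum form of the quadratic form (Eq. 26) and diffusion toward the constant mode — can be checked numerically. A minimal sketch with numpy (library choice assumed), using explicit Euler for the heat equation:

```python
import numpy as np

# Running example: two triangles joined by the bridge {2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Eq. 26: x^T L x equals the sum of squared differences over edges.
rng = np.random.default_rng(0)
z = rng.standard_normal(6)
assert np.isclose(z @ L @ z, sum((z[i] - z[j]) ** 2 for i, j in edges))
assert np.allclose(L @ np.ones(6), 0.0)   # constant vector is in the kernel

# Discrete heat equation x' = -Lx via explicit Euler; a step size below
# 2/lambda_max (lambda_max ~ 4.56 here) keeps the iteration stable.
x = np.array([1.0, 0, 0, 0, 0, 0])        # all heat starts at vertex 0
dt = 0.1
for _ in range(2000):
    x = x - dt * (L @ x)
print(np.round(x, 4))                     # heat equilibrates to 1/6 per vertex
```

Note that the total heat $\sum_i x_i$ is conserved at every step, because $\mathbf{1}^T L = 0$; only the distribution changes.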
§16 Eigendecomposition of L
Now we compute the full eigendecomposition of our running example’s Laplacian. This worked example is the concrete foundation for everything that follows.
Since $L$ is a $6 \times 6$ real symmetric matrix, it has 6 real eigenvalues (counted with multiplicity) and a complete orthonormal set of eigenvectors. Notice that our graph has a reflection symmetry: swapping vertices $(0 \leftrightarrow 5, 1 \leftrightarrow 4, 2 \leftrightarrow 3)$ preserves all edges. This symmetry forces the eigenvectors to be either symmetric or antisymmetric under the reflection, and it causes eigenvalue collisions.
The characteristic polynomial $\det(L - \lambda I) = 0$ factors as:
$$ \lambda(\lambda - 3)^3(\lambda^2 - 5\lambda + 2) = 0 $$
which gives the eigenvalues
$$ \lambda \;\in\; \Bigl\{\, 0,\; \tfrac{5-\sqrt{17}}{2},\; 3,\; 3,\; 3,\; \tfrac{5+\sqrt{17}}{2} \,\Bigr\}. $$
Numerically:
- $\lambda_0 = 0$ — present for every graph, and simple (multiplicity one) exactly when the graph is connected; eigenvector $u_0 \propto (1,1,1,1,1,1)^T$ (constant signal).
- $\lambda_1 = \frac{5 - \sqrt{17}}{2} \approx 0.44$ — the Fiedler eigenvalue.
- $\lambda_2 = \lambda_3 = \lambda_4 = 3$ — a triple eigenvalue.
- $\lambda_5 = \frac{5 + \sqrt{17}}{2} \approx 4.56$ — the largest eigenvalue.
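These numbers can be reproduced with a standard symmetric eigensolver; a quick check (numpy assumed):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

evals, U = np.linalg.eigh(L)          # ascending eigenvalues, orthonormal columns of U
expected = [0.0, (5 - np.sqrt(17)) / 2, 3.0, 3.0, 3.0, (5 + np.sqrt(17)) / 2]
assert np.allclose(evals, expected)
print(np.round(evals, 3))             # eigenvalues: 0, 0.438, 3, 3, 3, 4.562
```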
The Fiedler eigenvector $u_1$ (corresponding to $\lambda_1 \approx 0.44$) has a specific geometric structure: it is positive on one triangle and negative on the other. Among signals orthogonal to the constant vector, $u_1$ minimizes the smoothness $x^TLx$, so its sign change is pushed to the cheapest place to cut — the bridge — revealing the two-community structure.
Concretely, the Fiedler eigenvector (unnormalized) takes the form $u_1 \propto (a, a, b, -b, -a, -a)^T$ where $a$ and $b$ are determined by the quadratic $\lambda^2 - 5\lambda + 2 = 0$. The positive entries sit on triangle 1, the negative entries on triangle 2, and the magnitudes at the bridge vertices (2 and 3) differ from the peripheral vertices because they have higher degree.
The triple eigenvalue $\lambda = 3$ is the key pathological feature. It creates a 3-dimensional eigenspace, and any orthonormal basis for this eigenspace is equally valid. There is no canonical choice. If you run a numerical eigendecomposition, the specific eigenvectors returned in this eigenspace depend on the algorithm, the rounding, and even the platform. The eigenvalues are well-defined, but the individual eigenvectors within degenerate eigenspaces are not.
Where does the triple eigenvalue come from? The Laplacian of a triangle $K_3$ has eigenvalue 3 with multiplicity 2. The within-triangle modes that vanish at the bridge vertices — $(1,-1,0,0,0,0)$ and $(0,0,0,0,1,-1)$ — do not feel the bridge at all, so each triangle contributes one such mode unchanged. A third eigenvector, the symmetric combination $(1,1,-2,-2,1,1)$, also lands exactly at eigenvalue 3. The result is a three-fold degeneracy.
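The $\lambda = 3$ eigenspace can be exhibited concretely: the three (unnormalized) vectors below form a basis, which is easy to verify with numpy (assumed):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Two within-triangle modes vanishing at the bridge vertices 2 and 3,
# plus a symmetric combination spanning both triangles.
modes = np.array([[1, -1,  0,  0, 0,  0],
                  [0,  0,  0,  0, 1, -1],
                  [1,  1, -2, -2, 1,  1]], dtype=float)
for v in modes:
    assert np.allclose(L @ v, 3 * v)       # each is an eigenvector with lambda = 3
assert np.linalg.matrix_rank(modes) == 3   # and they are linearly independent
```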
The eigenvectors, ordered by increasing eigenvalue (i.e., increasing “roughness”), provide a frequency decomposition of signals on the graph:
- $u_0$ ($\lambda_0 = 0$): the constant signal — “DC component,” the smoothest possible.
- $u_1$ ($\lambda_1 \approx 0.44$): one sign change across the bridge — the lowest “frequency.”
- $u_2, u_3, u_4$ ($\lambda = 3$): oscillations within each triangle — medium “frequency.”
- $u_5$ ($\lambda_5 \approx 4.56$): the most oscillatory mode — highest “frequency.”
Two key lessons. The eigenvalue $\lambda_1 \approx 0.44$ reflects the bridge between communities — this is the Fiedler value, measuring how well-connected the graph is. The triple eigenvalue at 3 shows that on graphs, eigenvalue collision is the norm, not the exception. Any graph with internal symmetries (and most real-world graphs have at least approximate symmetries) will exhibit degeneracies.
§17 Three Characterizations Revisited
With the graph Laplacian and its eigendecomposition in hand, we can now state precisely which of our three characterizations from $\ZN$ survive the transition to general graphs.
Group-theoretic characterization: BROKEN. On $\ZN$, we defined shift-invariant operators as those commuting with the group action (the shift $S$). On a general graph, there is no group acting transitively on the vertices. The entire framework of “translation invariance as commutativity with a group action” has no analog. This characterization is genuinely and irrecoverably lost.
Algebraic characterization (polynomials in $L$): SURVIVES. For any graph Laplacian $L$ and any polynomial $p$, the matrix $p(L) = \theta_0 I + \theta_1 L + \theta_2 L^2 + \cdots + \theta_d L^d$ is a well-defined linear operator on signals. We can define convolution as any polynomial in $L$. This is well-defined for any graph, requires no eigenbasis, and — as we will see in Chapter 5 — the degree $d$ controls the locality of the filter (an order-$d$ polynomial touches only the $d$-hop neighborhood of each vertex).
Spectral characterization (diagonal in $L$-eigenbasis): SURVIVES. Since $L$ is real symmetric, it always has a complete orthonormal eigenbasis $U = [u_0 | u_1 | \cdots | u_{N-1}]$, with $L = U\Lambda U^T$. We can define convolution as $U\operatorname{diag}(\hat{h})\,U^T x$ for any choice of spectral multipliers $\hat{h} \in \R^N$. This is well-defined for any graph and gives $N$ free parameters (one per eigenvalue).
Note the relation between the two surviving classes: $\{\text{polynomials in } L\} \subseteq \{\text{diagonal in the } L\text{-eigenbasis}\}$ — an inclusion, not an equality. Any polynomial in $L$ is diagonal in the $L$-eigenbasis (because $L^k = U\Lambda^k U^T$, so $p(L) = U\,p(\Lambda)\,U^T$, which is diagonal with entries $p(\lambda_m)$). But the converse fails when $L$ has repeated eigenvalues: a spectral multiplier can assign different values to eigenvectors within the same eigenspace, which no polynomial in $L$ can do (a polynomial must assign the same value $p(\lambda)$ to all eigenvectors with eigenvalue $\lambda$).
For our running example, the triple eigenvalue $\lambda = 3$ illustrates this precisely. A polynomial filter $p(L)$ must assign the same multiplier $p(3)$ to all three eigenvectors in the $\lambda = 3$ eigenspace. A spectral filter can assign three different multipliers $\hat{h}_2, \hat{h}_3, \hat{h}_4$ to those eigenvectors. The spectral class is strictly larger.
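This gap is easy to demonstrate numerically: build a spectral filter with three different multipliers on the $\lambda = 3$ eigenspace and check that no polynomial in $L$ reproduces it. A sketch with numpy (assumed):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

evals, U = np.linalg.eigh(L)          # ascending: indices 2..4 hold the lambda = 3 space
h = np.array([1.0, 1.0, 0.5, 1.5, 2.5, 1.0])   # distinct values inside the eigenspace
H = U @ np.diag(h) @ U.T              # spectral filter

assert np.allclose(H @ L, L @ H)      # H commutes with L ...
# ... yet no polynomial p(L) matches it: least-squares fit in span{I, L, ..., L^5}
basis = np.stack([np.linalg.matrix_power(L, k).ravel() for k in range(6)], axis=1)
coef, *_ = np.linalg.lstsq(basis, H.ravel(), rcond=None)
residual = np.linalg.norm(basis @ coef - H.ravel())
assert residual > 1.0                 # large residual: H is not a polynomial in L
```

The best a polynomial can do is assign the mean multiplier $1.5$ to the whole $\lambda = 3$ eigenspace, leaving the residual strictly positive.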
| Characterization | $\ZN$ status | Graph status |
|---|---|---|
| Group-theoretic (commute with S) | ✓ | ✗ Broken |
| Algebraic (polynomial in operator) | ✓ (poly in S) | ✓ (poly in L) |
| Spectral (diagonal in eigenbasis) | ✓ (DFT basis) | ✓ (L-eigenbasis) |
The pivot of the course: on graphs, we define convolutions via the algebraic and spectral characterizations, which still agree with each other in the sense that every polynomial filter is a spectral filter. The reverse inclusion fails when eigenvalues repeat, but this is a manageable asymmetry. The group-theoretic characterization — the one that gave us translation invariance for free — is genuinely gone. On graphs, there is no translation invariance. There is only the weaker notion of “locality via polynomial degree” and “spectral filtering via the eigenbasis of a chosen operator.”
This sets the stage for the next chapter, where we develop both characterizations in full detail: the graph Fourier transform $\hat{x} = U^T x$ and the polynomial graph convolution $p(L)x$. We will see how the spectral and polynomial viewpoints each come with distinct tradeoffs — expressiveness versus efficiency, global versus local — and how the GCN architecture of Kipf & Welling (2017) emerges as a specific degree-1 polynomial approximation.