Chapter 1

Circulant Matrices and the Shift Operator

Signals on a ring, the cyclic shift as a matrix, circulant matrices as polynomials in S, and the algebraic characterization of circular convolution — all on $\ZN$ with running example $\Z_8$.

In this chapter
  1. Signals on a Ring
  2. The Shift Operator
  3. Circulant Matrices as Polynomials in S
  4. Circular Convolution

1. Signals on a Ring

The entire story of the DFT begins with one of the simplest algebraic structures imaginable: the cyclic group $\ZN$. This is the group of integers $\{0, 1, 2, \ldots, N-1\}$ with addition modulo $N$. You can think of it as $N$ points arranged equally around a circle, where counting past $N-1$ wraps back to $0$.

A signal on $\ZN$ is simply a function $x: \ZN \to \C$ (or $\R$) that assigns a complex (or real) value to each of the $N$ points on the ring. We write $x[n]$ for the value at point $n$, and collect all $N$ values into a column vector:

Signal as vector
$$ x = \begin{pmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{pmatrix} \in \C^N \tag{1} $$

Throughout this course, our running example will be $\Z_8$ — eight points equally spaced on a circle. A signal on $\Z_8$ is just a vector in $\C^8$ (or $\R^8$). Here is what the underlying domain looks like:

Figure 1.1. The cyclic group $\Z_8$: eight points equally spaced on a circle. Arrows show the cyclic ordering — the successor of point 7 wraps back to point 0. A signal on $\Z_8$ assigns a value $x[n]$ to each of these eight nodes.

Why start here? Because $\ZN$ has a beautiful symmetry — every point looks the same as every other point. There is no "special" node. This translation symmetry is what will force the DFT to exist. Our goal in the next few chapters is to see exactly how this works.

Note the distinction between the domain ($\ZN$, a finite set of points) and the signal ($x \in \C^N$, a vector of values). The domain is the underlying structure; the signal lives on top of it. This distinction will matter enormously when we move from rings to graphs.


2. The Shift Operator

The most natural operation on a signal living on a ring is to shift it around the ring. Define the cyclic right-shift operator $S$ by:

Right-shift definition
$$ (Sx)[n] = x[(n - 1) \bmod N] \tag{2} $$

In words: $S$ moves every value one step clockwise around the ring. The value that was at position $n-1$ is now at position $n$, and the value at position $N-1$ wraps around to position $0$ — that is, $(Sx)[0] = x[N-1]$.

As a matrix, $S$ has a 1 on the subdiagonal and a 1 in the top-right corner (the "wraparound"). For $\Z_8$:

$$ S = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} $$

The matrix entry $S_{ij} = 1$ if and only if $j = (i - 1) \bmod N$. Every row and every column has exactly one 1 — this is a permutation matrix.
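To make this concrete, here is a short NumPy sketch (variable names are my own) that builds $S$ for $N = 8$ by rolling the rows of the identity, then checks it against the definition $(Sx)[n] = x[(n-1) \bmod N]$:

```python
import numpy as np

N = 8
# Cyclic right-shift matrix: S[i, j] = 1 iff j == (i - 1) mod N.
# Rolling the identity's rows down one step puts 1's on the subdiagonal
# and the wraparound 1 in the top-right corner.
S = np.roll(np.eye(N, dtype=int), 1, axis=0)

x = np.arange(N)                              # test signal x[n] = n
assert np.array_equal(S @ x, np.roll(x, 1))   # (Sx)[n] = x[(n-1) mod N]
assert S[0, N - 1] == 1 and S[1, 0] == 1      # wraparound corner + subdiagonal
```

Because $S$ acts on vectors exactly as `np.roll(x, 1)`, this one-liner construction is a convenient stand-in for the explicit matrix throughout.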

Three fundamental properties of $S$ follow immediately from the definition:

1. Periodicity. Shifting $N$ times brings every value back to its starting position:

Periodicity of S
$$ S^N = I \tag{3} $$

This is because $(S^N x)[n] = x[(n - N) \bmod N] = x[n]$ for all $n$. In matrix terms, $S$ is a permutation of order $N$: it cycles through all $N$ positions before returning to the identity.

2. The adjoint is the left-shift. The conjugate transpose $S^* = S^T$ (since $S$ is real) performs the left-shift: $(S^* x)[n] = x[(n+1) \bmod N]$. This moves values one step counterclockwise. You can verify this by transposing the matrix above — the 1's move from the subdiagonal to the superdiagonal, with a wraparound in the bottom-left corner.

3. Unitarity. $S$ preserves the norm of every signal:

$$ S^* S = S S^* = I $$

This is immediate from property 2: the left-shift undoes the right-shift, so $S^* = S^{-1} = S^{N-1}$ and $S^*S = S^{N-1}S = S^N = I$. Physically, shifting a signal around a ring neither creates nor destroys energy — it just relocates it.
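All three properties can be verified in a few lines of NumPy (a numerical spot-check, not part of the formal development):

```python
import numpy as np

N = 8
S = np.roll(np.eye(N, dtype=int), 1, axis=0)  # cyclic right-shift matrix
I = np.eye(N, dtype=int)

# 1. Periodicity: N right-shifts return every value to its origin.
assert np.array_equal(np.linalg.matrix_power(S, N), I)

# 2. The adjoint is the left-shift: (S* x)[n] = x[(n+1) mod N].
x = np.arange(N)
assert np.array_equal(S.T @ x, np.roll(x, -1))

# 3. Unitarity: S* S = S S* = I.
assert np.array_equal(S.T @ S, I) and np.array_equal(S @ S.T, I)
```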

$S$ and the group $\ZN$. A generator of a group is an element from which every other element can be built by repeated application. The abstract group $\ZN$ is the integers $\{0, 1, \ldots, N-1\}$ under addition mod $N$, and the element $1$ generates it: $1, 1+1=2, 1+1+1=3, \ldots$ cycles through every element before wrapping back to $0$. The shift operator $S$ lives in a different world — it is an $N \times N$ matrix, not an integer — but the group of matrices $\{I, S, S^2, \ldots, S^{N-1}\}$ under matrix multiplication is isomorphic to $\ZN$. The correspondence $k \;\leftrightarrow\; S^k$ maps group element $k$ to the matrix that shifts signals by $k$ positions, and it respects the group operation: $S^j S^k = S^{(j+k) \bmod N}$, mirroring $j + k \pmod{N}$ in $\ZN$. In the language of representation theory, $k \mapsto S^k$ is a faithful representation of $\ZN$ on $\C^N$ — a way for the abstract group to act concretely on signals. The equation $S^N = I$ encodes the cyclic structure: applying the generator $N$ times returns to the identity, just as adding $1$ to itself $N$ times gives $0 \pmod{N}$. Because a single matrix $S$ generates the entire group action, understanding $S$ alone will be enough to understand every shift-invariant operation.
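The isomorphism can be spot-checked by brute force: for every pair $(j, k)$, the matrix product $S^j S^k$ should equal $S^{(j+k) \bmod N}$, mirroring addition mod $N$. A small NumPy sketch:

```python
import numpy as np

N = 8
S = np.roll(np.eye(N, dtype=int), 1, axis=0)   # cyclic right-shift matrix
power = lambda k: np.linalg.matrix_power(S, k)

# S^j S^k = S^{(j+k) mod N} for every pair — the group law of Z_N,
# carried by matrix multiplication.
for jj in range(N):
    for kk in range(N):
        assert np.array_equal(power(jj) @ power(kk), power((jj + kk) % N))
```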


3. Circulant Matrices as Polynomials in S

Now we build something more general than a simple shift. Given a kernel (or filter) $h = (h_0, h_1, \ldots, h_{N-1})^T \in \C^N$, define the circulant matrix:

Circulant matrix as polynomial in S
$$ C_h = \sum_{k=0}^{N-1} h_k \, S^k = h_0 I + h_1 S + h_2 S^2 + \cdots + h_{N-1} S^{N-1} \tag{4} $$

This is a polynomial in $S$ with coefficients given by the kernel $h$. Since $S^k$ shifts by $k$ positions, the matrix $C_h$ combines shifted versions of the input signal with weights $h_k$. The matrix entries are:

$$ (C_h)_{ij} = h_{(i - j) \bmod N} $$

The key structural property: every row of $C_h$ is a cyclic shift of the first row. This is what makes it "circulant" — the rows circulate.
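Eq. 4 translates directly into code. This NumPy sketch (the helper name `circulant` is mine) builds $C_h$ as a polynomial in $S$ and confirms it matches the entry formula $(C_h)_{ij} = h_{(i-j) \bmod N}$:

```python
import numpy as np

def circulant(h):
    """C_h = sum_k h[k] S^k  (Eq. 4), for the cyclic right-shift S."""
    N = len(h)
    S = np.roll(np.eye(N, dtype=int), 1, axis=0)
    return sum(h[k] * np.linalg.matrix_power(S, k) for k in range(N))

h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
C = circulant(h)

# Agrees with the entry formula (C_h)_{ij} = h[(i - j) mod N].
i, j = np.indices((8, 8))
assert np.array_equal(C, h[(i - j) % 8])
```

The two constructions, sum of shifted identities versus direct indexing, give the same matrix; the indexing form is what we use for quick checks below.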

Worked example. Take $N = 8$ and the kernel $h = (3, 1, 0, 0, 0, 0, 0, 2)^T$. This kernel has three nonzero entries: $h_0 = 3$, $h_1 = 1$, and $h_7 = 2$. The circulant matrix is:

$$ C_h = 3I + 1 \cdot S + 2 \cdot S^7 $$

To build the matrix explicitly, we compute $(C_h)_{ij} = h_{(i-j) \bmod 8}$. Row $0$ (i.e., $i=0$) has entries $(C_h)_{0,j} = h_{(-j) \bmod 8}$: the $j = 0$ entry is $h_0 = 3$, the $j = 1$ entry is $h_7 = 2$, the $j = 7$ entry is $h_1 = 1$, and all other entries are $0$.

So row 0 is $(3, 2, 0, 0, 0, 0, 0, 1)$. Each subsequent row is the cyclic right-shift of the previous:

$$ C_h = \begin{pmatrix} 3 & 2 & 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 3 & 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 3 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 3 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 3 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 3 & 2 \\ 2 & 0 & 0 & 0 & 0 & 0 & 1 & 3 \end{pmatrix} $$

Notice the structure: the diagonal is all 3's (from $h_0 = 3$), the subdiagonal is all 1's (from $h_1 = 1$), and the superdiagonal is all 2's (from $h_7 = 2$, which corresponds to $S^7 = S^{-1}$, the left-shift). The matrix wraps around at the corners — the 1 in position $(0,7)$ and the 2 in position $(7,0)$ are the cyclic wraparound terms.
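A quick numerical check of the worked example, confirming row 0, the circulating rows, and the corner entries:

```python
import numpy as np

h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
N = len(h)
i, j = np.indices((N, N))
C = h[(i - j) % N]                      # entry formula (C_h)_{ij} = h[(i-j) mod N]

assert np.array_equal(C[0], [3, 2, 0, 0, 0, 0, 0, 1])   # row 0: reversed kernel
for n in range(1, N):                   # rows circulate: each row shifts the last
    assert np.array_equal(C[n], np.roll(C[n - 1], 1))
assert C[0, 7] == 1 and C[7, 0] == 2    # cyclic wraparound corners
```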

Key idea: Circulant matrices are polynomials in $S$. This algebraic fact — that all circulants live in the polynomial algebra $\C[S]/(S^N - I)$ — will drive everything that follows. It means diagonalizing $S$ automatically diagonalizes every circulant simultaneously.

Aside: A natural operation that is not circulant

Not every linear operation on signals is a polynomial in $S$. Consider signal reversal (reflection through position 0): the operator $R$ that sends $x[n]$ to $x[-n \bmod N]$. In vector notation:

$$ Rx = \begin{pmatrix} x[0] \\ x[N-1] \\ x[N-2] \\ \vdots \\ x[2] \\ x[1] \end{pmatrix} $$

For $N = 8$, the matrix is:

$$ R = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} $$

This is not circulant. Row 0 is $(1, 0, 0, 0, 0, 0, 0, 0)$ and row 1 is $(0, 0, 0, 0, 0, 0, 0, 1)$ — row 1 is not a cyclic shift of row 0. So $R$ is not a polynomial in $S$, and signal reversal is not a convolution.

But $R$ is not unrelated to $S$ either. What happens when we compose them? Apply $R$ then $S$: the signal is first reversed, then shifted. Apply $S$ then $R$: the signal is first shifted, then reversed. These give different results — $RS \neq SR$ — so $R$ does not commute with $S$ (which is exactly why $R$ cannot be circulant, by the characterization we state in §4). But there is a pattern. Working out the action on a signal component:

$$ (RS\,x)[n] = (Sx)[-n] = x[-n - 1], \qquad (S^*R\,x)[n] = (Rx)[n+1] = x[-(n+1)] = x[-n-1] $$

where all indices are mod $N$. So $RS = S^*R$: composing reversal-then-right-shift is the same as left-shift-then-reversal. Since $R$ is its own inverse ($R^2 = I$ — reversing twice recovers the original signal), we can conjugate:

$$ RSR = S^* $$

Reversal conjugates the right-shift $S$ into the left-shift $S^*$. It swaps the direction of propagation around the ring. This is not a coincidence — $R$ is the geometric reflection of the ring through position 0, and reflecting a clockwise shift produces a counterclockwise shift.
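Both facts, $R^2 = I$ and $RSR = S^*$, can be confirmed numerically; here is a sketch (the construction of $R$ by row-indexing the identity is my own shorthand):

```python
import numpy as np

N = 8
S = np.roll(np.eye(N, dtype=int), 1, axis=0)   # cyclic right-shift
n = np.arange(N)
R = np.eye(N, dtype=int)[(-n) % N]             # reversal: row n is e_{-n mod N}

assert np.array_equal(R @ R, np.eye(N, dtype=int))   # R^2 = I
assert np.array_equal(R @ S @ R, S.T)                # R S R = S*
assert not np.array_equal(R @ S, S @ R)              # R does not commute with S
```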

Why this matters: The reversal operator $R$ will reappear in §4, where it turns out to be exactly what distinguishes convolution from cross-correlation. Convolving $x$ with a kernel $h$ involves the reversed kernel $Rh$ (the “flip” in “flip and slide”); cross-correlating with $h$ uses $h$ without flipping. The algebraic relationship $RSR = S^*$ is the structural reason the flip appears.


4. Circular Convolution

What happens when we apply $C_h$ to a signal $x$? Let us compute $y = C_h x$ by expanding the polynomial definition (Eq. 4):

$$ y = C_h\, x = \sum_{k=0}^{N-1} h_k \, S^k x $$

The $k$-th power of the shift moves every entry $k$ steps around the ring: $(S^k x)[n] = x[(n - k) \bmod N]$. Substituting and reading off the $n$-th component:

$$ y[n] = \sum_{k=0}^{N-1} h_k \, (S^k x)[n] = \sum_{k=0}^{N-1} h_k \, x[(n - k) \bmod N] $$

This operation is called circular convolution, written $y = h * x$:

Circular convolution
$$ (h * x)[n] = \sum_{k=0}^{N-1} h[k] \, x[(n - k) \bmod N] \tag{5} $$
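Eq. 5 can be implemented literally and compared against the matrix product $C_h x$; a NumPy sketch (the helper name `circ_conv` is mine):

```python
import numpy as np

def circ_conv(h, x):
    """(h * x)[n] = sum_k h[k] x[(n - k) mod N]   (Eq. 5)."""
    N = len(h)
    return np.array([sum(h[k] * x[(n - k) % N] for k in range(N))
                     for n in range(N)])

h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
x = np.arange(8)

# The circulant matrix applied to x gives the same vector.
i, j = np.indices((8, 8))
C_h = h[(i - j) % 8]
assert np.array_equal(circ_conv(h, x), C_h @ x)   # y = C_h x = h * x
```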

To visualize this, fix an output position $n$ and arrange two sequences on concentric rings. The inner ring carries the signal values $x[0], x[1], \ldots, x[N-1]$ at their natural positions. The outer ring carries the kernel rearranged: position $j$ gets the coefficient $h[(n - j) \bmod N]$. The output $y[n]$ is the sum of products of radially aligned pairs. As $n$ increases by 1, the outer ring rotates one step clockwise.

Worked example. For our $\Z_8$ kernel $h = (3, 1, 0, 0, 0, 0, 0, 2)^T$, only three coefficients are nonzero: $h_0 = 3$, $h_1 = 1$, $h_7 = 2$. The figure below shows all eight output components. In each panel, the inner ring (gray dots) holds $x[0]$ through $x[7]$ at fixed positions. The three colored dots on the outer ring mark the nonzero kernel coefficients — watch how they rotate clockwise as $n$ increases:

Figure 1.2. Circular convolution on $\Z_8$ with kernel $h = (3,1,0,0,0,0,0,2)^T$. Each panel computes one component $y[n]$ by summing products of radially aligned pairs: inner ring $x[j]$ times outer ring $h[(n-j) \bmod 8]$. The three nonzero kernel coefficients (colored dots) rotate clockwise by one position per step. For instance, $y[0] = 3\,x[0] + 2\,x[1] + 1\cdot x[7]$, while $y[1] = 1\cdot x[0] + 3\,x[1] + 2\,x[2]$. Since each $y[n]$ is just a sum of pairwise products, swapping the roles of the two rings — putting $h$ on the inner ring and rearranging $x$ on the outer — gives the same values. This is the symmetry $C_h\,x = C_x\,h$.

Look at the $n = 0$ panel. The outer ring shows $h[0], h[7], h[6], \ldots, h[1]$ — the kernel reversed (reflected through position 0). Why the flip? Because the shift operator $S^k$ moves signal values forward by $k$ positions, so the value arriving at output position $n$ came from position $n - k$. The coefficient $h[k]$ multiplies the signal value $k$ steps behind the output, not $k$ steps ahead. You are looking backward along the shift to find the source. This backward look is the origin of the minus sign in $x[(n - k) \bmod N]$, and it is what puts the reversed kernel on the outer ring.

A second way to see the reversal: look at the matrix $C_h$ directly. We computed that row 0 has entries $(C_h)_{0,j} = h[-j \bmod N] = (Rh)[j]$. In our example, $h = (3,1,0,0,0,0,0,2)^T$ and row 0 is $(3,2,0,0,0,0,0,1)$ — which is $Rh$. Since $C_h$ is circulant, every subsequent row is a cyclic shift of this first row: row $n$ is $S^n(Rh)$.

This means $y[n] = (C_h x)[n]$ is the dot product of $x$ with the $n$-th cyclic shift of $Rh$:

$$ y[n] = \langle S^n(Rh),\, x \rangle $$

Convolution slides a reversed copy of the kernel across the signal, computing a dot product at each position — a “template matching” operation. The template is $Rh$, not $h$ itself.

We can express this matrix relationship concisely. Recall that $C_{Rh}$ is the circulant built from $Rh$, so $(C_{Rh})_{n,j} = (Rh)[(n-j) \bmod N] = h[(j-n) \bmod N]$. Comparing with $(C_h)_{n,j} = h[(n-j) \bmod N]$, we see that swapping $n$ and $j$ takes one to the other:

$$ C_h = C_{Rh}^T $$

Transposing a circulant reverses its kernel. Since $C_h$ is built from $h$ and $C_h^T$ is built from $Rh$, applying $C_h$ to $x$ amounts to taking inner products of $x$ against shifted copies of $Rh$ — which is exactly what the figure shows on the outer ring.
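The identity $C_h = C_{Rh}^T$ is easy to verify on the running example (a numerical sketch; `circ` is my shorthand for the entry-formula construction):

```python
import numpy as np

N = 8
h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
Rh = h[(-np.arange(N)) % N]            # (Rh)[k] = h[-k mod N]
i, j = np.indices((N, N))
circ = lambda k: k[(i - j) % N]        # circulant from a kernel, entrywise

assert np.array_equal(Rh, [3, 2, 0, 0, 0, 0, 0, 1])  # reversed kernel = row 0 of C_h
assert np.array_equal(circ(h), circ(Rh).T)           # C_h = C_{Rh}^T
```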

This reversal is precisely what distinguishes convolution from cross-correlation. Cross-correlation pairs $h[k]$ with $x[n + k]$ — no flip. In the ring picture, cross-correlation would place $h$ on the outer ring in its natural order, matching $h[k]$ to the signal value $k$ steps ahead. The two operations are related by the reversal operator $R$ from §3: $(h \star x)[n] = (Rh * x)[n]$, where $Rh$ is the time-reversed kernel $(Rh)[k] = h[-k \bmod N]$ and $\star$ denotes cross-correlation. The algebraic root is the identity $RSR = S^*$: reversal swaps the direction of the shift, turning the backward look of convolution into the forward look of correlation.
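The relation $(h \star x) = (Rh * x)$ can likewise be checked componentwise, implementing both circular operations from their definitions (helper names are mine):

```python
import numpy as np

N = 8
h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
x = np.arange(N)

# Circular cross-correlation: pair h[k] with x[n + k], no flip.
xcorr = np.array([sum(h[k] * x[(n + k) % N] for k in range(N))
                  for n in range(N)])

# Circular convolution with the reversed kernel Rh.
Rh = h[(-np.arange(N)) % N]
conv = np.array([sum(Rh[k] * x[(n - k) % N] for k in range(N))
                 for n in range(N)])

assert np.array_equal(xcorr, conv)   # (h ⋆ x) = (Rh * x)
```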

Terminology trap: In the neural network literature, “convolutional layer” almost universally means cross-correlation, not convolution. PyTorch’s Conv1d and Conv2d documentation explicitly states that the operation is the “valid cross-correlation operator” — the learned kernel is applied without flipping. This is harmless in practice: since the kernel weights are learned from data, the network can absorb the reflection into the learned parameters. A kernel that would be $h$ under true convolution is simply learned as $Rh$ under cross-correlation. But it means the convolution theorem we will prove in a later chapter (the eigenvalues of a circulant are the DFT of its kernel) applies to the mathematical operation, not directly to what a CNN layer computes. Mathematically honest names like “cross-correlation layer” never caught on.

Returning to the figure: it also makes visible the commutativity $h * x = x * h$. Reindexing $j = n - k$ in Eq. 5 gives the equivalent form $(h * x)[n] = \sum_j x[(n - j) \bmod N]\, h[j]$, which is the same formula with the roles of $h$ and $x$ swapped. Each $y[n]$ is a sum of pairwise products of radially aligned values — it does not matter which factor sits on which ring. In matrix language: $C_h\, x = C_x\, h$.
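Commutativity in the form $C_h\,x = C_x\,h$ takes one line to check (again using the entry-formula shorthand `circ`):

```python
import numpy as np

N = 8
i, j = np.indices((N, N))
circ = lambda k: k[(i - j) % N]        # circulant from a kernel

h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
x = np.arange(N)

# h * x = x * h: either factor may play the role of the kernel.
assert np.array_equal(circ(h) @ x, circ(x) @ h)
```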

Now comes the fundamental algebraic fact. What happens when we compose two circulants — apply $C_h$ first, then $C_g$? Expand both as polynomials in $S$, using index $k$ for $C_h$ and index $l$ for $C_g$:

$$ C_g \, C_h = \left(\sum_{l=0}^{N-1} g_l \, S^l\right)\left(\sum_{k=0}^{N-1} h_k \, S^k\right) = \sum_{l=0}^{N-1}\sum_{k=0}^{N-1} g_l \, h_k \, S^{l+k} $$

This is a double sum over all pairs $(l, k)$, each contributing a term $g_l h_k$ multiplied by $S^{l+k}$. Since $S^N = I$, we have $S^{l+k} = S^{(l+k) \bmod N}$, so terms with different $(l,k)$ pairs can land on the same power of $S$. Collect all terms whose exponents agree modulo $N$: for each target power $m = 0, 1, \ldots, N-1$, gather every pair $(l, k)$ with $l + k \equiv m \pmod{N}$:

$$ C_g \, C_h = \sum_{m=0}^{N-1} \underbrace{\left(\sum_{l=0}^{N-1} g_l \, h_{(m - l) \bmod N}\right)}_{\text{coefficient of } S^m} S^m $$

The coefficient of $S^m$ in the product is $\sum_l g_l \, h_{(m-l) \bmod N}$. But this is exactly $(g * h)[m]$ — the $m$-th component of the circular convolution of $g$ with $h$ (Eq. 5). So the product is:

$$ C_g \, C_h = \sum_{m=0}^{N-1} (g * h)[m] \, S^m = C_{g * h} $$

The product of two circulants is the circulant of the convolved kernels. Composing two convolution filters is itself a convolution, with a kernel given by convolving the individual kernels. The circulant algebra is closed under multiplication.

Notice that the double sum $\sum_l \sum_k g_l h_k S^{l+k}$ is symmetric in $g$ and $h$ — swapping the two factors just relabels the dummy indices. Therefore:

Circulants commute
$$ C_g \, C_h = C_{g*h} = C_{h*g} = C_h \, C_g \quad \text{for all kernels } g, h \in \C^N \tag{6} $$

The order in which you apply two convolution filters does not matter. This follows directly from the polynomial structure: $p(S) \cdot q(S) = q(S) \cdot p(S)$ for any polynomials $p, q$, because we are multiplying in the ring $\C[S]/(S^N - I)$, which is commutative. The set of all $N \times N$ circulant matrices forms a commutative algebra — a vector space closed under multiplication, with all elements commuting.
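Both halves of Eq. 6, closure ($C_g C_h = C_{g*h}$) and commutativity, can be verified on random integer kernels (a sketch; the kernels are arbitrary test data, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
i, j = np.indices((N, N))
circ = lambda k: k[(i - j) % N]        # circulant from a kernel

g = rng.integers(-3, 4, N)             # two arbitrary integer kernels
h = rng.integers(-3, 4, N)

# Kernel of the composition: the circular convolution g * h (Eq. 5).
gh = np.array([sum(g[l] * h[(m - l) % N] for l in range(N))
               for m in range(N)])

assert np.array_equal(circ(g) @ circ(h), circ(gh))           # C_g C_h = C_{g*h}
assert np.array_equal(circ(g) @ circ(h), circ(h) @ circ(g))  # circulants commute
```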

Worked example: the delta signal. Apply $C_h$ to the delta signal $\delta_0 = (1, 0, 0, 0, 0, 0, 0, 0)^T$ on $\Z_8$:

$$ C_h \, \delta_0 = \begin{pmatrix} 3\\1\\0\\0\\0\\0\\0\\2 \end{pmatrix} $$

The output is the first column of $C_h$, which is exactly the kernel $h$ itself (since $(C_h)_{i,0} = h_{(i-0) \bmod 8} = h_i$). This is the discrete analogue of convolving with a delta function: you recover the impulse response. It also confirms that $C_h$ encodes the kernel faithfully — the kernel is the response to a single impulse at position 0.
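The impulse-response fact is a one-line check: applying $C_h$ to $\delta_0$ reads off the first column, which is the kernel itself.

```python
import numpy as np

N = 8
h = np.array([3, 1, 0, 0, 0, 0, 0, 2])
i, j = np.indices((N, N))
C_h = h[(i - j) % N]

delta0 = np.zeros(N, dtype=int)
delta0[0] = 1
assert np.array_equal(C_h @ delta0, h)   # impulse response = the kernel
```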

We can now state the algebraic characterization precisely:

Algebraic Characterization of Convolution

A linear operator $M: \C^N \to \C^N$ is a circular convolution if and only if $M$ is a polynomial in the shift operator $S$. Equivalently, $M$ is a circular convolution if and only if the matrix of $M$ is circulant (every row is a cyclic shift of the first row).

This characterization is purely algebraic — no Fourier transform, no frequencies, no complex exponentials. It says: convolution = polynomial in $S$. In the next chapter, we will discover that this algebraic structure, combined with the simultaneous diagonalization theorem, forces the DFT into existence.

Here is the question that drives the rest of the course: these circulant matrices all commute and are all polynomials in $S$. Linear algebra tells us that commuting diagonalizable matrices can be simultaneously diagonalized — they share a common eigenbasis. What is that eigenbasis? Finding it will give us the DFT.