## 23. Burgers' Equation (1D)
The simplest benchmark in the paper is the 1D viscous Burgers' equation, a nonlinear PDE that develops steep gradients (near-shocks) as viscosity decreases:

$$ \partial_t u(x,t) + u(x,t)\,\partial_x u(x,t) = \nu\, \partial_{xx} u(x,t), \qquad x \in (0,1),\ t \in (0,1] $$

with periodic boundary conditions and viscosity $\nu = 0.1$.
The operator learned by FNO is:
$$ G^\dagger: u_0(x) \mapsto u(x, T=1) $$

This maps the initial condition $u_0$ to the solution at time $T=1$. Since the problem is 1D, the FFT is a 1D FFT, and $k_{\max} = 16$ modes suffice.
Setup: Initial conditions $u_0$ are drawn from a Gaussian random field $\mu = \mathcal{N}(0, 625(-\Delta + 25I)^{-2})$. Training: 1000 pairs, testing: 200 pairs, grid: 8192 points (subsampled to various resolutions for training).
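Such a Gaussian random field can be drawn spectrally: in the Fourier basis of the periodic domain, the covariance $625(-\Delta + 25I)^{-2}$ is diagonal, so each mode is an independent Gaussian. A minimal numpy sketch (function name and normalization are illustrative, not the paper's exact sampler):

```python
import numpy as np

def sample_grf_1d(n, rng, tau=5.0, scale=25.0):
    """Draw u0 ~ N(0, scale^2 * (-Laplacian + tau^2 I)^{-2}) on a periodic grid.

    Karhunen-Loeve sketch: Fourier mode k has standard deviation
    scale / (4 pi^2 k^2 + tau^2). Normalization conventions vary.
    """
    k = np.fft.rfftfreq(n, d=1.0 / n)               # integer wavenumbers 0..n//2
    sigma = scale / (4 * np.pi**2 * k**2 + tau**2)  # sqrt of covariance eigenvalues
    xi = rng.standard_normal(k.shape) + 1j * rng.standard_normal(k.shape)
    return np.fft.irfft(sigma * xi * n, n=n)        # real field on the grid
```

Here $\tau = 5$ and $\text{scale} = 25$ correspond to $625(-\Delta + 25I)^{-2} = 25^2(-\Delta + 5^2 I)^{-2}$.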
Results: FNO achieves a relative $L^2$ test error of $0.0018$ (0.18%) — excellent for this smooth problem. For comparison, a fully-connected network gets $0.0154$ and a DeepONet variant gets $0.0028$. The 1D case is the easiest benchmark; the real test comes with higher-dimensional problems.
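The error metric reported in all of the paper's tables is the relative $L^2$ error, averaged over the test set. A short numpy sketch (function name illustrative):

```python
import numpy as np

def relative_l2(pred, true):
    # Relative L2 test error: ||pred - true||_2 / ||true||_2 per sample,
    # averaged over the test set. Inputs: (batch, ...grid dims...).
    pred = pred.reshape(len(pred), -1)
    true = true.reshape(len(true), -1)
    num = np.linalg.norm(pred - true, axis=1)
    den = np.linalg.norm(true, axis=1)
    return float(np.mean(num / den))
```

So an error of $0.0018$ means predictions are, on average, within $0.18\%$ of the true solution in the $L^2$ norm.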
## 24. Darcy Flow (2D) — Our Running Example
This is the PDE we've been building toward all course. Now we see the full experimental setup from the paper.
PDE: $-\nabla\cdot(a(x)\nabla u(x)) = 1$ on $D = (0,1)^2$ with $u|_{\partial D} = 0$ (as in §1).
Coefficient distribution: The permeability field $a(x)$ is generated by pushing a Gaussian random field through a binary thresholding function:
$$ a \sim \psi_\#\, \mathcal{N}\!\left(0,\; (-\Delta + 9I)^{-2}\right), \qquad \psi(z) = \begin{cases} 12 & z \geq 0 \\ 3 & z < 0 \end{cases} $$

This produces piecewise-constant permeability fields that take only two values (12 or 3), with smooth random boundaries between regions — exactly the checkerboard-like fields we visualized in Chapter 1.
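The push-forward construction can be sketched in a few lines: draw a mean-zero Gaussian field spectrally (as in the Burgers setup, now in 2D with $\tau = 3$), then threshold it through $\psi$. Normalization and sampler details are illustrative, not the paper's exact code:

```python
import numpy as np

def sample_darcy_coefficient(n, rng, tau=3.0, alpha=2.0):
    # Draw g ~ N(0, (-Laplacian + tau^2 I)^{-alpha}) on an n x n periodic grid
    # (spectral sketch), then push through psi to get binary permeability.
    kx = np.fft.fftfreq(n, d=1.0 / n)                    # integer wavenumbers
    k2 = kx[:, None]**2 + kx[None, :]**2
    sigma = (4 * np.pi**2 * k2 + tau**2) ** (-alpha / 2)  # sqrt of eigenvalues
    xi = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    g = np.fft.ifft2(sigma * xi).real * n                # real GRF sample
    return np.where(g >= 0.0, 12.0, 3.0)                 # psi: two-valued field
```

Thresholding at zero gives roughly equal areas of the two permeability values, with interfaces whose smoothness is controlled by the covariance decay.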
Training: $N = 1000$ input–output pairs at resolution $85 \times 85$ (subsampled from $421 \times 421$ solutions). FNO uses $d_v = 32$ channels, $k_{\max} = 12$, $T = 4$ layers.
| Method | Relative $L^2$ Error | Parameters |
|---|---|---|
| NN (fully connected) | $0.0394$ | 297K |
| RBM (random basis) | $0.0275$ | 339K |
| FCN (fully convolutional) | $0.0299$ | 480K |
| U-Net | $0.0245$ | ~500K |
| FNO-2D | $\mathbf{0.0108}$ | 928K |
FNO cuts the error by more than half compared to U-Net. The advantage comes from the inductive bias: FNO is designed for operator learning (function-to-function), while U-Net treats it as an image-to-image regression without resolution invariance.
### Resolution transfer
The paper's most striking result: an FNO trained on $85 \times 85$ data is evaluated without retraining at higher resolutions:
| Test Resolution | $85 \times 85$ | $141 \times 141$ | $211 \times 211$ | $421 \times 421$ |
|---|---|---|---|---|
| FNO error | $0.0108$ | $0.0098$ | $0.0098$ | $0.0098$ |
Error actually decreases at higher resolution — the finer grid better approximates the continuous function that FNO is really learning. This is resolution invariance in action: the learned tensor $R$ acts on the same Fourier modes regardless of grid size.
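This grid-independence can be checked directly: the same truncated mode weights, applied to the same function sampled at two resolutions, produce the same output at shared grid points. A minimal 1D sketch, where `weights` is a scalar-per-mode stand-in for the full per-mode matrix $R(k)$:

```python
import numpy as np

def spectral_conv_1d(f, weights):
    # Apply learned weights to the lowest k_max Fourier modes, zero the rest.
    # Dividing/multiplying by n makes the mode amplitudes grid-independent.
    n = f.shape[-1]
    fhat = np.fft.rfft(f) / n
    out = np.zeros_like(fhat)
    kmax = len(weights)
    out[:kmax] = weights * fhat[:kmax]
    return np.fft.irfft(out, n=n) * n

weights = np.array([0.5, 1.0, -2.0, 0.3])          # illustrative R(k), k_max = 4
x64 = np.linspace(0, 1, 64, endpoint=False)
x256 = np.linspace(0, 1, 256, endpoint=False)
f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
y64 = spectral_conv_1d(f(x64), weights)
y256 = spectral_conv_1d(f(x256), weights)
# y256[::4] agrees with y64: same operator, different grids.
```

The finer grid's FFT produces more modes, but the layer only reads and writes the first `k_max`, so both resolutions see the same operator.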
## 25. Navier–Stokes (2D + Time)
The most challenging benchmark: the 2D Navier–Stokes equations in vorticity form on a periodic torus $D = (0,1)^2$:

$$ \partial_t w(x,t) + u(x,t) \cdot \nabla w(x,t) = \nu\, \Delta w(x,t) + f(x), \qquad \nabla \cdot u(x,t) = 0 $$

where $w$ is vorticity, $u$ is velocity, $\nu$ is viscosity, and $f(x) = 0.1(\sin(2\pi(x_1+x_2)) + \cos(2\pi(x_1+x_2)))$ is a fixed forcing.
The operator maps vorticity over a past window to vorticity in the future:
$$ G^\dagger: w\big|_{[0,\,10]} \;\mapsto\; w\big|_{(10,\,T]} \tag{10} $$

The paper tests two FNO variants:
- FNO-2D: Treats the 10 input time steps as channels ($d_a = 10$). The FFT is spatial-only (2D). This is simpler but loses temporal structure.
- FNO-3D: Applies a 3D FFT over space and time jointly. More expensive, but captures spatiotemporal correlations directly.
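The difference between the two variants comes down to which axes the FFT runs over. A shape-level sketch (array names and sizes illustrative):

```python
import numpy as np

# Illustrative input: batch of vorticity histories, 10 time steps, 64x64 grid.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 10, 64, 64))              # (batch, time-in, x, y)

# FNO-2D view: the 10 time steps act as d_a = 10 channels; FFT over space only.
spatial_modes = np.fft.rfft2(w, axes=(-2, -1))        # -> (8, 10, 64, 33)

# FNO-3D view: FFT jointly over (time, x, y), mixing spatiotemporal modes.
spacetime_modes = np.fft.rfftn(w, axes=(-3, -2, -1))  # -> (8, 10, 64, 33)
```

The arrays have the same shape, but the 3D transform couples frequencies across time and space, which is why FNO-3D captures spatiotemporal correlations that the channel-stacked 2D variant cannot.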
| Method | $\nu = 10^{-3}$ | $\nu = 10^{-4}$ | $\nu = 10^{-5}$ |
|---|---|---|---|
| FNO-2D ($T = 50$) | $0.0086$ | $0.0820$ | $0.1893$ |
| FNO-3D ($T = 50$) | $\mathbf{0.0056}$ | $\mathbf{0.0556}$ | $\mathbf{0.1556}$ |
| U-Net | $0.0245$ | $0.2048$ | $0.1982$ |
| ResNet | $0.0701$ | $0.2311$ | $0.2753$ |
At the largest viscosity tested ($\nu = 10^{-3}$), the flow is smooth and FNO excels with $< 1\%$ error. As viscosity decreases to $\nu = 10^{-5}$, the flow becomes more turbulent with finer structures, and errors grow to $\sim 15\%$. This is expected: turbulence populates high-frequency modes that FNO's mode truncation discards.
## 26. Super-Resolution and Inverse Problems
### Zero-shot super-resolution
Because FNO's parameters (the tensor $R$ and the matrix $W$) are defined independently of the grid, an FNO trained at one resolution can be evaluated at a completely different resolution with no modifications. The paper demonstrates this by training on coarse grids and testing on fine grids.
Why it works mechanically: $R$ acts on Fourier modes $k = 0, 1, \ldots, k_{\max}-1$. On a $64 \times 64$ grid, mode $k=5$ is the same frequency as mode $k=5$ on a $256 \times 256$ grid: five full oscillations across the unit domain. The same $R(k)$ matrix applies at both resolutions. The FFT produces more modes on the finer grid, but FNO only touches the low ones.
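This can be verified in one line with numpy's FFT frequency helper: with grid spacing $d = 1/n$ on the unit domain, the returned frequencies are integer wavenumbers, identical at every resolution.

```python
import numpy as np

# With d = 1/n on the unit domain, fftfreq returns integer wavenumbers:
# mode k = 5 means 5 full oscillations, at ANY resolution.
k64 = np.fft.fftfreq(64, d=1 / 64)
k256 = np.fft.fftfreq(256, d=1 / 256)
assert k64[5] == 5.0 and k256[5] == 5.0   # same frequency, different grids
```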
Key idea: Resolution invariance isn't just a nice theory — it's practically useful. Train cheaply on coarse grids, deploy on fine grids. No retraining, no architectural changes. This is impossible for CNNs or MLPs, whose weight dimensions are tied to the input size.
### Bayesian inverse problems
A powerful application: use FNO as a fast surrogate for the forward model in Bayesian inference. The paper considers the Darcy flow inverse problem: given observations of the pressure $u$ at sparse locations, infer the permeability field $a$.
Bayesian inference requires evaluating the forward model $G^\dagger(a)$ thousands of times (for MCMC sampling). Using a traditional PDE solver, this takes $\sim 18$ hours. Using FNO as the forward model: $\sim 2.5$ minutes — a 430× speedup with negligible loss of accuracy.
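The reason the surrogate helps is that the forward model sits inside the MCMC inner loop, so its cost is paid at every step. A broad-strokes random-walk Metropolis sketch (prior, parameterization, and step size are all illustrative, not the paper's setup; `forward` would be the trained FNO in practice):

```python
import numpy as np

def mcmc_with_surrogate(forward, y_obs, n_steps, rng, step=0.1, noise_std=0.05):
    # Random-walk Metropolis with a cheap surrogate in place of the PDE solver.
    # Gaussian prior N(0, I) on theta; Gaussian likelihood on the observations.
    def log_post(t):
        r = forward(t) - y_obs
        return -0.5 * np.sum(r**2) / noise_std**2 - 0.5 * np.sum(t**2)

    theta = np.zeros(4)            # illustrative low-dim parameterization of a
    lp = log_post(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)
```

Every iteration calls `forward` once, so replacing an hours-per-solve PDE code with a milliseconds-per-call network is exactly where the 430× end-to-end speedup comes from.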
## 27. What FNO Can and Can't Do
### Strengths
- Speed: 1000× faster inference than traditional PDE solvers once trained. One forward pass takes milliseconds.
- Resolution invariance: Train at one resolution, evaluate at another. Parameters don't depend on the grid.
- Accuracy on smooth problems: State-of-the-art for elliptic PDEs (Darcy) and low-Reynolds-number flows.
- Data efficiency relative to performance: With 1000 training examples, FNO outperforms all baselines by a significant margin.
### Limitations
- Periodic boundary assumption: The FFT implicitly assumes periodic boundaries. For non-periodic problems (like our Darcy flow with $u=0$ boundaries), the $W$ path provides a partial fix, but there's a fundamental mismatch. Later work (e.g., Fourier continuation) addresses this.
- Struggles with shocks and discontinuities: Mode truncation is a low-pass filter. Problems with sharp features (shocks, contact discontinuities, fractures) lose critical high-frequency information.
- Data-hungry training: Needs 1000+ PDE solves for training data. Generating this data is expensive — you need a working traditional solver to bootstrap FNO.
- Mode truncation loses fine detail: With $k_{\max} = 12$, anything finer than $\sim 1/12$ of the domain length is invisible to the Fourier path. High-$k$ physics (turbulent cascades, microstructure) is poorly captured.
- Uniform grids only: Standard FNO requires data on a uniform grid for the FFT. Unstructured meshes and irregular geometries need different approaches (see GNO, Geo-FNO).
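The low-pass limitation can be made concrete: truncating a step function to $k_{\max} = 12$ Fourier modes produces Gibbs ringing near the jumps, overshooting the true values by roughly 9% of the jump height no matter how fine the grid is. A minimal numpy sketch:

```python
import numpy as np

# Low-pass a step function to k_max modes, FNO-style, and measure the overshoot.
n, kmax = 1024, 12
x = np.linspace(0, 1, n, endpoint=False)
step = np.where((x > 0.25) & (x < 0.75), 1.0, 0.0)

fhat = np.fft.rfft(step)
fhat[kmax:] = 0.0                          # keep only modes 0..k_max-1
filtered = np.fft.irfft(fhat, n=n)

overshoot = filtered.max() - step.max()    # positive: ringing above the jump
```

This is the same mechanism that limits FNO on shocks: the discarded high-$k$ modes are exactly the ones that encode the discontinuity.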
### Comparison with other methods
| Property | FNO | PINNs | DeepONet |
|---|---|---|---|
| Learns | Operator (family of PDEs) | Solution (one PDE instance) | Operator (family of PDEs) |
| Training data | Input–output pairs from solver | None (PDE residual as loss) | Input–output pairs from solver |
| Inference | One forward pass | Retraining per instance | One forward pass |
| Resolution | Invariant (Fourier modes) | Fixed (collocation points) | Flexible (branch evaluates anywhere) |
| Architecture | Iterative Fourier layers | Standard MLP/CNN | Branch + trunk networks |
| Best for | Smooth, periodic-ish PDEs on regular grids | Data-scarce, one-off PDE instances | General operators, irregular domains |
| Weakness | Periodic BCs, uniform grids, needs data | Slow training, failure modes | Less efficient global communication |
The big picture: FNO is a spectral method that learned its own basis functions. Classical spectral methods expand solutions in fixed bases (Fourier, Chebyshev, Legendre) and solve for coefficients. FNO expands in Fourier modes but learns how modes interact through the tensor $R$. The modes are fixed, but the mixing is learned from data. This is why FNO inherits both the power and the limitations of spectral methods: extraordinary accuracy for smooth problems, but vulnerability to Gibbs-like phenomena near discontinuities.
This concludes our breakdown of the FNO paper. The core ideas — operator learning, the neural operator framework, and the Fourier parameterization — have spawned a large family of follow-up work: Geo-FNO (irregular geometries), U-FNO (multiscale), FFNO (factorized), and adaptive approaches that dynamically choose which modes to retain. The foundation you've built here provides the vocabulary and intuition to read all of them.