Chapter 5

FNO on Real PDEs

Burgers' equation, Darcy flow benchmarks, Navier–Stokes, zero-shot super-resolution, and an honest look at what FNO can and can't do.

In this chapter
  1. Burgers' Equation (1D)
  2. Darcy Flow (2D) — Our Running Example
  3. Navier–Stokes (2D + Time)
  4. Super-Resolution and Inverse Problems
  5. What FNO Can and Can't Do

§23 Burgers' Equation (1D)

The simplest benchmark in the paper is the 1D viscous Burgers' equation, a nonlinear PDE that develops steep gradients (near-shocks) as viscosity decreases:

Burgers' Equation
$$ \frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\!\left(\frac{u^2}{2}\right) = \nu \frac{\partial^2 u}{\partial x^2}, \qquad x \in (0, 1), \;\; t \in (0, 1] \tag{7} $$

with periodic boundary conditions and viscosity $\nu = 0.1$.

The operator learned by FNO is:

$$ G^\dagger: u_0(x) \mapsto u(x, T=1) $$

— mapping the initial condition $u_0$ to the solution at time $T=1$. This is a 1D problem, so the FFT is a 1D FFT, and $k_{\max} = 16$ modes suffice.

Setup: Initial conditions $u_0$ are drawn from a Gaussian random field $\mu = \mathcal{N}(0, 625(-\Delta + 25I)^{-2})$. Training: 1000 pairs, testing: 200 pairs, grid: 8192 points (subsampled to various resolutions for training).

Results: FNO achieves a relative $L^2$ test error of $0.0018$ (0.18%) — excellent for this smooth problem. For comparison, a fully-connected network gets $0.0154$ and a DeepONet variant gets $0.0028$. The 1D case is the easiest benchmark; the real test comes with higher-dimensional problems.
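To make the 1D Fourier layer concrete, here is a minimal NumPy sketch of its core convolution: FFT, keep only the lowest $k_{\max}$ modes, multiply each kept mode by a learned weight, inverse FFT. This is a single-channel toy, so the per-mode weight $R(k)$ is a scalar rather than the paper's $d_v \times d_v$ matrix, and the weights below are placeholders, not trained values.

```python
import numpy as np

def spectral_conv_1d(v, R):
    """One-channel sketch of the Fourier-layer convolution.

    v: real array of shape (n,) -- values on a uniform periodic grid.
    R: complex array of shape (k_max,) -- one weight per retained mode.
    """
    k_max = len(R)
    v_hat = np.fft.rfft(v)                 # Fourier coefficients, length n//2 + 1
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = R * v_hat[:k_max]    # weight the low modes, drop the rest
    return np.fft.irfft(out_hat, n=len(v))

# With identity weights the layer is a pure low-pass filter:
n = 256
x = np.linspace(0, 1, n, endpoint=False)
v = np.sin(2 * np.pi * 3 * x) + 0.1 * np.sin(2 * np.pi * 40 * x)
out = spectral_conv_1d(v, R=np.ones(16, dtype=complex))
# Mode 3 (< k_max = 16) survives; mode 40 (>= k_max) is removed.
```

In the real architecture this convolution sits inside each layer alongside the pointwise linear map $W$ and a nonlinearity; the sketch isolates only the spectral part.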


§24 Darcy Flow (2D) — Our Running Example

This is the PDE we've been building toward throughout the course. Now we see the full experimental setup from the paper.

PDE: $-\nabla\cdot(a(x)\nabla u(x)) = 1$ on $D = (0,1)^2$ with $u|_{\partial D} = 0$ (as in §1).

Coefficient distribution: The permeability field $a(x)$ is generated by pushing a Gaussian random field through a binary thresholding function:

$$ a \sim \psi_\#\, \mathcal{N}\!\left(0,\; (-\Delta + 9I)^{-2}\right), \qquad \psi(z) = \begin{cases} 12 & z \geq 0 \\ 3 & z < 0 \end{cases} $$

This produces piecewise-constant permeability fields that take only two values (12 or 3), with smooth random boundaries between regions — exactly the checkerboard-like fields we visualized in Chapter 1.
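A sketch of how such a field can be sampled: draw white noise, apply $(-\Delta + 9I)^{-1}$ (the square root of the covariance) mode-by-mode in Fourier space, then threshold with $\psi$. For simplicity this uses a periodic (Fourier) discretization of the Laplacian; the paper's exact discretization and boundary conditions may differ, so treat this as illustrative. The function name is ours.

```python
import numpy as np

def sample_darcy_coefficient(n=85, seed=0):
    """Illustrative sample of a ~ psi_# N(0, (-Laplacian + 9 I)^{-2})
    on an n x n periodic grid (the paper's discretization may differ)."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n, n))            # white noise
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # C^{1/2} = (-Delta + 9 I)^{-1} acts diagonally in Fourier space:
    # mode k is scaled by 1 / (4 pi^2 |k|^2 + 9).
    decay = 1.0 / (4 * np.pi**2 * (kx**2 + ky**2) + 9.0)
    z = np.fft.ifft2(np.fft.fft2(xi) * decay).real   # smooth Gaussian field
    return np.where(z >= 0, 12.0, 3.0)               # psi: binary thresholding

a = sample_darcy_coefficient()
# a is piecewise constant with values in {12, 3}.
```

The smoothing operator $(-\Delta + 9I)^{-1}$ gives the field an $O(1/3)$ correlation length, which is what produces the large contiguous regions seen in the visualizations.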

Training: $N = 1000$ input–output pairs at resolution $85 \times 85$ (subsampled from $421 \times 421$ solutions). FNO uses $d_v = 32$, $k_{\max} = 12$, $T = 4$ layers.

| Method | Relative $L^2$ Error | Parameters |
|---|---|---|
| NN (fully connected) | $0.0394$ | 297K |
| RBM (random basis) | $0.0275$ | 339K |
| FCN (fully convolutional) | $0.0299$ | 480K |
| U-Net | $0.0245$ | ~500K |
| FNO-2D | $\mathbf{0.0108}$ | 928K |

FNO cuts the error by more than half compared to U-Net. The advantage comes from the inductive bias: FNO is designed for operator learning (function-to-function), while U-Net treats it as an image-to-image regression without resolution invariance.

Resolution transfer

The paper's most striking result: an FNO trained on $85 \times 85$ data is evaluated without retraining at higher resolutions:

| Test Resolution | $85 \times 85$ | $141 \times 141$ | $211 \times 211$ | $421 \times 421$ |
|---|---|---|---|---|
| FNO error | $0.0108$ | $0.0098$ | $0.0098$ | $0.0098$ |

Error actually decreases at higher resolution — the finer grid better approximates the continuous function that FNO is really learning. This is resolution invariance in action: the learned tensor $R$ acts on the same Fourier modes regardless of grid size.
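The grid-independence of the learned weights can be checked directly in NumPy: apply the same per-mode weights to a coarse and a fine sampling of one continuous function, and the outputs agree wherever the grids overlap. The random weights `R` below are a stand-in for a trained tensor.

```python
import numpy as np

rng = np.random.default_rng(1)
k_max = 12
R = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)  # stand-in weights

def apply_modes(v, R):
    """Weight the lowest len(R) Fourier modes of v; zero the rest."""
    v_hat = np.fft.rfft(v)                  # FFT magnitude scales with grid size...
    out_hat = np.zeros_like(v_hat)
    out_hat[:len(R)] = R * v_hat[:len(R)]
    return np.fft.irfft(out_hat, n=len(v))  # ...and the inverse FFT undoes it

f = lambda x: np.exp(np.sin(2 * np.pi * x))       # a smooth periodic "input function"
x_coarse = np.linspace(0, 1, 64, endpoint=False)
x_fine = np.linspace(0, 1, 256, endpoint=False)
out_coarse = apply_modes(f(x_coarse), R)
out_fine = apply_modes(f(x_fine), R)
# Every 4th fine-grid point coincides with a coarse-grid point, and the two
# outputs agree there up to (tiny) aliasing error in the input samples.
print(np.max(np.abs(out_fine[::4] - out_coarse)))
```

The only discrepancy comes from aliasing in the sampled input, which shrinks as the grid is refined, mirroring the error decrease in the table above.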


§25 Navier–Stokes (2D + Time)

The most challenging benchmark: the 2D Navier–Stokes equations in vorticity form on a periodic torus $D = (0,1)^2$:

Navier–Stokes (Vorticity Form)
$$ \frac{\partial w}{\partial t} + u \cdot \nabla w = \nu\, \Delta w + f(x), \tag{8} $$ $$ \nabla \cdot u = 0, \qquad w = \nabla \times u, \tag{9} $$

where $w$ is vorticity, $u$ is velocity, $\nu$ is viscosity, and $f(x) = 0.1(\sin(2\pi(x_1+x_2)) + \cos(2\pi(x_1+x_2)))$ is a fixed forcing.

The operator maps vorticity over a past window to vorticity in the future:

$$ G^\dagger: w\big|_{[0,\,10]} \;\mapsto\; w\big|_{(10,\,T]} \tag{10} $$

The paper tests two FNO variants (FNO-2D, which steps forward in time autoregressively, and FNO-3D, which convolves in space and time) against the strongest baselines:

| Method | $\nu = 10^{-3}$ | $\nu = 10^{-4}$ | $\nu = 10^{-5}$ |
|---|---|---|---|
| FNO-2D ($T = 50$) | $0.0086$ | $0.0820$ | $0.1893$ |
| FNO-3D ($T = 50$) | $\mathbf{0.0056}$ | $\mathbf{0.0556}$ | $\mathbf{0.1556}$ |
| U-Net | $0.0245$ | $0.2048$ | $0.1982$ |
| ResNet | $0.0701$ | $0.2311$ | $0.2753$ |

At high viscosity ($\nu = 10^{-3}$), the flow is smooth and FNO excels with $< 1\%$ error. As viscosity decreases ($\nu = 10^{-5}$), the flow becomes more turbulent with finer structures, and errors grow to $\sim 15\%$. This is expected: turbulence populates high-frequency modes that FNO's mode truncation discards.
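The link between spectral decay and truncation error can be quantified with synthetic fields: the energy outside the retained modes is a floor on what any $k_{\max}$-truncated representation can capture. The power-law exponents below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k_max = 512, 12

def field(p):
    """Random 1D field with power-law spectrum |c_k| ~ k^(-p)."""
    k = np.arange(1, n // 2)
    phases = np.exp(2j * np.pi * rng.random(len(k)))
    c = np.zeros(n // 2 + 1, dtype=complex)
    c[1:n // 2] = k**(-p) * phases
    return np.fft.irfft(c, n=n)

def truncation_error(v, k_max):
    """Relative L2 error of keeping only the lowest k_max modes."""
    v_hat = np.fft.rfft(v)
    kept = np.zeros_like(v_hat)
    kept[:k_max] = v_hat[:k_max]
    v_low = np.fft.irfft(kept, n=len(v))
    return np.linalg.norm(v - v_low) / np.linalg.norm(v)

smooth = field(p=3.0)   # fast spectral decay, like a high-viscosity flow
rough = field(p=1.0)    # slow decay, like a turbulent flow
print(truncation_error(smooth, k_max))   # small
print(truncation_error(rough, k_max))    # orders of magnitude larger
```

The slowly decaying spectrum loses a large fraction of its energy to truncation, which is exactly the regime where the table shows FNO's errors growing.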


§26 Super-Resolution and Inverse Problems

Zero-shot super-resolution

Because FNO's parameters (the tensor $R$ and the matrix $W$) are defined independently of the grid, an FNO trained at one resolution can be evaluated at a completely different resolution with no modifications. The paper demonstrates this by training on coarse grids and testing on fine grids.

Why it works mechanically: $R$ acts on Fourier modes $k = 0, 1, \ldots, k_{\max}-1$. On a $64 \times 64$ grid, mode $k=5$ is the same frequency as mode $k=5$ on a $256 \times 256$ grid — it's 5 full oscillations across the domain. The same $R(k)$ matrix applies at both resolutions. The FFT produces more modes on the finer grid, but FNO only touches the low ones.
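This identification of mode index with physical frequency is easy to verify: the FFT of $\sin(2\pi \cdot 5x)$ puts its energy at index 5 regardless of the grid size.

```python
import numpy as np

# Mode index k over the unit interval means k full oscillations, independent
# of how many samples the grid has:
for n in (64, 256):
    x = np.linspace(0, 1, n, endpoint=False)
    spectrum = np.abs(np.fft.rfft(np.sin(2 * np.pi * 5 * x)))
    print(n, np.argmax(spectrum))   # index 5 at both resolutions
```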

Key idea: Resolution invariance isn't just a nice theory — it's practically useful. Train cheaply on coarse grids, deploy on fine grids. No retraining, no architectural changes. This is impossible for MLPs, whose weight dimensions are tied to the input size, and unreliable for CNNs, whose fixed pixel-sized kernels change physical meaning as the grid is refined.

Bayesian inverse problems

A powerful application: use FNO as a fast surrogate for the forward model in Bayesian inference. The paper considers the Darcy flow inverse problem: given observations of the pressure $u$ at sparse locations, infer the permeability field $a$.

Bayesian inference requires evaluating the forward model $G^\dagger(a)$ thousands of times (for MCMC sampling). Using a traditional PDE solver, this takes $\sim 18$ hours. Using FNO as the forward model: $\sim 2.5$ minutes — a 430× speedup with negligible loss of accuracy.
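The speedup matters because each MCMC iteration calls the forward model once, so its cost is multiplied by the chain length. The toy random-walk Metropolis sketch below shows the structure of such a loop; `forward` is a hypothetical stand-in (a cheap scalar map), where the paper would plug in the trained FNO surrogate mapping permeability to pressure observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(theta):
    """Hypothetical stand-in for the forward model G(theta)."""
    return np.array([theta, theta**2])

y_obs = forward(1.5) + 0.05 * rng.standard_normal(2)   # synthetic observations
sigma, step = 0.05, 0.2

def log_post(theta):
    """Gaussian likelihood with a flat prior (toy choice)."""
    r = y_obs - forward(theta)
    return -0.5 * np.dot(r, r) / sigma**2

theta, lp = 0.0, log_post(0.0)
samples = []
for _ in range(5000):                     # one forward-model call per iteration
    prop = theta + step * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta)
print(np.mean(samples[1000:]))            # posterior mean, near the true 1.5
```

Replacing an 18-hour solver-in-the-loop with a millisecond surrogate leaves this loop structure unchanged; only the cost of `forward` drops.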


§27 What FNO Can and Can't Do

Strengths

- Resolution invariance: train on coarse grids, evaluate on fine grids with no retraining (zero-shot super-resolution).
- Speed: one forward pass replaces a full PDE solve — the source of the 430× speedup in the Bayesian inverse problem above.
- Accuracy on smooth problems: 0.18% error on Burgers, ~1% on Darcy flow and on high-viscosity Navier–Stokes.

Limitations

- The FFT assumes a uniform grid, and the Fourier parameterization is most natural with periodic boundary conditions.
- Mode truncation discards high frequencies: errors grow on turbulent flows (~15% at $\nu = 10^{-5}$) and near discontinuities (Gibbs-like artifacts).
- Training requires input–output pairs generated by a conventional solver.

Comparison with other methods

| Property | FNO | PINNs | DeepONet |
|---|---|---|---|
| Learns | Operator (family of PDEs) | Solution (one PDE instance) | Operator (family of PDEs) |
| Training data | Input–output pairs from solver | None (PDE residual as loss) | Input–output pairs from solver |
| Inference | One forward pass | Retraining per instance | One forward pass |
| Resolution | Invariant (Fourier modes) | Fixed (collocation points) | Flexible (trunk evaluates anywhere) |
| Architecture | Iterative Fourier layers | Standard MLP/CNN | Branch + trunk networks |
| Best for | Smooth, periodic-ish PDEs on regular grids | Data-scarce, one-off PDE instances | General operators, irregular domains |
| Weakness | Periodic BCs, uniform grids, needs data | Slow training, failure modes | Less efficient global communication |

The big picture: FNO is a spectral method that learned its own basis functions. Classical spectral methods expand solutions in fixed bases (Fourier, Chebyshev, Legendre) and solve for coefficients. FNO expands in Fourier modes but learns how modes interact through the tensor $R$. The modes are fixed, but the mixing is learned from data. This is why FNO inherits both the power and the limitations of spectral methods: extraordinary accuracy for smooth problems, but vulnerability to Gibbs-like phenomena near discontinuities.
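The Gibbs vulnerability mentioned above is easy to reproduce: truncating the Fourier series of a discontinuous function leaves an overshoot near the jump (about 9% of the jump height) that persists no matter how many modes are kept.

```python
import numpy as np

n = 1024
x = np.linspace(0, 1, n, endpoint=False)
step_fn = np.where(x < 0.5, 1.0, -1.0)   # square wave: jump of height 2

for k_max in (8, 32, 128):
    c = np.fft.rfft(step_fn)
    c[k_max:] = 0.0                      # spectral truncation, as in an FNO layer
    recon = np.fft.irfft(c, n=n)
    print(k_max, recon.max())            # peaks near 1.18 instead of 1.0
```

Keeping more modes narrows the oscillation region but never removes the overshoot — the same reason mode truncation struggles with sharp permeability interfaces and near-shocks.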

This concludes our breakdown of the FNO paper. The core ideas — operator learning, the neural operator framework, and the Fourier parameterization — have spawned a large family of follow-up work: Geo-FNO (irregular geometries), U-FNO (multiscale), FFNO (factorized), and adaptive approaches that dynamically choose which modes to retain. The foundation you've built here provides the vocabulary and intuition to read all of them.