## 23. Burgers' Equation (1D)
The simplest benchmark in the paper is the 1D viscous Burgers' equation, a nonlinear PDE that develops steep gradients (near-shocks) as viscosity decreases:

$$ \partial_t u(x,t) + u(x,t)\,\partial_x u(x,t) = \nu\, \partial_{xx} u(x,t), \qquad x \in (0,1),\ t \in (0,1] $$

with periodic boundary conditions and viscosity $\nu = 0.1$.
The operator learned by FNO is:
$$ G^\dagger: u_0(x) \mapsto u(x, T=1) $$

This maps the initial condition $u_0$ to the solution at time $T=1$. Since the problem is 1D, the FFT is a 1D FFT, and $k_{\max} = 16$ modes suffice.
Setup: Initial conditions $u_0$ are drawn from a Gaussian random field $\mu = \mathcal{N}(0, 625(-\Delta + 25I)^{-2})$. Training: 1000 pairs, testing: 200 pairs, grid: 8192 points (subsampled to various resolutions for training).
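Such a Gaussian random field can be drawn spectrally: in the Fourier basis of the periodic domain, the covariance $625(-\Delta + 25I)^{-2}$ is diagonal, so each mode is an independent Gaussian. A minimal numpy sketch (function name and normalization are illustrative, not the paper's exact sampler):

```python
import numpy as np

def sample_grf_1d(n, rng, tau=5.0, scale=25.0):
    """Draw u0 ~ N(0, scale^2 * (-Laplacian + tau^2 I)^{-2}) on a periodic grid.

    Karhunen-Loeve sketch: Fourier mode k has standard deviation
    scale / (4 pi^2 k^2 + tau^2). Normalization conventions vary.
    """
    k = np.fft.rfftfreq(n, d=1.0 / n)               # integer wavenumbers 0..n//2
    sigma = scale / (4 * np.pi**2 * k**2 + tau**2)  # sqrt of covariance eigenvalues
    xi = rng.standard_normal(k.shape) + 1j * rng.standard_normal(k.shape)
    return np.fft.irfft(sigma * xi * n, n=n)        # real field on the grid
```

Here $\tau = 5$ and $\text{scale} = 25$ correspond to $625(-\Delta + 25I)^{-2} = 25^2(-\Delta + 5^2 I)^{-2}$.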
Results: FNO achieves a relative $L^2$ test error of $0.0018$ (0.18%) — excellent for this smooth problem. For comparison, a fully-connected network gets $0.0154$ and a DeepONet variant gets $0.0028$. The 1D case is the easiest benchmark; the real test comes with higher-dimensional problems.
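The error metric reported in all of the paper's tables is the relative $L^2$ error, averaged over the test set. A short numpy sketch (function name illustrative):

```python
import numpy as np

def relative_l2(pred, true):
    # Relative L2 test error: ||pred - true||_2 / ||true||_2 per sample,
    # averaged over the test set. Inputs: (batch, ...grid dims...).
    pred = pred.reshape(len(pred), -1)
    true = true.reshape(len(true), -1)
    num = np.linalg.norm(pred - true, axis=1)
    den = np.linalg.norm(true, axis=1)
    return float(np.mean(num / den))
```

So an error of $0.0018$ means predictions are, on average, within $0.18\%$ of the true solution in the $L^2$ norm.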
## 24. Darcy Flow (2D) — Our Running Example
This is the PDE we've been building toward all course. Now we see the full experimental setup from the paper.
PDE: $-\nabla\cdot(a(x)\nabla u(x)) = 1$ on $D = (0,1)^2$ with $u|_{\partial D} = 0$ (as in §1).
Coefficient distribution: The permeability field $a(x)$ is generated by pushing a Gaussian random field through a binary thresholding function:
$$ a \sim \psi_\#\, \mathcal{N}\!\left(0,\; (-\Delta + 9I)^{-2}\right), \qquad \psi(z) = \begin{cases} 12 & z \geq 0 \\ 3 & z < 0 \end{cases} $$

This produces piecewise-constant permeability fields that take only two values (12 or 3), with smooth random boundaries between regions — exactly the checkerboard-like fields we visualized in Chapter 1.
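The push-forward construction can be sketched in a few lines: draw a mean-zero Gaussian field spectrally (as in the Burgers setup, now in 2D with $\tau = 3$), then threshold it through $\psi$. Normalization and sampler details are illustrative, not the paper's exact code:

```python
import numpy as np

def sample_darcy_coefficient(n, rng, tau=3.0, alpha=2.0):
    # Draw g ~ N(0, (-Laplacian + tau^2 I)^{-alpha}) on an n x n periodic grid
    # (spectral sketch), then push through psi to get binary permeability.
    kx = np.fft.fftfreq(n, d=1.0 / n)                    # integer wavenumbers
    k2 = kx[:, None]**2 + kx[None, :]**2
    sigma = (4 * np.pi**2 * k2 + tau**2) ** (-alpha / 2)  # sqrt of eigenvalues
    xi = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    g = np.fft.ifft2(sigma * xi).real * n                # real GRF sample
    return np.where(g >= 0.0, 12.0, 3.0)                 # psi: two-valued field
```

Thresholding at zero gives roughly equal areas of the two permeability values, with interfaces whose smoothness is controlled by the covariance decay.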
Training: $N = 1000$ input–output pairs at resolution $85 \times 85$ (subsampled from $421 \times 421$ solutions). FNO uses $d_v = 32$ channels, $k_{\max} = 12$, $T = 4$ layers.
| Method | Relative $L^2$ Error | Parameters |
|---|---|---|
| NN (fully connected) | $0.0394$ | 297K |
| RBM (random basis) | $0.0275$ | 339K |
| FCN (fully convolutional) | $0.0299$ | 480K |
| U-Net | $0.0245$ | ~500K |
| FNO-2D | $\mathbf{0.0108}$ | 928K |
FNO cuts the error by more than half compared to U-Net. The advantage comes from the inductive bias: FNO is designed for operator learning (function-to-function), while U-Net treats it as an image-to-image regression without resolution invariance.
### Resolution transfer
The paper's most striking result: an FNO trained on $85 \times 85$ data is evaluated without retraining at higher resolutions:
| Test Resolution | $85 \times 85$ | $141 \times 141$ | $211 \times 211$ | $421 \times 421$ |
|---|---|---|---|---|
| FNO error | $0.0108$ | $0.0098$ | $0.0098$ | $0.0098$ |
Error actually decreases at higher resolution — the finer grid better approximates the continuous function that FNO is really learning. This is resolution invariance in action: the learned tensor $R$ acts on the same Fourier modes regardless of grid size.
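This grid-independence can be checked directly: the same truncated mode weights, applied to the same function sampled at two resolutions, produce the same output at shared grid points. A minimal 1D sketch, where `weights` is a scalar-per-mode stand-in for the full per-mode matrix $R(k)$:

```python
import numpy as np

def spectral_conv_1d(f, weights):
    # Apply learned weights to the lowest k_max Fourier modes, zero the rest.
    # Dividing/multiplying by n makes the mode amplitudes grid-independent.
    n = f.shape[-1]
    fhat = np.fft.rfft(f) / n
    out = np.zeros_like(fhat)
    kmax = len(weights)
    out[:kmax] = weights * fhat[:kmax]
    return np.fft.irfft(out, n=n) * n

weights = np.array([0.5, 1.0, -2.0, 0.3])          # illustrative R(k), k_max = 4
x64 = np.linspace(0, 1, 64, endpoint=False)
x256 = np.linspace(0, 1, 256, endpoint=False)
f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
y64 = spectral_conv_1d(f(x64), weights)
y256 = spectral_conv_1d(f(x256), weights)
# y256[::4] agrees with y64: same operator, different grids.
```

The finer grid's FFT produces more modes, but the layer only reads and writes the first `k_max`, so both resolutions see the same operator.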
## 25. Navier–Stokes (2D + Time)
The most challenging benchmark: the 2D Navier–Stokes equations in vorticity form on a periodic torus $D = (0,1)^2$:

$$ \partial_t w(x,t) + u(x,t) \cdot \nabla w(x,t) = \nu\, \Delta w(x,t) + f(x), \qquad \nabla \cdot u(x,t) = 0 $$

where $w$ is vorticity, $u$ is velocity, $\nu$ is viscosity, and $f(x) = 0.1(\sin(2\pi(x_1+x_2)) + \cos(2\pi(x_1+x_2)))$ is a fixed forcing.
The operator maps vorticity over a past window to vorticity in the future:
$$ G^\dagger: w\big|_{[0,\,10]} \;\mapsto\; w\big|_{(10,\,T]} \tag{10} $$

The paper tests two FNO variants:
- FNO-2D: Treats the 10 input time steps as channels ($d_a = 10$). The FFT is spatial-only (2D). This is simpler but loses temporal structure.
- FNO-3D: Applies a 3D FFT over space and time jointly. More expensive, but captures spatiotemporal correlations directly.
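The difference between the two variants comes down to which axes the FFT runs over. A shape-level sketch (array names and sizes illustrative):

```python
import numpy as np

# Illustrative input: batch of vorticity histories, 10 time steps, 64x64 grid.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 10, 64, 64))              # (batch, time-in, x, y)

# FNO-2D view: the 10 time steps act as d_a = 10 channels; FFT over space only.
spatial_modes = np.fft.rfft2(w, axes=(-2, -1))        # -> (8, 10, 64, 33)

# FNO-3D view: FFT jointly over (time, x, y), mixing spatiotemporal modes.
spacetime_modes = np.fft.rfftn(w, axes=(-3, -2, -1))  # -> (8, 10, 64, 33)
```

The arrays have the same shape, but the 3D transform couples frequencies across time and space, which is why FNO-3D captures spatiotemporal correlations that the channel-stacked 2D variant cannot.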
| Method | $\nu = 10^{-3}$ | $\nu = 10^{-4}$ | $\nu = 10^{-5}$ |
|---|---|---|---|
| FNO-2D ($T = 50$) | $0.0086$ | $0.0820$ | $0.1893$ |
| FNO-3D ($T = 50$) | $\mathbf{0.0056}$ | $\mathbf{0.0556}$ | $\mathbf{0.1556}$ |
| U-Net | $0.0245$ | $0.2048$ | $0.1982$ |
| ResNet | $0.0701$ | $0.2311$ | $0.2753$ |
At the largest viscosity tested ($\nu = 10^{-3}$), the flow is smooth and FNO excels with $< 1\%$ error. As viscosity decreases to $\nu = 10^{-5}$, the flow becomes more turbulent with finer structures, and errors grow to $\sim 15\%$. This is expected: turbulence populates high-frequency modes that FNO's mode truncation discards.
## 26. Super-Resolution and Inverse Problems
### Zero-shot super-resolution
Because FNO's parameters (the tensor $R$ and the matrix $W$) are defined independently of the grid, an FNO trained at one resolution can be evaluated at a completely different resolution with no modifications. The paper demonstrates this by training on coarse grids and testing on fine grids.
Why it works mechanically: $R$ acts on Fourier modes $k = 0, 1, \ldots, k_{\max}-1$. On a $64 \times 64$ grid, mode $k=5$ is the same frequency as mode $k=5$ on a $256 \times 256$ grid: five full oscillations across the unit domain. The same $R(k)$ matrix applies at both resolutions. The FFT produces more modes on the finer grid, but FNO only touches the low ones.
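This can be verified in one line with numpy's FFT frequency helper: with grid spacing $d = 1/n$ on the unit domain, the returned frequencies are integer wavenumbers, identical at every resolution.

```python
import numpy as np

# With d = 1/n on the unit domain, fftfreq returns integer wavenumbers:
# mode k = 5 means 5 full oscillations, at ANY resolution.
k64 = np.fft.fftfreq(64, d=1 / 64)
k256 = np.fft.fftfreq(256, d=1 / 256)
assert k64[5] == 5.0 and k256[5] == 5.0   # same frequency, different grids
```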
Key idea: Resolution invariance isn't just a nice theory — it's practically useful. Train cheaply on coarse grids, deploy on fine grids. No retraining, no architectural changes. This is impossible for CNNs or MLPs, whose weight dimensions are tied to the input size.
### Bayesian inverse problems
A powerful application: use FNO as a fast surrogate for the forward model in Bayesian inference. The paper considers the Darcy flow inverse problem: given observations of the pressure $u$ at sparse locations, infer the permeability field $a$.
Bayesian inference requires evaluating the forward model $G^\dagger(a)$ thousands of times (for MCMC sampling). Using a traditional PDE solver, this takes $\sim 18$ hours. Using FNO as the forward model: $\sim 2.5$ minutes — a 430× speedup with negligible loss of accuracy.
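The reason the surrogate helps is that the forward model sits inside the MCMC inner loop, so its cost is paid at every step. A broad-strokes random-walk Metropolis sketch (prior, parameterization, and step size are all illustrative, not the paper's setup; `forward` would be the trained FNO in practice):

```python
import numpy as np

def mcmc_with_surrogate(forward, y_obs, n_steps, rng, step=0.1, noise_std=0.05):
    # Random-walk Metropolis with a cheap surrogate in place of the PDE solver.
    # Gaussian prior N(0, I) on theta; Gaussian likelihood on the observations.
    def log_post(t):
        r = forward(t) - y_obs
        return -0.5 * np.sum(r**2) / noise_std**2 - 0.5 * np.sum(t**2)

    theta = np.zeros(4)            # illustrative low-dim parameterization of a
    lp = log_post(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)
```

Every iteration calls `forward` once, so replacing an hours-per-solve PDE code with a milliseconds-per-call network is exactly where the 430× end-to-end speedup comes from.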
## 27. What FNO Can and Can't Do
### Strengths
- Speed: 1000× faster inference than traditional PDE solvers once trained. One forward pass takes milliseconds.
- Resolution invariance: Train at one resolution, evaluate at another. Parameters don't depend on the grid.
- Accuracy on smooth problems: State-of-the-art for elliptic PDEs (Darcy) and low-Reynolds-number flows.
- Data efficiency relative to performance: With 1000 training examples, FNO outperforms all baselines by a significant margin.
### Limitations
- Periodic boundary assumption: The FFT implicitly assumes periodic boundaries. For non-periodic problems (like our Darcy flow with $u=0$ boundaries), the $W$ path provides a partial fix, but there's a fundamental mismatch. Later work (e.g., Fourier continuation) addresses this.
- Struggles with shocks and discontinuities: Mode truncation is a low-pass filter. Problems with sharp features (shocks, contact discontinuities, fractures) lose critical high-frequency information.
- Data-hungry training: Needs 1000+ PDE solves for training data. Generating this data is expensive — you need a working traditional solver to bootstrap FNO.
- Mode truncation loses fine detail: With $k_{\max} = 12$, anything finer than $\sim 1/12$ of the domain length is invisible to the Fourier path. High-$k$ physics (turbulent cascades, microstructure) is poorly captured.
- Uniform grids only: Standard FNO requires data on a uniform grid for the FFT. Unstructured meshes and irregular geometries need different approaches (see GNO, Geo-FNO).
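The low-pass limitation can be made concrete: truncating a step function to $k_{\max} = 12$ Fourier modes produces Gibbs ringing near the jumps, overshooting the true values by roughly 9% of the jump height no matter how fine the grid is. A minimal numpy sketch:

```python
import numpy as np

# Low-pass a step function to k_max modes, FNO-style, and measure the overshoot.
n, kmax = 1024, 12
x = np.linspace(0, 1, n, endpoint=False)
step = np.where((x > 0.25) & (x < 0.75), 1.0, 0.0)

fhat = np.fft.rfft(step)
fhat[kmax:] = 0.0                          # keep only modes 0..k_max-1
filtered = np.fft.irfft(fhat, n=n)

overshoot = filtered.max() - step.max()    # positive: ringing above the jump
```

This is the same mechanism that limits FNO on shocks: the discarded high-$k$ modes are exactly the ones that encode the discontinuity.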
### Comparison with other methods
| Property | FNO | PINNs | DeepONet |
|---|---|---|---|
| Learns | Operator (family of PDEs) | Solution (one PDE instance) | Operator (family of PDEs) |
| Training data | Input–output pairs from solver | None (PDE residual as loss) | Input–output pairs from solver |
| Inference | One forward pass | Retraining per instance | One forward pass |
| Resolution | Invariant (Fourier modes) | Fixed (collocation points) | Flexible (branch evaluates anywhere) |
| Architecture | Iterative Fourier layers | Standard MLP/CNN | Branch + trunk networks |
| Best for | Smooth, periodic-ish PDEs on regular grids | Data-scarce, one-off PDE instances | General operators, irregular domains |
| Weakness | Periodic BCs, uniform grids, needs data | Slow training, failure modes | Less efficient global communication |
The big picture: FNO is a spectral method that learned its own basis functions. Classical spectral methods expand solutions in fixed bases (Fourier, Chebyshev, Legendre) and solve for coefficients. FNO expands in Fourier modes but learns how modes interact through the tensor $R$. The modes are fixed, but the mixing is learned from data. This is why FNO inherits both the power and the limitations of spectral methods: extraordinary accuracy for smooth problems, but vulnerability to Gibbs-like phenomena near discontinuities.
This concludes our breakdown of the FNO paper. The core ideas — operator learning, the neural operator framework, and the Fourier parameterization — have spawned a large family of follow-up work: Geo-FNO (irregular geometries), U-FNO (multiscale), FFNO (factorized), and adaptive approaches that dynamically choose which modes to retain. The foundation you've built here provides the vocabulary and intuition to read all of them.