Chapter 7

FNO for Crack Detection

Crack parameterization, signal-level and full-field FNO architectures with complete dimension walkthroughs, training data generation from COMSOL, evaluation metrics, and the challenges of applying FNO to high-frequency wave scattering.

In this chapter
  1. §33 Parameterizing the Crack Input
  2. §34 FNO Architecture for Spatiotemporal Wavefields
  3. §35 Training Data Generation
  4. §36 Training & Evaluation
  5. §37 Challenges & What Comes Next

§33 Parameterizing the Crack Input

In the Darcy flow problem (Chapters 1–5), the input $a(x)$ was a function defined at every grid point — a permeability field with $\da = 1$ per spatial location. For the NDT problem, the input is fundamentally different: we're specifying the geometry of a straight-line crack inside the aluminum specimen. Four numbers fully determine it.

Crack parameterization
$$ a = (x_c,\; y_c,\; L,\; \theta) \;\in\; \R^4 \tag{16} $$

where $(x_c, y_c)$ is the crack center in mm, $L$ is the crack length in mm, and $\theta$ is the orientation angle in degrees measured from vertical (positive = clockwise tilt). The crack endpoints are:

$$ \mathbf{p}_{\pm} = \begin{pmatrix} x_c \\ y_c \end{pmatrix} \pm \frac{L}{2} \begin{pmatrix} \sin\theta \\ \cos\theta \end{pmatrix} $$
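The endpoint formula is easy to sanity-check in code. A minimal NumPy sketch (angles converted from degrees, as in the text):

```python
import numpy as np

def crack_endpoints(x_c, y_c, L, theta_deg):
    """Endpoints of a straight crack from its (x_c, y_c, L, theta) parameters.

    theta is measured in degrees from vertical, so the crack direction is
    (sin(theta), cos(theta)); theta = 0 gives a vertical crack.
    """
    t = np.deg2rad(theta_deg)
    d = 0.5 * L * np.array([np.sin(t), np.cos(t)])  # half-length direction vector
    center = np.array([x_c, y_c])
    return center - d, center + d

# A vertical 5 mm crack centered at (17.5, 16): endpoints 2.5 mm above/below center.
p_minus, p_plus = crack_endpoints(17.5, 16.0, 5.0, 0.0)
```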

This parameterization has several properties that make it the right choice for this problem:

The straight-line assumption is well-matched to fatigue cracks in metals, which nucleate at stress concentrators and grow along planes of maximum shear stress — producing approximately straight, planar flaws. More complex morphologies (branching, curved, or multiple interacting cracks) would require either additional parameters or a switch to a field encoding, but for the canonical NDT inspection problem, four numbers suffice.

The sampling distribution $\mu$

The distribution from which we draw training crack configurations defines the region of parameter space the FNO will learn to cover. Each parameter is sampled independently:

| Parameter | Distribution | Range | Physical rationale |
|---|---|---|---|
| $x_c$ | Uniform | [5 mm, 30 mm] | Stay 5 mm from side boundaries (absorbing BC artifacts) |
| $y_c$ | Uniform | [5 mm, 27 mm] | Stay 5 mm from top/bottom surfaces |
| $L$ | Uniform | [1 mm, 8 mm] | $L/\lambda_s$ from 0.5 to 4 (sub-wavelength to multi-wavelength) |
| $\theta$ | Uniform | [$-30°$, $+30°$] | Fatigue cracks grow roughly perpendicular to surface |

The length range deserves attention. The S-wave wavelength is $\lambda_s = \cs / f_0 = 3120 / 1.5 \times 10^6 \approx 2.1\;\text{mm}$. A 1 mm crack is sub-wavelength ($L/\lambda_s \approx 0.5$) — it scatters weakly and is hard to detect. An 8 mm crack is roughly $4\lambda_s$ — it produces strong specular reflections and well-separated tip diffraction signals. This range covers the physically interesting regime where scattering transitions from "barely detectable" to "geometrically resolvable."

[Figure: aluminum cross-section (35 mm × 32 mm) showing five sample cracks (θ=0°, L=5; θ=7°, L=7; θ=−16°, L=3; θ=0°, L=2.5; θ=20°, L=5) and a dashed box marking the $(x_c, y_c)$ sampling region, 5 mm in from each boundary.]
Figure 7.1. Five sample crack configurations drawn from the training distribution $\mu$. Each crack is fully specified by $(x_c, y_c, L, \theta)$. Labels show orientation and length (mm). The sampling region (dashed) keeps crack centers away from boundary artifacts.

Key idea: Unlike Darcy flow where the input is a spatially-varying field, the NDT input is a compact 4-vector. This changes how the lifting operator $P$ works: instead of acting pointwise on a function $a(x)$, it must broadcast the same 4 numbers to every point on the computational grid. The FNO then learns how those 4 numbers control the spatiotemporal wavefield everywhere.


§34 FNO Architecture for Spatiotemporal Wavefields

With the crack parameterization fixed at $a = (x_c, y_c, L, \theta) \in \R^4$, the architecture question is: what does the FNO predict? We develop two complete pipelines, both taking the same parametric input but targeting different outputs. Each has its own architecture, its own trade-offs, and its own applications.

Pipeline 1: Signal-level FNO — crack parameters $\to$ transducer voltage $V(t)$

The most direct formulation for NDT: predict the time-domain voltage signal that the transducer would record for a given crack. This is a 1D output — a single real number at each of $N_t$ time steps.

The operator:

$$ G^\dagger_{\text{sig}}: (x_c, y_c, L, \theta) \;\mapsto\; V(t), \qquad V: [0,\, T] \to \R $$

Input construction. We have $\da = 4$ crack parameters and a time grid $\{t_j\}_{j=1}^{N_t}$ with $N_t = 1024$ points spanning $[0,\, 20\;\mu\text{s}]$. At each time point, concatenate the time coordinate with the crack parameters:

$$ \text{input at time } t_j: \quad \bigl[\,t_j,\;\; x_c,\;\; y_c,\;\; L,\;\; \theta\,\bigr] \;\in\; \R^5 $$

The full input tensor is $\in \R^{1024 \times 5}$ — each row is the same 4 crack parameters appended to a different time coordinate. This is a standard technique for conditioning FNO on global parameters: broadcast the parameters to every grid point so the network has access to them everywhere.

Forward pass, step by step:

  1. Lift: The pointwise lifting operator $P: \R^5 \to \R^{64}$ maps each 5-vector to a 64-channel representation. Output: $v^{(0)} \in \R^{1024 \times 64}$.
  2. Fourier layer 1: Apply 1D FFT along the time axis. Multiply the lowest $k_{\max} = 32$ modes by $R_1 \in \mathbb{C}^{64 \times 64 \times 32}$. Inverse FFT back to physical space. Add the local path $W_1 v^{(0)}$, where $W_1 \in \R^{64 \times 64}$. Apply GELU activation. Output: $v^{(1)} \in \R^{1024 \times 64}$.
  3. Fourier layers 2–4: Identical structure. Each layer has its own $R_\ell$ and $W_\ell$. Output after layer 4: $v^{(4)} \in \R^{1024 \times 64}$.
  4. Project: The pointwise projection $Q: \R^{64} \to \R^1$ maps each 64-channel vector to a single voltage value. Output: $\hat{V} \in \R^{1024}$.
Signal-level FNO — forward pass
$$ \hat{V}(t) = \bigl(Q \circ \mathcal{K}_4 \circ \mathcal{K}_3 \circ \mathcal{K}_2 \circ \mathcal{K}_1 \circ P\bigr)\bigl([t,\; a]\bigr), \qquad \mathcal{K}_\ell(v) = \sigma\bigl(W_\ell\, v + \Fti(R_\ell \cdot \Ft(v))\bigr) \tag{17} $$

with $P: \R^5 \to \R^{64}$, each $R_\ell \in \mathbb{C}^{64 \times 64 \times 32}$, each $W_\ell \in \R^{64 \times 64}$, $Q: \R^{64} \to \R^1$, and $\sigma$ = GELU.
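The spectral step inside each Fourier layer can be sketched in a few lines of NumPy. This is illustrative only: the weights below are random stand-ins for the learned $R_\ell$ and $W_\ell$, and a real implementation would use a deep-learning framework with trainable complex tensors. The grid, channel, and mode sizes follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N_t, d_v, k_max = 1024, 64, 32

# Random stand-ins for the learned weights of one layer.
R = rng.standard_normal((d_v, d_v, k_max)) + 1j * rng.standard_normal((d_v, d_v, k_max))
W = rng.standard_normal((d_v, d_v)) / d_v

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fourier_layer_1d(v):
    """One layer: sigma(W v + IFFT(R . FFT(v))), truncated to k_max modes."""
    v_hat = np.fft.rfft(v, axis=0)                    # (513, 64), complex
    out_hat = np.zeros_like(v_hat)
    # out_hat[k, o] = sum_i R[i, o, k] * v_hat[k, i], lowest k_max modes only
    out_hat[:k_max] = np.einsum("iok,ki->ko", R, v_hat[:k_max])
    spectral = np.fft.irfft(out_hat, n=N_t, axis=0)   # back to (1024, 64), real
    return gelu(spectral + v @ W)                     # add local path, activate

v0 = rng.standard_normal((N_t, d_v))  # stands in for the lifted input P([t, a])
v1 = fourier_layer_1d(v0)             # shape (1024, 64)
```

Stacking four such layers between the pointwise $P$ and $Q$ maps reproduces the forward pass of Eq. 17.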

[Diagram: input $[t_j, x_c, y_c, L, \theta]$ at each of 1024 time points ($1024 \times 5$) → $P$ → $1024 \times 64$ → 4 Fourier layers (1D FFT, 32 modes; global path $\Fti(R_\ell \cdot \Ft(v))$ + local path $W_\ell v$, add + GELU) → $Q$ → predicted voltage $\hat{V}(t)$, $1024 \times 1$.]
Figure 7.2. Signal-level FNO (Pipeline 1). The 4 crack parameters are concatenated with a time coordinate at each of 1024 time points, lifted to 64 channels, processed through 4 Fourier layers (1D FFT, $k_{\max} = 32$), and projected to the predicted voltage signal.

Parameter count. Each Fourier layer contributes: $R_\ell$ has $64 \times 64 \times 32 \times 2 = 262{,}144$ real parameters (complex tensor, factor of 2), and $W_\ell$ has $64 \times 64 = 4{,}096$. With 4 layers plus $P$ ($5 \times 64 = 320$) and $Q$ ($64 \times 1 = 64$), the total is approximately 1.07M parameters.

Key idea: The signal-level FNO is a function-to-function map from the time axis to itself, conditioned on 4 global parameters. The 1D FFT captures temporal correlations in the signal — for instance, the echo from a deeper crack arrives later (shifted in time) and is weaker (lower amplitude), which is a smooth function of $(x_c, y_c)$. The Fourier layers learn this time-shift structure naturally.


Pipeline 2: Full-field FNO-3D — crack parameters $\to$ velocity wavefield $\vv(x,y,t)$

The full-field formulation predicts the complete spatiotemporal velocity wavefield — two velocity components $(v_x, v_y)$ at every spatial point and every time step. This is vastly more information than the transducer signal, and it's what you need for advanced applications like wavefield imaging, full waveform inversion, or validating the physics learned by the network.

The operator:

$$ G^\dagger_{\text{field}}: (x_c, y_c, L, \theta) \;\mapsto\; \vv(\mathbf{x}, t), \qquad \vv: \Omega \times [0,\,T] \to \R^2 $$

Input construction. This is where the parametric approach creates a design challenge. The output lives on a 3D grid $(x_i, y_j, t_k)$, but the input is just 4 numbers. We need to build an input tensor that the 3D Fourier layers can operate on. The standard approach: at every grid point, concatenate the spatial coordinates, the time coordinate, and the crack parameters:

$$ \text{input at } (x_i, y_j, t_k): \quad \bigl[\,x_i,\;\; y_j,\;\; t_k,\;\; x_c,\;\; y_c,\;\; L,\;\; \theta\,\bigr] \;\in\; \R^7 $$

The crack parameters are broadcast (copied identically) to every grid point. The spatial and temporal coordinates vary across the grid, giving the network positional information. The full input tensor is $\in \R^{N_x \times N_y \times N_t \times 7}$.

For our NDT problem, $N_x = N_y = 128$ spatial points and $N_t = 64$ time snapshots (subsampled from the full simulation) give an input tensor of shape $128 \times 128 \times 64 \times 7$.
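A sketch of the input assembly, assuming a 35 mm × 32 mm domain and a 20 µs time window (the exact coordinate ranges are a modeling choice):

```python
import numpy as np

N_x, N_y, N_t = 128, 128, 64

def build_fno3d_input(x_c, y_c, L, theta):
    """Assemble the (N_x, N_y, N_t, 7) input: grid coords + broadcast params."""
    x = np.linspace(0.0, 35.0, N_x)   # mm, specimen width
    y = np.linspace(0.0, 32.0, N_y)   # mm, specimen height
    t = np.linspace(0.0, 20.0, N_t)   # microseconds
    X, Y, T = np.meshgrid(x, y, t, indexing="ij")     # each (128, 128, 64)
    coords = np.stack([X, Y, T], axis=-1)             # (128, 128, 64, 3)
    params = np.broadcast_to(np.array([x_c, y_c, L, theta]),
                             (N_x, N_y, N_t, 4))      # same 4 numbers everywhere
    return np.concatenate([coords, params], axis=-1)  # (128, 128, 64, 7)

a = build_fno3d_input(17.5, 16.0, 5.0, 10.0)
```

`np.broadcast_to` copies nothing in memory; the concatenation at the end materializes the full tensor.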

Forward pass, step by step:

  1. Lift: $P: \R^7 \to \R^{48}$ maps each 7-vector pointwise to a 48-channel representation. Output: $v^{(0)} \in \R^{128 \times 128 \times 64 \times 48}$.
  2. Fourier layer 1: Apply 3D FFT over $(x, y, t)$. The transform produces a 3D array of Fourier coefficients. Retain only the lowest modes: $k_x \leq 12$, $k_y \leq 12$, $k_t \leq 16$. Multiply these retained coefficients by $R_1 \in \mathbb{C}^{48 \times 48 \times 12 \times 12 \times 16}$. Inverse 3D FFT back to physical space. Add the local path $W_1 v^{(0)}$ ($W_1 \in \R^{48 \times 48}$). Apply GELU. Output: $v^{(1)} \in \R^{128 \times 128 \times 64 \times 48}$.
  3. Fourier layers 2–4: Same structure. Each layer has independent $R_\ell$ and $W_\ell$. All intermediate representations stay at $\dv = 48$.
  4. Project: $Q: \R^{48} \to \R^2$ maps each 48-vector pointwise to the two velocity components $(v_x, v_y)$. Output: $\hat{\vv} \in \R^{128 \times 128 \times 64 \times 2}$.
Full-field FNO-3D — forward pass
$$ \hat{\vv}(\mathbf{x}, t) = \bigl(Q \circ \mathcal{K}_4 \circ \mathcal{K}_3 \circ \mathcal{K}_2 \circ \mathcal{K}_1 \circ P\bigr)\bigl([x,\; y,\; t,\; a]\bigr), \qquad \mathcal{K}_\ell(v) = \sigma\bigl(W_\ell\, v + \Fti_{3\text{D}}(R_\ell \cdot \Ft_{3\text{D}}(v))\bigr) \tag{18} $$

with $P: \R^7 \to \R^{48}$, each $R_\ell \in \mathbb{C}^{48 \times 48 \times 12 \times 12 \times 16}$, each $W_\ell \in \R^{48 \times 48}$, $Q: \R^{48} \to \R^2$, and $\sigma$ = GELU. The 3D FFT acts jointly on the spatial and temporal axes.

[Diagram: input $[x_i, y_j, t_k, x_c, y_c, L, \theta]$ at each point of the 3D grid ($128 \times 128 \times 64 \times 7$) → $P$ → $128^2 \times 64 \times 48$ → 4 Fourier layers (3D FFT, $k_x \leq 12$, $k_y \leq 12$, $k_t \leq 16$; global + local paths, add + GELU) → $Q$ → predicted wavefield $\hat{\vv}(x,y,t)$, $128 \times 128 \times 64 \times 2$.]
Figure 7.3. Full-field FNO-3D (Pipeline 2). Crack parameters and grid coordinates form a 7-channel input at each of $128 \times 128 \times 64$ spatiotemporal grid points. 3D FFTs capture joint spatial and temporal correlations. Output is the full velocity field $(v_x, v_y)$.

Parameter count. Each $R_\ell$ has $48 \times 48 \times 12 \times 12 \times 16 \times 2 = 10{,}616{,}832$ real parameters. With 4 layers, plus $W_\ell$ ($4 \times 48^2 = 9{,}216$), $P$ ($7 \times 48 = 336$), and $Q$ ($48 \times 2 = 96$): approximately 42.5M parameters — about 40× larger than the signal-level network.

Memory. The intermediate representation $v^{(\ell)} \in \R^{128 \times 128 \times 64 \times 48}$ contains $128^2 \times 64 \times 48 \approx 50$M floats, or ~200 MB at float32. With activations stored for backpropagation through 4 layers, a single training sample requires roughly 2–4 GB of GPU memory. This limits batch size to 2–4 on a typical 24 GB GPU.
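The quoted parameter and memory figures are easy to verify with back-of-envelope arithmetic:

```python
# Back-of-envelope check of the counts quoted above.
d_v, kx, ky, kt, layers = 48, 12, 12, 16, 4

R_params = d_v * d_v * kx * ky * kt * 2   # complex tensor -> factor of 2
W_params = d_v * d_v
P_params = 7 * d_v
Q_params = d_v * 2
total = layers * (R_params + W_params) + P_params + Q_params

floats = 128 * 128 * 64 * d_v             # one intermediate representation
mbytes = floats * 4 / 1e6                 # float32

print(total)    # ~42.5M parameters
print(mbytes)   # ~201 MB per activation tensor
```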

Physics note: Why use a 3D FFT rather than separate spatial and temporal transforms? Because the wavefield has strong spatiotemporal correlations: a wavefront at position $(x, y)$ at time $t$ is tightly linked to its position $(x + \Delta x, y + \Delta y)$ at time $t + \Delta t$. The 3D FFT captures these correlations in a single tensor multiplication. A factorized approach (separate 2D spatial + 1D temporal FFTs) would miss these cross-dimensional patterns, though it can be more efficient — see FFNO in §37.

Why the spatial coordinates matter

A subtle but important point: why does the full-field input include $(x_i, y_j, t_k)$ alongside the crack parameters? Consider what would happen without them. The crack parameters $(x_c, y_c, L, \theta)$ are the same 4 numbers at every grid point. Without coordinates, the network would see an identical 4-vector everywhere — it couldn't distinguish "what should the wavefield look like at this point near the crack" from "what should it look like far away." The coordinates break this symmetry: they tell the network where in the domain it's making a prediction.

In the Darcy problem (Chapters 1–5), coordinates weren't needed because the input function itself varied across the grid — the permeability $a(x)$ was different at every point, providing positional information implicitly. Here, the parametric input is spatially constant, so coordinates must be supplied explicitly.

Extracting $V(t)$ from the full-field prediction

The full-field FNO gives us more than we need for NDT diagnosis. To extract the transducer signal from the full-field prediction, we evaluate the piezoelectric coupling integral over the transducer surface $\Gamma_T$:

$$ \hat{V}(t) = \int_{\Gamma_T} \hat{\vv}(\mathbf{x}, t) \cdot \vn \; d\Gamma $$

In practice, this reduces to averaging the normal velocity component over the transducer face — a simple post-processing step. This means the full-field FNO can reproduce every result the signal-level FNO gives, plus the entire wavefield.
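A sketch of that post-processing, assuming the field is stored as $(N_x, N_y, N_t, 2)$ with the top surface at the last $y$ index and the transducer footprint given by a range of $x$ indices (both are illustrative assumptions; the true footprint comes from the COMSOL geometry):

```python
import numpy as np

def transducer_signal(v_field, ix_lo, ix_hi, iy_surf=-1):
    """Average the normal velocity component over the transducer face.

    v_field: predicted wavefield, shape (N_x, N_y, N_t, 2), channels (v_x, v_y).
    ix_lo:ix_hi: x-indices spanning the transducer footprint (hypothetical).
    iy_surf: index of the top-surface row, where the outward normal is vertical.
    """
    face = v_field[ix_lo:ix_hi, iy_surf, :, 1]   # v_y on the surface, (n_face, N_t)
    return face.mean(axis=0)                     # approximate V(t), shape (N_t,)

rng = np.random.default_rng(1)
field = rng.standard_normal((128, 128, 64, 2))   # stand-in for an FNO prediction
V_hat = transducer_signal(field, ix_lo=54, ix_hi=74)
```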

Comparing the two pipelines

| Property | Signal-level (Pipeline 1) | Full-field (Pipeline 2) |
|---|---|---|
| Operator | $a \mapsto V(t) \in \R$ | $a \mapsto \vv(x,y,t) \in \R^2$ |
| FFT type | 1D (time only) | 3D (space + time) |
| Input channels $\da$ | 5 (4 params + $t$) | 7 (4 params + $x, y, t$) |
| Hidden channels $\dv$ | 64 | 48 |
| Output channels $\du$ | 1 | 2 |
| Mode truncation $k_{\max}$ | 32 | (12, 12, 16) |
| Grid size | 1024 | $128 \times 128 \times 64$ |
| Parameters | ~1.1M | ~42.5M |
| Memory/sample (training) | ~10 MB | ~2–4 GB |
| Batch size (24 GB GPU) | 64 | 2–4 |
| Training time (2000 samples) | ~10 min | ~6–12 hours |
| Inference time | ~1 ms | ~50 ms |

When to use which:

Physics note: The two pipelines are not competitors — they serve different purposes. Pipeline 1 answers "what will the transducer see?", which is sufficient for most inspection tasks. Pipeline 2 answers "what is happening everywhere in the specimen?", which is necessary for imaging and physics validation. A practical workflow might train Pipeline 1 first (fast iteration), then use Pipeline 2 for the cases where spatial understanding is critical.


§35 Training Data Generation

Both FNO pipelines rely on a FEM solver (COMSOL, or an equivalent like Abaqus or a custom discontinuous Galerkin code) as the data factory. Each simulation takes a crack configuration $a = (x_c, y_c, L, \theta)$ and produces the full wavefield from which we extract either the transducer voltage $V(t)$ or the gridded velocity field $\vv(x,y,t)$ — or both.

  1. Sample crack parameters: Draw $(x_c, y_c, L, \theta)$ from the distribution $\mu$ defined in §33. Generate $N = 2000$ samples.
  2. Build geometry: For each sample, update the COMSOL model: place a zero-thickness fracture boundary at the specified position, length, and orientation. The crack endpoints follow from Eq. 16.
  3. Mesh: Generate a finite element mesh. The mesh must resolve the shortest wavelength — at 1.5 MHz, $\lambda_s \approx 2.1$ mm in aluminum, requiring element size $\leq \lambda_s / 6 \approx 0.35$ mm for the dG-FEM method.
  4. Solve: Run the time-domain simulation using the "Elastic Waves, Time Explicit" interface (dG-FEM with explicit Runge–Kutta time integration). Typical time step: $\Delta t \approx 0.3$ ns (CFL condition), total time: $T \approx 20\;\mu$s.
  5. Extract signal data: Compute the terminal voltage $V(t)$ at $N_t = 1024$ uniformly-spaced time points. Store as $(x_c, y_c, L, \theta,\, V(t))$ — approximately 8 KB per sample.
  6. Extract field data (if training Pipeline 2): Interpolate $(v_x, v_y)$ onto a regular $128 \times 128$ spatial grid at 64 time snapshots. Store as a tensor $\in \R^{128 \times 128 \times 64 \times 2}$ — approximately 8 MB per sample (float16) or 16 MB (float32).
  7. Normalize: Standardize each crack parameter to $[0, 1]$ using the sampling ranges. Normalize signals by the maximum peak amplitude across the training set. Normalize wavefields by the global maximum absolute velocity.
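Steps 1 and 7 (sampling and parameter normalization) can be sketched directly from the sampling table in §33:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 2000
# (low, high) per parameter, from the sampling table in section 33.
lo = np.array([5.0, 5.0, 1.0, -30.0])    # x_c, y_c (mm), L (mm), theta (deg)
hi = np.array([30.0, 27.0, 8.0, 30.0])

params = rng.uniform(lo, hi, size=(N, 4))   # step 1: draw crack configs from mu
params_norm = (params - lo) / (hi - lo)     # step 7: standardize to [0, 1]
```

Each row of `params` would then be handed to the COMSOL driver (steps 2–4) to produce the corresponding signal and field data.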

Computational budget: Each COMSOL simulation takes approximately 2–5 minutes on a modern workstation (depending on mesh density and time window). For 2000 samples:

$$ 2000 \times 3\;\text{min} \approx 100\;\text{hours of compute} $$

This is parallelizable across cores (or across multiple COMSOL licenses). COMSOL's LiveLink for MATLAB or LiveLink for Python enables automated parametric sweeps — a script iterates over crack configurations, updates the geometry, solves, and extracts results without manual intervention.

Storage budget: For signal-level data (Pipeline 1): $2000 \times 8\;\text{KB} \approx 16\;\text{MB}$ — trivial. For full-field data (Pipeline 2): $2000 \times 8\;\text{MB} \approx 16\;\text{GB}$ at float16 — manageable but substantial. Both outputs can be extracted from the same COMSOL solve, so there's no extra simulation cost for running both pipelines.

The train/validation/test split is 1600/200/200 (80/10/10). The validation set guides early stopping and hyperparameter selection; the test set provides the final unbiased error estimate.

Warning: Data generation is the bottleneck. Each COMSOL solve is expensive, and the total dataset requires ~100 hours of computation. Budget this carefully — 2000 simulations is a minimum for reliable training, but each gives you a high-fidelity ground truth that no physics-informed approach can match for free. If you plan to train Pipeline 2 (full-field), extract both signal and field data from every solve.


§36 Training & Evaluation

With training data in hand, we train both FNO pipelines using the same relative $L^2$ loss from Chapter 1 (§5); note that in Eq. 19, $\theta$ denotes the network weights, not the crack orientation angle:

Loss function
$$ \mathcal{L}(\theta) = \frac{1}{N}\sum_{j=1}^{N} \frac{\|G_\theta(a_j) - u_j\|_2^2}{\|u_j\|_2^2} \tag{19} $$

For Pipeline 1: $u_j = V_j(t)$ and $\|\cdot\|_2$ is the $L^2$ norm over time: $\|V\|_2^2 = \sum_{k} |V(t_k)|^2 \,\Delta t$.

For Pipeline 2: $u_j = \vv_j(x,y,t)$ and $\|\cdot\|_2$ is the $L^2$ norm over space and time: $\|\vv\|_2^2 = \sum_{i,j,k} (v_x^2 + v_y^2) \,\Delta x\,\Delta y\,\Delta t$.

The denominator normalizes by signal energy, preventing large-amplitude samples from dominating the loss.
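A minimal NumPy version of Eq. 19 (the grid spacing $\Delta t$, or $\Delta x\,\Delta y\,\Delta t$ for Pipeline 2, cancels between numerator and denominator, so it is omitted):

```python
import numpy as np

def relative_l2_loss(pred, true):
    """Mean relative squared L2 error over a batch (Eq. 19).

    pred, true: arrays of shape (N, ...); the norm runs over all non-batch
    axes. The grid spacing cancels in the ratio, so it is omitted.
    """
    axes = tuple(range(1, pred.ndim))
    num = np.sum((pred - true) ** 2, axis=axes)
    den = np.sum(true ** 2, axis=axes)
    return float(np.mean(num / den))

# Sanity check: a prediction uniformly 10% too large has relative loss ~0.01.
V = np.sin(np.linspace(0, 10, 1024))[None, :]
loss = relative_l2_loss(1.1 * V, V)
```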

Training details

| Hyperparameter | Pipeline 1 (signal) | Pipeline 2 (full-field) |
|---|---|---|
| Optimizer | Adam, initial lr $= 10^{-3}$ | Adam, initial lr $= 10^{-3}$ |
| LR schedule | Cosine annealing over 500 epochs (decays to ~$10^{-5}$) | same |
| Batch size | 64 | 2–4 |
| Epochs | 500 | 500 |
| Training wall time | ~10 min (single GPU) | ~6–12 hours (single GPU) |
| Data augmentation | Flip $x_c \to x_{\max} - x_c$, $\theta \to -\theta$ (exploits left-right symmetry) | same |
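The augmentation row can be implemented as a dataset doubling. This assumes the specimen and transducer are symmetric about the vertical centerline (so a mirrored crack yields the same signal) and that $x_{\max}$ is the 35 mm specimen width; both assumptions should be checked against your setup:

```python
import numpy as np

X_MAX = 35.0  # mm, specimen width (assumed left-right symmetric setup)

def augment_mirror(params, signals):
    """Double the training set using left-right symmetry.

    Mirroring a crack about the vertical centerline (x_c -> X_MAX - x_c,
    theta -> -theta) leaves the recorded signal unchanged, provided the
    transducer and specimen are themselves symmetric.
    """
    mirrored = params.copy()
    mirrored[:, 0] = X_MAX - mirrored[:, 0]   # x_c
    mirrored[:, 3] = -mirrored[:, 3]          # theta
    return np.vstack([params, mirrored]), np.vstack([signals, signals])

p = np.array([[10.0, 16.0, 5.0, 20.0]])
s = np.zeros((1, 1024))
p_aug, s_aug = augment_mirror(p, s)
```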

Evaluation metrics

Raw $L^2$ error measures overall prediction accuracy, but NDT practitioners care about specific features. We evaluate on different metrics depending on the pipeline:

Signal-level metrics (Pipeline 1):

| Metric | Definition | NDT relevance |
|---|---|---|
| Relative $L^2$ error | $\|G_\theta(a) - V\|_2 / \|V\|_2$ | Overall waveform fidelity |
| Peak amplitude error | $|A_{\text{pred}} - A_{\text{true}}| / A_{\text{true}}$ | Crack size estimation (reflectivity) |
| Time-of-flight error | $|t_{\text{pred}}^{\text{peak}} - t_{\text{true}}^{\text{peak}}|$ | Crack depth/position estimation |
| Pointwise max error | $\max_t |G_\theta(a)(t) - V(t)|$ | Worst-case prediction reliability |
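The four signal-level metrics are straightforward to compute from a pair of signals. A sketch, using a synthetic Gaussian echo as ground truth (the peak-based time-of-flight estimate here is a simplification; practical NDT pipelines often use envelope or cross-correlation methods):

```python
import numpy as np

def signal_metrics(V_pred, V_true, dt):
    """Compute the four signal-level metrics from the table above."""
    rel_l2 = np.linalg.norm(V_pred - V_true) / np.linalg.norm(V_true)
    A_pred, A_true = np.abs(V_pred).max(), np.abs(V_true).max()
    peak_amp_err = abs(A_pred - A_true) / A_true
    # Time of flight estimated as the arrival time of the largest peak.
    tof_err = abs(int(np.abs(V_pred).argmax()) - int(np.abs(V_true).argmax())) * dt
    max_err = np.abs(V_pred - V_true).max()
    return rel_l2, peak_amp_err, tof_err, max_err

# Synthetic example: the "predicted" echo is 10% weaker and 0.5 us late.
t = np.linspace(0.0, 20e-6, 1024)
V_true = np.exp(-(((t - 10.0e-6) / 1e-6) ** 2))
V_pred = 0.9 * np.exp(-(((t - 10.5e-6) / 1e-6) ** 2))
rel_l2, peak_amp_err, tof_err, max_err = signal_metrics(V_pred, V_true, dt=t[1] - t[0])
```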

Full-field metrics (Pipeline 2):

| Metric | Definition | Purpose |
|---|---|---|
| Relative $L^2$ error | $\|\hat{\vv} - \vv\|_2 / \|\vv\|_2$ | Global accuracy over all space and time |
| Per-snapshot error | $\|\hat{\vv}(\cdot,t_k) - \vv(\cdot,t_k)\|_2 / \|\vv(\cdot,t_k)\|_2$ | How accuracy varies in time (early vs. late, pre- vs. post-scattering) |
| Near-crack error | Same as $L^2$ but restricted to a box around the crack | Accuracy where it matters most — the scattering region |
| Derived signal error | $\|\hat{V}_{\text{derived}} - V\|_2 / \|V\|_2$ | Does the full-field prediction give the correct transducer signal? |

Key idea: For Pipeline 2, the per-snapshot error is particularly revealing. We expect low error during the smooth incident-beam phase ($t < 8\;\mu$s) and higher error during and after crack scattering ($t > 8\;\mu$s), where the wavefield develops sharp features. If the error spikes at the scattering time, it signals that the FNO needs more modes or a different architecture to resolve the scattered wavefield.

Baseline comparisons

To justify using FNO over simpler approaches, compare each pipeline against appropriate baselines.

Resolution transfer test

The signature test for FNO: train at coarse resolution and evaluate at fine resolution without retraining.

$$ G_\theta^{(\text{coarse})} \stackrel{?}{\approx} G_\theta^{(\text{fine})} \tag{20} $$

This test is especially informative for wave problems: if the FNO resolves the wavefronts at coarse resolution, the finer grid just samples the same wavefronts more densely. But if sharp scattered features were aliased at coarse resolution, the fine-grid evaluation may expose errors that the coarse test missed.
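The mechanism behind Eq. 20 can be demonstrated directly: with the FFT normalized by $1/N$, the retained Fourier coefficients of a band-limited function are the same at any resolution, so the same spectral weights produce consistent outputs on coarse and fine grids. A single-channel sketch with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(0)
k_max = 8
# Random stand-in for one channel's learned spectral weights.
R = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)

def spectral_apply(v, R, k_max):
    """Apply the same k_max spectral weights on a grid of any size N.

    Dividing the FFT by N makes the retained coefficients of a band-limited
    function independent of the sampling resolution."""
    N = len(v)
    v_hat = np.fft.rfft(v) / N
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = R * v_hat[:k_max]
    return np.fft.irfft(out_hat * N, n=N)

# Band-limited test function (modes 3 and 5, both below k_max) on [0, 1).
f = lambda x: np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 5 * x)
out_coarse = spectral_apply(f(np.arange(256) / 256), R, k_max)
out_fine = spectral_apply(f(np.arange(1024) / 1024), R, k_max)

# Subsampling the fine-grid output onto the coarse grid reproduces it.
agrees = np.allclose(out_fine[::4], out_coarse, atol=1e-8)
```

If the input had energy above $k_{\max}$ on the coarse grid (the aliasing case discussed above), the two outputs would no longer agree.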


§37 Challenges & What Comes Next

Applying FNO to ultrasonic wave scattering exposes limitations that don't arise in the smooth, periodic-ish problems from Chapter 5. Mehtaj & Banerjee (2025) provide a thorough review of these challenges across elastic and acoustic wave problems, documenting where standard FNO succeeds and where specialized variants are needed. Here are the four main challenges and the FNO variants designed to address them.

Challenge 1: High-frequency scattering

Scattered wavefronts have sharp spatial gradients — exactly the kind of fine-scale features that FNO's mode truncation (keeping only $k \leq k_{\max}$) discards. The incident beam is relatively smooth (1.5 MHz $\approx$ 2 mm wavelength in aluminum), but tip diffraction creates cylindrical waves with structure at and below the wavelength scale. This affects Pipeline 2 (full-field) far more than Pipeline 1 (signal), because the signal integrates over the transducer face, which naturally low-pass filters the wavefield.

Mitigations:

Challenge 2: Non-periodic boundaries

The aluminum specimen has absorbing BCs on the sides and bottom, and a free surface on top — far from the periodic boundaries that the FFT assumes. The standard FFT treats the domain as if it wraps around, which creates artifacts at boundaries. This primarily affects Pipeline 2 (full-field), since Pipeline 1 operates only on the time axis where the signal naturally starts and ends near zero.

Mitigations: zero-pad the input along the non-periodic axes before the FFT and crop the output afterwards, the standard remedy for non-periodic FNO inputs, or learn a deformation of the physical domain onto a periodic computational one (Geo-FNO, see the variant table in this section).

Challenge 3: Multi-scale physics

The wavefield has structure at multiple scales: smooth beam propagation (mm scale), sharp wavefronts (sub-mm), and fine tip diffraction patterns (wavelength scale). A single $k_{\max}$ can't efficiently represent all these scales.

Mitigations: U-FNO adds a U-Net branch to each Fourier layer, so localized fine-scale structure is handled by convolutions while the spectral path covers the smooth long-range propagation; FFNO's factorized per-axis transforms also let you allocate more modes to the axes that need them.

Challenge 4: Generalization beyond the training distribution

What if the actual crack is outside the training $\mu$? For example: larger than 8 mm, at a position near a boundary, or oriented at $\theta > 30°$. FNO, like all supervised ML, extrapolates poorly beyond its training distribution.

Mitigations: widen the sampling distribution $\mu$ if out-of-range cracks are plausible in service; add a PDE-residual term to the loss (PINO) so predictions remain physically constrained even off-distribution; and at deployment, flag inputs that fall outside the training ranges rather than silently extrapolating.

| FNO Variant | Key Idea | Addresses |
|---|---|---|
| Standard FNO | FFT-based global kernel | Smooth, periodic-ish problems |
| PINO | PDE residual in loss | Data scarcity, physical consistency |
| Geo-FNO | Learned geometry deformation | Non-periodic, irregular domains |
| U-FNO | U-Net skip connections | Multi-scale features |
| FFNO | Factorized 1D FFTs | Efficiency, per-axis mode control |
[Decision tree: periodic BCs → Standard FNO; non-periodic → Geo-FNO; smooth → FNO as-is; multi-scale → U-FNO; data-rich → Geo-FNO; data-scarce → PINO; high-dimensional → FFNO (factorized FFTs).]
Figure 7.4. Decision tree for choosing an FNO variant based on problem characteristics. For ultrasonic NDT (non-periodic, multi-scale, moderate data), the path leads to Geo-FNO or PINO with U-FNO features.

Looking forward

With a trained FNO surrogate for the NDT forward model, several applications open up, most directly fast inversion: recovering crack parameters from a measured signal by searching over the cheap surrogate rather than the expensive simulator.

The big picture: FNO for NDT inverts the usual workflow. Instead of "measure signal → guess crack → simulate to check → iterate," it becomes "measure signal → run fast FNO inversion → get crack parameters." The expensive simulation happens once, during training data generation. Inference is instantaneous. This is the promise of operator learning for engineering applications: train once on physics, deploy everywhere.


References