Chapter 7

FNO for Crack Detection

Crack parameterization, signal-level and full-field FNO architectures with complete dimension walkthroughs, training data generation from COMSOL, evaluation metrics, and the challenges of applying FNO to high-frequency wave scattering.

In this chapter
  1. §33 Parameterizing the Crack Input
  2. §34 FNO Architecture for Spatiotemporal Wavefields
  3. §35 Training Data Generation
  4. §36 Training & Evaluation
  5. §37 Challenges & What Comes Next

§33 Parameterizing the Crack Input

In the Darcy flow problem (Chapters 1–5), the input $a(x)$ was a function defined at every grid point — a permeability field with $\da = 1$ per spatial location. For the NDT problem, the input is fundamentally different: we're specifying the geometry of a straight-line crack inside the aluminum specimen. Four numbers fully determine it.

Crack parameterization
$$ a = (x_c,\; y_c,\; L,\; \theta) \;\in\; \R^4 \tag{16} $$

where $(x_c, y_c)$ is the crack center in mm, $L$ is the crack length in mm, and $\theta$ is the orientation angle in degrees measured from vertical (positive = clockwise tilt). The crack endpoints are:

$$ \mathbf{p}_{\pm} = \begin{pmatrix} x_c \\ y_c \end{pmatrix} \pm \frac{L}{2} \begin{pmatrix} \sin\theta \\ \cos\theta \end{pmatrix} $$
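The endpoint formula is easy to sanity-check in code. A minimal NumPy sketch (angles converted from degrees, as in the text):

```python
import numpy as np

def crack_endpoints(x_c, y_c, L, theta_deg):
    """Endpoints of a straight crack from its (x_c, y_c, L, theta) parameters.

    theta is measured in degrees from vertical, so the crack direction is
    (sin(theta), cos(theta)); theta = 0 gives a vertical crack.
    """
    t = np.deg2rad(theta_deg)
    d = 0.5 * L * np.array([np.sin(t), np.cos(t)])  # half-length direction vector
    center = np.array([x_c, y_c])
    return center - d, center + d

# A vertical 5 mm crack centered at (17.5, 16): endpoints 2.5 mm above/below center.
p_minus, p_plus = crack_endpoints(17.5, 16.0, 5.0, 0.0)
```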

This parameterization has several properties that make it the right choice for this problem:

The straight-line assumption is well-matched to fatigue cracks in metals, which nucleate at stress concentrators and grow along planes of maximum shear stress — producing approximately straight, planar flaws. More complex morphologies (branching, curved, or multiple interacting cracks) would require either additional parameters or a switch to a field encoding, but for the canonical NDT inspection problem, four numbers suffice.

The sampling distribution $\mu$

The distribution from which we draw training crack configurations defines the region of parameter space the FNO will learn to cover. Each parameter is sampled independently:

| Parameter | Distribution | Range | Physical rationale |
|---|---|---|---|
| $x_c$ | Uniform | [5 mm, 30 mm] | Stay 5 mm from side boundaries (absorbing BC artifacts) |
| $y_c$ | Uniform | [5 mm, 27 mm] | Stay 5 mm from top/bottom surfaces |
| $L$ | Uniform | [1 mm, 8 mm] | $L/\lambda_s$ from 0.5 to 4 (sub-wavelength to multi-wavelength) |
| $\theta$ | Uniform | [$-30°$, $+30°$] | Fatigue cracks grow roughly perpendicular to surface |

The length range deserves attention. The S-wave wavelength is $\lambda_s = \cs / f_0 = 3120 / 1.5 \times 10^6 \approx 2.1\;\text{mm}$. A 1 mm crack is sub-wavelength ($L/\lambda_s \approx 0.5$) — it scatters weakly and is hard to detect. An 8 mm crack is roughly $4\lambda_s$ — it produces strong specular reflections and well-separated tip diffraction signals. This range covers the physically interesting regime where scattering transitions from "barely detectable" to "geometrically resolvable."

[Figure: aluminum cross-section (35 mm × 32 mm) showing five sample cracks (θ=0°, L=5; θ=7°, L=7; θ=−16°, L=3; θ=0°, L=2.5; θ=20°, L=5) and a dashed box marking the $(x_c, y_c)$ sampling region, 5 mm in from each boundary.]
Figure 7.1. Five sample crack configurations drawn from the training distribution $\mu$. Each crack is fully specified by $(x_c, y_c, L, \theta)$. Labels show orientation and length (mm). The sampling region (dashed) keeps crack centers away from boundary artifacts.

Key idea: Unlike Darcy flow where the input is a spatially-varying field, the NDT input is a compact 4-vector. This changes how the lifting operator $P$ works: instead of acting pointwise on a function $a(x)$, it must broadcast the same 4 numbers to every point on the computational grid. The FNO then learns how those 4 numbers control the spatiotemporal wavefield everywhere.


§34 FNO Architecture for Spatiotemporal Wavefields

With the crack parameterization fixed at $a = (x_c, y_c, L, \theta) \in \R^4$, the architecture question is: what does the FNO predict? We develop two complete pipelines, both taking the same parametric input but targeting different outputs. Each has its own architecture, its own trade-offs, and its own applications.

Pipeline 1: Signal-level FNO — crack parameters $\to$ transducer voltage $V(t)$

The most direct formulation for NDT: predict the time-domain voltage signal that the transducer would record for a given crack. This is a 1D output — a single real number at each of $N_t$ time steps.

The operator:

$$ G^\dagger_{\text{sig}}: (x_c, y_c, L, \theta) \;\mapsto\; V(t), \qquad V: [0,\, T] \to \R $$

Input construction. We have $\da = 4$ crack parameters and a time grid $\{t_j\}_{j=1}^{N_t}$ with $N_t = 1024$ points spanning $[0,\, 20\;\mu\text{s}]$. At each time point, concatenate the time coordinate with the crack parameters:

$$ \text{input at time } t_j: \quad \bigl[\,t_j,\;\; x_c,\;\; y_c,\;\; L,\;\; \theta\,\bigr] \;\in\; \R^5 $$

The full input tensor is $\in \R^{1024 \times 5}$ — each row is the same 4 crack parameters appended to a different time coordinate. This is a standard technique for conditioning FNO on global parameters: broadcast the parameters to every grid point so the network has access to them everywhere.

Forward pass, step by step:

  1. Lift: The pointwise lifting operator $P: \R^5 \to \R^{64}$ maps each 5-vector to a 64-channel representation. Output: $v^{(0)} \in \R^{1024 \times 64}$.
  2. Fourier layer 1: Apply 1D FFT along the time axis. Multiply the lowest $k_{\max} = 32$ modes by $R_1 \in \mathbb{C}^{64 \times 64 \times 32}$. Inverse FFT back to physical space. Add the local path $W_1 v^{(0)}$, where $W_1 \in \R^{64 \times 64}$. Apply GELU activation. Output: $v^{(1)} \in \R^{1024 \times 64}$.
  3. Fourier layers 2–4: Identical structure. Each layer has its own $R_\ell$ and $W_\ell$. Output after layer 4: $v^{(4)} \in \R^{1024 \times 64}$.
  4. Project: The pointwise projection $Q: \R^{64} \to \R^1$ maps each 64-channel vector to a single voltage value. Output: $\hat{V} \in \R^{1024}$.
Signal-level FNO — forward pass
$$ \hat{V}(t) = \bigl(Q \circ \mathcal{K}_4 \circ \mathcal{K}_3 \circ \mathcal{K}_2 \circ \mathcal{K}_1 \circ P\bigr)\bigl([t,\; a]\bigr), \qquad \mathcal{K}_\ell(v) = \sigma\bigl(W_\ell\, v + \Fti(R_\ell \cdot \Ft(v))\bigr) \tag{17} $$

with $P: \R^5 \to \R^{64}$, each $R_\ell \in \mathbb{C}^{64 \times 64 \times 32}$, each $W_\ell \in \R^{64 \times 64}$, $Q: \R^{64} \to \R^1$, and $\sigma$ = GELU.
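The spectral step inside each Fourier layer can be sketched in a few lines of NumPy. This is illustrative only: the weights below are random stand-ins for the learned $R_\ell$ and $W_\ell$, and a real implementation would use a deep-learning framework with trainable complex tensors. The grid, channel, and mode sizes follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N_t, d_v, k_max = 1024, 64, 32

# Random stand-ins for the learned weights of one layer.
R = rng.standard_normal((d_v, d_v, k_max)) + 1j * rng.standard_normal((d_v, d_v, k_max))
W = rng.standard_normal((d_v, d_v)) / d_v

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fourier_layer_1d(v):
    """One layer: sigma(W v + IFFT(R . FFT(v))), truncated to k_max modes."""
    v_hat = np.fft.rfft(v, axis=0)                    # (513, 64), complex
    out_hat = np.zeros_like(v_hat)
    # out_hat[k, o] = sum_i R[i, o, k] * v_hat[k, i], lowest k_max modes only
    out_hat[:k_max] = np.einsum("iok,ki->ko", R, v_hat[:k_max])
    spectral = np.fft.irfft(out_hat, n=N_t, axis=0)   # back to (1024, 64), real
    return gelu(spectral + v @ W)                     # add local path, activate

v0 = rng.standard_normal((N_t, d_v))  # stands in for the lifted input P([t, a])
v1 = fourier_layer_1d(v0)             # shape (1024, 64)
```

Stacking four such layers between the pointwise $P$ and $Q$ maps reproduces the forward pass of Eq. 17.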

[Diagram: input $[t_j, x_c, y_c, L, \theta]$ at each of 1024 time points ($1024 \times 5$) → $P$ → $1024 \times 64$ → 4 Fourier layers (1D FFT, 32 modes; global path $\Fti(R_\ell \cdot \Ft(v))$ + local path $W_\ell v$, add + GELU) → $Q$ → predicted voltage $\hat{V}(t)$, $1024 \times 1$.]
Figure 7.2. Signal-level FNO (Pipeline 1). The 4 crack parameters are concatenated with a time coordinate at each of 1024 time points, lifted to 64 channels, processed through 4 Fourier layers (1D FFT, $k_{\max} = 32$), and projected to the predicted voltage signal.

Parameter count. Each Fourier layer contributes: $R_\ell$ has $64 \times 64 \times 32 \times 2 = 262{,}144$ real parameters (complex tensor, factor of 2), and $W_\ell$ has $64 \times 64 = 4{,}096$. With 4 layers plus $P$ ($5 \times 64 = 320$) and $Q$ ($64 \times 1 = 64$), the total is approximately 1.07M parameters.

Key idea: The signal-level FNO is a function-to-function map from the time axis to itself, conditioned on 4 global parameters. The 1D FFT captures temporal correlations in the signal — for instance, the echo from a deeper crack arrives later (shifted in time) and is weaker (lower amplitude), which is a smooth function of $(x_c, y_c)$. The Fourier layers learn this time-shift structure naturally.


Pipeline 2: Full-field FNO-3D — crack parameters $\to$ velocity wavefield $\vv(x,y,t)$

The full-field formulation predicts the complete spatiotemporal velocity wavefield — two velocity components $(v_x, v_y)$ at every spatial point and every time step. This is vastly more information than the transducer signal, and it's what you need for advanced applications like wavefield imaging, full waveform inversion, or validating the physics learned by the network.

The operator:

$$ G^\dagger_{\text{field}}: (x_c, y_c, L, \theta) \;\mapsto\; \vv(\mathbf{x}, t), \qquad \vv: \Omega \times [0,\,T] \to \R^2 $$

Input construction. This is where the parametric approach creates a design challenge. The output lives on a 3D grid $(x_i, y_j, t_k)$, but the input is just 4 numbers. We need to build an input tensor that the 3D Fourier layers can operate on. The standard approach: at every grid point, concatenate the spatial coordinates, the time coordinate, and the crack parameters:

$$ \text{input at } (x_i, y_j, t_k): \quad \bigl[\,x_i,\;\; y_j,\;\; t_k,\;\; x_c,\;\; y_c,\;\; L,\;\; \theta\,\bigr] \;\in\; \R^7 $$

The crack parameters are broadcast (copied identically) to every grid point. The spatial and temporal coordinates vary across the grid, giving the network positional information. The full input tensor is $\in \R^{N_x \times N_y \times N_t \times 7}$.

For our NDT problem, $N_x = N_y = 128$ spatial points and $N_t = 64$ time snapshots (subsampled from the full simulation) give an input tensor of shape $128 \times 128 \times 64 \times 7$.
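A sketch of the input assembly, assuming a 35 mm × 32 mm domain and a 20 µs time window (the exact coordinate ranges are a modeling choice):

```python
import numpy as np

N_x, N_y, N_t = 128, 128, 64

def build_fno3d_input(x_c, y_c, L, theta):
    """Assemble the (N_x, N_y, N_t, 7) input: grid coords + broadcast params."""
    x = np.linspace(0.0, 35.0, N_x)   # mm, specimen width
    y = np.linspace(0.0, 32.0, N_y)   # mm, specimen height
    t = np.linspace(0.0, 20.0, N_t)   # microseconds
    X, Y, T = np.meshgrid(x, y, t, indexing="ij")     # each (128, 128, 64)
    coords = np.stack([X, Y, T], axis=-1)             # (128, 128, 64, 3)
    params = np.broadcast_to(np.array([x_c, y_c, L, theta]),
                             (N_x, N_y, N_t, 4))      # same 4 numbers everywhere
    return np.concatenate([coords, params], axis=-1)  # (128, 128, 64, 7)

a = build_fno3d_input(17.5, 16.0, 5.0, 10.0)
```

`np.broadcast_to` copies nothing in memory; the concatenation at the end materializes the full tensor.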

Forward pass, step by step:

  1. Lift: $P: \R^7 \to \R^{48}$ maps each 7-vector pointwise to a 48-channel representation. Output: $v^{(0)} \in \R^{128 \times 128 \times 64 \times 48}$.
  2. Fourier layer 1: Apply 3D FFT over $(x, y, t)$. The transform produces a 3D array of Fourier coefficients. Retain only the lowest modes: $k_x \leq 12$, $k_y \leq 12$, $k_t \leq 16$. Multiply these retained coefficients by $R_1 \in \mathbb{C}^{48 \times 48 \times 12 \times 12 \times 16}$. Inverse 3D FFT back to physical space. Add the local path $W_1 v^{(0)}$ ($W_1 \in \R^{48 \times 48}$). Apply GELU. Output: $v^{(1)} \in \R^{128 \times 128 \times 64 \times 48}$.
  3. Fourier layers 2–4: Same structure. Each layer has independent $R_\ell$ and $W_\ell$. All intermediate representations stay at $\dv = 48$.
  4. Project: $Q: \R^{48} \to \R^2$ maps each 48-vector pointwise to the two velocity components $(v_x, v_y)$. Output: $\hat{\vv} \in \R^{128 \times 128 \times 64 \times 2}$.
Full-field FNO-3D — forward pass
$$ \hat{\vv}(\mathbf{x}, t) = \bigl(Q \circ \mathcal{K}_4 \circ \mathcal{K}_3 \circ \mathcal{K}_2 \circ \mathcal{K}_1 \circ P\bigr)\bigl([x,\; y,\; t,\; a]\bigr), \qquad \mathcal{K}_\ell(v) = \sigma\bigl(W_\ell\, v + \Fti_{3\text{D}}(R_\ell \cdot \Ft_{3\text{D}}(v))\bigr) \tag{18} $$

with $P: \R^7 \to \R^{48}$, each $R_\ell \in \mathbb{C}^{48 \times 48 \times 12 \times 12 \times 16}$, each $W_\ell \in \R^{48 \times 48}$, $Q: \R^{48} \to \R^2$, and $\sigma$ = GELU. The 3D FFT acts jointly on the spatial and temporal axes.

[Diagram: input $[x_i, y_j, t_k, x_c, y_c, L, \theta]$ at each point of the 3D grid ($128 \times 128 \times 64 \times 7$) → $P$ → $128^2 \times 64 \times 48$ → 4 Fourier layers (3D FFT, $k_x \leq 12$, $k_y \leq 12$, $k_t \leq 16$; global + local paths, add + GELU) → $Q$ → predicted wavefield $\hat{\vv}(x,y,t)$, $128 \times 128 \times 64 \times 2$.]
Figure 7.3. Full-field FNO-3D (Pipeline 2). Crack parameters and grid coordinates form a 7-channel input at each of $128 \times 128 \times 64$ spatiotemporal grid points. 3D FFTs capture joint spatial and temporal correlations. Output is the full velocity field $(v_x, v_y)$.

Parameter count. Each $R_\ell$ has $48 \times 48 \times 12 \times 12 \times 16 \times 2 = 10{,}616{,}832$ real parameters. With 4 layers, plus $W_\ell$ ($4 \times 48^2 = 9{,}216$), $P$ ($7 \times 48 = 336$), and $Q$ ($48 \times 2 = 96$): approximately 42.5M parameters — about 40× larger than the signal-level network.

Memory. The intermediate representation $v^{(\ell)} \in \R^{128 \times 128 \times 64 \times 48}$ contains $128^2 \times 64 \times 48 \approx 50$M floats, or ~200 MB at float32. With activations stored for backpropagation through 4 layers, a single training sample requires roughly 2–4 GB of GPU memory. This limits batch size to 2–4 on a typical 24 GB GPU.
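The quoted parameter and memory figures are easy to verify with back-of-envelope arithmetic:

```python
# Back-of-envelope check of the counts quoted above.
d_v, kx, ky, kt, layers = 48, 12, 12, 16, 4

R_params = d_v * d_v * kx * ky * kt * 2   # complex tensor -> factor of 2
W_params = d_v * d_v
P_params = 7 * d_v
Q_params = d_v * 2
total = layers * (R_params + W_params) + P_params + Q_params

floats = 128 * 128 * 64 * d_v             # one intermediate representation
mbytes = floats * 4 / 1e6                 # float32

print(total)    # ~42.5M parameters
print(mbytes)   # ~201 MB per activation tensor
```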

Physics note: Why use a 3D FFT rather than separate spatial and temporal transforms? Because the wavefield has strong spatiotemporal correlations: a wavefront at position $(x, y)$ at time $t$ is tightly linked to its position $(x + \Delta x, y + \Delta y)$ at time $t + \Delta t$. The 3D FFT captures these correlations in a single tensor multiplication. A factorized approach (separate 2D spatial + 1D temporal FFTs) would miss these cross-dimensional patterns, though it can be more efficient — see FFNO in §37.

Why the spatial coordinates matter

A subtle but important point: why does the full-field input include $(x_i, y_j, t_k)$ alongside the crack parameters? Consider what would happen without them. The crack parameters $(x_c, y_c, L, \theta)$ are the same 4 numbers at every grid point. Without coordinates, the network would see an identical 4-vector everywhere — it couldn't distinguish "what should the wavefield look like at this point near the crack" from "what should it look like far away." The coordinates break this symmetry: they tell the network where in the domain it's making a prediction.

In the Darcy problem (Chapters 1–5), coordinates weren't needed because the input function itself varied across the grid — the permeability $a(x)$ was different at every point, providing positional information implicitly. Here, the parametric input is spatially constant, so coordinates must be supplied explicitly.

Extracting $V(t)$ from the full-field prediction

The full-field FNO gives us more than we need for NDT diagnosis. To extract the transducer signal from the full-field prediction, we evaluate the piezoelectric coupling integral over the transducer surface $\Gamma_T$:

$$ \hat{V}(t) = \int_{\Gamma_T} \hat{\vv}(\mathbf{x}, t) \cdot \vn \; d\Gamma $$

In practice, this reduces to averaging the normal velocity component over the transducer face — a simple post-processing step. This means the full-field FNO can reproduce every result the signal-level FNO gives, plus the entire wavefield.
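A sketch of that post-processing, assuming the field is stored as $(N_x, N_y, N_t, 2)$ with the top surface at the last $y$ index and the transducer footprint given by a range of $x$ indices (both are illustrative assumptions; the true footprint comes from the COMSOL geometry):

```python
import numpy as np

def transducer_signal(v_field, ix_lo, ix_hi, iy_surf=-1):
    """Average the normal velocity component over the transducer face.

    v_field: predicted wavefield, shape (N_x, N_y, N_t, 2), channels (v_x, v_y).
    ix_lo:ix_hi: x-indices spanning the transducer footprint (hypothetical).
    iy_surf: index of the top-surface row, where the outward normal is vertical.
    """
    face = v_field[ix_lo:ix_hi, iy_surf, :, 1]   # v_y on the surface, (n_face, N_t)
    return face.mean(axis=0)                     # approximate V(t), shape (N_t,)

rng = np.random.default_rng(1)
field = rng.standard_normal((128, 128, 64, 2))   # stand-in for an FNO prediction
V_hat = transducer_signal(field, ix_lo=54, ix_hi=74)
```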

Comparing the two pipelines

| Property | Signal-level (Pipeline 1) | Full-field (Pipeline 2) |
|---|---|---|
| Operator | $a \mapsto V(t) \in \R$ | $a \mapsto \vv(x,y,t) \in \R^2$ |
| FFT type | 1D (time only) | 3D (space + time) |
| Input channels $\da$ | 5 (4 params + $t$) | 7 (4 params + $x, y, t$) |
| Hidden channels $\dv$ | 64 | 48 |
| Output channels $\du$ | 1 | 2 |
| Mode truncation $k_{\max}$ | 32 | (12, 12, 16) |
| Grid size | 1024 | $128 \times 128 \times 64$ |
| Parameters | ~1.1M | ~42.5M |
| Memory/sample (training) | ~10 MB | ~2–4 GB |
| Batch size (24 GB GPU) | 64 | 2–4 |
| Training time (2000 samples) | ~10 min | ~6–12 hours |
| Inference time | ~1 ms | ~50 ms |

When to use which:

Physics note: The two pipelines are not competitors — they serve different purposes. Pipeline 1 answers "what will the transducer see?", which is sufficient for most inspection tasks. Pipeline 2 answers "what is happening everywhere in the specimen?", which is necessary for imaging and physics validation. A practical workflow might train Pipeline 1 first (fast iteration), then use Pipeline 2 for the cases where spatial understanding is critical.


§35 Training Data Generation

Both FNO pipelines rely on a FEM solver (COMSOL, or an equivalent like Abaqus or a custom discontinuous Galerkin code) as the data factory. Each simulation takes a crack configuration $a = (x_c, y_c, L, \theta)$ and produces the full wavefield from which we extract either the transducer voltage $V(t)$ or the gridded velocity field $\vv(x,y,t)$ — or both.

  1. Sample crack parameters: Draw $(x_c, y_c, L, \theta)$ from the distribution $\mu$ defined in §33. Generate $N = 2000$ samples.
  2. Build geometry: For each sample, update the COMSOL model: place a zero-thickness fracture boundary at the specified position, length, and orientation. The crack endpoints follow from Eq. 16.
  3. Mesh: Generate a finite element mesh. The mesh must resolve the shortest wavelength — at 1.5 MHz, $\lambda_s \approx 2.1$ mm in aluminum, requiring element size $\leq \lambda_s / 6 \approx 0.35$ mm for the dG-FEM method.
  4. Solve: Run the time-domain simulation using the "Elastic Waves, Time Explicit" interface (dG-FEM with explicit Runge–Kutta time integration). Typical time step: $\Delta t \approx 0.3$ ns (CFL condition), total time: $T \approx 20\;\mu$s.
  5. Extract signal data: Compute the terminal voltage $V(t)$ at $N_t = 1024$ uniformly-spaced time points. Store as $(x_c, y_c, L, \theta,\, V(t))$ — approximately 8 KB per sample.
  6. Extract field data (if training Pipeline 2): Interpolate $(v_x, v_y)$ onto a regular $128 \times 128$ spatial grid at 64 time snapshots. Store as a tensor $\in \R^{128 \times 128 \times 64 \times 2}$ — approximately 8 MB per sample (float16) or 16 MB (float32).
  7. Normalize: Standardize each crack parameter to $[0, 1]$ using the sampling ranges. Normalize signals by the maximum peak amplitude across the training set. Normalize wavefields by the global maximum absolute velocity.
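Steps 1 and 7 (sampling and parameter normalization) can be sketched directly from the sampling table in §33:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 2000
# (low, high) per parameter, from the sampling table in section 33.
lo = np.array([5.0, 5.0, 1.0, -30.0])    # x_c, y_c (mm), L (mm), theta (deg)
hi = np.array([30.0, 27.0, 8.0, 30.0])

params = rng.uniform(lo, hi, size=(N, 4))   # step 1: draw crack configs from mu
params_norm = (params - lo) / (hi - lo)     # step 7: standardize to [0, 1]
```

Each row of `params` would then be handed to the COMSOL driver (steps 2–4) to produce the corresponding signal and field data.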

Computational budget: Each COMSOL simulation takes approximately 2–5 minutes on a modern workstation (depending on mesh density and time window). For 2000 samples:

$$ 2000 \times 3\;\text{min} \approx 100\;\text{hours of compute} $$

This is parallelizable across cores (or across multiple COMSOL licenses). COMSOL's LiveLink for MATLAB or LiveLink for Python enables automated parametric sweeps — a script iterates over crack configurations, updates the geometry, solves, and extracts results without manual intervention.

Storage budget: For signal-level data (Pipeline 1): $2000 \times 8\;\text{KB} \approx 16\;\text{MB}$ — trivial. For full-field data (Pipeline 2): $2000 \times 8\;\text{MB} \approx 16\;\text{GB}$ at float16 — manageable but substantial. Both outputs can be extracted from the same COMSOL solve, so there's no extra simulation cost for running both pipelines.

The train/validation/test split is 1600/200/200 (80/10/10). The validation set guides early stopping and hyperparameter selection; the test set provides the final unbiased error estimate.

Warning: Data generation is the bottleneck. Each COMSOL solve is expensive, and the total dataset requires ~100 hours of computation. Budget this carefully — 2000 simulations is a minimum for reliable training, but each gives you a high-fidelity ground truth that no physics-informed approach can match for free. If you plan to train Pipeline 2 (full-field), extract both signal and field data from every solve.


§36 Training & Evaluation

With training data in hand, we train both FNO pipelines using the same relative $L^2$ loss from Chapter 1 (§5); note that in Eq. 19, $\theta$ denotes the network weights, not the crack orientation angle:

Loss function
$$ \mathcal{L}(\theta) = \frac{1}{N}\sum_{j=1}^{N} \frac{\|G_\theta(a_j) - u_j\|_2^2}{\|u_j\|_2^2} \tag{19} $$

For Pipeline 1: $u_j = V_j(t)$ and $\|\cdot\|_2$ is the $L^2$ norm over time: $\|V\|_2^2 = \sum_{k} |V(t_k)|^2 \,\Delta t$.

For Pipeline 2: $u_j = \vv_j(x,y,t)$ and $\|\cdot\|_2$ is the $L^2$ norm over space and time: $\|\vv\|_2^2 = \sum_{i,j,k} (v_x^2 + v_y^2) \,\Delta x\,\Delta y\,\Delta t$.

The denominator normalizes by signal energy, preventing large-amplitude samples from dominating the loss.
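A minimal NumPy version of Eq. 19 (the grid spacing $\Delta t$, or $\Delta x\,\Delta y\,\Delta t$ for Pipeline 2, cancels between numerator and denominator, so it is omitted):

```python
import numpy as np

def relative_l2_loss(pred, true):
    """Mean relative squared L2 error over a batch (Eq. 19).

    pred, true: arrays of shape (N, ...); the norm runs over all non-batch
    axes. The grid spacing cancels in the ratio, so it is omitted.
    """
    axes = tuple(range(1, pred.ndim))
    num = np.sum((pred - true) ** 2, axis=axes)
    den = np.sum(true ** 2, axis=axes)
    return float(np.mean(num / den))

# Sanity check: a prediction uniformly 10% too large has relative loss ~0.01.
V = np.sin(np.linspace(0, 10, 1024))[None, :]
loss = relative_l2_loss(1.1 * V, V)
```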

Training details

| Hyperparameter | Pipeline 1 (signal) | Pipeline 2 (full-field) |
|---|---|---|
| Optimizer | Adam, initial lr $= 10^{-3}$ | Adam, initial lr $= 10^{-3}$ |
| LR schedule | Cosine annealing over 500 epochs (decays to ~$10^{-5}$) | same |
| Batch size | 64 | 2–4 |
| Epochs | 500 | 500 |
| Training wall time | ~10 min (single GPU) | ~6–12 hours (single GPU) |
| Data augmentation | Flip $x_c \to x_{\max} - x_c$, $\theta \to -\theta$ (exploits left-right symmetry) | same |
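The augmentation row can be implemented as a dataset doubling. This assumes the specimen and transducer are symmetric about the vertical centerline (so a mirrored crack yields the same signal) and that $x_{\max}$ is the 35 mm specimen width; both assumptions should be checked against your setup:

```python
import numpy as np

X_MAX = 35.0  # mm, specimen width (assumed left-right symmetric setup)

def augment_mirror(params, signals):
    """Double the training set using left-right symmetry.

    Mirroring a crack about the vertical centerline (x_c -> X_MAX - x_c,
    theta -> -theta) leaves the recorded signal unchanged, provided the
    transducer and specimen are themselves symmetric.
    """
    mirrored = params.copy()
    mirrored[:, 0] = X_MAX - mirrored[:, 0]   # x_c
    mirrored[:, 3] = -mirrored[:, 3]          # theta
    return np.vstack([params, mirrored]), np.vstack([signals, signals])

p = np.array([[10.0, 16.0, 5.0, 20.0]])
s = np.zeros((1, 1024))
p_aug, s_aug = augment_mirror(p, s)
```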

Evaluation metrics

Raw $L^2$ error measures overall prediction accuracy, but NDT practitioners care about specific features. We evaluate on different metrics depending on the pipeline:

Signal-level metrics (Pipeline 1):

| Metric | Definition | NDT relevance |
|---|---|---|
| Relative $L^2$ error | $\|G_\theta(a) - V\|_2 / \|V\|_2$ | Overall waveform fidelity |
| Peak amplitude error | $|A_{\text{pred}} - A_{\text{true}}| / A_{\text{true}}$ | Crack size estimation (reflectivity) |
| Time-of-flight error | $|t_{\text{pred}}^{\text{peak}} - t_{\text{true}}^{\text{peak}}|$ | Crack depth/position estimation |
| Pointwise max error | $\max_t |G_\theta(a)(t) - V(t)|$ | Worst-case prediction reliability |
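The four signal-level metrics are straightforward to compute from a pair of signals. A sketch, using a synthetic Gaussian echo as ground truth (the peak-based time-of-flight estimate here is a simplification; practical NDT pipelines often use envelope or cross-correlation methods):

```python
import numpy as np

def signal_metrics(V_pred, V_true, dt):
    """Compute the four signal-level metrics from the table above."""
    rel_l2 = np.linalg.norm(V_pred - V_true) / np.linalg.norm(V_true)
    A_pred, A_true = np.abs(V_pred).max(), np.abs(V_true).max()
    peak_amp_err = abs(A_pred - A_true) / A_true
    # Time of flight estimated as the arrival time of the largest peak.
    tof_err = abs(int(np.abs(V_pred).argmax()) - int(np.abs(V_true).argmax())) * dt
    max_err = np.abs(V_pred - V_true).max()
    return rel_l2, peak_amp_err, tof_err, max_err

# Synthetic example: the "predicted" echo is 10% weaker and 0.5 us late.
t = np.linspace(0.0, 20e-6, 1024)
V_true = np.exp(-(((t - 10.0e-6) / 1e-6) ** 2))
V_pred = 0.9 * np.exp(-(((t - 10.5e-6) / 1e-6) ** 2))
rel_l2, peak_amp_err, tof_err, max_err = signal_metrics(V_pred, V_true, dt=t[1] - t[0])
```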

Full-field metrics (Pipeline 2):

| Metric | Definition | Purpose |
|---|---|---|
| Relative $L^2$ error | $\|\hat{\vv} - \vv\|_2 / \|\vv\|_2$ | Global accuracy over all space and time |
| Per-snapshot error | $\|\hat{\vv}(\cdot,t_k) - \vv(\cdot,t_k)\|_2 / \|\vv(\cdot,t_k)\|_2$ | How accuracy varies in time (early vs. late, pre- vs. post-scattering) |
| Near-crack error | Same as $L^2$ but restricted to a box around the crack | Accuracy where it matters most — the scattering region |
| Derived signal error | $\|\hat{V}_{\text{derived}} - V\|_2 / \|V\|_2$ | Does the full-field prediction give the correct transducer signal? |

Key idea: For Pipeline 2, the per-snapshot error is particularly revealing. We expect low error during the smooth incident-beam phase ($t < 8\;\mu$s) and higher error during and after crack scattering ($t > 8\;\mu$s), where the wavefield develops sharp features. If the error spikes at the scattering time, it signals that the FNO needs more modes or a different architecture to resolve the scattered wavefield.

Baseline comparisons

To justify using FNO over simpler approaches, compare each pipeline against appropriate baselines.

Resolution transfer test

The signature test for FNO: train at coarse resolution and evaluate at fine resolution without retraining.

$$ G_\theta^{(\text{coarse})} \stackrel{?}{\approx} G_\theta^{(\text{fine})} \tag{20} $$

This test is especially informative for wave problems: if the FNO resolves the wavefronts at coarse resolution, the finer grid just samples the same wavefronts more densely. But if sharp scattered features were aliased at coarse resolution, the fine-grid evaluation may expose errors that the coarse test missed.
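The mechanism behind Eq. 20 can be demonstrated directly: with the FFT normalized by $1/N$, the retained Fourier coefficients of a band-limited function are the same at any resolution, so the same spectral weights produce consistent outputs on coarse and fine grids. A single-channel sketch with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(0)
k_max = 8
# Random stand-in for one channel's learned spectral weights.
R = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)

def spectral_apply(v, R, k_max):
    """Apply the same k_max spectral weights on a grid of any size N.

    Dividing the FFT by N makes the retained coefficients of a band-limited
    function independent of the sampling resolution."""
    N = len(v)
    v_hat = np.fft.rfft(v) / N
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = R * v_hat[:k_max]
    return np.fft.irfft(out_hat * N, n=N)

# Band-limited test function (modes 3 and 5, both below k_max) on [0, 1).
f = lambda x: np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 5 * x)
out_coarse = spectral_apply(f(np.arange(256) / 256), R, k_max)
out_fine = spectral_apply(f(np.arange(1024) / 1024), R, k_max)

# Subsampling the fine-grid output onto the coarse grid reproduces it.
agrees = np.allclose(out_fine[::4], out_coarse, atol=1e-8)
```

If the input had energy above $k_{\max}$ on the coarse grid (the aliasing case discussed above), the two outputs would no longer agree.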


§37 Challenges & What Comes Next

Applying FNO to ultrasonic wave scattering exposes limitations that don't arise in the smooth, periodic-ish problems from Chapter 5. Mehtaj & Banerjee (2025) provide a thorough review of these challenges across elastic and acoustic wave problems, documenting where standard FNO succeeds and where specialized variants are needed. Here are the four main challenges and the FNO variants designed to address them.

Challenge 1: High-frequency scattering

Scattered wavefronts have sharp spatial gradients — exactly the kind of fine-scale features that FNO's mode truncation (keeping only $k \leq k_{\max}$) discards. The incident beam is relatively smooth (1.5 MHz $\approx$ 2 mm wavelength in aluminum), but tip diffraction creates cylindrical waves with structure at and below the wavelength scale. This affects Pipeline 2 (full-field) far more than Pipeline 1 (signal), because the signal integrates over the transducer face, which naturally low-pass filters the wavefield.

Mitigations:

Challenge 2: Non-periodic boundaries

The aluminum specimen has absorbing BCs on the sides and bottom, and a free surface on top — far from the periodic boundaries that the FFT assumes. The standard FFT treats the domain as if it wraps around, which creates artifacts at boundaries. This primarily affects Pipeline 2 (full-field), since Pipeline 1 operates only on the time axis where the signal naturally starts and ends near zero.

Mitigations: zero-pad the input along the non-periodic axes before the FFT and crop the output afterwards, the standard remedy for non-periodic FNO inputs, or learn a deformation of the physical domain onto a periodic computational one (Geo-FNO, see the variant table in this section).

Challenge 3: Multi-scale physics

The wavefield has structure at multiple scales: smooth beam propagation (mm scale), sharp wavefronts (sub-mm), and fine tip diffraction patterns (wavelength scale). A single $k_{\max}$ can't efficiently represent all these scales.

Mitigations: U-FNO adds a U-Net branch to each Fourier layer, so localized fine-scale structure is handled by convolutions while the spectral path covers the smooth long-range propagation; FFNO's factorized per-axis transforms also let you allocate more modes to the axes that need them.

Challenge 4: Generalization beyond the training distribution

What if the actual crack is outside the training $\mu$? For example: larger than 8 mm, at a position near a boundary, or oriented at $\theta > 30°$. FNO, like all supervised ML, extrapolates poorly beyond its training distribution.

Mitigations: widen the sampling distribution $\mu$ if out-of-range cracks are plausible in service; add a PDE-residual term to the loss (PINO) so predictions remain physically constrained even off-distribution; and at deployment, flag inputs that fall outside the training ranges rather than silently extrapolating.

| FNO Variant | Key Idea | Addresses |
|---|---|---|
| Standard FNO | FFT-based global kernel | Smooth, periodic-ish problems |
| PINO | PDE residual in loss | Data scarcity, physical consistency |
| Geo-FNO | Learned geometry deformation | Non-periodic, irregular domains |
| U-FNO | U-Net skip connections | Multi-scale features |
| FFNO | Factorized 1D FFTs | Efficiency, per-axis mode control |
[Decision tree: periodic BCs → Standard FNO; non-periodic → Geo-FNO; smooth → FNO as-is; multi-scale → U-FNO; data-rich → Geo-FNO; data-scarce → PINO; high-dimensional → FFNO (factorized FFTs).]
Figure 7.4. Decision tree for choosing an FNO variant based on problem characteristics. For ultrasonic NDT (non-periodic, multi-scale, moderate data), the path leads to Geo-FNO or PINO with U-FNO features.

Looking forward

With a trained FNO surrogate for the NDT forward model, several applications open up, most directly fast inversion: recovering crack parameters from a measured signal by searching over the cheap surrogate rather than the expensive simulator.

The big picture: FNO for NDT inverts the usual workflow. Instead of "measure signal → guess crack → simulate to check → iterate," it becomes "measure signal → run fast FNO inversion → get crack parameters." The expensive simulation happens once, during training data generation. Inference is instantaneous. This is the promise of operator learning for engineering applications: train once on physics, deploy everywhere.


References