Variational Free Energy & Naive Mean Field

In this chapter

1Variational Free Energy
2The NMF Ansatz
31D Ising Application
42D Ising Application

1Variational Free Energy

We follow the notation of Wu, Wang & Zhang (arXiv:1809.10606), referred to throughout as the VAN paper. The key idea is: we cannot compute the partition function $Z$ exactly for interesting models, so we introduce a tractable approximate distribution $q_\theta(s)$ and minimize a variational upper bound on the true free energy.

The Boltzmann distribution and the partition function

Consider a system of $N$ binary spins $s = (s_1, s_2, \ldots, s_N)$ where each $s_i \in \{+1, -1\}$. The joint probability follows the Boltzmann distribution:

Boltzmann Distribution

$$ p(s) \;=\; \frac{e^{-\beta E(s)}}{Z}\,, \qquad Z \;=\; \sum_{s} e^{-\beta E(s)}\,, \tag{1} $$

where $\beta = 1/(k_B T)$ is the inverse temperature, $E(s)$ is the energy of configuration $s$, and $Z$ is the partition function. The sum $\sum_s$ runs over all $2^N$ configurations.

The free energy is:

Free Energy

$$ F = -\frac{1}{\beta} \ln Z = -k_B T \ln Z\,. \tag{2} $$

Computing $F$ exactly requires summing over exponentially many configurations — a $\#$P-hard problem in general.

The variational ansatz $q_\theta(s)$

We introduce a variational distribution $q_\theta(s)$, parameterized by some set of variational parameters $\theta$. The idea: we will choose $q_\theta$ from a tractable family of distributions, and adjust $\theta$ so that $q_\theta$ is as close as possible to the true Boltzmann distribution $p$.

The closeness is measured by the Kullback–Leibler (KL) divergence:

KL Divergence

$$ \KL(q_\theta \,\|\, p) \;=\; \sum_s q_\theta(s)\, \ln \frac{q_\theta(s)}{p(s)} \;\geq\; 0\,. \tag{3} $$

$\KL = 0$ if and only if $q_\theta(s) = p(s)$ for all $s$.

KL divergence and the variational free energy $\Fq$

Substituting $p(s) = e^{-\beta E(s)}/Z$ into (3):

$$ \KL(q_\theta \| p) = \sum_s q_\theta(s) \left[\ln q_\theta(s) - \ln p(s)\right] = \sum_s q_\theta(s) \left[\ln q_\theta(s) + \beta E(s) + \ln Z\right]. $$
Since $\sum_s q_\theta(s) = 1$: $$ \KL(q_\theta \| p) = \sum_s q_\theta(s)\left[\beta E(s) + \ln q_\theta(s)\right] + \ln Z = \beta\!\left(\underbrace{\frac{1}{\beta}\sum_s q_\theta(s)\left[\beta E(s) + \ln q_\theta(s)\right]}_{\displaystyle \Fq}\right) - \beta F\,. $$

This gives us the fundamental variational relation (VAN paper, Eq. 2):

Fundamental Variational Relation

$$ \boxed{ \KL(q_\theta \| p) \;=\; \beta\bigl(\Fq - F\bigr) \;\geq\; 0 } \tag{4} $$

where the variational free energy is (VAN paper, Eq. 3):

Variational Free Energy

$$ \Fq \;=\; \frac{1}{\beta}\sum_s q_\theta(s)\bigl[\beta E(s) + \ln q_\theta(s)\bigr] \;=\; \avg{E}_q + k_B T\avg{\ln q_\theta}_q\,. \tag{5} $$

We can write this more transparently as:

\Fq \;=\; \underbrace{\sum_s q_\theta(s)\, E(s)}_{\text{average energy } \avg{E}_q} \;-\; T\,\underbrace{k_B\!\Bigl(-\sum_s q_\theta(s) \ln q_\theta(s)\Bigr)}_{\text{thermodynamic entropy } S_q}\,. \tag{6}

That is,

Free Energy Decomposition

$$ \Fq = \avg{E}_q - T\, S_q\,, \qquad S_q = k_B\, \mathcal{H}_q\,, \tag{7} $$

where we have introduced the Shannon entropy (dimensionless, measured in nats):

Shannon Entropy

$$ \mathcal{H}_q \;\equiv\; -\sum_s q_\theta(s)\,\ln q_\theta(s)\,, \tag{7a} $$

and the thermodynamic (Gibbs) entropy $S_q = k_B \,\mathcal{H}_q$, which carries the dimensions of $k_B$ (energy/temperature).

Shannon entropy vs. physics entropy. Throughout these notes we use $\mathcal{H}$ for the dimensionless Shannon entropy and $S = k_B \mathcal{H}$ for the physical entropy. The relation is simply:

$$ S_{\text{physics}} = k_B \times \mathcal{H}_{\text{Shannon}}\,. $$

Many machine-learning references (including the VAN paper) set $k_B = 1$, so that $S$ and $\mathcal{H}$ coincide and $\beta = 1/T$. Most physics textbooks keep $k_B$ explicit. We carry it here so you can compare directly with both conventions. When you see a physics reference writing $S = -k_B \sum p \ln p$, or an ML reference writing $H = -\sum p \ln p$ with $\beta = 1/T$, they are saying the same thing.

Key point: Since $\KL \geq 0$, we have $\Fq \geq F$. The variational free energy is an upper bound on the true free energy. Minimizing $\Fq$ over $\theta$ tightens this bound, making $q_\theta$ approximate $p$ as closely as possible within the chosen family.

2The NMF Ansatz

The naive mean field (NMF) ansatz

The simplest variational family is a fully factorized distribution:

NMF Factorized Ansatz

$$ q_\theta(s) \;=\; \prod_{i=1}^{N} q_i(s_i)\,, \tag{8} $$

where each $q_i(s_i)$ is a Bernoulli distribution over $s_i \in \{+1, -1\}$. Since each $q_i$ must be normalized, it is characterized by a single parameter. Let us parameterize it via the local magnetization:

m_i \;\equiv\; \avg{s_i}_q \;=\; \sum_{s_i = \pm 1} s_i\, q_i(s_i) \;=\; q_i(+1) - q_i(-1)\,. \tag{9}

Combined with normalization $q_i(+1) + q_i(-1) = 1$, we get:

NMF Marginal Distribution

$$ q_i(s_i) \;=\; \frac{1 + m_i\, s_i}{2}\,, \qquad m_i \in [-1,\, +1]\,. \tag{10} $$

So the variational parameters are $\theta = (m_1, m_2, \ldots, m_N)$.

Physical meaning. The factorized ansatz says: the spins are statistically independent. Each spin $s_i$ has its own probability of being up or down, but there are no correlations between different spins. This is the defining approximation of naive mean field theory. The VAN paper generalizes this by replacing the product of independent marginals with a product of conditional probabilities — the autoregressive decomposition — which can capture correlations.

Evaluating $\Fq$ under the NMF ansatz

We need to compute the two pieces of $\Fq$: the average energy and the entropy.

Entropy

Since $q_\theta(s) = \prod_i q_i(s_i)$ and the spins are independent under $q$, the Shannon entropy factorizes:

$$ \mathcal{H}_q = -\sum_s q_\theta(s) \ln q_\theta(s) = -\sum_s \prod_j q_j(s_j) \sum_i \ln q_i(s_i)\,. $$
In the sum over all configurations, the product factorizes. For each term $i$, the sums over all $s_j$ with $j \neq i$ just give 1 (normalization), leaving only the sum over $s_i$: $$ \mathcal{H}_q = -\sum_{i=1}^{N} \sum_{s_i=\pm 1} q_i(s_i) \ln q_i(s_i) \;\equiv\; \sum_{i=1}^{N} \mathcal{H}(q_i)\,, $$
where $\mathcal{H}(q_i)$ is the binary Shannon entropy of the $i$-th marginal. Substituting $q_i(s_i) = (1+m_i s_i)/2$:

NMF Entropy

$$ \mathcal{H}_q = \sum_{i=1}^{N} \mathcal{H}(m_i)\,, \qquad \mathcal{H}(m) \equiv -\frac{1+m}{2}\ln\frac{1+m}{2} - \frac{1-m}{2}\ln\frac{1-m}{2}\,, \tag{11} $$

and the thermodynamic entropy is $S_q = k_B\,\mathcal{H}_q = k_B \sum_i \mathcal{H}(m_i)$.

Note that $\mathcal{H}(0) = \ln 2$ (maximum uncertainty, $\sim 0.693$ nats) and $\mathcal{H}(\pm 1) = 0$ (fully determined spin).

Figure 1.1. The binary Shannon entropy $\mathcal{H}(m)$. Maximum at $m = 0$ (complete uncertainty about spin direction); zero at $m = \pm 1$ (spin fully determined). This is the entropy per spin in nats; multiply by $k_B$ for the thermodynamic entropy per spin.

Average energy

The average energy depends on the specific model. We keep things general here and specialize in Sections 3–4. For any energy that is a sum of pairwise interactions and local fields,

$$ E(s) = -\sum_{\langle i,j\rangle} J_{ij}\, s_i s_j - \sum_i h_i\, s_i\,, $$

the factorized ansatz gives (since $\avg{s_i s_j}_q = \avg{s_i}_q \avg{s_j}_q = m_i m_j$ for $i \neq j$):

NMF Average Energy

$$ \avg{E}_q = -\sum_{\langle i,j\rangle} J_{ij}\, m_i\, m_j - \sum_i h_i\, m_i\,. \tag{12} $$

This is the key simplification. Under the factorized ansatz, the two-point correlation $\avg{s_i s_j}_q$ factorizes into $m_i m_j$. This is the mathematical content of "replacing interactions by an effective mean field." It is exact within the variational distribution $q$ (the spins really are independent under $q$), but it makes $q$ a poor approximation to $p$ whenever the true correlations matter.

Putting it together

The variational free energy for the general pairwise model is:

NMF Variational Free Energy

$$ \Fq = -\sum_{\langle i,j\rangle} J_{ij}\, m_i m_j - \sum_i h_i m_i - k_B T \sum_{i=1}^N \mathcal{H}(m_i)\,. \tag{13} $$

Minimizing $\Fq$: the NMF self-consistency equations

We minimize $\Fq$ with respect to each $m_i$. Setting $\partial \Fq / \partial m_k = 0$:

Energy derivative: $$ \frac{\partial}{\partial m_k}\left(-\sum_{\langle i,j\rangle} J_{ij} m_i m_j - \sum_i h_i m_i\right) = -\sum_{j \in \partial k} J_{kj}\, m_j - h_k\,, $$ where $\partial k$ denotes the neighbors of site $k$.
Entropy derivative: $$ \frac{\partial}{\partial m_k}\bigl(-k_B T\, \mathcal{H}(m_k)\bigr) = -k_B T \cdot \frac{\partial \mathcal{H}}{\partial m_k} = -k_B T \cdot \left(-\frac{1}{2}\ln\frac{1+m_k}{2} + \frac{1}{2}\ln\frac{1-m_k}{2}\right)\,. $$ This simplifies using $\tanh^{-1}$: $$ \frac{\partial \mathcal{H}}{\partial m_k} = -\frac{1}{2}\ln\frac{1+m_k}{1-m_k} = -\tanh^{-1}(m_k)\,. $$ So the entropy contribution is $+k_B T\,\tanh^{-1}(m_k)$.
Setting to zero: $$ -\sum_{j\in\partial k} J_{kj}\, m_j - h_k + k_B T\,\tanh^{-1}(m_k) = 0\,. $$

Figure 1.2. The entropy derivative $\partial\mathcal{H}/\partial m = -\tanh^{-1}(m)$, which acts as the entropic "restoring force" in the NMF self-consistency equation. It diverges logarithmically as $m \to \pm 1$ (resisting full polarization) and has slope $-1$ at the origin (dashed red line). The NMF fixed point occurs where this entropic term balances the energetic driving term $\beta\sum_j J_{kj} m_j$.

Solving for $m_k$:

NMF Self-Consistency Equations

$$ \boxed{ m_k = \tanh\!\left(\beta\Bigl[\sum_{j\in\partial k} J_{kj}\, m_j + h_k\Bigr]\right) } \tag{14} $$

These are the naive mean field (NMF) self-consistency equations. There is one equation for each site $k$. The term in brackets is the effective field experienced by spin $k$:

h_k^{\mathrm{eff}} = \sum_{j\in\partial k} J_{kj}\, m_j + h_k\,. \tag{15}

Each spin sees its neighbors only through their mean magnetization $m_j$ — hence the name "mean field."

Comparison with the VAN framework. In the VAN paper, the NMF ansatz $q_\theta(s) = \prod_i q_i(s_i)$ is the simplest special case of the autoregressive ansatz $q_\theta(s) = \prod_i q_\theta(s_i | s_1,\ldots,s_{i-1})$. In NMF, each conditional $q_\theta(s_i | s_{<i})$ is independent of $s_{<i}$; it collapses to $q_i(s_i)$. The VAN paper's contribution is to parameterize the conditionals with neural networks, allowing correlations while retaining a tractable, normalized $q_\theta$.

31D Ising Application

Setup and energy function

Consider a ring of $N$ spins with nearest-neighbor ferromagnetic coupling $J > 0$ and no external field ($h = 0$). The energy is:

1D Ising Energy

$$ E(s) = -J \sum_{i=1}^{N} s_i\, s_{i+1}\,, \qquad s_{N+1} \equiv s_1 \quad \text{(periodic BCs)}\,. \tag{16} $$

Figure 1.3. 1D Ising chain ($N = 8$) with periodic boundary conditions: $s_9 \equiv s_1$.

Each site has exactly $z = 2$ neighbors (left and right). There are $N$ bonds in total (the periodic bond $s_N s_1$ gives the $N$-th bond).

NMF variational free energy for the 1D chain

We assume a uniform (translation-invariant) solution: $m_i = m$ for all $i$. This is natural given the periodic boundary conditions and the translation symmetry of the Hamiltonian. Then:

Average energy

From Eq. (12) with $J_{ij} = J$ for neighboring pairs and $h_i = 0$:

\avg{E}_q = -J \sum_{i=1}^{N} m_i\, m_{i+1} = -J \cdot N \cdot m^2\,. \tag{17}

(There are $N$ bonds, each contributing $-J\, m^2$.)

Entropy

From Eq. (11), the Shannon entropy is $\mathcal{H}_q = N\,\mathcal{H}(m)$, so the thermodynamic entropy is:

S_q = k_B\, N\, \mathcal{H}(m) = -k_B N\left[\frac{1+m}{2}\ln\frac{1+m}{2} + \frac{1-m}{2}\ln\frac{1-m}{2}\right]. \tag{18}

Variational free energy

Combining via $\Fq = \avg{E}_q - T S_q$, and dividing by $N$ to get the free energy per site $f_q \equiv \Fq / N$:

1D NMF Free Energy per Site

$$ f_q(m) = -J m^2 + k_B T\left[\frac{1+m}{2}\ln\frac{1+m}{2} + \frac{1-m}{2}\ln\frac{1-m}{2}\right]. \tag{19} $$

Structure of $f_q(m)$. The first term ($-Jm^2$) favors large $|m|$ — it is the energy gain from alignment. The second term (the entropy cost, with coefficient $+k_BT$) favors $m = 0$ — it is maximized when the spin is equally likely up and down. The competition between energy and entropy is the essence of the phase transition physics.

Deriving the NMF equations

We minimize $f_q(m)$ by setting $df_q/dm = 0$:

Differentiate: $$ \frac{df_q}{dm} = -2Jm + k_B T \cdot \frac{1}{2}\ln\frac{1+m}{1-m} = -2Jm + k_B T\,\tanh^{-1}(m)\,. $$
Set to zero and solve: $$ k_B T\,\tanh^{-1}(m) = 2Jm \qquad\Longrightarrow\qquad m = \tanh\!\left(\frac{2J}{k_B T}\, m\right) = \tanh(2\beta J\, m)\,. $$

This is the special case of the general NMF equation (14) for a uniform solution on the 1D chain. Each spin has $z = 2$ neighbors, each with magnetization $m$, so the effective field is $h^{\mathrm{eff}} = 2Jm$:

1D NMF Equation

$$ \boxed{m = \tanh(2\beta J\, m)} \tag{20} $$

Note on the coordination number. The factor of 2 in $2\beta J m$ comes from the coordination number $z = 2$ (each spin has 2 neighbors on the periodic chain). In general dimension $d$ on a hypercubic lattice, $z = 2d$, and the NMF equation becomes $m = \tanh(z \beta J m)$ in the uniform case.

Solving for the phase transition

Equation (20) is a transcendental equation for $m$. We solve it graphically/analytically.

Figure 1.4. Graphical solution of $m = \tanh(2\beta Jm)$: intersections of $y = m$ with $y = \tanh(2\beta Jm)$.

The graphical construction makes the bifurcation structure clear: below $T_c$, the sigmoid $\tanh(2\beta J m)$ is steeper than the diagonal at the origin, producing two additional fixed points at $\pm m_0(T)$. At $T = T_c$ the sigmoid is tangent to $y = m$ at the origin (slope $= 1$). Above $T_c$ only $m = 0$ survives.

$m = 0$ is always a solution (paramagnetic state).

Non-zero solutions exist when the slope of the RHS at $m = 0$ exceeds 1. Since $\frac{d}{dm}\tanh(2\beta J m)\big|_{m=0} = 2\beta J$, a bifurcation occurs when:

1D NMF Critical Temperature

$$ 2\beta_c J = 1 \qquad\Longrightarrow\qquad k_B T_c^{\text{NMF}} = 2J\,. \tag{21} $$

For $T < T_c^{\text{NMF}}$, there are two additional solutions $m = \pm m_0(T)$ with $m_0 > 0$, corresponding to a ferromagnetically ordered state. This is a continuous (second-order) phase transition within NMF.

Critical behavior near $T_c$

Expanding $\tanh(x) \approx x - x^3/3$ for small $m$ near $T_c$:

$$ m \approx 2\beta J\, m - \tfrac{1}{3}(2\beta J)^3 m^3 \qquad\Longrightarrow\qquad m^2 \approx \frac{3(2\beta J - 1)}{(2\beta J)^3}\,. $$
Near $T_c$, with $t \equiv (T_c - T)/T_c$ the reduced temperature: $$ m \sim |t|^{1/2}\,, $$ which gives the mean-field critical exponent $\beta_{\text{MF}} = 1/2$.

Comparison with the exact solution

NMF is qualitatively wrong in 1D! The 1D Ising model with short-range interactions has no phase transition at any finite temperature. This was shown by Ising himself (1925) and can be proved exactly using the transfer matrix method. The exact free energy per site is:

$$ f_{\text{exact}} = -k_B T \ln\!\left(2\cosh\beta J\right) + \text{(boundary corrections that vanish as } N\to\infty\text{)}\,. $$

The magnetization is $m = 0$ for all $T > 0$ (in the absence of an external field). The correlation length is $\xi = -1/\ln(\tanh \beta J)$, which is finite for all $T > 0$.

NMF predicts a spurious phase transition at $k_B T_c = 2J$. This is because the factorized ansatz completely ignores fluctuations, which in 1D are strong enough to destroy long-range order at any nonzero temperature.

Why does NMF fail in 1D? The factorized ansatz $q(s) = \prod_i q_i(s_i)$ sets all correlations to zero: $\avg{s_i s_j}_q - m^2 = 0$. But in 1D, thermal fluctuations create domain walls (boundaries between up and down regions) at a free-energy cost of only $2J$ per wall, offset by an entropy gain of $\sim T \ln N$ from placing the wall anywhere on the chain. Since the entropy always wins for $T > 0$, the chain disorders. NMF misses this because it does not see domain walls — those are inherently correlated structures.

This is a well-known limitation discussed in many textbooks. The VAN paper demonstrates that more expressive ansatze (autoregressive networks) can correctly capture such correlations and avoid the spurious transition.

Figure 1.5. Free energy per site: NMF (dashed) vs exact (solid) for the 1D Ising chain.

The NMF variational free energy $\mathcal{F}_q$ always lies above the exact free energy $F$ (it is less negative), as guaranteed by $D_{\text{KL}} \geq 0$. The two curves agree at low $T$ (where both approach $-J$, the ground state energy) and diverge most around $T \sim T_c^{\text{NMF}} = 2J$.

Where does the phase transition show up in $f_q$? Since the NMF transition is second-order, $f_q(T)$ itself is continuous and its first derivative $-\partial f_q/\partial T = S$ (the entropy) is also continuous at $T_c$. The transition is invisible in the free energy curve! It only appears in the second derivative: the specific heat $C = -T\,\partial^2 f_q / \partial T^2$ has a jump discontinuity at $T_c$. Above $T_c$ the NMF gives $f_q = -k_BT\ln 2$ (a straight line, zero specific heat), while just below $T_c$ the ordered solution bends the curve, giving a finite $C$. This specific-heat jump $\Delta C = \frac{3}{2}k_B$ per spin is the classic mean-field result.

By contrast, the exact 1D free energy $f = -k_BT\ln(2\cosh\beta J)$ is analytic for all $T > 0$ — smooth to all orders, with no transition at all.

42D Ising Application

Setup and energy function

Consider an $L \times L$ square lattice with $N = L^2$ spins and periodic boundary conditions (a torus). The energy is:

2D Ising Energy

$$ E(s) = -J \sum_{\langle i,j\rangle} s_i\, s_j\,, \tag{22} $$

where $\langle i,j \rangle$ runs over all nearest-neighbor pairs. On the square lattice, each site has coordination number $z = 4$. With periodic BCs, there are $N_{\text{bonds}} = 2N$ bonds in total (each site contributes 2 bonds: one to the right and one downward).

Figure 1.6. Square lattice: each site has $z = 4$ neighbors.

NMF variational free energy for the 2D lattice

Again assuming a uniform solution $m_i = m$ for all $i$ (which respects the lattice translation symmetry):

Average energy

\avg{E}_q = -J \sum_{\langle i,j\rangle} m_i m_j = -J \cdot 2N \cdot m^2\,. \tag{23}

(There are $2N$ bonds on the periodic square lattice, each contributing $-Jm^2$. Equivalently, each of the $N$ sites contributes $\frac{1}{2} z \cdot (-Jm^2) = -2Jm^2$ per site.)

Entropy

Exactly as before:

S_q = k_B\, N\, \mathcal{H}(m)\,. \tag{24}

Variational free energy per site

2D NMF Free Energy per Site

$$ f_q(m) = -2J m^2 + k_B T\left[\frac{1+m}{2}\ln\frac{1+m}{2} + \frac{1-m}{2}\ln\frac{1-m}{2}\right]. \tag{25} $$

Comparison with 1D. The only difference from the 1D case (Eq. 19) is the coefficient of $m^2$ in the energy term: $-Jm^2$ (1D, $z=2$) vs. $-2Jm^2$ (2D, $z=4$). In general, for coordination number $z$, the energy per site is $-\tfrac{z}{2}Jm^2$. The entropy per site is the same: $-k_B\mathcal{H}(m)$.

Deriving the NMF equations

Minimizing $f_q(m)$ proceeds identically to the 1D case.

Differentiate: $$ \frac{df_q}{dm} = -4Jm + k_BT\,\tanh^{-1}(m) = 0\,. $$
Solve: $$ m = \tanh\!\left(\frac{4J}{k_BT}\, m\right) = \tanh(4\beta J\, m)\,. $$

This is the general result (14) specialized to the 2D square lattice with $z = 4$ neighbors:

2D NMF Equation

$$ \boxed{m = \tanh(4\beta J\, m)} \tag{26} $$

General pattern. For any lattice with coordination number $z$ and uniform coupling $J$, the NMF equation in zero field is always:

$$ m = \tanh(z\beta J\, m)\,, $$

with $\beta = 1/(k_BT)$. This follows directly from Eq. (14): each site has $z$ neighbors, each contributing $Jm$ to the effective field, giving $h^{\mathrm{eff}} = zJm$. The NMF critical temperature is $k_BT_c = zJ$.

Lattice	$d$	$z$	NMF equation
1D chain	1	2	$m = \tanh(2\beta J m)$
2D square	2	4	$m = \tanh(4\beta J m)$
2D triangular	2	6	$m = \tanh(6\beta J m)$
3D cubic	3	6	$m = \tanh(6\beta J m)$
$d$-dim hypercubic	$d$	$2d$	$m = \tanh(2d\,\beta J m)$

Solving for the phase transition

The analysis is identical to Section 3 but with $z = 4$ instead of $z = 2$.

Bifurcation condition: $\frac{d}{dm}\tanh(4\beta J m)\big|_{m=0} = 4\beta J = 1$, giving:

2D NMF Critical Temperature

$$ \boxed{k_B T_c^{\text{NMF}} = 4J} \qquad\text{(2D square lattice, NMF)}\,. \tag{27} $$

Below $T_c$: Two non-trivial solutions $m = \pm m_0(T)$ appear continuously from $m = 0$, indicating a second-order (continuous) phase transition to a ferromagnetic state.

Critical exponents: Near $T_c$, expanding as before gives $m \sim |t|^{1/2}$ with $t = (T_c - T)/T_c$. All NMF critical exponents take the classical mean-field values regardless of lattice or dimension:

\beta_{\text{MF}} = \tfrac{1}{2}\,, \quad \gamma_{\text{MF}} = 1\,, \quad \delta_{\text{MF}} = 3\,, \quad \alpha_{\text{MF}} = 0\,. \tag{28}

Landau expansion of $f_q(m)$

For small $m$, we can expand the Shannon entropy to get a Landau-type free energy. Using $\mathcal{H}(m) = \ln 2 - \frac{m^2}{2} - \frac{m^4}{12} - \cdots$:

$$ f_q(m) = -2Jm^2 - k_BT\left(\ln 2 - \frac{m^2}{2} - \frac{m^4}{12} - \cdots\right) $$ $$ = \text{const} + \frac{1}{2}(k_BT - 4J)m^2 + \frac{k_BT}{12} m^4 + \cdots $$
$$ = f_0 + \frac{a}{2}\, m^2 + \frac{b}{4}\, m^4 + \cdots \tag{29} $$ with $a = k_BT - k_BT_c = k_B(T - T_c)$ where $k_BT_c = 4J$, and $b = k_BT/3 > 0$.

This is precisely the Landau free energy. For $a > 0$ ($T > T_c$), the minimum is at $m = 0$. For $a < 0$ ($T < T_c$), minima appear at $m = \pm\sqrt{-a/b} = \pm\sqrt{3(4J - k_BT)/(k_BT)}$. The transition is continuous with $m \sim |a|^{1/2}$.

Comparison with the exact Onsager solution

The 2D Ising model on the square lattice was solved exactly by Onsager (1944). The exact results are:

Critical temperature:

Onsager Exact Result $$ k_B T_c^{\text{exact}} = \frac{2J}{\ln(1+\sqrt{2})} \approx 2.269\, J\,. \tag{30} $$

Critical exponent for magnetization: $\beta_{\text{exact}} = 1/8$ (Yang, 1952).

NMF overestimates $T_c$ and gets the wrong exponents.

Quantity	NMF	Exact
$k_BT_c / J$	$4.000$	$2.269$
$\beta$ (magn. exponent)	$1/2$	$1/8$
$\gamma$ (suscept. exponent)	$1$	$7/4$

NMF overestimates $k_BT_c$ by a factor of $\sim 1.76$. This is because by ignoring correlations, NMF overestimates the tendency toward ordering (every spin "thinks" it has $z = 4$ perfectly aligned neighbors, rather than fluctuating ones). As the VAN paper demonstrates (Fig. 2a), more expressive variational ansatze dramatically reduce this error.

However, NMF is qualitatively correct in 2D: it correctly predicts the existence of a phase transition and the existence of spontaneous magnetization below $T_c$. This is a significant improvement over the 1D case, where NMF gives a qualitatively wrong prediction. The general rule is:

In 1D: NMF is qualitatively wrong (predicts a transition where none exists).
In 2D: NMF is qualitatively correct but quantitatively poor.
In $d \geq 4$ (the upper critical dimension): NMF becomes exact for the critical exponents.

References and Further Reading

[VAN] D. Wu, L. Wang, and P. Zhang, "Solving Statistical Mechanics Using Variational Autoregressive Networks," Phys. Rev. Lett. 122, 080602 (2019). arXiv:1809.10606. — The primary reference for the variational framework and notation used here.

[Pandey–Xu] M. Pandey and M. Xu, "Mean Field Theory and its application to deep learning" (2019). mimee.xyz/meanfield.pdf. — Appendix 8.2 gives a full derivation of the NMF equations from the factorized $q(\sigma) = \prod q_i(\sigma_i)$ ansatz via KL divergence minimization, applied to the Ising model.

[Arovas] D. Arovas, "Lecture Notes on Thermodynamics and Statistical Mechanics," Ch. 7: Mean Field Theory of Phase Transitions, UCSD Physics 210a. PDF. — Comprehensive treatment using the variational density matrix $\varrho_N = \prod_i \varrho_i(\sigma_i)$.

[Sakthivadivel] D. Sakthivadivel, "Magnetisation and Mean Field Theory in the Ising Model," SciPost Phys. Lect. Notes 35 (2022). PDF. — Careful pedagogical derivation via the Bogoliubov inequality.

[Sudderth] E. Sudderth, "Lecture 9A: Variational Methods and Naive Mean Field," Brown CS242 (2016). PDF. — Clean derivation of naive MF for general pairwise MRFs from the information-theoretic/KL perspective.

[Kappen–Wiegerinck] H. J. Kappen and W. Wiegerinck, "Mean Field Theory for Graphical Models," Ch. in Advanced Mean Field Methods, MIT Press (2001). Link. — Derives NMF and TAP equations from the KL divergence / Taylor expansion viewpoint.

[Wainwright–Jordan] M. J. Wainwright and M. I. Jordan, "Graphical Models, Exponential Families, and Variational Inference," Found. Trends Mach. Learn. 1(1–2), 1–305 (2008). — The definitive reference on variational inference for graphical models. Chapter 5 covers mean field methods.

[Huang] Y.-P. Huang, "Mean-field theory," Lecture notes, NTHU Statistical Mechanics (2022). Link. — Works through the variational MFT for the Ising model using a factorizable trial Hamiltonian.

[Onsager] L. Onsager, "Crystal Statistics. I. A Two-Dimensional Model with an Order-Disorder Transition," Phys. Rev. 65, 117 (1944). — The exact solution of the 2D Ising model.