Prerequisites: Multivariable calculus, basic probability (Bayes' rule, conditional distributions), some familiarity with statistical mechanics (partition function, Boltzmann distribution). No machine learning background assumed — the neural network connection is built from scratch.
Part I — Variational Free Energy & Mean Field Theory
01
Variational Free Energy & Naïve Mean Field for the Ising Model
The variational principle, KL divergence, the factorized ansatz, and the NMF self-consistency equations. Applications to the 1D Ising ring (where NMF predicts a spurious phase transition) and the 2D square lattice (comparison with Onsager's exact solution).
Sections 1–4 · 6 figures · Landau expansion · Critical exponents
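As a taste of Part I, the NMF self-consistency condition for the homogeneous Ising ferromagnet can be solved numerically by fixed-point iteration. This is a minimal sketch, not code from the notes; the function name, coordination number `z = 4` (2D square lattice), and starting seed are illustrative choices.

```python
import math

def nmf_magnetization(beta, J=1.0, z=4, tol=1e-10, max_iter=10000):
    """Solve the NMF self-consistency equation m = tanh(beta * z * J * m)
    by fixed-point iteration, starting from a small symmetry-breaking seed."""
    m = 0.5
    for _ in range(max_iter):
        m_new = math.tanh(beta * z * J * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# NMF predicts a transition at beta_c = 1/(z*J):
print(nmf_magnetization(0.2))  # beta < 1/4: paramagnet, m ≈ 0
print(nmf_magnetization(0.5))  # beta > 1/4: ferromagnet, m > 0
```

Because the same equation holds for any z, setting `z = 2` reproduces the 1D ring's spurious transition at finite temperature, which the exact 1D solution forbids.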
Part II — From NMF to Autoregressive Networks
02
From NMF to Variational Autoregressive Networks
The autoregressive decomposition, Bernoulli conditionals with lower-triangular weights, the bias question (paper vs. notes conventions), why one layer captures pairwise correlations, worked N=3 example, and the VAN training loop.
Sections 5–10 · Bias discussion · Small-W expansion · REINFORCE gradient
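The ingredients of Part II fit in a short sketch: a single-layer autoregressive model with strictly lower-triangular weights, ancestral sampling from the Bernoulli conditionals, and a REINFORCE step on the variational free energy. This is a hedged illustration under assumed settings (a 1D ring with `J = 1`, `beta = 0.5`, the `N = 3` size of the worked example, and hand-derived log-likelihood gradients), not the notes' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3               # matches the worked N = 3 example
beta, J = 0.5, 1.0  # assumed couplings: ferromagnetic ring

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W = np.zeros((N, N))  # strictly lower-triangular weights
b = np.zeros(N)

def sample(batch):
    """Ancestral sampling from q(s) = prod_i Bern(x_i | sigmoid(W[i] @ s + b_i)),
    spins s = 2x - 1. Conditional i sees only s_1..s_{i-1} via the triangular W."""
    s = np.zeros((batch, N))
    p = np.zeros((batch, N))
    for i in range(N):
        p[:, i] = sigmoid(s @ W[i] + b[i])  # only columns j < i of s are set
        s[:, i] = 2.0 * (rng.random(batch) < p[:, i]) - 1.0
    x = (s + 1) / 2
    logq = np.sum(x * np.log(p + 1e-12) + (1 - x) * np.log(1 - p + 1e-12), axis=1)
    return s, x, p, logq

def energy(s):
    """1D Ising ring: E(s) = -J * sum_i s_i s_{i+1}, periodic boundaries."""
    return -J * np.sum(s * np.roll(s, -1, axis=1), axis=1)

mask = np.tril(np.ones((N, N)), k=-1)  # keeps W strictly lower-triangular
lr, batch = 0.1, 2000
for step in range(300):
    s, x, p, logq = sample(batch)
    # REINFORCE on beta*F_q = E_q[beta*E + log q], with a mean baseline
    f = beta * energy(s) + logq
    f -= f.mean()                # baseline reduces gradient variance
    d = x - p                    # d(log q)/d(b)  per sample, per site
    grad_b = (f[:, None] * d).mean(axis=0)
    grad_W = (f[:, None, None] * d[:, :, None] * s[:, None, :]).mean(axis=0) * mask
    b -= lr * grad_b
    W -= lr * grad_W

s, x, p, logq = sample(5000)
F_est = (energy(s) + logq / beta).mean()  # variational free energy estimate
print(F_est)
```

For this tiny ring the single-layer model can represent the Boltzmann distribution exactly, so the trained `F_est` should fall below the untrained value of roughly `3 ln(1/2) / beta ≈ -4.16` toward the exact free energy.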