Marcus N. Bannerman
[email protected]
Access the slides here:
www.marcusbannerman.co.uk/CCP5School
Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on the work, died similarly in 1933. Now it is our turn to study statistical mechanics. Perhaps it will be wise to approach the subject cautiously.
(Opening lines of "States of Matter", by D.L. Goodstein).
Other excellent stat-mech texts include:
Two hours/lectures is nothing, I will hopelessly fail to teach you it all, so I'll give you the key touch points, and try my best to be different/interesting/intuitive.
Interrupt me, question me, tell me how to do it better! Anything is better than silence.
I'll put “loaded” words which I think could be
interesting to ask about
Statistical mechanics is our bridge from the microscopic
scale (simulating individual atoms/molecules) to the
“macroscopic” measurements we make in
experiments (i.e., density, temperature, heat capacity, pressure,
We will use Statistical Mechanics to connect to thermodynamics, and that is the focus here (statistical thermodynamics/equilibrium statistical mechanics), but it also is used to explore dynamics.
Thermodynamics was/is a
The power of thermodynamics arises from its ability to
find simple universal relationships between
First law: Energy is conserved \begin{align*} U_{system}+U_{surroundings}={\rm Constant} \end{align*} which also implies the differential relationship \begin{align*} {\rm d}U_{system}=-{\rm d}U_{surroundings} \end{align*}
Energy is conserved but can be transferred in many ways; however, heat
is
Second law: Entropy must increase ${\rm d}S_{universe}\ge0$ but the key relationship is actually ${\rm d}S \ge \partial Q/T$.
This becomes ${\rm d}S=\partial Q/T$ if the process is
We have now made the
\begin{align*}
{\rm d}U &= T\,{\rm d}S - \partial W
\\
&= T\,{\rm d}S - \sum_i {\color{red}F_i}\,{\color{teal}{\rm d}L_i}
\end{align*}
It relates its
${\color{red}F_i}$ | $p$ | $\mu_\alpha$ | $\gamma$ | $\tau$ | … |
${\color{teal}L_i}$ | $V$ | $-N_\alpha$ | $\Sigma$ | $\dot{\omega}$ | … |
For now, assume we want to calculate $U$ when controlling $S$, and $L_i$, but any rearrangement of this is possible (see implicit function theorem).
Taking some gas (B), then we always have entropy $T\,{\rm d}S$, but also have a volume/pressure work, $V\,{\rm d}p$, as well as mass/chemical-potential work, $N_i\,{\rm d}\mu_i$, as it moves through the boundary.
Adding a baloon skin means energy is stored in the force/tension, ${\rm d}\gamma$, of the stretched elastic surface, $\Sigma$ which must be included; however, ignoring diffusion through the skin means we can ignore the mass change (${\rm d}N=0$) and throw away this term!
The
\begin{align*} {\rm d}U &= T\,{\rm d}S - \partial W = T\,{\rm d}S - \sum_i F_i\,{\rm d}L_i \end{align*}
$F_i$ | $p$ | $\mu_\alpha$ | $\gamma$ | … |
$L_i$ | $V$ | $-N_\alpha$ | $\Sigma$ | … |
Amazingly, as each term is a conjugate pairing of an intensive term and a differential extensive term, this equation can be solved using Euler's solution for homogeneous functions (my notes here). \begin{align*} U(S,\,V,\,\left\{N_\alpha\right\},\ldots) &= T\,S -p\,V+\sum_\alpha \mu_\alpha\,N_\alpha + \ldots\\ \end{align*}
\begin{align*} U(S,\,V,\,\left\{N_\alpha\right\},\ldots) &= T\,S -p\,V+\sum_\alpha \mu_\alpha N_\alpha + \ldots \\ {\rm d}U &= T\,{\rm d}S -p\,{\rm d}V+\sum_\alpha \mu_\alpha\,{\rm d}N_\alpha + \ldots \end{align*} Comparing the equations we see lots of differential identities which will later help us connect thermodynmics to microscopics as we'll have the derivatives: \begin{align*} \left(\frac{{\rm d}U}{{\rm d}S}\right)_{V,\,\left\{N_\alpha\right\},\ldots} &= T & \left(\frac{{\rm d}U}{{\rm d}V}\right)_{S,\,\left\{N_\alpha\right\},\ldots} &= -p \\ \left(\frac{{\rm d}U}{{\rm d}N_\alpha}\right)_{S,\,V,\,\left\{N_\beta\neq\alpha\right\},\ldots} &= \mu & \ldots &= \ldots \end{align*}
In general, when we have a relationship between $U,\,S,\,V,\ldots$, then it is a generating function for the rest of the thermodynamic properties.
\begin{align*} U(S,\,V,\,\left\{N_\alpha\right\},\ldots) &= T\,S -p\,V+\sum_\alpha \mu_\alpha N_\alpha + \ldots \\ {\rm d}U &= T\,{\rm d}S -p\,{\rm d}V+\sum_\alpha \mu_\alpha\,{\rm d}N_\alpha + \ldots \end{align*} $U,\,S,\,V$ can be challenging variables to measure/control, so we want something more observable/controllable. The conjugate variables $T,\,T^{-1},\,p$ are nicer for experiments (maybe not simulation though).
A Legendre transformation defines a new potential that swaps the pair around as independent variable. I.e. Defining the enthalpy $H = U + p\,V$ leads to: \begin{align*} {\rm d}H &= {\rm d}U + V\,{\rm d}p +p\,{\rm d}V\\ &= T\,{\rm d}S +V{\rm d}p+\sum_\alpha \mu_\alpha\,{\rm d}N_\alpha + \ldots \\ H\left(S,\,p,\,\left\{N_\alpha\right\},\ldots\right) &= \ldots \end{align*}
\begin{align*} U\left(S,\,V,\,\left\{N_\alpha\right\}\right)&= T\,S - p\,V + \sum_\alpha \mu_\alpha\,N_\alpha & {\rm d}U &= T\,{\rm d}S - p\,{\rm d}V + \sum_\alpha\mu_{\alpha}\,{\rm d} N_{\alpha} \\ S\left(U,\,V,\,\left\{N_\alpha\right\}\right)&= \frac{U}{T} + \frac{p}{T}\,V - \sum_\alpha \frac{\mu_\alpha}{T}\,N_\alpha & {\rm d}S &= T^{-1}\,{\rm d}S + \frac{p}{T}{\rm d}V - \sum_\alpha\frac{\mu_{\alpha}}{T}{\rm d} N_{\alpha} \end{align*}
Legendre transforms \begin{align*} H\left(S,\,p,\,\left\{N_\alpha\right\}\right)&= U {\color{red} + p\,V} = T\,S + \sum_\alpha\mu_\alpha\,N_\alpha & {\rm d}H &= T\,{\rm d}S + V\,{\rm d}p + \sum_\alpha\mu_\alpha\,{\rm d} N_\alpha \\ A\left(T,\,V,\,\left\{N_\alpha\right\}\right)&= U {\color{red}- T\,S} = - p\,V + \sum_\alpha\mu_\alpha\,N_\alpha & {\rm d}A &= -S\,{\rm d}T - p\,{\rm d}V + \sum_\alpha\mu_{\alpha}\,{\rm d} N_{\alpha} \\ G\left(T,\,p,\,\left\{N_\alpha\right\}\right)&= H {\color{red}- T\,S} = \sum_\alpha\mu_\alpha\,N_\alpha & {\rm d}G &= -S\,{\rm d}T + V\,{\rm d}p + \sum_\alpha\mu_\alpha\,{\rm d} N_\alpha\\ \ldots & & & \ldots \end{align*}
Every time we add a new work term, there is a new Legendre transform, and thus a new free energy (homework, find a new work term and name the free energy after yourself).
We've now covered all of thermodynamics (!). We're ready for statistical mechanics now.
The simplest example I can think of that fits statistical mechanics AND thermodynamics is the game of craps (the sum of two dice).
We'll study this, and try to draw parallels to molecular systems. Whenever the link isn't clear, let me know!
Amazingly to me, we derive entropy first, then add conservation of energy to bring in temperature, then the rest of thermo follows.
In craps, players roll two dice and place bets on the outcome of the sum, which we'll call $U$.
We will later generalise to hyper-craps, where $N$ dice are rolled simultaneously, but for now we start with traditional craps where $N=2$.
The numbers on each individual dice at any point in the game/simulation are called the microstate variables and describe everything about the state of the game.
State variables are observables which must change IFF^{1} the system changes.
1: IFF = IF and only if.
The microstate of molecular systems are the atomic coordinates and velocities.
Players/observers of craps do not really care or measure the microstate. The only state that is observed in craps is the sum of the dice, $U$. The microstate could change, but if $U$ stays the same then as far as the players are concerned its the same roll.
The sum of the dice, $U$, is a macrostate variable (along with $N$). A macrostate can consist of an ensemble of many microstates.
For example, the state of a sum of 7 on the two dice is made of the six microstates, $\left\{\left[1,\,6\right],\,\left[2,\,5\right],\,\left[3,\,4\right],\ldots\right\}$ .
This is like a molecular system, where an observer sees a particular state (i.e. pressure, temperature, and mass) will have many possible microstates (molecular configurations), and they don't particularly care about the microstate.
Note: When we say state we typically mean the macrostate as its the one we can see experimentally/IRL, thus it's the one we interact with and actually care about.
This is intuitive for perfect dice, each possible roll is equally probable. Not intuitive at all for molecular systems; however, it works, and that is all we have time for.
Even though the microstates are equally probable, the macrostates have different probabilities due to the combinations of microstates that make up their ensemble.
For two dice, rolling a $U=7$
$\left\{\left[1,\,6\right],\,\left[2,\,5\right],\,\left[3,\,4\right],\,\left[4,\,3\right],\,\left[5,\,2\right],\,\left[6,\,1\right]\right\}$.
is six times more likely than a snake eyes ($U=2$)
$\left\{\left[1,\,1\right]\right\}$
What happens as we add more dice? Molecular systems have very large numbers of microstate variables $\left(\mathcal{O}\left(10^{26}\right)\right)$, so we need an intuition of what happens when we have so many dice....
Lets say we start with a roll of hyper-snake-eyes (all ones). Its density-of-states/combinations is the lowest possible value in this system of $\Omega(N,\, U=N)=1$. What happens if we start to reroll dice randomly one at a time?
The system moves towards states with higher $\Omega$, simply because they are more probable. This movement is gradual as we're rerolling small portions of dice, thus introducing quasi-dynamics.
Generally ${\rm d}\Omega \ge 0$..... just like entropy!
But entropy is additive, i.e. $2\times$ the system size gives $2\times$ the entropy. Can combinations match this property?
\begin{multline} \Omega\left(N=N_1+N_2,\,U=U_1+U_2\right) \\ = \Omega\left(N_1,\,U_1\right)\times\Omega\left(N_2,\,U_2\right) \end{multline}
\begin{multline} \ln\Omega\left(N=N_1+N_2,\,U=U_1+U_2\right) \\ = \ln\Omega\left(N_1,\,U_1\right) + \ln\Omega\left(N_2,\,U_2\right) \end{multline}
$S = \ln \Omega$ has all the properties of entropy! We have discovered it inside our game of craps, it is not too much of a leap to believe it is more fundamental and in molecular systems too.
Due to a historical accident, the actual relationship to
entropy is multiplied by the Boltzmann constant, $k_B$.
where $W=\Omega$, and $k.=k_B$.
We now intuitively understand entropy as some measure of probability, thus it increases as things move towards the most probable state... but only probably (!).
This has fascinating implications, i.e. see The Infinite Monkey Cage: Does Time Exist?" for one aspect.
The second law of thermodynamics, put simply, is that the most likely things will eventually happen and stay there.
Richard Feynman says in his Lectures on Physics “…
entropy is just the logarithm of the number of ways of
internally arranging a system while
Entropy is a measure of how much we don't know, i.e. it measures our lack of control-over/measurement-of the microstate.
Entropy is NOT disorder/chaos, its just that there's so many ways to make a mess...
Sometimes order is more likely, i.e., look at any crystal transition.
We have derived the key relationships of the micro-canonical ensemble, where the system is isolated (typically $N,\,V,\,E$ held constant in molecular systems): \begin{align*} P(N,\,U, \ldots) &= \frac{\Omega\left(N, U, \ldots\right)}{\sum_U \Omega\left(N,\,U, \ldots\right)} & S(N,\,U, \ldots) &= k_B \ln \Omega\left(N, U, \ldots\right) \end{align*}
Other thermodynamic properties can be generated from here, just like in normal thermodynamics \begin{align*} \left\langle U\right\rangle &= \sum_U U\,P(N,\,U, \ldots) & \mu &= - T\frac{\partial S(N,\,U, \ldots)}{\partial N} \end{align*}
We don't have pressure or any other work variable in this system, we'd need to have some "volume" and "pressure" that somehow contributes to the dice roll for that.
However, we do still have temperature, although it requires us to add the first law.
Lets hold the dice sum, $U$, fixed in ALL our dice rolls. If you haven't guessed yet from the variable name, we're adding conservation of energy.
Divide the $N$ dice into two groups, each with a separate $U_1$ and $U_2 = U - U_1$ which adds to the (fixed) total $U=U_1+U_2$.
The equilibrium value of $U_1$ occurs at the maxiumum entropy, where the total entropy is: \begin{align*} \ln \Omega_{1+2}(U_1+U_2) &= \ln \Omega_1(U_1) + \ln \Omega_2(U_2) \\ \ln \Omega_{1+2}(U,\,U_1) &= \ln \Omega_1(U_1) + \ln \Omega_2(U-U_1) \\ \end{align*} The maximum entropy when varying $U_1$ (but holding $U$ fixed) occurs at a stationary point, i.e. \begin{align*} \left(\frac{{\rm d} \ln \Omega_{1+2}(U,\,U_1)}{{\rm d} U_1}\right)_{N} = 0 \end{align*}
\begin{align*} \frac{{\rm d}}{{\rm d} U_1}\ln \Omega_{1+2}(U,\,U_1) &= 0\\ \frac{{\rm d}}{{\rm d} U_1}\ln \Omega_1(U_1) + \frac{{\rm d}}{{\rm d} U_1}\ln \Omega_2(U-U_1)&=0 \\ \frac{{\rm d}}{{\rm d} U_1}\ln \Omega_1(U_1) &= \frac{{\rm d}}{{\rm d} U_2}\ln \Omega_2(U_2) \end{align*}
We now define some shorthand: \begin{align*} \beta = \frac{{\partial} S\left(N,\,U,\,\ldots\right)}{{\partial} U} = \frac{{\partial} \ln \Omega\left(N,\,U,\,\ldots\right)}{{\partial} U} \end{align*} and note the result, $\beta_1 = \beta_2$ when two systems have fixed energy and are at maximum entropy/equilibrium.
Thus adding conservation of $U$ between two systems that can exchange ${\rm d}U$ gives us that $\beta$ must be equal between the systems at equilibrium. We call this “thermal” equilibrium if $U$ is energy, even though it works for conserved dice sum too.
If we look up the derivative $\left({\rm d}S/{\rm d}U\right)_{N,\ldots}$ in traditional thermodynamics, we realise that $\beta=1/(k_B\,T)$, and thus have proven that systems at thermal equilibrium must have the same temperature.
This generalises to three systems, and thus is proof of the zeroth law.
Lets now generate the probabilty function for when $U$ is conserved (and everything else).
Again, consider system 1 in a microstate, $i$, with energy $U_{1,i}$ in thermal equilibrium with another system 2 at constant $U$. The combinations available are set by system 2's degeneracy: \begin{align*} P\left(i\right) \propto \Omega_2(U_2=U-U_{1,i}) \end{align*}
Expanding system 2 entropy around $U$: \begin{align*} \ln \Omega_2(U_2=U-U_{1,i}) &= \ln\Omega_2(U_2=U) - U_{1,i} \frac{{\rm d} \ln\Omega_2(U)}{{\rm d}U} + \mathcal{O}(1/U) \\ &\approx \ln\Omega_2(U_2=U) - \frac{U_{1,i}}{k_B\,T} \end{align*}
This expansion truncates by assuming system 2 is very large so that changes in $U_1$ become negligble to it, giving $U_2\to U$, substituting, and realising that $\ln\Omega_2(U_2=U)$ is a constant, \begin{align*} P\left(i\right) \propto e^{-U_{1,i}/(k_B\,T)} \end{align*}
Each microstate of a system at thermal equilibrium (i.e. fixed $N,\,T,\ldots$ otherwise known as the canonical ensemble) has a probability proportional to the Boltzmann factor: \begin{align*} P\left(i\right) \propto e^{-U_{1,i} / (k_B\,T)} \end{align*}
We can normalise this by creating the so-called canonical partition function, \begin{align*} Z(N,\,T) &= \sum_{i} e^{-U_{1,i} / (k_B\,T)} \\ P\left(i\right) &= Z^{-1} e^{-U_{i} / (k_B\,T)} \end{align*} We've dropped the system 1 and 2 notations, as system 2 (called the “bath”) is irrelevant (despite its enormous size), aside from its temperature which is also the temperature of system 1 (at equilibrium).
\begin{align*} Z(N,\,T) &= \sum_{i} e^{-U_{1,i} / (k_B\,T)} & P\left(i\right) &= Z^{-1} e^{-U_{i} / (k_B\,T)} \end{align*} Like the relationship for entropy, $S=k_B\ln\Omega(N,\,U,\ldots)$, where its the log of the probability normalisation for $\left(N,\,U\right)$ ensembles, the partition function in $\left(N,\,T\right)$ ensembles is related to the Helmholtz free energy, $F$: \begin{align*} F(N,\,T,\ldots) &= k_B\,T\ln Z(N,\,T,\ldots) \end{align*}
And this is the generating function for all thermodynamic properties of this ensemble: \begin{align*} S &= -\frac{\partial F(N,\,T,\ldots)}{\partial T} & \mu &= \frac{\partial F(N,\,T,\ldots)}{\partial N}\\ F &= U-T\,S & \ldots \end{align*}
This applies to dice, but also molecules. But now we must leave dice behind and add additional work terms.
By using similar arguments of dividing an ensemble of systems, allowing them to exchange something, performing a Taylor series, and comparing against macroscopic thermodynamics we can generate other ensembles.
Isothermal-Isobaric ensemble $(N,\,T,\,p)$ \begin{align*} P\left(i,\,N,\,p,\,T,\ldots\right) &= \mathcal{Z}^{-1} e^{-U_{i}/\left(k_B\,T\right)-p\,V_i/\left(k_B\,T\right)} \\ G\left(N,\,p,\,T,\ldots\right) &= k_B\,T\ln \mathcal{Z}\left(N,\,p,\,T,\ldots\right) \end{align*}
Grand-canonical ensemble $(\mu,\,T,\,p)$ \begin{align*} P\left(i,\,\mu,\,p,\,T,\ldots\right) &= \mathcal{Z}^{-1} e^{-U_{i}/\left(k_B\,T\right)-p\,V_i/\left(k_B\,T\right)+\mu\,N_i/\left(k_B\,T\right)} \\ \boldsymbol{\Omega}\left(\mu,\,p,\,T,\ldots\right) &= k_B\,T\ln \mathcal{Z}\left(N,\,p,\,T,\ldots\right) \end{align*}
Calculating move probabilities is fundamental to Monte Carlo, thus these probabilities (and how to compute the partition functions) is core to that approach.
There are too many important derivations to cover in stat mech, and what you will need highly depends on your interests/research. We cannot do more here, but please ask if you have something in particular to understand in the problem sessions.
I will now try to give you some intuition on the other foundational assumption of stat mech (which I think is much harder to find).
God does not play dice. A. Einstein
God does play dice with the universe. All the evidence points to him being an inveterate gambler, who throws the dice on every possible occasion. S. Hawking
You are the omnipotent being of your simulation, so you get to choose: \begin{align*} \langle{\mathcal A}\rangle &= \sum_{\boldsymbol\Gamma} d{\boldsymbol\Gamma}\,P({\boldsymbol\Gamma})\, {\mathcal A}({\boldsymbol\Gamma}) & \text{(Monte Carlo)}\\ &\text{or}\\ \langle{\mathcal A}\rangle&=\lim_{t_{obs.}\to\infty} \frac{1}{t_{obs.}}\int_0^{t_{obs.}}\mathcal A(t)\,{\rm d}t & \text{(Molecular Dynamics)} \end{align*}
But are these equivalent? If so, the system is Ergodic.
Consider the Simple Harmonic Oscillator:
The microstate variables, in this case position, $r$, and velocity, $v=\dot{r}$, can be collected together to form a vector, $\boldsymbol \Gamma$, which exists in phase space.
The microcanonical SHO system visits all regions of phase space pretty quickly, it is therefore Ergodic. Stat-Mech and (molecular) dynamics averages are equivalent.
Its not surprising that it forms a circle in phase space, as the total energy (called the Hamiltonian), is the equation of a circle in $r$ and $v$. \begin{align*} U = \mathcal{H}(\boldsymbol{\Gamma}) = \frac{1}{2}k\,r^2 + \frac{1}{2}m\,v^2 \end{align*} where $k$ is the spring constant and $m$ is the mass.
What is all of phase space like? We can sample all energies, $U$, and plot the "flow" of trajectories:
The systems flow around phase space, the trajectory lines do not cross, thus this behaves exactly as a incompressible fluid.
Whatever ensemble of starting states we begin with, there's a symmetry to its motion through phase space. If evenly distributed, it will remain evenly distributed.
We can treat an ensemble of systems flowing around phase space as an incompressible fluid with density $\rho(\boldsymbol{\Gamma})$, thus it behaves as follows. \begin{align*} \nabla_{\boldsymbol{\Gamma}} \cdot \dot{\boldsymbol\Gamma}\,\rho({\boldsymbol\Gamma}) = \sum_i\left(\dot{\boldsymbol{r}}_i\cdot \frac{\partial}{\partial \boldsymbol{r}_i} + \dot{\boldsymbol{v}}_i \cdot \frac{\partial}{\partial \boldsymbol{v}_i}\right)\rho\left(\left\{\boldsymbol{r}_i,\,\boldsymbol{v}_i\right\}^N_i\right) = 0 \end{align*} which should be familiar to you as a simplification of the continuity equation from any fluid flow class you might have had: \begin{align*} \frac{\partial \rho}{\partial t} +\nabla \cdot \rho\,\boldsymbol{v} =0 \end{align*}
This is the foundation of kinetic theory, as from here comes the BBGKY heirarchy and ultimately Boltzmann's/Enskog's equation.
Back to ergodicity.
The SHO system has a parabolic potential energy: \begin{align*} U(\boldsymbol{\Gamma}) = \mathcal{H}(r,\,v) = \frac{1}{2}k\,r^2 + \frac{1}{2}m\,v^2 \end{align*}
We can make a double-well potential simply by shifting the spring rest position away from the origin: \begin{align*} U(\boldsymbol{\Gamma}) = \mathcal{H}(r,\,v) = \frac{1}{2}k\,\left(\left|r\right|-r_0\right)^2 + \frac{1}{2}m\,v^2 \end{align*} This potential is not a particularly nice one to integrate as it is discontinuous at $r=0$. Even though I use velocity verlet (which is symplectic) as the integrator in the next example, you can still see the system drifting in energy due to where/when the time step crosses the $r=0$ line. But the demonstration is quite beautiful so I claim artistic licence, whatever that means.
Are the systems in the lower energies (closer to the centre of the orbits) ergodic? No, they cannot visit each other even though they're at the same energy level (thus at the same state)!
Microstates in $U=0.25$ are energetically isolated from each other, thus its not ergodic; however, a lot of properties (i.e. average velocity) will still work out!
Phase space is so unbelievably big even for small systems it actually “feels” impossible for any trajectory to meaningfully visit all microstates (or come close enough to do so) in finite time.
Practically, in simulation we typically just check
averages are converging and hope we're
There are many techniques for accelerating dynamics past energy barriers and “encourage” good sampling of phase space which you will learn about in your MC lectures. These are also applied more frequently to MD simulations too.
The ideas of statistical mechanics are simple, yet the implications and relationships it provides are deep and meaningful.
My dice example should make you believe that its applications are wider than just molecular systems. Isaac Asimov even explored the statistical mechanics of galactic societies in his excellent (but a little dated) Foundation Series. Who knows where it will turn up next.
While much of the derivations covered here are “ancient”, there's still intense research into the field, and many more surprising results to find.
I hope you're keen to find out more, and will look at the statistical mechanics underlying all your upcoming lectures as a mountain to climb up, rather than a chasm to fall in.
God does not play dice. A. Einstein
God does play dice with the universe. All the evidence points to him being an inveterate gambler, who throws the dice on every possible occasion. S. Hawking
You are the omnipotent being of your simulation, so you get to choose: \begin{align*} \langle{\mathcal A}\rangle &= \int d{\boldsymbol\Gamma}\,{\mathcal P}({\boldsymbol\Gamma})\, {\mathcal A}({\boldsymbol\Gamma}) & \text{(Monte Carlo)}\\ &\text{or}\\ \langle{\mathcal A}\rangle&=\lim_{t_{obs.}\to\infty} \frac{1}{t_{obs.}}\int_0^{t_{obs.}}\mathcal A(t)\,{\rm d}t & \text{(Molecular Dynamics)} \end{align*}
Imagine simulating roulette to evaluate betting strategies on your balance $\mathcal{A}$. You could evaluate them over time $t$ by simulating the motion of the ball as it visits the different numbers/states of the wheel, \begin{align*} \langle{\mathcal A}\rangle&=\lim_{t_{obs.}\to\infty} \frac{1}{t_{obs.}}\int_0^{t_{obs.}}\mathcal A(t)\,{\rm d}t & \text{(MD)}. \end{align*}
Or you can directly sample the numbers/states ${\boldsymbol\Gamma}$ as you know their probability distribution beforehand, \begin{align*} \langle{\mathcal A}\rangle &= \int d{\boldsymbol\Gamma}\,{\mathcal P}({\boldsymbol\Gamma})\, {\mathcal A}({\boldsymbol\Gamma}) & \text{(MC)}. \end{align*}
The system must be ergodic and you must know the (relative) probability of the states for these averages to be equal. Suprisingly, this is often the case for molecular systems (see stat. mech. lectures).
Molecular dynamics has a very large number of states, we cannot visit them all.
Stanisław Marcin Ulam (re)invented the Monte Carlo method while working on the nuclear weapons project in 1946. The code name came from the Monte Carlo Casino where Ulam's uncle gambled, and the idea came from considering the large number of states in Canfield solitaire.
The ergodic idea is that we can visit "enough" to get a good approximation of the observables $\mathcal{A}$. My lectures will cover the basics of Monte Carlo integration, and its application to molecular systems.
Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse.
A. D. Sokal
Calculating $\pi=\int^1_{-1}\int^1_{-1} \theta(1-x^2-y^2){\rm d}x\,{\rm d}y$ by randomly dropping points in a square and testing if they fall in the circle (fraction should be $\pi/4$).
Rerunning the simulation yields different scatter, but similar rate of convergence.
The average of $N$ normally distributed random numbers, centered about $1$, with standard deviation of $\sigma_f$ also gives the same sort of convergence!
Distribution of results as a function of $N$, the
number of random points used for the simple integral
of $sin(x)$.
Note the convergence and evolution of a peaked
distribution…
Consider a set of $N$ independent variables $x_1$, $x_2$,...,$x_N$ which all follow the same probability distribution function $P(x)$, with mean, $\mu$, and variance, $\sigma$.
How is the mean $\bar{x}=\frac{1}{N}\sum_{k=1}^N x_k$ of these variables distributed?
This is given by the central limit theorem, which becomes increasingly accurate as $N\to\infty$ \begin{align*} p_{\bar{x}}(\bar{x}) &\approx (2\pi\sigma_{\bar{x}}^2)^{-1/2} \exp\left(-\frac{(\bar{x}-\mu)^2}{2\sigma_{\bar{x}}^2}\right) \\ \sigma_{\bar{x}} &\approx \frac{\sigma}{\sqrt{N}} \end{align*}
All our results follow this so far, and it applies to most processes, physics has never been so easy…
The uniform distribution (black), averaged over multiple ($N$) samples, tends towards a peaked gaussian distribution.
This is also true for the exponential distribution (black), and many others.
How quickly does the average of central-limit
processes (like Monte Carlo) converge to the correct answer?
Central limit theorem:
\begin{align*}
\sigma_{\left\langle f\right\rangle} &= \frac{\sigma_f}{\sqrt{N}}
&
\sigma_{f} &= \left\langle \left(f(x)\right)^2\right\rangle - \left\langle f(x)\right\rangle^2
\end{align*}
This "slow" $N^{-1/2}$ convergence is unavoidable! But we can reduce $\sigma_f$ by a factor of $10^6$ or more using importance sampling (more in a moment).
However, the dependence of the convergence with sample size is independent of the dimensionality of the integral, so convergence is independent of dimensionality (molecular systems typically have at least $d\,N$ dimensions!).
The position and velocity of every atom/molecule form a $3N$-dimensional phase space, so we need multidimensional MC. \begin{equation*} I = \int_{V} f({\bf x}) {\rm d}{\bf x} \end{equation*}
Factor of $2/3$ in $\mathcal{P}$ is to normalise the distribution over $[0,1]$.
Curves represent the distribution of results for the previous integral when MC using uniform (unfilled) and importance (filled) sampling.
We need to choose the transition matrix to ensure that the stationary distribution
von Neumann is simply trying to caution you against believing that random number generators (RNG) are actually random.
But GPU graphics programmers really do use a state of $\sin$! \begin{align*} x_i = {\rm fract}(43758.5453\sin(12.9898\,i)) \end{align*}
When performing Monte Carlo, you must make sure your randomness is sufficiently random.
Using lists of "truly random" random numbers [provided by experiment via punch cards] was extremely slow, but von Neumann developed a way to calculate pseudorandom numbers using the middle-square method. Though this method has been criticized as crude, von Neumann was aware of this: he justified it as being faster than any other method at his disposal, and also noted that when it went awry it did so obviously, unlike methods that could be subtly incorrect.Taken from the MC wikipedia page
Subtle errors are easy to make. We must not violate detailed balance (or do so very very carefully), and must have good tests to ensure the correct ensemble is simulated.
Boltzmann distribution:
$
{\mathcal P}({\boldsymbol\Gamma}) \propto e^{-\beta\,H({\boldsymbol\Gamma})}
$
Monte Carlo integration
\begin{align*}
\langle {\mathcal A} \rangle
&= \int d{\boldsymbol\Gamma}\, {\mathcal P}({\boldsymbol\Gamma}) {\mathcal A}({\boldsymbol\Gamma})
\approx \frac{1}{{\mathcal N}}
\sum_{k=1}^{\mathcal N} {\mathcal A}({\boldsymbol\Gamma}_k)
\end{align*}
(provided ${\boldsymbol\Gamma}_k$ is generated from ${\mathcal P}({\boldsymbol\Gamma})$)
Transition probability: \begin{equation*} W({\boldsymbol\Gamma}_n\leftarrow{\boldsymbol\Gamma}_o) = g({\boldsymbol\Gamma}_n\leftarrow{\boldsymbol\Gamma}_o) A({\boldsymbol\Gamma}_n\leftarrow{\boldsymbol\Gamma}_o) \end{equation*} where \begin{align*} g({\boldsymbol\Gamma}_n\leftarrow{\boldsymbol\Gamma}_o) &= \mbox{random particle move} \\ A ({\boldsymbol\Gamma}_n\leftarrow{\boldsymbol\Gamma}_o) &= \left\{ \begin{array}{ll} 1 & \mbox{if $H({\boldsymbol\Gamma}_n) \lt H({\boldsymbol\Gamma}_o)$} \\ e^{-\beta(H({\boldsymbol\Gamma}')-H({\boldsymbol\Gamma}))} & \mbox{if $H({\boldsymbol\Gamma}_n) \gt H({\boldsymbol\Gamma}_o)$} \end{array} \right. \end{align*}
We must be careful to ensure detailed balance, when moving a particle it must be able to return to its initial starting point, so spheres and cubes are fine, but not triangular test regions!
In this simulation, the intermediate maximum trial move distance results in the fastest decay (i.e., the fastest sampling of phase space) per move.
\begin{align*} u_{\rm LJ}(r) &= 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] & u(r) &= \left\{ \begin{array}{ll} u_{\rm LJ}(r) - u_{\rm LJ}(r_c) & \mbox{for $r\lt r_c$} \\ 0 & \mbox{for $r\gt r_c$} \\ \end{array} \right. \end{align*} where $r_c=2.5$ for all that follows.
Observing properties over the duration of the simulation can give confidence we have reached equilibrium (or become stuck...).
Regardless of the ensemble (how we specify the thermodynamic state), at the same state the average properties are nearly identical. There are many fluctuations, even in the specified variables, do we have information on nearby states? (tune in to MC3 for more)
Thanks to Dr. Leo Lue (Strathclyde) for his excellent notes which this course is largely based on, also thanks to his predecessor, Dr Philip Camp (Edinburgh), where some slides originated from.
Thank you to you for your attention.
Example: LJ fluid $\rho\sigma^3=0.5$. Solid lines are simulations, dashed lines are extrapolated histograms from $k_B\,T=2\varepsilon$.
Example: LJ fluid $\rho\sigma^3=0.5$. The extrapolated histograms allow evaluation of properties at any/all temperatures (within sampled states, above is at $k_B\,T=2\varepsilon$).
Extrapolation of the virial (i.e., excess) contribution to the pressure.
\begin{align*} U(x) &= \left[\sin\frac{\pi x}{2} \sin 5\pi x\right]^2 \\ p(x,\beta) &\propto e^{-\beta U} \end{align*}
Ergodic hypothesis: A system with fixed $N$, $V$, and $E$ ($U$) is equally likely to be found in any of its $\Omega(N,V,E)$ microscopic states.
Consider two subsets of states, $\Omega_A$ and $\Omega_B$. A system is more likely to be found in set A if $\Omega_A>\Omega_B$.
Therefore $S(\Omega_A)>S(\Omega_B)$ as A is more likely, thus $S$ must be a monotonically increasing function in $\Omega$.
As states increase multiplicatively, yet entropy is additive, the relationship must be logarithmic. \begin{equation*} S(N,V,E) = k_B \ln \Omega(N,V,E) \end{equation*} where $k_B=1.3806503\times10^{-23}$ J K$^{-1}$ is the Boltzmann constant, present for historic reasons.
\begin{equation*} {\mathcal P}(E) \propto \Omega(E;E_{\rm tot}) \qquad \qquad \Omega(E;E_{\rm tot}) = \Omega(E) \Omega_{\rm surr.} (E_{\rm tot}-E) \end{equation*} (Note: the $N$ and $V$ variables for the system and the surroundings are implicit)
\begin{align*} {\mathcal P}(E,E_{\rm tot}) &\propto \Omega(E) \Omega_{\rm surr.}(E_{\rm tot}-E) \\ &\propto \Omega(E) e^{\ln \Omega_{surr.}(E_{\rm tot}-E)} \end{align*}
Performing a taylor expansion of $\ln\Omega_{\rm surr.}$ around $E_{\rm tot}$. \begin{align*} {\mathcal P}(E,E_{\rm tot}) &\propto \Omega(E) e^{\ln \Omega_B(E_{\rm tot}) - E \frac{\partial}{\partial E_{\rm tot}}\ln \Omega_B(E_{\rm tot}) + \cdots} \end{align*}
First-order is fine as $E\ll E_{tot}$. Note that $\partial \ln \Omega(E) / \partial E=\beta$.
At equilibrium the surroundings and system have the same temperature thus:
\begin{align*}
{\mathcal P}(E,E_{\rm tot}) &\propto \Omega(E) e^{\ln \Omega_B(E_{\rm
tot}) - \beta E} \end{align*}
\begin{align*} {\mathcal P}(E) &\propto
\Omega(E) e^{- \beta E} \end{align*}
(If the surroundings are large, $E_{\rm tot}$ is unimportant and the constant term $\ln \Omega_B(E_{\rm tot})$ cancels on normalisation)