Ising models and neural networks



H.G. Schaap


The work described in this thesis was performed at the Center for Theoretical Physics in Groningen, with support from the “Stichting voor Fundamenteel Onderzoek der Materie” (FOM).

Printed by Universal Press - Science Publishers / Veenendaal, The Netherlands.

Cover design by Karlijn Hut.


Rijksuniversiteit Groningen

Ising models and neural networks

Thesis

to obtain the doctorate in
Mathematics and Natural Sciences
at the Rijksuniversiteit Groningen
on the authority of the
Rector Magnificus, dr. F. Zwarts,
to be defended in public on
Monday 23 May 2005
at 16:15

by

Hendrikjan Gerrit Schaap
born on 7 May 1977
in Emmen


Supervisors:
Prof. dr. A. C. D. van Enter
Prof. dr. M. Winnink

Assessment committee:
Prof. dr. A. Bovier
Prof. dr. H. W. Broer
Prof. dr. W. Th. F. Den Hollander

ISBN: 90-367-2260-8


Contents

1 Introduction

2 General overview
2.1 Gibbs measures: Ising model
2.1.1 The Ising model
2.1.2 Thermodynamical limit
2.1.3 Some choices of boundary conditions
2.2 Spin glasses
2.2.1 Ising spin glasses
2.2.2 Mean field: SK-model
2.3 Hopfield model
2.3.1 Setting
2.3.2 Dynamics and ground states
2.3.3 System-size-dependent patterns
2.3.4 Some generalizations
2.4 Scenarios for the spin glass
2.4.1 Droplet-picture short-range spin-glasses
2.4.2 Parisi’s Replica Symmetry breaking picture
2.4.3 Chaotic Pairs
2.5 Metastates

3 Gaussian Potts-Hopfield model
3.1 Introduction
3.2 Notations and definitions
3.3 Ground states
3.3.1 Ground states for 1 pattern
3.3.2 Ground states for 2 patterns
3.4 Positive temperatures
3.4.1 Fixed-point mean-field equations
3.4.2 Induced measure on order parameters
3.4.3 Radius of the circles labeling the Gibbs states
3.5 Stochastic symmetry breaking for q = 3

4 The 2d Ising model with random boundary conditions
4.1 Introduction
4.2 Set-up
4.3 Results
4.4 Geometrical representation of the model
4.5 Cluster expansion of balanced contours
4.6 Absence of large boundary contours
4.7 Classification of unbalanced contours
4.8 Sequential expansion of unbalanced contours
4.8.1 Renormalization of contour weights
4.8.2 Cluster expansion of the interaction between n-aggregates
4.8.3 Expansion of corner aggregates
4.8.4 Estimates on the aggregate partition functions
4.9 Asymptotic triviality of the constrained Gibbs measure ν
4.10 Random free energy difference
4.11 Proofs
4.11.1 Proof of Proposition 4.20
4.11.2 Proof of Proposition 4.24
4.11.3 Proof of Proposition 4.25
4.11.4 Proof of Lemma 4.26
4.11.5 Proof of Lemma 4.28
4.12 High field
4.12.1 Contour representation
4.12.2 Partitioning contour families
4.13 Concluding remarks and some open questions
4.14 Appendix on cluster models
4.15 Appendix on interpolating local limit theorem

A Cluster expansions for Ising-type models
A.1 High temperature results
A.1.1 1D Ising model by Mayer expansion
A.1.2 Uniqueness of Gibbs measure for high temperature or d=1
A.1.3 High temperature polymer expansions
A.2 Low temperature expansions of 2D Ising model
A.2.1 Upper bound on the pressure by cluster expansion
A.2.2 Site percolation
A.3 Nature of the clusters
A.4 Multi-scale expansion for random systems

Publications

Bibliography

Samenvatting (summary in Dutch)

Dankwoord (acknowledgements)


Chapter 1

Introduction

Imagine a river, continuously flowing on, steadily running its course; the water in the river always tends to flow in the direction of least resistance. Suppose that during heavy rain the water in the river rises higher and higher. Then, suddenly, the water becomes so high that the river overflows its banks, allowing the water to flow into the nearby meadows. It then takes some time before the river has adjusted to this new situation.

Statistical physics is an attempt to model natural processes. It tries to connect microscopic properties with macroscopic properties. For the river, the movement of the water molecules is connected to the overall flow. The global properties of the river are governed by universal laws for a few characteristics, notwithstanding the huge number of water molecules involved. Because of this huge number we have to apply probabilistic methods (stochastics) to the underlying microscopic differential equations that define the movement of all of the water molecules. Then, if we look at the macroscopic properties, we know the global behavior almost for certain. This global behavior can be described by equations that depend only on macroscopic properties. The underlying microscopic part is removed by the stochastics.

Let us take a die to demonstrate some of the stochastic principles involved. As we know, we have the same probability of throwing a 1 as of throwing a 6. However, experience tells us that after a small number of throws, the resulting relative frequencies of the individual numbers can differ significantly from each other. Only when we throw a die a large number of times do the resulting relative frequencies of the numbers become more and more equal. The same result is obtained when we throw not one die many times, but a lot of dice at once, and then look at the relative frequencies of the numbers. Obviously throwing one die 1000 times is equivalent to throwing 1000 dice one time. Eventually all of the relative frequencies approach $1/6$: on average every number appears once if we throw a die six times. This relative frequency of a number is sometimes identified with the probability of the number appearing. To shorten notation it is denoted by $P(i)$. For every number $i$ on our die, $P(i) = 1/6$. The function which assigns to each number $i$ the corresponding probability is called the probability distribution of the property.
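The convergence of the relative frequencies can be checked numerically. The following is a minimal sketch, not part of the thesis: the function names `relative_frequencies` and `max_deviation` and the fixed seed are illustrative assumptions. It throws a simulated fair die a given number of times and reports how far the relative frequencies are from 1/6.

```python
import random
from collections import Counter


def relative_frequencies(num_throws, seed=0):
    """Throw a fair six-sided die num_throws times and return the
    relative frequency of each face 1..6."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    counts = Counter(rng.randint(1, 6) for _ in range(num_throws))
    return {face: counts[face] / num_throws for face in range(1, 7)}


def max_deviation(freqs):
    """Largest distance between a relative frequency and the value 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())


if __name__ == "__main__":
    # Few throws: frequencies can differ a lot; many throws: close to 1/6.
    for n in (12, 600, 60000):
        print(n, round(max_deviation(relative_frequencies(n)), 4))
```

For small numbers of throws the deviation is typically large, while for tens of thousands of throws it is typically of the order of a few thousandths.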

Suppose we have two dice. Then the combined probability equals $P(n_1, n_2)$, where $(n_1, n_2)$ represents the numbers on the two dice. For instance $P(1,1)$, the probability of throwing a 1 with both dice, equals $1/36$. Note that the throw of one die does not depend on the outcome of the throw of the other die. We say that the outcome of die 1 is independent of the outcome of die 2. When two properties depend on each other, the expression for the combined probability is in general more complicated.
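For independent dice the combined probability is simply the product of the single-die probabilities. The short sketch below illustrates this with exact fractions; the names `P_SINGLE` and `joint_probability` are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
# Probability distribution of a single fair die: P(i) = 1/6 for every face i.
P_SINGLE = {i: Fraction(1, 6) for i in FACES}


def joint_probability(n1, n2):
    """Combined probability of throwing n1 with die 1 and n2 with die 2,
    assuming the two dice are independent (product rule)."""
    return P_SINGLE[n1] * P_SINGLE[n2]


# Cross-check by counting: all ordered pairs of faces are equally likely.
OUTCOMES = list(product(FACES, repeat=2))
P_BY_COUNTING = Fraction(1, len(OUTCOMES))
```

Both routes give the same value for P(1,1), and the probabilities of all 36 ordered outcomes add up to one.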

All matter around us is made of atoms. Every gram of matter contains around $10^{23}$ atoms. Often one considers a collection of some global (bulk) properties in addition to the atomic properties. In the description of matter, all the atoms together with the mentioned global properties define the system. Any particular realization of the corresponding atomic values is called a configuration of the system. Determining the configuration resembles throwing $10^{23}$ dice at once.

Suppose one wants to measure a macroscopic property, for instance the average density. Because of the large number of atoms, there is no need to track the locations of the individual atoms in the probability distribution of the atomic values. In the case of the river: during the heavy rainfall, the way the river flows changes in time. After some adjustments have been made, the changes stop; the river flow becomes stationary. The time scale of the adjustments is extremely large compared to the time scale of the local movements of the water molecules. The way in which the system properties evolve is called the dynamics of the system. In the global flow of the river the microscopic movements of the water molecules have been averaged out.

In the next chapter we give a general overview of the part of statistical physics which is important for the two particular kinds of models we study in this thesis: neural networks and Ising models.

Neural networks

The first subject of the thesis is a model originating in the theory of neural networks. In particular we would like to understand the concept of memory. Our brain is built up of billions of neurons connected in a highly non-trivial way. This structure we call a neural network. It is difficult to study directly, because of the huge number of neurons involved in a relatively small area. In order to understand how memory works, a common approach is to build a simpler model which captures its main features. Just like the neural network of the brain, the model should be sufficiently robust: in transmitting signals between neurons there are always some small errors involved. Given such a slightly deformed signal, the brain is able to remove the noise and to reconstruct the pure signal. For a good general overview of neural networks see [19].

Figure 1.1: Components of a neuron (taken from [19])

A neuron is built up of three parts: the cell body, the dendrites and the axon, see Figure 1.1. The dendrites have a tree-like branched structure and are connected to the cell body. The axon is the only outgoing connection from the neuron. At its end the axon branches, and it is connected to the dendrites of other neurons via synapses. The end of any branch of the axon is separated from a dendrite by a space called the synaptic gap.

Neurons communicate with other neurons via electric signals. The electric signal of a neuron i is transferred to a neuron j in the following way, see Figure 1.2. First it travels from the cell body of neuron i into the axon which is connected to neuron j. This is the output signal of neuron i. When the signal arrives at the end of the axon, it releases neurotransmitters into the synaptic gap. Then receptors on the dendrite of neuron j transform the neurotransmitters back into an electric signal. There are several types of neurotransmitters. Some transmitters amplify the incoming signal before it is passed on to the dendrites of other neurons, whereas others weaken it.

This resulting signal originating from the receptors of neuron j we call the input

signal from neuron i to neuron j. Finally the signal arrives at the cell body of neuron j.

In the cell body of neuron j all the inputs come together. The cell processes the inputs (which we will model mathematically by a weighted sum) into what we call the total input $h_j$ of neuron j. Then, depending on the outcome, the cell produces a new signal which is transported to the axon of neuron j in order to be transferred to other neurons. This is called the output or the condition of neuron j.

Figure 1.2: The synapse (taken from [19])

To build a useful model based on these neural processes we need to make some simplifications. As a first simplification we assume that every neuron interacts with every other neuron. We say that the neural network is fully connected. Further we assume that each neuron can have only two possible outputs, i.e. it can only be in two conditions. For reference we denote by $\sigma_i$ the condition of neuron i: $\sigma_i = +1$ if it is excited and $\sigma_i = -1$ when it is at rest.

We also assume that no alteration of the signal takes place when it travels across a synaptic gap. As a result the input to neuron j which comes from neuron i is equal to the output $\sigma_i$ which neuron i sends to neuron j.

To model the dynamics of our model we introduce the time t. At every time step ∆t (with ∆t very small) every neuron output is changed simultaneously. The processing in the cell body of every neuron j we model by two steps:

1. At time t we multiply every input $\sigma_i(t)$ coming from the other neurons by a weight. To obtain the total input $h_j(t)$ at time t we sum the result over all of the neurons (except neuron j).

2. For the output $\sigma_j(t + \Delta t)$ of neuron j at time t + ∆t we take the outcome of a probability distribution over the two possible neuron conditions. This distribution is formed by a stochastic rule on $h_j(t)$.
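The two steps can be sketched in a few lines of code. The stochastic rule is not specified at this point in the text, so the sketch assumes a common choice: neuron j becomes excited with probability (1 + tanh(β h_j))/2, where β is a positive parameter. This rule, as well as the function names `total_input`, `prob_excited` and `parallel_step`, are assumptions made for illustration only.

```python
import math
import random


def total_input(weights, sigma, j):
    """Step 1: weighted sum of the outputs of all neurons except neuron j."""
    return sum(weights[j][i] * sigma[i] for i in range(len(sigma)) if i != j)


def prob_excited(h, beta):
    """Assumed stochastic rule: probability that the new output is +1,
    given the total input h. Equals exp(beta*h) / (exp(beta*h) + exp(-beta*h))."""
    return 0.5 * (1.0 + math.tanh(beta * h))


def parallel_step(weights, sigma, beta, rng):
    """Step 2 for all neurons simultaneously: every new output is drawn
    from the rule applied to the total inputs computed at time t."""
    inputs = [total_input(weights, sigma, j) for j in range(len(sigma))]
    return [1 if rng.random() < prob_excited(h, beta) else -1 for h in inputs]


# Hypothetical example: three neurons with equal symmetric weights.
WEIGHTS = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
next_sigma = parallel_step(WEIGHTS, [1, 1, -1], beta=1.0, rng=random.Random(0))
```

All total inputs are computed from the configuration at time t before any output is changed, which is what makes the update simultaneous.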

We assume that the connections are treated by the neuron cell bodies in a symmetrical way: the weight given in neuron j to the input of neuron i is equal to the weight given in neuron i to the input of neuron j. In realistic neural networks this interaction symmetry does not hold in general.

The dynamics of our model is summarized in Figure 1.3. The stable configurations under this dynamics form the memory of the system. Stability means that, starting from a stable configuration, the system only reaches configurations which are very much alike (in the sense of neuron configurations). By choosing appropriate weights we can tune the dynamics such that the memory is formed by a finite number m of preselected neuron configurations $\xi^{(m)}$, also called patterns.

output of all of the neurons at time t → total input $h_j(t)$: weighted sum of all of the outputs → output of neuron j at time t + ∆t: the outcome of the stochastic rule on $h_j(t)$

Figure 1.3: Dynamics of the neural network model

The stochastic rule depends on a parameter β. The inverse T = 1/β of the parameter β is called the temperature. If β is large, the neuron has a strong tendency (high probability) to become equal to the sign of its total input. As β approaches infinity the stochastic rule turns into a deterministic one. Then, if we put in a configuration which is close enough to e.g. $\xi^{(1)}$, the system evolves to configurations equal to the pattern $\xi^{(1)}$. In other words the neural network remembers the configuration $\xi^{(1)}$ of its memory. This means that the neuron configuration becomes equal to $\xi^{(1)}$ and, afterwards, the system stays in this configuration. This is the so-called zero-temperature dynamics of the Hopfield model, see Section 2.3.2. It is e.g. very useful for information transmission. The dynamics defines algorithms to remove noise from received signals. Often it is advisable to keep the parameter β finite. Then, when we perform the dynamics of Figure 1.3, we exclude the possibility of getting trapped in undesired so-called metastable configurations.
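The noise-removing behavior of the zero-temperature dynamics can be illustrated on a toy example. The text above does not state how the weights are chosen; the sketch below assumes the Hebb rule commonly used for the Hopfield model, in which the weight between neurons i and j is the average over the patterns of the product of their values. The two patterns, the function names and the tie-breaking convention (a neuron with zero total input keeps its value) are likewise illustrative assumptions.

```python
def hebb_weights(patterns):
    """Assumed Hebb rule: w[i][j] = (1/N) * sum over patterns of xi[i]*xi[j],
    with zero diagonal so that a neuron does not feed into itself."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def zero_temperature_step(weights, sigma):
    """Deterministic parallel update: every neuron takes the sign of its
    total input; a neuron with zero total input keeps its current value."""
    new_sigma = []
    for j, row in enumerate(weights):
        h = sum(w * s for w, s in zip(row, sigma))  # row[j] is 0, so j is excluded
        new_sigma.append(sigma[j] if h == 0 else (1 if h > 0 else -1))
    return new_sigma


def retrieve(weights, sigma, max_steps=20):
    """Iterate the update until the configuration no longer changes
    (or max_steps is reached, since parallel updates may oscillate)."""
    for _ in range(max_steps):
        nxt = zero_temperature_step(weights, sigma)
        if nxt == sigma:
            break
        sigma = nxt
    return sigma


# Two hypothetical patterns on 8 neurons and a noisy copy of the first one.
XI_1 = [1, 1, 1, 1, -1, -1, -1, -1]
XI_2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebb_weights([XI_1, XI_2])
NOISY = [-1, 1, 1, 1, -1, -1, -1, -1]  # XI_1 with its first neuron flipped
```

Calling `retrieve(W, NOISY)` returns the first pattern: the flipped neuron is corrected and the configuration then stays fixed, which is the retrieval described above.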

In order to increase the capacity of the memory, one can generalize the model by increasing the number of possible conditions to a larger finite number q. In information transmission, if one takes q = 26, every neuron state corresponds to a letter. Of course, for q < 26 one can also deal with words by a more careful encoding, but then the encoding becomes less clear. If we generalize the above model to have more than two possible neuron conditions, the resulting model is also known as the Potts-Hopfield model.

In Chapter 3 we choose the weights in the total inputs in a different way. For this we first need to define for each neuron a set of p continuous variables $\xi^{(p)}$ which we refer to as patterns. We take a random realization of these variables. They have a Gaussian distribution, a distribution that is often used in statistics. Then with the values of the introduced patterns $\xi^{(j)}$ we determine the weights for the total input. What will be the memory of the resulting model? Are there any stable neuron configurations? We will also look at what happens when we increase the total number of neurons. What will be the effect on the memory?

We form the weights of the total input from two Gaussian patterns. The number of possible conditions of a neuron we set to three. When we increase the number of neurons, for large numbers the following happens. For a fixed number of neurons, the memory is concentrated around six neuron configurations. These configurations are related to each other by a discrete symmetry. Every neuron configuration is associated with a point in a macroscopic space formed by some macroscopic variables. We can divide the six stable neuron configurations into pairs of diametrically opposite points. When one increases the number of neurons the discrete symmetry always persists; however, the six configurations tend to rotate on three circles. This we will see in Chapter 3. If we look at the sequence of increasing numbers of neurons, then in the macroscopic space above, the stable neuron configurations that appear fill up the three circles in a regular, uniform way.

Ferromagnets

In Chapter 4 we consider a famous model for magnetic materials: the Ising model. In general there are several kinds of magnetism. A so-called paramagnet is magnetized only while we apply an external field to it. Otherwise there is no magnetization. Another important type of metal is the ferromagnet. These metals retain their magnetization once they have been exposed to an external field. Initially the ferromagnetic metals have no magnetization. This is comparable with what happens when we magnetize pieces of iron with the help of a magnet. When we heat the material, this effect eventually disappears: the metal behaves like a paramagnet. For more general information about magnetism we refer to e.g. [46]. We will use the Ising model as a model for ferromagnetism.

Ising models

For justification of the Ising model as a model for a ferromagnet, we need to make

some assumptions. We assume that the unpaired electrons of the outermost shell of the

atoms are localized: i.e. closely bound to the corresponding atoms. Only these unpaired

electrons are responsible for the magnetization. For the Ising model we assume that for

every atom only one unpaired electron is in the outermost shell.

Every electron has an intrinsic angular momentum which we call spin. This spin generates a magnetic moment. Due to quantum mechanics the spin of the electron can have only two orientations with respect to this magnetic moment, which we call up and down [5]. With a slight abuse of notation we mostly refer to these orientations as the values of the spin. Because we have assumed that every atom has only one unpaired electron in the outermost shell, we also have only two orientations for the total spin per atom. Most metals consist of atoms with more than one unpaired electron in the outermost shell. For these metals there can be more than two orientations of the total spin per atom. Most solid materials are crystalline. The atoms, ions or molecules lie in a regular, repeating three-dimensional pattern. This makes some finite number of spin orientations energetically favorable.

In magnetizable metals the metal is divided into domains which have net magnetic moments. The boundaries between these domains are called domain walls [46]. The Ising model only allows for configurations in which the spins of two neighboring electrons are parallel or anti-parallel with respect to each other. If a domain wall is present, its thickness is automatically zero.

The interactions between the localized electrons are also called the Weiss interactions. In general two types of interactions occur frequently: nearest-neighbor and mean-field. When we restrict ourselves to nearest-neighbor interactions, we assume that all of the remaining interactions, between electrons which are not nearest neighbors, are zero. When the interaction is mean-field, the interaction between the moments of any pair of sites is non-zero and all of them are equal.

For the Ising model we restrict ourselves to nearest-neighbor interactions. For the lanthanide series (a particular series of elements) this is a good approximation. Although the model is simple, and for other magnetic metals at most a rough approximation, it is and has been a very useful model. It is the first model (and for a long time the only model in statistical physics) which displays the phenomenon of a phase transition (think e.g. of the liquid → gas transition). Furthermore it is exactly solvable in 1 and 2 dimensions. Nowadays the Ising model (and generalizations of it) appears in several places, e.g. all kinds of optimization problems, voter problems, models for gas versus liquid, etc.

Now we give a mathematical description of the model. Take a piece of a lattice. Every point where a vertical line crosses a horizontal one we refer to as a site. The horizontal and vertical line pieces starting at a site and ending at the nearest next crossing we refer to as bonds. On every site i there is an atom which has a net spin magnetic moment, with respect to which the spin can have only two orientations. We denote the spin-value by $\sigma_i = +1$ when the spin is oriented up and $\sigma_i = -1$ when the orientation of the spin is down. We refer to both the atom and its spin orientation as the spin. The configuration σ of the spins is in our case an array which contains the spin-value $\sigma_i$ at every site.

Between each pair of nearest-neighbor spins (i.e. every pair of spins associated with a single bond) there is an interaction

$$-\beta \sigma_i \sigma_j \equiv J \sigma_i \sigma_j \qquad (1.1)$$

We often also call this interaction the exchange energy between the atoms on sites i and j. The energy of a configuration is the total of these exchange energies. The variable β is the inverse of the temperature times a constant, which depends on the type of material considered.

The probabilities of the configurations are determined by these interactions. In ferromagnets the nearest-neighbor spins tend to have equal orientations, i.e. they tend to be aligned. Therefore we have chosen the interactions in the model such that it becomes more probable for spins to be aligned: we have set J < 0. When J > 0, the model behaves like an antiferromagnet. Then it becomes more probable for spins to be anti-aligned. The higher the energy, the less probable the configuration becomes. The probability of a single configuration equals

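$$P(\sigma) = \frac{\exp\left(\beta \sum_{\langle i,j \rangle} \sigma_i \sigma_j\right)}{Z(\sigma)} = \frac{\exp\left(-\sum_{\langle i,j \rangle} J \sigma_i \sigma_j\right)}{Z(\sigma)} \qquad (1.2)$$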

where the sums run over the nearest-neighbor pairs i, j and $Z(\sigma)$ is the sum of the numerator over all configurations. We see that when the temperature gets lower, the interaction (1.1) becomes stronger. Then it becomes more probable for nearest-neighbor spins to be aligned. From (1.2) we immediately see that at zero temperature only the two configurations which minimize the energy appear with positive probability: i.e. every spin has the same orientation. At very high temperatures every configuration becomes almost equally probable. Then the model behaves like a paramagnet. The temperature is thus a measure of the disorder in the system. At low temperatures most of the spins align with each other; at high temperatures the orientations of the spins are more or less randomly up or down. In Chapter 4 we will consider the most interesting part, the low-temperature ferromagnetic region of the Ising model.
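For a very small system the probabilities in (1.2) can be computed by brute-force enumeration, which makes the two temperature regimes visible. The sketch below is only an illustration: it takes a 2 x 2 block of four spins with its four internal bonds and no environment, and the names `BONDS` and `gibbs_distribution` are hypothetical.

```python
import math
from itertools import product

# Nearest-neighbor pairs of a 2 x 2 block with sites numbered
#   0 1
#   2 3
BONDS = [(0, 1), (2, 3), (0, 2), (1, 3)]


def gibbs_distribution(beta, bonds=BONDS, num_sites=4):
    """Probability of every spin configuration: the weight
    exp(beta * sum of sigma_i * sigma_j over the bonds), divided by
    the sum of these weights over all configurations."""
    configs = list(product((+1, -1), repeat=num_sites))
    weights = [math.exp(beta * sum(s[i] * s[j] for i, j in bonds))
               for s in configs]
    z = sum(weights)  # normalization: sum of the numerator over all configurations
    return {s: w / z for s, w in zip(configs, weights)}


ALL_UP, ALL_DOWN = (1, 1, 1, 1), (-1, -1, -1, -1)
hot = gibbs_distribution(0.0)   # infinite temperature
cold = gibbs_distribution(5.0)  # low temperature
```

At β = 0 all sixteen configurations have probability 1/16, while at β = 5 nearly all of the probability sits on the two fully aligned configurations, each with probability close to 1/2.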

So far we have not considered the environment. When the energy of the system is independent of this environment, we say that the system has free boundary conditions. But what happens when this environment is formed by a different material with a particular chosen configuration of spins? The spins next to the boundary then not only tend to align with the internal spins but also feel the nearest-neighbor spins in the environment.
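The effect of the environment can be made concrete on the same kind of toy system. In the sketch below, a 2 x 2 block is embedded in a square lattice, so that every site has two neighbors outside the block. It is assumed, purely for illustration, that the external spins are fixed and interact with the internal spins with the same strength β as the internal bonds; the function name `distribution_with_boundary` and the argument `boundary_sum` are hypothetical.

```python
import math
from itertools import product

# Internal nearest-neighbor pairs of a 2 x 2 block.
BONDS = [(0, 1), (2, 3), (0, 2), (1, 3)]


def distribution_with_boundary(beta, boundary_sum):
    """Gibbs distribution of the block given a fixed environment.

    boundary_sum[i] is the sum of the fixed external spins that are
    nearest neighbors of site i (all zeros for free boundary conditions).
    Assumption: external bonds have the same strength as internal ones.
    """
    configs = list(product((+1, -1), repeat=4))

    def exponent(s):
        internal = sum(s[i] * s[j] for i, j in BONDS)
        external = sum(s[i] * boundary_sum[i] for i in range(4))
        return beta * (internal + external)

    weights = [math.exp(exponent(s)) for s in configs]
    z = sum(weights)
    return {s: w / z for s, w in zip(configs, weights)}


ALL_UP, ALL_DOWN = (1, 1, 1, 1), (-1, -1, -1, -1)
free = distribution_with_boundary(0.5, [0, 0, 0, 0])  # free boundary
plus = distribution_with_boundary(0.5, [2, 2, 2, 2])  # all external spins up
```

With free boundary conditions the all-up and all-down configurations are equally probable; with all external spins up, the all-up configuration is favored by a factor exp(16β), since each of the eight external bonds changes sign between the two configurations.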

In general a piece of metal contains a lot of atoms. Already one gram contains around $10^{23}$ atoms. One likes to consider volumes which are of the order of the size of the piece of metal. The volume size is measured in the number of atoms, and is thus also of the order of $10^{23}$. In the mathematical description of the model we approximate this huge number by infinity. First one takes a large finite-volume version of the Ising model. Then one tries to extrapolate the resulting expressions to an 'infinite' volume size model.

What will happen to the system when we increase the volume size, and choose at each step the orientations of the external spins arbitrarily up or down, i.e. we take random boundary conditions? How does the alignment of the spins change in the process of increasing the volume? It turns out to depend on the way we let the volume size increase. Our results require that it increases fast enough. Furthermore we need to choose the temperature very low, so that there is a strong tendency for the spins to align.

Then, in the long run, by (1.2), in the configurations that appear almost every spin has the same orientation. However, because we have chosen the boundary conditions randomly, for half of the volumes the configurations that appear will have almost all of their spins up and for the other half of the volumes they will have almost all of their spins down.

But what if we look inside the volume, far away from the boundary? Do we still see an effect of the boundary conditions? We prove that the local volume density of the areas of aligned spins becomes asymptotically independent of the boundary conditions. However, even for very large volumes, there is a significant effect on the density of spin values. If we look at a fixed (very large) volume, then with probability one, either all configurations have all the spins up or all have all the spins down. Almost all of the orientations become equal to the orientation of the majority of the external spins which are involved in the boundary condition. Because of the non-zero temperature a small part of the spins has the opposite orientation.

Because we have increased our volumes fast enough, the so-called mixtures do not appear. This means that we do not have both types of configurations with non-zero probability, i.e. configurations with most of the spins up and configurations with most of the spins down.

This is the subject of Chapter 4. There, as a technical tool, we need to introduce non-trivial expansion techniques, called multi-scale cluster expansions. Our multi-scale expansion method is inspired by the ideas of Fröhlich and Imbrie [35]. The multi-scale expansion is a generalization of the more familiar 'uniform' cluster expansion technique. To simplify our estimates we choose to use a representation of the expansions different from the one used in [35], the so-called Kotecký-Preiss representation, which was developed just two years later [50].

In order to have useful expansions, one needs to verify certain criteria: we need the convergence of some summations related to the expansions. For cluster expansions it is crucial to check the Kotecký-Preiss criterion. However, in our expansions it is impossible to prove it directly. Therefore we introduce a new criterion, which we prove to be equivalent. This new criterion enables us to obtain useful estimates even for our expansions. In the final chapter the uniform and multi-scale cluster expansions are explained more thoroughly.

Schematically the thesis is built up as follows:


1. Introduction → 2. General overview

3. Hopfield model

4. Ising model → 5. Cluster expansions


Chapter 2

General overview

2.1 Gibbs measures: Ising model

2.1.1 The Ising model

In some metals, some fraction of the atoms becomes spontaneously magnetized when the temperature is low enough. This happens for instance in iron and nickel. The magnetized spins, which are the intrinsic magnetic moments of the atoms, tend to be polarized in the same direction (e.g. all up), which gives rise to a macroscopic magnetic field. We call this ferromagnetic behavior. However, when the temperature is above some $T_c$, all spins are oriented randomly and there is no macroscopic magnetic field anymore [44]. The interaction between the magnetic moments is short-range. However, these short-range interactions do provoke long-range ferromagnetic behavior in the system. These metals have a rather homogeneous crystalline structure with the atoms fixed, apart from some minor movement. As a consequence the short-range interactions are typically homogeneous.

The Ising model tries to model this transformation of typically homogeneous short-range interactions into long-range phenomena in physical ferromagnets. In this model we look only at the basic features of a ferromagnet. We assume that the metal atoms are on a regular crystalline lattice Λ, which is in general a subset of $\mathbb{Z}^d$. Every point of the lattice contains precisely one atom.

Furthermore this atom is fixed and its only degree of freedom is its spin, i.e. its magnetic moment. In reality the atom moves a bit around its lattice point, but because of strong crystalline binding this movement is limited. In this model we neglect the effect of these movements. We assume that the environment outside the metal changes adiabatically slowly. For real ferromagnets this is indeed typically the case when we compare the microscopic changes in the crystal with the macroscopic exterior environment. Without any harm we consider this environment to be fixed: the so-called boundary condition to Λ. On every point i of Λ there is precisely one particle which has only its spin-value $\sigma_i$ as degree of freedom. The spins can only point up or down, or equivalently the spin-values $\sigma_i$ are restricted to $\sigma_i = \pm 1$. Some rescaling is involved here, but for the total picture this pre-factor is not important. There are only nearest-neighbor pair interactions between the spins.

In reality crystals are never perfect, and because of thermal excitations some points

of the lattice are empty and other parts of the lattice are deformed. Also more spin-

values are allowed. Despite its serious restrictions compared to reality, the Ising model

still shows the long-range ferromagnetism it was designed for (if the dimension d ≥ 2

and the temperature T is low enough) e.g. [20]. This is in contrast to Ising’s claim; he

found no ferromagnetism in dimension 1 and he conjectured wrongly that the same was

true for d ≥ 2.

As usually happens to simple models, all sorts of generalizations to the Ising model

have been done. The reality connection with the ferromagnets is often not so clear

or not even there at all. However, we see Ising models in various places to explain

many phenomena; Ising models are equivalent to lattice gases, closely related to many

percolation problems and useful for optimization problems as well.

We can generalize the Ising model by allowing the spins to have more spin-values.

The result is a so-called Potts model when this number $q$ of spin values is finite. It was proposed by Domb as a subject for his student Potts. Using duality arguments, Potts was able to determine the critical points $\beta_c$ of the standard Potts model in d = 2 for all values of q. Further on, in Chapter 3, we will see these Potts spins of the standard Potts

model. Another way of generalizing is to allow the spins to have continuous values on

a sphere: e.g. the Heisenberg model.

We return to the Ising model and make things more concrete; in other words, we put the model into mathematical form. We use the canonical-ensemble description from statistical physics. It describes systems whose exterior functions as a heat reservoir.

Each member of the ensemble is represented by a point in the phase space. All the

possible system behavior is described by this phase space together with a probability

distribution on the ensemble. For Ising systems the phase space is discrete because the

only freedom of the system is in the spins. Because each spin can take only two values, the phase space equals $\{-1,1\}^{\mathbb{Z}^d}$. Each point of the phase space we call a (spin) configuration. Denote by σ the spin configuration $\sigma \in \{-1,1\}^{\mathbb{Z}^d}$. The restriction of σ to the finite-sized Λ we refer to as $\sigma_\Lambda \in \{-1,1\}^{\Lambda}$, where

$$\Lambda = \{-L, -L+1, \cdots, L-1, L\}^d \tag{2.1}$$

and $\Lambda^c = \mathbb{Z}^d \setminus \Lambda$.


When the system settles into thermodynamical equilibrium, the probability of the spins in Λ to be in the configuration $\sigma_\Lambda$ is described by the so-called (finite-size) Gibbs measure:

$$\mu_\Lambda^\eta(\sigma_\Lambda) = \frac{1}{Z_\Lambda^\eta} \exp\left(-H_\Lambda^\eta(\sigma_\Lambda)\right) \tag{2.2}$$

We denote by $\langle \cdot \rangle_\Lambda^\eta$ the expectation of the argument with respect to the Gibbs measure $\mu_\Lambda^\eta$. $Z_\Lambda^\eta$ is the partition function, which we obtain by summing the corresponding Gibbs weight over all configurations.

The free energy of the system per spin equals

$$F_\Lambda^\eta = -\frac{1}{\beta|\Lambda|} \log Z_\Lambda^\eta, \quad \text{with } \beta = \frac{1}{k_B T} \tag{2.3}$$

For a set $A \subset \mathbb{Z}^d$, the symbol |A| refers to the number of sites contained in A. For more details and the derivation for the particular choice of the Gibbs measure $\mu_\Lambda^\eta$ we refer to any statistical mechanics book, for instance [44].

The functions $H_\Lambda^\eta(\sigma_\Lambda)$ are the energy functions or the Hamiltonians of the configurations $\sigma_\Lambda$. For the Ising model they are defined as follows:

$$H_\Lambda^\eta(\sigma_\Lambda) = -\beta \sum_{\langle x,y \rangle \subset \Lambda} (\sigma_x \sigma_y - 1) - \beta \sum_{\substack{\langle x,y \rangle \\ x \in \Lambda,\ y \in \Lambda^c}} \sigma_x \eta_y \tag{2.4}$$

where $\langle x,y \rangle$ stands for nearest neighboring sites. This means in particular that $\|x - y\| = 1$, where $\|\cdot\|$ is the Euclidean norm. By η we denote the fixed boundary conditions, i.e. the spin values of the spins in $\Lambda^c$. When we do not include boundary conditions we speak about free boundary conditions. Equivalently we drop the second term in the Hamiltonian. Indeed, the expression for the resulting free energy then is independent of the boundary condition. For the corresponding Hamiltonian we write $H_\Lambda(\sigma)$.

Note that because the interactions are only nearest neighbor, only the η's in the sites $x \in \Lambda^c$ with d(x,Λ) = 1 are involved. $Z_\Lambda^\eta$ is the partition function, which we obtain by summing over the Gibbs weights of all configurations $\sigma_\Lambda$. As we see from (2.4) the spins tend to align to each other.

Mean field: Curie Weiss

In general, the partition function $Z_\Lambda^\eta$ is hard to compute for the Ising model. In one dimension this can be done simply by the so-called transfer-matrix method. When d = 2 there is the famous, much more involved, Onsager solution, which gives a complete analytic expression, also by using transfer matrices. For higher dimensions

however only partial results are known. So some approximation is introduced: the


mean-field theory (we follow closely [68]). With this approximation we are able to

obtain an explicit expression for the Gibbs average of the global magnetization.
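The remark that the one-dimensional partition function can be treated by transfer matrices can be illustrated with a short numerical sketch. It assumes an open chain with free boundary conditions and the Gibbs weight exp(−H) with H as in (2.4), so that every bond contributes a factor exp(β(σ_x σ_y − 1)). The function names are hypothetical and serve only this illustration; the brute-force sum is included as a cross-check.

```python
import itertools

import numpy as np


def partition_function_transfer(n_spins: int, beta: float) -> float:
    """Z of an open Ising chain via the 2x2 transfer matrix (assumed free boundary)."""
    # Entry T[s, s'] = exp(beta * (s * s' - 1)) for s, s' in {+1, -1}.
    transfer = np.array([[1.0, np.exp(-2.0 * beta)],
                         [np.exp(-2.0 * beta), 1.0]])
    ones = np.ones(2)
    # Summing over both end spins corresponds to sandwiching T^(n-1) between all-ones vectors.
    return float(ones @ np.linalg.matrix_power(transfer, n_spins - 1) @ ones)


def partition_function_brute_force(n_spins: int, beta: float) -> float:
    """Z of the same chain by explicit summation over all 2^n configurations."""
    z = 0.0
    for sigma in itertools.product((-1, 1), repeat=n_spins):
        energy = -beta * sum(sigma[i] * sigma[i + 1] - 1 for i in range(n_spins - 1))
        z += np.exp(-energy)
    return float(z)


z_transfer = partition_function_transfer(6, 0.7)
z_brute = partition_function_brute_force(6, 0.7)
print(z_transfer, z_brute)
```

The transfer-matrix route costs a number of operations linear in the chain length, whereas the explicit sum grows exponentially; this is what makes the one-dimensional case simple.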

We look at free boundary conditions and we rewrite the Hamiltonian to

$$H_\Lambda(\sigma_\Lambda) = \beta N(L) - \beta \sum_{\langle x,y \rangle \subset \Lambda} \sigma_x \sigma_y \tag{2.5}$$

where N(L) is the number of nearest-neighbor bond pairs. Because the first term does not depend on the spin variables, it drops out in the Gibbs measure. So we are allowed to ignore it.

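Then we 'expand' every spin $\sigma_x$ around its Gibbs mean value $\langle \sigma_x \rangle \equiv m$ and denote the fluctuations by $\Delta_x = \sigma_x - m$. Rewriting the Hamiltonian (2.4) gives for free boundary conditions

$$H_\Lambda(\sigma_\Lambda) = -\beta \sum_{\langle x,y \rangle \subset \Lambda} (m + \Delta_x)(m + \Delta_y) \tag{2.6}$$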

Now we assume that we can neglect the higher order terms in ∆, so

$$H_\Lambda(\sigma_\Lambda) = \beta m^2 N(L) - \beta m \sum_{\langle x,y \rangle} (\sigma_x + \sigma_y) = \beta m^2 N(L) - 2d\beta m \sum_{x \in \Lambda} \sigma_x \tag{2.7}$$

Here we have assumed that every site x has 2d bonds coming out from it. The corners and intersecting planes on the boundary of Λ are of lower dimension and therefore ignored.

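With the above the partition function easily follows:

$$Z_\Lambda = \mathrm{Tr} \exp\left(-H_\Lambda(\sigma_\Lambda)\right) = \mathrm{Tr} \exp\Big(-\beta m^2 N(L) + 2d\beta m \sum_{x \in \Lambda} \sigma_x\Big) = \exp\left(-\beta m^2 N(L)\right) \left(2\cosh 2d\beta m\right)^{|\Lambda|} \tag{2.8}$$

By Tr we mean the sum over all possible $2^{|\Lambda|}$ configurations. Now we remember that $m = \langle \sigma_x \rangle$, which is the Gibbs expectation of a single spin value. When we put it in, we obtain the so-called mean-field equation for m:

$$m = \frac{1}{Z_\Lambda} \mathrm{Tr}\, \sigma_x \exp\left(-H_\Lambda(\sigma_\Lambda)\right) = \tanh 2d\beta m \tag{2.9}$$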

This equation has three solutions $m^*$, 0 and $-m^*$ whenever 2dβ > 1, i.e. when β > 1/(2d). The critical value $\beta_c$, defined by $2d\beta_c = 1$, marks the end of the region where there is no global magnetization, i.e. where there is no non-zero solution.
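The mean-field equation m = tanh(2dβm) can be explored numerically. The following is a minimal sketch using plain fixed-point iteration; the function name, the starting value, the tolerance and the iteration cap are assumptions made for illustration and are not taken from the text.

```python
import math


def mean_field_magnetization(beta: float, d: int, m0: float = 0.5,
                             tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate m -> tanh(2 * d * beta * m) until successive values differ by less than tol."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(2 * d * beta * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


# d = 2: the critical value is beta_c = 1 / (2 d) = 0.25.
m_high_temperature = mean_field_magnetization(beta=0.1, d=2)   # 2 d beta < 1
m_low_temperature = mean_field_magnetization(beta=0.5, d=2)    # 2 d beta > 1
print(m_high_temperature, m_low_temperature)
```

Starting from a positive value the iteration selects the positive solution when 2dβ > 1; starting from a negative value it selects the mirrored one. Close to the critical value the iteration converges slowly, so the iteration cap matters there.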


It turns out that the above mean-field equation (2.9) (after re-scaling) is the exact solution for $m = \langle \sigma_i \rangle$ of the infinite-range version of the Ising model (see e.g. [68]). This version is also called the Curie-Weiss model, which has as Hamiltonian

$$H_N(\sigma) = -\frac{\beta}{N} \sum_{i \neq j} \sigma_i \sigma_j \tag{2.10}$$

where 1 ≤ i,j ≤ N. Each spin has a (uniform) interaction with any other spin. We will encounter more mean-field equations in Chapter 3.

2.1.2 Thermodynamical limit

In nature macroscopic systems are extremely large, of the order of $10^{23}$ atoms and more.

So it is natural to take the system size limit L → ∞. But when we take this limit the

Hamiltonian goes to infinity as well. The infinite limit expression of the Hamiltonian

does not make any sense. So how to define an infinite-volume Gibbs measure which

depends on this divergent function?

All is settled by defining the infinite-volume Gibbs measure by the condition that

all the conditional probabilities to finite-sized volumes are finite-size Gibbs measures

in a consistent way. The corresponding equations due to this condition are called the

DLR-equations.

Definition 2.1. An infinite-volume measure µ is a Gibbs measure if it satisfies the so-called DLR-equations:

$$\mu(\cdot \mid \eta_{\Lambda^c}) = \mu_\Lambda^\eta(\cdot) \tag{2.11}$$

for all finite Λ and µ-almost every η.

Equivalently: if we condition µ on the configuration η outside Λ we obtain the finite-volume Gibbs measure $\mu_\Lambda^\eta$.

If we look at the finite-size Gibbs measures $\mu_\Lambda^\eta$ and take the sequence L = 1,2,···, it depends on the boundary condition η what will happen for very large L. The sequence does not need to settle to a single limit Gibbs measure. For L → ∞ the sequence may oscillate between two or even more infinite-volume Gibbs measures.

To see some limiting structure one can define metastates. These metastates are

probability measures over the infinite-volume Gibbs measures. Later on we reveal more

details about metastates in Section 2.5.

When we cannot write the Gibbs measure µ as a combination of other Gibbs measures, e.g. µ = (µ′ + µ″)/2, we call µ an extremal Gibbs measure or a pure state. From (2.11) it follows that when µ′ and µ″ are pure states, all the convex combinations in between are infinite-volume Gibbs measures.


Figure 2.1: Typical configuration for $\mu^+$

As T → 0 the inverse temperature β → ∞. From (2.2) we see that for infinite volumes we then obtain only Gibbs measures µ for which the configurations of strictly non-zero weight (with respect to µ) minimize the corresponding energy function $H_\Lambda^\eta(\cdot)$.

We call these states ground states and the corresponding set of non-zero weight configurations ground-state configurations, due to the following property. Let σ be a ground-state configuration. Then for every configuration σ′ that we can create by flipping any finite number of spins in σ, the difference $H_\Lambda^\eta(\sigma') - H_\Lambda^\eta(\sigma) \ge 0$. Note that we need to be careful, because in the infinite-volume limit Λ → ∞ the energy tends to −∞ for a lot of configurations.

This does not mean that there are no states σ′ for which the difference $H_\Lambda^\eta(\sigma') - H_\Lambda^\eta(\sigma) < 0$, where σ is a ground-state configuration. What it does mean, in a dynamical sense, is that the system will stay in the same state for an infinite amount of time.

2.1.3 Some choices of boundary conditions

To get a better understanding of the Gibbs-measure concepts we just introduced, we consider some examples, all for the Ising model defined in Section 2.1.1. For simplicity we restrict ourselves mostly to 2 dimensions.

Uniformly agreeing

First we take as boundary condition η ≡ +1, i.e. every site y has $\eta_y = +1$. Looking at (2.4) we see easily that only the configuration σ ≡ +1 minimizes the Hamiltonian. This means that there is exactly one ground state $\mu^+$, which equals $\mu^+_{\beta \to \infty}(\sigma) = \delta(\sigma \equiv +1)$.


For β large enough but finite, the Gibbs state $\mu^+$ which appears tends to concentrate around this configuration σ ≡ +1. The set of configurations σ which appear with $\mu^+$-measure 1 has the following structure: σ typically has small islands of −-spins in a sea of +-spins. The small islands have small lakes of +-spins, which can contain islands of −-spins, and so on. This set we will refer to as the +-ensemble later on. See Figure 2.1 for an example.

The same is true for the boundary condition η ≡ −1. Then the configurations have small islands of +-spins surrounded by −-spins: the −-ensemble.

We can make this image plausible by proving the absence of large contours: in

literature often referred to as a Peierls bound. Consider all the bonds of the dual lattice

between nearest neighbor spins which have opposite signs. When we take the

union, the resulting closed curves Γ do form the boundary between + and − spins.

Every closed curve we call a contour Γ. The length |Γ| of the contour is the number of

dual bonds involved. Because of the boundary condition η ≡ +1 every contour does

appear as a closed curve. Every set of non-intersecting contours defines exactly one

configuration when we only look at the +-boundary condition and vice versa. Later on

for different boundary conditions a more general definition is needed and more general

curves do appear.

When we look at the definition for the Hamiltonian (2.4) we see that

$$H_\Lambda^+(\sigma = \{\Gamma\}) - H_\Lambda^+(\sigma \equiv +) = 2\beta|\Gamma| \tag{2.12}$$

This means that for the relative probability it holds:

$$\frac{\mu_\Lambda^+(\sigma = \{\Gamma\})}{\mu_\Lambda^+(\sigma \equiv +1)} = \exp(-2\beta|\Gamma|) \tag{2.13}$$

also called the weight or the cost of contour Γ. Note that the weight of a configuration consisting of more contours factorizes into the weights of the single contours making up the configuration.
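The energy cost of contours in (2.12) can be checked numerically on a small box. The sketch below assumes a two-dimensional n × n box with η ≡ +1 outside and evaluates the Hamiltonian (2.4) directly; the total contour length is counted as the number of nearest-neighbor bonds with opposite spins, including bonds to the boundary. The function names are hypothetical.

```python
import numpy as np


def hamiltonian_plus(sigma: np.ndarray, beta: float) -> float:
    """Hamiltonian (2.4) on an n x n box with boundary condition eta = +1."""
    # Bonds inside the box: horizontal and vertical nearest-neighbor pairs.
    inside = (np.sum(sigma[:, :-1] * sigma[:, 1:] - 1)
              + np.sum(sigma[:-1, :] * sigma[1:, :] - 1))
    # Bonds between the box and the boundary; corner sites have two such bonds.
    boundary = (sigma[0, :].sum() + sigma[-1, :].sum()
                + sigma[:, 0].sum() + sigma[:, -1].sum())
    return float(-beta * inside - beta * boundary)


def total_contour_length(sigma: np.ndarray) -> int:
    """Number of nearest-neighbor bonds separating opposite spins."""
    padded = np.pad(sigma, 1, constant_values=1)  # surround the box with + spins
    horizontal = np.sum(padded[:, :-1] != padded[:, 1:])
    vertical = np.sum(padded[:-1, :] != padded[1:, :])
    return int(horizontal + vertical)


rng = np.random.default_rng(1)
beta = 0.9
sigma = rng.choice(np.array([-1, 1]), size=(5, 5))
all_plus = np.ones((5, 5), dtype=int)
energy_difference = hamiltonian_plus(sigma, beta) - hamiltonian_plus(all_plus, beta)
print(energy_difference, 2 * beta * total_contour_length(sigma))
```

Each broken bond costs 2β relative to the all-plus configuration, which is why the Gibbs weight of a configuration factorizes over its contours.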

Now we can prove the statement:

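Peierls bound: Assume β > (log 3)/2 and + boundary conditions. Then for any θ > 0, with $\mu_\Lambda^+$-probability one there are no contours larger than $L^\theta$ when L → ∞.

Proof.

$$\mu_\Lambda^+(\sigma : \exists\, \Gamma \text{ with } |\Gamma| \ge L^\theta) \equiv \mu_\Lambda^+(\sigma : \star) \tag{2.14}$$

where $\star$ abbreviates the event that there is a contour Γ with $|\Gamma| \ge L^\theta$, θ > 0.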


Figure 2.2: Alternating boundary conditions η for Λ

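Because of factorization, $H_\Lambda^+(\{\Gamma_1, \Gamma_2\}) = H_\Lambda^+(\{\Gamma_1\}) + H_\Lambda^+(\{\Gamma_2\})$ and therefore

$$\mu_\Lambda^+(\sigma : \star) = \frac{1}{Z_\Lambda^+} \sum_{\sigma : \star} \exp\left(-H_\Lambda^+(\sigma)\right) \le \sum_{\Gamma : |\Gamma| \ge L^\theta} \exp(-2\beta|\Gamma|)\, \frac{1}{Z_\Lambda^+} \sum_{\sigma' :\ \sigma = \{\Gamma'\} \cup \Gamma} \exp\left(-H_\Lambda^+(\sigma')\right) < \sum_{n = L^\theta}^{\infty} L^2\, 3^n \exp(-2\beta n) \le 2L^2 \exp\left(-(2\beta - \log 3) L^\theta\right) \to 0$$

$$\text{for } L \to \infty,\ \theta > 0,\ \beta > \tfrac{1}{2} \log 3 \tag{2.15}$$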

Note that the proof of the Peierls bound heavily depends on the uniform exponential

size decay of the contour weights.

Alternating

Now we choose the boundary condition η as an alternation of + and − spins, see Figure

2.2. Every boundary spin involved has a sign opposite to its nearest neighbors. Note

that this boundary condition gives rise to contours which are not closed curves.

Because the boundary condition does not favor any sign, the ground state is $\mu(\sigma) = \frac{1}{2}(\delta(\sigma \equiv +1) + \delta(\sigma \equiv -1)) = \frac{1}{2}(\mu^+ + \mu^-)$ when we take even volume sizes. We see that this boundary condition gives rise to a mixture; the ground state is a combination of the two pure states $\mu^+ = \delta(\sigma \equiv +1)$ and $\mu^- = \delta(\sigma \equiv -1)$.


Figure 2.3: Typical configuration for $\mu^{\text{Dobrushin}}$

The Gibbs states now concentrate around both pure states, which together make up the ground state: $\mu = \frac{1}{2}(\mu^+ + \mu^-)$. The measure $\mu^+$ is the Gibbs measure which concentrates only on the +-ensemble and $\mu^-$ concentrates on the −-ensemble.

We claim that this means that there are no interfaces involved with probability one.

By an interface we mean a contour which crosses the square lattice (so is at least of

order L). Of a (vertically-crossing) interface maximally half of the vertical bonds do

cancel in considering the weight; the weight of an interface is at most exp(−β|Γ|) so

we can apply again the Peierls bound for proving the claim.

Dobrushin

Now we create a boundary condition for which interfaces do exist. We choose η as

follows. For the upper half of the boundary we take all the spins +1 and for lower

half we do the opposite: all the spins −1. This boundary condition is also called the

Dobrushin boundary condition.

The possible form of the ground states is dimension-dependent. For d = 2 ground

states and for d = 3 also Gibbs states do exist with an interface like in Figure 2.3. This

means that the interfaces appear with non-zero probability at a particular position.

Chaotic size dependence

When we choose the boundary conditions carefully we can ensure that the system does not have a limiting Gibbs measure. Take for even system size L the boundary condition + and for odd L the boundary condition −. Then the restricted even sequence $\mu_{2L}$ converges to the unique Gibbs measure $\mu^+$. The restricted odd sequence $\mu_{2L+1}$ converges to $\mu^-$. However the full sequence $\mu_L$ oscillates between $\mu^+$ and $\mu^-$ and never settles to a


limit. It depends on the volume size what the measure looks like even when this size

goes to infinity. This limiting dependence we call size dependence. Instead of even and

oddness we can also choose the + or − boundary conditions in a random way. Then for

very large L the measure still may depend randomly on L. This is called chaotic size

dependence.

Quenched-random boundary conditions

The environment around the system can be changing randomly in time. However in

reality when this happens then these changes are typically adiabatically slow with re-

spect to the dynamical changes of the system. To model this we assume the external

environment is fixed and is an outcome of the random variables making the randomness

of the environment. This type of randomness we call quenched disorder. However we

must be a bit careful when we say that the external environment is fixed. Although

the disorder is quenched and therefore fixed, the boundary condition changes randomly

when we look at increasing sequences of volumes which are independently chosen of

the disorder.

Choose all $\eta_y$ i.i.d. (independent and identically distributed) according to the following distribution

$$P(\eta_y = \pm 1) = \frac{1}{2} \tag{2.16}$$

What will happen now? This is the question a considerable part of this thesis is all

about, in particular Chapter 4. Are there Gibbs states or ground states involved which

do contain all the above features: mixtures, interfaces, + and - ensembles? The answer

is not obvious from the beginning. Because although the probability of having interfaces

goes to zero in the η-distribution (Dobrushin-type configurations), it certainly does not

immediately follow that the Gibbs probability µ also goes to zero for the interfaces.

Furthermore there is no such thing as a limiting Gibbs state, because chaotic size

dependence is involved. For sparse enough sequences the limiting measure oscillates

randomly between measures concentrated on the +-ensemble and measures concen-

trated on the −-ensemble. For large enough volumes, with η probability one, neither

interfaces nor mixtures will occur. The above model is one of the simplest in which one

can study these things rigorously. The concepts have been developed for spin glasses,

in which much less is clear even at a heuristic level.


2.2 Spin glasses

2.2.1 Ising spin glasses

In the previous section we have considered the Ising model, which models metals with

uniform spin-interactions. The spin glasses we now consider are modelled by a system with the same concepts, but now the spin-interactions will be modelled by random interactions. The atoms do not lie regularly on a crystal but are randomly placed in space. In these spin glasses these random positions change very slowly in time. Compared to the dynamics of the spins these positions are fixed. After some time the spin values look more or less randomly distributed but do not change in time anymore. This rather unusual behavior is seen in some alloys of ferromagnets and conductors like AuFe and CuMn. In these metals the so-called RKKY spin interactions are rapidly oscillating and slowly decreasing. Because the atoms are randomly placed, the sign of the interactions is also randomly distributed.

We use the term glass because of the similarity with the glass of windows, which

are fluids but where the flow is almost infinitely slow. In the literature the term spin

glass is often used for a wider class of models which have a high amount of quenched

disorder in common but where the connection to the alloys is often lost.

Spin-glass models with infinite-range interactions turn out to be useful also for

explaining pattern recognition in neural networks, in error-correcting codes, image

restoration, and in all kinds of optimization problems [68].

For an explanation of the spin-glass phenomena the Edwards-Anderson model has

been introduced. This is an Ising spin glass with only nearest neighbor interactions

and therefore has only interactions between neighboring pairs of spins. The rapidly

oscillating interactions are modelled by i.i.d. Gaussians.

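The Hamiltonian is as follows

$$H_\Lambda(\sigma) = -\beta \sum_{\langle i,j \rangle \in \Lambda} J_{ij} \sigma_i \sigma_j - h\beta \sum_{i \in \Lambda} \sigma_i \tag{2.17}$$

The $J_{ij}$ are quenched i.i.d. non-trivial random variables with common mean $\mathbb{E}[J_{ij}] = \mathbb{E}[J_{kl}] \equiv \mathbb{E}[J]$ and h is a uniform magnetic field. The set Λ is a subset of $\mathbb{Z}^d$. Because no boundary conditions enter in the Hamiltonian, the boundary conditions here are free.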

In real life when we study spin-systems we expect to observe only quantities which

depend on macroscopic properties. The couplings are microscopic and in practice

we do not know all the random places of the individual atoms. When we make the

setup we do this without knowing the particular realization of the couplings. So for a

proper measurement we need that the macroscopic properties we measure are coupling-


independent. Therefore we should observe only states which we can create in a coupling-

independent way. These states we call observable states. If a state is not observable we

will call it an invisible state [67].

When h = 0 it depends on $\mathbb{E}[J]$ and on β for which kind of configurations there is a tendency to order in the system. When β is low enough, $\beta < \beta_c$, there is no ordering in the system at all. The spins behave approximately independently of each other; the system shows paramagnetic behavior. For some systems only this behavior is possible and $\beta_c = \infty$.

When $\beta \ge \beta_c$ there are three possibilities. When $\mathbb{E}[J] > 0$ the system prefers ferromagnetic behavior: all spins tend to have equal values. For $\mathbb{E}[J] < 0$ the spin values of nearest-neighbor spins tend to be different from each other, which means the system prefers to be anti-ferromagnetic. The third possibility, happening typically when $\mathbb{E}[J] = 0$, is the spin-glass phase, which we will consider now.

A good way to see these tendencies is to look at the so-called Edwards-Anderson order parameter

$$q_{EA} = \frac{1}{|\Lambda|} \sum_{i \in \Lambda} \langle \sigma_i \rangle^2 \tag{2.18}$$

where $\langle \cdot \rangle$ is the Gibbs mean of the argument and |Λ| is the total size or the volume of Λ. For paramagnetic behavior $\langle \sigma_i \rangle = 0$ for every site i, making $q_{EA} = 0$. When the system behaves like a ferromagnet or an anti-ferromagnet, $\langle \sigma_i \rangle^2 = 1$ for every site i. This makes $q_{EA} = 1$: its maximal possible value.

When we set the average $\mathbb{E}[J]$ of the couplings to zero the system prefers as many spin pairs for which $\sigma_i \sigma_j = +1$ as for which $\sigma_i \sigma_j = -1$.

With the field h we have some control over the spin values, when the average $\mathbb{E}[J] = 0$ or is small in magnitude compared to h. When we put h > 0 the system gives preference to +-spins; when h < 0, the −-spins are more favored. However when the field is not too large there is an intimate interplay between the tendency due to the field h and the tendency due to the couplings $J_{ij}$.

Denote by [·] the coupling average: the average over the disorder $J_{ij}$. For calculating e.g. the averaged free energy we first calculate the trace as before with a fixed random realization. Then we take the average over the randomness. We do so because the change in randomness over time is adiabatically small compared to the spin-value changes due to thermal activity.

The free energy per spin turns out to be a self-averaging quantity. This means that

with probability 1 the free energy per spin for a fixed realization of the couplings is

equal to the coupling mean when we take the system size limit Λ → ∞. So the limit is

independent of the realization of the couplings. This is as it should be, because the free

energy is a macroscopic object.


When the temperature is not too low, $\langle \sigma_i \rangle = 0$ for any site i, because of paramagnetism. This makes both the average magnetization $m = [\langle \sigma_i \rangle] = 0$ and $q = [\langle \sigma_i \rangle^2] = 0$. For low enough temperature, in general $\langle \sigma_i \rangle \neq 0$ due to the quenched couplings. However when we average over the disorder it can happen, due to alternating signs, that $[\langle \sigma_i \rangle] = 0$ although $\langle \sigma_i \rangle \neq 0$ for the typical realizations of the quenched couplings. But $q = [\langle \sigma_i \rangle^2] = [q_{EA}] > 0$. This scenario is called the spin-glass phase, which is the third possibility we have mentioned earlier on.
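The way the two order parameters separate the three phases can be made concrete with a toy computation. The sketch below takes an array of local Gibbs means ⟨σ_i⟩ as given (rows standing for disorder realizations, columns for sites) and simply averages; the input values are invented for illustration and are not the result of any simulation, and the function name is hypothetical.

```python
import numpy as np


def order_parameters(local_magnetizations) -> tuple[float, float]:
    """Return (m, q): the average of <sigma_i> and of <sigma_i>^2 over sites and realizations."""
    mags = np.asarray(local_magnetizations, dtype=float)
    m = float(mags.mean())
    q = float((mags ** 2).mean())
    return m, q


# Illustrative inputs only: frozen local means with alternating signs mimic a spin-glass phase.
paramagnet = np.zeros((2, 4))
ferromagnet = np.ones((2, 4))
spin_glass_like = np.array([[0.8, -0.8, 0.8, -0.8],
                            [-0.8, 0.8, -0.8, 0.8]])

for name, mags in [("paramagnet", paramagnet),
                   ("ferromagnet", ferromagnet),
                   ("spin-glass-like", spin_glass_like)]:
    print(name, order_parameters(mags))
```

Only the combination m = 0 with q > 0 singles out the frozen but sign-disordered situation; m alone cannot distinguish it from a paramagnet.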

2.2.2 Mean field: SK-model

Because calculations for the short-range EA-model are extremely hard, we can try to do the mean-field approximation like we did in the Ising model. The result is the infinite-range Sherrington-Kirkpatrick model, of which the Hamiltonian is

$$H_N(\sigma) = -\frac{\beta}{\sqrt{N}} \sum_{i<j} J_{ij} \sigma_i \sigma_j - h\beta \sum_{i} \sigma_i \tag{2.19}$$

again with $J_{ij}$ the outcome of an i.i.d. random distribution. Often one takes $J_{ij}$ standard Gaussian, $J_{ij} \sim N(0,1)$, and the external field h = 0. It is believed that the infinite-dimension limit d → ∞ of the free energy density of the EA-model is equal to the free energy density of the SK-model, as in the SK-model each spin interacts directly with infinitely many other spins.

For the SK-model one can show that the spin-glass phase does occur when the

temperature is low enough and the coupling mean not too large [68].

When one tries to take the limit N → ∞ various limit Gibbs states µ seem to appear. In general a Gibbs state then looks like a mixture of infinitely many pure states $\mu_\alpha$:

$$\mu(\sigma) \approx \sum_\alpha w(\alpha)\, \mu_\alpha(\sigma) \tag{2.20}$$

where w(α) is the relative weight of the pure state $\mu_\alpha$ [66]. Note that the decomposition weights as well as the $\mu_\alpha$ depend on the disorder J. Recently it was proven that in the limit N → ∞ each configuration is a ground state [67]. However taking the infinite-volume limit is problematic. A slightly better-behaved class are the Hopfield models.

2.3 Hopfield model

Our brain is a complex structure of many neurons which interact with each other in

a non-trivial long-range way. For instance the cerebral cortex consists already about


$$\sigma_j(t) \longrightarrow h_i(t) = \sum_{j:\, j \neq i} J_{ij} \sigma_j(t) \longrightarrow \sigma_i(t + \Delta t) = \operatorname{sign} h_i(t)$$

Figure 2.4: The zero-temperature neuron dynamics

neurons. Nowadays there is a lot of research in this area. It seems that our brain

network is scale free. It has the structure of a so called small world network: i.e. small

path length between two neurons in the order of the path length in a random edge neural

network, but with a relatively high amount of connections (e.g. [18]).

Originally Pastur and Figotin invented the Hopfield model as a model for a special

type of spin-glasses. Then Hopfield came up with it independently as a model for neural

networks as above. However it is a simplified model and the geometric structure of the

neural connections is totally missing.

2.3.1 Setting

Assume that the neural network contains N neurons and that every neuron interacts

with any other neuron (i.e. having mean-field like interactions). These interactions are

composed out of 2-neuron interactions only. Now consider a neuron i. The state of the

neuron is labelled by the variable $\sigma_i$. If the neuron is excited then $\sigma_i = +1$. When $\sigma_i = -1$ the neuron is at rest. As a current is going from neuron j into neuron i, the signal is altered due to chemical transmitters in the neuron i itself. This synaptic efficacy we denote by $J_{ij}$. It alters the signal $\sigma_j$ into $J_{ij}\sigma_j$. The total input $h_i$ of neuron i equals

$$h_i = \sum_{j:\, j \neq i} J_{ij} \sigma_j \tag{2.21}$$

2.3.2 Dynamics and ground states

Let us define the dynamics of this model. If $\sigma_i(t)$ denotes the state of a neuron i at time t, then it becomes (or stays) excited at time t + ∆t whenever $h_i$ exceeds a threshold $\theta_i$. Otherwise it is at rest at time t + ∆t:

$$\sigma_i(t + \Delta t) = \operatorname{sign}(h_i(t) - \theta_i) \tag{2.22}$$

For simplicity we set this threshold to zero. Then the state of neuron i at time t + ∆t becomes [19]

$$\sigma_i(t + \Delta t) = \operatorname{sign}\Big(\sum_{j:\, j \neq i} J_{ij} \sigma_j(t)\Big) \tag{2.23}$$


so without an extra constant term. See Figure 1.3 and also Figure 2.4.
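The update rule (2.23) can be sketched in a few lines. The code below assumes that all neurons are updated simultaneously and that sign(0) is read as +1; neither convention is fixed by the text, and the function names and the iteration cap are hypothetical.

```python
import numpy as np


def zero_temperature_step(J, sigma) -> np.ndarray:
    """One parallel update sigma_i <- sign(sum_{j != i} J_ij sigma_j), with sign(0) taken as +1."""
    J = np.array(J, dtype=float)
    np.fill_diagonal(J, 0.0)          # exclude the term j = i
    h = J @ np.asarray(sigma)         # total input h_i of every neuron
    return np.where(h >= 0, 1, -1)


def run_until_fixed(J, sigma, max_steps: int = 100) -> np.ndarray:
    """Iterate the update until the configuration no longer changes, or max_steps is reached."""
    sigma = np.asarray(sigma)
    for _ in range(max_steps):
        new_sigma = zero_temperature_step(J, sigma)
        if np.array_equal(new_sigma, sigma):
            return sigma
        sigma = new_sigma
    return sigma


# Uniform couplings: a single disagreeing neuron is pulled into line.
final_state = run_until_fixed(np.ones((4, 4)), [1, 1, 1, -1])
print(final_state)

# With parallel updates two neurons can swap states instead of settling.
swapped = zero_temperature_step(np.ones((2, 2)), [1, -1])
print(swapped)
```

The second example shows why an iteration cap is needed: under simultaneous updates the dynamics need not reach a fixed point and can oscillate between configurations.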

A particular fixed configuration of neurons we denote by the N-dimensional vector $\xi \in \{-1,1\}^{\otimes N}$. We call this a pattern. If this pattern ξ is a stable fixed point of the

evolution defined by (2.23) then we say it is in the system’s memory. If we assume

that there are no metastable fixed points, then the following happens. If there is only

one stable fixed point, then whatever the initial state of the neurons, after long enough

time the system will be in the state defined by ξ, i.e. the system remembers the pattern.

Of course there can be more patterns in the memory. Then if we pick at random an

initial configuration of neurons, at the end we will always end up in one of the patterns.

Furthermore every pattern of the memory can be reached with non-zero probability. In

general however it can happen that the dynamics get stuck in a metastable fixed point.

To illustrate this process better, imagine you want to answer a question for a quiz.

It can happen that you need a bit of time to remember the answer, because the question

appears to be difficult. But still you have the feeling that you might know this one.

In the process of remembering you try to find associations with the -according to you-

presumably right answer. You are fine-tuning your first thought. You are altering your initial condition to a better one, one with less energy. Then after some time you think you know an answer and you believe it is right. You have recovered something from your memory: either you are in the state of the right answer or in a state corresponding to a wrong one.

A good way of measuring how well a configuration σ agrees with a given pattern ξ is to look at the corresponding order parameter

$$q_\xi = \frac{1}{N} \sum_{i=1}^{N} \xi_i \sigma_i \tag{2.24}$$

When $q_\xi = 1$ the configuration σ is equal to the pattern ξ, and when $q_\xi = -1$ it is equal to the opposite: σ = −ξ. The parameters $q_\xi$ are also called overlap parameters. Whenever $\pm q_\xi > 0$ the configuration ±σ agrees with the pattern ξ for more than half of the neurons.

A good way of studying this system is to use the Gibbs description of statistical mechanics. Then the memory of the system is formed by the ground states of the model. The equilibrium features are governed by the Gibbs measure. We choose the Hamiltonian as

$$H_N(\sigma) = -\sum_{i=1}^{N} h_i \sigma_i = -\sum_{i=1}^{N} \sum_{j:\, j \neq i} J_{ij} \sigma_i \sigma_j \tag{2.25}$$

Now the zero temperature dynamics of this system is equivalent to the earlier-defined


neuron dynamics (2.23). We see this as follows. The energy equals

$$H_N(t + \Delta t) = -\sum_{i=1}^{N} h_i(t)\, \sigma_i(t + \Delta t) \tag{2.26}$$

For zero temperature the Gibbs measure for time t + ∆t becomes a δ-measure on the configurations σ(t+∆t) for which the energy $H_N(t + \Delta t)$ is minimal. As we see from (2.26), this is the configuration σ for which for every neuron i, $\sigma_i(t + \Delta t) = \operatorname{sign} h_i(t)$.

It is easy to see that the energy cannot increase in this operation. However it is not clear

that in the end when t is very large, we will end up in a single ground state configuration

or oscillate between more.

The above dynamics is deterministic. In reality the neurons might not be determin-

istic in this way. Furthermore we need to allow for some probability that the energy

in the operation can increase. This is to have the ability to get out of the local minima

formed by the metastable fixed points. Therefore we introduce a parameter β to con-

trol the uncertainty in the model. The smaller β, the more uncertainty. When β = 0

the neurons behave perfectly random, because the energies of the configurations do not

matter. Taking β → ∞ makes the system behave like the zero-temperature dynamics of

(2.23).
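The role of β as a noise parameter can be illustrated with a stochastic update rule. The text does not specify one; the sketch below assumes a heat-bath (Glauber-type) rule in which neuron i becomes excited with probability 1/(1 + exp(−2βh_i)), which is the conditional Gibbs probability for a weight proportional to exp(βh_i σ_i). The function names are hypothetical.

```python
import numpy as np


def probability_excited(h, beta: float):
    """Assumed heat-bath rule: P(sigma_i = +1) = 1 / (1 + exp(-2 * beta * h_i))."""
    return 1.0 / (1.0 + np.exp(-2.0 * beta * np.asarray(h, dtype=float)))


def stochastic_step(J, sigma, beta: float, rng: np.random.Generator) -> np.ndarray:
    """One parallel noisy update of all neurons at inverse temperature beta."""
    J = np.array(J, dtype=float)
    np.fill_diagonal(J, 0.0)                      # exclude the term j = i
    h = J @ np.asarray(sigma)
    p_up = probability_excited(h, beta)
    return np.where(rng.random(len(h)) < p_up, 1, -1)


rng = np.random.default_rng(0)
print(probability_excited(1.0, 0.0))    # beta = 0: the input is ignored
print(probability_excited(1.0, 50.0))   # large beta: practically deterministic
print(stochastic_step(np.ones((4, 4)), [1, 1, 1, -1], 50.0, rng))
```

At β = 0 both states are equally likely regardless of the input, while for large β the rule reduces to the sign rule of (2.23), matching the two limits described above.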

As an example we take $J_{ij} \equiv +1$. Then the system transforms into the Curie-Weiss model. The behaviour of this model is well understood. The ground states are σ ≡ ±1. It is also easy to see that the same is true when $J_{ij}$ contains only one pattern ξ, i.e. $J_{ij} = \xi_i \xi_j$. Indeed the Hamiltonian (2.25) is minimized whenever σ ≡ ± sign ξ. In this case the pattern coordinates $\xi_i$ are allowed to have a more general distribution, but they need to be i.i.d.

In reality one often knows only the global properties of the memory of the system.

Furthermore the memory also changes in time. But these changes in time are very slow compared to the time in which the states of the neurons are changing. A good way of modelling this is that instead of choosing the pattern ourselves we let the patterns be chosen according to a quenched random distribution. All the randomness is i.i.d. and is thus described by a product measure. The measure with respect to this randomness ξ we denote by $\mathbb{P}$. Usually one takes the randomness as symmetric Bernoulli: ξ is uniformly distributed on $\{-1,1\}^{\otimes N}$.

Note an important difference between the 1-pattern case and the SK-model. In the SK-model all the bonds have independent disorder $J_{ij}$. In the 1-pattern Hopfield model pairs of bonds with a common neuron, e.g. (ij) and (jk), have highly dependent disorder.

We generalize from the 1-pattern system to the finite p-pattern system with patterns $\xi^1, \ldots, \xi^p$. We take for the $J_{ij}$

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu \tag{2.27}$$

For the patterns we take a random outcome of the uniform distribution on $\{-1,1\}^{\otimes N \otimes p}$. In the literature this choice of the $J_{ij}$ is referred to as the Hebb rule. Because of the scaling, the quenched patterns are asymptotically orthonormal to each other:

$$\lim_{N \to \infty} \frac{1}{N}\, \xi^\mu \cdot \xi^\nu = \delta_{\mu\nu} + O\left(\sqrt{1/N}\right) \tag{2.28}$$

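It is easy to see that the states $\sigma = \pm\xi^\mu$ are equilibrium states of the system. Indeed if we insert $\sigma(t) = \pm\xi^\mu$ into the β → ∞ dynamics (2.23) we obtain for N → ∞

$$\sigma_i(t + \Delta t) = \mathrm{sign}\Big(\sum_{j:\, j\neq i} J_{ij}\,\sigma_j(t)\Big) = \mathrm{sign}\Big(\pm\frac{1}{N}\sum_{\nu=1}^{p}\sum_{j:\, j\neq i} \xi_i^\nu \xi_j^\nu \xi_j^\mu\Big) = \mathrm{sign}\Big(\pm\sum_{\nu=1}^{p} \xi_i^\nu\, \delta_{\nu\mu}\Big) = \mathrm{sign}(\pm\xi_i^\mu) = \pm\xi_i^\mu \qquad (2.29)$$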

In other words σ(t) = σ(t + ∆t). This makes $\sigma \equiv \pm\xi^\mu$ fixed point configurations and therefore equilibrium states. However, it is not clear from this calculation whether these states are also ground states, because they could be unstable fixed points. Furthermore we might not be allowed to omit the $O(\sqrt{1/N})$ term in the calculation as we have done. It could also be that the states can only be reached by a set of $\mathbb{P}_\xi$-measure 0. Or maybe there are more ground states than these fixed points. A more detailed analysis shows that the 2p states $\sigma = \pm\xi^\mu$ are indeed the only ground states for this system (e.g. [10]).
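The fixed-point argument of (2.29) can be checked numerically for a small system. The following is only a minimal sketch: it assumes the Hebb couplings (2.27) with the diagonal set to zero and a parallel sign update as the zero-temperature dynamics (the precise update order of (2.23) is not restated here), and the names `hebb_couplings` and `zero_temperature_step` are illustrative.

```python
import random

def hebb_couplings(patterns):
    """Hebb rule: J[i][j] = (1/N) * sum_mu xi_i^mu * xi_j^mu, with J[i][i] = 0."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                J[i][j] = sum(xi[i] * xi[j] for xi in patterns) / n
    return J

def zero_temperature_step(J, sigma):
    """One parallel update sigma_i -> sign(sum_j J_ij sigma_j); sign(0) is taken as +1."""
    new = []
    for row in J:
        field = sum(Jij * s for Jij, s in zip(row, sigma))
        new.append(1 if field >= 0 else -1)
    return new

rng = random.Random(0)          # fixed seed, chosen arbitrarily
N, p = 200, 3                   # p/N is small, so the crosstalk term is small
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
J = hebb_couplings(patterns)

# A pattern is a fixed point if one update step leaves it unchanged.
fixed = [zero_temperature_step(J, xi) == xi for xi in patterns]
print(fixed)
```

For p much smaller than N the crosstalk between patterns is of order $\sqrt{p/N}$, which is why each stored pattern is reproduced by a single update in this regime.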

2.3.3 System-size-dependent patterns

When the number p of patterns depends on the system size N several things can happen. Denote by α the ratio between the number of patterns and the system size: α = p/N. We consider the phase regions in the (T,α) plane, see Figure 2.5. Whenever $T > T_g$, where $T_g = 1 + \sqrt{\alpha}$, the system behaves like a paramagnet.

There is a curve $T_c$ such that below this line all of the p patterns are stable, i.e. absolute minima of the free energy. Between the curves $T_c$ and $T_g$ there is a different curve $T_M$, which bounds the region in which the patterns are still metastable.

Figure 2.5: Phase diagram for the Hopfield model (after [3])

Between the curves $T_c$ and $T_M$ the patterns become metastable; they are local minima of the free energy. The global minima correspond to the spin-glass states. These states have vanishingly small overlap $q_\mu$ (of order $O(1/\sqrt{\alpha N})$) with all of the patterns µ. So the system will remember a pattern only if the initial configuration is close enough to it.

Above $T_M$ and below $T_g$ the spin-glass states are the only ones present, so none of the patterns can be remembered. In this spin-glass phase ageing occurs: the decay of the energy becomes slower for longer waiting times. According to numerical research the spin-glass properties seem to be closely related to properties of the SK-model, which we obtain by taking the limit α → ∞. Analytic research of the corresponding dynamics is highly complicated; it cannot be described only by the overlap values $q_{\mu,t}$ and the neuron states $\sigma_t$ at times t [3, 58, 68].

For a more extensive discussion of the Hopfield model, including some history and

its relation with the theory of neural networks, see [10, pag. 133 and further] or [12].

2.3.4 Some generalizations

To add more realism we take into account that not every neuron needs to be connected with every other. To this end we define the matrix $\Lambda_{ij}$, which represents the structure of the network. If neuron i is connected with j then $\Lambda_{ij} = 1$, otherwise $\Lambda_{ij} = 0$. When the network is undirected the matrix $\Lambda_{ij}$ is symmetric. Now we use the Hopfield dynamics of (2.23) but we replace $J_{ij}$ by the value $\Lambda_{ij} J_{ij}$.

Numerical research seems to suggest that the task of recognition of a finite number of patterns is better performed (i.e. higher overlap after a long time) when we decrease the clustering coefficient of a network [48]. By the clustering coefficient we mean the following. Take a vertex v of a graph. Suppose v has N neighbouring vertices. Then at most N(N − 1)/2 edges can exist between these neighbouring vertices. Denote by $C_v$ the actual number of these edges divided by the maximal number possible. The clustering coefficient C is the mean of $C_v$ over all vertices v.
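The definition of the clustering coefficient translates directly into a few lines of code. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets; the function name and the convention $C_v = 0$ for vertices with fewer than two neighbours are our own choices and are not taken from [48].

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Mean over vertices v of C_v = (edges among neighbours of v) / (k (k - 1) / 2).

    adj maps each vertex to the set of its neighbours (undirected graph).
    Vertices with fewer than two neighbours contribute C_v = 0 (a convention).
    """
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the edges that actually exist between pairs of neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}        # every neighbour pair is linked
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}       # no neighbour pair is linked
print(clustering_coefficient(triangle), clustering_coefficient(star))
```

The two test graphs are the extreme cases: a triangle is maximally clustered, a star not at all.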

Another generalization is to allow the neurons to take more values. The neuron states become $\sigma_i \in \{1,\ldots,q\}$ instead of $\sigma_i = \pm 1$. In spin-glass language we say that we have q-state Potts spins instead of Ising spins. For the patterns we can still restrict to $\xi_i = \pm 1$. When the system has only one pattern in its memory, we easily see that the ground states are of the following type: every site i for which $\xi_i = +1$ has $\sigma_i \equiv j$ and every site for which $\xi_i = -1$ has $\sigma_i \equiv k$, with $j \neq k$.

Of course it is more realistic to consider Potts patterns when the neuron states are Potts spins. Then the ground states are $\{\xi^\mu\}$ with probability one whenever the number p of patterns is not too large: for arbitrary α with 0 ≤ α < 1, p < (α/ln q) ln N [37]. Note that p is allowed to tend to infinity when N → ∞.

2.4 Scenarios for the spin glass

In the last decades researchers have tried to get an analytic, rigorous grip on the phenomena of short-range spin glasses. During this process various competing theories were formed, none of which is conclusive. Most theories can be grouped into three scenarios: the droplet picture of Fisher and Huse [33], the chaotic-pairs picture of Newman and Stein [66] and the replica symmetry breaking picture, which resulted from the mean-field theory for the infinite-range SK spin glass developed by Parisi [56].

2.4.1 Droplet-picture short-range spin-glasses

At the end of the eighties Fisher and Huse introduced a so-called droplet picture [33] to describe the equilibrium phenomena for short-range spin glasses. For a clear example of this picture we take a model which has the energy function (2.17) of the Edwards-Anderson model. We choose the couplings (the spin interactions) symmetrically and continuously distributed. We set the field h to zero.

For finite-dimensional Ising spin glasses there are two possibilities for the equilibrium behaviour at small T. There is a critical dimension $d_c$ such that:

$d < d_c$: The system is paramagnetic at all T > 0, so $T_c = 0$.

$d \geq d_c$: There exists exactly one pair of (flip-related) ground states. For $0 < T < T_c < \infty$ the behaviour of the system is described by a small non-zero density of excitations with a volume which is non-zero relative to the (infinite-sized) system.

Now we take a particular ground state G and look at its excitations. Because T > 0

there are excited regions where the spins have opposite values compared to G. As in the

Ising model we can define contours by the boundaries of these regions. The contours

do exist on various scales. For a large enough system the probability of having at least

one large contour is of order 1, although the probability of having a particular large

contour is small. For low enough T we assume that the contours with the lowest energies

dominate the physics. These contours we call droplets. More precisely

Definition 2.2. A droplet $D_L(j)$ of length scale L is a contour Γ enclosing site j which has the minimal energy among all contours Γ enclosing j and containing between $L^d$ and $(2L)^d$ spins.

The energy $F_L(j)$ of a droplet $D_L(j)$ equals

$$F_L(j) = \min_{\Gamma \text{ encl. } j,\; L^d \leq |\Gamma| < (2L)^d} E(\Gamma) \qquad (2.30)$$

where $E(\Gamma)$ is the energy of the configuration {Γ} relative to the ground state energy $E(\emptyset)$.

In the case of an Ising ferromagnet (i.e. (2.17) with $J_{ij} \equiv 1$ and h = 0) we have $F_L(j) = O(L^{d-1})$. For the current Ising spin glass with random symmetric couplings the droplet energy is expected to be much lower, because there is a large amount of frustration and there are many configurations which are almost like the ground states. However, for a generic contour the energy still scales like $L^{d-1}$. Given this we make the scaling ansatz:

$$F_L(j) = O(L^{\theta}), \quad \theta < d - 1 \qquad (2.31)$$

In [33] it is argued that

$$\theta \leq \frac{d-1}{2} \qquad (2.32)$$

However the arguments in favor of (2.32) use some assumptions which need not hold in general [24].

For θ > 0 we expect the following picture. Because of the almost degenerate ground state, the weight of the event $F_L \approx 0$ is positive even at zero energy. As we see from the Hamiltonian, only the droplets at length scale L with energy $F_L \leq O(T)$ contribute significantly to the Gibbs measure. When $T \ll O(L^{\theta})$, only a small fraction of these droplets appears. Because of the positive weight of $F_L$ near zero, some of the droplets will be excited at any positive temperature. These properties make that θ > 0 implies $d \geq d_c$.

When θ < 0, the energy cost is so low that the entropy will dominate and the droplet picture breaks down. Because every spin can be flipped with arbitrarily small energy cost (by taking the system size large enough), the system is expected to behave like a paramagnet. Therefore θ < 0 implies $d < d_c$.

2.4.2 Parisi’s Replica Symmetry breaking picture

Parisi cleverly conjectured in the eighties an expression for the free energy function of

the SK-model and also an expression of the (Parisi) overlap distribution [56]. The idea

of this solution is also known as replica symmetry breaking (RSB). Recently the con-

jectured free energy expression was mathematically rigorously proven to be the correct

expression by Talagrand [74] who used in his proof results of Guerra and co-workers.

However, some of the aspects of Parisi’s RSB-picture are still open.

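This RSB picture predicts that in the infinite-volume limit states appear which are composed of infinitely many pure states. It is not clear what a pure state means for the infinite-range SK-model. Assuming we can still define overlaps between different 'pure states', the overlap between 'pure states' α and α' is

$$q_{\alpha\alpha'} = \frac{1}{N}\sum_{i=1}^{N} \langle\sigma_i\rangle_{\alpha}\, \langle\sigma_i\rangle_{\alpha'} \qquad (2.33)$$

By $\langle\,\cdot\,\rangle_{\alpha}$ we mean the expectation with respect to the pure Gibbs state $\mu_\alpha$. From this quantity we can read off how much state $\mu_\alpha$ looks like state $\mu_{\alpha'}$. For every pure state $\mu_\alpha$ it holds

$$q_{\alpha\alpha} = \frac{1}{N}\sum_{i=1}^{N} \langle\sigma_i\rangle_{\alpha}^{2} = q_{EA} \qquad (2.34)$$

where $q_{EA}$ is the same parameter as in (2.18). Furthermore we see

$$-q_{EA} \leq q_{\alpha\alpha'} \leq q_{EA} \qquad (2.35)$$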

It is impossible to construct the pure states explicitly. However, some things can still be said about the distribution of the overlaps. We choose at random two pure states from the Gibbs measures appearing in the limit of the SK-model. We denote by P(q)dq the probability that the overlap of these two states lies between q and q + dq. This distribution is also called the Parisi overlap distribution. It looks like

$$P_J(q) = \sum_{\alpha,\alpha'} w_J(\alpha)\, w_J(\alpha')\, \delta(q - q_{\alpha\alpha'}) \qquad (2.36)$$

For high temperature the SK-model becomes a paramagnet and P(q) = δ(q). However, the symmetric overlap distribution P(q) is highly non-trivial when the temperature T is low enough: it consists of many δ-functions of non-zero weight. Furthermore it is coupling dependent, i.e. a non-self-averaging object. When we average over the couplings the resulting distribution turns out to be continuous and non-zero between two δ spikes at $\pm q_{EA}$. Furthermore there is chaotic size dependence: when we look at two large volumes whose sizes differ greatly, then in general P(q) also looks very different.

Another interesting concept which holds according to Parisi's theory is ultrametricity. Recall that for two equal pure states the overlap equals $q_{EA}$. With this we create a distance function between two pure states

$$d_{\alpha\alpha'} = q_{EA} - q_{\alpha\alpha'} \qquad (2.37)$$

Then we take at random three states 1, 2, 3. With these states we can make three pairs. Ultrametricity then claims that the two largest of the three distances are equal, i.e. either

$$d_{23} \leq d_{12} = d_{13} \quad\text{or}\quad d_{13} \leq d_{12} = d_{23} \quad\text{or}\quad d_{12} \leq d_{13} = d_{23} \qquad (2.38)$$

So the three overlaps of the state pairs are intimately related. A mathematically rigorous proof of this ultrametric structure is still an open problem.
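As a small illustration of the ultrametric condition, the following sketch tests whether the two largest of three pairwise distances coincide. The helper names, the tolerance `tol` and the numerical overlap values are hypothetical and serve only as an example.

```python
def distance(q_ea, overlap):
    """Distance between two pure states with the given overlap: d = q_EA - q."""
    return q_ea - overlap

def is_ultrametric_triple(d12, d13, d23, tol=1e-9):
    """True if the two largest of the three distances are equal (within tol)."""
    smallest, middle, largest = sorted((d12, d13, d23))
    return abs(largest - middle) <= tol

q_ea = 0.8                                   # illustrative value only
# Overlaps (q12, q13, q23) = (0.5, 0.2, 0.2): the two smallest overlaps are equal.
ok = is_ultrametric_triple(*(distance(q_ea, q) for q in (0.5, 0.2, 0.2)))
# Overlaps (0.5, 0.3, 0.2): all three differ, so the triple is not ultrametric.
bad = is_ultrametric_triple(*(distance(q_ea, q) for q in (0.5, 0.3, 0.2)))
print(ok, bad)
```

In terms of overlaps the condition says that the two smallest overlaps are equal, so every triangle of states is isosceles with a short base, or equilateral.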

2.4.3 Chaotic Pairs

The remaining possibility is [65, 66] that for large L the Gibbs measure looks like (with dependence on J)

$$\mu_J^{(L)} \approx \tfrac{1}{2}\,\mu_J^{\alpha_L} + \tfrac{1}{2}\,\mu_J^{-\alpha_L} \qquad (2.39)$$

When we let L → ∞ and take the union of all possible states emerging, we obtain a set of uncountably many states. The Gibbs measure is approximately a combination of two pure Gibbs states $\alpha_L$ and $-\alpha_L$ out of the infinitely many. These two states are each other's global spin-flip. The state labels $\alpha_L$ depend chaotically on L.

We encounter in Chapter 3 an example of an infinite-range system which has infinitely many ground states. For fixed size L only two pairs of ground states appear (or triples of pairs in the case of 3-state Potts spins), in the way of the chaotic pairs scenario.

2.5 Metastates

In spin glasses, to get a grip on the quenched disorder we consider the following. Look at a sequence of finite-volume Gibbs measures $\mu_{\Lambda_n}^{\eta}$. The disorder of the spin glass is prescribed by the parameter η. It is treated as quenched disorder, so we consider it as fixed. Then we take the empirical average of these measures

$$\kappa_N^{\eta} = \frac{1}{N}\sum_{n=1}^{N} \delta_{\mu_{\Lambda_n}^{\eta}} \qquad (2.40)$$

We try to take the limit N → ∞. The result provides the so-called (empirical) metastate. This metastate is a probability measure on the Gibbs measures and depends on the quenched disorder η [66]. The metastate gives the relative weight of the event that a quenched disordered system of a very large volume behaves like a particular Gibbs measure.

In general, however, (2.40) does not converge for almost every configuration η unless we take a sparse enough subsequence. It does converge in distribution. The resulting distribution over the infinite-volume Gibbs measures does not depend on η anymore. The limiting process of the whole path $t \mapsto \mu_{[tN]}$ is described by the so-called superstate. The value [tN] is the largest integer smaller than or equal to tN [54]. In [51] and [53] there are two examples for which this behaviour has been examined thoroughly.

For the d = 2 random boundary field Ising model, which we consider in Chapter 4, the metastate concentrates on two extremal Gibbs measures $\mu^+$ and $\mu^-$. We conjecture that for d = 2, 3 every mixture of $\mu^+$ and $\mu^-$ can appear as a limit point along the regular sequence of cubes. These mixtures are null-recurrent, so they do not appear in the metastate, and for this particular model the metastate is a.s. convergent.

For d > 3, for the random weak boundary field Ising model, the limit points along the regular sequences are only $\mu^+$ and $\mu^-$ almost surely. Each extremal Gibbs measure appears with probability 1/2 [28].

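As an example of an a.s. non-converging metastate we take the Curie-Weiss random field Ising model. It has as Hamiltonian

$$\beta H_N^{\eta}(\sigma) = -\frac{\beta}{N}\sum_{i<j} \sigma_i\sigma_j - \beta\varepsilon\sum_{i=1}^{N} \eta_i\sigma_i \qquad (2.41)$$

The random variables $\eta_i$ are i.i.d. and have $\mathbb{P}(\eta_i = \pm 1) = 1/2$. For β large enough and ε small the model behaves like a ferromagnet with one +-phase $\mu_\infty^{+,\eta}$ and one −-phase $\mu_\infty^{-,\eta}$. When one takes the sequence n = 1,2,··· the corresponding metastate converges in distribution to

$$\lim_{N\to\infty} \kappa_N^{\eta} = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \delta_{\mu_n^{\eta}} \overset{\text{law}}{=} n_\infty\, \delta_{\mu_\infty^{+,\eta}} + (1 - n_\infty)\, \delta_{\mu_\infty^{-,\eta}} \qquad (2.42)$$

The variable $n_\infty$ is a random variable independent of η. It is distributed as $\mathbb{P}(n_\infty < x) = \frac{2}{\pi}\arcsin(\sqrt{x})$ [51, 53].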
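The arcsine distribution can be made plausible by a small simulation. The sketch below rests on a simplifying assumption that is not derived in this text: the volume n is taken to sit in the +-phase exactly when the partial sum $\sum_{i\leq n}\eta_i$ of the random fields is positive, so that $n_\infty$ becomes the fraction of time a simple random walk spends on the positive side. All function names are illustrative.

```python
import math
import random

def arcsine_cdf(x):
    """Limiting distribution function (2/pi) * arcsin(sqrt(x)) on [0, 1]."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))

def fraction_plus(n_volumes, rng):
    """Fraction of volumes n <= n_volumes whose partial field sum is positive.

    A zero sum inherits the label of the previous volume (tie-breaking convention).
    """
    total, plus, last_positive = 0, 0, False
    for _ in range(n_volumes):
        total += rng.choice((-1, 1))
        if total != 0:
            last_positive = total > 0
        plus += last_positive
    return plus / n_volumes

rng = random.Random(1)                      # fixed seed, chosen arbitrarily
samples = [fraction_plus(200, rng) for _ in range(2000)]
for x in (0.1, 0.5, 0.9):
    empirical = sum(s < x for s in samples) / len(samples)
    print(x, round(empirical, 3), round(arcsine_cdf(x), 3))
```

The printed columns compare the empirical distribution function of the simulated fraction with the arcsine law; most of the mass sits near 0 and 1, which reflects that a typical realization spends most volumes in one and the same phase.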


Chapter 3

Gaussian Potts-Hopfield model

In this chapter we study a Gaussian Potts-Hopfield model. Whereas for Ising spins and

two disorder variables per site the chaotic pair scenario is realized, we find that for q-

state Potts spins q(q − 1)-tuples occur. Beyond the breaking of a continuous stochastic

symmetry, we study the fluctuations and obtain the Newman-Stein metastate description

for our model.

3.1 Introduction

The Gaussian Potts-Hopfield model is equal to the Potts-Hopfield model but with Gaus-

sian noise as patterns. What happens for two patterns with Ising or Potts-like neurons

is, surprisingly, that there are infinitely many ground-states. We study the mean-field

Potts model with Hopfield-Mattis disorder, and more in particular with Gaussianly dis-

tributed disorder. This model is a generalization of the Ising version of the model stud-

ied in [11]. It provides yet another example of a disordered model with infinitely many

low-temperature pure states, such as is sometimes believed to be typical for spin-glasses

[33]. In our model, however, in contrast to [11], instead of chaotic pairs we find that the

chaotic size dependence is realized by chaotic q(q − 1)-tuples.

A somewhat different generalization of the Hopfield model to Potts spins was intro-

duced by Kanter in [47] and was mathematically rigorously analysed in [37]. However,

whereas the version we treat here (in which the form of the disorder is the Mattis-

Hopfield one) displays the phenomenon of stochastic symmetry breaking, in which a

finite-spin, “finite pattern” model can end up with chaotic size dependence, and a real-

ization of chaotic n-tuples out of infinitely many “pure states”, we do not see how to

obtain such results in a version of Kanter’s form of the disorder distribution.

We are concerned in particular with the infinite-volume limit behaviour of the Gibbs and ground state measures. The possible limit points are labelled by the minima of an appropriate mean-field (free) energy functional. These minima can be obtained as solutions of a suitable mean-field equation. They lie on the minimal-free-energy surface, which is an m(q − 1)-sphere in the $(e_1,\cdots,e_q)^{\otimes m}$ space. This space, for q-state Potts spins and m patterns, is formed by the m-fold product of the hyperplane spanned by the end points of the unit vectors $e_i$, which are the possible values of the spins. But only a limited area of the minimal-free-energy surface is accessible: only those values are allowed for which certain mean-field equations hold. These equations have the structure of fixed point equations. We derive them in Section 3.4. To obtain the Gibbs states we need to find the solutions of these equations on the minimal-free-energy surface.

The structure of the ground or Gibbs states for Ising spins, where q = 2, and 2 standard Gaussian patterns ξ, η has been known for a few years [11]. Due to the Gaussian distribution we have a nice symmetric structure: the extremal ground (and Gibbs) states form a circle. This degeneracy of the ground states due to the rotational symmetry of the Gaussians was first mentioned in [2].

For a fixed configuration and a large finite volume the possible order-parameter values become close to two diametrically opposite points (which ones depends on the volume of the system) on this circle. This chapter treats the generalization of this structure to q-state Potts spins with q > 2. To have a concrete example, we concentrate on the case q = 3. It turns out that we again obtain a circle symmetry but also a discrete symmetry, which generalizes the one for Ising spins. Instead of a single pair one gets a triple of pairs (living on 3 separate circles), where for each pair one has a similar structure as for the single pair for q = 2. For q > 3 we get q(q − 1)/2 pairs and a similar higher-dimensional structure.

Our model contains quenched disorder. It turns out that there is some kind of self-

averaging. The thermodynamic behaviour of the Hamiltonian is the same for almost

every realization. This is the case for the free energy and the associated fixed point

equations, as is familiar from many quenched disordered models. However, this is not

precisely true for the order parameters. We will see that they show a form of chaotic

size dependence, i.e. the behaviour strongly depends both on the chosen configura-

tion and on the way one takes the infinite-volume limit N → ∞ (that is, along which

subsequence).

3.2 Notations and definitions

We start with some definitions. Consider the set $\Lambda_N = \{1,\cdots,N\} \subset \mathbb{N}^+$. Let the single-spin space χ be a finite set and the N-spin configuration space be $\chi^{\otimes N}$. We denote a spin configuration by σ and its value at site i by $\sigma_i$. We will consider Potts spins in the Wu representation [76]. Each of the possible q values provides a spin-vector $\tilde e_i$ in $\mathbb{R}^q$, whose coordinates are given by $\tilde e_{i,j} = \delta_{i,j}$. Then the set $\chi^{\otimes N}$ is the N-fold tensor product of the single-spin space $\chi = \{e_1,\cdots,e_q\}$. The $e_i$ are the projections of the spin-vectors $\tilde e_i$ on the hypertetrahedron in $\mathbb{R}^{q-1}$ spanned by the end points of the $\tilde e_i$. So every spin value $\sigma_i$ is represented by the projection vector $e_{\sigma_i}$.

For q = 3 we get for $e_1$, $e_2$ and $e_3$ the vectors of Figure 3.1. We have set the projection of the origin (0,0,0) to (0,0) and rescaled the projection of $\tilde e_1$ to (1,0).

Figure 3.1: Wu representation of the spin values for q = 3: the vectors (1,0), (−1/2, √3/2) and (−1/2, −√3/2).

The Hamiltonian of our model is defined as follows:

$$-\beta H_N = \frac{\beta}{N}\sum_{k=1}^{m}\sum_{i,j=1}^{N} \xi_i^k \xi_j^k\, \delta(\sigma_i,\sigma_j) \qquad (3.1)$$

with

$$\delta(\sigma_i,\sigma_j) = \frac{1}{q}\left[1 + (q-1)\, e_{\sigma_i}\cdot e_{\sigma_j}\right] \qquad (3.2)$$

where $\xi_i^k$ is the i-th component of the random N-component vector $\xi^k$. For the $\xi_i^k$'s we choose i.i.d. N(0,1) distributions. The vectors $\xi^k = (\xi_1^k,\cdots,\xi_N^k)$, by analogy with the standard Hopfield model, are called patterns. If we combine the above, we can rewrite the Hamiltonian $H_N$ as:

$$-\beta H_N = \beta\,\frac{q-1}{q}\, N \sum_{k=1}^{m}\left[\Big(\frac{1}{N}\sum_{i=1}^{N} \xi_i^k e_{\sigma_i}\Big)^2 + \frac{1}{q-1}\Big(\frac{1}{N}\sum_{i=1}^{N} \xi_i^k\Big)^2\right] \qquad (3.3)$$

So asymptotically

$$-\beta H_N = \frac{K}{2}\, N \sum_{k=1}^{m} q_k^2 \quad\text{with } K = 2\beta\,\frac{q-1}{q} \text{ and order parameters } q_k = \frac{1}{N}\sum_{i=1}^{N} \xi_i^k e_{\sigma_i} \qquad (3.4)$$

The last term in (3.3) inside the brackets is an irrelevant constant; in fact it approaches zero, due to the strong law of large numbers. (The $\xi_i^k$'s are i.i.d. N(0,1) distributed, so $\mathbb{E}\xi_i^k = 0$.) Note that for the infinite-pattern limit m → ∞ the Hamiltonian is still of the same form asymptotically. Note also that any i.i.d. distribution with zero mean and finite variance which is symmetric around zero will give an analogous form of $H_N$, but we plan to consider only Gaussian distributions, for which we will find that a continuous symmetry can be stochastically broken, just as in [11]. From now on we drop the subscript N to simplify the notation, when no confusion can arise.
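For q = 3 one can verify by hand, or with the short sketch below, that the Wu vectors of Figure 3.1 turn the expression $\frac{1}{q}[1 + (q-1)\,e_\sigma\cdot e_{\sigma'}]$ into the Kronecker delta. The sketch assumes the three vectors (1,0), (−1/2, √3/2) and (−1/2, −√3/2); the names `E`, `dot` and `potts_delta` are ours.

```python
import math

# Wu vectors for q = 3 (assumed to be the three vectors of Figure 3.1).
E = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]

def dot(u, v):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

def potts_delta(s, t, q=3):
    """The expression (1/q) * [1 + (q - 1) e_s . e_t] for spin values s, t in {0, 1, 2}."""
    return (1.0 + (q - 1) * dot(E[s], E[t])) / q

# Print the 3 x 3 table; it should agree with the identity matrix up to rounding.
for s in range(3):
    print([round(potts_delta(s, t), 12) for t in range(3)])
```

The check works because $e_\sigma\cdot e_\sigma = 1$ and $e_\sigma\cdot e_{\sigma'} = -1/(q-1)$ for $\sigma \neq \sigma'$, which is the defining property of the projected vectors.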

3.3 Ground states

Now it is time to reveal the characteristics of the ground states for the Potts model. First

we discuss the simple behaviour for 1 pattern. Then the more interesting part: q > 2

and 2 patterns.

3.3.1 Ground states for 1 pattern

For one pattern ξ the Hamiltonian is of the following form:

$$-\beta H_N = \frac{\beta}{N}\sum_{i,j=1}^{N} \xi_i \xi_j\, \delta(\sigma_i,\sigma_j) \qquad (3.5)$$

We easily see that the ground states are obtained by directing the spins with $\xi_i > 0$ in one direction and the spins with $\xi_i \leq 0$ in a different direction. If the distribution of the $\xi_i$'s is $P(\xi_i = \pm 1) = \frac{1}{2}$, then the order parameter is of the form $\frac{1}{2}(e_i - e_j)$, with 1 ≤ i, j ≤ q and i ≠ j, see also [27]. So for q = 3 we have only 6 ground states. They form a regular hexagon:

$$\left(\pm\tfrac{3}{4},\ \tfrac{\sqrt{3}}{4}\right),\quad \left(\pm\tfrac{3}{4},\ -\tfrac{\sqrt{3}}{4}\right),\quad \left(0,\ \pm\tfrac{\sqrt{3}}{2}\right) \qquad (3.6)$$

This regular hexagon with its interior is the convex set of possible order parameter values. It is easy to see that for $\xi_i$ N(0,1)-distributed we get the same ground states, except for a scaling factor $\sqrt{2/\pi}$ multiplying the order parameter values.
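The six ground-state order parameters for q = 3 and one ±1 pattern can be enumerated explicitly. The sketch below assumes the Wu vectors (1,0), (−1/2, √3/2), (−1/2, −√3/2) and that exactly half of the sites carry $\xi_i = +1$; the names are illustrative.

```python
import math
from itertools import permutations

E = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]

def order_parameter(k_plus, k_minus):
    """(1/2)(e_k_plus - e_k_minus): sites with xi_i = +1 carry e_k_plus, the others e_k_minus."""
    return tuple(0.5 * (a - b) for a, b in zip(E[k_plus], E[k_minus]))

# All ordered pairs of distinct spin values give the q(q - 1) = 6 ground states.
hexagon = [order_parameter(i, j) for i, j in permutations(range(3), 2)]
radii = [math.hypot(x, y) for x, y in hexagon]

# For N(0,1) patterns the same points are rescaled by E|xi| = sqrt(2/pi).
gaussian_scale = math.sqrt(2 / math.pi)
for point in hexagon:
    print(tuple(round(c, 4) for c in point))
```

All six points lie at distance √3/2 from the origin and are 60 degrees apart, so they are the corners of a regular hexagon.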


3.3.2 Ground states for 2 patterns

The Hamiltonian for 2 patterns (Gaussian i.i.d.) is:

$$-\beta H_N = \frac{\beta}{N}\sum_{i,j=1}^{N} \left(\xi_i^1\xi_j^1 + \xi_i^2\xi_j^2\right)\delta(\sigma_i,\sigma_j) = \frac{K}{2}\, N\, (q_1^2 + q_2^2) \qquad (3.7)$$

As in [11], we make use of the fact that the joint distribution of 2 independent identically distributed Gaussians has a continuous rotation symmetry. This symmetry also shows up in the order parameters.

Ising spins

First we consider Ising spins (i.e. we take q = 2). In [11] it is proven that the ground states are as follows. The order parameters become $\pm(r^*\cos\theta,\ r^*\sin\theta)$, with θ ∈ [0,π) and $r^* = \sqrt{2/\pi}$. Note that there are uncountably many ground states.

This can be made plausible by the following observations. Note that the random fields $\{\mathrm{sign}(\xi_i^1)\}$ are distributed like standard Hopfield patterns: $P(\mathrm{sign}(\xi_i^1) = \pm 1) = 1/2$. So if we choose σ such that for each i: $\sigma_i = \mathrm{sign}(\xi_i^1)$, then we obtain the state with corresponding order parameters $(r^*, 0)$ in the limit N → ∞, which is the ground state configuration corresponding to θ = 0. The spin configuration $\sigma_i = \mathrm{sign}(\xi_i^2)$ for all i corresponds to θ = π/2. By the global spin-flip symmetry of the Hamiltonian we obtain the ground states corresponding to θ = π and θ = 3π/2.

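But what about the θ values in between? The set of Gaussian patterns has a continuous rotation symmetry. We obtain two new patterns by multiplying the patterns with a rotation matrix, i.e. rotating the patterns over an angle θ (with 0 ≤ θ < π/2):

$$\begin{pmatrix} \eta_i^1(\theta) \\ \eta_i^2(\theta) \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \begin{pmatrix} \xi_i^1 \\ \xi_i^2 \end{pmatrix} \qquad (3.8)$$

The corresponding order parameters we define as q(θ). To obtain the original patterns from $\eta^1(\theta)$ and $\eta^2(\theta)$ simply perform the rotation again (the matrix is its own inverse):

$$\begin{pmatrix} \xi_i^1 \\ \xi_i^2 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \begin{pmatrix} \eta_i^1(\theta) \\ \eta_i^2(\theta) \end{pmatrix} \qquad (3.9)$$

By the rotation (3.8) of the standard Gaussian patterns $\xi^1$ and $\xi^2$ we obtain two new patterns $\eta^1$ and $\eta^2$ which again are Gaussian distributed. Note that $\mathbb{E}\eta_i^1(\theta) = \mathbb{E}\eta_i^2(\theta) = \mathbb{E}\xi_i^1 = \mathbb{E}\xi_i^2 = 0$. Furthermore the variance of $\eta_i^1(\theta)$ and $\eta_i^2(\theta)$ is the same as for $\xi_i^1$ and $\xi_i^2$, i.e. 1. Therefore the distribution of the rotated patterns $\eta^1(\theta)$ and $\eta^2(\theta)$ is the same as for the old ones, namely standard N-multivariate Gaussian. It is easily checked that $\eta_i^1(\theta)$ and $\eta_i^2(\theta)$ are uncorrelated, and because they are jointly Gaussian they are also independent.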

For any θ it holds

$$\xi_i^1\xi_j^1 + \xi_i^2\xi_j^2 = \eta_i^1(\theta)\,\eta_j^1(\theta) + \eta_i^2(\theta)\,\eta_j^2(\theta) \qquad (3.10)$$
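The invariance (3.10) is an exact algebraic identity and can be checked numerically for any realization. The sketch below assumes the transformation matrix with rows (cos θ, sin θ) and (sin θ, −cos θ) applied site by site; the function name `rotate`, the seed and the value of θ are arbitrary choices.

```python
import math
import random

def rotate(xi1, xi2, theta):
    """Apply the matrix [[cos, sin], [sin, -cos]] sitewise to the pair of patterns."""
    c, s = math.cos(theta), math.sin(theta)
    eta1 = [c * a + s * b for a, b in zip(xi1, xi2)]
    eta2 = [s * a - c * b for a, b in zip(xi1, xi2)]
    return eta1, eta2

rng = random.Random(2)
N, theta = 50, 0.7
xi1 = [rng.gauss(0, 1) for _ in range(N)]
xi2 = [rng.gauss(0, 1) for _ in range(N)]
eta1, eta2 = rotate(xi1, xi2, theta)

# Largest violation of the identity over all pairs of sites (i, j).
max_err = max(
    abs(xi1[i] * xi1[j] + xi2[i] * xi2[j] - eta1[i] * eta1[j] - eta2[i] * eta2[j])
    for i in range(N) for j in range(N)
)

# Applying the same matrix a second time returns the original patterns.
back1, back2 = rotate(eta1, eta2, theta)
back_err = max(abs(a - b) for a, b in zip(xi1 + xi2, back1 + back2))
print(max_err, back_err)
```

Both errors are at the level of floating-point rounding, reflecting that the matrix is orthogonal and equal to its own inverse.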

From this it follows that the energies of the configurations

$$\sigma(\theta) = \{\mathrm{sign}(\eta_i^1(\theta))\}_{i=1,\cdots,N} \qquad (3.11)$$

are the same in the limit N → ∞, so that all of them are ground states. We see this by calculating the two corresponding energies:

−βH

(σ(0)) =

i=1

|ξ

| +

i=1

sign(ξ

) =

2β

+ O β/N ,

−βH

(σ(θ)) =

i=1

+ ξ

sign(η

(θ)η

(θ)) =

i=1

(θ)η

(θ) + η

(θ)η

(θ) sign(η

(θ)η

(θ)) =

2β

+ O β/N

(3.12)

So in the limit indeed it holds

$$\lim_{N\to\infty} H_N(\sigma(\theta)) = \lim_{N\to\infty} H_N(\sigma(0)) \quad\text{for all } \theta \qquad (3.13)$$

This means that we have an uncountable number of ground-state configurations in the limit N → ∞.

Structure of the order parameters

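Now we look at what this symmetry means for the order parameters. We consider the Gibbs measure with the original patterns $\xi^1, \xi^2$. Take a configuration σ(θ). The corresponding order parameters we denote by q(θ). Rewriting the patterns ξ in terms of η(θ) according to (3.9) gives

$$q(\theta) = \begin{pmatrix} \frac{1}{N}\sum_{i=1}^{N} \left[\cos(\theta)\,\eta_i^1(\theta) + \sin(\theta)\,\eta_i^2(\theta)\right] \mathrm{sign}(\eta_i^1(\theta)) \\ \frac{1}{N}\sum_{i=1}^{N} \left[\sin(\theta)\,\eta_i^1(\theta) - \cos(\theta)\,\eta_i^2(\theta)\right] \mathrm{sign}(\eta_i^1(\theta)) \end{pmatrix} = \sqrt{\frac{2}{\pi}} \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} + O\left(\sqrt{1/N}\right) \qquad (3.14)$$

because $\mathrm{sign}(\eta_i^1)$ is independent of $\eta_i^2$. The $O(\sqrt{1/N})$ term is in general different for different θ. However, the equality q(θ) = −q(θ + π) is exact because $\eta_i^1(\theta) = -\eta_i^1(\pi + \theta)$. Because of this the energies of the configurations $\sigma = \mathrm{sign}(\eta^1(\theta))$ and $\sigma = -\mathrm{sign}(\eta^1(\theta)) = \mathrm{sign}(\eta^1(\pi + \theta))$ are also the same. In [11] it is proven that for finite N the energy is in its global minimum only for one pair ($\theta_0(N)$ and $\theta_0(N) + \pi$). The value of $\theta_0(N)$ depends on the system size.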
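The circle structure of the order parameters can be observed in a finite sample. The following sketch assumes Ising spins $\sigma_i(\theta) = \mathrm{sign}(\cos\theta\,\xi_i^1 + \sin\theta\,\xi_i^2)$ and order parameters $q_k = \frac{1}{N}\sum_i \xi_i^k\sigma_i$; the sample size, the seed and the chosen angles are arbitrary.

```python
import math
import random

def order_parameters(xi1, xi2, theta):
    """q(theta) = (1/N) sum_i (xi1_i, xi2_i) * sign(eta1_i(theta)) for Ising spins."""
    c, s = math.cos(theta), math.sin(theta)
    q1 = q2 = 0.0
    for a, b in zip(xi1, xi2):
        spin = 1.0 if c * a + s * b >= 0 else -1.0   # sign of the rotated pattern
        q1 += a * spin
        q2 += b * spin
    n = len(xi1)
    return q1 / n, q2 / n

rng = random.Random(3)
N = 20000
xi1 = [rng.gauss(0, 1) for _ in range(N)]
xi2 = [rng.gauss(0, 1) for _ in range(N)]
r_star = math.sqrt(2 / math.pi)          # E|xi| for a standard Gaussian
thetas = (0.0, math.pi / 4, math.pi / 2, 2.0)
results = [order_parameters(xi1, xi2, t) for t in thetas]
for t, (q1, q2) in zip(thetas, results):
    print(round(q1, 3), round(q2, 3),
          round(r_star * math.cos(t), 3), round(r_star * math.sin(t), 3))
```

Each row compares the sampled order parameters with the point $\sqrt{2/\pi}(\cos\theta, \sin\theta)$; the deviations are of the order $\sqrt{1/N}$ and differ from angle to angle, which is the source of the chaotic size dependence discussed here.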

When we add to the Hamiltonian the term $-(\varepsilon/N)\sum_{i=1}^{N} \eta_i^1(\theta_0)\,\sigma_i$, with ε > 0 and $\theta_0$ fixed, the degeneracy of the ground states is broken even when N → ∞. Now only the configuration $\{\mathrm{sign}(\eta_i^1(\theta_0))\}$ is a ground state, i.e. $q = \sqrt{2/\pi}\,(\cos\theta_0, \sin\theta_0)$. For β finite and large enough the same holds in the limit N → ∞ but with $r^*(\beta)$ instead of $\sqrt{2/\pi}$. This also corresponds to the results proven in [11].

Potts-spins

To obtain the ground states in the case of Potts neurons we follow the same strategy as for the Ising neurons. We consider the distributions $\{\mathrm{sign}(\eta_i^1(\theta))\}$. The corresponding ground-state configurations σ(θ) we obtain as follows. If $\mathrm{sign}(\eta_i^1(\theta)) = 1$ we set $\sigma_i = k$. When $\mathrm{sign}(\eta_i^1(\theta)) = -1$ we set $\sigma_i = k'$, with $k \neq k'$ and $k, k' \in \{1,\cdots,q\}$. This gives us q(q − 1) possible values for the order parameters q(θ) for each θ, the so-called discrete symmetry. If we look carefully at the values of q(θ) we see that when we take the union of q(θ) over all θ the resulting curves consist of q(q − 1)/2 circles in the order-parameter space. This provides the continuous symmetry of the ground states, which originates from the continuous rotational symmetry between the two Gaussian patterns $\xi^1$ and $\xi^2$.

We take one of the q(q − 1) values by considering the ground-state configurations

$$\mathrm{sign}(\eta_i^1(\theta)) = 1 \to \sigma_i = e_1, \qquad \mathrm{sign}(\eta_i^1(\theta)) = -1 \to \sigma_i = e_2 \qquad (3.15)$$

In the same way as (3.14) we obtain for $q_1(\theta)$, by using independence,

$$q_1(\theta) = \cos(\theta)\,\frac{1}{2N}\sum_{i=1}^{N} |\eta_i^1(\theta)|\,(e_1 - e_2) + O\left(\sqrt{1/N}\right) = \cos(\theta)\sqrt{\frac{2}{\pi}} \begin{pmatrix} 3/4 \\ -\sqrt{3}/4 \end{pmatrix} + O\left(\sqrt{1/N}\right) \qquad (3.16)$$

and

$$q_2(\theta) = \sin(\theta)\sqrt{\frac{2}{\pi}} \begin{pmatrix} 3/4 \\ -\sqrt{3}/4 \end{pmatrix} + O\left(\sqrt{1/N}\right) \qquad (3.17)$$

By considering the other possibilities we obtain all six discrete points. These have the coordinates of (3.6) multiplied by the factor $\sqrt{2/\pi}$. By rotating we obtain the circles. Because q(θ) = −q(θ + π) we obtain the same structure as for the Ising spins. The same is true for the order parameters resulting from the Gibbs states.

Without much effort this is also seen to be true for infinitely many patterns (as long as their number grows logarithmically in the system size). However, the precise structure of the Gibbs states has not been proven yet and is still being investigated.

This is an example of chaotic size dependence, based on the breaking of a stochastic symmetry, of the same nature as in [11]. Because of weak compactness, different subsequences exist whose q(q − 1)-tuples of ground states converge to q(q − 1)-tuples associated with particular θ-values. These subsequences depend on the random pattern realization. See Section 3.5.

For any finite number m ≥ 3 of patterns one has the same discrete structure as before, but instead of a continuous circle symmetry we have a continuous m-sphere symmetry (isomorphic to O(m)). The case of an infinite m (that is, increasing with the system size) is still open. However, the limiting metastate structure of the Gibbs states when considering infinite sequences in N is more complicated.

3.4 Positive temperatures

In this section we obtain an expression for the free energy, which is maximized over the order parameters $q_k$. By large deviation arguments we relate this expression, and therefore the free energy, to the average of the energy over the induced measure of the order parameters $q_k$.

3.4.1 Fixed-point mean-field equations

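Recall that

$$Z_N = \mathrm{Tr}_\sigma \exp\Big(N\,\frac{K}{2}\sum_{k=1}^{m} q_k^2\Big) \qquad (3.18)$$

Due to the quadratic dependence on $q_k$ this is hard to compute. Therefore we linearize the terms in the exponential. For this we use the following identity:

$$e^{a x^2/2} = \sqrt{\frac{Na}{2\pi}} \int_{-\infty}^{\infty} dm\; e^{-Nam^2/2 + \sqrt{N}\,a m x} \qquad (3.19)$$

Note

$$q_k^2 = \sum_{i=1}^{q-1} (q_k^i)^2 \qquad (3.20)$$

So if we set $x = \sqrt{N}\, q_k^i$ and a = K we obtain

$$\exp\Big(N\,\frac{K}{2}\sum_{i=1}^{q-1} (q_k^i)^2\Big) = \Big(\frac{NK}{2\pi}\Big)^{\frac{q-1}{2}} \int_{-\infty}^{\infty} dm_k\, \exp\left(-KN m_k^2/2 + KN\, m_k\cdot q_k\right) \qquad (3.21)$$

Applying this to each of the m order parameters $q_k$ and inserting the result into $Z_N$ we obtain

$$Z_N = \mathrm{Tr}_\sigma \prod_{k=1}^{m} \Big(\frac{NK}{2\pi}\Big)^{\frac{q-1}{2}} \int dm_k\, \exp\left(-KN m_k^2/2 + KN\, m_k\cdot q_k\right) \qquad (3.22)$$

This transformation is called the Hubbard-Stratonovich transformation. Notice that the dependence on $q_k$ is now linear. Because N → ∞ the integral is dominated by the maximal value of its integrand. Maximizing the exponent in (3.22) gives the saddle point equations for $m_k$:

$$\frac{\partial}{\partial m_k}\left(-KN m_k^2/2 + K m_k\cdot N q_k\right) = 0 \;\to\; -KN m_k + KN q_k = 0 \;\to\; m_k = q_k \qquad (3.23)$$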
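The Gaussian identity behind the Hubbard-Stratonovich transformation can be checked by numerical quadrature. The sketch below assumes the one-dimensional form $e^{ax^2/2} = \sqrt{Na/2\pi}\int dm\, e^{-Nam^2/2 + \sqrt{N}amx}$ and uses a plain trapezoid rule; the function name `hs_rhs` and the values of a, x and N are illustrative.

```python
import math

def hs_rhs(a, x, n, half_width=6.0, steps=20001):
    """Trapezoid estimate of sqrt(n a / 2 pi) * integral dm exp(-n a m^2 / 2 + sqrt(n) a m x)."""
    centre = x / math.sqrt(n)              # maximiser of the exponent in m
    lo = centre - half_width
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        m = lo + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * math.exp(-n * a * m * m / 2 + math.sqrt(n) * a * m * x)
    return math.sqrt(n * a / (2 * math.pi)) * h * total

a, x, n = 1.3, 0.7, 10
lhs = math.exp(a * x * x / 2)
rhs = hs_rhs(a, x, n)
print(lhs, rhs)
```

The exponent is maximal at $m = x/\sqrt{N}$; with $x = \sqrt{N}\,q$ this is the point m = q, in line with the saddle point equation.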

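Further rewriting of (3.22) gives that the partition function $Z_N$ is asymptotically equal to

$$\Big(\frac{NK}{2\pi}\Big)^{\frac{m(q-1)}{2}} \int dm_1\cdots dm_m\, \exp\Big(-KN\sum_{k=1}^{m} m_k^2/2 + N\, \mathbb{E}_{\xi^1,\cdots,\xi^m} \log \mathrm{Tr}_\sigma \exp\Big(K\sum_{k=1}^{m} \xi^k\, m_k\cdot e_\sigma\Big)\Big) \qquad (3.24)$$

Now we maximize this exponent. Maximizing and putting $m_k = q_k$ (the saddle point equation (3.23)) gives the mean-field equations for the order parameters, which have the structure of a system of fixed point equations q = F(q). When we have only two patterns $\xi^1$ and $\xi^2$ these are as follows:

$$q_1 = \mathbb{E}_{\xi^1,\xi^2}\, \frac{\mathrm{Tr}_\sigma\{\xi^1 e_\sigma \exp[K(\xi^1 q_1 + \xi^2 q_2)\cdot e_\sigma]\}}{\mathrm{Tr}_\sigma\{\exp[K(\xi^1 q_1 + \xi^2 q_2)\cdot e_\sigma]\}}, \qquad q_2 = \mathbb{E}_{\xi^1,\xi^2}\, \frac{\mathrm{Tr}_\sigma\{\xi^2 e_\sigma \exp[K(\xi^1 q_1 + \xi^2 q_2)\cdot e_\sigma]\}}{\mathrm{Tr}_\sigma\{\exp[K(\xi^1 q_1 + \xi^2 q_2)\cdot e_\sigma]\}} \qquad (3.25)$$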

3.4.2 Induced measure on order parameters

Now we try to find an expression which in the infinite-neuron limit equals the induced Gibbs measure $L_{\infty,\beta}$ on the order parameters. To this end we calculate the free energy by using the Laplace method. When we look carefully at the integrand in (3.22) we see

$$-\beta f(\beta) = \lim_{N\to\infty} \frac{1}{N}\log Z_N = \max_{m}\left(-Q(m) + c(Km)\right) \qquad (3.26)$$

where c(m) is the generating function of the pattern distributions:

c(m) =

k=1

ln IE

exp(ζ

· e

)

(3.27)

Because

$$Q = Km^2/2, \qquad \nabla Q(m) = Km \qquad (3.28)$$

From these solutions we can also read off the fixed point equations. When we differentiate the exponent in (3.26) with respect to m componentwise and use (3.28), we get

$$\nabla Q(m) = K\,\nabla c(\nabla Q(m)) \;\Rightarrow\; m = \nabla c(\nabla Q(m)) \qquad (3.29)$$

This we can relate to the rate function $c^*(t)$, which is the Legendre transform of c(m):

$$c^*(m) = \sup_{t}\,[m\cdot t - c(t)] \qquad (3.30)$$

For fixed m the vector t has to be such that $m = \nabla c(t)$. But because of the fixed point equations $m = \nabla c(\nabla Q(m))$. Therefore $t = \nabla Q(m)$. So

$$c^*(m) = m\cdot\nabla Q(m) - c(\nabla Q(m)) \qquad (3.31)$$

Insert this into (3.26) to obtain

$$-\beta f(\beta) = \max_{m}\left(Q(m) - c^*(m)\right) \qquad (3.32)$$

For N → ∞ the equation m = q holds. This gives that

lim

N→∞

log Z

= lim

N→∞