This is the html version of the file http://www.cs.cmu.edu/~eairoldi/phdthesis/edo.phd-proposal.pdf.
G o o g l e automatically generates html versions of documents as we crawl the web.
To link to or bookmark this page, use the following url: http://www.google.com/search?q=cache:8zqftx9UL30J:www.cs.cmu.edu/~eairoldi/phdthesis/edo.phd-proposal.pdf+%22curved+exponential+family%22+%22naive+bayes%22&hl=en&gl=us&ct=clnk&cd=3


Google is neither affiliated with the authors of this page nor responsible for its content.
These search terms have been highlighted: curved exponential family naive bayes 

Page 1
DRAFT
DO NOT DISTRIBUTE
Hierarchical Bayesian Mixture Models of Graphs and Networks
A Probabilistic Approach to Complexity via Motifs, Dynamics & Integration
Ph.D. Thesis Proposal
Edoardo M. Airoldi
edo@cmu.edu
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213 USA
February 2006
1

Page 2
DRAFT
DO NOT DISTRIBUTE
Foreword
Over the past three years I have been working on probabilistic models of bipartite graphs and dynamic
networks. My previous work suggests a probabilistic framework and a general modeling approach to
complex graphs and networks, which is based on the four concepts of motifs, mixtures, dynamics, and
integration. In this thesis, I propose to present such a framework and discuss its properties. In particular,
the main goal of this work will be to establish the essential elements of a formal model of complexity
which allow us to reconcile the global properties of a system with local phenomena of interest, in a
generative fashion.
My solution to the global/local trade-off is to express complexity through hierarchical mixtures of
simple patterns, i.e., motifs, that evolve over time. Complex global behavior emerges from the com-
bination of local interaction patterns and their dynamics. I will discuss and demonstrate how this new
framework incorporates, generalizes, and extends other probabilistic approaches present in the literature,
in what sense it provides a “soft” and probabilistic version of deterministic approaches, and why it lays
the statistical foundations of the theory of (random) graphs and networks.
The major part of my effort will thus be devoted to modeling issues that arise in conjunction to
the four essential aspects listed above. I plan to investigate theoretical and computational issues, and
characterize the geometrical intuitions underlying the allocation task, a major inference objective shared
by the various models encompassed by this framework.
2

Page 3
DRAFT
DO NOT DISTRIBUTE
Abstract
In this thesis I propose computational and modeling approaches to summarize the complexity of
complex systems in terms of mixtures of simpler latent patterns, which can be described as probabilistic
functions of few parameters that evolve over time according to simple laws.
Complex systems are ubiquitous and they underly many applications. In this thesis I plan to focus on
the generic complex system problem involving sets of objects of different types that relate to one another
over time; relations are measured on pairs of objects both within and across types and, which are pos-
sibly multivariate. Biological sequences and networks related to an organism, dynamic communication
networks within a group, and the research and publication processes in the scientific community at large
provide examples of complex systems that I will discuss in this thesis.
Being able to summarize the information in such complex systems in terms of simple patterns and
corresponding laws of evolution that correlate to phenomena of interest, e.g., topics, functional annota-
tions, formal and informal role in operational processes within an organization, is a fundamental chal-
lenge arising in diverse applications. Feasible solutions could foster automation and knowledge discov-
ery in large biological and genetic databases, and they could lead to new e-business applications, e.g., in
summarization, search, and strategic planning. Statistical foundations for the treatment of such systems
could advance our understanding of real world phenomena and anomalies, and they would give us the
tools to cope with the information overload we face in a systematic manner.
I plan to subsume models of complex systems that were developed for specific application areas into
a general modeling framework. There are four aspects are crucial to most of them: (1) the presence
of a hierarchical structure in the likelihood, which includes both observable and non-observable random
quantities, (2) the mixed membership assumption, according to which objects may participate in multiple
latent patterns to different degrees, (3) the existence of multiple data types along with the availability of
possibly partial measurements about their interactions, and (4) the temporal dimension.
My modeling framework grows out of my recent research efforts in application areas such as bi-
ological sequences and networks, dynamic communication networks, and large collection of scientific
publications. These efforts suggest a probabilistic framework and a general modeling approach to com-
plex systems. This thesis will present such a framework and discuss its properties. In particular, the main
goal is the reconciliation of global properties of complex systems with local phenomena of interest. My
solution is to express complexity through hierarchical mixtures of simple patterns that evolve over time,
i.e., complex global behavior emerges from the combination of local interaction patterns and dynamics. I
will discuss and demonstrate how this new framework generalizes other probabilistic approaches present
in the literature, provides a “soft” and probabilistic version of the deterministic approaches, and lays the
foundations of a statistical theory of graphs and networks. The major part of my effort is then devoted to
modeling issues that arise in conjunction to the four essential aspects listed above, e.g., integration and
dynamics. I conclude with the investigation of theoretical and computational issues, and with a charac-
terization of the geometrical intuitions underlying the allocation task, a major inference objective shared
by the various models encompassed by this framework.
Keywords:
Hierarchical models of mixed membership; Generative models; Pure topology types;
Independent integration; Conditional integration; Generalized state-space dynamics; Order-dependent
likelihoods; Latent birth-death processes; Diffusion processes; Exchangeability; Identifiability; Model
selection; Spectral and frequency decompositions; Loss functions; Gaussian ensembles; Sub-sampling;
Latent allocation as projection; Shrinkage; Approximate inference techniques.
3

Page 4
DRAFT
DO NOT DISTRIBUTE
Contents
1 Introduction
5
1.1 Impact and Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
1.2 A Case for Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
2 A Brief History of Modeling Approaches to Graphs and Networks
8
2.1 Static Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
2.2 Dynamics and Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Building Graphs from Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Inadequacies of the Current Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Overview of the Research Agenda
14
3.1 Research Work to Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Proposed Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Time to Completion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4

Page 5
DRAFT
DO NOT DISTRIBUTE
1 Introduction
In this thesis I propose computational and modeling approaches to summarize complex systems in terms of
mixtures of simpler latent patterns, which can be described as probabilistic functions of few parameters that
evolve over time according to simple laws, which are organized in a hierarchical fashion.
Complex systems are ubiquitous and underly most applications. They are defined as sets of objects of
different types that relate to one another over time; relations are measured on pairs of objects both within
and across types and, and are possibly multivariate. Biological sequences and networks related to an organ-
ism, dynamic communication networks within a group, and the research and publication processes in the
scientific community at large provide examples of complex systems that I plan to discuss in this thesis.
Example 1. A temporal sequence of bipartite graphs where the two types are documents and words can
be summarized in terms of few non-observable “topics” that evolve over time. We can enrich the system by
introducing observations about documents’ authors and obtain a bipartite graph of authors and words, and a
co-authorship general graph. Next we can consider references and the corresponding co-citation graph, and
so on. The result is a multifaceted view of the “scientific publications process” in terms of graphs that encode
possibly multivariate relations among objects of different types, i.e., authors-to-authors, authors-to-words,
authors-to-papers, words-to-papers, and papers-to-papers. The system can still be summarized in terms of
few non-observable “topics”, “areas of expertise” and “collaboration clusters” that evolve over time, along
with structural relations among the object types dictated by the semantics of the publications process.
Example 2. Another set of important examples is given by the collection of biological sequences and
networks related to different organisms. In this case, an important goal of the analysis is that of annotating
proteins with the function(s) they participate into, so the observations regarding the object type “functional
annotation” may play a more prominent role in the summarization. A set of graphs that encode protein-
protein interactions under several experimental conditions can be summarized in terms of non-observable
“functional processes” and non-observable process-to-process “interaction patterns.” Both of these sum-
maries, along with partially available functional annotations, can then be used to predict the annotations of
new proteins. Alternatively, both interactions and annotations can inform functional processes. Or we may
grant annotations a more prominent role by having them inform functional processes for each protein first,
and use the interactions to fine tune them. We can then introduce gene expression data, and so on. Again,
objects of multiple types (proteins, annotations, experimental conditions) and the relations among them out-
line an interacting system that changes as exogenous conditions change. In this example, the data consists of
bipartite graphs (proteins and functional annotations, proteins and experimental conditions for gene expres-
sion) and graphs in general (protein-protein interactions for each experimental condition—note that these
experimental conditions may differ from those under which the gene expression data were obtained).
Example 3. Yahoo! Inc. has recently created the position of “Chief Data Officer” to as the company
recognized “the strategic value of data and wanted data to have a voice at the executive table” rather than
seeing it a mere IT service (Piatetsky-Shapiro, 2005). At one of the primary international forums on ma-
chine and statistical learning held in December 2005 (Advances in Neural Information Processing Systems
in Vancouver) preliminary evidence was presented that suggests that learning systems based on large scale
databases may deliver satisfactory solutions to portions of real-world problems, such as those that arise in
the natural language processing applications (Holzle, 2005). A recent review of progresses and challenges
in link mining written by Senator (2005) identifies the need for “integrated systems” as one of the challenges
ahead. It is my opinion that this will be achieved by integrating different data types as well as perspectives
5

Page 6
DRAFT
DO NOT DISTRIBUTE
of different disciplines, rather than just integrating methods in the same discipline, which often boil down
to different approaches to the analysis of similar data. Being able to summarize the information in these
systems in terms of simple patterns and corresponding laws of evolution that correlate to phenomena of
interest, e.g., topics, functional annotations, formal and informal role in operational processes within an
organization, and so on, is a fundamental problem of science and applications today. In general, summa-
rization is sought as a mean to facilitate organization, sorting, ranking, and search. Getoor and Diehl (2005)
organize the typical goals of the analysis of complex systems according to whether they relate to the ob-
jects in the system (ranking, classification, clustering, classification), to the links (prediction, test), or to the
entire graphs that encode the interactions (sub-graph discovery, classification, generative models). Ideally
we would want an “intelligent” summary, i.e., a summary that help us organize, enhance search, ranking,
prediction of association, missing links, and at the same time maximizes predicting power for some task, or
fosters information discovery, (Airoldi et al., 2006c).
Feasible solutions to the generic tasks of summarization and complexity reduction would foster au-
tomation and knowledge discovery in large biological and genetic databases, would lead to new e-business
applications, e.g., in summarization, search, and strategic planning. Statistical foundations for the treatment
of such systems would advance our understanding of significant phenomena and anomalies that take place
in our world, and would give us the tools to cope with the information overload we face in a systematic
manner (Reddy, 1996, 2003).
1.1 Impact and Vision
This thesis presents a theory for modern applications; modeling choices and substantive issues are at the
forefront of the discussion only to the extent that they respond to needs that are rooted in the data. This is
possible because of the central role played by few essential characteristics in many complex systems.
1. On the application side there is an increasing amount of information available with a temporal or
sequential dimension to require methods that explicitly account for dynamics and evolution in the phenom-
ena under scrutiny. The approaches available to date for which solid inference mechanisms are available
include hidden Markov models and generalized state-space models. New approaches are needed, which can
account for richer evolutionary patterns. Further, there is a need for predictions based on several sources of
information that need be integrated, and a solid probabilistic approach is missing.
2. The models that can be expressed by this framework are extremely diverse and widely applicable.
Many scientific studies lead to data sets that are represented as graphs, at some level, e.g., two-mode data
lead to bipartite graphs, uni-modal data to graphs where we record relations between pairs of objects of
the same type, multi-mode data where we record relations among objects of multiple types, multi-graphs
where edges encode multivariate variables, and combination of these. Assumptions and intuitions of interest
may need be incorporated in application-specific models, but the modularity of our approach makes these
”special” modeling issues a piece of the puzzle that can be introduced on top of a basic model skeleton.
3. On the theoretical side there is a tremendous need for solid foundations. Researchers in the com-
puter science community are engaged in a dangerous (incremental) race, whereas fundamental issues such
as identifiability, model selection, distribution free tests for assessing the goodness of fit, the geometrical
understanding of allocation tasks, or the asymptotics of the family of hierarchical mixture models are com-
pletely ignored. Even limited explorations of these issues would advance the scientific understanding of the
methods. This will ultimately benefit applications by providing theoretical insights to support application-
6

Page 7
DRAFT
DO NOT DISTRIBUTE
specific modeling choices.
The grand vision is to establish a mature statistical theory of graphs and networks, to bridge theoretical
computer science, a largely deterministic discipline, and statistical learning by explicitly characterizing the
relation between deterministic and probabilistic solutions to problems that involve graphs and networks
shared by both disciplines. The ultimate goal is that of promoting the role of statistics in the computing
sciences and its modern applications.
1. On the statistics side, this work would provide solid foundations of a theory of hierarchical mixture
models for graphs and networks. This is missing, to the best of my knowledge, and it is worth pursuing in
its own right.
2. On the computer science side, there are multiple benefits; mostly it would promote the role of Bayesian
statistics in the theoretical computer science and data mining communities, by providing new models and
perspectives in applications of primary importance—biological sequences & networks, dynamic graphs &
networks, collections of scholarly publications, knowledge and corporate networks, linkage and homeland
security. This work will possibly foster a fast scientific progress by serving as a glue for several branches of
the literature that are poorly aware of one another.
In conclusion, recent trends and events suggest an imminent shift of focus of the research community
at large towards complex interacting dynamic systems, along with a rediscovered mindset that through
integration we can finally deliver satisfactory solutions to long-standing real-world problems, and create
new applications. This thesis will present theoretical musings derived from applications for applications,
in order to provide us with insights and understanding on when and why the methods we employ work on
a case by case basis. Specifically, I will discuss models and methods that enable applications to biological
databases, scholarly publications, and knowledge and corporate databases. I will present successes stories
where my methods were key to answer real-world problems. I will introduce a theory of these complex
systems and discuss how this is crucial to the success of such endeavors.
1.2 A Case for Probabilistic Models
In the past century, statistical theory has transformed the social, biological and medical sciences, (Fienberg,
1999). A substantial portion of the recent successes in algorithmic and theoretical computer science has
been achieved by means of probabilistic methods and randomized algorithms, (Habib et al., 1998). There is
little doubt that a modern theory of complex systems should have solid foundations in statistical theory.
One major advantage of a probabilistic perspective is the central role played by models in separating
associations and patterns from the typically large amount of noise that characterizes complex systems. Fur-
thermore, for every deterministic algorithm it is typically possible to posit a probabilistic model, which has
something akin to a maximum likelihood solution that coincides with the outcome of the deterministic algo-
rithm,
1
and in practical applications soft constraints typically are more desirable than hard constraints, e.g.,
must link, must-not link.
Being able to distinguish between the presence/absence of associations is very different than being able
1
The differences here are two. Computationally, the deterministic algorithm does not “spend time” to assess the quality of its
answer, whereas a probabilistic methods encodes a scoring function, i.e., typically a penalized likelihood, that is used to assess the
quality of its answer, and it takes a little more computation. Analytically, the constraints imposed by a probabilistic models are soft
rather than hard; the latter can be both an advantage or a disadvantage, depending on the implications for the application at hand.
7

Page 8
DRAFT
DO NOT DISTRIBUTE
to assess whether the association suggested by the algorithm may be the outcome of pure chance. This is
an important difference, that is crucial in applications where an automated decision informs the actions of
human personnel, e.g., airport security. Similarly, manually curated data in biology and genetics tend to
be expensive both in terms of time and resources, and it is common practice to use automated screening
methods to rank the measurements of the data gathered via high-throughput techniques before proceeding
with manual testing.
I conclude by invoking the arguments that the nature of human reasoning is probabilistic (Evans et al.,
1993; Leighton and Sternberg, 2003), and that a statistical theory of complex systems founded on proba-
bilistic generative models is natural (Tenenbaum, 1999) and easy to interpret. For example, the way we
understand “social closeness” is through a cascade of implications that hold often times, rather than ex-
actly. In the proposed framework of this thesis, the cascade of implications is translated into a hierarchy of
probabilistic assumptions, and a notion of distance is then obtained from the model, in some way.
2 A Brief History of Modeling Approaches to Graphs and Networks
Models for graphs of various types are scattered across research in the social, physical, mathematical, sta-
tistical, and computing sciences. In this review of the literature, I emphasize those statistical models that
attempt to express the dependencies between objects in the system, in some sense.
Notation. We denote random quantities (scalars, vectors and matrices) with uppercase letters and their
realizations with lowercase letters, e.g., Pr (Y = y) = 1, and we will specify the dimensionality of various
quantities in the text. A graph is a pair G = (V,E) where V vertices, which correspond to objects of interest,
and E edges. Graphs represent relations measured on pairs of vertices; we index the random quantities that
encode such (possibly multivariate) measurements consistently, e.g., Y
mn
. If measurements related to more
than two vertices we will expand the subscript list accordingly, e.g., Y
mno
denotes the random quantity
that encode an observation of three vertices. We will use superscript to denote elements of a multivariate
random quantity, e.g., if Y
mn
is a matrix we will denote its generic element with Y
[ij]
mn
. Although the
temporal dimension can be seen as just another dimension in the random elements at hand, e.g., the generic
element of a sequence of adjacency matrices can be denoted with Y
[ij1[
mn
,Y
[ij2]
mn
,... ,Y
[ijT]
mn
, it is convenient
for our purposes to decouple the temporal dimension from the other dimensions. To this extent, we will
denote the generic element of a temporal sequence of adjacency matrices with a separate set of indices with
Y
[ij](1)
mn
,Y
[ij](2)
mn
,... ,Y
[ij](T)
mn
, and the sequence of matrices as Y
(1)
mn
,Y
(2)
mn
,... ,Y
(T)
mn
. Throughout the text we
will reserve the letter Y to denote measurements on sets of vertices and the letter X to denote measurements
on the the vertices themselves; one major distinction in the models discussed below is, in fact, the search
for patterns among objects, where node ID matters, versus the search for patterns among edges, where the
similarity of patterns is independent of the sets nodes involved. In those few cases where the semantics
of the problem require a separate treatment for different types of measurements associated with vertices or
edges special notation will be introduced.
2.1 Static Graphs
The works in this section take as input measurements about objects that start from a network as given.
8

Page 9
DRAFT
DO NOT DISTRIBUTE
Exponential Random Graph Models. Under the assumption that two possible social ties are indepen-
dent only if a common actor is involved in both
2
Frank and Strauss (1986) devised the following character-
ization for the probability distribution of undirected Markov graphs.
P
θ
{Y = y} = exp
n−1
k=1
θ
k
S
k
(y) + τT(y) + ψ(θ,τ)
y ∈ Y,
(1)
where the statistics S
k
and T count specific structures, such as edges, triangles, and k-stars, {θ
k
} = θ and
τ are the parameters, and ψ(θ,τ) is the normalizing constant. Frank and Strauss (1986) worked mainly
with the three parameter models, where θ
3
,... ,θ
n−1
= 0. They proposed the pseudo-likelihood estima-
tion method to estimate the complete vector of parameters by maximizing the following pseudo-likelihood
function.
ℓ(θ) =
i<j
log P
θ
{Y
ij
= y
ij
|Y
uv
= y
uv
for all u < v,(u,v) = (i,j)} .
(2)
Wasserman and Pattison (1996) proposed the current formulation of Exponential Random Graph Models
(ERGM), also referred to as p
models, as a generalization of the Markov graphs of Frank and Strauss. For
both directed and undirected graphs, they maintain a similar characterization of the probabilities where the
statistics S
k
and T are substituted for arbitrary statistics U. This leads to the probability functions of the
form
P
θ
{Y = y} = exp θ
u(y) − ψ(θ) .
(3)
More recently, Snijders et al. (2004) have proposed a variant of these models where the major problem
of double-counting
3
is mitigated, but not overcome. Hunter and Handcock (2004) propose an alternative
estimation scheme that corrects parameter estimates for double-counting. This estimation procedure can be
used for models based on distributions in the curved exponential family. Park and Newman (2004) formally
characterize sensitivity issues.
Remark A. It is possible to express the current formulation of exponential random graphs using the
formalism of undirected graphical models, let us write the likelihood of an arbitrary undirected graph.
p(x|θ) =
c∈C
ψ(x
c
c
)
z
,
(4)
where x
c
denotes the nodes in clique c, θ
c
denotes the corresponding set of parameters, ψ are non-normalized
potentials over the cliques, and z =
c∈C
c∈C
ψ(x
c
c
) is the normalization constant. If the likelihood is
in the exponential family, we can write:
p(x|θ) = exp θ
u(x) − log z .
Remark B. Models in this family are not “generative models” in that no assumptions are present to
explain how the sufficient statistics are generated. However, it is possible to posit a generative model that
includes exponential random graph models, or any other conditional model, as part of the emission model
(Airoldi et al., 2006c).
2
This is the intuitive definition of Markov property for spatial processes on a lattice in Besag (1974).
3
The statistics S
i
(y) count graph structures. Although they are not independent, i.e., they count overlapping sets of edges, they
are assumed independent in the pseudo-likelihood. Ignoring the correlations is a bad idea, and causes extreme sensitivity of the
predicted number of edges to small changes in the value of certain parameters.
9

Page 10
DRAFT
DO NOT DISTRIBUTE
Latent Variable Models. The notions of equivalence, structural equivalence, and blocks are introduced
by Lorrain and White (1971) and further explored by many, notably by Faust (1988). A comprehensive
treatment of models that use blocks to express the complexity of the data is given in Doreian et al. (2004). A
summary of models and notions relevant for social networks developed in the social sciences can be found
in Wasserman and Faust (1994).
Stochastic block models, the probabilistic treatment of blocks, have appeared early in the statistical sci-
ences (Holland and Leinhardt, 1975) and widely studied (Fienberg and Wasserman, 1981; Fienberg et al.,
1981; Holland et al., 1983; Fienberg et al., 1985; Wang and Wong, 1987; Wasserman and Anderson, 1987;
Anderson et al., 1992) and recently rediscovered (Snijders and Nowicki, 1997; Nowicki and Snijders, 2001),
including non-parametric treatment of the number of blocks (Kemp et al., 2004), and integration with non-
relational information to infer the blocks (Wang et al., 2005). A general stochastic block model of mixed-
membership has been recently proposed (Airoldi et al., 2005a, 2006d), along with a framework to integrate
external information of different types (Airoldi et al., 2006c), that relaxes the historical assumption of single-
membership of objects to blocks, and estimates block-to-block connectivity patterns in a Bayesian fashion.
Remark C. A general framework for integration of a different nature is described by Carley (2002).
An alternative approach latent space models, where observed interactions are projected on a latent space
through a generalized linear model (Hoff et al., 2002; Hoff, 2003a,b). Hoff et al. (2002) use MCMC to infer
latent space positions, treated as hyper-parameters. Hoff (2003a) specifies a Gaussian prior over the latent
space, thus giving to the model fully generative flavor, with the goal of modeling reciprocity.
Remark D. It is possible to posit a generative model that includes generalized linear models as part of
the emission model (Airoldi et al., 2006c). The connection between stochastic block models (SBM) and
latent space models (LSM) is more subtle, though.
Both SBM and LSM seek to define a conditional probability distribution for relations {y
nm
} among
actors in a way that reflects some latent semantics (i.e., roles, topics, functions, etc.) of the actors. Let Z
n
denote a latent variable capturing the latent semantic representation of the actor n, the SBM usually defines
a generative model y
nm
∼ f(·|Z
n
,Z
m
), of which the Z’s typically act as indicators of context-dependent
edge generating processes. On the other hand, an LSM maps the observed relation y
nm
to some latent
semantic differences between the two actors via a regression function, of which the Z’s typically represent
the projections of the actors onto some latent metric space where their differences can be measured via
a Euclidian metric. Specifically, in LSM the Zs are multivariate/continuous, e.g., could be drawn from a
mv-normal, and their realizations indicate the position of actors in the latent space. In SBM the Zs are
multivariate/discrete, e.g., could be drawn from a multinomial(θ), and their realizations indicate which
group an actor belongs to, for each observed interaction. In other words, the dimensionality of the Zs in
SBM reflects how many latent groups to be captured in a domain, whereas in LSM the dimensionality of the
Zs does not have an explicit interpretation in terms of groups. In fact, in LSM we need to run a clustering
procedure (e.g., k-means) in the latent space where the actors are projected to, in order to decide how many
groups there are. Thus, the two types of network models are different: SBM focuses on latent membership
of each actor and underlines the importance of modeling the ”grouping” of actors, whereas LSM focuses on
latent distances and therefore stress more on modeling proper projections of actors into a latent manifold.
Hoff’s formulation of LSM is not a soft version of SBM. As a results, SBM and LSM have some orthogonal
advantages in modeling network data.
Remark E. Connections have been highlighted to MDS and other linear methods (Breiger et al., 1975),
10

Page 11
DRAFT
DO NOT DISTRIBUTE
to unsupervised learning, e.g., PCA, FA, (Ghahramani, 2005), and to matrix factorization (Ding, 2005;
Xing and Jordan, 2003).
Spectral Methods. Research on by Gaussian unit ensembles provides a probabilistic connection to
spectral decompositions (Metha, 2004). In the computer science literature, there is a stream of works in
this area well summarized by Saul (2005), who discusses comparison to PCA, MDS and other linear meth-
ods. Briefly, isomap (Tenenbaum et al., 2000), local linear embedding (Roweis and Saul, 2000), laplacian
eigenmaps (Belkin and Niyogi, 2002), Hessian eigenmaps (Donoho and Grimes, 2003), maximum variance
unfolding (Weinberger and Saul, 2004; Weinberger et al., 2004; Sun et al., 2006), conformal eigenmaps
(Coifman et al., 2005a,b; Lafon and Lee, 2006) and its asymptotics (Nadler et al., 2005), and the recent
reformulation of problems and solutions in terms of tensors (He et al., 2005).
Simple Models of Real-World Phenomena and their Mathematical Properties. Much of the research
across communities concerns the study of real-world graphs and their properties with the aim of building
toy models that capture such properties. For example, Newman and Park (2003) study transitivity and as-
sortative mixing (i.e., positive correlation of degrees of adjacent vertices) via group structure; Hoff (2003b)
studies transitivity, reciprocity and balance; Barabasi (2005b) studies burst and heavy tails in human dynam-
ics; Zheng et al. (2005) study the size of individuals’ social networks and means of estimating them from a
certain type of survey questions; and Ganesh et al. (2005) study the effects on epidemics of the topological
properties of graphs.
Research originating in mathematics and physics posit simple algorithms for generating graphs that
replicate observed properties, which are amenable to probabilistic analysis. Bollobas and Riordan (2002)
review few of such algorithms for popular graph types (Barabasi and Albert, 1999; Kumar et al., 2000;
Cooper and Frieze, 2003), and present an extended analysis of the “LCD” model of Buckley and Osthus
(2004). Other notable analytical investigations concern sampling, and asymptotic results. Park and Newman
(2004, 2005) give analytic solutions for the 2-star network and for clustered networks; Milo et al. (2003)
analyze sampling algorithms; Kleinberg and Kleinberg (2005) describe asymptotics of isomorphism and
embedding; Stumpf et al. (2005) find that sub-samples of scale-free graphs are not scale-free, and present
a way to study properties of a sub-sample based on moment generating functions; Flaxman et al. (2005)
describe the behavior of high degree vertices and eigenvalues in scale-free graphs; Chung and Lu (2003)
characterize average distances given expected degrees; and Caldarelli et al. (2004) study the formation of
cycles. A series of works is concerned with models and methods to find “statistically significant” motifs,
i.e., recurring edge patterns over sets of difference nodes (Berg and Lassig, 2004; Shen-Orr et al., 2002;
Milo et al., 2002; Artzy-Randrup et al., 2004; Milo et al., 2004a; Kashtan et al., 2004; Milo et al., 2004b).
Newman (2003a) portrays networks as mixtures of patterns; and Vaszquez et al. (2004) present the only in-
vestigation to date of how global patterns may arise from the composition of local ones. Few comprehensive
reviews are available, which summarize many of these findings (Barabasi et al., 1999; Albert and Barabasi,
2002; Dorogovtsev and Mendes, 2002; Newman, 2003b; Amaral and Ottino, 2004).
A notion that recently captured the attention of funding agencies and high profile journals is that of
“topology types”. Airoldi and Carley (2005) present a review and a critique of such notion. They sur-
vey generative algorithms for random graphs (Erdos and Renyi, 1960), Poisson graphs and others that lead
to heavy tails for the corresponding degree distributions (Simon, 1955; Bollobas, 1985, 2001; Barabasi,
2005b), scale-free graphs (Faloutsos et al., 1999; Barabasi and Albert, 1999; Huberman and Adamic, 1999;
Adamic and Huberman, 2000; Barabasi et al., 2000; Barabasi and Bonabeau, 2003), small-world graphs
(Milgram, 1967; Watts and Strogatz, 1998; Kleinberg, 1999a; Amaral et al., 2000; Liben-Nowell et al., 2005),
11

Page 12
DRAFT
DO NOT DISTRIBUTE
core-periphery graphs (Borgatti and Everett, 1999), and cellular graphs and networks (Frantz and Carley,
2005a; Airoldi and Carley, 2006). Several of these topology types are presented in heuristic terms, vaguely
consistent across communities
4
. Airoldi and Carley (2005) show that the slight differences in the sampling
algorithms, which generate topologies that adhere to the heuristic requirements of a specific type, are not
stable in terms of the topological properties of the graphs they lead to. That is, slight differences in the
operational definitions for the same topology type lead to separable graphs in terms of the set of common
metrics used by practitioners in the various communities
5
. A different set of concerns is explored in “ro-
bustness” studies, which measure the stability of topological properties of graphs and networks of specific
types to disruption and other stress situations (Borgatti et al., 2005; Frantz and Carley, 2005b). These works
are simulation studies that approach the sub-sampling issues discussed above from another perspective.
Remark F. Alternatively, being able to embed the various topology type in a smooth parametric contin-
uum (e.g., Erdos random, to small-world, to ring lattice; see Watts and Strogatz, 1998) would help under-
standing the boundaries. Unfortunately, also this strategy is not practical. There is a potpourri of necessary
conditions that have to be satisfied by such a smooth parametric continuum, which appear in the heuristic
definitions of the various topology types, e.g., the same degree distribution for all nodes or not, or short-
est path as the only notion of distance, or shortest path and metric embedding. Although it is possible to
posit a generative process that satisfies all the necessary conditions, such a generative process relies on a
non-smooth parametric continuum. Specifically, we would need to introduce in such a process a discrete
parameter that controls the number of different “probabilistic treatments” for the nodes, e.g., the number of
degree distributions. The problem is that, on one hand, the value of such a discrete parameter is difficult to
estimate. On the other hand, its correct estimation is fundamental in correctly assigning the topology type.
Ultimately, the diversity in the notions of topology types translates into the hardness of the estimation task,
upon the success of which depends our ability to discriminate among types.
To summarize, we can organize the various works according to few aspects: (a) the notion(s) of distance
between pairs of nodes that are needed; (b) the use, or not, of the descriptive statistics, as well as their
nature, i.e., local versus global; (c) the existence of dependence constraints among neighborhoods; (d)
the focus on node patterns (groups) versus edge patterns (motifs), where we do no distinguish similar edge
configurations among different sets of nodes. These aspects have to be crossed with the nature of the models:
(i) “generative” models and algorithms, both probabilistic and deterministic; (ii) models and algorithms that
contain “generative” ideas, both probabilistic and deterministic; (iii) other models and algorithms.
Problems. More works have introduced methodological innovations in the context of specific problems.
Notable research in this sense concerned how to find communities in networks Girvan and Newman (2002);
Newman (2004a,b), and in bipartite graphs (Mishra et al., 2004). To this extent, Doreian et al. (2004) sum-
marize relevant works in the social sciences and develop a theory of generalized block models. A cluster
of research is about link-mining (Domingos, 2003; Jensen, 1999; Getoor, 2003; Getoor and Diehl, 2005),
graph mining (Chakrabarti, 2005), link prediction (Getoor et al., 2002; Liben-Nowell and Kleinberg, 2003;
Goldenberg and Moore, 2004), and link ranking. (Brin and Page, 1998; Kleinberg, 1999b; Cohen et al.,
1999; Ng et al., 2001). Other notable works are concerned with the information flow within a network; the
emergence of deadlines (Papadimitriu and Servan-Schreiber, 1999), the dynamics of information (Kleinberg,
2001), the dynamics of information exchange (Dodds et al., 2003), how to maximize influence spread
4
A notable survey is that of Mitzenmacher (2004), who presents a brief history of power-laws and lognormal distributions, and
discusses some of their connections from a generative perspective. Newman (2005) discusses the connections among of power-laws,
the Pareto distribution, and Zipf’s law. Airoldi and Shalizi (2006) present a clear analytical overview of these connections.
5
Few works survey network metrics and visualization tools; notables are Carley and Reminga (2004) and (Frank, 2000).
12

Page 13
DRAFT
DO NOT DISTRIBUTE
(Kempe et al., 2003), decentralized information processing (Van Zandt, 1997), and decentralized search
(Kleinberg, 2000, 2004). A practical set of concerns inspired methods for entity disambiguation (Malin et al.,
2005), and classification of relational data (Macskassy and Provost, 2005). Solan et al. (2005) propose a
model to learn grammar-like rules in natural languages.
Empirical Studies. Another portion of research concerns findings that influenced the development of
theoretical aspects. Notable empirical studies include the web (Faloutsos et al., 1999; Albert et al., 1999;
Kleinberg and Lawrence, 2001), air traffic (Guimera et al., 2005), the creative enterprise (Barabasi, 2005a),
scientific collaborations (Newman, 2001), metabolic networks (Guimera and Amaral, 2005), decentralized
search in email network (Adamic and Adar, 2005), transcriptional regulatory network, (Balazsi et al., 2005),
words (Steyvers and Tenenbaum, 2005), the organization within the cell (Barabasi and Oltvai, 2004), poli-
tics (Porter et al., 2005), complex brain networks (Sporns et al., 2004), and more (Newman et al., 2002).
2.2 Dynamics and Evolution
Most existent works focus on static networks, however, there are few that consider methodology to deal with
dynamics and evolution. Notables are the stream of works on cellular automata (Ilachinski, 2001), the early
works on diffusion (Coleman et al., 1957), the treatment of dynamics with Markov-chains Monte-Carlo
(Wasserman, 1980), dynamic random fields on undirected graph (Shalizi, 2003), link-copying processes
(Kleinberg et al., 1999; Leskovec et al., 2005b,a), cascading behaviors (Watts, 2002), network tomography
and latent allocation (Airoldi and Faloutsos, 2004), dynamics in the social space (Banks et al., 2005), and
models that attempt at replicating real-world phenomena such as opinion formation (Wu and Huberman,
2005), and evolution (Doreian and Stokman, 1997).
Empirical Studies. Very few studies exist to guide theoretical developments in a dynamic setting.
Few notables are communication networks (Airoldi, 2003; Airoldi and Faloutsos, 2004), email networks
(Kossinets and Watts, 2006), nucleic acid chain dynamics (Sales-Pardo et al., 2005), and scientific collabo-
rations (Barabasi et al., 2002).
2.3 Building Graphs from Data
The works in this section share the intuition that measurements about objects are inherently noisy; the var-
ious authors attempt to model the uncertainty associated with the measurements in order to make decisions
whether two objects are related or not, and create a graph. A popular approach is that of associating a
random variable with each “object”, e.g., Bernoulli, define the process through which “observations” relate
to binary outcomes, and estimate the parameter of a Bayesian network (Heckerman, 1999) that describes
the observations best, through dependencies among objects. The estimated Bayesian network provides a
probabilistic model for the observed co-occurrences that can be used to predict missing links, or to assess
the likelihood of existing ones, (Getoor et al., 2002; Friedman and Koller, 2003; Heckerman et al., 2004;
Goldenberg and Moore, 2004; Teyssier and Koller, 2005). Important applications based on variations this
approach have been used for building recommender systems (Breese et al., 1998), social networks (Breiger,
2003), and complex cellular networks (Friedman, 2004; Segal et al., 2005).
13

Page 14
DRAFT
DO NOT DISTRIBUTE
2.4 Inadequacies of the Current Research
There are several dimensions that are relevant to statistical analyses of graphs and networks. Unfortunately
no single approach develops, or at least allows for, all of them
Dimensions of interest are: (a) a “proper” likelihood function; (b) the fully generative nature of the
model; (c) replicability of interesting properties at both global and local level; (d) the focus on edges or
nodes; (e) notions of distance and embedding of graphs in a metric space; (f) identifiability issues that need
be explicitly identified; (g) hierarchical relations between dyads, triangles, k-stars, k-triangles and other
basic structural (connected) components that are used to summarize and characterize an observed graph; (h)
dependencies among relevant quantities, i.e., the sufficient statistics, corresponding to a decomposition of
the observed graph into cliques or other structures of interest need be identified; (i) goodness of fit must be
assessed—current models tend to over-fit observed graphs and can not be easily extended as the observed
network grows; (j) the possibility of integrating data on different object types; and other dimensions.
With respect to this last dimension, i.e., integration of multiple object types and data about them, most
existent work tend to concern or assume specific types of data representations, e.g., temporal and sequential
data in attribute space, or relational data represented by graphs or networks. We view learning problems
along this line as “type-specific-learning” problems. Typically, one can develop solutions to type-specific-
learning problems by devising novel domain- and data-specific models and algorithms that leverage domain
knowledge and semantics of interest for particular applications. Integrating heterogeneous data types un-
der a unified model remains a challenge, however, especially for complex graphs that are simultaneously
described by intrinsically different types of characteristics, such as features in attribute space and links in
relational space (Airoldi et al., 2006c).
As we discussed in the previous section, there is a wide range of research questions that an elegant
solution to the issues above may help us answer. It is useful to keep those questions in mind in order to
guide our technical choices.
3 Overview of the Research Agenda
My previous research on these topics suggests a probabilistic framework and a general modeling approach to
complex systems. In this thesis, I plan to present such a framework and discuss its properties. In particular,
the main goal is the reconciliation of global properties of complex systems with local phenomena of interest.
My solution is to express complexity through hierarchical mixtures of simple patterns that evolve over time,
i.e., complex global behavior emerges from the combination of local interaction patterns and dynamics. I
will discuss and demonstrate how this new framework generalizes other probabilistic approaches present
in the literature, provides a soft version of the deterministic approaches, and lays the foundations of a
statistical theory of graphs and networks. The major part of my effort will then be devoted to modeling
issues that arise in conjunction to the four essential aspects listed above, e.g., integration and dynamics.
I conclude with the investigation of theoretical and computational issues, and with a characterization of
the geometrical intuitions underlying the allocation task, a major inference objective shared by the various
models encompassed by this framework.
14

Page 15
DRAFT
DO NOT DISTRIBUTE
3.1 Research Work to Date
Here I briefly summarize my work in dynamic networks, computational biology and text analysis. The focus
of the write up is on the lessons learned and, in particular, on the flow of ideas that constitute the Ariadne’s
thread of my research. These ideas naturally call for a science of complex systems, and portray the notion
of ”hierarchical structure” in the likelihood, and that of ”mixtures” as the central technical tools for positing
probabilistic Bayesian models that can express complex global phenomena in terms simpler local patterns,
in a generative and informative fashion, as a continuum rather than as a dichotomy.
3.1.1 Dynamic Communication Networks
With Christos Faloutsos, I developed the Dynamic Network Tomography (DNT) model in order to re-
cover origin-destination (OD) traffic flows from observable link loads, in dynamic communication networks
(Airoldi and Faloutsos, 2004). Link loads are sums of subsets of OD flows, i.e., mixtures of non-observable
quantities, so that our model is essentially a latent allocation model where the OD flows play the role of al-
location weights. Real OD flows are expensive to retrieve, thus there is the necessity for accurate estimation
methods. We were provided validation data by AT&T at first, and from the Networking Institute of Carnegie
Mellon University later.
Previous models assumed either IID observations at different epochs or a fixed dynamics for all times.
The state-of-the-art was a local likelihood model based on Gaussian assumption and IID observations, but
with a powerful parameterization that linked latent flows to observed loads uniquely (Cao et al., 2000, 2001).
I specified the dynamics as a generalized dynamic system, by positing a stable average in the long-term
while allowing some short-term variability to capture ”local” deviations, i.e., localized in time. Further,
instead of modeling the dynamic among the flows (i.e., the mixture weights) directly, I added a layer into
the hierarchical structure of the likelihood by introducing latent variables for the average flows, in order to
move the generalized dynamic system up a level, among these new variables. Therefore the final model
softly constrains the mixture weights through a hierarchical, fully generative, dynamic allocation model.
I produced inference algorithms (MCMC, particle filters, importance sampling, and hybrid schemes) for
several models based on Gamma and Log-Normal distributions. I applied these models to real traffic at
AT&T (16 time series) and at Carnegie Mellon (about 12000 time series) and achieve accuracy improve-
ments of 30-to-40% on all OD routes—in terms of ℓ
2
distance between estimated and true OD flows.
I then developed some general theory of dynamic allocation models. In particular, a nice feature of
our model is that the ”stochastic dynamics” not only allows for localized variability across the temporal
sequences, but also leads to more parsimonious models, in terms of the number of parameters, than the
traditional state-space formulation. I was interested in formulating a general convolution problem in order to
find the general family of distributions that could substitute for Gamma and Log-Normal analytic forms. The
issue is that a model of stochastic dynamic has to be conjugate to that of the OD flows through composition
by multiplication in order for the distributional assumptions to propagate ”exactly” through subsequent
epochs, e.g., the Gamma model does not lead to exact propagation whereas the Log-Normal model does.
The crucial property of the log-Normal model, though, is not that it is a stable limit of the product of
random variables with finite variances, as I originally conjectured, but rather that it is a infinitely divisible
distribution. In fact, infinitely divisible distributions possess properties that are sufficient to be solution to
the general convolution problem (Airoldi, 2005).
15

Page 16
DRAFT
DO NOT DISTRIBUTE
In conclusion, we solved a classical ill-posed inverse problem using projections, that we defined through
a hierarchical, fully generative, dynamic allocation model. The model integrates relational and temporal ob-
servations. We learned that picking one solution among the feasible ones is not good enough to recover real
traffic. In general, in cases where several likely solutions exist it is crucial to ”inform” the inference some-
how, e.g., by introducing structure in the likelihood, in order to reach a ”good” solution. Non-informative
inference will give biased estimates in such cases.
3.1.2 Large Collections of Electronic Documents and Emails
With Stephen Fienberg, I developed the Dirichlet-Poisson and Dirichlet-negative-binomial models in order
to resolve an authorship attribution problem. In particular, we were asked to produce predictions and confi-
dence bounds on whether Reagan had drafted the texts of 312 of his own radio addresses, aired between 1975
and 1979, for which the original drafts were not found, neither other evidence was available (Airoldi et al.,
2006a).
The distinguishing feature of authorship attribution problems is that ”topical” words are not useful, and
in our case spellings and punctuation were not useful either. Rather, words used for discrimination in these
problems are typically high-frequency, non-topical words, called ”function words,” as well as idioms and
expressions. Probabilistic models based on multivariate Bernoulli, multinomial and mixtures of such models
do not fit word counts satisfactorily—note that a Dirichlet mixture of multinomials (Blei et al., 2003) is
again a multinomial. Further, word counts corresponding to function words display a variance greater than
the mean, thus the Poisson model is not entirely adequate, although significantly improves the previous ones.
The negative-binomial model, essentially a gamma mixture of Poissons, fits the data best and leads to robust
and accurate predictions.
We started from standard formulation of Poisson and negative-binomial models, we then introduced
reparameterizations that allow for a natural interpretation of the parameters in the language modeling con-
text, and that introduce the notion of contagion as a model of context, i.e., the more a word is used, the more
likely it is that it will occur again, thus defining the writing style of an author. We then devised reasonable
priors for improving the estimates of rare words. In order to produce ”interpretable” predictions we also
tackled the problem of word selection. We were looking for a small set of words that captured the writing
style of Reagan, which could be recognized by experts and close collaborators of his at the Hoover Insti-
tution as ”sounding like him.” To this extent, we devised a statistical test for determining the importance
of a word, based on a smooth function of word counts, which we called “∆
2
statistic.” This enabled us to
apply the theory of false discovery rates in order to correct for the fact that we had to test a large amount
of words, idioms and expressions. By the end, cross validated accuracy of predictions about authors was up
high beyond 95% with only 30 to 40 words.
Given the success of our methodology, we tested our models on a large set of text classification problems,
old and new, to discover that they consistently outperformed naive Bayes mixtures and other models based
on multivariate Bernoulli and multinomial. We found that whenever the classification was driven by non-
rare words our methods ”largely” outperformed those models. We found that our ∆
2
statistic selected
more discriminative words than Information Gain and improved accuracy. We thus extended our statistic
to deal with multiple classes and derived an asymptotic approximations for its distribution. This enabled to
compute p-values for each word by looking up a Gaussian probability table, rather than having to draw a
large sample, thus dramatically reducing processing time and providing a feasible alternative to information
16

Page 17
DRAFT
DO NOT DISTRIBUTE
gain (Airoldi et al., 2005b). In two side projects, with other collaborators, I further explored the power of the
2
statistic in finding associations among words, with the final goal of extracting sentiments and detecting
fraudulent intent in emails. I found that ∆
2
led to Markov blankets that encoded sentiments better those
obtained with Chi-Square and G-Square tests. ∆
2
is more robust to sparse counts (Airoldi et al., 2006b,
2005c).
In conclusion, we developed probabilistic solutions that improved performance on a wide range of practi-
cal problems using models of contagion and approximate inference. In our formulation, models of contagion
are flexible hierarchical mixture models, e.g., negative-binomial is essentially Gamma mixture of exchange-
able Poissons with a Dirichlet prior on its parameters. The asymptotic formulas for our statistical test were
crucial in reducing processing time in computing p-values, and now the ∆
2
statistic provides a feasible al-
ternative to information gain. There is more theoretical value in the sum/ratio type of reparameterizations,
which maintain tractability, allow to condition on observable quantities in mixture models of contagion, and
provide a bridge to understand the geometry of such models in the simplex—more below.
3.1.3 Biological Sequences and Networks
With Stephen Fienberg, I developed a soft-clustering version of the Dirichlet-Poisson model for the serial
analysis of gene expression data. The problem is that of recovering non-observable functional processes
that express the observed sequential profiles. In this mixture model, we assume that each gene tag may be
expressed under more than one functional process. The Poisson model provides better fit for gene expres-
sion profiles. The notions of ”contagion” and latent ”functional context” entailed by the model are very
compelling in this application domain. On mouse retina data, our model recovered a richer set of sequential
profiles than latent Dirichlet allocation (Blei et al., 2003), the analysis indicated that less latent profiles are
needed to express the observed gene profiles, and we obtained a better log-likelihood plateau for the same
number of parameters.
With Stephen Fienberg, Eric Xing and David Blei, I developed a latent mixture model of complex data,
in order to integrate gene expression (GE) data with protein-protein interactions (PPI). The problem is still
the same, i.e., we want to recover non-observable functional processes that supposedly drive the observable
interaction patterns among proteins. We posit a hierarchy of probabilistic assumptions that encodes a simple
intuition: proteins that interact with others according to similar patterns are likely to participate in the
same latent functional processes. For the relational portion of the data, we posit a stochastic block mixture
model where each protein may participate in multiple non-observable functions. We developed a ”nested”
variational algorithm for the relational portion of the model that is parallelizable and scales in terms of
memory to very-large matrices Preliminary experiments on PPI in Yeast suggested that our model is a
powerful tool for filtering outcomes of high-throughput experiments starting from a small sample of hand-
curated functional annotations. Our model predicted correctly more than 50% of the functional annotations
for unknown proteins, in cross-validation experiments (Airoldi et al., 2005a, 2006d).
These projects are still being completed and we are now at the stage where we need to integrate GE
and PPI observations on overlapping sets of proteins, under the assumption that both data sources provide
information on the same set of non-observable functional processes. An important lesson we learned is that
conditioning, identifiability and robustness are tied in these models, and a simple reparameterization may
have profound consequences. For example, in the model for GE the sum/ratio parameterization is crucial
in that it leads to closed formula updates for the variational inference. Further, conditioning on observable
17

Page 18
DRAFT
DO NOT DISTRIBUTE
quantities pushes the log-likelihood to a better plateau than competing models (Airoldi et al., 2006f). In the
model for PPI we conjecture that conditioning, e.g., on observable density of the interaction matrix, will
lead to more robust inferences by eliminating penalizing local optima that would bias the estimates.
3.2 Proposed Research
The proposed research advances science by providing theoretical understanding, geometrical intuitions, ap-
proximations and computational shortcuts, and an array of fresh modeling ideas for temporal and relational
components of systems of interest, along with strategies to integrate them. In particular, we propose prob-
abilistic models that posit a hierarchy of probabilistic assumptions about how the data interact, which we
compose in a generative fashion, i.e., data are characterized as arising from probabilistic algorithms that are
completely specified by few parameters. See Table 1 for an overview of my research agenda.
In my thesis I plan to develop foundations of a statistical theory of random graphs and networks in three
main directions.
1. Graphs and Networks
(a) Pure Topology Types. In this part I survey the different processes that explain both global and
local properties (Airoldi and Carley, 2005). I then show how to combine a basic probabilistic
model where the edges are exchangeable with probabilistic generative modules to replicate the
properties of interest with high probability. I plan to explore cellular networks in more details.
(b) Discovering Latent Motifs. In this part I show how to use latent variable models to infer ba-
sic motifs using admixtures and a hierarchy of probabilistic assumptions. Admixtures at the
node level serve the purpose of expressing the complexity in the data from simple motifs. The
hierarchy of probabilistic assumption serves to purpose of targeting latent motifs that exhibit
regularities of interest, and provides a unified framework for the analysis of one-, two-, and
multiple-mode data (Erosheva and Fienberg, 2005; Airoldi et al., 2006d).
2. Graph Evolution and Dynamics
(a) Birth-Death Processes. In this part I show how to posit a birth-death process in the latent space,
and I show how to make inference. A latent birth-death process is not only a convenient and
elegant way of enhancing the complexity that any given set of motifs can express, but it also
provides a fundamental advance in modern time series analysis (Xing, 2005).
3. Complex Graphs and Networks
(a) Integration and Prediction. In this part I show how to integrate the different models in a modular
fashion. Integration provides to our model the ability to share statistical power among several
sources of information, all of which tell us something important about the characteristics of
the system, which relate to a particular set of attributes of primary interest. The main purpose
of the integration, in fact, is to infer latent motifs that are tailored for predicting the target set
of attributes in the system, for which only a small amount of (possibly) partial information is
available (Airoldi et al., 2006c). I plan to develop an example in detail to demonstrate the theory.
18

Page 19
DRAFT
DO NOT DISTRIBUTE
1. Graphs and Networks
(a) Pure Topology Types
i. Stability and Separability of Metric Embeddings
ii. Exchangeability and Basic Likelihoods
iii. Canonical Bases for Graph Generation
iv. Example: Cellular Networks
(b) Discovering Latent Motifs
i. Mixtures of Connection Sub-Graphs
ii. Stochastic Block Models of Mixed Membership
iii. Latent Space Models of Mixed Membership
iv. Diffusion Models
(c) Bipartite Graphs
2. Evolution and Dynamics
(a) Order-Dependent Likelihoods
i. Laws of Evolution
ii. Diffusion Models Revisited
(b) Birth-Death Processes
i. Mixtures of Hidden Markov Models
ii. Mixtures of State-Space Models
(c) Limited Resources
3. Integration and Complexity
(a) Multi-Graphs, Hyper-Graphs, and the Meta-Matrix
(b) Example Applications
i. Dynamic Graphs and Networks
ii. Biological Sequences and Networks
iii. Textual Documents and Word Networks
4. Theoretical Issues
(a) Hierarchical (Bayesian) Mixture Models
i. Identifiability
ii. Exchangeability
iii. Model Selection Strategies
(b) Latent Allocation as Projection
i. Projective Geometry in the Exponential Family
(c) Remarks on Asymptotics and Information
(d) Gaussian Unit Ensembles
(e) Approximate Inference Techniques
Table 1: Overview of the research agenda.
19

Page 20
DRAFT
DO NOT DISTRIBUTE
I also plan to explore theoretical and computational issues such as approximate inference techniques,
identifiability of parameters in complex models, model selection strategies for the number of latent patterns
(Airoldi et al., 2006e), a characterization of latent allocation tasks in terms of projections, sub-sampling for
large graphs, connections to spectral theory via Gaussian ensembles, and remarks on the asymptotics of the
models encompassed in this framework.
3.3 Time to Completion
E (Time) Ref. Research to be completed
Ideal venues
6 weeks
1a
The emergence of cellular structures
1 science / pnas
6 weeks
1b
Stochastic block models of mixed membership
1 jmlr / ba / jasa / jrss(b)
16 weeks
2
Latent birth-death processes
2 jasa / jrss(b)
8 weeks
3
Integration and prediction in complex biological systems 1 ba, 1 valencia
Expected completion in November 2006
References
Adamic, L. A., E. Adar. 2005. How to search a social network. Social Networks 27 187–203.
Adamic, L. A., B. A. Huberman. 2000. Comment on power-law distribution of the world wide web. Science
287 2115a.
Airoldi, E. M. 2003. Advances in network tomography. Tech. Rep. CMU-CALD-03-101, Carnegie Mellon
University.
Airoldi, E. M. 2005. Latent allocation models for dynamic network analysis: An application to network
tomography. Manuscript.
Airoldi, E. M., A. G. Anderson, S. E. Fienberg, K. K. Skinner. 2006a. Who wrote Ronald Reagan radio
addresses? Bayesian Analysis 1 289–320.
Airoldi, E. M., X. Bai, R. Padman. 2006b. Sentiment extraction from unstructured texts: Markov blankets
and meta-heuristic search. Lecture Notes in Computer Science, vol. 3932. Springer-Verlag.
Airoldi, E. M., D. M. Blei, S. E. Fienberg, E. P. Xing. 2006c. Latent mixed-membership allocation models
of relational and multivariate attribute data. Valencia & ISBA Joint World Meeting on Bayesian Statistics.
Forthcoming.
Airoldi, E. M., D. M. Blei, S. E. Fienberg, E. P. Xing. 2006d. Mixed membership stochastic block models for
relational data with application to protein-protein interactions. Proceedings of ENAR Annual Meetings.
Forthcoming.
Airoldi, E. M., D. M. Blei, E. P. Xing, S. E. Fienberg. 2005a. A latent mixed-membership model for
relational data. Workshop on Link Discovery: Issues, Approaches and Applications. In conjunction with
the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
20

Page 21
DRAFT
DO NOT DISTRIBUTE
Airoldi, E. M., K. M. Carley. 2005. Sampling algorithms for pure network topologies: Stability and separa-
bility of metric embeddings. ACM SIGKDD Explorations 7 13–22.
Airoldi, E. M., K. M. Carley. 2006. The emergence of cellular structures. Manuscript.
Airoldi, E. M., W. W. Cohen, S. E. Fienberg. 2005b. Bayesian models for frequent terms in text. Proceedings
of the Classification Society of North America and INTERFACE Annual Meetings.
Airoldi, E. M., C. Faloutsos. 2004. Recovering latent time-series from their observed sums: network to-
mography with particle filters. Proceedings of the ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, vol. 10. 30–39.
Airoldi, E. M., S. E. Fienberg, C. Joutard, T. M. Love. 2006e. Model comparison and choice for hierarchical
Bayesian mixed-membership models with application to scientific publications. Forthcoming.
Airoldi, E. M., B. Malin, L. A. Sweeney. 2005c. X-RAY: Technologies to defeat identity theft vulnerabilities
related to e-mail requests. Proceeding of the AAAI Spring Symposium.
Airoldi, E. M., C. R. Shalizi. 2006. The analytical nexus from Zipf’s law to lognormal distributions.
Manuscript.
Airoldi, E. M., E. P. Xing, S. E. Fienberg. 2006f. Contagion processes for soft clustering: Learning latent
themes from frequently co-occurring events. Manuscript.
Albert, R., A. L. Barabasi. 2002. Statistical mechanics of complex networks. Rebiews of Modern Physics
74.
Albert, R., H. Jeong, A.-L. Barabasi. 1999. Diameter of the world wide web. Nature 401 130.
Amaral, L. A. N., J. M. Ottino. 2004. Complex networks. European Physics Journal B 38 147–162.
Amaral, L. A. N., A. Scala, M. Barthelemi, H. E. Stanley. 2000. Classes of small-world networks. Proceed-
ings of the National Academy of Sciences 97 11149–11152.
Anderson, C. J., S. Wasserman, K. Faust. 1992. Building stochastic blockmodels. Social Networks 14
137–161.
Artzy-Randrup, Y., S. J. Fleishman, N. Ben-Tal, L. Stone. 2004. Comment on “Network motifs: Simple
building blocks of complex networks” and “Superfamilies of evolved and designed networks”. Science
305 1107c.
Balazsi, G., A. L. Barabasi, Z. N. Oltvai. 2005. Topological units of environmental signal processing in the
transcriptional regulatory network of Escherichia coli. Proceedings of the National Academies of Science
102 7841–7846.
Banks, H. T., A. F. Karr, H. K. Nguyen, J. R. Samuels, Jr. 2005. Sensitivity to noise variance in a social
network dynamics model. 2005 2005-10, Statistical and Applied Mathematical Sciences Institute.
Barabasi, A. L. 2005a. Network theory—the emergence of the creative enterprise. Science 308 639–641.
Barabasi, A. L. 2005b. The origins and heavy tails in human dynamics. Nature 435 207–211.
21

Page 22
DRAFT
DO NOT DISTRIBUTE
Barabasi, A. L., R. Albert. 1999. Emergence of scaling in random networks. Science 286 509–512.
Barabasi, A. L., R. Albert, H. Jeong. 1999. Mean-field theory for scale-free random networks. Physyca A
272 173–187.
Barabasi, A. L., R. Albert, H. Jeong, G. Bianconi. 2000. Response to power-law distribution of the world
wide web. Science 287 2115a.
Barabasi, A. L., E. Bonabeau. 2003. Scale-free networks. Scientific American 50–59.
Barabasi, A. L., H. Jeong, Z. Neda, E. Ravasz, A. Schubert, T. Vicsek. 2002. Evolution of the social network
of scientifc collaborations. Physica A 311 590–614.
Barabasi, A. L., Z. N. Oltvai. 2004. Network biology: Understanding the cell’s functional organization.
Nature Reviews Genetics 5 101–113.
Belkin, M., P. Niyogi. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering.
Advances in Neural Information Processing Systems, vol. 14.
Berg, J., M. Lassig. 2004. Local graph alignment and motif search in biological networks. Proceedings of
the National Academy of Sciences 101 14689–14694.
Besag, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal
Statistical Society, Series B .
Blei, D. M., A. Ng, M. I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research
3 993–1022.
Bollobas, B. 1985. Random Graphs. Academic Press, London.
Bollobas, B. 2001. Random Graphs. 2nd ed. Academic Press, New York.
Bollobas, B., O. Riordan. 2002. Mathematical results on scale-free graphs. S. Bornholdt, H. Schuster, eds.,
Handbook of graphs and networks. Wiley-VCH.
Borgatti, S. P., K. M. Carley, D. Krackhardt. 2005. Robustness of centrality measures under conditions of
imperfect data. Social Networks Forthcoming.
Borgatti, S. P., M. G. Everett. 1999. Models of core / periphery structures. Social Networks 21 375–395.
Breese, J., D. Heckerman, C. Kadie. 1998. Empirical analysis of predictive algorithms for collaborative
filtering. Uncertainty in Artificial Intelligence, vol. 14.
Breiger, R. L. 2003. Emergent themes in social network analysis: Results, challenges, opportunities. Dy-
namic Social Network Modeling and Analysis: Workshop Summary and Papers.
Breiger, R. L., S. A. Boorman, P. Arabie. 1975. An algorithm for clustering relational data with applica-
tions to social network analysis and comparison to multidimensional scaling. Journal of Mathematical
Psychology 12 328–383.
Brin, S., L. Page. 1998. The anatomy of a large-scale hypertextual (Web) search engine. Proceedings of the
International World Wide Web Conference, vol. 7.
22

Page 23
DRAFT
DO NOT DISTRIBUTE
Buckley, P. G., D. Osthus. 2004. Popularity based random graph models leading to a scale-free degree
sequence. Discrete Mathematics 282 53–68.
Caldarelli, G., R. Pastor-Satorras, A. Vespignani. 2004. Structure of cycles and local ordering in complex
networks. European Physics Journal B 38 183—186.
Cao, J., D. Davis, S. Van Der Viel, B. Yu. 2000. Time-varying network tomography: router link data.
Journal of the American Statistical Association 95 1063–75.
Cao, J., D. Davis, S. Van Der Viel, B. Yu, Z. Zu. 2001. A scalable method for estimating network traffic
matrices from link counts. Tech. rep., Bell Labs.
Carley, K. M. 2002. Smart agents and organizations of the future. L. Lievrouw, S. Livingstone, eds., The
Handbook of New Media. 206–220.
Carley, K. M., J. Reminga. 2004. ORA: Organizational Risk Analyzer. Available for download at
http://www.casos.cs.cmu.edu/projects/ora/.
Chakrabarti, Deepayan. 2005. Tools for large graph mining. Ph.D. thesis, School of Computer Science,
Carnegie Mellon University.
Chung, F., L. Lu. 2003. The average distances in random graphs with given expected degrees. Internet
Mathematics 1 91–114.
Cohen, William C., Robert E. Schapire, Yoram Singer. 1999. Learning to order things. Journal of Artificial
Intelligence Research 10 243–270.
Coifman, R. R., S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, S. W. Zucker. 2005a. Geometric
diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings
of the National Academy of Sciences 102 7426–7431.
Coifman, R. R., S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, S. W. Zucker. 2005b. Geometric
diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods. Proceed-
ings of the National Academy of Sciences 102 7432–7437.
Coleman, J., E. Katz, H. Menzel. 1957. The diffusion of an innovation among physicians. Sociometry 20
253–270.
Cooper, C., A. M. Frieze. 2003. A general model of web graphs. Random Structures and Algorithms 22
311–335.
Ding, C. 2005. Principal component analysis and matrix factorizations for learning. International Confer-
ence on Machine Learning. Tutorial.
Dodds, P. S., D. J. Watts, C. F. Sabel. 2003. Information exchange and the robustness of organizational
networks. Proceedings of the National Academy of Sciences 100 12516–12521.
Domingos, P. 2003. Prospects and challenges for multirelational data mining. SIGKDD Explorations 5
80–83.
Donoho, D. L., C. Grimes. 2003. Hessian eigenmaps: Locally linear embedding techniques for high-
dimensional data. Proceedings of the National Academy of Sciences 100 5591–5596.
23

Page 24
DRAFT
DO NOT DISTRIBUTE
Doreian, P., V. Batagelj, A. Ferligoj. 2004. Generalized Blockmodeling. Cambridge University Press.
Doreian, P., F. N. Stokman, eds. 1997. Evolution of Social Networks. Gordon and Breach Publishers.
Dorogovtsev, S. N., J. F. F. Mendes. 2002. Evolution of networks. Advances in Physics 51 1079–1187.
Erdos, P., A. Renyi. 1960. On the evolution of random graphs. Publications of the Mathematical Institute
of the Hungarian Academy of Sciences 5 17–61.
Erosheva, E., S. E. Fienberg. 2005. Bayesian mixed membership models for soft clustering and classifica-
tion. C. Weihs, W. Gaul, eds., Classification—The Ubiquitous Challenge. Springer-Verlag, 11–26.
Evans, J. St. B. T., S. E. Newstead, R. M. J. Bynre. 1993. Human Reasoning. Psychology Press.
Faloutsos, M., P. Faloutsos, C. Faloutsos. 1999. On power-law relationships of the internet topology. Pro-
ceedings of the ACM SIGCOMM Conference. 251–261.
Faust, K. 1988. Comparison of methods for positional analysis: Structural and general equivalences. Social
Networks 10 313–341.
Fienberg, S. E. 1999. History of science: Statistics in the social world. Science 283 2025.
Fienberg, S. E., M. M. Meyer, S. Wasserman. 1981. Analyzing data from multivariate directed graphs: An
application to social networks. Interpreting Multivariate Data. New York: Wiley, 289–306.
Fienberg, S. E., M. M. Meyer, S. Wasserman. 1985. Statistical analysis of multiple sociometric relations.
Journal of the American Statistical Association 80 51–67.
Fienberg, S. E., S. Wasserman. 1981. Categorical data analysis of single sociometric relations. S. Leinhardt,
ed., Sociological Methodology. San Francisco: Jossey-Bass, 156–192.
Flaxman, A., A. Frieze, Trevor Fenner. 2005. High degree vertices and eigenvalues in the preferential
attachment graph. Internet Mathematics 2.
Frank, O. 2000. Structural plots of multivariate binary data. Journal of Social Structure 1 1–19.
Frank, O., D. Strauss. 1986. Markov graphs. Journal of the American Statistical Association 81 832–842.
Frantz, T., K. M. Carley. 2005a. A formal characterization of cellular networks. Tech. Rep. CMU-ISRI-05-
109, School of Computer Science, Canregie Mellon University.
Frantz, T., K. M. Carley. 2005b. Relating network topology to the robustness of centrality measures. Tech.
Rep. CMU-ISRI-05-117, School of Computer Science, Canregie Mellon University.
Friedman, N. 2004. Inferring cellular networks using probabilistic graphical models. Science 303 799–805.
Friedman, N., D. Koller. 2003. Being bayesian about bayesian network structure: A bayesian approach to
structure discovery in bayesian networks. Machine Learning 50 95–125.
Ganesh, A., L Massoulie, D. Towsley. 2005. The effect of network topology on the spread of epidemics.
Proceedings of the 24th IEEE INFOCOM Annual Conference.
Getoor, L. 2003. Link mining: A new data mining challenge. SIGKDD Explorations 5 84–89.
24

Page 25
DRAFT
DO NOT DISTRIBUTE
Getoor, L., C. Diehl. 2005. Link mining: A survey. ACM SIGKDD Explorations 7 3–12.
Getoor, L., N. Friedman, D. Koller, B. Taskar. 2002. Learning probabilistic models with link uncertainty.
Journal of Machine Learning Research .
Ghahramani, Z. 2005. Unsupervised learning. O. Bousquet, G. Raetsch, U. von Luxburg, eds., Advanced
Lectures on Machine Learning, vol. LNAI 3176. Springer-Verlag.
Girvan, M., M. E. J. Newman. 2002. Community structure in social and biological networks. Proceedings
of the National Academy of Sciences 99 7821—7826.
Goldenberg, A., A. W. Moore. 2004. Tractable learning of large bayes net structures from sparse data.
Proceedings of the Intenational Conference on Machine Learning, vol. 21.
Guimera, R., L. A. N. Amaral. 2005. Functional cartography of complex metabolic networks. Nature 433
895–900.
Guimera, R., S. Mossa, A. Turtschi, L. A. N. Amaral. 2005. The worldwide air transportation net-
work: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National
Academy of Sciences 102 7794—7799.
Habib, M., C. McDiarmid, J. Ramirez-Alfonsin, B. Reeds, eds. 1998. Probabilistic Methods for Algorithmic
Discrete Mathematics. Springer.
He, X., D. Cai, P. Niyogi. 2005. Tensor subspace analysis. Advances in Neural Information Processing
Systems, vol. 18.
Heckerman, D. 1999. A tutorial on learning with Bayesian networks. M. Jordan, ed., Learning in Graphical
Models. MIT Press.
Heckerman, D., C. Meek, D. Koller. 2004. Probabilistic models for relational data. Tech. Rep. MSR-TR-
2004-30, Microsoft Research.
Hoff, P. D. 2003a. Bilinear mixed effects models for dyadic data. Tech. Rep. 32, University of Washigton,
Seattle.
Hoff, P. D. 2003b. Random effects models for network data. R. Breiger, K. Carley, P. Pattison, eds.,
Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers. National Academies
Press, 303–312.
Hoff, P. D., A. E. Raftery, M. S. Handcock. 2002. Latent space approaches to social network analysis.
Journal of the American Statistical Association 97 1090–1098.
Holland, P., K. B. Laskey, S. Leinhardt. 1983. Stochastic blockmodels: Some first steps. Social Networks 5
109–137.
Holland, P. W., S. Leinhardt. 1975. Sociological Methodology, chap. Local structure in social networks.
Jossey-Bass, 1–45.
Holzle, Urs. 2005. Petabyte processing made easy. Advances in Neural Information Processing Systems.
Invited Talk.
25

Page 26
DRAFT
DO NOT DISTRIBUTE
Huberman, B. A., L. A. Adamic. 1999. Growth dynamics of the world-wide web. Nature 401 131.
Hunter, D., M. Handcock. 2004. Inference in curved exponential family models for networks. Tech. Rep.
TR0402, Department of Statistics, Penn State University.
Ilachinski, A. 2001. Cellular Automata: A Discrete Universe. World Scientific.
Jensen, D. 1999. Statistical challenges to inductive inference in linked data. Proceedings of the 17th
International Workshop on Artificial Intelligence and Statistics.
Kashtan, N., S. Itzkovitz, R. Milo, U. Alon. 2004. Efficient sampling algorithm for estimating subgraph
concentrations and detecting network motifs. Bioinformatics 20 1746–1758.
Kemp, C., T. L. Griffiths, J. B. Tenenbaum. 2004. Discovering latent classes in relational data. Tech. Rep.
AI Memo 2004-019, MIT.
Kempe, D., J. Kleinberg, E. Tardos. 2003. Maximizing the spread of influence through a social network.
Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Min-
ing.
Kleinberg, J. 1999a. The small-world phenomenon: An algorithmic perspective. Tech. Rep. 99-1776,
Department of Computer Science, Cornell University.
Kleinberg, J. 2000. Navigation in a small world. Nature 845.
Kleinberg, J. 2001. Small-world phenomena and the dynamics of information. Advances in Neural Infor-
mation Processing Systems 14.
Kleinberg, J. 2004. The small-world phenomenon and decentralized search. SIAM News 37.
Kleinberg, J., S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. 1999. The web as a graph: Mea-
surements, models and methods. Proceedings of the International Conference on Combinatorics and
Computing. 1–17.
Kleinberg, J., S. Lawrence. 2001. The structure of the web. Science 294 1849–1850.
Kleinberg, J. M. 1999b. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 604–
632.
Kleinberg, R., J. Kleinberg. 2005. Isomorphism and embedding problems for infinite limits of scale-free
graphs. Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms.
Kossinets, G., D. J. Watts. 2006. Empirical analysis of an evolving social network. Science 311 88–90.
Kumar, R., P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. 2000. Random graph models
for the web graph. Annual Symposium on Foundations of Computer Science. 57–65.
Lafon, S., A. B. Lee. 2006. Diffusion maps and coarse-graining: A unified framework for dimensionality
reduction, graph partitioning and data set parameterization. Manuscript.
Leighton, J. P., R. J. Sternberg, eds. 2003. The Nature of Reasoning. Cambridge University Press.
26

Page 27
DRAFT
DO NOT DISTRIBUTE
Leskovec, J., D. Chakrabarti, J. Kleinberg, C. Faloutsos. 2005a. Realistic, mathematically tractable graphh
generation and evolution, using Kronecker multiplication. Proceedings of the 9th European Conference
on Principles and Practice of Knowledge Discovery in Databases.
Leskovec, J., J. Kleinberg, C. Faloutsos. 2005b. Graphs over time: Densification laws, shrinking diam-
eters and possible explanations. Proceedings of the 11th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining.
Liben-Nowell, D., J. Kleinberg. 2003. The link prediction problem for social networks. Proceedings of the
12th Annuam ACM International Conference on Knowledge Management. 556–559.
Liben-Nowell, D., J. Novak, R. Kumar, P. Raghavan, A. Tomkins. 2005. Geographic routing in social
networks. Proceedings of the National Academy of Sciences 102 11623—11628.
Lorrain, F., H. C. White. 1971. Structural equivalence of individuals in social networks. Journal of Mathe-
matical Sociology 1 49–80.
Macskassy, S. A., F. Provost. 2005. Classification in networked data: A toolkit and a univariate case study.
Tech. Rep. CeDER-04-08, Stern School of Business, New York University.
Malin, B., E. M. Airoldi, K. M. Carley. 2005. A social network analysis model for name disambiguation in
lists. Journal of Computational and Mathematical Organization Theory 11 119–139.
Metha, M. L. 2004. Ranomd Matrices. Elsevier.
Milgram, S. 1967. The small world phenomenon. Psychology Today 1.
Milo, R., S. Itzkovitz, N. Kashtan, R. Levitt, U. Alon. 2004a. Response to comment on “Network motifs:
Simple building blocks of complex networks” and “Superfamilies of evolved and designed networks”.
Science 305 1107d.
Milo, R., S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, U. Alon. 2004b.
Superfamilies of evolved and designed networks. Science 303 1538–1542.
Milo, R., N. Kashtan, S. Itzkovitz, M. E. J. Newman, U. Alon. 2003. On the uniform generation of random
graphs with prescribed degree sequence. ArXiv Condensed Matter e-prints .
Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon. 2002. Network motifs: Simple
building blocks of complex networks. Science 298 824–827.
Mishra, N., D. Ron, R. Swaminathan. 2004. A new conceptual clustering framework. Machine Learning
Journal 56 115–151.
Mitzenmacher, M. 2004. A brief history of generative models for power law and lognormal distributions.
Internet Mathematics 1 226–251.
Nadler, B., S. Lafon, R. Coifman, I. Kevrekidis. 2005. Diffusion maps, spectral clustering and eigenfunc-
tions of fokker-planck operators. Advances in Neural Information Processing Systems, vol. 18.
Newman, M. E. J. 2001. The structure of scientific collaboration networks. Proceedings of the National
Academy of Sciences 98 404—409.
27

Page 28
DRAFT
DO NOT DISTRIBUTE
Newman, M. E. J. 2003a. Mixing patterns in networks. Physics Reviews E 67.
Newman, M. E. J. 2003b. The structure and function of complex networks. SIAM Review 45 167–256.
Newman, M. E. J. 2004a. Analysis of weighted networks. Physics Reviews E 70.
Newman, M. E. J. 2004b. Detecting community structure in networks. European Physics Journal B 38
321–330.
Newman, M. E. J. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46 323–
351.
Newman, M. E. J., J. Park. 2003. Why social networks are different from other types of networks. Physics
Reviews E 68.
Newman, M. E. J., D. J. Watts, S. H. Strogatz. 2002. Random graph models of social networks. Proceedings
of the National Academy of Sciences 99 2566—2572.
Ng, A. Y., A. X. Zheng, M. I. Jordan. 2001. Link analysis, eigenvectors, and stabiliy. Proceedings of the
International Joint Conference on Artificial Intelligence, vol. 17.
Nowicki, K., T. A. B. Snijders. 2001. Estimation and prediction for stochastic blockstructures. Journal of
the American Statistical Association 96 1077–1087.
Papadimitriu, C., E. Servan-Schreiber. 1999. The origins of the deadline: Optimizing communication in
organizations. Complexity in Economics.
Park, J., M. E. J. Newman. 2004. Solution of the 2-star model of a network. Physics Reviews E 70.
Park, J., M. E. J. Newman. 2005. Solution for the properties of a clustered network. Physics Reviews E 72.
Piatetsky-Shapiro, G. 2005. Interview with usama fayyad, yahoo chief data officer. ACM SIGKDD Explo-
rations 7 84–90.
Porter, M. A., P. J. Mucha, M. E. J. Newman, C. M. Warmbrand. 2005. A network analysis of committees
in the united states house of representatives. Proceedings of the National Academy of Sciences 102 7057–
7062.
Reddy, R. 1996. To dream the possible dream. Communications of the ACM 39 105–112.
Reddy, R. 2003. Three open problems in AI. Journal of the ACM 50 83–86.
Roweis, S. T., L. K. Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science
22 2323–2326.
Sales-Pardo, M., R. Guimera, A. A. Moreira, J. Widom, L. A. N. Amaral. 2005. Mesoscopic modeling for
nucleic acid chain dynamics. Physics Reviews E 71 1–13.
Saul, L. 2005. Spectral methods for dimensionality reduction. Advances in Neural Information Processing
Systems. Tutorial.
Segal, E., D. Pe’er, A. Regev, D. Koller, N. Friedman. 2005. Learning module networks. Journal of Machine
Learning Research 6 503–556.
28

Page 29
DRAFT
DO NOT DISTRIBUTE
Senator, T. E. 2005. Link mining applications: Progress and challenges. ACM SIGKDD Explorations 7
76–83.
Shalizi, C. R. 2003. Optimal nonlinear prediction of random fields on networks. Discrete Mathematics and
Theoretical Computer Science AB(DMCS) 11–30.
Shen-Orr, S., R. Milo, S. Mangan, U. Alon. 2002. Network motifs in the transcriptional regulation network
of Escherichia coli. Nature Genetics 31 64–68.
Simon, H. A. 1955. On a class of skew distribution functions. Biometrika 42 425–440.
Snijders, T. A. B., K. Nowicki. 1997. Estimation and prediction for stochastic blockmodels for graphs with
latent block structure. Journal of Classification 14 75–100.
Snijders, T. A. B., P. E. Pattison, G. L. Robins, M. S. Handcock. 2004. New specifications for exponential
random graph models. Sociological Methodology .
Solan, Z., D. Horn, E. Ruppin, S. Edelman. 2005. Unsupervised learning of natural languages. Proceedings
of the National Academy of Sciences 102 11629–11634.
Sporns, O., D. R. Chialvo, M. Kaiser, C. C. Higetag. 2004. Organization, development and function of
complex brain networks. Trends in Cognitive Science 8 418–425.
Steyvers, M., J. B. Tenenbaum. 2005. The large-scale structure of semantic networks: Statistical analyses
and a model of semantic growth. Cognitive Science 29.
Stumpf, M. P. H., C. Wiuf, R. M. May. 2005. Subnets of scale-free networks are not scale-free: Sampling
properties of networks. Proceedings of the National Academy of Sciences 102 4221–4224.
Sun, J., S. Boyd, L. Xiao, P. Diaconis. 2006. The fastest mixing markov process on a graph and a connection
to a maximum variance unfolding problem. SIAM Review Forthcoming.
Tenenbaum, J. B. 1999. A bayesian framework for concept learning. Ph.D. thesis, Massachussetts Institute
of Technology.
Tenenbaum, J. B., V. de Silva, J. C. Langford. 2000. A global geometric framework for nonlinear dimen-
sionality reduction. Science 290 2319–2323.
Teyssier, M., D. Koller. 2005. Ordering-based search: A simple and effective algorithm for learning bayesian
networks. Uncertainty in Artificial Intelligence, vol. 21. 584–590.
Van Zandt, T. 1997. Decentralized information processing in the theory of organizations. M. Sertel, ed.,
Contemporary Economic Development Reviewed, vol. 4: The Enterprise and its Environment. MacMillan
Press Ltd.
Vaszquez, A., R. Dobrin, D. Sergi, J.-P. Eckmann, Z. N. Oltvai, A.-L. Barabasi. 2004. The topological
relationship between the large-scale attributes and local interaction patterns of complex networks. Pro-
ceedings of the National Academy of Sciences 101 17940—17945.
Wang, X., N. Mohanty, A. K. McCallum. 2005. Group and topic discovery from relations and text. Advances
in Neural Information Processing Systems, vol. 18.
29

Page 30
DRAFT
DO NOT DISTRIBUTE
Wang, Y. J., G. Y. Wong. 1987. Stochastic blockmodels for directed graphs. Journal of the American
Statistical Association 82 8–19.
Wasserman, S. 1980. Analyzing social networks as stochastic processes. Journal of the American Statistical
Association 75 280—294.
Wasserman, S., C. J. Anderson. 1987. Stochastic a posteriori blockmodels: Construction and assessment.
Social Networks 9 1–36.
Wasserman, S., K. Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge University
Press.
Wasserman, S., P. Pattison. 1996. Logit models and logistic regression for social networks: I. an introduction
to markov graphs and p
. Psychometrika 61 401–425.
Watts, D. J. 2002. A simple model of global cascades on random networks. Proceedings of the National
Academy of Sciences 99 5766—5771.
Watts, D. J., S. H. Strogatz. 1998. Collective dynamics of “small-world” networks. Nature 393 440–442.
Weinberger, K., L. Saul. 2004. Unsupervised learning of image manufolds by semidefinite programming.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Weinberger, K., F. Sha, L. Saul. 2004. Learning a kernel matrix for nonlinear dimensionality reduction.
International Conference on Machine Learning.
Wu, F., B. A. Huberman. 2005. Social structure and opinion formation. Online Manuscript.
Xing, E. P. 2005. Dynamic non-parametric Bayesian models. Tech. rep., School of Computer Science,
Canregie Mellon University.
Xing, E. P., M. I Jordan. 2003. On semidefinite relaxation for normalized k-cut and connections to spectral
clustering. Tech. Rep. CSD-03-1265, Division of Computer Science, University of California at Berkeley.
Zheng, T., M. Salganik, A. Gelman. 2005. How many people do you know in prison? Journal of the
American Statistical Association Forthcoming.
30