Hierarchical Bayesian Mixture Models of Graphs and Networks

This is the html version of the file http://www.cs.cmu.edu/~eairoldi/phdthesis/edo.phd-proposal.pdf.
G o o g l e automatically generates html versions of documents as we crawl the web.
To link to or bookmark this page, use the following url:

http://www.google.com/search?q=cache:8zqftx9UL30J:www.cs.cmu.edu/~eairoldi/phdthesis/edo.phd-proposal.pdf+%22curved+exponential+family%22+%22naive+bayes%22&hl=en&gl=us&ct=clnk&cd=3

Google is neither affiliated with the authors of this page nor responsible for its content.

These search terms have been highlighted:

curved

exponential

family

naive

bayes

Page 1

DRAFT

DO NOT DISTRIBUTE

Hierarchical Bayesian Mixture Models of Graphs and Networks

A Probabilistic Approach to Complexity via Motifs, Dynamics & Integration

Ph.D. Thesis Proposal

Edoardo M. Airoldi

edo@cmu.edu

School of Computer Science

Carnegie Mellon University

Pittsburgh, PA 15213 USA

February 2006

Page 2

DRAFT

DO NOT DISTRIBUTE

Foreword

Over the past three years I have been working on probabilistic models of bipartite graphs and dynamic

networks. My previous work suggests a probabilistic framework and a general modeling approach to

complex graphs and networks, which is based on the four concepts of motifs, mixtures, dynamics, and

integration. In this thesis, I propose to present such a framework and discuss its properties. In particular,

the main goal of this work will be to establish the essential elements of a formal model of complexity

which allow us to reconcile the global properties of a system with local phenomena of interest, in a

generative fashion.

My solution to the global/local trade-off is to express complexity through hierarchical mixtures of

simple patterns, i.e., motifs, that evolve over time. Complex global behavior emerges from the com-

bination of local interaction patterns and their dynamics. I will discuss and demonstrate how this new

framework incorporates, generalizes, and extends other probabilistic approaches present in the literature,

in what sense it provides a “soft” and probabilistic version of deterministic approaches, and why it lays

the statistical foundations of the theory of (random) graphs and networks.

The major part of my effort will thus be devoted to modeling issues that arise in conjunction to

the four essential aspects listed above. I plan to investigate theoretical and computational issues, and

characterize the geometrical intuitions underlying the allocation task, a major inference objective shared

by the various models encompassed by this framework.

Page 3

DRAFT

DO NOT DISTRIBUTE

Abstract

In this thesis I propose computational and modeling approaches to summarize the complexity of

complex systems in terms of mixtures of simpler latent patterns, which can be described as probabilistic

functions of few parameters that evolve over time according to simple laws.

Complex systems are ubiquitous and they underly many applications. In this thesis I plan to focus on

the generic complex system problem involving sets of objects of different types that relate to one another

over time; relations are measured on pairs of objects both within and across types and, which are pos-

sibly multivariate. Biological sequences and networks related to an organism, dynamic communication

networks within a group, and the research and publication processes in the scientific community at large

provide examples of complex systems that I will discuss in this thesis.

Being able to summarize the information in such complex systems in terms of simple patterns and

corresponding laws of evolution that correlate to phenomena of interest, e.g., topics, functional annota-

tions, formal and informal role in operational processes within an organization, is a fundamental chal-

lenge arising in diverse applications. Feasible solutions could foster automation and knowledge discov-

ery in large biological and genetic databases, and they could lead to new e-business applications, e.g., in

summarization, search, and strategic planning. Statistical foundations for the treatment of such systems

could advance our understanding of real world phenomena and anomalies, and they would give us the

tools to cope with the information overload we face in a systematic manner.

I plan to subsume models of complex systems that were developed for specific application areas into

a general modeling framework. There are four aspects are crucial to most of them: (1) the presence

of a hierarchical structure in the likelihood, which includes both observable and non-observable random

quantities, (2) the mixed membership assumption, according to which objects may participate in multiple

latent patterns to different degrees, (3) the existence of multiple data types along with the availability of

possibly partial measurements about their interactions, and (4) the temporal dimension.

My modeling framework grows out of my recent research efforts in application areas such as bi-

ological sequences and networks, dynamic communication networks, and large collection of scientific

publications. These efforts suggest a probabilistic framework and a general modeling approach to com-

plex systems. This thesis will present such a framework and discuss its properties. In particular, the main

goal is the reconciliation of global properties of complex systems with local phenomena of interest. My

solution is to express complexity through hierarchical mixtures of simple patterns that evolve over time,

i.e., complex global behavior emerges from the combination of local interaction patterns and dynamics. I

will discuss and demonstrate how this new framework generalizes other probabilistic approaches present

in the literature, provides a “soft” and probabilistic version of the deterministic approaches, and lays the

foundations of a statistical theory of graphs and networks. The major part of my effort is then devoted to

modeling issues that arise in conjunction to the four essential aspects listed above, e.g., integration and

dynamics. I conclude with the investigation of theoretical and computational issues, and with a charac-

terization of the geometrical intuitions underlying the allocation task, a major inference objective shared

by the various models encompassed by this framework.

Keywords:

Hierarchical models of mixed membership; Generative models; Pure topology types;

Independent integration; Conditional integration; Generalized state-space dynamics; Order-dependent

likelihoods; Latent birth-death processes; Diffusion processes; Exchangeability; Identifiability; Model

selection; Spectral and frequency decompositions; Loss functions; Gaussian ensembles; Sub-sampling;

Latent allocation as projection; Shrinkage; Approximate inference techniques.

Page 4

DRAFT

DO NOT DISTRIBUTE

Contents

1 Introduction

1.1 Impact and Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1.2 A Case for Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 A Brief History of Modeling Approaches to Graphs and Networks

2.1 Static Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.2 Dynamics and Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3 Building Graphs from Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Inadequacies of the Current Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Overview of the Research Agenda

3.1 Research Work to Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.2 Proposed Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.3 Time to Completion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Page 5

DRAFT

DO NOT DISTRIBUTE

1 Introduction

In this thesis I propose computational and modeling approaches to summarize complex systems in terms of

mixtures of simpler latent patterns, which can be described as probabilistic functions of few parameters that

evolve over time according to simple laws, which are organized in a hierarchical fashion.

Complex systems are ubiquitous and underly most applications. They are defined as sets of objects of

different types that relate to one another over time; relations are measured on pairs of objects both within

and across types and, and are possibly multivariate. Biological sequences and networks related to an organ-

ism, dynamic communication networks within a group, and the research and publication processes in the

scientific community at large provide examples of complex systems that I plan to discuss in this thesis.

Example 1. A temporal sequence of bipartite graphs where the two types are documents and words can

be summarized in terms of few non-observable “topics” that evolve over time. We can enrich the system by

introducing observations about documents’ authors and obtain a bipartite graph of authors and words, and a

co-authorship general graph. Next we can consider references and the corresponding co-citation graph, and

so on. The result is a multifaceted view of the “scientific publications process” in terms of graphs that encode

possibly multivariate relations among objects of different types, i.e., authors-to-authors, authors-to-words,

authors-to-papers, words-to-papers, and papers-to-papers. The system can still be summarized in terms of

few non-observable “topics”, “areas of expertise” and “collaboration clusters” that evolve over time, along

with structural relations among the object types dictated by the semantics of the publications process.

Example 2. Another set of important examples is given by the collection of biological sequences and

networks related to different organisms. In this case, an important goal of the analysis is that of annotating

proteins with the function(s) they participate into, so the observations regarding the object type “functional

annotation” may play a more prominent role in the summarization. A set of graphs that encode protein-

protein interactions under several experimental conditions can be summarized in terms of non-observable

“functional processes” and non-observable process-to-process “interaction patterns.” Both of these sum-

maries, along with partially available functional annotations, can then be used to predict the annotations of

new proteins. Alternatively, both interactions and annotations can inform functional processes. Or we may

grant annotations a more prominent role by having them inform functional processes for each protein first,

and use the interactions to fine tune them. We can then introduce gene expression data, and so on. Again,

objects of multiple types (proteins, annotations, experimental conditions) and the relations among them out-

line an interacting system that changes as exogenous conditions change. In this example, the data consists of

bipartite graphs (proteins and functional annotations, proteins and experimental conditions for gene expres-

sion) and graphs in general (protein-protein interactions for each experimental condition—note that these

experimental conditions may differ from those under which the gene expression data were obtained).

Example 3. Yahoo! Inc. has recently created the position of “Chief Data Officer” to as the company

recognized “the strategic value of data and wanted data to have a voice at the executive table” rather than

seeing it a mere IT service (Piatetsky-Shapiro, 2005). At one of the primary international forums on ma-

chine and statistical learning held in December 2005 (Advances in Neural Information Processing Systems

in Vancouver) preliminary evidence was presented that suggests that learning systems based on large scale

databases may deliver satisfactory solutions to portions of real-world problems, such as those that arise in

the natural language processing applications (Holzle, 2005). A recent review of progresses and challenges

in link mining written by Senator (2005) identifies the need for “integrated systems” as one of the challenges

ahead. It is my opinion that this will be achieved by integrating different data types as well as perspectives

Page 6

DRAFT

DO NOT DISTRIBUTE

of different disciplines, rather than just integrating methods in the same discipline, which often boil down

to different approaches to the analysis of similar data. Being able to summarize the information in these

systems in terms of simple patterns and corresponding laws of evolution that correlate to phenomena of

interest, e.g., topics, functional annotations, formal and informal role in operational processes within an

organization, and so on, is a fundamental problem of science and applications today. In general, summa-

rization is sought as a mean to facilitate organization, sorting, ranking, and search. Getoor and Diehl (2005)

organize the typical goals of the analysis of complex systems according to whether they relate to the ob-

jects in the system (ranking, classification, clustering, classification), to the links (prediction, test), or to the

entire graphs that encode the interactions (sub-graph discovery, classification, generative models). Ideally

we would want an “intelligent” summary, i.e., a summary that help us organize, enhance search, ranking,

prediction of association, missing links, and at the same time maximizes predicting power for some task, or

fosters information discovery, (Airoldi et al., 2006c).

Feasible solutions to the generic tasks of summarization and complexity reduction would foster au-

tomation and knowledge discovery in large biological and genetic databases, would lead to new e-business

applications, e.g., in summarization, search, and strategic planning. Statistical foundations for the treatment

of such systems would advance our understanding of significant phenomena and anomalies that take place

in our world, and would give us the tools to cope with the information overload we face in a systematic

manner (Reddy, 1996, 2003).

1.1 Impact and Vision

This thesis presents a theory for modern applications; modeling choices and substantive issues are at the

forefront of the discussion only to the extent that they respond to needs that are rooted in the data. This is

possible because of the central role played by few essential characteristics in many complex systems.

1. On the application side there is an increasing amount of information available with a temporal or

sequential dimension to require methods that explicitly account for dynamics and evolution in the phenom-

ena under scrutiny. The approaches available to date for which solid inference mechanisms are available

include hidden Markov models and generalized state-space models. New approaches are needed, which can

account for richer evolutionary patterns. Further, there is a need for predictions based on several sources of

information that need be integrated, and a solid probabilistic approach is missing.

2. The models that can be expressed by this framework are extremely diverse and widely applicable.

Many scientific studies lead to data sets that are represented as graphs, at some level, e.g., two-mode data

lead to bipartite graphs, uni-modal data to graphs where we record relations between pairs of objects of

the same type, multi-mode data where we record relations among objects of multiple types, multi-graphs

where edges encode multivariate variables, and combination of these. Assumptions and intuitions of interest

may need be incorporated in application-specific models, but the modularity of our approach makes these

”special” modeling issues a piece of the puzzle that can be introduced on top of a basic model skeleton.

3. On the theoretical side there is a tremendous need for solid foundations. Researchers in the com-

puter science community are engaged in a dangerous (incremental) race, whereas fundamental issues such

as identifiability, model selection, distribution free tests for assessing the goodness of fit, the geometrical

understanding of allocation tasks, or the asymptotics of the family of hierarchical mixture models are com-

pletely ignored. Even limited explorations of these issues would advance the scientific understanding of the

methods. This will ultimately benefit applications by providing theoretical insights to support application-

Page 7

DRAFT

DO NOT DISTRIBUTE

specific modeling choices.

The grand vision is to establish a mature statistical theory of graphs and networks, to bridge theoretical

computer science, a largely deterministic discipline, and statistical learning by explicitly characterizing the

relation between deterministic and probabilistic solutions to problems that involve graphs and networks

shared by both disciplines. The ultimate goal is that of promoting the role of statistics in the computing

sciences and its modern applications.

1. On the statistics side, this work would provide solid foundations of a theory of hierarchical mixture

models for graphs and networks. This is missing, to the best of my knowledge, and it is worth pursuing in

its own right.

2. On the computer science side, there are multiple benefits; mostly it would promote the role of Bayesian

statistics in the theoretical computer science and data mining communities, by providing new models and

perspectives in applications of primary importance—biological sequences & networks, dynamic graphs &

networks, collections of scholarly publications, knowledge and corporate networks, linkage and homeland

security. This work will possibly foster a fast scientific progress by serving as a glue for several branches of

the literature that are poorly aware of one another.

In conclusion, recent trends and events suggest an imminent shift of focus of the research community

at large towards complex interacting dynamic systems, along with a rediscovered mindset that through

integration we can finally deliver satisfactory solutions to long-standing real-world problems, and create

new applications. This thesis will present theoretical musings derived from applications for applications,

in order to provide us with insights and understanding on when and why the methods we employ work on

a case by case basis. Specifically, I will discuss models and methods that enable applications to biological

databases, scholarly publications, and knowledge and corporate databases. I will present successes stories

where my methods were key to answer real-world problems. I will introduce a theory of these complex

systems and discuss how this is crucial to the success of such endeavors.

1.2 A Case for Probabilistic Models

In the past century, statistical theory has transformed the social, biological and medical sciences, (Fienberg,

1999). A substantial portion of the recent successes in algorithmic and theoretical computer science has

been achieved by means of probabilistic methods and randomized algorithms, (Habib et al., 1998). There is

little doubt that a modern theory of complex systems should have solid foundations in statistical theory.

One major advantage of a probabilistic perspective is the central role played by models in separating

associations and patterns from the typically large amount of noise that characterizes complex systems. Fur-

thermore, for every deterministic algorithm it is typically possible to posit a probabilistic model, which has

something akin to a maximum likelihood solution that coincides with the outcome of the deterministic algo-

rithm,

and in practical applications soft constraints typically are more desirable than hard constraints, e.g.,

must link, must-not link.

Being able to distinguish between the presence/absence of associations is very different than being able

The differences here are two. Computationally, the deterministic algorithm does not “spend time” to assess the quality of its

answer, whereas a probabilistic methods encodes a scoring function, i.e., typically a penalized likelihood, that is used to assess the

quality of its answer, and it takes a little more computation. Analytically, the constraints imposed by a probabilistic models are soft

rather than hard; the latter can be both an advantage or a disadvantage, depending on the implications for the application at hand.

Page 8

DRAFT

DO NOT DISTRIBUTE

to assess whether the association suggested by the algorithm may be the outcome of pure chance. This is

an important difference, that is crucial in applications where an automated decision informs the actions of

human personnel, e.g., airport security. Similarly, manually curated data in biology and genetics tend to

be expensive both in terms of time and resources, and it is common practice to use automated screening

methods to rank the measurements of the data gathered via high-throughput techniques before proceeding

with manual testing.

I conclude by invoking the arguments that the nature of human reasoning is probabilistic (Evans et al.,

1993; Leighton and Sternberg, 2003), and that a statistical theory of complex systems founded on proba-

bilistic generative models is natural (Tenenbaum, 1999) and easy to interpret. For example, the way we

understand “social closeness” is through a cascade of implications that hold often times, rather than ex-

actly. In the proposed framework of this thesis, the cascade of implications is translated into a hierarchy of

probabilistic assumptions, and a notion of distance is then obtained from the model, in some way.

2 A Brief History of Modeling Approaches to Graphs and Networks

Models for graphs of various types are scattered across research in the social, physical, mathematical, sta-

tistical, and computing sciences. In this review of the literature, I emphasize those statistical models that

attempt to express the dependencies between objects in the system, in some sense.

Notation. We denote random quantities (scalars, vectors and matrices) with uppercase letters and their

realizations with lowercase letters, e.g., Pr (Y = y) = 1, and we will specify the dimensionality of various

quantities in the text. A graph is a pair G = (V,E) where V vertices, which correspond to objects of interest,

and E edges. Graphs represent relations measured on pairs of vertices; we index the random quantities that

encode such (possibly multivariate) measurements consistently, e.g., Y

. If measurements related to more

than two vertices we will expand the subscript list accordingly, e.g., Y

mno

denotes the random quantity

that encode an observation of three vertices. We will use superscript to denote elements of a multivariate

random quantity, e.g., if Y

is a matrix we will denote its generic element with Y

[ij]

. Although the

temporal dimension can be seen as just another dimension in the random elements at hand, e.g., the generic

element of a sequence of adjacency matrices can be denoted with Y

[ij1[

[ij2]

,... ,Y

[ijT]

, it is convenient

for our purposes to decouple the temporal dimension from the other dimensions. To this extent, we will

denote the generic element of a temporal sequence of adjacency matrices with a separate set of indices with

[ij](1)

[ij](2)

,... ,Y

[ij](T)

, and the sequence of matrices as Y

(1)

(2)

,... ,Y

(T)

. Throughout the text we

will reserve the letter Y to denote measurements on sets of vertices and the letter X to denote measurements

on the the vertices themselves; one major distinction in the models discussed below is, in fact, the search

for patterns among objects, where node ID matters, versus the search for patterns among edges, where the

similarity of patterns is independent of the sets nodes involved. In those few cases where the semantics

of the problem require a separate treatment for different types of measurements associated with vertices or

edges special notation will be introduced.

2.1 Static Graphs

The works in this section take as input measurements about objects that start from a network as given.

Page 9

DRAFT

DO NOT DISTRIBUTE

Exponential Random Graph Models. Under the assumption that two possible social ties are indepen-

dent only if a common actor is involved in both

Frank and Strauss (1986) devised the following character-

ization for the probability distribution of undirected Markov graphs.

{Y = y} = exp

n−1

k=1

(y) + τT(y) + ψ(θ,τ)

y ∈ Y,

(1)

where the statistics S

and T count specific structures, such as edges, triangles, and k-stars, {θ

} = θ and

τ are the parameters, and ψ(θ,τ) is the normalizing constant. Frank and Strauss (1986) worked mainly

with the three parameter models, where θ

,... ,θ

n−1

= 0. They proposed the pseudo-likelihood estima-

tion method to estimate the complete vector of parameters by maximizing the following pseudo-likelihood

function.

ℓ(θ) =

i<j

log P

= y

for all u < v,(u,v) = (i,j)} .

(2)

Wasserman and Pattison (1996) proposed the current formulation of Exponential Random Graph Models

(ERGM), also referred to as p

∗

models, as a generalization of the Markov graphs of Frank and Strauss. For

both directed and undirected graphs, they maintain a similar characterization of the probabilities where the

statistics S

and T are substituted for arbitrary statistics U. This leads to the probability functions of the

form

{Y = y} = exp θ

⊤

u(y) − ψ(θ) .

(3)

More recently, Snijders et al. (2004) have proposed a variant of these models where the major problem

of double-counting

is mitigated, but not overcome. Hunter and Handcock (2004) propose an alternative

estimation scheme that corrects parameter estimates for double-counting. This estimation procedure can be

used for models based on distributions in the curved exponential family. Park and Newman (2004) formally

characterize sensitivity issues.

Remark A. It is possible to express the current formulation of exponential random graphs using the

formalism of undirected graphical models, let us write the likelihood of an arbitrary undirected graph.

p(x|θ) =

c∈C

ψ(x

|θ

)

(4)

where x

denotes the nodes in clique c, θ

denotes the corresponding set of parameters, ψ are non-normalized

potentials over the cliques, and z =

c∈C

ψ(x

|θ

) is the normalization constant. If the likelihood is

in the exponential family, we can write:

p(x|θ) = exp θ

⊤

u(x) − log z .

Remark B. Models in this family are not “generative models” in that no assumptions are present to

explain how the sufficient statistics are generated. However, it is possible to posit a generative model that

includes exponential random graph models, or any other conditional model, as part of the emission model

(Airoldi et al., 2006c).

This is the intuitive definition of Markov property for spatial processes on a lattice in Besag (1974).

The statistics S

(y) count graph structures. Although they are not independent, i.e., they count overlapping sets of edges, they

are assumed independent in the pseudo-likelihood. Ignoring the correlations is a bad idea, and causes extreme sensitivity of the

predicted number of edges to small changes in the value of certain parameters.

Page 10

DRAFT

DO NOT DISTRIBUTE

Latent Variable Models. The notions of equivalence, structural equivalence, and blocks are introduced

by Lorrain and White (1971) and further explored by many, notably by Faust (1988). A comprehensive

treatment of models that use blocks to express the complexity of the data is given in Doreian et al. (2004). A

summary of models and notions relevant for social networks developed in the social sciences can be found

in Wasserman and Faust (1994).

Stochastic block models, the probabilistic treatment of blocks, have appeared early in the statistical sci-

ences (Holland and Leinhardt, 1975) and widely studied (Fienberg and Wasserman, 1981; Fienberg et al.,

1981; Holland et al., 1983; Fienberg et al., 1985; Wang and Wong, 1987; Wasserman and Anderson, 1987;

Anderson et al., 1992) and recently rediscovered (Snijders and Nowicki, 1997; Nowicki and Snijders, 2001),

including non-parametric treatment of the number of blocks (Kemp et al., 2004), and integration with non-

relational information to infer the blocks (Wang et al., 2005). A general stochastic block model of mixed-

membership has been recently proposed (Airoldi et al., 2005a, 2006d), along with a framework to integrate

external information of different types (Airoldi et al., 2006c), that relaxes the historical assumption of single-

membership of objects to blocks, and estimates block-to-block connectivity patterns in a Bayesian fashion.

Remark C. A general framework for integration of a different nature is described by Carley (2002).

An alternative approach latent space models, where observed interactions are projected on a latent space

through a generalized linear model (Hoff et al., 2002; Hoff, 2003a,b). Hoff et al. (2002) use MCMC to infer

latent space positions, treated as hyper-parameters. Hoff (2003a) specifies a Gaussian prior over the latent

space, thus giving to the model fully generative flavor, with the goal of modeling reciprocity.

Remark D. It is possible to posit a generative model that includes generalized linear models as part of

the emission model (Airoldi et al., 2006c). The connection between stochastic block models (SBM) and

latent space models (LSM) is more subtle, though.

Both SBM and LSM seek to define a conditional probability distribution for relations {y

} among

actors in a way that reflects some latent semantics (i.e., roles, topics, functions, etc.) of the actors. Let Z

denote a latent variable capturing the latent semantic representation of the actor n, the SBM usually defines

a generative model y

∼ f(·|Z

), of which the Z’s typically act as indicators of context-dependent

edge generating processes. On the other hand, an LSM maps the observed relation y

to some latent

semantic differences between the two actors via a regression function, of which the Z’s typically represent

the projections of the actors onto some latent metric space where their differences can be measured via

a Euclidian metric. Specifically, in LSM the Zs are multivariate/continuous, e.g., could be drawn from a

mv-normal, and their realizations indicate the position of actors in the latent space. In SBM the Zs are

multivariate/discrete, e.g., could be drawn from a multinomial(θ), and their realizations indicate which

group an actor belongs to, for each observed interaction. In other words, the dimensionality of the Zs in

SBM reflects how many latent groups to be captured in a domain, whereas in LSM the dimensionality of the

Zs does not have an explicit interpretation in terms of groups. In fact, in LSM we need to run a clustering

procedure (e.g., k-means) in the latent space where the actors are projected to, in order to decide how many

groups there are. Thus, the two types of network models are different: SBM focuses on latent membership

of each actor and underlines the importance of modeling the ”grouping” of actors, whereas LSM focuses on

latent distances and therefore stress more on modeling proper projections of actors into a latent manifold.

Hoff’s formulation of LSM is not a soft version of SBM. As a results, SBM and LSM have some orthogonal

advantages in modeling network data.

Remark E. Connections have been highlighted to MDS and other linear methods (Breiger et al., 1975),

Page 11

DRAFT

DO NOT DISTRIBUTE

to unsupervised learning, e.g., PCA, FA, (Ghahramani, 2005), and to matrix factorization (Ding, 2005;

Xing and Jordan, 2003).

Spectral Methods. Research on by Gaussian unit ensembles provides a probabilistic connection to

spectral decompositions (Metha, 2004). In the computer science literature, there is a stream of works in

this area well summarized by Saul (2005), who discusses comparison to PCA, MDS and other linear meth-

ods. Briefly, isomap (Tenenbaum et al., 2000), local linear embedding (Roweis and Saul, 2000), laplacian

eigenmaps (Belkin and Niyogi, 2002), Hessian eigenmaps (Donoho and Grimes, 2003), maximum variance

unfolding (Weinberger and Saul, 2004; Weinberger et al., 2004; Sun et al., 2006), conformal eigenmaps

(Coifman et al., 2005a,b; Lafon and Lee, 2006) and its asymptotics (Nadler et al., 2005), and the recent

reformulation of problems and solutions in terms of tensors (He et al., 2005).

Simple Models of Real-World Phenomena and their Mathematical Properties. Much of the research

across communities concerns the study of real-world graphs and their properties with the aim of building

toy models that capture such properties. For example, Newman and Park (2003) study transitivity and as-

sortative mixing (i.e., positive correlation of degrees of adjacent vertices) via group structure; Hoff (2003b)

studies transitivity, reciprocity and balance; Barabasi (2005b) studies burst and heavy tails in human dynam-

ics; Zheng et al. (2005) study the size of individuals’ social networks and means of estimating them from a

certain type of survey questions; and Ganesh et al. (2005) study the effects on epidemics of the topological

properties of graphs.

Research originating in mathematics and physics posit simple algorithms for generating graphs that

replicate observed properties, which are amenable to probabilistic analysis. Bollobas and Riordan (2002)

review few of such algorithms for popular graph types (Barabasi and Albert, 1999; Kumar et al., 2000;

Cooper and Frieze, 2003), and present an extended analysis of the “LCD” model of Buckley and Osthus

(2004). Other notable analytical investigations concern sampling, and asymptotic results. Park and Newman

(2004, 2005) give analytic solutions for the 2-star network and for clustered networks; Milo et al. (2003)

analyze sampling algorithms; Kleinberg and Kleinberg (2005) describe asymptotics of isomorphism and

embedding; Stumpf et al. (2005) find that sub-samples of scale-free graphs are not scale-free, and present

a way to study properties of a sub-sample based on moment generating functions; Flaxman et al. (2005)

describe the behavior of high degree vertices and eigenvalues in scale-free graphs; Chung and Lu (2003)

characterize average distances given expected degrees; and Caldarelli et al. (2004) study the formation of

cycles. A series of works is concerned with models and methods to find “statistically significant” motifs,

i.e., recurring edge patterns over sets of difference nodes (Berg and Lassig, 2004; Shen-Orr et al., 2002;

Milo et al., 2002; Artzy-Randrup et al., 2004; Milo et al., 2004a; Kashtan et al., 2004; Milo et al., 2004b).

Newman (2003a) portrays networks as mixtures of patterns; and Vaszquez et al. (2004) present the only in-

vestigation to date of how global patterns may arise from the composition of local ones. Few comprehensive

reviews are available, which summarize many of these findings (Barabasi et al., 1999; Albert and Barabasi,

2002; Dorogovtsev and Mendes, 2002; Newman, 2003b; Amaral and Ottino, 2004).

A notion that recently captured the attention of funding agencies and high profile journals is that of

“topology types”. Airoldi and Carley (2005) present a review and a critique of such notion. They sur-

vey generative algorithms for random graphs (Erdos and Renyi, 1960), Poisson graphs and others that lead

to heavy tails for the corresponding degree distributions (Simon, 1955; Bollobas, 1985, 2001; Barabasi,

2005b), scale-free graphs (Faloutsos et al., 1999; Barabasi and Albert, 1999; Huberman and Adamic, 1999;

Adamic and Huberman, 2000; Barabasi et al., 2000; Barabasi and Bonabeau, 2003), small-world graphs

(Milgram, 1967; Watts and Strogatz, 1998; Kleinberg, 1999a; Amaral et al., 2000; Liben-Nowell et al., 2005),

Page 12

DRAFT

DO NOT DISTRIBUTE

core-periphery graphs (Borgatti and Everett, 1999), and cellular graphs and networks (Frantz and Carley,

2005a; Airoldi and Carley, 2006). Several of these topology types are presented in heuristic terms, vaguely

consistent across communities

. Airoldi and Carley (2005) show that the slight differences in the sampling

algorithms, which generate topologies that adhere to the heuristic requirements of a specific type, are not

stable in terms of the topological properties of the graphs they lead to. That is, slight differences in the

operational definitions for the same topology type lead to separable graphs in terms of the set of common

metrics used by practitioners in the various communities

. A different set of concerns is explored in “ro-

bustness” studies, which measure the stability of topological properties of graphs and networks of specific

types to disruption and other stress situations (Borgatti et al., 2005; Frantz and Carley, 2005b). These works

are simulation studies that approach the sub-sampling issues discussed above from another perspective.

Remark F. Alternatively, being able to embed the various topology type in a smooth parametric contin-

uum (e.g., Erdos random, to small-world, to ring lattice; see Watts and Strogatz, 1998) would help under-

standing the boundaries. Unfortunately, also this strategy is not practical. There is a potpourri of necessary

conditions that have to be satisfied by such a smooth parametric continuum, which appear in the heuristic

definitions of the various topology types, e.g., the same degree distribution for all nodes or not, or short-

est path as the only notion of distance, or shortest path and metric embedding. Although it is possible to

posit a generative process that satisfies all the necessary conditions, such a generative process relies on a

non-smooth parametric continuum. Specifically, we would need to introduce in such a process a discrete

parameter that controls the number of different “probabilistic treatments” for the nodes, e.g., the number of

degree distributions. The problem is that, on one hand, the value of such a discrete parameter is difficult to

estimate. On the other hand, its correct estimation is fundamental in correctly assigning the topology type.

Ultimately, the diversity in the notions of topology types translates into the hardness of the estimation task,

upon the success of which depends our ability to discriminate among types.

To summarize, we can organize the various works according to few aspects: (a) the notion(s) of distance

between pairs of nodes that are needed; (b) the use, or not, of the descriptive statistics, as well as their

nature, i.e., local versus global; (c) the existence of dependence constraints among neighborhoods; (d)

the focus on node patterns (groups) versus edge patterns (motifs), where we do no distinguish similar edge

configurations among different sets of nodes. These aspects have to be crossed with the nature of the models:

(i) “generative” models and algorithms, both probabilistic and deterministic; (ii) models and algorithms that

contain “generative” ideas, both probabilistic and deterministic; (iii) other models and algorithms.

Problems. More works have introduced methodological innovations in the context of specific problems.

Notable research in this sense concerned how to find communities in networks Girvan and Newman (2002);

Newman (2004a,b), and in bipartite graphs (Mishra et al., 2004). To this extent, Doreian et al. (2004) sum-

marize relevant works in the social sciences and develop a theory of generalized block models. A cluster

of research is about link-mining (Domingos, 2003; Jensen, 1999; Getoor, 2003; Getoor and Diehl, 2005),

graph mining (Chakrabarti, 2005), link prediction (Getoor et al., 2002; Liben-Nowell and Kleinberg, 2003;

Goldenberg and Moore, 2004), and link ranking. (Brin and Page, 1998; Kleinberg, 1999b; Cohen et al.,

1999; Ng et al., 2001). Other notable works are concerned with the information flow within a network; the

emergence of deadlines (Papadimitriu and Servan-Schreiber, 1999), the dynamics of information (Kleinberg,

2001), the dynamics of information exchange (Dodds et al., 2003), how to maximize influence spread

A notable survey is that of Mitzenmacher (2004), who presents a brief history of power-laws and lognormal distributions, and

discusses some of their connections from a generative perspective. Newman (2005) discusses the connections among of power-laws,

the Pareto distribution, and Zipf’s law. Airoldi and Shalizi (2006) present a clear analytical overview of these connections.

Few works survey network metrics and visualization tools; notables are Carley and Reminga (2004) and (Frank, 2000).

Page 13

DRAFT

DO NOT DISTRIBUTE

(Kempe et al., 2003), decentralized information processing (Van Zandt, 1997), and decentralized search

(Kleinberg, 2000, 2004). A practical set of concerns inspired methods for entity disambiguation (Malin et al.,

2005), and classification of relational data (Macskassy and Provost, 2005). Solan et al. (2005) propose a

model to learn grammar-like rules in natural languages.

Empirical Studies. Another portion of research concerns findings that influenced the development of

theoretical aspects. Notable empirical studies include the web (Faloutsos et al., 1999; Albert et al., 1999;

Kleinberg and Lawrence, 2001), air traffic (Guimera et al., 2005), the creative enterprise (Barabasi, 2005a),

scientific collaborations (Newman, 2001), metabolic networks (Guimera and Amaral, 2005), decentralized

search in email network (Adamic and Adar, 2005), transcriptional regulatory network, (Balazsi et al., 2005),

words (Steyvers and Tenenbaum, 2005), the organization within the cell (Barabasi and Oltvai, 2004), poli-

tics (Porter et al., 2005), complex brain networks (Sporns et al., 2004), and more (Newman et al., 2002).

2.2 Dynamics and Evolution

Most existent works focus on static networks, however, there are few that consider methodology to deal with

dynamics and evolution. Notables are the stream of works on cellular automata (Ilachinski, 2001), the early

works on diffusion (Coleman et al., 1957), the treatment of dynamics with Markov-chains Monte-Carlo

(Wasserman, 1980), dynamic random fields on undirected graph (Shalizi, 2003), link-copying processes

(Kleinberg et al., 1999; Leskovec et al., 2005b,a), cascading behaviors (Watts, 2002), network tomography

and latent allocation (Airoldi and Faloutsos, 2004), dynamics in the social space (Banks et al., 2005), and

models that attempt at replicating real-world phenomena such as opinion formation (Wu and Huberman,

2005), and evolution (Doreian and Stokman, 1997).

Empirical Studies. Very few studies exist to guide theoretical developments in a dynamic setting.

Few notables are communication networks (Airoldi, 2003; Airoldi and Faloutsos, 2004), email networks

(Kossinets and Watts, 2006), nucleic acid chain dynamics (Sales-Pardo et al., 2005), and scientific collabo-

rations (Barabasi et al., 2002).

2.3 Building Graphs from Data

The works in this section share the intuition that measurements about objects are inherently noisy; the var-

ious authors attempt to model the uncertainty associated with the measurements in order to make decisions

whether two objects are related or not, and create a graph. A popular approach is that of associating a

random variable with each “object”, e.g., Bernoulli, define the process through which “observations” relate

to binary outcomes, and estimate the parameter of a Bayesian network (Heckerman, 1999) that describes

the observations best, through dependencies among objects. The estimated Bayesian network provides a

probabilistic model for the observed co-occurrences that can be used to predict missing links, or to assess

the likelihood of existing ones, (Getoor et al., 2002; Friedman and Koller, 2003; Heckerman et al., 2004;

Goldenberg and Moore, 2004; Teyssier and Koller, 2005). Important applications based on variations this

approach have been used for building recommender systems (Breese et al., 1998), social networks (Breiger,

2003), and complex cellular networks (Friedman, 2004; Segal et al., 2005).

Page 14

DRAFT

DO NOT DISTRIBUTE

2.4 Inadequacies of the Current Research

There are several dimensions that are relevant to statistical analyses of graphs and networks. Unfortunately

no single approach develops, or at least allows for, all of them

Dimensions of interest are: (a) a “proper” likelihood function; (b) the fully generative nature of the

model; (c) replicability of interesting properties at both global and local level; (d) the focus on edges or

nodes; (e) notions of distance and embedding of graphs in a metric space; (f) identifiability issues that need

be explicitly identified; (g) hierarchical relations between dyads, triangles, k-stars, k-triangles and other

basic structural (connected) components that are used to summarize and characterize an observed graph; (h)

dependencies among relevant quantities, i.e., the sufficient statistics, corresponding to a decomposition of

the observed graph into cliques or other structures of interest need be identified; (i) goodness of fit must be

assessed—current models tend to over-fit observed graphs and can not be easily extended as the observed

network grows; (j) the possibility of integrating data on different object types; and other dimensions.

With respect to this last dimension, i.e., integration of multiple object types and data about them, most

existent work tend to concern or assume specific types of data representations, e.g., temporal and sequential

data in attribute space, or relational data represented by graphs or networks. We view learning problems

along this line as “type-specific-learning” problems. Typically, one can develop solutions to type-specific-

learning problems by devising novel domain- and data-specific models and algorithms that leverage domain

knowledge and semantics of interest for particular applications. Integrating heterogeneous data types un-

der a unified model remains a challenge, however, especially for complex graphs that are simultaneously

described by intrinsically different types of characteristics, such as features in attribute space and links in

relational space (Airoldi et al., 2006c).

As we discussed in the previous section, there is a wide range of research questions that an elegant

solution to the issues above may help us answer. It is useful to keep those questions in mind in order to

guide our technical choices.

3 Overview of the Research Agenda

My previous research on these topics suggests a probabilistic framework and a general modeling approach to

complex systems. In this thesis, I plan to present such a framework and discuss its properties. In particular,

the main goal is the reconciliation of global properties of complex systems with local phenomena of interest.

My solution is to express complexity through hierarchical mixtures of simple patterns that evolve over time,

i.e., complex global behavior emerges from the combination of local interaction patterns and dynamics. I

will discuss and demonstrate how this new framework generalizes other probabilistic approaches present

in the literature, provides a soft version of the deterministic approaches, and lays the foundations of a

statistical theory of graphs and networks. The major part of my effort will then be devoted to modeling

issues that arise in conjunction to the four essential aspects listed above, e.g., integration and dynamics.

I conclude with the investigation of theoretical and computational issues, and with a characterization of

the geometrical intuitions underlying the allocation task, a major inference objective shared by the various

models encompassed by this framework.

Page 15

DRAFT

DO NOT DISTRIBUTE

3.1 Research Work to Date

Here I briefly summarize my work in dynamic networks, computational biology and text analysis. The focus

of the write up is on the lessons learned and, in particular, on the flow of ideas that constitute the Ariadne’s

thread of my research. These ideas naturally call for a science of complex systems, and portray the notion

of ”hierarchical structure” in the likelihood, and that of ”mixtures” as the central technical tools for positing

probabilistic Bayesian models that can express complex global phenomena in terms simpler local patterns,

in a generative and informative fashion, as a continuum rather than as a dichotomy.

3.1.1 Dynamic Communication Networks

With Christos Faloutsos, I developed the Dynamic Network Tomography (DNT) model in order to re-

cover origin-destination (OD) traffic flows from observable link loads, in dynamic communication networks

(Airoldi and Faloutsos, 2004). Link loads are sums of subsets of OD flows, i.e., mixtures of non-observable

quantities, so that our model is essentially a latent allocation model where the OD flows play the role of al-

location weights. Real OD flows are expensive to retrieve, thus there is the necessity for accurate estimation

methods. We were provided validation data by AT&T at first, and from the Networking Institute of Carnegie

Mellon University later.

Previous models assumed either IID observations at different epochs or a fixed dynamics for all times.

The state-of-the-art was a local likelihood model based on Gaussian assumption and IID observations, but

with a powerful parameterization that linked latent flows to observed loads uniquely (Cao et al., 2000, 2001).

I specified the dynamics as a generalized dynamic system, by positing a stable average in the long-term

while allowing some short-term variability to capture ”local” deviations, i.e., localized in time. Further,

instead of modeling the dynamic among the flows (i.e., the mixture weights) directly, I added a layer into

the hierarchical structure of the likelihood by introducing latent variables for the average flows, in order to

move the generalized dynamic system up a level, among these new variables. Therefore the final model

softly constrains the mixture weights through a hierarchical, fully generative, dynamic allocation model.

I produced inference algorithms (MCMC, particle filters, importance sampling, and hybrid schemes) for

several models based on Gamma and Log-Normal distributions. I applied these models to real traffic at

AT&T (16 time series) and at Carnegie Mellon (about 12000 time series) and achieve accuracy improve-

ments of 30-to-40% on all OD routes—in terms of ℓ

distance between estimated and true OD flows.

I then developed some general theory of dynamic allocation models. In particular, a nice feature of

our model is that the ”stochastic dynamics” not only allows for localized variability across the temporal

sequences, but also leads to more parsimonious models, in terms of the number of parameters, than the

traditional state-space formulation. I was interested in formulating a general convolution problem in order to

find the general family of distributions that could substitute for Gamma and Log-Normal analytic forms. The

issue is that a model of stochastic dynamic has to be conjugate to that of the OD flows through composition

by multiplication in order for the distributional assumptions to propagate ”exactly” through subsequent

epochs, e.g., the Gamma model does not lead to exact propagation whereas the Log-Normal model does.

The crucial property of the log-Normal model, though, is not that it is a stable limit of the product of

random variables with finite variances, as I originally conjectured, but rather that it is a infinitely divisible

distribution. In fact, infinitely divisible distributions possess properties that are sufficient to be solution to

the general convolution problem (Airoldi, 2005).

Page 16

DRAFT

DO NOT DISTRIBUTE

In conclusion, we solved a classical ill-posed inverse problem using projections, that we defined through

a hierarchical, fully generative, dynamic allocation model. The model integrates relational and temporal ob-

servations. We learned that picking one solution among the feasible ones is not good enough to recover real

traffic. In general, in cases where several likely solutions exist it is crucial to ”inform” the inference some-

how, e.g., by introducing structure in the likelihood, in order to reach a ”good” solution. Non-informative

inference will give biased estimates in such cases.

3.1.2 Large Collections of Electronic Documents and Emails

With Stephen Fienberg, I developed the Dirichlet-Poisson and Dirichlet-negative-binomial models in order

to resolve an authorship attribution problem. In particular, we were asked to produce predictions and confi-

dence bounds on whether Reagan had drafted the texts of 312 of his own radio addresses, aired between 1975

and 1979, for which the original drafts were not found, neither other evidence was available (Airoldi et al.,

2006a).

The distinguishing feature of authorship attribution problems is that ”topical” words are not useful, and

in our case spellings and punctuation were not useful either. Rather, words used for discrimination in these

problems are typically high-frequency, non-topical words, called ”function words,” as well as idioms and

expressions. Probabilistic models based on multivariate Bernoulli, multinomial and mixtures of such models

do not fit word counts satisfactorily—note that a Dirichlet mixture of multinomials (Blei et al., 2003) is

again a multinomial. Further, word counts corresponding to function words display a variance greater than

the mean, thus the Poisson model is not entirely adequate, although significantly improves the previous ones.

The negative-binomial model, essentially a gamma mixture of Poissons, fits the data best and leads to robust

and accurate predictions.

We started from standard formulation of Poisson and negative-binomial models, we then introduced

reparameterizations that allow for a natural interpretation of the parameters in the language modeling con-

text, and that introduce the notion of contagion as a model of context, i.e., the more a word is used, the more

likely it is that it will occur again, thus defining the writing style of an author. We then devised reasonable

priors for improving the estimates of rare words. In order to produce ”interpretable” predictions we also

tackled the problem of word selection. We were looking for a small set of words that captured the writing

style of Reagan, which could be recognized by experts and close collaborators of his at the Hoover Insti-

tution as ”sounding like him.” To this extent, we devised a statistical test for determining the importance

of a word, based on a smooth function of word counts, which we called “∆

statistic.” This enabled us to

apply the theory of false discovery rates in order to correct for the fact that we had to test a large amount

of words, idioms and expressions. By the end, cross validated accuracy of predictions about authors was up

high beyond 95% with only 30 to 40 words.

Given the success of our methodology, we tested our models on a large set of text classification problems,

old and new, to discover that they consistently outperformed naive Bayes mixtures and other models based

on multivariate Bernoulli and multinomial. We found that whenever the classification was driven by non-

rare words our methods ”largely” outperformed those models. We found that our ∆

statistic selected

more discriminative words than Information Gain and improved accuracy. We thus extended our statistic

to deal with multiple classes and derived an asymptotic approximations for its distribution. This enabled to

compute p-values for each word by looking up a Gaussian probability table, rather than having to draw a

large sample, thus dramatically reducing processing time and providing a feasible alternative to information

Page 17

DRAFT

DO NOT DISTRIBUTE

gain (Airoldi et al., 2005b). In two side projects, with other collaborators, I further explored the power of the

∆

statistic in finding associations among words, with the final goal of extracting sentiments and detecting

fraudulent intent in emails. I found that ∆

led to Markov blankets that encoded sentiments better those

obtained with Chi-Square and G-Square tests. ∆

is more robust to sparse counts (Airoldi et al., 2006b,

2005c).

In conclusion, we developed probabilistic solutions that improved performance on a wide range of practi-

cal problems using models of contagion and approximate inference. In our formulation, models of contagion

are flexible hierarchical mixture models, e.g., negative-binomial is essentially Gamma mixture of exchange-

able Poissons with a Dirichlet prior on its parameters. The asymptotic formulas for our statistical test were

crucial in reducing processing time in computing p-values, and now the ∆

statistic provides a feasible al-

ternative to information gain. There is more theoretical value in the sum/ratio type of reparameterizations,

which maintain tractability, allow to condition on observable quantities in mixture models of contagion, and

provide a bridge to understand the geometry of such models in the simplex—more below.

3.1.3 Biological Sequences and Networks

With Stephen Fienberg, I developed a soft-clustering version of the Dirichlet-Poisson model for the serial

analysis of gene expression data. The problem is that of recovering non-observable functional processes

that express the observed sequential profiles. In this mixture model, we assume that each gene tag may be

expressed under more than one functional process. The Poisson model provides better fit for gene expres-

sion profiles. The notions of ”contagion” and latent ”functional context” entailed by the model are very

compelling in this application domain. On mouse retina data, our model recovered a richer set of sequential

profiles than latent Dirichlet allocation (Blei et al., 2003), the analysis indicated that less latent profiles are

needed to express the observed gene profiles, and we obtained a better log-likelihood plateau for the same

number of parameters.

With Stephen Fienberg, Eric Xing and David Blei, I developed a latent mixture model of complex data,

in order to integrate gene expression (GE) data with protein-protein interactions (PPI). The problem is still

the same, i.e., we want to recover non-observable functional processes that supposedly drive the observable

interaction patterns among proteins. We posit a hierarchy of probabilistic assumptions that encodes a simple

intuition: proteins that interact with others according to similar patterns are likely to participate in the

same latent functional processes. For the relational portion of the data, we posit a stochastic block mixture

model where each protein may participate in multiple non-observable functions. We developed a ”nested”

variational algorithm for the relational portion of the model that is parallelizable and scales in terms of

memory to very-large matrices Preliminary experiments on PPI in Yeast suggested that our model is a

powerful tool for filtering outcomes of high-throughput experiments starting from a small sample of hand-

curated functional annotations. Our model predicted correctly more than 50% of the functional annotations

for unknown proteins, in cross-validation experiments (Airoldi et al., 2005a, 2006d).

These projects are still being completed and we are now at the stage where we need to integrate GE

and PPI observations on overlapping sets of proteins, under the assumption that both data sources provide

information on the same set of non-observable functional processes. An important lesson we learned is that

conditioning, identifiability and robustness are tied in these models, and a simple reparameterization may

have profound consequences. For example, in the model for GE the sum/ratio parameterization is crucial

in that it leads to closed formula updates for the variational inference. Further, conditioning on observable

Page 18

DRAFT

DO NOT DISTRIBUTE

quantities pushes the log-likelihood to a better plateau than competing models (Airoldi et al., 2006f). In the

model for PPI we conjecture that conditioning, e.g., on observable density of the interaction matrix, will

lead to more robust inferences by eliminating penalizing local optima that would bias the estimates.

3.2 Proposed Research

The proposed research advances science by providing theoretical understanding, geometrical intuitions, ap-

proximations and computational shortcuts, and an array of fresh modeling ideas for temporal and relational

components of systems of interest, along with strategies to integrate them. In particular, we propose prob-

abilistic models that posit a hierarchy of probabilistic assumptions about how the data interact, which we

compose in a generative fashion, i.e., data are characterized as arising from probabilistic algorithms that are

completely specified by few parameters. See Table 1 for an overview of my research agenda.

In my thesis I plan to develop foundations of a statistical theory of random graphs and networks in three

main directions.

1. Graphs and Networks

(a) Pure Topology Types. In this part I survey the different processes that explain both global and

local properties (Airoldi and Carley, 2005). I then show how to combine a basic probabilistic

model where the edges are exchangeable with probabilistic generative modules to replicate the

properties of interest with high probability. I plan to explore cellular networks in more details.

(b) Discovering Latent Motifs. In this part I show how to use latent variable models to infer ba-

sic motifs using admixtures and a hierarchy of probabilistic assumptions. Admixtures at the

node level serve the purpose of expressing the complexity in the data from simple motifs. The

hierarchy of probabilistic assumption serves to purpose of targeting latent motifs that exhibit

regularities of interest, and provides a unified framework for the analysis of one-, two-, and

multiple-mode data (Erosheva and Fienberg, 2005; Airoldi et al., 2006d).

2. Graph Evolution and Dynamics

(a) Birth-Death Processes. In this part I show how to posit a birth-death process in the latent space,

and I show how to make inference. A latent birth-death process is not only a convenient and

elegant way of enhancing the complexity that any given set of motifs can express, but it also

provides a fundamental advance in modern time series analysis (Xing, 2005).

3. Complex Graphs and Networks

(a) Integration and Prediction. In this part I show how to integrate the different models in a modular

fashion. Integration provides to our model the ability to share statistical power among several

sources of information, all of which tell us something important about the characteristics of

the system, which relate to a particular set of attributes of primary interest. The main purpose

of the integration, in fact, is to infer latent motifs that are tailored for predicting the target set

of attributes in the system, for which only a small amount of (possibly) partial information is

available (Airoldi et al., 2006c). I plan to develop an example in detail to demonstrate the theory.

Page 19

DRAFT

DO NOT DISTRIBUTE

1. Graphs and Networks

(a) Pure Topology Types

i. Stability and Separability of Metric Embeddings

ii. Exchangeability and Basic Likelihoods

iii. Canonical Bases for Graph Generation

iv. Example: Cellular Networks

(b) Discovering Latent Motifs

i. Mixtures of Connection Sub-Graphs

ii. Stochastic Block Models of Mixed Membership

iii. Latent Space Models of Mixed Membership

iv. Diffusion Models

2. Evolution and Dynamics

(a) Order-Dependent Likelihoods

i. Laws of Evolution

ii. Diffusion Models Revisited

(b) Birth-Death Processes

i. Mixtures of Hidden Markov Models

ii. Mixtures of State-Space Models

3. Integration and Complexity

(a) Multi-Graphs, Hyper-Graphs, and the Meta-Matrix

(b) Example Applications

i. Dynamic Graphs and Networks

ii. Biological Sequences and Networks

iii. Textual Documents and Word Networks

4. Theoretical Issues

(a) Hierarchical (Bayesian) Mixture Models

i. Identifiability

ii. Exchangeability

iii. Model Selection Strategies

(b) Latent Allocation as Projection

i. Projective Geometry in the Exponential Family

(d) Gaussian Unit Ensembles

(e) Approximate Inference Techniques

Table 1: Overview of the research agenda.

Page 20

DRAFT

DO NOT DISTRIBUTE

I also plan to explore theoretical and computational issues such as approximate inference techniques,

identifiability of parameters in complex models, model selection strategies for the number of latent patterns

(Airoldi et al., 2006e), a characterization of latent allocation tasks in terms of projections, sub-sampling for

large graphs, connections to spectral theory via Gaussian ensembles, and remarks on the asymptotics of the

models encompassed in this framework.

3.3 Time to Completion

E (Time) Ref. Research to be completed

Ideal venues

6 weeks

The emergence of cellular structures

1 science / pnas

6 weeks

Stochastic block models of mixed membership

1 jmlr / ba / jasa / jrss(b)

16 weeks

Latent birth-death processes

2 jasa / jrss(b)

8 weeks

Integration and prediction in complex biological systems 1 ba, 1 valencia

Expected completion in November 2006

References

Adamic, L. A., E. Adar. 2005. How to search a social network. Social Networks 27 187–203.

Adamic, L. A., B. A. Huberman. 2000. Comment on power-law distribution of the world wide web. Science

287 2115a.

Airoldi, E. M. 2003. Advances in network tomography. Tech. Rep. CMU-CALD-03-101, Carnegie Mellon

University.

Airoldi, E. M. 2005. Latent allocation models for dynamic network analysis: An application to network

tomography. Manuscript.

Airoldi, E. M., A. G. Anderson, S. E. Fienberg, K. K. Skinner. 2006a. Who wrote Ronald Reagan radio

addresses? Bayesian Analysis 1 289–320.

Airoldi, E. M., X. Bai, R. Padman. 2006b. Sentiment extraction from unstructured texts: Markov blankets

and meta-heuristic search. Lecture Notes in Computer Science, vol. 3932. Springer-Verlag.

Airoldi, E. M., D. M. Blei, S. E. Fienberg, E. P. Xing. 2006c. Latent mixed-membership allocation models

of relational and multivariate attribute data. Valencia & ISBA Joint World Meeting on Bayesian Statistics.

Forthcoming.

Airoldi, E. M., D. M. Blei, S. E. Fienberg, E. P. Xing. 2006d. Mixed membership stochastic block models for

relational data with application to protein-protein interactions. Proceedings of ENAR Annual Meetings.

Forthcoming.

Airoldi, E. M., D. M. Blei, E. P. Xing, S. E. Fienberg. 2005a. A latent mixed-membership model for

relational data. Workshop on Link Discovery: Issues, Approaches and Applications. In conjunction with

the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Page 21

DRAFT

DO NOT DISTRIBUTE

Airoldi, E. M., K. M. Carley. 2005. Sampling algorithms for pure network topologies: Stability and separa-

bility of metric embeddings. ACM SIGKDD Explorations 7 13–22.

Airoldi, E. M., K. M. Carley. 2006. The emergence of cellular structures. Manuscript.

Airoldi, E. M., W. W. Cohen, S. E. Fienberg. 2005b. Bayesian models for frequent terms in text. Proceedings

of the Classification Society of North America and INTERFACE Annual Meetings.

Airoldi, E. M., C. Faloutsos. 2004. Recovering latent time-series from their observed sums: network to-

mography with particle filters. Proceedings of the ACM SIGKDD International Conference on Knowledge

Discovery and Data Mining, vol. 10. 30–39.

Airoldi, E. M., S. E. Fienberg, C. Joutard, T. M. Love. 2006e. Model comparison and choice for hierarchical

Bayesian mixed-membership models with application to scientific publications. Forthcoming.

Airoldi, E. M., B. Malin, L. A. Sweeney. 2005c. X-RAY: Technologies to defeat identity theft vulnerabilities

related to e-mail requests. Proceeding of the AAAI Spring Symposium.

Airoldi, E. M., C. R. Shalizi. 2006. The analytical nexus from Zipf’s law to lognormal distributions.

Manuscript.

Airoldi, E. M., E. P. Xing, S. E. Fienberg. 2006f. Contagion processes for soft clustering: Learning latent

themes from frequently co-occurring events. Manuscript.

Albert, R., A. L. Barabasi. 2002. Statistical mechanics of complex networks. Rebiews of Modern Physics

74.

Albert, R., H. Jeong, A.-L. Barabasi. 1999. Diameter of the world wide web. Nature 401 130.

Amaral, L. A. N., J. M. Ottino. 2004. Complex networks. European Physics Journal B 38 147–162.

Amaral, L. A. N., A. Scala, M. Barthelemi, H. E. Stanley. 2000. Classes of small-world networks. Proceed-

ings of the National Academy of Sciences 97 11149–11152.

Anderson, C. J., S. Wasserman, K. Faust. 1992. Building stochastic blockmodels. Social Networks 14

137–161.

Artzy-Randrup, Y., S. J. Fleishman, N. Ben-Tal, L. Stone. 2004. Comment on “Network motifs: Simple

building blocks of complex networks” and “Superfamilies of evolved and designed networks”. Science

305 1107c.

Balazsi, G., A. L. Barabasi, Z. N. Oltvai. 2005. Topological units of environmental signal processing in the

transcriptional regulatory network of Escherichia coli. Proceedings of the National Academies of Science

102 7841–7846.

Banks, H. T., A. F. Karr, H. K. Nguyen, J. R. Samuels, Jr. 2005. Sensitivity to noise variance in a social

network dynamics model. 2005 2005-10, Statistical and Applied Mathematical Sciences Institute.

Barabasi, A. L. 2005a. Network theory—the emergence of the creative enterprise. Science 308 639–641.

Barabasi, A. L. 2005b. The origins and heavy tails in human dynamics. Nature 435 207–211.

Page 22

DRAFT

DO NOT DISTRIBUTE

Barabasi, A. L., R. Albert. 1999. Emergence of scaling in random networks. Science 286 509–512.

Barabasi, A. L., R. Albert, H. Jeong. 1999. Mean-field theory for scale-free random networks. Physyca A

272 173–187.

Barabasi, A. L., R. Albert, H. Jeong, G. Bianconi. 2000. Response to power-law distribution of the world

wide web. Science 287 2115a.

Barabasi, A. L., E. Bonabeau. 2003. Scale-free networks. Scientific American 50–59.

Barabasi, A. L., H. Jeong, Z. Neda, E. Ravasz, A. Schubert, T. Vicsek. 2002. Evolution of the social network

of scientifc collaborations. Physica A 311 590–614.

Barabasi, A. L., Z. N. Oltvai. 2004. Network biology: Understanding the cell’s functional organization.

Nature Reviews Genetics 5 101–113.

Belkin, M., P. Niyogi. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering.

Advances in Neural Information Processing Systems, vol. 14.

Berg, J., M. Lassig. 2004. Local graph alignment and motif search in biological networks. Proceedings of

the National Academy of Sciences 101 14689–14694.

Besag, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal

Statistical Society, Series B .

Blei, D. M., A. Ng, M. I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research

3 993–1022.

Bollobas, B. 1985. Random Graphs. Academic Press, London.

Bollobas, B. 2001. Random Graphs. 2nd ed. Academic Press, New York.

Bollobas, B., O. Riordan. 2002. Mathematical results on scale-free graphs. S. Bornholdt, H. Schuster, eds.,

Handbook of graphs and networks. Wiley-VCH.

Borgatti, S. P., K. M. Carley, D. Krackhardt. 2005. Robustness of centrality measures under conditions of

imperfect data. Social Networks Forthcoming.

Borgatti, S. P., M. G. Everett. 1999. Models of core / periphery structures. Social Networks 21 375–395.

Breese, J., D. Heckerman, C. Kadie. 1998. Empirical analysis of predictive algorithms for collaborative

filtering. Uncertainty in Artificial Intelligence, vol. 14.

Breiger, R. L. 2003. Emergent themes in social network analysis: Results, challenges, opportunities. Dy-

namic Social Network Modeling and Analysis: Workshop Summary and Papers.

Breiger, R. L., S. A. Boorman, P. Arabie. 1975. An algorithm for clustering relational data with applica-

tions to social network analysis and comparison to multidimensional scaling. Journal of Mathematical

Psychology 12 328–383.

Brin, S., L. Page. 1998. The anatomy of a large-scale hypertextual (Web) search engine. Proceedings of the

International World Wide Web Conference, vol. 7.

Page 23

DRAFT

DO NOT DISTRIBUTE

Buckley, P. G., D. Osthus. 2004. Popularity based random graph models leading to a scale-free degree

sequence. Discrete Mathematics 282 53–68.

Caldarelli, G., R. Pastor-Satorras, A. Vespignani. 2004. Structure of cycles and local ordering in complex

networks. European Physics Journal B 38 183—186.

Cao, J., D. Davis, S. Van Der Viel, B. Yu. 2000. Time-varying network tomography: router link data.

Journal of the American Statistical Association 95 1063–75.

Cao, J., D. Davis, S. Van Der Viel, B. Yu, Z. Zu. 2001. A scalable method for estimating network traffic

matrices from link counts. Tech. rep., Bell Labs.

Carley, K. M. 2002. Smart agents and organizations of the future. L. Lievrouw, S. Livingstone, eds., The

Handbook of New Media. 206–220.

Carley, K. M., J. Reminga. 2004. ORA: Organizational Risk Analyzer. Available for download at

http://www.casos.cs.cmu.edu/projects/ora/.

Chakrabarti, Deepayan. 2005. Tools for large graph mining. Ph.D. thesis, School of Computer Science,

Carnegie Mellon University.

Chung, F., L. Lu. 2003. The average distances in random graphs with given expected degrees. Internet

Mathematics 1 91–114.

Cohen, William C., Robert E. Schapire, Yoram Singer. 1999. Learning to order things. Journal of Artificial

Intelligence Research 10 243–270.

Coifman, R. R., S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, S. W. Zucker. 2005a. Geometric

diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings

of the National Academy of Sciences 102 7426–7431.

Coifman, R. R., S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, S. W. Zucker. 2005b. Geometric

diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods. Proceed-

ings of the National Academy of Sciences 102 7432–7437.

Coleman, J., E. Katz, H. Menzel. 1957. The diffusion of an innovation among physicians. Sociometry 20

253–270.

Cooper, C., A. M. Frieze. 2003. A general model of web graphs. Random Structures and Algorithms 22

311–335.

Ding, C. 2005. Principal component analysis and matrix factorizations for learning. International Confer-

ence on Machine Learning. Tutorial.

Dodds, P. S., D. J. Watts, C. F. Sabel. 2003. Information exchange and the robustness of organizational

networks. Proceedings of the National Academy of Sciences 100 12516–12521.

Domingos, P. 2003. Prospects and challenges for multirelational data mining. SIGKDD Explorations 5

80–83.

Donoho, D. L., C. Grimes. 2003. Hessian eigenmaps: Locally linear embedding techniques for high-

dimensional data. Proceedings of the National Academy of Sciences 100 5591–5596.

Page 24

DRAFT

DO NOT DISTRIBUTE

Doreian, P., V. Batagelj, A. Ferligoj. 2004. Generalized Blockmodeling. Cambridge University Press.

Doreian, P., F. N. Stokman, eds. 1997. Evolution of Social Networks. Gordon and Breach Publishers.

Dorogovtsev, S. N., J. F. F. Mendes. 2002. Evolution of networks. Advances in Physics 51 1079–1187.

Erdos, P., A. Renyi. 1960. On the evolution of random graphs. Publications of the Mathematical Institute

of the Hungarian Academy of Sciences 5 17–61.

Erosheva, E., S. E. Fienberg. 2005. Bayesian mixed membership models for soft clustering and classifica-

tion. C. Weihs, W. Gaul, eds., Classification—The Ubiquitous Challenge. Springer-Verlag, 11–26.

Evans, J. St. B. T., S. E. Newstead, R. M. J. Bynre. 1993. Human Reasoning. Psychology Press.

Faloutsos, M., P. Faloutsos, C. Faloutsos. 1999. On power-law relationships of the internet topology. Pro-

ceedings of the ACM SIGCOMM Conference. 251–261.

Faust, K. 1988. Comparison of methods for positional analysis: Structural and general equivalences. Social

Networks 10 313–341.

Fienberg, S. E. 1999. History of science: Statistics in the social world. Science 283 2025.

Fienberg, S. E., M. M. Meyer, S. Wasserman. 1981. Analyzing data from multivariate directed graphs: An

application to social networks. Interpreting Multivariate Data. New York: Wiley, 289–306.

Fienberg, S. E., M. M. Meyer, S. Wasserman. 1985. Statistical analysis of multiple sociometric relations.

Journal of the American Statistical Association 80 51–67.

Fienberg, S. E., S. Wasserman. 1981. Categorical data analysis of single sociometric relations. S. Leinhardt,

ed., Sociological Methodology. San Francisco: Jossey-Bass, 156–192.

Flaxman, A., A. Frieze, Trevor Fenner. 2005. High degree vertices and eigenvalues in the preferential

attachment graph. Internet Mathematics 2.

Frank, O. 2000. Structural plots of multivariate binary data. Journal of Social Structure 1 1–19.

Frank, O., D. Strauss. 1986. Markov graphs. Journal of the American Statistical Association 81 832–842.

Frantz, T., K. M. Carley. 2005a. A formal characterization of cellular networks. Tech. Rep. CMU-ISRI-05-

109, School of Computer Science, Canregie Mellon University.

Frantz, T., K. M. Carley. 2005b. Relating network topology to the robustness of centrality measures. Tech.

Rep. CMU-ISRI-05-117, School of Computer Science, Canregie Mellon University.

Friedman, N. 2004. Inferring cellular networks using probabilistic graphical models. Science 303 799–805.

Friedman, N., D. Koller. 2003. Being bayesian about bayesian network structure: A bayesian approach to

structure discovery in bayesian networks. Machine Learning 50 95–125.

Ganesh, A., L Massoulie, D. Towsley. 2005. The effect of network topology on the spread of epidemics.

Proceedings of the 24th IEEE INFOCOM Annual Conference.

Getoor, L. 2003. Link mining: A new data mining challenge. SIGKDD Explorations 5 84–89.

Page 25

DRAFT

DO NOT DISTRIBUTE

Getoor, L., C. Diehl. 2005. Link mining: A survey. ACM SIGKDD Explorations 7 3–12.

Getoor, L., N. Friedman, D. Koller, B. Taskar. 2002. Learning probabilistic models with link uncertainty.

Journal of Machine Learning Research .

Ghahramani, Z. 2005. Unsupervised learning. O. Bousquet, G. Raetsch, U. von Luxburg, eds., Advanced

Lectures on Machine Learning, vol. LNAI 3176. Springer-Verlag.

Girvan, M., M. E. J. Newman. 2002. Community structure in social and biological networks. Proceedings

of the National Academy of Sciences 99 7821—7826.

Goldenberg, A., A. W. Moore. 2004. Tractable learning of large bayes net structures from sparse data.

Proceedings of the Intenational Conference on Machine Learning, vol. 21.

Guimera, R., L. A. N. Amaral. 2005. Functional cartography of complex metabolic networks. Nature 433

895–900.

Guimera, R., S. Mossa, A. Turtschi, L. A. N. Amaral. 2005. The worldwide air transportation net-

work: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National

Academy of Sciences 102 7794—7799.

Habib, M., C. McDiarmid, J. Ramirez-Alfonsin, B. Reeds, eds. 1998. Probabilistic Methods for Algorithmic

Discrete Mathematics. Springer.

He, X., D. Cai, P. Niyogi. 2005. Tensor subspace analysis. Advances in Neural Information Processing

Systems, vol. 18.

Heckerman, D. 1999. A tutorial on learning with Bayesian networks. M. Jordan, ed., Learning in Graphical

Models. MIT Press.

Heckerman, D., C. Meek, D. Koller. 2004. Probabilistic models for relational data. Tech. Rep. MSR-TR-

2004-30, Microsoft Research.

Hoff, P. D. 2003a. Bilinear mixed effects models for dyadic data. Tech. Rep. 32, University of Washigton,

Seattle.

Hoff, P. D. 2003b. Random effects models for network data. R. Breiger, K. Carley, P. Pattison, eds.,

Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers. National Academies

Press, 303–312.

Hoff, P. D., A. E. Raftery, M. S. Handcock. 2002. Latent space approaches to social network analysis.

Journal of the American Statistical Association 97 1090–1098.

Holland, P., K. B. Laskey, S. Leinhardt. 1983. Stochastic blockmodels: Some first steps. Social Networks 5

109–137.

Holland, P. W., S. Leinhardt. 1975. Sociological Methodology, chap. Local structure in social networks.

Jossey-Bass, 1–45.

Holzle, Urs. 2005. Petabyte processing made easy. Advances in Neural Information Processing Systems.

Invited Talk.

Page 26

DRAFT

DO NOT DISTRIBUTE

Huberman, B. A., L. A. Adamic. 1999. Growth dynamics of the world-wide web. Nature 401 131.

Hunter, D., M. Handcock. 2004. Inference in curved exponential family models for networks. Tech. Rep.

TR0402, Department of Statistics, Penn State University.

Ilachinski, A. 2001. Cellular Automata: A Discrete Universe. World Scientific.

Jensen, D. 1999. Statistical challenges to inductive inference in linked data. Proceedings of the 17th

International Workshop on Artificial Intelligence and Statistics.

Kashtan, N., S. Itzkovitz, R. Milo, U. Alon. 2004. Efficient sampling algorithm for estimating subgraph

concentrations and detecting network motifs. Bioinformatics 20 1746–1758.

Kemp, C., T. L. Griffiths, J. B. Tenenbaum. 2004. Discovering latent classes in relational data. Tech. Rep.

AI Memo 2004-019, MIT.

Kempe, D., J. Kleinberg, E. Tardos. 2003. Maximizing the spread of influence through a social network.

Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Min-

ing.

Kleinberg, J. 1999a. The small-world phenomenon: An algorithmic perspective. Tech. Rep. 99-1776,

Department of Computer Science, Cornell University.

Kleinberg, J. 2000. Navigation in a small world. Nature 845.

Kleinberg, J. 2001. Small-world phenomena and the dynamics of information. Advances in Neural Infor-

mation Processing Systems 14.

Kleinberg, J. 2004. The small-world phenomenon and decentralized search. SIAM News 37.

Kleinberg, J., S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. 1999. The web as a graph: Mea-

surements, models and methods. Proceedings of the International Conference on Combinatorics and

Computing. 1–17.

Kleinberg, J., S. Lawrence. 2001. The structure of the web. Science 294 1849–1850.

Kleinberg, J. M. 1999b. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 604–

632.

Kleinberg, R., J. Kleinberg. 2005. Isomorphism and embedding problems for infinite limits of scale-free

graphs. Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms.

Kossinets, G., D. J. Watts. 2006. Empirical analysis of an evolving social network. Science 311 88–90.

Kumar, R., P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. 2000. Random graph models

for the web graph. Annual Symposium on Foundations of Computer Science. 57–65.

Lafon, S., A. B. Lee. 2006. Diffusion maps and coarse-graining: A unified framework for dimensionality

reduction, graph partitioning and data set parameterization. Manuscript.

Leighton, J. P., R. J. Sternberg, eds. 2003. The Nature of Reasoning. Cambridge University Press.

Page 27

DRAFT

DO NOT DISTRIBUTE

Leskovec, J., D. Chakrabarti, J. Kleinberg, C. Faloutsos. 2005a. Realistic, mathematically tractable graphh

generation and evolution, using Kronecker multiplication. Proceedings of the 9th European Conference

on Principles and Practice of Knowledge Discovery in Databases.

Leskovec, J., J. Kleinberg, C. Faloutsos. 2005b. Graphs over time: Densification laws, shrinking diam-

eters and possible explanations. Proceedings of the 11th ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining.

Liben-Nowell, D., J. Kleinberg. 2003. The link prediction problem for social networks. Proceedings of the

12th Annuam ACM International Conference on Knowledge Management. 556–559.

Liben-Nowell, D., J. Novak, R. Kumar, P. Raghavan, A. Tomkins. 2005. Geographic routing in social

networks. Proceedings of the National Academy of Sciences 102 11623—11628.

Lorrain, F., H. C. White. 1971. Structural equivalence of individuals in social networks. Journal of Mathe-

matical Sociology 1 49–80.

Macskassy, S. A., F. Provost. 2005. Classification in networked data: A toolkit and a univariate case study.

Tech. Rep. CeDER-04-08, Stern School of Business, New York University.

Malin, B., E. M. Airoldi, K. M. Carley. 2005. A social network analysis model for name disambiguation in

lists. Journal of Computational and Mathematical Organization Theory 11 119–139.

Metha, M. L. 2004. Ranomd Matrices. Elsevier.

Milgram, S. 1967. The small world phenomenon. Psychology Today 1.

Milo, R., S. Itzkovitz, N. Kashtan, R. Levitt, U. Alon. 2004a. Response to comment on “Network motifs:

Simple building blocks of complex networks” and “Superfamilies of evolved and designed networks”.

Science 305 1107d.

Milo, R., S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, U. Alon. 2004b.

Superfamilies of evolved and designed networks. Science 303 1538–1542.

Milo, R., N. Kashtan, S. Itzkovitz, M. E. J. Newman, U. Alon. 2003. On the uniform generation of random

graphs with prescribed degree sequence. ArXiv Condensed Matter e-prints .

Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon. 2002. Network motifs: Simple

building blocks of complex networks. Science 298 824–827.

Mishra, N., D. Ron, R. Swaminathan. 2004. A new conceptual clustering framework. Machine Learning

Journal 56 115–151.

Mitzenmacher, M. 2004. A brief history of generative models for power law and lognormal distributions.

Internet Mathematics 1 226–251.

Nadler, B., S. Lafon, R. Coifman, I. Kevrekidis. 2005. Diffusion maps, spectral clustering and eigenfunc-

tions of fokker-planck operators. Advances in Neural Information Processing Systems, vol. 18.

Newman, M. E. J. 2001. The structure of scientific collaboration networks. Proceedings of the National

Academy of Sciences 98 404—409.

Page 28

DRAFT

DO NOT DISTRIBUTE

Newman, M. E. J. 2003a. Mixing patterns in networks. Physics Reviews E 67.

Newman, M. E. J. 2003b. The structure and function of complex networks. SIAM Review 45 167–256.

Newman, M. E. J. 2004a. Analysis of weighted networks. Physics Reviews E 70.

Newman, M. E. J. 2004b. Detecting community structure in networks. European Physics Journal B 38

321–330.

Newman, M. E. J. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46 323–

351.

Newman, M. E. J., J. Park. 2003. Why social networks are different from other types of networks. Physics

Reviews E 68.

Newman, M. E. J., D. J. Watts, S. H. Strogatz. 2002. Random graph models of social networks. Proceedings

of the National Academy of Sciences 99 2566—2572.

Ng, A. Y., A. X. Zheng, M. I. Jordan. 2001. Link analysis, eigenvectors, and stabiliy. Proceedings of the

International Joint Conference on Artificial Intelligence, vol. 17.

Nowicki, K., T. A. B. Snijders. 2001. Estimation and prediction for stochastic blockstructures. Journal of

the American Statistical Association 96 1077–1087.

Papadimitriu, C., E. Servan-Schreiber. 1999. The origins of the deadline: Optimizing communication in

organizations. Complexity in Economics.

Park, J., M. E. J. Newman. 2004. Solution of the 2-star model of a network. Physics Reviews E 70.

Park, J., M. E. J. Newman. 2005. Solution for the properties of a clustered network. Physics Reviews E 72.

Piatetsky-Shapiro, G. 2005. Interview with usama fayyad, yahoo chief data officer. ACM SIGKDD Explo-

rations 7 84–90.

Porter, M. A., P. J. Mucha, M. E. J. Newman, C. M. Warmbrand. 2005. A network analysis of committees

in the united states house of representatives. Proceedings of the National Academy of Sciences 102 7057–

7062.

Reddy, R. 1996. To dream the possible dream. Communications of the ACM 39 105–112.

Reddy, R. 2003. Three open problems in AI. Journal of the ACM 50 83–86.

Roweis, S. T., L. K. Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science

22 2323–2326.

Sales-Pardo, M., R. Guimera, A. A. Moreira, J. Widom, L. A. N. Amaral. 2005. Mesoscopic modeling for

nucleic acid chain dynamics. Physics Reviews E 71 1–13.

Saul, L. 2005. Spectral methods for dimensionality reduction. Advances in Neural Information Processing

Systems. Tutorial.

Segal, E., D. Pe’er, A. Regev, D. Koller, N. Friedman. 2005. Learning module networks. Journal of Machine

Learning Research 6 503–556.

Page 29

DRAFT

DO NOT DISTRIBUTE

Senator, T. E. 2005. Link mining applications: Progress and challenges. ACM SIGKDD Explorations 7

76–83.

Shalizi, C. R. 2003. Optimal nonlinear prediction of random fields on networks. Discrete Mathematics and

Theoretical Computer Science AB(DMCS) 11–30.

Shen-Orr, S., R. Milo, S. Mangan, U. Alon. 2002. Network motifs in the transcriptional regulation network

of Escherichia coli. Nature Genetics 31 64–68.

Simon, H. A. 1955. On a class of skew distribution functions. Biometrika 42 425–440.

Snijders, T. A. B., K. Nowicki. 1997. Estimation and prediction for stochastic blockmodels for graphs with

latent block structure. Journal of Classification 14 75–100.

Snijders, T. A. B., P. E. Pattison, G. L. Robins, M. S. Handcock. 2004. New specifications for exponential

random graph models. Sociological Methodology .

Solan, Z., D. Horn, E. Ruppin, S. Edelman. 2005. Unsupervised learning of natural languages. Proceedings

of the National Academy of Sciences 102 11629–11634.

Sporns, O., D. R. Chialvo, M. Kaiser, C. C. Higetag. 2004. Organization, development and function of

complex brain networks. Trends in Cognitive Science 8 418–425.

Steyvers, M., J. B. Tenenbaum. 2005. The large-scale structure of semantic networks: Statistical analyses

and a model of semantic growth. Cognitive Science 29.

Stumpf, M. P. H., C. Wiuf, R. M. May. 2005. Subnets of scale-free networks are not scale-free: Sampling

properties of networks. Proceedings of the National Academy of Sciences 102 4221–4224.

Sun, J., S. Boyd, L. Xiao, P. Diaconis. 2006. The fastest mixing markov process on a graph and a connection

to a maximum variance unfolding problem. SIAM Review Forthcoming.

Tenenbaum, J. B. 1999. A bayesian framework for concept learning. Ph.D. thesis, Massachussetts Institute

of Technology.

Tenenbaum, J. B., V. de Silva, J. C. Langford. 2000. A global geometric framework for nonlinear dimen-

sionality reduction. Science 290 2319–2323.

Teyssier, M., D. Koller. 2005. Ordering-based search: A simple and effective algorithm for learning bayesian

networks. Uncertainty in Artificial Intelligence, vol. 21. 584–590.

Van Zandt, T. 1997. Decentralized information processing in the theory of organizations. M. Sertel, ed.,

Contemporary Economic Development Reviewed, vol. 4: The Enterprise and its Environment. MacMillan

Press Ltd.

Vaszquez, A., R. Dobrin, D. Sergi, J.-P. Eckmann, Z. N. Oltvai, A.-L. Barabasi. 2004. The topological

relationship between the large-scale attributes and local interaction patterns of complex networks. Pro-

ceedings of the National Academy of Sciences 101 17940—17945.

Wang, X., N. Mohanty, A. K. McCallum. 2005. Group and topic discovery from relations and text. Advances

in Neural Information Processing Systems, vol. 18.

Page 30

DRAFT

DO NOT DISTRIBUTE

Wang, Y. J., G. Y. Wong. 1987. Stochastic blockmodels for directed graphs. Journal of the American

Statistical Association 82 8–19.

Wasserman, S. 1980. Analyzing social networks as stochastic processes. Journal of the American Statistical

Association 75 280—294.

Wasserman, S., C. J. Anderson. 1987. Stochastic a posteriori blockmodels: Construction and assessment.

Social Networks 9 1–36.

Wasserman, S., K. Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge University

Press.

Wasserman, S., P. Pattison. 1996. Logit models and logistic regression for social networks: I. an introduction

to markov graphs and p

∗

. Psychometrika 61 401–425.

Watts, D. J. 2002. A simple model of global cascades on random networks. Proceedings of the National

Academy of Sciences 99 5766—5771.

Watts, D. J., S. H. Strogatz. 1998. Collective dynamics of “small-world” networks. Nature 393 440–442.

Weinberger, K., L. Saul. 2004. Unsupervised learning of image manufolds by semidefinite programming.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Weinberger, K., F. Sha, L. Saul. 2004. Learning a kernel matrix for nonlinear dimensionality reduction.

International Conference on Machine Learning.

Wu, F., B. A. Huberman. 2005. Social structure and opinion formation. Online Manuscript.

Xing, E. P. 2005. Dynamic non-parametric Bayesian models. Tech. rep., School of Computer Science,

Canregie Mellon University.

Xing, E. P., M. I Jordan. 2003. On semidefinite relaxation for normalized k-cut and connections to spectral

clustering. Tech. Rep. CSD-03-1265, Division of Computer Science, University of California at Berkeley.

Zheng, T., M. Salganik, A. Gelman. 2005. How many people do you know in prison? Journal of the

American Statistical Association Forthcoming.