*To receive email
reminders about upcoming pattern theory
seminars, send an email to:
LISTSERV@LISTSERV.BROWN.EDU
with the following line in the body of the
email:
SUBSCRIBE PTGSEMINARL
Fall 2017
September
20, 2017
121pm, 182 George St, Room 110
Graphbased Bayesian learning: continuum limits
and algorithms
Daniel SanzAlonso
Division of Applied Mathematics
Brown University
The principled
learning of functions from data is at the core of
statistics, machine learning and artificial
intelligence. The aim of this talk is to present
some new theoretical and methodological
developments concerning the graphbased, Bayesian
approach to semisupervised learning. I will show
suitable scalings of graph parameters that
provably lead to robust Bayesian solutions in the
limit of large number of unlabeled data. The
analysis relies on a careful choice of topology
and in the study of the the spectrum of graph
Laplacians. Besides guaranteeing the consistency
of graphbased methods, our theory explains the
robustness of discretized function space MCMC
methods in semisupervised learning settings. This
is joint work with Nicolas Garcia Trillos, Zachary
Kaplan, and Thabo Samakhoana.
October
11, 2017
121pm, 182 George St, Room 110
Multireference Alignment with
Nonperiodic Distribution is Easier
Nir Sharon
Program of Applied and Computational Mathematics
Princeton University
Multireference alignment is the
problem of estimating a signal from its circularly
translated copies in the presence of noise.
Recently, It was shown that if the translations
are drawn from the uniform distribution, the
number of samples needed is proportional to
1/SNR^3. We prove that for any nonperiodic
distribution, the sample complexity is actually
1/SNR^2. This rate is optimal and can be achieved
by a simple spectral algorithm. We further propose
and analyze two additional algorithms based on
nonconvex optimization and semidefinite
programming.
October
25, 2017
121pm, 182 George St, Room 110
Feature Driven Exploration of Images
Using Topology
Attila Gyulassy
Scientific Computing and Imaging Institute
University of Utah
Two  and threedimensional images
arise in a multitude of disciplines, from computer
simulations to experimental sources such as
microscopy. A key step to scientific discovery is
extracting semantic information from image data.
This talk presents practical application of Morse
theory to extract and reason with features
intrinsic to images. We will discuss both
practical algorithms for computing the MorseSmale
complex as well as realworld applications drawing
on imaged and simulated examples from material
science, basic energy research, geologic
exploration, and neuroscience.
Nov
1, 2017
121pm, 182 George St, Room 110
Sifting through Measures on Networks: From a
Theoretical Framework to an Empirical Guide
Tina EliassiRad
Northeastern University
In this talk, I will discuss two
problems on network data. (1) Measuring
tiestrength: Given a set of people and a set of
events attended by them, how should we measure
connectedness or tie strength between each pair of
persons? The underlying assumption is that
attendance at mutual events produces an implicit
social network between people. I will describe an
axiomatic solution to this problem. (2) Measuring
similarity between networks: Given two networks
(without known nodecorrespondences), how should
we measure similarity between them? This problem
occurs frequently in many realworld applications
such as transfer learning, reidentification, and
change detection. I will present an empirical
guide on how to select a networksimilarity
method, and discuss some promising results based
on topological data analysis.
Nov
8, 2017
121pm, 182 George St, Room 110
Toward perceptually consistent stereo
Todd Zickler
Harvard University
There are two sources of shape information
in a pair of binocular stereo images. One is
the correlation (matching) signal from
surfaces that are visible to both cameras.
The other is the decorrelation
(antimatching) signal from regions that are
visible to one camera but occluded in
the other. Vision science has
repeatedly shown that both types of
information are used in the
visual cortex, and that people can
perceive depth even when correlation cues
are absent or very weak, a capability
that remains absent from most computational
stereo systems.
I will
describe two research directions that
(hopefully) move us toward computational
stereo algorithms that are more
consistent with these perceptual phenomena.
Both directions are based on representing
a depth map as a piecewise smooth function
over the visual field, with a flexible
notion of smoothness. One is a scanline
algorithm that naturally combines correlation
and decorrelation cues, and that matches
human perception on a collection of wellknown
perceptual stimuli. The other is a 2D
algorithm that efficiently exploits piecewise
smoothness, but so far
without incorporating decorrelation cues.
At the end of the talk, I hope to discuss how
insights from these two different
directions might be combined. The talk
is based on two papers:
1. Juialing Wang, Daniel Glasner and Todd
Zickler:
ICCV
2017
http://vision.seas.harvard.edu/stereo/
2. Ayan
Chakrabarti, Ying Xiong, Steven J. Gortler and
Todd Zickler
CVPR 2015. http://ttic.uchicago.edu/~ayanc/consensus/
Nov
29, 2017
121pm, 182 George St, Room 110
Sequential
Bayesian inference via lowdimensional
couplings
Youseff Marzouk
MIT
Integration
against an intractable probability measure is
among the fundamental challenges of Bayesian
inference. A useful approach to this problem seeks
a deterministic coupling of the measure of
interest with a tractable "reference" measure
(e.g., a standard Gaussian). This coupling is
induced by a transport map, and enables direct
simulation from the desired measure simply by
evaluating the transport map at samples from the
reference. Approximate transports can also be used
to "precondition" standard Monte Carlo schemes.
Yet characterizing a suitable transport
mape.g., representing, constructing, and
evaluating itgrows challenging in high
dimensions.
We establish
links between the conditional independence
structure of the target measure and the existence
of certain lowdimensional couplings, induced by
transport maps that are sparse or decomposable.
Our analysis not only facilitates the construction
of couplings in highdimensional settings, but
also suggests new inference methodologies. We will
highlight applications of our theory to nonlinear
statespace models, where sparse and/or
decomposable transports yield new variational
algorithms for filtering, smoothing, and
sequential parameter inference.
This is joint
work with Alessio Spantini and Daniele Bigoni.
Dec
6, 2017
121pm, 182 George St, Room 110
Advanced
Editing, Exploration, and Interaction for Video
James Tompkin
Brown
Video best
represents how we visually perceive, and with
ubiquitous cameras everyone can capture imagery of
our world. However, our tools to edit, explore,
and interact with video often require laborious
work or expert skill, which makes it merely a
consumption medium for most people. With advances
in image understanding through computer graphics,
vision, and interaction, we can change how we
think about video: First, to transform video from
a rigid and inaccessible medium into a malleable
and creative one, so that nonexperts can make
sophisticated contentbased edits. Second, to
transform video from a linear medium viewed
sequentially into novel interactive contentbased
and contextaware visual explorations. Third, to
transform video cameras from passive observers
into essential tools for realtime creation and
interaction with digital imagery.
Spring 2017
February
1, 2017
121pm, 182 George St, Room 110
Nonequilibrium
transitions between metastable patterns in
populations of motile bacteria
Eric VandenEijnden
Professor
Courant Institute of Mathematical Sciences
New York University
Active materials can selforganize in many more
ways than their equilibrium counterparts. For
example, selfpropelled particles whose
velocity decreases with their density can
display motilityinduced phase separation
(MIPS), a phenomenon building on a
positive feedback loop in which patterns emerge
in locations where the particles slow down.
Here, we investigate the effects of intrinsic
fluctuations in the system's dynamics on MIPS,
using a field theoretic description building on
results by Cates and collaborators. We show that
these fluctuations can lead to transitions
between metastable patterns. The pathway and
rate of these transitions is analyzed within the
realm of large deviation theory, and they are
shown to proceed in a very different way than
one would predict from arguments based on
detailedbalance and microscopic
reversibility. Specifically, we show that
these transitions involve fluctuations in
diffiusivity of the bacteria followed by
fluctuations in their population, in a specific
sequence. The method of analysis proposed here,
including its numerical component, can be
used to study noiseinduced nonequilibrium
transitions in a variety of other
nonequilibrium setups, and leads to
predictions that are verifiable experimentally.
February
8, 2017
121pm, 182 George St, Room 110
Deformation
models for image and shape matching
Ronen Basri
Professor
Department of Computer Science and Applied
Mathematics
Weizmann Institute of Science
Modeling deformations is important for various
applications in computer vision, graphics and
geometry processing. In this talk I will
describe our recent progress in modeling
deformations. In particular, I will describe
methods for computing bounded distortion
transformations, locally injective maps whose
differentials' conformal distortion is bounded.
Toward this end, we developed a convex framework
for solving optimization problems over matrices
that involve functionals and constraints
expressed in terms of their extremal singular
values. In addition, I will describe methods for
computing physicallymotivated elastic maps
between shapes. We have applied these methods in
a number of challenging problems, including
feature matching between images related by
nonrigid deformation, nonrigid registration of
shape models, and computing extremal
quasiconformal maps.
March
8, 2017
121pm, 182 George St, Room 110
Tropical
Coordinates on the Space of Persistence
Barcodes
Sara Kalisnik Verovsek
Postdoctoral
Research Associate
Department of Mathematics
Data Science Initiative
Brown University
In the last two decades applied topologists have
developed numerous methods for ‘measuring’ and
building combinatorial representations of the
shape of the data. The most famous example of
the former is persistent homology. This
adaptation of classical homology assigns a
barcode, i.e. a collection of intervals with
endpoints on the real line, to a finite metric
space. Unfortunately, barcodes are not
welladapted for use by practitioners in machine
learning tasks. We can circumvent this problem
by assigning numerical quantities to barcodes
and these outputs can then be used as input to
standard algorithms. I will talk about maxplus
polynomials and tropical rational functions that
can be used as coordinates on the space of
barcodes. All of these are stable with respect
to the standard distance functions (Bottleneck,
Wasserstein) used on the barcode space.
April
19, 2017
121pm, 182 George St, Room 110
Inference
in Dynamical Systems
Sayan Mukherjee
Professor
Department of Statistical Science
Duke University
We consider the asymptotic consistency of
maximum likelihood parameter estimation for
dynamical systems observed with noise. Under
suitable conditions on the dynamical systems and
the observations, we show that maximum
likelihood parameter estimation is consistent.
Furthermore, we show how some wellstudied
properties of dynamical systems imply the
general statistical properties related to
maximum likelihood estimation. Finally, we
exhibit classical families of dynamical systems
for which maximum likelihood estimation is
consistent. Examples include shifts of finite
type with Gibbs measures and Axiom A attractors
with SRB measures. We also relate Bayesian
inference to the thermodynamic formalism in
tracking dynamical systems. We state conditions
for consistency of a Gibbs distribution using
classic ideas from dynamical systems such as
topological entropy and pressure.
April
26, 2017
121pm, 182 George St, Room 110
Numerical
Gaussian Processes (Physics Informed Learning
Machines)
Maziar Raissi
Postdoctoral
Research Associate
Department of Applied Mathematics
Brown University
We introduce the concept of numerical Gaussian
processes, which we define as Gaussian processes
with covariance functions resulting from
temporal discretization of timedependent
partial differential equations. Numerical
Gaussian processes, by construction, are
designed to deal with cases where: (1) all we
observe are noisy data on blackbox initial
conditions, and (2) we are interested in
quantifying the uncertainty associated with such
noisy data in our solutions to timedependent
partial differential equations. Our method
circumvents the need for spatial discretization
of the differential operators by proper
placement of Gaussian process priors. This is an
attempt to construct structured and
dataefficient learning machines, which are
explicitly informed by the underlying physics
that possibly generated the observed data. The
effectiveness of the proposed approach is
demonstrated through several benchmark problems
involving linear and nonlinear timedependent
operators. In all examples, we are able to
recover accurate approximations of the latent
solutions, and consistently propagate
uncertainty, even in cases involving very long
time integration.
May
3, 2017
121pm, 182 George St, Room 110
PCA
from noisy linearly transformed measurements
Amit Singer
Professor
Department of Mathematics
Princeton University
We consider the problem of estimating the
covariance of X from measurements of the form
y_i = A_i*x_i + e_i (for i = 1, . . . , n) where
x_i are i.i.d. unobserved samples of X, A_i are
given linear operators, and e_i represent noise.
Our estimator is constructed efficiently via a
simple linear inversion using conjugate gradient
performed after eigenvalue shrinkage motivated
by the spiked model in high dimensional PCA.
Applications to lowrank matrix completion, 2D
image denoising, and 3D structure classification
in single particle cryoelectron microscopy will
be discussed.
Fall 2016
September
14, 2016
121pm, 182 George St, Room 110
Discovering
the Nature of Nonlinear Relationships
Joshua Vogelstein
Assistant
Professor
Department of Biomedical Engineering
Institute for Computational Medicine
Johns Hopkins University
As data collection is becoming easier, it is
becoming increasingly difficult and important to
answer questions of the form: is one property
(e.g., clouds) related to another (e.g., grass
wetness). Only If we can determine that
these two properties are related toor
statistically dependent onone another does it
make sense to further investigate the nature of
this relationship. Unfortunately, reliably
establishing such a relationship can be
challenging, especially when the properties
themselves are complex and the relationship is
nonlinear. We here describe a procedure, called
Multiscale Generalized Correlation (MGC), that
addresses these challenges. Our key insight is
that if two properties are related, comparisons
between measurements of similar pairs of the
first property (e.g., similarly shaped clouds)
should be correlated with the comparisons
between corresponding pairs of the second
property (grass wetness under those clouds). We
demonstrate the statistical and computational
efficiency of MGC in both simulations and
theory. We then apply it to detect the
presence and nature of the relationships between
brain activity and personality, brain shape and
disorder, and brain connectivity and
creativity. Finally, we demonstrate that
MGC does not suffer from the false positives
that have plagued conventional dependence tests.
Our open source implementation of MGC is
applicable to fundamental questions confronting
science, government, finance, and many other
disciplines.
September
21, 2016
121pm, 182 George St, Room 110
Structurebased
comparisons for sequential data
Katherine Kinnaird
Postdoctoral
Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University
We present aligned hierarchies, a
lowdimensional representation for sequential
data streams. The aligned hierarchies encode all
hierarchical decompositions of repeated elements
from a highdimensional and noisy sequential
data stream in one object. These aligned
hierarchies can be embedded into a
classification space with a natural notion of
distance. We motivate our discussion through the
lens of Music Information Retrieval (MIR),
constructing aligned hierarchies by finding,
encoding, and synthesizing all repeated
structure present in a song. For a data set of
digitized scores, we conducted experiments
addressing the fingerprint task, a song
comparison task in MIR, that achieved perfect
precisionrecall values and provide a proof of
concept for the aligned hierarchies.
We also introduce aligned subhierarchies and
aligned subdecompositions. Both derived from
the aligned hierarchies, these structure based
representations for songs can be embedded into
classification spaces and can address additional
MIR tasks. We will compare properties of the
aligned hierarchies, aligned subhierarchies,
and the aligned subdecompositions.
September
28, 2016
121pm, 182 George St, Room 110
Title:
New Perspectives on Importance Sampling
Daniel Sanz Alonso
Postdoctoral
Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University
Importance sampling is a building block of many
algorithms in computational statistics, perhaps
most notably particle filters. It is the
importance sampling step that often limits the
accuracy of these algorithms. In this talk I
will introduce a new way of understanding
importance sampling based on information theory.
I will argue that the fundamental problem facing
algorithms based on importance sampling can be
understood in terms of the distance between
certain measures. The results give new
understanding on the potential use of importance
sampling and particle filters in high (possibly
infinite) dimensional spaces.
October
5, 2016
121pm, 182 George St, Room 110
Robust
Bayesian inference via coarsening
Jeffrey Miller
Assistant
Professor
Department of Biostatistics
Harvard University
The standard approach to Bayesian inference is
based on the assumption that the distribution of
the data belongs to the chosen model
class. However, even a small violation of
this assumption can have a large impact on the
outcome of a Bayesian procedure, particularly
when the data set is large. We introduce a
simple, coherent approach to Bayesian inference
that improves robustness to small departures
from the model: rather than conditioning on the
observed data exactly, one conditions on the
event that the model generates data close to the
observed data, with respect to a given
statistical distance. When closeness is
defined in terms of relative entropy, the
resulting "coarsened posterior" can be
approximated by simply raising the likelihood to
a certain fractional power, making the method
computationally efficient and easy to implement
in practice. We illustrate with real and
simulated data, and provide theoretical results.
November
2, 2016
121pm, 182 George St, Room 110
Robust
Bayesian inference via coarsening
Tamara Broderick
Assistant
Professor
Department of Electrical Engineering and
Computer Science
Massachusetts Institute of Technology
In
Bayesian analysis, the posterior follows from
the data and a choice of a prior and a
likelihood. These choices may be somewhat
subjective and reasonably vary over some range.
Thus, we wish to measure the sensitivity of
posterior estimates to variation in these
choices. While the field of robust Bayes has
been formed to address this problem, its tools
are not commonly used in practiceat least in
part due to the difficulty of calculating
robustness measures from MCMC draws. We
demonstrate that, by contrast to MCMC,
variational Bayes (VB) techniques are readily
amenable to robustness analysis. Since VB casts
posterior inference as an optimization problem,
its methodology is built on the ability to
calculate derivatives of posterior quantities
with respect to model parameters. We use this
insight to develop local prior robustness
measures for meanfield variational Bayes
(MFVB), a particularly popular form of VB due to
its fast runtime on large data sets. A potential
problem with MFVB is that it has a wellknown
major failing: it can severely underestimate
uncertainty and provides no information about
covariance. We generalize linear response
methods from statistical physics to deliver
accurate uncertainty estimates for MFVBboth
for individual variables and coherently across
variables. We call our method linear response
variational Bayes (LRVB).
November
9, 2016
121pm, 182 George St, Room 110
Joint Scientific Computing and Pattern
Theory Seminar
Solution
uncertainty quantification for differential
equations
Oksana Chkrebtii
Assistant
Professor
Department of Statistics
Ohio State University
When
models are defined implicitly by systems of
differential equations without a closed form
solution, small local errors in
finitedimensional solution approximations can
propagate into large deviations from the true
underlying state trajectory. Inference for such
models relies on a likelihood approximation
constructed around a numerical solution, which
underestimates posterior uncertainty. This talk
will introduce and discuss progress in a new
formalism for modeling and propagating
discretization uncertainty through the Bayesian
inferential framework, allowing exact inference
and uncertainty quantification for discretized
differential equation models.
November
16, 2016
121pm, 182 George St, Room 110
SingleChannel MultiSpeaker Separation using
Deep Clustering
John Hershey
Senior
Principal Research Scientist
Mitsubishi Electric Research Laboratories (MERL)
Cambridge, Massachusetts
The human
auditory system gives us the extraordinary
ability to converse in the midst of a noisy
throng of party goers. Solving this socalled
cocktail party problem has proven extremely
challenging for computers, and separating and
recognizing speech in such conditions has been
the holy grail of speech processing for more
than 50 years. Deep clustering is a recently
introduced deep learning architecture that uses
discriminatively trained embeddings as the basis
for clustering, producing unprecedented
speakerindependent singlechannel separation
performance on twospeaker and threespeaker
mixtures. In this framework, a neural
network is trained to assign an embedding vector
to each element of a multidimensional signal,
such that clustering the embeddings yields a
desired segmentation of the signal. In the
cocktailparty problem, the embeddings are
assigned to each timefrequency (TF) index of
the shorttime Fourier transform (STFT) of the
mixture of speech signals. Clustering these
embeddings yields an assignment of each TF bin
to one of the inferred sources. These
assignments are used as a masking function to
extract the dominant parts of each
source. This method has produced
remarkable performance, reaching levels of
improvement  over 10 dB SNR gain  that were
previously unobtainable even in simpler speech
enhancement tasks. Extensions to the model
allow for endtoend training to optimize the
whole architecture for best signal
quality. We also evaluate our method using
automatic speech recognition (ASR), and show
that it can reduce the word error rate (WER)
from 89.1% down to 30.8%, representing important
progress towards solving the cocktail party
problem.
November
30, 2016
121pm, 182 George St, Room 110
How
highorder image statistics shape cortical
visual processing
Jonathan Victor
Professor
Brain and Mind Research Institute
Department of Neurology
Cornell University
Several
decades of work have suggested that Barlow's
principle of efficient coding is a powerful
framework for understanding retinal design
principles. Whether a similar notion extends to
cortical visual processing is less clear, as
there is no "bottleneck" comparable to the optic
nerve, and much redundancy has already been
removed. Here, we present convergent
psychophysical and physiological evidence
that regularities of highorder image
statistics are indeed exploited by central
visual processing, and at a surprising level of
detail.
The starting point is a study of natural image
statistics (Tkacic et al., 2010), in which
we showed that highorder correlations in
certain specific spatial configurations
are informative, while highorder correlations
in other spatial configurations are
not: they can be accurately guessed from
lowerorder ones. We then construct
artificial images (visual textures) composed
either of informative or uninformative
correlations. We find that informative
highorder correlations are visually salient,
while the uninformative correlations are nearly
imperceptible. Physiological studies in macaque
visual cortex identify the locus of the
underlying computations. First, neuronal
responses in macaque V1 and V2 mirror the
psychophysical findings, in that many neurons
respond differentially to the informative
statistics, while few respond to the
uninformative ones. Moreover, the differential
responses largely arise in the supragranular
layers, indicating that the computations are
the result of intracortical processing.
We then consider low and highorder local image
statistics together, and apply a
dimensionreduction (binarization) to cast them
into a 10dimensional space. We determine the
perceptual isodiscrimination surfaces
within this space. These are
wellapproximated by ellipsoids, and the
principal axes of the ellipsoids correspond to
the distribution of the local statistics in
natural images. Interestingly, this
correspondence differs in specific ways from the
predictions of a model that implements efficient
coding in an unrestricted manner. These
deviations provide insights into the strategies
that underlie the representation of image
statistics..
Spring 2016
February
3, 2016
121pm, 182 George St, Room 110
Constructing
representations using Bayesian nonparametrics
and connections between human knowledge,
optimal foraging, and random walks on graphs
Joseph Austerweil
Assistant
Professor
Department of Cognitive, Linguistic, and
Psychological Sciences
Brown University
This talk has two parts (how much detail we go
into each will be guided by audience interest).
In the first part of the talk, I outline a
statistical framework for constructing
representations in a humanlike manner. People
still outperform computational methods on a
diverse set of problems, ranging from language
learning to recognizing objects in a scene. To
understand human success on these problems,
cognitive scientists appeal to representation as
a key explanatory device. From a statistical
perspective, representations can be interpreted
as structure within a probability distribution,
which provide biases to prevent overfitting.
Puzzlingly, people are remarkably flexible,
changing their representation of the same data
depending a number of different factors. I will
describe a statistical framework for this
intriguing phenomenon, which uses Bayesian
nonparametric processes.
In the second part of the talk, I describe
recent work relating how optimal animals forage
for food in a patchy environment to how people
search semantic memory. I will discuss how
representing semantic memory as a graph enables
random walks to mimic optimal foraging. I will
conclude with some recent (and ongoing) work
that exploits this relation to perform Bayesian
inference for a graph from the first hitting
times of multiple walks on that graph.
February
17, 2016
121pm, 182 George St, Room 110
New
applications and algorithms for submodular
probabilistic models
Stefanie Jegelka
Assistant
Professor
Department of Electrical Engineering and
Computer Science
Massachusetts Institute of Technology
Many realworld inference problems are, at their
core, subset selection problems. Probabilistic
models for such scenarios rely on having
sufficiently accurate yet tractable
distributions over discrete sets. We focus on
subfamilies of such distributions whose special
mathematical properties are the basis for fast
algorithms. As a specific example, Determinantal
Point Processes (DPPs) have recently become
popular in machine learning, as elegant and
tractable probabilistic models of diversity. We
explore new applications of DPPs for variational
inference over combinatorial objects, such as
coupled cascades in a collection of networks,
where we are able to leverage combinatorial and
convex structure in the problem.
While sampling from DPPs is possible in
polynomial time, the associated algorithm is not
practical for large data. In the second part of
the talk, I will outline ideas for faster
sampling that build on new insights for
algorithms that compute bilinear inverse forms.
These results have applications beyond DPPs,
including sensing with Gaussian Processes and
submodular maximization.
This is joint work with Chengtao Li, Josip
Djolonga, Suvrit Sra and Andreas Krause.
March
2, 2016
121pm, 182 George St, Room 110
The
shape space defined by the GromovWasserstein
distance
Facundo Memoli
Assistant
Professor
Department of Mathematics and Department of
Computer Science and Engineering The Ohio State
University
The GromovWasserstein distance a variant of
the GromovHausdorff distance based on ideas
from mass transport provides an intrinsic
metric on the collection of all metric measure
spaces. I will give an overview of its
construction, main properties, lower bounds, and
their computation.
March
16, 2016
121pm, 182 George St, Room 110
A
simple network model for a variety of Delay
Match to Sample tasks
Yali Amit
Professor
Department of Statistics and Department of
Computer Science
University of Chicago
Delay match to sample experiments (DMS) have
inspired much of the modeling work on attractor
neural networks. The basic experiment involves
showing a target image, removing it, and after a
delay showing a cue image: either the original
image or a different one. The monkey needs to
indicate if the cue is the same or different
than the target. Electrophysiological recordings
have shown that if the target is a learned one
(has been observed multiple times) neurons
selective for it maintain activity during the
delay between target and cue presentation. This
persistent activity is hypothesized to represent
'working' or 'short term' memory. The attractor
network model posits that stimulation with
learned patterns leads to sustained activity due
to the learned recurrent synaptic connections in
the network. There are a number of variations on
the basic DMS paradigm involving distractors in
between target and cue, or repetition detection
experiments where a sequence of images is shown
and one of them chosen at random is repeated.
These different experiments raise several
questions that are rarely addressed in the
literature. With distractors, how does the
network ensure the target sample pattern stays
in working memory? How does the network figure
out whether the cue matches the target? How does
the network avoid distractor repetitions?
Additional interesting phenomena have been
observed in more recent experiments on
repetition detection, including better
performance with novel patterns than with
learned patterns.
I will present a parsimonious network model of
binary neurons and binary synapses and show how
all these phenomena can be handled within this
framework, using simple adjustments of certain
global parameters such as inhibition, noise
level and depression rate. Recent experiments
also show that average responses to novel
stimuli in IT are higher than to learned
stimuli. I will show what adjustment needs to be
made to accommodate this phenomenon. Time
permitting I will discuss how these issues can
be reconciled in a two layer network that can
also perform classification on real data.
March
23, 2016
121pm, 182 George St, Room 110
The
Dynamics of the Unconscious Brain Under
General Anesthesia
Emery Brown
Warren
M. Zapol Professor of Anaesthesia
Massachusetts General Hospital and Harvard
Medical School
Edward Hood Taplin Professor of Medical
Engineering & Computational Neuroscience
Massachusetts Institute of Technology
General anesthesia is a druginduced, reversible
condition comprised of five behavioral states:
unconsciousness, amnesia (loss of memory),
analgesia (loss of pain sensation), akinesia
(immobility), and hemodynamic stability with
control of the stress response. Our work shows
that a primary mechanism through which
anesthetics create these altered states of
arousal is by initiating and maintaining highly
structured oscillations. We present findings
from our human studies of general anesthesia
using highdensity EEG recordings and
intracranial recordings which have allowed us to
give a detailed characterization of the
neurophysiology of loss and recovery of
consciousness due to propofol. We show how these
dynamics change with other anesthetics and with
age. We present a neurometabolic model of burst
suppression, the profound state of brain
inactivation seen in deep states of general
anesthesia. We use our characterization of burst
suppression to implement a closedloop
anesthesia delivery system for control of a
medicallyinduced coma. Finally, we demonstrate
that the state of general anesthesia can be
rapidly reversed by activating specific brain
circuits. The success of our research has
depended critically on tight coupling of
experiments, signal processing research and
mathematical modeling.
April
20, 2016
121pm, 182 George St, Room 110
Learning
Causal Graphical Models of LargeScale Systems
David Jensen
Professor
of Computer Science
Director of the Knowledge Discovery Laboratory
University of Massachusetts Amherst
Effective methods for inferring causal
dependence from observational data have been
developed within both computer science and
quantitative social science. Methods in computer
science have focused on the correspondence
between casual graphical models and observed
patterns of statistical association. Methods in
social science have focused on templates for
causal inference often called quasiexperimental
designs, including instrumental variables,
propensity scores matching, regression
discontinuinty designs, and interrupted
timeseries designs. Recent work has begun to
unify these methods using the framework of
causal graphical models, but barriers remain
because the formal framework of causal graphical
models has been insufficiently expressive to
represent some quasiexperimental designs. In
this talk, I will introduce many of the known
experimental and quasiexperimental designs in
the language of directed graphical models, and I
will present very recent work on defining
additional designs in terms of classes of
graphical models that can represent relational
and temporal dependence. Finally, I will present
two novel designs that have resulted from our
work on causal inference in relational data.
May
18, 2016
121pm, 182 George St, Room 110
Learning
Lattice Operators used in Computer Vision
Tasks
Junior Barrera
Professor
of Computer Science
University of Sao Paulo, Brazil
The
transformation of images by lattice operators
has some interesting properties: they keep in
the discrete and quantized image domain;
theoretical results guarantee that any
transformation can be implemented by a lattice
operator; lattice operators can be implemented
through simple elementary lattice operators;
lattice operators can be learned through
discrete machine learning techniques. In this
lecture, we present the general algebraic
representation of lattice operator and its
application in the learning of these operators.
Several approaches for efficient learning (i.e.,
from a relatively small sample in a reasonable
processing time) of complex operators are
presented. Between them is the Ucurve problem,
which is a new combinatorial optimization
problem useful for approaching the classical
feature selection problem. Another remarkable
result is the multiresolution learning, witch,
for a fixed sample, avoids over fitting and
gives decreasing estimated errors for an
increasing number of parameters. The techniques
are illustrated by the learning of several kinds
of binary and gray scale image transformations:
noise reduction; interpolation; enhancement;
segmentation; texture recognition; shape
recognition; etc. All the results
presented refer to already published papers, but
some open problems are pointed and discussed.
Fall 2015
September
9, 2015
121pm, 182 George St, Room 110
Random
matrix theory based analysis of the
correlation structure of protein sequences
Lucy Colwell
Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK
Extracting
interesting biological information from protein
sequences is a grand challenge, with the pace of
sequencing increasing by the day. There are many
problems in biology where correlations between
protein sequences could lead to important
information and new tools; my research program
aims to find new ways of using these
correlations to make biological discoveries.
Recently it has been shown by us and others that
the correlation structure of large protein
sequence alignments contains sufficient
information to accurately predict protein 3D
structure using approximate graphical modeling
and probabilistic inference. The methodology
requires inferring a large number of parameter
values from a dataset that is highly
undersampled, including samples with varying
degrees of correlation to one another, violating
the assumptions made by the inference methods.
For this approach to be useful more generally 
both for protein structure prediction and for
solving other biological problems  it is
essential that we understand how prediction
quality depends on the input data available for
the protein family of interest. This is because
those families for which no homologous crystal
structure has been solved typically contain few
sequences.
In this talk we first briefly compare the
different inference procedures on both exact
data simulated from models, and protein sequence
data. We then identify a phenomena for protein
sequence data in which iterating the model
inference procedure allows the inferred model to
converge to a more accurate solution, and show
how this can be used to accurately predict
interacting protein pairs from sequence data
alone. This convergence suggests that there is
scope to improve the accuracy of model
inference, for example by 'cleaning' the sample
correlation matrix. The standard random matrix
theory approach identifies those few eigenvalues
associated with a low rank signal by comparing
the spectrum of the sample covariance matrix to
the MarcenkoPastur distribution. In the context
of protein sequence data the extensive
correlations between samples mean that this
approach will not work, but if these
correlations are known (via a phylogeny) I will
show how the Marcenko Pastur equation can be
solved to yield the expected empirical spectral
distribution caused by phylogeny alone.
September
23, 2015
121pm, 182 George St, Room 110
Continuum
limit of total variation on point clouds
Nicolas GarciaTrillos
Prager Assistant Professor
Division of Applied Mathematics
Brown University
We consider point
clouds obtained as random samples of a measure
on a Euclidean domain. A graph representing the
point cloud is obtained by assigning weights to
edges based on the distance between the points
they connect. We study when is the cut
capacity, and more generally total variation, on
these graphs a good approximation of the
perimeter (total variation) in the continuum
setting. We address this question in the setting
of $\Gamma$convergence. Applications to the
study of consistency of cut based clustering
procedures will be discussed.
October
7, 2015
121pm, 182 George St, Room 110
Local
Shape from Shading with a Generic Constraint
Ben Kunsberg
Prager Assistant Professor
Division of Applied Mathematics
Brown University
Humans have a
remarkable ability to infer shape from shading
(SFS) information. In computer vision this is
often formulated with a Lambertian reflectance
function, but it remains underposed and
incompletely solved. Abstractly, the
intensity in an image is a single valued
function and the goal is to uncover the vector
valued normal function. This illposedness has
resulted in many proposed techniques that are
either regularizations or propagations from
known values. Our goal is to understand,
mathematically and computationally, how we solve
this problem.
First, it has been shown psychophysically that
our perception (via gauge figure estimates) is
remarkably accurate even when the boundary is
masked. Thus classical propagating
approaches requiring a known values along a
boundary, such as that of characteristic curves
or fast marching methods, are unlikely to
model the visual system's solution.
An alternative approach requires regularization
priors (in a Bayesian framework) or energy terms
(in a variational framework). However,
many of the proposed priors are adhoc and
chosen by researchers to optimize performance
for a particular test dataset. It is hard
to conclude (from solely performance metrics)
whether these priors are useful or accurate,
e.g. good results are functions of these
priors, resolution, the optimization techniques,
the test set, and so on.
In this talk, we describe a different approach.
We consider the SFS problem on image patches
modeled as Taylor polynomials of any order and
seek to recover a solution for that patch.
We build a bootstrapping tensor framework that
allows us to relate a smooth image patch to all
of the polynomial surface solutions (under any
light source). We then use a generic
constraint on the light source to restrict these
solutions to a 2D subspace, plus an unknown
rotation matrix. We then investigate
several special cases where the ambiguity
reduces and the solution can be anchored.
Interestingly, these anchor solutions relate to
those situations in which human performance is
also veridical.
October
14, 2015
121pm, 182 George St, Room 110
Importance sampling
in largescale machine learning problems: why
it works and how it can help
Rachel Ward
Assistant Professor
Department of Mathematics
University of Texas at Austin
A recent trend in signal
processing and machine learning research is
that exact reconstruction is achievable from
highly subsampled data by passing to
nonlinear, sparsityinducing, reconstruction
methods such as l1 minimization. Such
guarantees often require strong structural
conditions on the data in addition to
sparsity, such as incoherence, which render
the theory unusable on problems of practical
importance. Here, we show that many of
these strong assumptions are tied to i.i.d
uniform sampling, and can be dropped by
allowing weighted, or importance
sampling. First, we explain why
importance sampling works in this context: it
aims to make the inverse problem as
wellconditioned as possible given a fixed
sample budget. We then discuss several
problem domains where importance sampling
strategies can be derived explicitly, and
outperform stateoftheart sampling
strategies used in practice: medical imaging,
collaborative filtering, uncertainty
quantification, and stochastic gradient
methods. Along the way, we derive
results at the intersection of applied
harmonic analysis and random matrix theory
that are of independent interest.
October
28, 2015
121pm, 182 George St, Room 110
A
stochastic model for Gompertzian growth
Benar Svaiter
Professor
IMPA, Brazil
The Gompertz curve has been
successfully used for modeling mortality of an
aging population since its creation in the
1820's. In the 1960's a time reversed version
of this curve was used by Laird to model
tumoral and somatic growth. Since them, this
is the standard model for tumoral and somatic
growth and has been used also for modeling
regeneration. Why is it that a model for
mortality has been so apt to model growth? We
propose a stochastic model for cell divisions
based on recent findings on DNA structure and
show that Gompertzian growth is the
thermodynamic limit of this model after a
suitable renormalization.
November
18, 2015
121pm, 182 George St, Room 110
Big
data, Google and disease detection: the
statistical story
Samuel Kou
Professor
Department of Statistics
Harvard University
Big data collected
from the internet have generated significant
interest in not only the academic community but
also industry and government agencies. They
bring great potential in tracking and predicting
massive social activities. We focus on tracking
disease epidemics in this talk. We will discuss
the applications, in particular Google Flu
Trends, some of the fallacy and the statistical
implications. We will propose a new model that
utilizes publicly available online data to
estimate disease epidemics. Our model
outperforms all previous realtime tracking
models for influenza epidemics at the national
level of the US. We will also draw some lessons
for big data applications.
Spring 2015
February
4, 2015
121pm, 182 George St, Room 110
Computational reconstruction and modeling of
multicellular dynamics from 3D+time in vivo
imaging of animal early embryogenesis. Extension
to Artificial Life.
Rene Doursat
Research Scientist and Lecturer
(a) Complex Systems Institute, Paris IledeFrance
(ISCPIF), Paris, France
(b) BioEmergences Lab, CNRS (USR 3695),
GifsurYvette (Paris), France
(c) Erasmus Mundus Master's Program in Complex
Systems Science, Ecole Polytechnique, Palaiseau
(Paris), France
The BioEmergences platform [1,2], designed at
Nadine Peyrieras's lab (b), provides automated
analysis and reconstruction of collective cell
movements based on timelapse microscopy of
organism development. It offers biologists
advanced software tools capable of handling large
amounts of 4D data through a workflow of image
processing algorithms. Raw voxelbased movies of
embryos (zebrafish, sea urchin), containing
fluorescent proteins to highlight cell membranes
and nuclei, are filtered by an edgepreserving
smoothing method, then cell positions are
extracted from the local maxima of these images
and passed on to shape segmentation and
celltracking modules. The output is a cell
lineage tree annotated with various quantitative
measurements. In parallel to this data
reconstruction effort, the MecaGen platform [3],
directed by myself, supports agentbased modeling
and simulation of morphogenesis. Centered on the
physicochemical coupling of cell mechanics with
gene expression and molecular signaling, embryonic
development is viewed here a selforganized
phenomenon emerging from a myriad of cells via
their genetically regulated and regulating
behavior. Cells' mechanical properties (division,
adhesion, motility) are closely correlated with
their spatial location and temporal state of
genetic and molecular dynamics (protein and ligand
concentrations), and affect each other
concurrently. MecaGen is illustrated on
morphogenetic episodes occuring during zebrafish
and sea urchin early development. Exploration of
parameter space is supported by experimental data
from BioEmergences, which allows measuring the
"fitness" of the virtual embryo and validating
hypotheses. Finally, I will also briefly introduce
Morphogenetic Engineering [4], an "Artificial
Life" spinoff field that I founded, which takes
its inspiration from biological development to
create programmable and reproducible robotic,
software or network architectures by decentralized
selfassembly of elementary agents [5].
Selected References
[1] http://www2.die.upm.es/im/papers/Science2010.pdf
[2] http://doursat.free.fr/docs/Castro_et_al_2014_atlas_PLoS.pdf
[3] http://doursat.free.fr/docs/Delile_Doursat_Peyrieras_2013_MecaGen_chapter.pdf
[4] http://doursat.free.fr/docs/Doursat_Sayama_Michel_2013_MorphEng_NACO.pdf
[5] http://doursat.free.fr/docs/Doursat_Sanchez_2014_mapdevo_SoRo.pdf
.
February
25, 2015
121pm, 182 George St, Room 110
Scaling
and Generalizing Variational Inference
David Blei
Professor of Statistics and Computer Science
Columbia University
Latent variable models have become a key tool for
the modern statistician, letting us express
complex assumptions about the hidden structures
that underlie our data. Latent variable models
have been successfully applied in numerous fields
including natural language processing, computer
vision, electronic medical records, genetics,
neuroscience, astronomy, political science,
sociology, the digital humanities, and many
others.
The central computational problem in latent
variable modeling is posterior inference, the
problem of approximating the conditional
distribution of the latent variables given the
observations. Posterior inference is central to
both exploratory tasks, where we investigate
hidden structures that underlie our data, and
predictive tasks, where we use the inferred
structures to generalize about future data.
Approximate posterior inference algorithms have
revolutionized Bayesian statistics, revealing its
potential as a usable and generalpurpose language
for data analysis.
Bayesian statistics, however, has not yet reached
this potential. First, statisticians and
scientists regularly encounter massive data sets,
but existing approximate inference algorithms do
not scale well. Second, most approximate inference
algorithms are not generic; each must be adapted
to the specific model at hand. This often requires
significant modelspecific analysis, which
precludes us from easily exploring a variety of
models.
In this talk I will discuss our recent research on
addressing these two limitations. First I will
describe stochastic variational inference, an
approximate inference algorithm for handling
massive data sets. Stochastic inference is easily
applied to a large class of Bayesian models,
including timeseries models, factor models, and
Bayesian nonparametric models. I will demonstrate
its application to probabilistic topic models of
text conditioned on millions of articles.
Stochastic inference opens the door to scalable
Bayesian computation for modern data analysis.
Then I will discuss black box variational
inference. Black box inference is a generic
algorithm for approximating the posterior. We can
easily apply it to many models with little
modelspecific derivation and few restrictions on
their properties. Black box inference performs
better than similarly generic sampling algorithms,
such as MetropolisHastings inside Gibbs, and can
be composed with stochastic inference to handle
massive data. I will demonstrate its use on a
suite of nonconjugate models of longitudinal
healthcare data.
This is joint work based on these two papers:
M. Hoffman, D. Blei, J. Paisley, and C. Wang.
Stochastic variational inference. Journal of
Machine Learning Research, 14:13031347.
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiWangPaisley2013.pdf
R. Ranganath, S. Gerrish, and D. Blei. Black box
variational inference. Artificial Intelligence and
Statistics, 2014.
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiWangPaisley2013.pdf
March
4, 2015
121pm, 182 George St, Room 110
From
Pixels to Local Layers: Exploring Flexible
Representations for Motion Estimation
Deqing Sun
Postdoctoral Research Fellow in Computer Science
Harvard University
We live in a dynamic world where motion is
ubiquitous. To exist in such an environment,
robots and other intelligent agents should have
the ability to perceive and understand motion.
Estimating image motion and segmenting the scenes
into coherently moving regions are two closely
related problems but are often treated separately.
Motion actually provides an important cue to
identify surfaces in a scene, while segmentation
may provide the proper support for motion
estimation. Despite decades of research efforts,
current methods still tend to produce large errors
near motion boundaries and in occlusion regions
and falsely merge foreground objects with the
background.
I will start from a probabilistic layered model
for joint motion estimation and segmentation. We
order each moving object (layer) in depth and
explicitly model the occlusions between layers. We
model the segmentation using thresholded
spatiotemporally coherent support functions and
the motion using globally coherently but locally
flexible priors. Our model enforces that scene
structures (segmentation), instead of motion,
should persist over time. Our method achieves
promising results on both the Middlebury optical
flow benchmark and the MIT layer segmentation
dataset, particularly in occlusion regions.
Noting that "global" layered models cannot capture
mutual or selfocclusions or deal with too many
layers, I will introduce a local layering
representation that breaks the scenes into local
layers and jointly models the motion and occlusion
relationship between local layers. By retaining
uncertainty on both the motion and the occlusion
relationship, we can avoid local minima common to
motiononly or occlusiononly approaches. Our
method can handle motion and occlusion well for
both challenging synthetic and real sequences.
Finally, I will show an application of motion
estimation to interactive intrinsic video editing.
We introduce a fast and temporally consistent
algorithm to decompose video sequences into their
reflectance and illumination components. One key
observation is that reflectance is an intrinsic
property of physical surfaces and tend to persist
over time, while lighting may vary. The temporally
consistent decomposition results allow
illuminationaware video editing, such as
retexturing and lightingaware compositing.
Joint work with Michael J. Black, Erik B.
Sudderth, Ce Liu, Hanspeter Pfister, Nicolas
Bonneel, Kalyan Sunkavalli, James Tompkin, and
Sylvain Paris
March
18, 2015
121pm, 182 George St, Room 110
Using
evolutionary sequence variation to make
inferences about protein structure and
function
Lucy Colwell
Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK
The evolutionary
trajectory of a protein through sequence space
is constrained by its function. Collections of
sequence homologs record the outcomes of
millions of evolutionary experiments in which
the protein evolves according to these
constraints. The explosive growth in the number
of available protein sequences raises the
possibility of using the natural variation
present in homologous protein sequences to infer
these constraints and thus identify residues
that control different protein phenotypes.
Because in many cases phenotypic changes are
controlled by more than one amino acid, the
mutations that separate one phenotype from
another may not be independent, requiring us to
understand the correlation structure of the
data.
The challenge is to distinguish true
interactions from the noisy and undersampled
set of observed correlations in a large multiple
sequence alignment. To address this we build a
maximum entropy model of the protein sequence,
constrained by the statistics of the multiple
sequence alignment, to infer residue pair
interactions. We translate these interactions
into pairwise distance constraints between amino
acids and use them to generate all atom
structural models. Using proteins of known
structure we show that correlations between
amino acids at different sites in a protein
contain sufficient information to predict low
resolution tertiary protein structure of both
globular and transmembrane proteins. We then
apply our method to predict de novo the
structure of 11 medically important
transmembrane proteins of unknown structure. In
addition we are able to predict protein
quaternary structure and alternative
conformations. The next step requires
development of a theoretical inference framework
that enables the relationship between the amount
of available input data and the reliability of
structural predictions to be better understood.
April
1, 2015
121pm, 182 George St, Room 110
Adaptive
Bayesian Estimation of Conditional Densities
Andriy Norets
Associate Professor
Department of Economics
Brown University
Nonparametric
estimation of conditional distributions is
important in empirical work across many fields.
The Bayesian approach to this problem has
several attractive properties. It does not
require fixing a bandwidth or similar tuning
parameters. Instead, it provides estimates of
the objects of interest where the tuning
parameters are averaged out with respect to
their posterior distribution. Also, the Bayesian
approach performs well in outofsample
prediction and Monte Carlo exercises. In this
talk, I will discuss theoretical properties of
Bayesian nonparametric models and provide an
explanation for their excellent performance in
applications.
I will focus on mixtures of Gaussian densities
with covariate dependent mixing weights and a
variable number of mixture components for which
a prior on positive integers is specified.
Conditional on the number of mixture components,
the mixing weights are modelled by a multinomial
logit with a common scale parameter. This model
is closely related to mixtureofexperts also
known as smooth mixtures in econometrics. The
main theoretical result of the talk is that the
posterior in this model contracts at a minimax
rate up to a logarithmic factor. The assumed
prior distribution does not depend on the
smoothness level of the true conditional
density. Thus, the obtained posterior
contraction rate is adaptive across all
smoothness levels.
I will also briefly discuss some applications
and estimation methods for nonparametric
Bayesian modelling of conditional densities.
Background papers:
1. "Adaptive Bayesian Estimation of Conditional
Densities" by Norets and Pati,
http://arxiv.org/pdf/1408.5355.pdf
2. "Posterior Consistency in Conditional Density
Estimation by Covariate Dependent Mixtures" by
Norets and Pelenis, Econometric Theory, 2014
http://www.econ.brown.edu/fac/Andriy_Norets/papers/consmixreg.pdf
May
6, 2015
121pm, 182 George St, Room 110
Two
Recent Information Theoretic Variations on the
Theme of Patterns in Security
Muriel Medard
Cecil H. Green Professor
Electrical Engineering and Computer Science
Department
Massachusetts Institute of Technology
We overview two
different sets of results based upon the effect
of patterns in security. In the first part, we
consider limits of inference, a problem that
emerges when we seek to ascertain what limits to
privacy we can expect when machine learning
algorithms, whose theoretical basis often relies
on principal inertia components, are applied to
mining publicly available data that may be
related, in loosely known ways, to private data.
Lower bounds for the average probability of
error of estimating a hidden variable X given an
observation of a correlated random variable Y,
and Fano’s inequality in particular, play a
central role in information theory. We present a
lower bound for the average estimation error
based on the marginal distribution of X and the
principal inertias of the joint distribution
matrix of X and Y, providing thus limits to
privacy. Furthermore, we investigate how to
answer a fundamental question in inference and
privacy: given an observation Y, can we estimate
a function f(X) of the hidden random variable X
with an average error below a certain threshold?
We provide a general method for answering this
question using an approach based on
ratedistortion theory.
In the second part, we consider recent results
on guesswork, the characterization of the
process sequences such as passwords. We note
that, what may appear as being even slight
differences in distributions of these sequences
may lead to differences that are exponential in
guesswork, leading to possibly surprising
results, such as the failure of the oftassumed
uniformity of compressed sources, and the fact
that friendly jamming of an intended user may be
advantageous. We conclude with our
recently defined notion of inscrutability rate,
used to quantify the asymptotic difficulty of
guessing U out of V secret strings.
Unexpectedly, the inscrutability rate of any
finiteorder Markov stringsource with hidden
statistics remains the same as the unhidden
case, i.e., the asymptotic value of hiding the
statistics per each symbol is vanishing.
Joint work with Ahmad Beirami, Robert
Calderbank, Mark Christiansen, Ken Duffy, Flavio
du Pin Calmon, Stefano Tessaro, Mayank Varia
Fall 2014
October
1, 2014
121pm, 182 George St, Room 110
Statistical mechanics
for real biological networks
William Bialek
Professor
Department of Physics
Princeton University
It is an old dream that ideas from statistical
physics could help us understand the phenomena
that emerge from biological networks, be they
networks of genes, networks of neurons, or
networks of organisms. In recent
years, it has become possible to make increasingly
accurate, simultaneous measurements on the states
of (almost) all the nodes in such networks.
I'll discuss the efforts that my colleagues and I
are making to connect these data to statistical
physics models. The key idea is the (quite
old) maximum entropy principle: we try to
build models that are consistent with some
measured properties of the network (e.g., the
correlations among the states of pairs of
elements) but otherwise have as little structure
as possible. I will use the example of a
flock of birds to explain how this works, and to
explain our surprise that it works so
well. Statistical mechanics teaches us
that, as systems become large, the parameter space
breaks up into phases, and this also is true
for families of maximum entropy
models. Thus, we can ask where
real networks are in the phase diagram of possible
networks. For the flock, we'll see that the
system is poised very close to a critical
surface. We can go though a similar (but
much more complex) analysis for a network of
neurons in the vertebrate retina, and surprisingly
we find the same answer  the system seems to be
close to criticality, and we can detect hints of
criticality in other systems as well. It
seems that we are either seeing signs of something
general, or we are fooling ourselves, and I'll
outline a path to telling the difference.
October
8, 2014
121pm, 182 George St, Room 110
A
Simple PatternTheoretic Representation
Oren Freifeld
Postdoctoral Associate
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
Reasoning over
realworld signals consists of three conceptual
steps: representation, modeling and inference.
Here, we focus on a specific choice of
representation, but one that is strongly
motivated by modeling and inference
considerations.
The transformational approach, a cornerstone in
Ulf Grenander's Pattern Theory, replaces the
representation of objects with the
representation of transformations acting on the
objects. Particularly, diffeomorphisms, a
special class of transformations, often play a
pivotal role. Typically, the choice of a
diffeomorphism space involves a tradeoff: such
spaces range from ones that are simple (that is
a good thing) but of limited expressiveness, to
those that are highlyexpressive (i.e., are able
to capture a wide range of patterns) but are
complicated and computationallydemanding. Thus,
despite their mathematical elegance and
sophistication, the applicability of the latter
has been somewhat limited, especially for large
data sets or when time is of the essence.
Moreover, owing to their complexity, the
application of modern approaches to modeling
(e.g., Bayesian nonparametric models) and
inference (e.g., MCMC) to highlyexpressive
spaces of diffeomorphisms presents significant
mathematical and algorithmic challenges. This is
typified by the case of analysisbysynthesis
methods, wherein multiple evaluations of
likelihood functions (or cost functions in the
deterministic setting) rely on computing
multiple transformations.
In this talk I will present our recent,
workinprogress, efforts to facilitate
tractable modeling and inference for such
spaces. We propose a novel space of
transformations which is not only
highlyexpressive but also (embarrassingly)
simple. Moreover, it supports highlyaccurate
and fast computations of transformations thus
enabling rapid likelihood evaluations. Finally,
the representation also lends itself to
multiscale modeling and easy constructions and
evaluations of priors. While the
representation has certain interesting
differentialgeometric aspects, the talk will
focus on the practical issues of implementation
(requiring only basic ideas from linear algebra
and firstorder differential equations) and the
implications for modeling and inference.
Preliminary results demonstrate the
applicability of the proposed approach in
computervision tasks such as image warping or
registration, and machinelearning tasks such as
monotonic regression, timewarp analysis, and
modeling CDF or histogramvalued data.
This is joint work with Soren Hauberg, Julian
Straub, Kayhan Batmanghelich, and John Fisher.
October
22, 2014
121pm, 182 George St, Room 110
Algorithms
for Interpretable Machine Learning
Cynthia Rudin
Associate Professor of Statistics
Sloan School of Management and
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
Possibly *the* most
important obstacle in the deployment of
predictive models is the fact that humans simply
do not trust them. If it is known exactly which
variables were important for the prediction and
how they were combined, this information can be
very powerful in helping to convince people to
believe (or not believe) the prediction and make
the right decision. In this talk I will discuss
algorithms for making these nonblack box
predictions including:
1) "Bayesian Rule Lists"  This algorithm builds
a decision list using a probabilistic model over
permutations of IFTHEN rules. It competes with
the CART algorithm.
2) "Supersparse Linear Integer Models"  This
algorithm produces scoring systems, which are a
type of model that is widely used in the medical
community. It proposes an alternative to the
Lasso method.
I will show applications to healthcare,
including an alternative to the CHADS_2 score,
which is one of the most widely used scoring
systems in medicine. Our model was trained on
over 1000 times the amount of data as CHADS_2
and is more accurate, but is just as
interpretable. I will also show preliminary
joint work with the MGH Sleep Lab on diagnosing
forms of sleep apnea.
At the end of the talk, I will discuss random
cool applications of machine learning being
worked on in the Prediction Analysis Lab.
links:
Building Interpretable Classifiers with Rules
using Bayesian Analysis
http://web.mit.edu/rudin/www/LethamRuMcMa14.pdf
This is joint work with my student Ben Letham
and colleagues Tyler McCormick and David
Madigan.
Method and Models for Interpretable Linear
Classification
http://arxiv.org/abs/1405.4047
This is joint work with my student Berk Ustun.
Other links are on my webpage under
"Interpretable Predictive Models"
http://web.mit.edu/rudin/www/PredicticsPapers.html
November
5, 2014
121pm, 182 George St, Room 110
CrossStudy
Validation versus Randomized CrossValidation
as Data Accumulate
LoBin Chang
Visiting Assistant Professor
Department of Applied Mathematics and Statistics
& The Center for Imaging Science
Johns Hopkins University
In recent years
"reproducibility" has emerged as a key factor in
evaluating predictors of disease phenotypes. In
particular, "validation" is undermined when
error rates on data collected from new studies
exceed those originally reported, which is
hardly surprising for a heterogeneous
population. In this talk, I will provide a
statistical formulation in the large sample
limit: subpopulations are modeled as components
of a mixture and all error rates are optimal
(Bayes) for a twoclass problem. For any number
m of studies, the error rate in crossstudy
validation exceeds that in ordinary randomized
crossvalidation, the latter (averaged)
increases with m, and both converge to the
optimal rate.
November
12, 2014
121pm, 182 George St, Room 110
Multiscale
models for shapes and images
Pedro Felzenszwalb
Associate Professor
School of Engineering and Department of Computer
Science
Brown University
Markov models are widely used in a variety of
applications in many different fields. In
computer vision Markov models are commonly used as
priors for shapes and images. Markov models
capture regularities of objects such as sequences
or images by modeling their local
properties. This often leads to tractable
learning and inference problems. However it
is clear that in many applications realistic
priors need to capture nontrivial longrange
correlations and highorder properties of
objects. Such models can be difficult to
estimate and often lead to intractable inference
problems. In this talk I will discuss an
approach for defining models using a multiscale
representation of an object. The idea
involves modeling local properties of an object at
multiple resolutions. This leads to a
natural lowdimensional parameterization for
highorder models. The resulting models also
have structure that can lead to efficient
inference algorithms. I will discuss two
instances of the approach in the context of
computer vision. One instance involves
modeling curves for recognizing shapes and for
detecting shapes in images. Another instance
involves modeling images to recover hidden
structure from noisy measurements.
November
20, 2014 (Thursday)
Joint Pattern Theory / LCDS Seminar
45pm, Wilson Hall Room
102
Geometric
graphbased methods for high dimensional data
Andrea Bertozzi
Professor
Department of Mathematics
UCLA
We present new methods for segmentation of large
datasets with graph based structure. The method
combines ideas from classical nonlinear PDEbased
image segmentation with fast and accessible linear
algebra methods for computing information about
the spectrum of the graph Laplacian. The goal of
the algorithms is to solve semisupervised and
unsupervised graph cut optimization problems. I
will present results for image processing
applications such as image labeling and
hyperspectral video segmentation, and results from
machine learning and community detection in social
networks, including modularity optimization posed
as a graph total variation minimization problem.
December
3, 2014
121pm, 182 George St, Room 110
Hypothesisguided
dimensionality reduction and its application
to largescale neuroscience
John Cunningham
Assistant Professor
Department of Statistics
Columbia University
Like many fields, neuroscience is experiencing
dramatic increases in the quantity and complexity
of recorded data, with the goal of similarly
dramatic scientific and technological advances.
Critical to realizing these ambitions are the
analysis methods to interrogate large datasets
jointly. Many questions in the field reduce
to a question of "hypothesisguided"
dimensionality reduction: given a large number of
noisy timeseries, what shared lowdimensional
process best describes the data in terms of that
hypothesis? I will begin by briefly describing a
few applications of this dimensionality reduction
in neuroscience. I will then introduce a generic,
hypothesisguided linear dimensionality reduction
solver using an optimization program over the
Stiefel and Grassman manifolds. To demonstrate the
utility of this solver, I will apply it to neural
recordings from the motor cortex. Here we
show that the seemingly hopeless complexity of
neural responses can be simply described by a
lowdimensional dynamical system at the level of
the neural population. Finally, I will
describe a critical statistical test of the
significance of this hypothesis, verifying that
the inferred structure is not a trivial
consequence of highdimensional data.
http://stat.columbia.edu/~cunningham/
Spring 2014
February
12, 2014
121pm, 182 George St, Room 110
Challenging issues in
Likelihood Inference on Mixture Models
Daeyoung Kim
Assistant Professor
Department of Mathematics and Statistics
University of Massachusetts Amherst
Modern computing power and algorithms have greatly
increased interest in mixture models as an
effective tool for modeling heterogeneity in data.
Statistical inference for finite mixture models
can be done by the method of maximum likelihood
(ML) which provides, in a single package, methods
for point estimation, hypothesis testing and
construction of confidence sets. Despite
attractive features of finite mixture models,
there are several challenges for likelihood based
inference. In this talk we will address two
problems: empirical identifiability on the mixture
parameters and multiple local maximizers in the
mixture likelihood.
February
26, 2014
121pm, 182 George St, Room 110
Productivity and
Reuse in Language: Nonparametric Bayesian
Models of Lexical Acquisition
Timothy O'Donnell
Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
A muchcelebrated aspect of language is the way in
which it allows us to express and comprehend an
unbounded number of thoughts. This property is
made possible because language consists of several
combinatorial systems which can productively build
novel forms using a large inventory of stored,
reusable parts: the lexicon.
For any given language, however, there are many
potentially storable units of structure, each
giving rise to many more ways of forming novel
expressions than are actually used in practice.
For example, English contains suffixes which are
highly productive and generalizable (e.g., ness;
LadyGagaesqueness, pinescentedness) and suffixes
which can only be reused in specific words, and
cannot be generalized (e.g., th; truth, width,
warmth). How are such differences in
generalizability and reusability represented? What
are the basic, stored building blocks at each
level of linguistic structure? When is productive
computation licensed and when is it not? How can
the child acquire these systems of knowledge?
I will discuss several mathematical models of
productivity (computation) and reuse (storage), at
different levels of linguistic structure. These
models all treat the the problem as a tradeoff
between storage of frequent (sub)structures and
the need to productively generalize to novel
cases. The computation/storage tradeoff provides
an illuminating new perspective on standard
Bayesian tradeoff between fit to data and
simplicity/generalizability of hypotheses. This
tradeoff is formalized in these models using tools
from Bayesian nonparametric statistics.
The distinction between rules and forms that
generalize and those that don't gives rise to
several deep questions about the nature of the
linguistic system. While the tradeoff between
productivity and reuse is consistent across
different levels of linguistic structure, and can
be modeled using Bayesian nonparametrics,
linguisticallyinformed representations at each
level are crucial for correctly capturing the
empirical patterns in the data. Working with such
rich, structured representations leads to
significant inference challenges, which I will
also briefly discuss.
March
12, 2014
121pm, 182 George St, Room 110
Robust inference on
parameters via particle filters and sandwich
covariance matrices
Neil Shephard
Professor
Departments of Economics and Statistics
Harvard University
Likelihood based estimation of the parameters of
state space models can be carried out via a
particle filter. In this paper we show how to make
valid inference on such parameters when the model
is incorrect. In particular we develop a
simulation strategy for computing sandwich
covariance matrices which can be used for
asymptotic likelihood based inference. These
methods are illustrated on some simulated data.
April
2, 2014
121pm, 182 George St, Room 110
Can connectomics help
us understand neural computation? Insights from
the fly visual system
Dmitri Chklovskii
Group Leader
Janelia Farm Research Campus
Animal behavior arises from computations in
neuronal circuits, but our understanding of these
computations has been frustrated by the lack of
detailed synaptic connection maps, or connectomes.
For example, despite intensive investigations over
half a century, the neuronal implementation of
local motion detection in the insect visual system
remains elusive. We developed a semiautomated
pipeline using electron microscopy to reconstruct
a connectome, containing 379 neurons and 8,637
chemical synaptic contacts, within the Drosophila
optic medulla. By matching reconstructed neurons
to examples from light microscopy, we assigned
neurons to cell types and assembled a connectome
of the repeating module of the medulla. Within
this module, we identified cell types constituting
a motion detection circuit, and showed that the
connections onto individual motionsensitive
neurons in this circuit were consistent with their
direction selectivity. Our identification of cell
types involved in motion detection allowed
targeting of extremely demanding
electrophysiological recordings by other labs.
Preliminary results from such recordings show time
delays confirming our findings. This demonstrates
that connectomes can provide key insights into
neuronal computations.
April
9, 2014
121pm, 182 George St, Room 110
Division of Applied Mathematics / Center for
Vision Research Seminar
The computational
magic of the ventral stream: a theory (and why
some deep architectures work)
Tomaso Poggio
Professor
NSF Center for Brains, Minds and Machines
McGovern Institute
Computer Science and Artificial Intelligence
Laboratory
Brain Sciences Department
Massachusetts Institute of Technology
Following on an idea of Stu Geman, I believe that
the present phase of Machine Learning is
characterized by supervised learning algorithms
relying on large sets of labeled examples (n
→ ∞). The next phase is likely to focus on
algorithms capable of learning from very few
labeled examples (n → 1), like humans seem
able to do. I will introduce a new approach to
this problem describing the underlying theory,
based on the unsupervised, automatic learning of a
"good" representation for supervised learning,
characterized by small sample complexity (n).
We consider the case of visual object recognition,
though the theory applies to other domains. The
starting point is the conjecture, that we can
prove in specific cases, that image
representations which are invariant to
translation, scaling and other transformations can
considerably reduce the sample complexity of
learning. We prove that an invariant and unique
(discriminative) signature can be computed for
each image patch. A module performing filtering
and pooling, like the simple and complex cells
described by Hubel and Wiesel, can compute such
estimates. The theory extends existing deep
learning convolutional architectures for image and
speech recognition. It also suggests that the main
computational goal of the ventral stream of visual
cortex is to provide a hierarchical representation
of new objects/images which is invariant to
transformations, stable, and discriminative for
recognition — and that this representation may be
continuously learned in an unsupervised way during
development and visual experience.
April
16, 2014
121pm, 182 George St, Room 110
The Blended
Paradigm: A Bayesian approach to handling
outliers and misspecified models
Steven MacEachern
Professor
Department of Statistics
Ohio State University
Bayesian methods have proven themselves to be
enormously successful across a wide range of
scientific problems, with analyses ranging from
the simple onesample problem to complicated
hierarchical models. They have many
welldocumented advantages over competing
methods. However, Bayesian methods run into
difficulties for two major and prevalent classes
of problemshandling data sets with outliers and
dealing with model misspecification. In both
cases, standard Bayesian analyses fall prey to the
hubris that is an integral part of the Bayesian
paradigm. The large sample behavior of the
analysis is driven by the likelihood. We
propose the use of restricted likelihood as a
single solution to both of these problems.
When working with restricted likelihood, we
summarize the data, x, through a set of
(insufficient) statistics T(x) and update our
prior distribution with the likelihood of T(x)
rather than the likelihood of x. By choice
of T(x), we retain the main benefits of Bayesian
methods while reducing the sensitivity of the
analysis to selected features of the data.
The talk will motivate the blended paradigm,
discuss properties of the method and choice of
T(x), cover the main computational strategies for
its implementation, and illustrate its benefits.
April
23, 2014
121pm, 182 George St, Room 110
Climate Informatics:
Recent Advances and Challenge Problems for
Machine Learning in Climate Science
Claire Monteleoni
Assistant Professor
Department of Computer Science
George Washington University
The threat of climate change is one of the
greatest challenges currently facing society.
Given the profound impact machine learning has
made on the natural sciences to which it has been
applied, such as the field of bioinformatics,
machine learning is poised to accelerate discovery
in climate science. Our recent progress on climate
informatics reveals that collaborations with
climate scientists also open interesting new
problems for machine learning. I will give an
overview of challenge problems in climate
informatics, and present recent work from my
research group in this nascent field.
A key problem in climate science is how to combine
the predictions of the multimodel ensemble of
global climate models that inform the
Intergovernmental Panel on Climate Change (IPCC).
I will present three approaches to this problem.
Our Tracking Climate Models (TCM) work
demonstrated the promise of an algorithm for
online learning with expert advice, for this task.
Given temperature predictions from 20 IPCC global
climate models, and over 100 years of historical
temperature data, TCM generated predictions that
tracked the changing sequence of which model
currently predicts best. On historical data, at
both annual and monthly timescales, and in future
simulations, TCM consistently outperformed the
average over climate models, the existing
benchmark in climate science, at both global and
continental scales. We then extended TCM to take
into account climate model predictions at higher
spatial resolutions, and to model geospatial
neighborhood influence between regions. Our second
algorithm enables neighborhood influence by
modifying the transition dynamics of the Hidden
Markov Model from which TCM is derived, allowing
the performance of spatial neighbors to influence
the temporal switching probabilities for the best
climate model at a given location. We recently
applied a third technique, sparse matrix
completion, in which we create a sparse
(incomplete) matrix from climate model predictions
and observed temperature data, and apply a matrix
completion algorithm to recover it, yielding
predictions of the unobserved temperatures.
April
30, 2014
121pm, 182 George St, Room 110
Division of Applied Mathematics / Center for
Vision Research Seminar
Computational
Anatomy, High Throughput NeuroImaging
Informatics, and the BrainCloud
Michael Miller
Professor
Biomedical Engineering and Electrical and Computer
Engineering
Johns Hopkins University
Brain parsing is a core technology in
Computational Anatomy supporting high throughput
indexing of brains. This is allowing us, with the
Susumu Mori laboratory, to build several brain
clouds in which brains can be searched, retrieved
and machine learning can be performed. We
will discuss the notions of brain parsing and
"geodesic positioning" central to Computational
Anatomy, allowing for the understanding of complex
medical imagery in terms of the neuroscience
ontology at 1mm scale.
The second half of the talk will focus on our
medical findings in two of the several
neurodegenerative diseases being studied
worldwide, dementia of the Alzheimer’s type, and
Huntington’s disease. We will describe our results
in preclinical dementia in the BIOCARD project led
by Marilyn Albert, attempting to characterize the
onset and trajectory of biomarker change in
temporal lobe structures  amygdala, hippocampus,
and entorhinal cortex – prior to conversion to
clinical Alzheimer’s symptomology. Secondly, we
will discuss results in Huntington’s disease in
the PREDICTHD project, led by University of Iowa,
examining motor and subcortical structure change
as a function of genetic load.
Fall 2013
September
11, 2013
121pm, 182 George St, Room 110
Parameter estimation
robust to lowfrequency contamination
Adam McCloskey
Assistant Professor
Department of Economics
Brown University
We provide methods to robustly estimate the
parameters of stationary ergodic shortmemory time
series models in the potential presence of
additive lowfrequency contamination. The types of
contamination covered include level shifts
(changes in mean) and monotone or smooth time
trends, both of which have been shown to bias
parameter estimates towards regions of persistence
in a variety of contexts. The estimators presented
here minimize trimmed frequency domain
quasimaximum likelihood (FDQML) objective
functions without requiring specification of the
lowfrequency contaminating component. We provide
two approaches, allowing for either thin or
heavytailed data. When proper sample
sizedependent trimmings are used, the FDQML
estimators are consistent and asymptotically
normal, asymptotically eliminating the presence of
any spurious persistence. These asymptotic results
also hold in the absence of additive lowfrequency
contamination, enabling the practitioner to
robustly estimate model parameters without prior
knowledge of whether contamination is present.
Popular time series models that fit into the
framework of this article include ARMA, stochastic
volatility, GARCH and ARCH models. We explore the
finite sample properties of the trimmed FDQML
estimators of the parameters of some of these
models, providing practical guidance on trimming
choice. Empirical estimation results suggest that
a large portion of the apparent persistence in
certain volatility time series may indeed be
spurious.
Joint work with Jonathan B. Hill (UNCChapel
Hill).
October
9, 2013
121pm, 182 George St, Room 110
Model selection in a
large compositional space
Roger Grosse
Massachusetts Institute of Technology
We often build complex probabilistic models by
"composing" simpler models  using one model to
generate the latent variables for another model.
This allows us to express complex distributions
over the observed data and to share statistical
structure between different parts of a model. I'll
present a space of matrix decomposition models
defined by the composition of a small number of
motifs of probabilistic modeling, including
clustering, low rank factorizations, and binary
latent factor models. This compositional structure
can be represented by a contextfree grammar whose
production rules correspond to these motifs. By
exploiting the structure of this grammar, we can
generically and efficiently infer latent
components and estimate predictive likelihood for
nearly 2500 model structures using a small toolbox
of reusable algorithms. Using a greedy search over
this grammar, we automatically choose the
decomposition structure from raw data by
evaluating only a small fraction of all models.
The proposed method typically finds the correct
structure for synthetic data and backs off
gracefully to simpler models under heavy noise. It
learns sensible structures for datasets as diverse
as image patches, motion capture, 20 Questions,
and U.S. Senate votes, all using exactly the same
code.
October
23, 2013
121pm, 182 George St, Room 110
Highdimensional
statistics
Sahand Negahban
Assistant Professor
Department of Statistics
Yale University
The focus of my research is to develop
theoretically sound methods, which are both
computationally and statistically efficient, for
extracting information from large datasets. A
salient feature of my work has been to understand
how hidden lowcomplexity structure in large
datasets can be used to develop computationally
and statistically efficient methods for extracting
meaningful information for highdimensional
estimation problems. My work borrows from and
improves upon tools of statistical signal
processing, machine learning, probability and
convex optimization.
November
6, 2013
121pm, 182 George St, Room 110
Dirichlet process
mixture inconsistency for the number of
components, and dimension mixture models
Jeffrey Miller
Brown University
Note: The first half will essentially be a
practice talk for my NIPS 2013 presentation.
For data assumed to come from a finite mixture
with an unknown number of components, it has
become common to use Dirichlet process mixtures
not only for density estimation, but also for
inferences about the number of components. The
typical approach is to use the posterior
distribution on the number of components occurring
so far, that is, the posterior on the number of
clusters in the observed data. However, it turns
out that this posterior is not consistent; it does
not converge to the true number of components.
Motivated by this finding, we examine an
alternative approach to Bayesian nonparametric
models. Many of the commonlyused nonparametric
models are infinitedimensional, such as Dirichlet
or PitmanYor process mixtures (DPMs/PYMs),
Hierarchical Dirichlet processes (HDPs), and
Indian buffet processes (IBPs). A less common but
very natural type of Bayesian nonparametric model
is constructed by taking a family of
finitedimensional models and putting a prior on
the dimension  that is, taking a mixture of
finitedimensional models. Interestingly, this
approach gives rise to combinatorial stochastic
processes that closely parallel those of
DPMs/PYMs, HDPs, and IBPs, and has certain
advantages over the infinitedimensional approach.
November
20, 2013
121pm, CIT 134
Interactive
Demonstrations in the Brown Robotics Lab
Chad Jenkins
Associate Professor
Department of Computer Science
Brown University
This meeting will be a tour of the Brown Robotics
Lab in the CIT Building. This tour will
consist of interactive demonstrations of the PR2
robot for perception and physical manipulation of
tabletop objects using data from visual
streams. This demo will be complemented with
a description of the underlying computational
methods, such as processing of point clouds from
RGBD data, object recognition, and adjustable
autonomy for humanrobot interaction. Time
permitting, demos of other robots, such as the
Rethink Baxter, Suitable Beam, and AR.Drone
quadrotor helicopter, will occur with a
description of their visionrelated technical
challenges.
December
4, 2013
121pm, 182 George St, Room 110
Big data in
neuroscience: Where is the information?
Joachim Buhmann
Professor
Department of Computer Science
ETH Zurich
Neuroscience provides data at very different
scales: microscopic imagery (ssTEM) measures
information on the structure of neurons, diffusion
tensor imaging tracks fibers between brain areas
and the hemispheres, fMRI data measures spatial
activity patterns to estimate dynamic causal
models. At all three length scales, machine
learning methods extract complex structures from
vast amount of data. Model selection plays an
important role in controlling the complexity of
the explanation. Furthermore, the amount of
supervision information is very scarce since human
annotators display a high error rate on these
ambiguous data sources. We advocate a maximum
entropy approach to interpret such a data source
and their model complexity is regularized by an
information theoretic model selection principle.
Spring 2013
January
30, 2013
121pm, 182 George St, Room 110
Composite Likelihood
Nancy Reid
University Professor
Department of Statistics
University of Toronto
In complex multivariate settings, it is sometimes
much simpler to construct a 'likelihood' from
lower dimensional marginal or conditional
distributions: the general construction was called
composite likelihood in Lindsay (1988). Under mild
conditions the resulting composite maximum
likelihood estimate is consistent and
asymptotically normal, with asymptotic variance of
the 'sandwich' form that also arises in
misspecified models. A recent workshop at the
Banff International Research Station on composite
likelihood had as one goal to identify important
directions for future research in the theory of
composite likelihood, in software development, and
in methodology for different areas of application.
I will give an overview of the progress made at
this workshop and highlight areas where more
research is needed.
February
20, 2013
121pm, 182 George St, Room 110
Nonstationary
Modeling Through Dimension Expansion
Luke Bornn
Assistant Professor
Department of Statistics
Harvard University
If atmospheric, agricultural, and other
environmental systems share one underlying theme
it is complex spatial structures, being influenced
by such features as topography and weather.
Ideally we might model these effects directly;
however, information on the underlying causes is
often not routinely available. Hence, when
modeling environmental systems there exists a need
for a class of spatial models which does not rely
on the assumption of stationarity.
In this talk, we propose a novel approach to
modeling nonstationary spatial fields. The
proposed method works by expanding the geographic
plane over which these processes evolve into
higher dimensional spaces, transforming and
clarifying complex patterns in the physical plane.
By combining aspects of multidimensional scaling,
group lasso, and latent variable models, a
dimensionally sparse projection is found in which
the originally nonstationary field exhibits
stationarity. Following a comparison with existing
methods in a simulated environment, dimension
expansion is studied on a classic testbed data
set historically used to study nonstationary
models. Following this, we explore the use of
dimension expansion in modeling air pollution in
the United Kingdom, a process known to be strongly
influenced by rural/urban effects, amongst others,
which gives rise to a nonstationary field.
February
27, 2013
121pm, 182 George St, Room 110
Bayesian model
sampling in reinforcement learning
Michael Littman
Professor
Department of Computer Science
Brown University
Reinforcement learning (RL) is a subfield of
machine learning concerned with decision makers
that adapt their behavior given utilitybased
feedback. While temporal difference
methods dominate the field, there are a growing
number of researchers examining Bayesian
statistics as a way of learning about unfamiliar
environments while modeling their own uncertainty
and addressing the classic
exploration/exploitation dilemma. I will
introduce the RL problem and provide some
background into existing Bayesian methods, as well
as presenting current efforts from our lab to
generalize these approaches to broader model
classes and more efficient computation.
March
20, 2013
121pm, 182 George St, Room 110
Sparse superposition
codes: Communication by regression
Andrew Barron
Professor
Department of Statistics
Yale University
A solution is presented to the problem of
provable, fast, reliable, capacityachieving
communications for the additive Gaussian noise
channel. This is an important real communications
problem for which the fundamental limits of
reliability were initially established in the
capacity theorem of Shannon in 1948. Not till the
1990s were practical highrate schemes developed
and these form an essential part of the cell phone
revolution. However, the demonstration of their
reliability is only empirical. The codes we
develop here are sparse superposition codes based
on a highdimensional regression model, and they
have fast encoders and decoders based on iterative
regression fits. In this presentation we describe
the framework permitting theoretical demonstration
of the desired properties. The error probability
is shown to scale favorably with the size of the
code, namely, the error probability is
exponentially small for every fixed communication
rate below capacity. Remaining challenges are
discussed in understanding how the complexity
scales when the rate approaches capacity. This
work is joint with Antony Joseph and Sanghee Cho.
April
3, 2013
121pm, 182 George St, Room 110
Learning to behave by
reading
Regina Barzilay
Associate Professor
Department of Electrical Engineering and Computer
Science
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
In this talk, I will address the problem of
grounding linguistic analysis in control
applications, such as game playing and robot
navigation. We assume access to natural language
documents that describe the desired behavior of a
control algorithm (such as game strategy guides).
Our goal is to demonstrate that knowledge
automatically extracted from such documents can
dramatically improve performance of the target
application. First, I will present a reinforcement
learning algorithm for learning to map natural
language instructions to executable actions. This
technique has enabled automation of tasks that
until now have required human participation  for
example, automatically configuring software by
consulting howto guides. Next, I will present a
MonteCarlo search algorithm for game playing that
incorporates information from game strategy
guides. In this framework, the task of text
interpretation is formulated as a probabilistic
model that is trained based on feedback from
MonteCarlo search. When applied to the
Civilization strategy game, a languageempowered
player outperforms its traditional counterpart by
a significant margin.
April
10, 2013
121pm, 182 George St, Room 110
Encoding binary
neural codes in networks of thresholdlinear
neurons
Carina Curto
Assistant Professor
Department of Mathematics
University of NebraskaLincoln (UNL)
Networks of neurons in the brain encode preferred
patterns of neural activity via their synaptic
connections. Despite receiving considerable
attention, the precise relationship between
network connectivity and encoded patterns is still
poorly understood. Here we consider this problem
for networks of thresholdlinear neurons whose
computational function is to learn and store a set
of binary patterns (e.g., a neural code) as
"permitted sets" of the network. We
introduce a simple Encoding Rule that selectively
turns "on" synapses between neurons that coappear
in one or more patterns. The rule uses
synapses that are binary, in the sense of having
only two states ("on" or "off"), but also
heterogeneous, with weights drawn from an
underlying synaptic strength matrix S. Our
main results precisely describe the stored
patterns that result from the Encoding Rule 
including unintended "spurious" states  and give
an explicit characterization of the dependence on
S. As a consequence, we find that certain neural
codes are "natural" in the context of these
networks; i.e., the structure of the code closely
matches the structure of emerging spurious states,
allowing the full code to be accurately learned
from a highly undersampled set of patterns.
Interestingly, many commonly observed neural codes
in cortical and hippocampal areas are natural in
this sense. As an application, we construct
networks that encode hippocampal place field codes
nearly exactly, following presentation of only a
small fraction of patterns. To obtain our results,
we prove new theorems using classical ideas from
convex and distance geometry, such as
CayleyMenger determinants, revealing a novel
connection between these areas of mathematics and
coding properties of neural networks.
This is joint work with Anda Degeratu and Vladimir
Itskov.
April
17, 2013
121pm, 182 George St, Room 110
Towards a general
theory of human learning and reasoning
Charles Kemp
Assistant Professor
Department of Psychology
Carnegie Mellon University
People learn and reason about animals, spatial
relations, kinsfolk, and many other domains, and
solve a broad range of inductive problems within
each of these domains. The full set of
domains and the full set of inductive problems
within these domains can be collectively described
as the conceptual universe. I will present a
systematic characterization of the conceptual
universe that helps to clarify the relationships
between familiar inductive problems such as
property induction, categorization, and stimulus
generalization, and that introduces new inductive
problems for psychological investigation. I will
illustrate the framework using case studies that
include behavioral and computational studies of
inductive reasoning, and a formal analysis of
kinship classification across cultures.
May
8, 2013
121pm, 182 George St, Room 110
Scaleinvariance and
metrics on spaces of plane curves
Matt Feiszli
The study of metrics and distances on spaces of
curves and surfaces is an active area of research
with applications in vision and imaging. We will
present metrics on spaces of plane curves using
techniques from harmonic, complex, and geometric
analysis. Their behavior can be best understood
through notion of scale, and we will discuss
geometric ways to understand their multiscale
behavior. We will discuss their application to
problems in shape deformation and shape matching,
and demonstrate scaleinvariant approaches to
curve matching on standard test datasets.
Fall 2012
October
3, 2012
121pm, 182 George St, Room 110
Recovering Large
Networks via Optimizing Nonlikelihood Functions
Rossi Luo
Assistant Professor
Department of Biostatistics and Center for
Statistical Sciences
Brown University
Graphical models are used to describe
the relationships between multiple variables, and
a class of undirected graphical models can be
uncovered from estimating covariance matrices.
Popular approaches include optimizing
(regularized) likelihood functions, especially in
the setting when the sample size is much smaller
than the number of variables. In this high
dimensional setting, I will describe a few
nonlikelihood optimization approaches to estimate
large covariance and inverse covariance matrices.
These approaches are based on exploring algebraic
properties of these matrices and low dimensional
assumptions, such as sparsity and low rank.
Convergence rates of these estimators are obtained
for finite samples, when the underlying
distribution has either exponential or polynomial
tails. All the proposed methods can be formulated
as convex optimization problems. A firstorder
algorithm is proposed to solve one of the
nonsmooth optimization problems, and its global
iterative complexity is also proved. Numerical
merits of these methods are demonstrated using
simulation and real datasets from brain sciences.
October
24, 2012
121pm, 182 George St, Room 110
Active Learning under
Margin Assumptions
Sivan Sabato
Postdoctoral Researcher
Microsoft Research New England
We derive and analyze a new,
efficient, poolbased active learning algorithm
for halfspaces. Most previous algorithms show
exponential improvement in the label complexity
assuming that the distribution over the instance
space is close to uniform. This assumption rarely
holds in practical applications. Instead, we study
the label complexity under a largemargin
assumptiona much more realistic condition, as
evident by the success of marginbased algorithms
such as SVM. Our algorithm is computationally
efficient and comes with formal guarantees on its
label complexity. The analysis is based on new
results for submodular optimization. The
algorithm also naturally extends to the
nonseparable case and to nonlinear kernels. We
further show experimentally that our approach
yields superior label complexity to previous
active learners on real problems.
Joint work with Alon Gonen and Shai
ShalevShwartz.
November
7, 2012
121pm, 182 George St, Room 110
Natural image
statistics and image restoration
Yair Weiss
Professor
School of Computer Science and Engineering
The Hebrew University of Jerusalem
Learning a statistical model of natural images is
a longstanding topic of research in disciplines
ranging from engineering to computational
neuroscience. Somewhat embarrassingly, however,
algorithms based on these statistical models do
not give stateof theart performance in image
restoration, and are outperformed by "block
matching" methods that do not have an explicit
probabilistic model. In this work, we show that
many widely used statistical models of images are
actually not very good density models and a
simple, unconstrained, Gaussian Mixture Model
(GMM) can give much higher likelihood to unseen
images. Using the learned GMM, we obtain
stateoftheart image restoration performance and
by examining what the GMM has learned we obtain
new insights into the statistics of natural images
that are not captured by most existing models.
Joint work with Daniel Zoran.
November
14, 2012
121pm, 182 George St, Room 110
Linking signaling
pathways and dynamic regulatory networks
Anthony Gitter
Postdoctoral Researcher
Microsoft Research New England
Adaptation to everchanging environmental
conditions is vital to the survival of all
organisms. In many stress responses upstream
proteins detect an environmental stimulus and
propagate signals to the nucleus where
transcription factors activate or inhibit genes.
Although these upstream proteins and temporal
changes in gene expression can be readily
observed, the pathways and transcription factors
involved remain hidden. We present a strategy for
integrating largescale data to discover the
signaling and transcriptional components of stress
response. Our model captures the dynamics of
transcription factor activity and the directed
pathways that activate these regulators, which are
inferred from an undirected protein interaction
network. Because our approach requires little
conditionspecific data, it is widely applicable
to many conditions and species. Joint work with
Miri Carmi, Naama Barkai, and Ziv BarJoseph.
Spring 2012
February
1, 2012
121pm, 182 George St, Room 110
Statistical analysis
of populations with interacting and interfering
units
Edo Airoldi
Assistant Professor
Department of Statistics, and FAS Center for
Systems Biology
Harvard University
A number of scientific endeavors of
current national and international interest
involve populations with interacting and/or
interfering units. In these problems, a collection
of partial measurements about patterns of
interaction and interference (e.g., social
structure and familial relations) is available, in
addition to the more traditional measurements
about unitlevel outcomes and covariates. Formal
statistical models for the analysis of this type
of data have emerged as a major topic of interest
in diverse areas of study. Probability models on
networks date back to 1959. Along with empirical
studies in social psychology and sociology from
the 1960s, these early works generated an active
community and a substantial literature in the
1970s. This effort moved into the statistical
literature in the late 1970s and 1980s, and the
past decade has seen a burgeoning literature in
statistical physics and computer science. The
growth of the World Wide Web and the emergence of
online social networking websites such as Facebook
and LinkedIn, and a host of more specialized
professional networking communities has
intensified interest in the study of networks,
structured measurements and interference. In this
talk, I will review a few ideas and open areas of
research that are central to this burgeoning
literature, placing emphasis on inference and
other core statistical issues. Topics include
elements of sampling and inference from
nonignorable (network sampling) designs, and
semiparametric modeling, with hints to the
applications to social, biological and information
networks that motivate these statistical problems.
February
8, 2012
121pm, 182 George St, Room 110
Exploiting Sparse
Structure by Spectral Connectivity Analysis
Ann Lee
Associate Professor
Department of Statistics
Carnegie Mellon University
For naturally occurring data, the
dimension of the given input space is often very
large while the data themselves have a low
intrinsic dimensionality. Spectral kernel methods
are nonlinear techniques for transforming data
into a coordinate system that efficiently reveals
the underlying structure  in particular, the
"connectivity"  of the data. In this talk, I
will focus on one particular technique 
diffusion maps  but the analysis can be used for
other spectral methods as well. I will give
examples of various applications of the method in
highdimensional regression and density
estimation. I will also present a new extension of
the diffusion framework to comparing distributions
in highdimensional spaces with an application to
image retrieval and texture discrimination. (Part
of this work is joint with R.R. Coifman, D. Liu,
C. Schafer and L. Wasserman.)
March
21, 2012
121pm, 182 George St, Room 110
The tree of life and
the evolution of genome function
Casey Dunn
Manning Assistant Professor
Ecology and Evolutionary Biology
Brown University
There are estimated to be about 10
million species on the planet, though much of what
we know about genome function is the result of
work in about 10 species  the laboratory model
organisms. Work on model organisms has been
extraordinarily productive over the past 100
years, but our focus on such a small number of
organisms means we have only looked through a
keyhole at the diversity genome function in life
on earth. Many interesting biological phenomena
remain out of reach, we know little about gene
function in the context of macroevolution and
ecosystem interactions, and we have only observed
gene function within narrow environmental
conditions. We are now on the cusp of being able
to do functional genomics research in the other 10
million species on the plant, right in their
natural environment without growing them in the
lab. But to do so will require a complete
retooling of data acquisition, entirely new
analysis approaches, and a shift in perspective
that embraces differences between species rather
than obsess over global invariants. Here I outline
some of these challenges, including the new
analysis challenges we face.
March
28, 2012
121pm, 182 George St, Room 110
What is the neural
foundation of the vision process?
Christoph von der Malsburg
Senior Fellow
Frankfurt Institute of Advanced Studies
The current consensus version of the neural
basis of vision  or of cognition in general 
makes it difficult to model some of the central
operations to be performed during perception and
learning. I will offer the dynamic link
architecture as an alternative physical basis for
cognitive processes, will discuss it as a natural
framework for vision and other cognitive
processes, and will illustrate and support the
argument with the help of a concrete model of
invariant object recognition.
April
18, 2012
121pm, 182 George St, Room 110
LP Relaxations for
Global Models in Natural Language Processing
Sebastian Riedel
Research Scientist
University of Massachusetts Amherst
Much work in NLP relies on pipelines of
classifiers. For example, when mining biomedical
text we usually first extract proteins and other
entities, and then determine their relations to
each other. Such pipelines suffer from
cascading errors: once an upstream processor errs,
downstream processors often cannot recover. This
has led to work that tackles NLP problems in a
more global and joint manner.
In this talk I will first present our work on
addressing biomedical event extraction as global
optimization problem. This approach achieved
stateoftheart results on several benchmark
datasets, and ranked first in the latest
international competition on this task. I will
illustrate the enabling technology behind our
global approach: Linear Programming (LP)
relaxations of the global optimization problem. In
particular, I will show how we divide event
extraction into tractable subproblems and apply
dual decomposition to efficiently find LP optima.
In the second part of my talk I will illustrate
our ongoing work on addressing the computational
challenge that arises from a superlinear number of
variables in our LP relaxations. I will show how
we use row and column generation to reduce runtime
of a global dependency parsing model by 8 times,
without loss of optimality guarantees. Our method
relies on upper bounds for the reduced costs of
variables which we derive from "grammar constants"
of our probabilistic model.
April
25, 2012
121pm, 182 George St, Room 110
Hallucinating scene
detail and recognizing scene attributes
James Hays
Assistant Professor
Department of Computer Science
Brown University
In this talk I will discuss two recent
projects. First I will present a massively data
driven approach to superresolution. In contrast
to recent image enhancement literature, we show
that image databases can be more useful than
internal image statistics if the right
representations are used. Our superresolution
method can insert scenespecific textures and
transitions beyond the capabilities of prior work.
Second, I will introduce the new SUN Attribute
Database. We use crowdsourcing to discover a
taxonomy of attributes and label 14,000 images
from 700 scene categories with more than one
hundred attributes. We show the first results for
attribute recognition and argue that attributes
are a natural way to describe the space of scenes.
May
9, 2012
121pm, 182 George St, Room 110
Computational
Challenges in Molecular Medicine
Donald Geman
Professor
Department of Applied Mathematics and Statistics
The Johns Hopkins University
Despite the "omics" revolution, advances in
clinical practice through fundamental research in
systems biology are rare. First, I will talk
about the technological, mathematical and
translational barriers encountered in attempting
to apply statistical learning to cancer genomics
using highthroughput transcriptional data.
Then I will propose a methodology based on
comparisons of molecular counts, and argue that
these are easy to interpret, can account for
combinatorial interactions among genes and gene
products and might even support a mechanistic
interpretation for the underlying decision
rules. These ideas will be illustrating by
predicting disease phenotypes, the site of origin
of metastatic tumors and pathway deregulation.
May
16, 2012
121pm, Metcalf Auditorium (101)
Division of Applied Mathematics & Center
for Vision Research Seminar
Lessons from
Photographing and Identifying the Word's Plant
Species
Peter Belhumeur
Professor
Department of Computer Science
Columbia University
Columbia University, the University of
Maryland, and the Smithsonian Institution are
working on visual recognition software to help
identify species from photographs. I will discuss
our work on developing Leafsnap  the first in a
series of electronic field guides. As part of this
work, we have completed photographing close to one
third of the world's plant species and have begun
capturing beautiful highresolution images of live
specimens. Our work has led us in new research
directions for the visual recognition of human
faces, dog breeds, and bird species, including the
adoption of centuriesold techniques from taxonomy
for the process of labeling images with visual
attributes and object parts. In particular, I will
show that it is possible to automatically
recognize a wide range of visual attributes and
parts in images and use them in numerous
applications of computer vision.
Fall 2011
September
14, 2011
121pm, 182 George St, Room 110
Nonparametric
Priors for Segmentation of Medical Images
Polina Golland
Associate Professor
Department of Electrical Engineering and Computer
Science, and CSAIL
Massachusetts Institute of Technology
We propose a nonparametric
probabilistic model for prior knowledge in
segmentation of medical images. The resulting
inference algorithms register individual training
images to the new image, transfer the segmentation
labels and fuse them to obtain the final
segmentation of the test subject. Our generative
model yields previously proposed label fusion
algorithms as special cases, and also leads to a
new variant that aggregates evidence for the
segmentation label locally. We demonstrate the
advantages of our approach in two clinical
applications: segmentation of neuroanatomical
structures and segmentation of the left heart
atrium whose shape varies significantly across the
population.
October
5, 2011
121pm, 182 George St, Room 110
Dynamic
regulation of decision threshold by frontal
cortex and basal ganglia
Michael Frank
Associate Professor
Department of Cognitive, Linguistic, &
Psychological Sciences
Department of Psychiatry and Human Behavior
Brown University
The basal ganglia and frontal cortex
interact to support rewardbased decision making.
While medial prefrontal cortex (mPFC) is
implicated in processing uncertainty and
reinforcement conflict, the subthalamic
nucleus (STN) is thought to act as a brake on
corticostriatal function, preventing premature
impulsive responding. Simulations using a neural
circuit model combined with drift diffusion model
analysis show that the STN acts to temporarily
increase the decision threshold as a function of
conflicting cortical decision plans, and
that this same decision threshold modulation
accounts for human behavioral data. We tested
these posited neural mechanisms in experiments
with electrophysiological recordings over mPFC and
in the STN (intraoperative recordings from
Parkinson's patients undergoing deep brain
stimulation surgery). We show a relationship
between mPFC activity and decision threshold
adjustment which is altered by STN
manipulation. These data support the
hypothesis that frontalbasal ganglia
communication regulates decision processes as a
function of reinforcement conflict.
October
12, 2011
121pm, 182 George St, Room 110
Spectral
Methods for Learning Graphical Models
Sham Kakade
Senior Research Scientist
Microsoft Research New England
Associate Professor of Statistics
The Wharton School, University of Pennsylvania
This work presents a methodology for
learning graphical models with hidden nodes
through algebraic techniques (in particular,
matrix decomposition and spectral methods), using
independent samples of the observed variables. The
talk focuses on tree models, and covers two
aspects of the underlying learning problem:
parameter estimation and structural learning. The
underlying idea is to utilize the spectral
decomposition of the second moment matrix to
reveal the latent structure.
The first part is concerned with parameter
estimation. Here, we present an an efficient and
provably correct algorithm for learning HMMs (i.e.
recovering the correct HMM dynamics), with a
sample complexity depending on some mild
conditions of the underlying system. The algorithm
is also simple, employing only a singular value
decomposition and matrix multiplications, and does
not suffer from local minimum issues in nonconvex
optimization (such as for more traditional
approaches, including the EM algorithm), and it
handles high dimensional observations and long
range dependencies more easily. The method can be
extended to estimating parameters for nonlinear
systems and general tree structured graphical
models with unobserved nodes.
The second part is concerned with structural
learning, where we provide the Spectral Recursive
Grouping algorithm, an efficient and simple
procedure for recovering the underlying tree
topology of a broad class of multivariate tree
models with hidden nodes. Exact recovery of the
tree structure can be established based on certain
natural dependencies on statistical and structural
properties of the underlying joint distribution.
Join work with: Daniel Hsu, Tong Zhang; Anima
Anandkumar, Kamalika Chaudhuri, Le Song
October
19, 2011
121pm, 182 George St, Room 110
Statistical
challenges in neural data analysis
Liam Paninski
Associate Professor
Department of Statistics
Columbia University
Systems and circuit neuroscience have
recently experienced something of a renaissance,
driven by a remarkable explosion in the
development of groundbreaking new experimental
tools. These methods have in turn opened up a
variety of challenging statistical problems, in
which we must often perform some highdimensional
inference under strong computational constraints
(for example, in some cases realtime processing
is required). This talk will review some recent
progress on three exemplary problems, each of
fundamental neuroscientific importance: 1) Optimal
filtering and smoothing of voltage signals on
large, complex dendritic trees; 2) Optimal
decoding of sensory information from the activity
of large neural populations, and 3) Inference of
connectivity in large neuronal networks given
limited, noisy observations.
November
2, 2011
121pm, Barus & Holley Room 190
Perceptual
Fragments: BottomUp and TopDown Use of Shape
in Object Recognition
Benjamin Kimia
Professor
Division of Engineering
Brown University
The bottomup “segmentation followed
by recognition” strategy has for some time now
given way to featurebased discriminative
recognition with significant success. As the
number of categories and exemplars per category
increases, however, lowlevel features are no
longer sufficiently discriminative, motivating the
construction and use of more complex features. It
is argued here that these complex features will
necessarily be encoding shape and this in turn
requires curves and regions, thus reviving aspects
of bottomup segmentation strategies. We suggest
that the demise of segmentation was due to
prematurely committing to a grouping in face of
ambiguities and propose a framework for
representing multiple grouping options in a
containment graph. Specifically, we use contour
symmetry to partition the image into atomic
fragments and define transforms to iteratively
grow these atomic fragments into mote distinctive
perceptual fragments, the nodes of the containment
graph. We also briefly present a fragmentbased
language for generating shapes and the use of
fragments in topdown category recognition. The
bottomup and topdown processes are then
integrated by interaction through the midlevel
representation of perceptual fragments.
November
30, 2011
121pm, 182 George St, Room 110
Exploring the role of
ventral premotor cortex in reachtograsp
movements: neural trajectories through spike
train similarity space
Carlos VargasIrwin
Postdoctoral Research Associate in
Neuroscience
Brown University
Dimensionality reduction applied to
neural ensemble data has led to the concept of a
'neural trajectory', a lowdimensional
representation of how the state of the network
evolves over time. Here we present a novel neural
trajectory extraction algorithm which combines
spike train distance metrics (Victor and
Purpura, 1996) with dimensionality reduction based
on local neighborhood statistics (van der Maaten
and Hinton, 2008.) . We apply this technique to
describe and quantify the activity of primate
ventral premotor cortex neuronal ensebles in the
context of a cued reaching and grasping task with
instructed delay.
References:
Victor and Purpura. Nature and precision of
temporal coding in visual cortex: a metricspace
analysis. J Neurophysiol. 1996 Aug;76(2):131026.
L.J.P. van der Maaten and G.E. Hinton. Visualizing
Data using tSNE. Journal of Machine Learning
Research, 9(Nov):2579–2605, 2008.
20102011
October
6,
2010
121pm, CIT 241 (Swig
Boardroom) *** change in location
***
Learning
Probabilistic Models with Deep Hierarchical
Structures
Ruslan Salakhutdinov
Postdoctoral Fellow
Department of Brain and Cognitive Sciences and
CSAIL
Massachusetts Institute of Technology
Building intelligent systems that are capable of
extracting higherorder knowledge from
highdimensional data and successfully
transferring that knowledge to learning new
concepts lies at the core of solving many AI
related tasks, including object recognition,
speech perception, and language understanding.
Theoretical and biological arguments strongly
suggest that building such systems requires models
with deep architectures that involve many layers
of nonlinear processing.
In this talk I will first introduce a broad class
of probabilistic generative models called Deep
Boltzmann Machines (DBMs) that contain many layers
of latent variables. I will describe a new
learning algorithm for this class of models that
uses variational methods and Markov chain Monte
Carlo (MCMC). This new learning algorithm, in
addition to a bottomup pass, can incorporate
topdown feedback, which allows DBMs to better
propagate uncertainty about ambiguous inputs. I
will show that these deep models can learn
interesting representations and can be
successfully applied in many application domains,
including information retrieval, object
recognition, and nonlinear dimensionality
reduction. In the second part of the talk, I will
describe new ways of developing more complex
systems that combine Deep Boltzmann Machines with
more structured hierarchical Bayesian models. I
will show how these hybrid models can learn a deep
hierarchical structure for sharing knowledge
across hundreds of visual categories, which allows
efficient learning of new categories from few,
even just one, examples  a problem known as
'oneshot learning'.
October
20, 2010
121pm, 182 George St, Room 110
Discovering
Influential Variables: A Partition Retention
Approach
Herman Chernoff
Professor Emeritus
Department of Statistics, Harvard University
Department of Applied Mathematics, MIT
A general approach, pioneered by S.H. Lo and T.
Zheng, the method of Partition Retention, is
designed to identify those variables among many
possible explanatory variables, which have an
influence on a dependent variable. This method can
be successful when the influence depends on the
interaction of several of these variables and the
marginal influences of the individual variables
are negligible. In those cases where the
number of candidates is huge, thousands, this
method involves the idea of resuscitating
apparently "dead" or useless variables which
depend on interactions. The method is
computer intensive and involves measuring
influence of a small group of candidates, from
which the least influential are dropped one at a
time until only promising ones are retained.
This procedure is repeated many times with small
randomly selected subgroups of the candidate
variables, and attention is focused on those which
are retained most often.
Wednesday,
November 3, 2010
121pm, 182 George St, Room 110
Bottomup and
topdown processing in visual perception
Thomas Serre
Assistant Professor
Department of Cognitive, Linguistic, &
Psychological Sciences
Brown University
Perception involves a complex interaction between
feedforward sensorydriven information and
feedback attentional, memory, and executive
processes that modulate such feedforward
processing. A mechanistic understanding of
feedforward and feedback integration is a
necessary step towards elucidating key aspects of
visual and cognitive functions and dysfunctions.
In this talk, I will describe a computational
framework for the study of visual perception. I
will present computational as well as experimental
evidence suggesting that bottomup and topdown
processes make a distinct and essential
contribution to the recognition of complex visual
scenes. A feedforward hierarchical architecture
may provide a satisfactory account of "immediate
recognition" corresponding to the first few
hundred milliseconds of visual processing.
However, such an architecture may be limited in
recognizing complex visual scenes. I will show how
attentional mechanisms and cortical feedback may
help improve object recognition performance in
complex cluttered scenes.
Wednesday,
November 17, 2010
121pm, 182 George St, Room 110
Point process
adaptive filters and the analysis of ensemble
neural spiking activity
Uri Eden
Assistant Professor
Department of Mathematics and Statistics
Boston University
Although it is well known that brain areas
receive, process and transmit information via
sequences of sudden, stereotyped electrical
impulses, called action potentials or spikes, most
analyses of neural data ignore the localized
nature of these events. The theory of point
processes offers a unified approach to modeling
the firing properties of spiking neural systems,
and assessing goodnessoffit between a neural
model and observed spiking data.
We develop a point process modeling framework and
state space estimation algorithms to describe and
track the evolution of dynamic representations
from large neural ensembles. This allows us
to derive a toolbox of estimation algorithms and
adaptive filters to address questions of static
and dynamic encoding and decoding. In our
analysis of these filtering algorithms, we draw
analogies to wellstudied linear estimation
algorithms for continuous valued processes, such
as the Kalman filter and its discrete and
continuous time extensions.
These methods will be illustrated in the analysis
of spatially specific spiking activity in rat
hippocampus. Using simple point process models, we
are able to accurately characterize the localized
spiking activity of these neurons as a function of
the animal's position in its environment, track
changes in their firing properties, reconstruct
the animal's past movements, and predict its
future behavior from a population of spiking
neurons.
February 9, 2011
121pm, 182 George St, Rm 110
Division of Applied Mathematics & Center
for Vision Research Seminar
Learning on
Analytic Manifolds
Fatih Porikli
Senior Principal Research Scientist and
Technical Manager, Imaging Group
Mitsubishi Electric Research Labs, Cambridge, MA
A large number
of natural phenomena can be formulated on
analytic manifolds. More specifically in
computer vision, such underlying notions emerge
in multifactor analysis including feature
selection, pose estimation, structure from
motion, appearance tracking, and shape
embedding. Unlike Euclidean spaces, analytic
manifolds does not exhibit global homeomorphism,
thus, differential geometry is applicable only
locally. This prevents application of
conventional inference and learning methods,
which require vector norms.
Recently we introduced appearance based object
descriptors and motion transformations that
exhibit Riemannian manifold structure on
positive definite matrices that enable
projections of the original problems onto
tangent spaces. In this manner, we do not need
to flatten the underlying manifold or discover
its topology. This talk will demonstrate
entertaining results of manifold learning on
human detection, regression tracking, unusual
event analysis, and affine pose estimation.
February 16,
2011
121pm, 182 George St, Rm 110
Patterns of
Thought (in humanoid robots)
Yiannis Aloimonos
Professor
Department of Computer Science
University of Maryland
Patterns of
thought, for a cognitive system that interprets
a scene or performs a task, consist of
perceptual, motor and linguistic representations
of the world, controlled by an attention
mechanism. The role of the attention
mechanism is to facilitate parsing of the
different representations according to a grammar
of action. In this talk I will introduce,
for the first time, an action grammar. It
is really a disguised version of Chomsky’s
Universal Grammar adapted in a minimalist
framework. Parsing requires an attention
mechanism comprising visual human and object
filters, active segmentation routines and a
context implementing system utilizing
language. I will describe our developments
in learning visual filters, achieving robust
segmentation techniques and integrating vision
with language and our implementations of these
ideas in the field of Humanoid Robotics and the
general area of video interpretation.
March 9, 2011
121pm, 182 George St, Rm 110
A history of
applying principal component analyses to human
population genetic data
Sohini
Ramachandran
Assistant Professor
Department of Ecology and Evolutionary Biology,
and
Center for Computational Molecular Biology
Brown University
Nearly 30 years
ago, Luca CavalliSforza and colleagues
pioneered the use of PCA in population genetics,
producing maps that summarized observed human
genetic variation across many continental
regions. Patterns in these maps were
interpreted as signatures of specific migration
events. As genetic datasets have grown in
the number of markers used and as sequencing
technology has improved, PCA has been widely
applied to population genetic datasets. I
will present a review of how PCA is used in
population genetics and discuss the conclusions
that have been drawn via this method, as well as
other observed genetic patterns in geographic
space in my research on both the inference of
past migrations using extant human genetic data
and the genetic underpinnings of disease.
March 16, 2011
121pm, 182 George St, Rm 110
Solving
Inference, Optimization, and Constraint
Satisfaction Problems with the Divide &
Concur and Belief Propagation MessagePassing
Algorithms
Jonathan Yedidia
Distinguished Member Research Staff
Mitsubishi Electric Research Labs,
Cambridge, MA
In
"probabilistic inference" problems one tries to
estimate the true state of some quantities of
interest, given only noisy or ambiguous sensor
measurements of related quantities. Such
problems arise in a very wide variety of fields,
with applications in communications, signal
processing, computer vision, speech and audio
recognition, machine learning, physics, and
robotics.
In this talk, I will describe and compare two
particularly important algorithms that can solve
probabilistic inference problems, as well as
related constraint satisfaction and optimization
problems: the celebrated "belief propagation"
(BP) algorithm; and the "divide and concur"
(D&C) algorithm. I will show that the
D&C algorithm, which was recently introduced
by Gravel and Elser for solving nonconvex
constraint satisfaction problems, can also be
understood as an messagepassing algorithm that
can be used on an even wider class of inference
and optimization problems than BP.
Although less wellknown, the D&C algorithm
has some notable advantages compared with BP, in
that it more naturally deals with problems with
continuous valued variables, or with variables
that lack local evidence. Another
advantage of D&C is that its
"differencemap" dynamics enables it to avoid
"traps."
If time permits, I will also describe a new
decoder (developed with Yige Wang and Stark
Draper) for lowdensity parity check codes that
combines the ideas of the D&C and BP
algorithms. This "differencemap belief
propagation" (DMBP) decoder significantly
improves errorfloor performance compared with
stateoftheart BP decoders, while maintaining
a similar computational complexity.
April 6, 2011
121pm, 182 George St, Rm 110
Division of Applied Mathematics & Center
for Vision Research Seminar
A
HighThroughput Screening Approach to
BiologicallyInspired Object Recognition
David Cox
Principal Investigator, Visual Neuroscience
Laboratory
The Rowland Institute at Harvard
Biological
visual systems are currently unrivaled by
artificial systems in their ability to recognize
faces and objects in highly variable and
cluttered realworld environments.
Biologicallyinspired computer vision systems
seek to capture key aspects of the computational
architecture of the brain, and such approaches
have proven successful across a range of
standard object and face recognition
tasks. However, while many models of biological
object recognition share a common set of
"broadstroke" properties, the performance of
any one model depends strongly on the choice of
parameters in a particular instantiation of that
model. Since the number of such parameters
(explicit or implicit) is typically large, and
the computational cost of evaluating one
particular parameter set is high, the space of
possible model instantiations goes largely
unexplored. Thus, when a model fails to approach
the abilities of biological visual systems, we
are left uncertain whether this failure is
because we are missing a fundamental idea, or
because the correct "parts" have not been tuned
correctly, assembled at sufficient scale, or
provided with enough training.
In this talk, I'll present a highthroughput
search approach for exploring the range of
biologicallyinspired visual models, that
emphasizes simplicity, careful evaluation
procedures, and scale. Systems discovered
using this approach are shown to perform
surprisingly well across a range of visual
problem domains and achieve stateoftheart
performance on a collection of challenging
realworld face recognition datasets.
Finally, I'll discuss current limitations and
future opportunities of this approach and argue
for renewed attention to data set quality and
model scalability.
April 20, 2011
121pm, 182 George St, Rm 110
Selffolding
of polyhedra experiments and a little theory
Govind Menon
Associate Professor
Division of Applied Mathematics
Brown University
A fascinating
development in materials chemistry is the
construction of `large' components by
selfassembly. I will describe some fascinating
experiments of David Gracias' lab, and some
preliminary mathematical work on mesoscale
polyhedra that selffold. Many problems are
open, and the main purpose of this talk is to
advertise some of them.
May 4, 2011
121pm, 182 George St, Rm 110
Logistic
Regression on Data Streams
Kevin Kochanek
Research
Mathematician
US Department of Defense
The emergence of highvolume data
sources in fields such as finance,
telecommunications, and biomedicine has led to
a surge of interest in streaming data analysis.
In this talk we consider the problem of
constructing generalized linear models on data
streams. We also present a novel streaming
adaptive histogramming algorithm to facilitate
our approach that is also useful in other
contexts.
May 11, 2011
121pm, 182 George St, Rm 110
Division of Applied Mathematics &
Center for Vision Research Seminar
Statistical
structures of natural scenes and neural ensembles
activities
TaiSing Lee
Associate Professor
Computer Science Department and Center for the
Neural Basis of Cognition
Carnegie Mellon University
Statistical structures of
the natural environment could provide important
clues for understanding the neural representations
and functional circuitry underlying visual
perception. We studied the statistical structures
of 3D scenes, and evaluated whether these
structures are encoded in the tuning properties
and functional connectivities of neurons in the
early visual cortex. We found that
information encoded in the distribution of
disparitytuned neurons at the population level
matches the statistical structures of natural
scenes in a number of aspects: the distribution of
tuning curves, the correlation in tunings to
different visual cues, and the functional
connectivity of neurons. We also found that the
ensemble activities of neurons support the notion
of a probabilistic population code that allows the
encoding of statistical priors of the natural
scenes, the uncertainty and the posterior
distribution of inferred perceptual
attributes. These findings provide
preliminary empirical insights on how pattern
theory can be realized at the level of neuronal
population and neuronal circuits.
20092010
September 16, 2009
121pm, 182 George St, Rm 110
Organizational
meeting
& Conditional inference for nonstationary
data
Matthew Harrison
Assistant Professor
Division of Applied Mathematics
Brown University
This is a new
time and a new format for the pattern theory
seminar, so we will begin with an organizational
meeting. Bring ideas for things you'd like
to see in our seminar series. With the
remaining time, I will discuss a semiparametric
conditional inference perspective for modeling
data that have fast (and interesting) dynamics
confounded with slow, nonstationary (and not so
interesting) dynamics. This is work in
progress with Asohan Amarasingham and Stuart
Geman.
September 23,
2009
121pm, 182 George St, Rm 110
Latent
SocioSpatial Process Model for Social Networks
Crystal Linkletter
Assistant Professor
Department of Community Health and Center for
Statistical Sciences
Brown University
With concerns of bioterrorism, the
advent of new epidemics that spread with
persontoperson contact, such as SARS, and the
rapid growth of online social networking
websites, there is currently great interest in
building statistical models that emulate social
networks. Stochastic network models can provide
insight into social interactions and increase
understanding of dynamic processes that evolve
through society. A major challenge in developing
any stochastic social network model is the fact
that social connections tend to exhibit unique
inherent dependencies. For example, they tend to
show a lot of clustering and transitive behavior,
heuristically described as “a friend of a friend
is a friend.” It might be reasonable to expect
that covariate similarities, or “closeness” in
social space, should somehow be related to the
probability of connection for some social network
data. The relationship between covariates and
relations is likely to be complex, however, and
may in fact be different in different regions of
the covariate space. Here, we present a new
sociospatial process model that smoothes the
relationship between covariates and connections in
a sample network using relatively few parameters,
so the probabilities of connection for a
population can be inferred and likely social
network structures generated. Having a predictive
social network model is an important step toward
the exploration of disease transmission models
that depend on an underlying social network.
September 30,
2009
121pm, 182 George St, Rm 110
Learning
maximumentropy
models of salience via EM
Micha Elsner
Graduate Student
Department of Computer Science
Brown University
Coreference resolution is the task of
linking a nominal expression (such as a pronoun)
to its antecedent in text or speech data. We
propose a maximum entropy model which ranks the
syntactic environments in which antecedents tend
to occur. We learn this model in an unsupervised
way using the Expectation/Maximization (EM)
algorithm. As an initial application, we describe
a consistent, probabilistic reformulation of the
pronoun resolution model given in (Charniak+Elsner
09); we then extend this to a highprecision model
predicting coreference between noun phrases with
the same head word. This is joint work with Eugene
Charniak and Mark Johnson; it is very much still
in progress, so comments and suggestions are
welcome.
October
7, 2009
121pm, 182 George St, Rm 110
The probabilistic language
of thought
Noah Goodman
Research Scientist
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology
Logic and probability (aka. 'symbols and
statistics', 'structured representation and graded
inference') are key themes of cognitive science
that have long had an uneasy coexistence. I will
describe a Probabilistic Language of Thought
approach that brings them together into
compositional representations with probabilistic
meaning. This provides a view of cognition in
which mental representations describe causal
"working models" of the world, that can be used
for reasoning and learning by probabilistic
inference. Using this framework I will investigate
human concept learning, beginning with simple
categorization tasks and extending to acquisition
of abstract concepts. I will describe a model of
learning the meanings of number words that
predicts the staged progression exhibited by
children (including the striking conceptual
reorganization associated with the "cardinal
principle" transition). I will then briefly
describe how a probabilistic language of thought
explains phenomena of human reasoning, including
both classic syllogistic reasoning tasks and more
complex reasoning involving causality and agency.
October
14,
2009
121pm, 182 George
St, Rm 110
SIFT Flow: Dense Scene Alignment and Its
Applications
Ce Liu
Postdoctoral Researcher
Microsoft Research New England
In this talk I will introduce dense scene
alignment and how it can be applied to a number of
computer vision problems varying from satellite
image registration to object recognition and scene
parsing. We propose SIFT flow that establishes
dense, semantically meaningful correspondence
between two images across scenes by matching
pixelwise SIFT features. Using SIFT flow, we
develop a new framework for image parsing by
transferring metadata information, such as
annotation, motion and depth, from the images in a
large database to an unknown query image. We
demonstrate this framework using new applications
such as predicting motion from a single image and
motion synthesis via object transfer. Based on
SIFT flow, we introduce a nonparametric scene
parsing system using label transfer, with very
promising experimental results suggesting that our
system outperforms stateoftheart techniques
based on training classifiers.
October
21,
2009
121pm, 182 George
St, Rm 110
Recent Advances on the Geometry of the
Riemannian Manifold of Landmarks
Mario Micheli
Assistant Adjunct Professor
Department of Mathematics
University of California, Los Angeles
In the past few years there has been a growing
interest, in diverse scientific communities, in
endowing "shape spaces" with Riemannian metrics,
so to be able to measure similarities between
shapes and perform statistical analysis on data
sets (e.g. for object recognition, target
detection and tracking, classification, and
automated medical diagnostics). The geometry of
such spaces has started to emerge only very
recently; in this talk we will explore the
sectional curvature for the Riemannian manifold of
landmark points (which is one of the simplest, in
that it is finitedimensional) and discuss its
effects on applications.
October
28,
2009
121pm, 182 George
St, Rm 110
Statistical Analysis of Climate
Ecosystem Dynamics
Surajit Ray
Assistant Professor
Department of Mathematics & Statistics
Boston University
Functional approaches to modeling dynamics of
biological systems, trends in financial cycle,
seasonal measurements of spectral bands in remote
sensing, are becoming increasingly popular as a
data analysis tool.Clustering and classification
is often an important final objective of
functional data analysis. But most model based
clustering techniques rely on the assumptions of
normal or T distributions and as such are not
appropriate for clustering functional data, which
often lie on nonlinear manifolds. In this talk we
will present a novel model based clustering
techniques for analyzing functional data. Our
method is based on the modal clustering
methodology developed by Li, Ray and Lindsay
(2007), but new visualization, computational and
inferential techniques need to be designed
especially for functional data. The talk will also
focus on parallelization of modal clustering.
Application of functional clustering to remote
sensing data will also be discussed.
*November 2, 2009
(Monday)
3:304:30
pm [refreshments beginning at 3:15]
121 South Main St, Rm 245
Center for Statistical Sciences /
Division of Applied Mathematics Seminar
Application
of Heteroskedastic Spatial Models to Computer
Experiments
Richard A. Davis
Howard Levene Professor of Statistics
Department of Statistics
Columbia University
We consider modeling a deterministic computer
response as a realization from a stochastic
heteroskedastic process (SHP), which incorporates
a spatiallycorrelated volatility process into the
traditional spatiallycorrelated Gaussian process
(GP) model. Unconditionally, the SHP is a
stationary nonGaussian process, with stationary
GP as a special case. Conditional on a latent
process, the SHP is a nonstationary GP. The
sample paths of this process offer more modeling
flexibility than those produced by a traditional
GP, and can better reflect prediction uncertainty.
GP prediction error variances depend only on the
locations of inputs, while SHP can reflect local
inhomogeneities in a response surface through
prediction error variances that depend on both
input locations and output responses.
We use maximum likelihood for inference, which is
complicated by the high dimensionality of the
latent process. Accordingly, we develop an
importance sampling method for likelihood
computation and use a lowrank kriging
approximation to reconstruct the latent process.
Responses at unobserved locations can be predicted
using empirical best predictors or by empirical
best linear unbiased predictors. Prediction error
variances are also obtained. In examples with
simulated and real computer experiment data, the
SHP model is superior to traditional GP models.
(This is joint work with Jay Breidt, Wenying
Huang, and Ke Wang.)
*November 6, 2009
(Friday)
4:005:30
pm [refreshments beginning at 3:45 pm]
Marcuvitz Auditiorium (Room 220), Sidney Frank
Hall, 185 Meeting St.
Division of Applied Mathematics / Center for
Vision Research Seminar
Understanding
Visual Scenes
Antonio Torralba
Esther and Harold E. Edgerton
Associate Professor
Department of Electrical Engineering and Computer
Science
Massachusetts Institute of Technology
Human visual scene understanding is remarkable:
with only a brief glance at an image, an abundance
of information is available  spatial structure,
scene category and the identity of main objects in
the scene. In traditional computer vision, scene
and object recognition are two visual tasks
generally studied separately. However, it is
unclear whether it is possible to build robust
systems for scene and object recognition, matching
human performance, based only on local
representations. Another key component of machine
vision algorithms is the access to data that
describe the content of images. As the field moves
into integrated systems that try to recognize many
object classes and learn about contextual
relationships between objects, the lack of large
annotated datasets hinders the fast development of
robust solutions. In the early days, the first
challenge a computer vision researcher would
encounter would be the difficult task of
digitizing a photograph. Even once a picture was
in digital form, storing a large number of
pictures (say six) consumed most of the available
computational resources. In addition to the
algorithmic advances required to solve object
recognition, a key component to progress is access
to data in order to train computational models for
the different object classes. This situation has
dramatically changed in the last decade,
especially via the internet, which has given
computer vision researchers access to billions of
images and videos. In this talk I will describe
recent work on visual scene understanding that try
to build integrated models for scene and object
recognition, emphasizing the power of large
database of annotated images in computer vision.
November 11,
2009
121pm, 182 George
St, Rm 110
Generative Models for Image Analysis
LoBin Chang
Graduate Student
Division of Applied Mathematics
Brown University
A probabilistic grammar for the
grouping and labeling of parts and objects, when
taken together with pose and partdependent
appearance models, constitutes a generative scene
model and a Bayesian framework for image analysis.
To the extent that the generative model generates
features, as opposed to pixel intensities, the
posterior distribution (i.e. the conditional
distribution on part and object labels given the
image) is based on incomplete information; feature
vectors are generally insufficient to recover the
original intensities. I will propose a way to
learn pixellevel models for the appearances of
parts. I will demonstrate the utility of the
models with some experiments in Bayesian image
classification.
November
18, 2009
121pm, 182 George
St, Rm 110
Spectral Filtering Approaches to Machine
Learning
Lorenzo Rosasco
Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
In this talk we discuss a
computational framework for learning complex
high dimensional data. The proposed methods
rely on estimating regularized version of
datadependent kernel matrices, where
regularization is achieved filtering out the
unstable components corresponding to small
eigenvalues. In many cases these matrices
can be interpreted as empirical versions of
underlying covariance matrices, integral operators
or closely related objects, such as diffusion
operators. We will describe different
strategies to implement regularization and discuss
the corresponding computational properties.
The sample complexity of the methods is analyzed
and shown to be optimal for a large class of
problems. Experimental results in the
context of supervised learning classification and
vector fields estimation will be presented.
*November 20, 2009 (Friday)
4:005:30 pm,
182 George St, Rm 110 [refreshments
beginning at 3:45]
Applied Mathematics / Center for
Statistical Sciences Seminar
Error controls for multiple hypothesis
testing
Zhiyi Chi
Associate Professor
Department of Statistics
University of Connecticut
Multiple hypothesis testing is challenging when
signals in data are weak, the distributions of
noise and signals are not completely known, or
there is statistical dependency in data. I
will present some theoretical results on these
issues. I will show that selfnormalized
large deviations can be useful in dealing with
experiment designs for weak signals, linear
programming can be used to evaluate pvalues when
data distributions are not completely known, and
techniques similar to nonlinear filtering can be
used to analyze the performance of multiple
testing on hidden states of Markov models that may
be nonstationary.
February 3,
2010
121pm, 182 George St, Rm 110
An analysis
of connectivity in neuronal population
recordings
Asohan Amarasingham
Postdoctoral Fellow
Center for Molecular and Behavioral Neuroscience
Rutgers University
Little is known about how neurons interact through
their spike trains in natural conditions. A chief
barrier is experimental: simultaneous neuronal
recordings are extremely sparse. But recent
advances are expanding the number of neurons that
can be observed simultaneously in a behaving
animal. We will describe a recent study in which
we combined simultaneous recordings of medial
prefrontal cortex neurons in behaving rats with
careful statistical analysis to assess the
evidence for functional signatures of anatomical
and physiological connectivity in that neuronal
population. We will use the scientific setting as
a context in which to explore some basic issues in
neurophysiological data analysis, with particular
emphasis on spike trainresampling techniques.
This is joint work with Shigeyoshi Fujisawa,
Gyorgy Buzsaki, Matthew Harrison, and Stuart
Geman.
Cancelled (winter
storm)
February 10, 2010
121pm, 182 George St, Rm 110
Detailed
Human Shape and Pose from Images
Alex Balan
Graduate Student
Department of Computer Science
Brown University
Automating the process of measuring human shape
characteristics and estimating body postures from
images lies at the core of many applications in
computer vision, computer graphics, robotics and
biomechanics. In its most general form, this
problem is severely underconstrained. It can be
made more tractable by employing simplifying
assumptions and relying on domain specific
knowledge, or by engineering the environment
appropriately.
In this thesis we demonstrate that using a
datadriven model of the human body supports the
recovery of both human shape and articulated pose
from images, and has many benefits over previous
body models. Specifically, we represent the body
using a recently proposed triangulated mesh model
called SCAPE which employs a lowdimensional, but
detailed, parametric model of shape and
posedependent deformations. We show that the
parameters of the SCAPE model can be estimated
directly from image data in a variety of imaging
conditions.
We first consider the case of multiple calibrated
and synchronized camera views and assume the
subject wears tightfitting clothing. We define a
cost function between image silhouettes and a
hypothesized mesh and formulate the problem as an
optimization over the body shape and pose
parameters. Second, we relax the tightfitting
clothing assumption and develop a robust method
that accounts for the fact that observed
silhouettes of clothed people do not provide tight
bounds on the true 3D shape. We think of these
silhouettes as providing only weak constraints,
but collect many of them while observing the
subject in many poses. These, together with strong
constraints from regions detected as skin, can be
combined with a prior expectation of typical
shapes to infer the most likely shape model under
the clothes. Results on a novel database of
thousands of images of clothed and "naked"
subjects suggest that this method may be accurate
enough for biometric shape analysis from images.
*February 12, 2010 (Friday)
4:005:30pm,
Barus & Holley 190 [refreshments begin at
3:45pm]
Division
of Applied Mathematics / Center for Vision
Research Seminar
Learning
Hierarchies of Sparse Visual Features
Yann LeCun
Silver Professor of Computer Science and
Neural Science
The Courant Institute of Mathematical Sciences and
Center for Neural Science
New York University
Intelligent tasks such object recognition,
auditory scene analysis, or language understanding
require the construction of good internal
representations of the world. Internal
representations (or "features") must be invariant
(or robust) to irrelevant variations of the input,
but must preserve the information relevant to the
task. An important goal of our research, and an
important challenge for Machine Learning over the
next few years, is to devise methods that can
automatically learn good internal representations
from labeled and unlabeled data. Theoretical and
empirical evidence suggest that the visual world
is best represented by a multistage hierarchy, in
which features in successive stages are
increasingly global, invariant, and abstract. The
main question is how can one train such deep
architectures from unlabeled data and limited
amounts of labeled data.
We describe a class of methods to train
multistage systems in which each stage performs a
series of convolutions followed by simple
nonlinearities. The unsupervised learning phase
is based on sparse coding methods, but includes a
feedforward predictor that gives a quick
approximation of the sparse code. A number of such
stages are stacked and trained sequentially in an
unsupervised manner. The entire system is then
refined in a supervised manner.
An application to categorylevel object
recognition with invariance to pose and
illumination will be described. By stacking
multiple stages of sparse features, and refining
the whole system with supervised training,
statetheart accuracy can be achieved on standard
datasets with very few labeled samples. A
realtime demo will be shown. Another application
to visionbased navigation for offroad mobile
robots will be shown. After a phase of offline
unsupervised learning, the system autonomously
learns to discriminate obstacles from traversable
areas at long range using labels produced with
stereo vision for nearby areas.
This is joint work with YLan Boureau, Karol
Gregor, Raia Hadsell, Koray Kavakcuoglu, and
Marc'Aurelio Ranzato.
February 17,
2010
121pm, 182 George St, Rm 110
Message
Passing Algorithms for Compressed Sensing
Andrea Montanari
Assistant Professor
Department of Electrical Engineering and
Department of Statistics
Stanford University
Compressed sensing aims to undersample certain
highdimensional signals, yet accurately
reconstruct them by exploiting signal
characteristics. Accurate reconstruction is
possible when the object to be recovered is
sufficiently sparse in a known basis.
Currently, the best known sparsityundersampling
tradeoff is achieved when reconstructing by convex
optimization  which is expensive in important
largescale applications.
Fast iterative thresholding algorithms have been
intensively studied as alternatives to convex
optimization for largescale problems.
Unfortunately known fast algorithms offer
substantially worse sparsityundersampling
tradeoffs than convex optimization.
We introduce a simple costless modification to
iterative thresholding making the
sparsityundersampling tradeoff of the new
algorithms equivalent to that of the corresponding
convex optimization procedures. The new
iterativethresholding algorithms are inspired by
belief propagation in graphical models.
Our empirical measurements of the
sparsityundersampling tradeoff for the new
algorithms agree with theoretical calculations. We
show that a state evolution formalism correctly
derives the true sparsityundersampling tradeoff.
There is a surprising agreement between earlier
calculations based on random convex polytopes and
this new, apparently very different theoretical
formalism.
This is based on joint work with D.L. Donoho, A.
Maleki and with M. Bayati.
March 3, 2010
121pm, 182 George St, Rm 110
Topic
Models: Priors, Stop Words and Languages
Hanna Wallach
Senior Postdoctoral Research Associate
Department of Computer Science
University of Massachusetts Amherst
In this talk, I will compare several classes of
structured priors for topic models, and show that
an asymmetric Dirichlet prior over the
documenttopic distributions has substantial
advantages over a symmetric prior, while an
asymmetric prior over the topicword distributions
provides no real benefit. This combination of
priors substantially increases the robustness of
topic models to the highly skewed word frequency
distributions common in natural language. I will
also present two topic models that cannot easily
rely on standard stop word pruning techniques and
therefore require such robustness. The first is a
topicbased predictive language model that
incorporates both ngram statistics and latent
topic variables, while the second is a polylingual
topic model that discovers topics aligned across
multiple languages from massive collections of
interlinked documents.
March 10, 2010
121pm, 182 George St, Rm 110
RespondentDriven
Sampling
for Networks: Degrees of Uncertainty with
Uncertain Degrees
Joe Blitzstein
Assistant Professor
Department of Statistics
Harvard University
Respondentdriven sampling (RDS) is a network
sampling design (introduced by Heckathorn) which
is widely and increasingly being used to study
properties of individuals in a social network,
e.g., HIV prevalance. The standard estimators make
many strong assumptions, including that the degree
d of an individual in the sample is known exactly
and that this individual is equally likely to
recruit any of his or her d neighbors in the
social network.
Classical issues of modelbased vs. designbased
inference arise here, with new challenges from
having also to account for the network structure.
We explore the biasvariance tradeoff encountered
in this setting, and show how the uncertainty in
the measurement of degrees propagates into
uncertainty in the RDS estimates. We then show how
Bayesian and importance sampling ideas can be used
to give obtain more reliable estimates.
*March 12, 2010
(Friday)
45:30pm, 182 George St, Rm
110
Graphs and
polytopes: learning structures with linear
programming relaxations
Tommi Jaakkola
Professor
Department of Electrical Engineering and Computer
Science
Massachusetts Institute of Technology
Combinatorial problems are commonplace in
modeling. They arise in predicting likely values
for variables (e.g., selection/orientation of
residues in protein design) or learning the model
structure itself. In recent years, a lot of effort
has gone into developing linear programming (LP)
relaxations for finding the most likely values for
the variables in a given model. In this talk, I
will focus on the related combinatorial problem of
learning the model structure, i.e., solving for
the highest scoring Bayesian network (BN) graph
given data. This structure learning problem can be
viewed as an inference problem where the variables
specify the choice of parents for each node in the
graph. The key combinatorial difficulty arises
from the global constraint that the graph
structure has to be acyclic. We cast the structure
learning problem as a linear program over the
polytope defined by valid structures. In relaxing
this problem, we maintain an outer bound
approximation to the polytope and iteratively
tighten it by searching over a new class of valid
constraints. If an integral solution is found, it
is guaranteed to be the optimal graph structure.
When the relaxation is not tight, the fast dual
algorithms we develop remain useful in combination
with a branch and bound method. I will also
illustrate the strengths and weaknesses of the
approach in the context of sample structure
learning problems.
The talk covers joint work with David Sontag, Amir
Globerson, and Marina Meila.
March 17, 2010
121pm, 182 George St, Rm 110
Learning in
Social Networks with Signals of Bounded
Informativeness
Ilan Lobel
Postdoctoral Researcher
Microsoft Research New England
We study the (perfect Bayesian) equilibrium of a
model of learning over a general social network.
Each individual receives a signal about the
underlying state of the world, observes the past
actions of a stochasticallygenerated neighborhood
of individuals, and chooses one of two possible
actions. The stochastic process generating the
neighborhoods defines the network topology (social
network). We characterize purestrategy equilibria
for arbitrary stochastic and deterministic social
networks and determine conditions under which
there will be asymptotic learning, that is, the
conditions under which, as the social network
becomes large, individuals converge (in
probability) to taking the right action. In
previous work, we showed that when the likelihood
ratio of the signals is unbounded, there is
asymptotic learning when there is some minimal
amount of "expansion in observations". In this
talk, we show that with signals of bounded
likelihood ratio, asymptotic learning is achieved
if three conditions hold: expanding observations
holds, there exists a subsequence of agents with
unpersuasive neighborhoods and there exists a
uniform lower bound on the probability of
observing the history of past actions. This talk
is based on joint work with Daron Acemoglu (MIT
Economics), Munther Dahleh (MIT EECS) and Asu
Ozdaglar (MIT EECS).
March 24, 2010
121pm, 182 George St, Rm 110
Unsupervised
Part of Speech Tagging: From graphical models to
statistical models to "biological" models
Michael Lamar
Graduate student
Division of Applied Mathematics
Brown University
A longstanding problem in computational
linguistics is the automatic extraction of part of
speech categories from a corpus. We
introduce this problem and discuss several purely
contextual, unsupervised approaches to this
problem beginning with Hidden Markov Models which,
until recently, have set the standard for
performance. We then formulate two new
alternative approaches. The first is a
purely statistical treatment which incorporates
singular value decomposition and clustering.
The second is a selforganizing model inspired by
simple biological processes. We compare
these new approaches to the HMM's and discuss why
they achieve better performance.
April 7, 2010
121pm, 182 George St, Rm 110
Geometry of
the space of 2D shapes equipped with the
WeilPetersson metric
Sergey Kushnarev
Graduate Student
Division of Applied Mathematics
Brown University
The study of planar simple closed curves (or "2D
shapes") and their similarities is a central
problem in the field of computer vision. It arises
in the task of characterizing and classifying
objects from their observed silhouette. Defining
natural distance between 2D shapes creates a
metric on the infinitedimensional space of
shapes. In this talk I will describe one
particular metric, which comes from the conformal
mapping of the 2D shapes, via the theory of
Teichmuller spaces. In this space every simple
closed curve (or a 2D shape) is represented by a
smooth selfmap of a circle. I will talk
about a specific class of solitonlike geodesics
on the space of shapes, called teichons. Some
numerical examples of geodesics and effects of the
curvature will be demonstrated.
*April 9, 2010 (Friday)
45:30pm, 182 George St, Rm
110
Analysis of
Molecular Networks
Mark Gerstein
Albert L. Williams Professor of Biomedical
Informatics
Departments of Molecular Biophysics and
Biochemistry and of Computer Science
Yale University
My talk will be concerned with understanding
protein function on a genomic scale. My lab
approaches this through the prediction and
analysis of biological networks, focusing on
proteinprotein interaction and
transcriptionfactortarget ones. I will describe
how these networks can be determined through
integration of many genomic features and how they
can be analyzed in terms of various topological
statistics. In particular, I will discuss a number
of recent analyses: (1) Improving the prediction
of molecular networks through systematic
trainingset expansion; (2) Showing how the
analysis of pathways across environments
potentially allows them to act as biosensors; (3a)
Analyzing the structure of the regulatory network
indicates that it has a hierarchical layout with
the "middlemanagers" acting as information
bottlenecks; (3b) Showing these middle managers
tend be arranged in various "partnership"
structures giving the hierarchy a "democratic
character"; (4) Showing that most human variation
occurs on the periphery of the protein interaction
network; and (5) Developing useful webbased tools
for the analysis of networks (TopNet and tYNA).
http://networks.gersteinlab.org
http://topnet.gersteinlab.org
The tYNA platform for comparative interactomics: a
web tool for managing, comparing and mining
multiple networks. KY Yip, H Yu, PM Kim, M
Schultz, M Gerstein (2006) Bioinformatics 22:
296870.
Analysis of Diverse Regulatory Networks in a
Hierarchical Context: Consistent Tendencies for
Collaboration in the Middle Levels. N Bhardwaj et
al. PNAS (2010, in press)
Positive selection at the protein network
periphery: evaluation in terms of structural
constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S
A 104: 202749.
Training Set Expansion: An Approach to Improving
the Reconstruction of Biological Networks from
Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics
Quantifying environmental adaptation of metabolic
pathways in metagenomics. T Gianoulisa, J Raes, P
Patel, R Bjornson, J Korbel, I Letunic, T Yamada,
A Paccanaro, L Jensen, M Snyder, P Bork, M
Gerstein (2009) PNAS
April 14, 2010
121pm, 182 George St, Rm 110
Preserving
knowledge through media transitions: ushering
the heritage of India into the digital age
Peter Scharf
Director of The Sanskrit Library and Senior
Lecturer in Sanskrit
Department of Classics
Brown University
Human beings express knowledge in various modes:
through images in visual art; through movement in
dance, theatrical performance, and gestures; and
through speech in spoken language. Each of
these means of expression includes means to encode
knowledge, and each is used to express knowledge
originally encoded in one of the others. In
particular, writing represents knowledge
originally encoded in speech. Certain media
dominate as the primary methods for the
transmission of detailed information at different
times and places. Oral tradition dominated
the tradition of Sanskrit in India in the first
and second millennia B.C.E. Writing overtook
orality in the first millennium C.E. and dominated
until replaced by printing in the 19th
century. At the dawn of the 21st century,
the digital medium is replacing printing as the
dominant means of knowledge transmission.
While brittle editions of printed Sanskrit books
are removed from easy access to library annexes,
the Sanskrit Library is working to usher the
heritage of India into the digital age.
Scharf designed an encoding scheme to represent
Sanskrit accurately and revised the Unicode
Standard to include characters necessary for the
representation of the ancient Vedic texts.
He directed an NSFfunded project 20062009 to
build a digital Sanskrit library that integrates
lexical resources, linguistic software, and
machinereadable texts. Now he directs an
NEHfunded project to produce highquality digital
images of manuscripts at Brown University and the
University of Pennsylvania and to develop
wordspotting technology to facilitate searching
even in digital images not amenable to optical
character recognition (OCR).
*April 16, 2010 (Friday)
45:30pm, 182 George St, Rm
110
Division of Applied Mathematics / Center for
Statistical Sciences Seminar
MultiResolution
Inference
of Stochastic Models from Partially Observed
Data
Samuel Kou
Professor
Department of Statistics
Harvard University
Stochastic models, diffusion models in particular,
are widely used in science, engineering and
economics. Inferring the parameter values from
data is often complicated by the fact that the
underlying stochastic processes are only partially
observed. Examples include inference of discretely
observed diffusion processes, stochastic
volatility models, and double stochastic Poisson
(Cox) processes. Likelihood based inference faces
the difficulty that the likelihood is usually not
available even numerically. Conventional approach
discretizes the stochastic model to approximate
the likelihood. In order to have desirable
accuracy, one has to use highly dense
discretization. However, dense discretization
usually imposes unbearable computation burden. In
this talk we will introduce the framework of
Bayesian multiresolution inference to address
this difficulty. By working on different
resolution (discretization) levels simultaneously
and by letting the resolutions talk to each other,
we substantially improve not only the
computational efficiency, but also the estimation
accuracy. We will illustrate the strength of the
multiresolution approach by examples.
April 21, 2010
121pm, 182 George St, Rm 110
Segmentation
of Image Ensembles via Latent Atlases
Tammy RiklinRaviv
Postdoctoral Associate
Department of Electrical Engineering and Computer
Science
Massachusetts Institute of Technology
The images acquired via medical imaging modalities
are frequently subject to low signaltonoise
ratio, bias field and partial volume effects.
These artifacts, together with the naturally low
contrast between image intensities of some
neighboring structures, make the extraction of
regions of interest (ROIs) in clinical images a
challenging problem. Probabilistic atlases,
typically generated from comprehensive sets of
manually labeled examples, facilitate the analysis
by providing statistical priors for tissue
classification and structure segmentation.
However, the limited availability of training
examples that are compatible with the images to be
segmented renders the atlasbased approaches
impractical in many cases. In the talk I
will present a generative model for joint
segmentation of corresponding regions of interest
in a collection of aligned images that does not
require labeled training data. Instead, the
evolving segmentation of the entire image set
supports each of the individual segmentations.
This is made possible by iteratively inferring a
subset of the model parameters, called the spatial
parameters, as part of the joint segmentation
processes. These spatial parameters are defined in
the image domain and can be viewed as a latent
atlas. Our latent atlas formulation is based on
probabilistic principles, but we solve it using
partial differential equations and energy
minimization criteria. We evaluate the method
successfully for the segmentation of cortical and
subcortical structures within different
populations and of brain tumors in a
singlesubject multimodal longitudinal
experiment.
*April 23, 2010
(Friday)
45:30pm, 182 George St, Rm
110
A Generative
Model Approach to Fraud Detection
Brian Lucena
Chief Mathematician
Guardian Analytics Inc.
Abstract: Standard discriminative approaches rely
on seeing lots of examples of both classes in
order to distinguish between the two.
However, in the specific case of online banking
fraud, these techniques are hampered by two major
practical concerns. First, there is a huge
asymmetry in the amount of data available about
each class – hundreds of millions of legitimate
transactions and perhaps a few hundred fraudulent
ones. Second, fraudsters quickly change
their tactics, and so even those few data points
on fraud may not be very useful in predicting
future fraud. Consequently, we take a
modelbased approach to exploit the natural
differences between individual users. This
talk will describe the FraudMAP system and discuss
several interesting problems that arise in its
development.
*May 10, 2010
(Monday)
4:005:30pm,
182 George St, Rm 110
Shared
Segmentation of Natural Scenes using Dependent
PitmanYor Processes
Erik Sudderth
Assistant Professor
Department of Computer Science
Brown University
We explore statistical frameworks for the
simultaneous, unsupervised segmentation and
discovery of visual object categories from image
databases. Examining a large set of manually
segmented scenes, we show that object frequencies
and segment sizes both follow power law
distributions, which are well modeled by the
PitmanYor (PY) process. This generalization
of the Dirichlet process leads to learning
algorithms which discover an unknown set of
objects, and segmentation methods which
automatically adapt their resolution to each
image. Generalizing previous applications of
PY priors, we use nonMarkov Gaussian processes to
infer spatially contiguous segments which respect
image boundaries. Using a novel family of
variational approximations, our approach produces
segmentations which compare favorably to
stateoftheart methods, while simultaneously
discovering categories shared among natural
scenes.
