Abstracts
Wednesday, Oct 9, 2019
121pm, 182 George St, Room 110
Knockoffs or perturbations, that is a question
Jun S Liu
Professor
Department of Statistics
Harvard University
Simultaneously finding multiple influential variables and controlling the false discovery rate (FDR) for linear regression models is a fundamental problem with a long history. Researchers recently have proposed and examined a few innovative approaches surrounding the idea of creating “knockoff” variables (like spikeins in biological experiments) to control FDR. As opposed to creating knockoffs, a classical statistical idea is to introduce perturbations and examine the impacts. We introduce here a perturbationbased Gaussian Mirror (GM) method, which creates for each predictor variable a pair of perturbed “mirror variables” by adding and subtracting a randomly generated Gaussian random variable, and proceeds with a certain regression method, such as the ordinary leastsquare or the Lasso. The mirror variables naturally lead to a test statistic highly effective for controlling the FDR. The proposed GM method does not require strong conditions for the covariates, nor any knowledge of the noise level and relative magnitudes of dimension p and sample size n. We observe that the GM method is more powerful than many existing methods in selecting important variables, subject to the control of FDR especially under the case when high correlations among the covariates exist. Additionally, we provide a method to reliably estimate a confidence interval and upper bound for the number of false discoveries. If time permits, I will also discuss a simpler bootstraptype perturbation method for estimating FDRs, which is also more powerful than knockoff methods when the predictors are reasonably correlated. The presentation is based on joint work with Xing Xin and Chenguang Dai.
Apr 16, 2019 (Tuesday)
11am12pm, 170 Hope St, Room 108
A Representational Theory of Grid Cells
Ying Nian Wu
Professor
Department of Statistics
UCLA
Imagine walking in your living room in the dark. Purely based on your selfmotion, you can calculate your selfposition, and you can also plan your path to light switch. The grid cells in your brains play a key role in these tasks. The grid cells were discovered by Dr. MayBritt Moser and Dr. Edvard Moser, who won the 2014 Nobel Prize for their surprising discovery that such cells fire at regular hexagon grids imposed on the spatial environment. In this talk, I shall present a representational theory of grid cells, where the selfposition is represented by a vector and the selfdisplacement is represented by a matrix that rotates the vector. I will show that hexagon grid patterns can be learned from simulated trajectories. I will also discuss a similar representational scheme for primary visual cortex. The talk is based on joint work with Ruiqi Gao, Jianwen Xie and SongChun Zhu.
Feb 13, 2019
121pm, 182 George St, Room 110
A combinatorial/algebraic topological approach to nonlinear dynamics
Konstantin Mischaikow
Professor
Department of Mathematics
Rutgers University
Motivated by the increase in data driven science I will discuss a combinatorial/algebraic topological approach to characterizing nonlinear dynamics. In particular, I will describe how order theory can be used to efficiently and effectively organize the decomposition of dynamics and how algebraic topological tools can be used to characterize the structure of the dynamics. I will then propose a definition of nonlinear dynamics based on these structures.
To demonstrate the effectiveness of this approach I will consider several problems from systems and synthetic biology. I will focus on identification and rejection of network models for these types of systems based on functional form and time series data.
Jan 30, 2019
121pm, 182 George St, Room 110
Statistical models of large graphs and networks
Peter Orbanz
Associate Professor
Department of Statistics
Columbia University
Relational data is, roughly speaking, any form of data that can be represented as a graph: A social network, user preference data, proteinprotein interactions, etc. A recent body of work, by myself and others, aims to develop a statistical theory of such data for problems where a single graph is observed (such as a small part of a large social network). Keywords include graphons, edgeexchangeable and sparse exchangeable graphs, and many latent variable models used in machine learning. I will summarize the main ideas and results of this theory: How and why the exchangeability assumptions implicit in commonly used models for such data may fail; what can be done about it; what we know about convergence; and implications of these results for methods popular in machine learning, such as graph embeddings and empirical risk minimization.
Dec 5, 2018
121pm, 182 George St, Room 110
Hidden Physics Models: Machine Learning of NonLinear Partial Differential Equations
Maziar Raissi
Assistant Professor (Research)
Division of Applied Mathematics
Brown University
A grand challenge with great opportunities is to develop a coherent framework that enables blending conservation laws, physical principles, and/or phenomenological behaviours expressed by differential equations with the vast data sets available in many fields of engineering, science, and technology. At the intersection of probabilistic machine learning, deep learning, and scientific computations, this work is pursuing the overall vision to establish promising new directions for harnessing the longstanding developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data. To materialize this vision, this work is exploring two complementary directions: (1) designing dataefficient learning machines capable of leveraging the underlying laws of physics, expressed by time dependent and nonlinear differential equations, to extract patterns from highdimensional data generated from experiments, and (2) designing novel numerical algorithms that can seamlessly blend equations and noisy multifidelity data, infer latent quantities of interest (e.g., the solution to a differential equation), and naturally quantify uncertainty in computations. The latter is aligned in spirit with the emerging field of probabilistic numerics.
Nov 7, 2018
121pm, 182 George St, Room 110
Visual Program Induction
Daniel Ritchie
Assistant Professor of Computer Science Brown University
Many elements of our visual world can be compactly described by programs: the repetitive structures in graphical patterns, the symmetric parts of furniture and other humanmade objects, and so on. In this talk, I'll present some recent work on automatically inferring such programs from raw perceptual inputs (2D images or unstructured 3D volumes). Our methods use deep neural networks which learn to perform this inference from a large number of supervised examples. As it is rare to find (and expensive to construct) large corpora of paired (image, program) data in the real world, we train these models primarily using synthetic data. I'll show visual program inference results for three domains: 2D figures, partbased 3D objects, and simple 3D scenes.
Oct 24, 2018
121pm, 182 George St, Room 110
Approximate Message Passing Algorithms for High Dimensional Statistical Estimation
Cynthia Rush
Assistant Professor of Statistics Columbia University
Approximate Message Passing or AMP is a class of low complexity, scalable algorithms used to solve highdimensional noisy linear regression problems where the goal is to estimate a vector x from a noisy measurement y = Ax + noise. AMP has the attractive feature that its performance, measured for example by the squared error loss, can be tracked accurately by a scalar iteration referred to as state evolution. In this talk, I will present recent performance guarantees for analysis of this algorithm under various problem conditions and I will introduce recent applications in statistical inference and estimation.
Oct 17, 2018
121pm, 182 George St, Room 110
Unsupervised Learning for Large Scale Medical Image Analysis
Adrian Dalca
Postdoctoral Fellow CSAIL, Massachusetts Institute of Technology Massachusetts General Hospital, Harvard Medical Schools
New approaches to the analysis of medical images can lead to new insights into complex diseases. In particular, there is a wealth of untapped clinical knowledge to be gained from the millions of diagnostic, low resolution clinical scans acquired every day as part of standard medical practice. In this talk, I will present machine learning techniques that can enable the use of clinical image collections in broad medical research studies. First, I define image imputation – the statistical inference of unobserved image anatomy. I describe an imputation method that exploits anatomical structure in collections of clinicallyacquired low resolution scans, and learns to synthesize missing anatomy. This method enables subsequent studies not previously possible with clinical scans. Second, I present an unsupervised learning algorithm for deformable medical image registration that is up to 20,000 times faster than the current state of the art, while preserving topological guarantees. This enables, for the first time, efficient processing of image collections comprised of tens of thousands of scans. Finally, I demonstrate analysis methods enabled by these algorithms that are currently being used in studies across twelve hospital sites to derive new insights about patients who have suffered a stroke.
Oct 3, 2018
121pm, 182 George St, Room 110
Variable Prioritization in “Black Box” Statistical Methods
Lorin Crawford
Assistant Professor of Biostatistics
Brown University
A consistent theme of the work done in the Crawford Lab is to take modern computational approaches and develop theory that enable their interpretations to be related back to classical genomic principles. The central aim of this talk is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel, interpretable, and computationally efficient way to summarize the relative importance of predictor variables. Methodologically, we develop the “RelATive cEntrality” (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We will illustrate RATE through Bayesian Gaussian process regression; although, the proposed innovations apply to other nonlinear methods (e.g. deep neural networks). It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and applications to real genomewide association mapping studies, we show that applying RATE enables an explanation for this improved performance.
May 2, 2018
121pm, 182 George St, Room 110
The synchronization problem for Kuramoto oscillators and beyond
Javier Morales
U Maryland
Collective phenomena such as aggregation, flocking, and synchronization are ubiquitous in natural biological, chemical, and mechanical systemse.g., the flashing of fireflies, chorusing of crickets, synchronous firing of cardiac pacemakers, and metabolic synchrony in yeast cell suspensions. The Kuramoto model introduced by Yoshiki Kuramoto is one of the first theoretical tools developed to understand such phenomena and has recently gained extensive attention in the physical and mathematical community. Moreover, it has become the starting point of several generalizations that have applications ranging from opinion dynamics to the development of humanmade interacting multiagent system of UAVs and data clustering. In this talk, we will review the state of the art for the synchronization problem of the Kuramoto model at the kinetic and particle level. Additionally, we will introduce new developments and variational techniques for the dynamics of this model and some of its variants and its generalization.
April 18, 2018
121pm, 182 George St, Room 110
Machine Learning in a Setting of Ordinal Distance Information
Matthaus Kleindessner
Rutgers
In a typical machine learning scenario we are given numerical dissimilarity values between objects (or feature representations of objects, from which such dissimilarity values can readily be computed). In the relaxed setting of ordinal distance information we are only provided with binary answers to comparisons of dissimilarity values such as d(A,B)<d(A,C) instead. This setting has attracted interest in recent years, mainly due to the possibility of simple collection of such data by means of crowdsourcing.
My talk will have two parts. First, I will talk about a result that states the asymptotic uniqueness of ordinal embeddings. Ordinal embedding, up to now, is the standard approach to machine learning in a setting of ordinal distance information. The idea is to map the objects of a data set to points in a Euclidean space such that the points preserve the given ordinal distance information as well as possible (with respect to the Euclidean interpoint distances). Second, I will introduce datadependent kernel functions that can be evaluated given only ordinal distance information about a data set. They provide a generic alternative to the ordinal embedding approach and avoid some of its drawbacks. For both works, I want to address a number of open questions.
April 11, 2018
121pm, 182 George St, Room 110
Recent Advances in Elastic Functional and Shape Data Analysis
Anuj Srivastava
Florida State University
Functional and shape data analysis (FSDA) is fast becoming an important research area, due to its broad applications
in many branches of science, including biostatistics and bioinformatics. An essential component of FSDA is
registration of points across functional objects. Without proper registration, the results are often inferior and difficult to interpret.
The current practice in FSDA community is to treat registration as a preprocessing step, using offtheshelf
alignment procedures, and follow it up with statistical analysis of the resulting data. In contrast, Elastic FSDA
is a more comprehensive approach, where one solves for the registration and statistical inferences in
a simultaneous fashion. The key idea here is to use Riemannian metrics with appropriate invariance properties, to form
objective functions for alignment and to develop statistical models involving functional data. While
these elastic metrics are complicated in general, we have developed a family of squareroot transformations
that map these metrics into simpler Euclidean representations, thus enabling more standard statistical procedures.
Specifically, we have developed techniques for elastic functional PCA and elastic regression models
involving functional variables. I will demonstrate this ideas using imaging data in neuroscience and bioinformatics,
where biological structures can often be represented as functions (curves or surfaces) on intervals or spheres.
Examples of curves include DTI fiber tracts and chromosomes while examples of surfaces include subcortical
structures (hippocampus, thalamus, putamen, etc). Statistical goals here include shape analysis and modeling
of these structures and to use their shapes in medical diagnosis.
March 21, 2018
121pm, 182 George St, Room 110
Measure Transport for Bayesian Inference: theory and applications.
Daniele Bigoni
MIT
Measure transport is a valuable tool for characterizing, sampling and manipulating multivariate nonGaussian target distributions. This method has a broad range of applications  e.g., the solution of Bayesian inverse problems, as well as filtering and smoothing of dynamical systems. The transport maps framework seeks a deterministic parametric map that pushes forward a tractable reference distribution to a potentially complex target distribution of interest. The construction of highdimensional maps may be challenging due to the curse of
dimensionality. In many cases, though, one can leverage a number of sources of lowdimensional structure: marginal independence, smoothness, separability, conditional independence, to just name a few. In this seminar we will outline the transport map framework and some of the key ingredients useful to tackle highdimensional problems. The presentation will be accompanied by examples of Bayesian inference problems in geophysics and finance.
February 7, 2018
121pm, 182 George St, Room 110
Causes and Consequences of Human Genomic Variation
Sohini Ramachandran
Brown University
Abstract: Research in the Ramachandran lab addresses problems in population genetics and evolutionary theory, generally using humans as a study system. Our goal is to infer modern human evolutionary history — how mutation, natural selection, and population histories have interacted to produce observed genetic variation in presentday humans — from genomic data alone. Because of advances in sequencing technology, populationgenetic problems are inherently quantitative problems: How can we identify subpopulations in the human species? Which of the millions of common DNA variants predict an individual’s risk for a disease of interest? Can we classify beneficial mutations versus selectively neutral (socalled “junk”) mutations? I’ll discuss our recent approaches for answering these questions, in which we draw on topic models, optimization, information theory, and supervised learning.
December 6, 2017
121pm, 182 George St, Room 110
Advanced Editing, Exploration, and Interaction for Video
James Tompkin
Brown
Video best represents how we visually perceive, and with ubiquitous cameras everyone can capture imagery of our world. However, our tools to edit, explore, and interact with video often require laborious work or expert skill, which makes it merely a consumption medium for most people. With advances in image understanding through computer graphics, vision, and interaction, we can change how we think about video: First, to transform video from a rigid and inaccessible medium into a malleable and creative one, so that nonexperts can make sophisticated contentbased edits. Second, to transform video from a linear medium viewed sequentially into novel interactive contentbased and contextaware visual explorations. Third, to transform video cameras from passive observers into essential tools for realtime creation and interaction with digital imagery.
November 29, 2017
121pm, 182 George St, Room 110
Sequential Bayesian inference via lowdimensional couplings
Youseff Marzouk
MIT
Integration against an intractable probability measure is among the fundamental challenges of Bayesian inference. A useful approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Approximate transports can also be used to "precondition" standard Monte Carlo schemes. Yet characterizing a suitable transport mape.g., representing, constructing, and evaluating itgrows challenging in high dimensions.
We establish links between the conditional independence structure of the target measure and the existence of certain lowdimensional couplings, induced by transport maps that are sparse or decomposable. Our analysis not only facilitates the construction of couplings in highdimensional settings, but also suggests new inference methodologies. We will highlight applications of our theory to nonlinear statespace models, where sparse and/or decomposable transports yield new variational algorithms for filtering, smoothing, and sequential parameter inference.
This is joint work with Alessio Spantini and Daniele Bigoni.
November 8, 2017
121pm, 182 George St, Room 110
Toward perceptually consistent stereo
Todd Zickler
Harvard University
There are two sources of shape information in a pair of binocular stereo images. One is the correlation (matching) signal from surfaces that are visible to both cameras. The other is the decorrelation (antimatching) signal from regions that are visible to one camera but occluded in the other. Vision science has repeatedly shown that both types of information are used in the visual cortex, and that people can perceive depth even when correlation cues are absent or very weak, a capability that remains absent from most computational stereo systems.
I will describe two research directions that (hopefully) move us toward computational stereo algorithms that are more consistent with these perceptual phenomena. Both directions are based on representing a depth map as a piecewise smooth function over the visual field, with a flexible notion of smoothness. One is a scanline algorithm that naturally combines correlation and decorrelation cues, and that matches human perception on a collection of wellknown perceptual stimuli. The other is a 2D algorithm that efficiently exploits piecewise smoothness, but so far without incorporating decorrelation cues. At the end of the talk, I hope to discuss how insights from these two different directions might be combined. The talk is based on two papers:
1. Juialing Wang, Daniel Glasner and Todd Zickler:
ICCV
2017
http://vision.seas.harvard.edu/stereo/
2. Ayan Chakrabarti, Ying Xiong, Steven J. Gortler and Todd Zickler
CVPR 2015. http://ttic.uchicago.edu/~ayanc/consensus/
November 1, 2017
121pm, 182 George St, Room 110
Sifting through Measures on Networks: From a Theoretical Framework to an Empirical Guide
Tina EliassiRad
Northeastern University
In this talk, I will discuss two problems on network data. (1) Measuring tiestrength: Given a set of people and a set of events attended by them, how should we measure connectedness or tie strength between each pair of persons? The underlying assumption is that attendance at mutual events produces an implicit social network between people. I will describe an axiomatic solution to this problem. (2) Measuring similarity between networks: Given two networks (without known nodecorrespondences), how should we measure similarity between them? This problem occurs frequently in many realworld applications such as transfer learning, reidentification, and change detection. I will present an empirical guide on how to select a networksimilarity method, and discuss some promising results based on topological data analysis.
October 25, 2017
121pm, 182 George St, Room 110
Feature Driven Exploration of Images Using Topology
Attila Gyulassy
Scientific Computing and Imaging Institute
University of Utah
Two  and threedimensional images arise in a multitude of disciplines, from computer simulations to experimental sources such as microscopy. A key step to scientific discovery is extracting semantic information from image data. This talk presents practical application of Morse theory to extract and reason with features intrinsic to images. We will discuss both practical algorithms for computing the MorseSmale complex as well as realworld applications drawing on imaged and simulated examples from material science, basic energy research, geologic exploration, and neuroscience.
October 11, 2017
121pm, 182 George St, Room 110
Multireference Alignment with Nonperiodic Distribution is Easier
Nir Sharon
Program of Applied and Computational Mathematics
Princeton University
Multireference alignment is the problem of estimating a signal from its circularly translated copies in the presence of noise. Recently, It was shown that if the translations are drawn from the uniform distribution, the number of samples needed is proportional to 1/SNR^3. We prove that for any nonperiodic distribution, the sample complexity is actually 1/SNR^2. This rate is optimal and can be achieved by a simple spectral algorithm. We further propose and analyze two additional algorithms based on nonconvex optimization and semidefinite programming.
September 20, 2017
121pm, 182 George St, Room 110
Graphbased Bayesian learning: continuum limits and algorithms
Daniel SanzAlonso
Division of Applied Mathematics
Brown University
The principled learning of functions from data is at the core of statistics, machine learning and artificial intelligence. The aim of this talk is to present some new theoretical and methodological developments concerning the graphbased, Bayesian approach to semisupervised learning. I will show suitable scalings of graph parameters that provably lead to robust Bayesian solutions in the limit of large number of unlabeled data. The analysis relies on a careful choice of topology and in the study of the the spectrum of graph Laplacians. Besides guaranteeing the consistency of graphbased methods, our theory explains the robustness of discretized function space MCMC methods in semisupervised learning settings. This is joint work with Nicolas Garcia Trillos, Zachary Kaplan, and Thabo Samakhoana.
May 3, 2017
121pm, 182 George St, Room 110
PCA from noisy linearly transformed measurements
Amit Singer
Professor
Department of Mathematics
Princeton University
We consider the problem of estimating the covariance of X from measurements of the form y_i = A_i*x_i + e_i (for i = 1, . . . , n) where x_i are i.i.d. unobserved samples of X, A_i are given linear operators, and e_i represent noise. Our estimator is constructed efficiently via a simple linear inversion using conjugate gradient performed after eigenvalue shrinkage motivated by the spiked model in high dimensional PCA. Applications to lowrank matrix completion, 2D image denoising, and 3D structure classification in single particle cryoelectron microscopy will be discussed.
April 26, 2017
121pm, 182 George St, Room 110
Numerical Gaussian Processes (Physics Informed Learning Machines)
Maziar Raissi
Postdoctoral Research Associate
Division of Applied Mathematics
Brown University
We introduce the concept of numerical Gaussian processes, which we define as Gaussian processes with covariance functions resulting from temporal discretization of timedependent partial differential equations. Numerical Gaussian processes, by construction, are designed to deal with cases where: (1) all we observe are noisy data on blackbox initial conditions, and (2) we are interested in quantifying the uncertainty associated with such noisy data in our solutions to timedependent partial differential equations. Our method circumvents the need for spatial discretization of the differential operators by proper placement of Gaussian process priors. This is an attempt to construct structured and dataefficient learning machines, which are explicitly informed by the underlying physics that possibly generated the observed data. The effectiveness of the proposed approach is demonstrated through several benchmark problems involving linear and nonlinear timedependent operators. In all examples, we are able to recover accurate approximations of the latent solutions, and consistently propagate uncertainty, even in cases involving very long time integration.
April 19, 2017
121pm, 182 George St, Room 110
Inference in Dynamical Systems
Sayan Mukherjee
Professor
Department of Statistical Science
Duke University
We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Furthermore, we show how some wellstudied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures. We also relate Bayesian inference to the thermodynamic formalism in tracking dynamical systems. We state conditions for consistency of a Gibbs distribution using classic ideas from dynamical systems such as topological entropy and pressure.
March 8, 2017
121pm, 182 George St, Room 110
Tropical Coordinates on the Space of Persistence Barcodes
Sara Kalisnik Verovsek
Postdoctoral Research Associate
Department of Mathematics
Data Science Initiative
Brown University
In the last two decades applied topologists have developed numerous methods for ‘measuring’ and building combinatorial representations of the shape of the data. The most famous example of the former is persistent homology. This adaptation of classical homology assigns a barcode, i.e. a collection of intervals with endpoints on the real line, to a finite metric space. Unfortunately, barcodes are not welladapted for use by practitioners in machine learning tasks. We can circumvent this problem by assigning numerical quantities to barcodes and these outputs can then be used as input to standard algorithms. I will talk about maxplus polynomials and tropical rational functions that can be used as coordinates on the space of barcodes. All of these are stable with respect to the standard distance functions (Bottleneck, Wasserstein) used on the barcode space.
February 8, 2017
121pm, 182 George St, Room 110
Deformation models for image and shape matching
Ronen Basri
Professor
Department of Computer Science and Applied Mathematics
Weizmann Institute of Science
Modeling deformations is important for various applications in computer vision, graphics and geometry processing. In this talk I will describe our recent progress in modeling deformations. In particular, I will describe methods for computing bounded distortion transformations, locally injective maps whose differentials' conformal distortion is bounded. Toward this end, we developed a convex framework for solving optimization problems over matrices that involve functionals and constraints expressed in terms of their extremal singular values. In addition, I will describe methods for computing physicallymotivated elastic maps between shapes. We have applied these methods in a number of challenging problems, including feature matching between images related by nonrigid deformation, nonrigid registration of shape models, and computing extremal quasiconformal maps.
February 1, 2017
121pm, 182 George St, Room 110
Nonequilibrium transitions between metastable patterns in populations of motile bacteria
Eric VandenEijnden
Professor
Courant Institute of Mathematical Sciences
New York University
Active materials can selforganize in many more ways than their equilibrium counterparts. For example, selfpropelled particles whose velocity decreases with their density can display motilityinduced phase separation (MIPS), a phenomenon building on a positive feedback loop in which patterns emerge in locations where the particles slow down. Here, we investigate the effects of intrinsic fluctuations in the system's dynamics on MIPS, using a field theoretic description building on results by Cates and collaborators. We show that these fluctuations can lead to transitions between metastable patterns. The pathway and rate of these transitions is analyzed within the realm of large deviation theory, and they are shown to proceed in a very different way than one would predict from arguments based on detailedbalance and microscopic reversibility. Specifically, we show that these transitions involve fluctuations in diffiusivity of the bacteria followed by fluctuations in their population, in a specific sequence. The method of analysis proposed here, including its numerical component, can be used to study noiseinduced nonequilibrium transitions in a variety of other nonequilibrium setups, and leads to predictions that are verifiable experimentally.
November 30, 2016
121pm, 182 George St, Room 110
How highorder image statistics shape cortical visual processing
Jonathan Victor
Professor
Brain and Mind Research Institute
Department of Neurology
Cornell University
Several decades of work have suggested that Barlow's principle of efficient coding is a powerful framework for understanding retinal design principles. Whether a similar notion extends to cortical visual processing is less clear, as there is no "bottleneck" comparable to the optic nerve, and much redundancy has already been removed. Here, we present convergent psychophysical and physiological evidence that regularities of highorder image statistics are indeed exploited by central visual processing, and at a surprising level of detail.
The starting point is a study of natural image statistics (Tkacic et al., 2010), in which we showed that highorder correlations in certain specific spatial configurations are informative, while highorder correlations in other spatial configurations are not: they can be accurately guessed from lowerorder ones. We then construct artificial images (visual textures) composed either of informative or uninformative correlations. We find that informative highorder correlations are visually salient, while the uninformative correlations are nearly imperceptible. Physiological studies in macaque visual cortex identify the locus of the underlying computations. First, neuronal responses in macaque V1 and V2 mirror the psychophysical findings, in that many neurons respond differentially to the informative statistics, while few respond to the uninformative ones. Moreover, the differential responses largely arise in the supragranular layers, indicating that the computations are the result of intracortical processing.
We then consider low and highorder local image statistics together, and apply a dimensionreduction (binarization) to cast them into a 10dimensional space. We determine the perceptual isodiscrimination surfaces within this space. These are wellapproximated by ellipsoids, and the principal axes of the ellipsoids correspond to the distribution of the local statistics in natural images. Interestingly, this correspondence differs in specific ways from the predictions of a model that implements efficient coding in an unrestricted manner. These deviations provide insights into the strategies that underlie the representation of image statistics..
November 16, 2016
121pm, 182 George St, Room 110
SingleChannel MultiSpeaker Separation using Deep Clustering
John Hershey
Senior Principal Research Scientist
Mitsubishi Electric Research Laboratories (MERL)
Cambridge, Massachusetts
The human auditory system gives us the extraordinary ability to converse in the midst of a noisy throng of party goers. Solving this socalled cocktail party problem has proven extremely challenging for computers, and separating and recognizing speech in such conditions has been the holy grail of speech processing for more than 50 years. Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering, producing unprecedented speakerindependent singlechannel separation performance on twospeaker and threespeaker mixtures. In this framework, a neural network is trained to assign an embedding vector to each element of a multidimensional signal, such that clustering the embeddings yields a desired segmentation of the signal. In the cocktailparty problem, the embeddings are assigned to each timefrequency (TF) index of the shorttime Fourier transform (STFT) of the mixture of speech signals. Clustering these embeddings yields an assignment of each TF bin to one of the inferred sources. These assignments are used as a masking function to extract the dominant parts of each source. This method has produced remarkable performance, reaching levels of improvement  over 10 dB SNR gain  that were previously unobtainable even in simpler speech enhancement tasks. Extensions to the model allow for endtoend training to optimize the whole architecture for best signal quality. We also evaluate our method using automatic speech recognition (ASR), and show that it can reduce the word error rate (WER) from 89.1
November 9, 2016
121pm, 182 George St, Room 110
Joint Scientific Computing and Pattern Theory Seminar
Solution uncertainty quantification for differential equations
Oksana Chkrebtii
Assistant Professor
Department of Statistics
Ohio State University
When models are defined implicitly by systems of differential equations without a closed form solution, small local errors in finitedimensional solution approximations can propagate into large deviations from the true underlying state trajectory. Inference for such models relies on a likelihood approximation constructed around a numerical solution, which underestimates posterior uncertainty. This talk will introduce and discuss progress in a new formalism for modeling and propagating discretization uncertainty through the Bayesian inferential framework, allowing exact inference and uncertainty quantification for discretized differential equation models.
November 2, 2016
121pm, 182 George St, Room 110
Robust Bayesian inference via coarsening
Tamara Broderick
Assistant Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practiceat least in part due to the difficulty of calculating robustness measures from MCMC draws. We demonstrate that, by contrast to MCMC, variational Bayes (VB) techniques are readily amenable to robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for meanfield variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. A potential problem with MFVB is that it has a wellknown major failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVBboth for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).
October 5, 2016
121pm, 182 George St, Room 110
Robust Bayesian inference via coarsening
Jeffrey Miller
Assistant Professor
Department of Biostatistics
Harvard University
The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure, particularly when the data set is large. We introduce a simple, coherent approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the observed data exactly, one conditions on the event that the model generates data close to the observed data, with respect to a given statistical distance. When closeness is defined in terms of relative entropy, the resulting "coarsened posterior" can be approximated by simply raising the likelihood to a certain fractional power, making the method computationally efficient and easy to implement in practice. We illustrate with real and simulated data, and provide theoretical results.
September 28, 2016
121pm, 182 George St, Room 110
Title: New Perspectives on Importance Sampling
Daniel Sanz Alonso
Postdoctoral Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University
Importance sampling is a building block of many algorithms in computational statistics, perhaps most notably particle filters. It is the importance sampling step that often limits the accuracy of these algorithms. In this talk I will introduce a new way of understanding importance sampling based on information theory. I will argue that the fundamental problem facing algorithms based on importance sampling can be understood in terms of the distance between certain measures. The results give new understanding on the potential use of importance sampling and particle filters in high (possibly infinite) dimensional spaces.
September 21, 2016
121pm, 182 George St, Room 110
Structurebased comparisons for sequential data
Katherine Kinnaird
Postdoctoral Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University
We present aligned hierarchies, a lowdimensional representation for sequential data streams. The aligned hierarchies encode all hierarchical decompositions of repeated elements from a highdimensional and noisy sequential data stream in one object. These aligned hierarchies can be embedded into a classification space with a natural notion of distance. We motivate our discussion through the lens of Music Information Retrieval (MIR), constructing aligned hierarchies by finding, encoding, and synthesizing all repeated structure present in a song. For a data set of digitized scores, we conducted experiments addressing the fingerprint task, a song comparison task in MIR, that achieved perfect precisionrecall values and provide a proof of concept for the aligned hierarchies.
We also introduce aligned subhierarchies and aligned subdecompositions. Both derived from the aligned hierarchies, these structure based representations for songs can be embedded into classification spaces and can address additional MIR tasks. We will compare properties of the aligned hierarchies, aligned subhierarchies, and the aligned subdecompositions.
September 14, 2016
121pm, 182 George St, Room 110
Discovering the Nature of Nonlinear Relationships
Joshua Vogelstein
Assistant Professor
Department of Biomedical Engineering
Institute for Computational Medicine
Johns Hopkins University
As data collection is becoming easier, it is becoming increasingly difficult and important to answer questions of the form: is one property (e.g., clouds) related to another (e.g., grass wetness). Only If we can determine that these two properties are related toor statistically dependent onone another does it make sense to further investigate the nature of this relationship. Unfortunately, reliably establishing such a relationship can be challenging, especially when the properties themselves are complex and the relationship is nonlinear. We here describe a procedure, called Multiscale Generalized Correlation (MGC), that addresses these challenges. Our key insight is that if two properties are related, comparisons between measurements of similar pairs of the first property (e.g., similarly shaped clouds) should be correlated with the comparisons between corresponding pairs of the second property (grass wetness under those clouds). We demonstrate the statistical and computational efficiency of MGC in both simulations and theory. We then apply it to detect the presence and nature of the relationships between brain activity and personality, brain shape and disorder, and brain connectivity and creativity. Finally, we demonstrate that MGC does not suffer from the false positives that have plagued conventional dependence tests. Our open source implementation of MGC is applicable to fundamental questions confronting science, government, finance, and many other disciplines.
May 18, 2016
121pm, 182 George St, Room 110
Learning Lattice Operators used in Computer Vision Tasks
Junior Barrera
Professor of Computer Science
University of Sao Paulo, Brazil
The transformation of images by lattice operators has some interesting properties: they keep in the discrete and quantized image domain; theoretical results guarantee that any transformation can be implemented by a lattice operator; lattice operators can be implemented through simple elementary lattice operators; lattice operators can be learned through discrete machine learning techniques. In this lecture, we present the general algebraic representation of lattice operator and its application in the learning of these operators. Several approaches for efficient learning (i.e., from a relatively small sample in a reasonable processing time) of complex operators are presented. Between them is the Ucurve problem, which is a new combinatorial optimization problem useful for approaching the classical feature selection problem. Another remarkable result is the multiresolution learning, witch, for a fixed sample, avoids over fitting and gives decreasing estimated errors for an increasing number of parameters. The techniques are illustrated by the learning of several kinds of binary and gray scale image transformations: noise reduction; interpolation; enhancement; segmentation; texture recognition; shape recognition; etc. All the results presented refer to already published papers, but some open problems are pointed and discussed.
April 20, 2016
121pm, 182 George St, Room 110
Learning Causal Graphical Models of LargeScale Systems
David Jensen
Professor of Computer Science
Director of the Knowledge Discovery Laboratory
University of Massachusetts Amherst
Effective methods for inferring causal dependence from observational data have been developed within both computer science and quantitative social science. Methods in computer science have focused on the correspondence between casual graphical models and observed patterns of statistical association. Methods in social science have focused on templates for causal inference often called quasiexperimental designs, including instrumental variables, propensity scores matching, regression discontinuinty designs, and interrupted timeseries designs. Recent work has begun to unify these methods using the framework of causal graphical models, but barriers remain because the formal framework of causal graphical models has been insufficiently expressive to represent some quasiexperimental designs. In this talk, I will introduce many of the known experimental and quasiexperimental designs in the language of directed graphical models, and I will present very recent work on defining additional designs in terms of classes of graphical models that can represent relational and temporal dependence. Finally, I will present two novel designs that have resulted from our work on causal inference in relational data.
March 23, 2016
121pm, 182 George St, Room 110
The Dynamics of the Unconscious Brain Under General Anesthesia
Emery Brown
Warren M. Zapol Professor of Anaesthesia
Massachusetts General Hospital and Harvard Medical School
Edward Hood Taplin Professor of Medical Engineering & Computational Neuroscience
Massachusetts Institute of Technology
General anesthesia is a druginduced, reversible condition comprised of five behavioral states: unconsciousness, amnesia (loss of memory), analgesia (loss of pain sensation), akinesia (immobility), and hemodynamic stability with control of the stress response. Our work shows that a primary mechanism through which anesthetics create these altered states of arousal is by initiating and maintaining highly structured oscillations. We present findings from our human studies of general anesthesia using highdensity EEG recordings and intracranial recordings which have allowed us to give a detailed characterization of the neurophysiology of loss and recovery of consciousness due to propofol. We show how these dynamics change with other anesthetics and with age. We present a neurometabolic model of burst suppression, the profound state of brain inactivation seen in deep states of general anesthesia. We use our characterization of burst suppression to implement a closedloop anesthesia delivery system for control of a medicallyinduced coma. Finally, we demonstrate that the state of general anesthesia can be rapidly reversed by activating specific brain circuits. The success of our research has depended critically on tight coupling of experiments, signal processing research and mathematical modeling.
March 16, 2016
121pm, 182 George St, Room 110
A simple network model for a variety of Delay Match to Sample tasks
Yali Amit
Professor
Department of Statistics and Department of Computer Science
University of Chicago
Delay match to sample experiments (DMS) have inspired much of the modeling work on attractor neural networks. The basic experiment involves showing a target image, removing it, and after a delay showing a cue image: either the original image or a different one. The monkey needs to indicate if the cue is the same or different than the target. Electrophysiological recordings have shown that if the target is a learned one (has been observed multiple times) neurons selective for it maintain activity during the delay between target and cue presentation. This persistent activity is hypothesized to represent 'working' or 'short term' memory. The attractor network model posits that stimulation with learned patterns leads to sustained activity due to the learned recurrent synaptic connections in the network. There are a number of variations on the basic DMS paradigm involving distractors in between target and cue, or repetition detection experiments where a sequence of images is shown and one of them chosen at random is repeated. These different experiments raise several questions that are rarely addressed in the literature. With distractors, how does the network ensure the target sample pattern stays in working memory? How does the network figure out whether the cue matches the target? How does the network avoid distractor repetitions? Additional interesting phenomena have been observed in more recent experiments on repetition detection, including better performance with novel patterns than with learned patterns.
I will present a parsimonious network model of binary neurons and binary synapses and show how all these phenomena can be handled within this framework, using simple adjustments of certain global parameters such as inhibition, noise level and depression rate. Recent experiments also show that average responses to novel stimuli in IT are higher than to learned stimuli. I will show what adjustment needs to be made to accommodate this phenomenon. Time permitting I will discuss how these issues can be reconciled in a two layer network that can also perform classification on real data.
March 2, 2016
121pm, 182 George St, Room 110
The shape space defined by the GromovWasserstein distance
Facundo Memoli
Assistant Professor
Department of Mathematics and Department of Computer Science and Engineering The Ohio State University
The GromovWasserstein distance a variant of the GromovHausdorff distance based on ideas from mass transport provides an intrinsic metric on the collection of all metric measure spaces. I will give an overview of its construction, main properties, lower bounds, and their computation.
February 17, 2016
121pm, 182 George St, Room 110
New applications and algorithms for submodular probabilistic models
Stefanie Jegelka
Assistant Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Many realworld inference problems are, at their core, subset selection problems. Probabilistic models for such scenarios rely on having sufficiently accurate yet tractable distributions over discrete sets. We focus on subfamilies of such distributions whose special mathematical properties are the basis for fast algorithms. As a specific example, Determinantal Point Processes (DPPs) have recently become popular in machine learning, as elegant and tractable probabilistic models of diversity. We explore new applications of DPPs for variational inference over combinatorial objects, such as coupled cascades in a collection of networks, where we are able to leverage combinatorial and convex structure in the problem.
While sampling from DPPs is possible in polynomial time, the associated algorithm is not practical for large data. In the second part of the talk, I will outline ideas for faster sampling that build on new insights for algorithms that compute bilinear inverse forms. These results have applications beyond DPPs, including sensing with Gaussian Processes and submodular maximization.
This is joint work with Chengtao Li, Josip Djolonga, Suvrit Sra and Andreas Krause.
February 3, 2016
121pm, 182 George St, Room 110
Constructing representations using Bayesian nonparametrics and connections between human knowledge, optimal foraging, and random walks on graphs
Joseph Austerweil
Assistant Professor
Department of Cognitive, Linguistic, and Psychological Sciences
Brown University
This talk has two parts (how much detail we go into each will be guided by audience interest).
In the first part of the talk, I outline a statistical framework for constructing representations in a humanlike manner. People still outperform computational methods on a diverse set of problems, ranging from language learning to recognizing objects in a scene. To understand human success on these problems, cognitive scientists appeal to representation as a key explanatory device. From a statistical perspective, representations can be interpreted as structure within a probability distribution, which provide biases to prevent overfitting. Puzzlingly, people are remarkably flexible, changing their representation of the same data depending a number of different factors. I will describe a statistical framework for this intriguing phenomenon, which uses Bayesian nonparametric processes.
In the second part of the talk, I describe recent work relating how optimal animals forage for food in a patchy environment to how people search semantic memory. I will discuss how representing semantic memory as a graph enables random walks to mimic optimal foraging. I will conclude with some recent (and ongoing) work that exploits this relation to perform Bayesian inference for a graph from the first hitting times of multiple walks on that graph.
November 18, 2015
121pm, 182 George St, Room 110
Big data, Google and disease detection: the statistical story
Samuel Kou
Professor
Department of Statistics
Harvard University
Big data collected from the internet have generated significant interest in not only the academic community but also industry and government agencies. They bring great potential in tracking and predicting massive social activities. We focus on tracking disease epidemics in this talk. We will discuss the applications, in particular Google Flu Trends, some of the fallacy and the statistical implications. We will propose a new model that utilizes publicly available online data to estimate disease epidemics. Our model outperforms all previous realtime tracking models for influenza epidemics at the national level of the US. We will also draw some lessons for big data applications.
October 28, 2015
121pm, 182 George St, Room 110
A stochastic model for Gompertzian growth
Benar Svaiter
Professor
IMPA, Brazil
The Gompertz curve has been successfully used for modeling mortality of an aging population since its creation in the 1820's. In the 1960's a time reversed version of this curve was used by Laird to model tumoral and somatic growth. Since them, this is the standard model for tumoral and somatic growth and has been used also for modeling regeneration. Why is it that a model for mortality has been so apt to model growth? We propose a stochastic model for cell divisions based on recent findings on DNA structure and show that Gompertzian growth is the thermodynamic limit of this model after a suitable renormalization.
October 14, 2015
121pm, 182 George St, Room 110
Importance sampling in largescale machine learning problems: why it works and how it can help
Rachel Ward
Assistant Professor
Department of Mathematics
University of Texas at Austin
A recent trend in signal processing and machine learning research is that exact reconstruction is achievable from highly subsampled data by passing to nonlinear, sparsityinducing, reconstruction methods such as l1 minimization. Such guarantees often require strong structural conditions on the data in addition to sparsity, such as incoherence, which render the theory unusable on problems of practical importance. Here, we show that many of these strong assumptions are tied to i.i.d uniform sampling, and can be dropped by allowing weighted, or importance sampling. First, we explain why importance sampling works in this context: it aims to make the inverse problem as wellconditioned as possible given a fixed sample budget. We then discuss several problem domains where importance sampling strategies can be derived explicitly, and outperform stateoftheart sampling strategies used in practice: medical imaging, collaborative filtering, uncertainty quantification, and stochastic gradient methods. Along the way, we derive results at the intersection of applied harmonic analysis and random matrix theory that are of independent interest.
October 7, 2015
121pm, 182 George St, Room 110
Local Shape from Shading with a Generic Constraint
Ben Kunsberg
Prager Assistant Professor
Division of Applied Mathematics
Brown University
Humans have a remarkable ability to infer shape from shading (SFS) information. In computer vision this is often formulated with a Lambertian reflectance function, but it remains underposed and incompletely solved. Abstractly, the intensity in an image is a single valued function and the goal is to uncover the vector valued normal function. This illposedness has resulted in many proposed techniques that are either regularizations or propagations from known values. Our goal is to understand, mathematically and computationally, how we solve this problem.
First, it has been shown psychophysically that our perception (via gauge figure estimates) is remarkably accurate even when the boundary is masked. Thus classical propagating approaches requiring a known values along a boundary, such as that of characteristic curves or fast marching methods, are unlikely to model the visual system's solution.
An alternative approach requires regularization priors (in a Bayesian framework) or energy terms (in a variational framework). However, many of the proposed priors are adhoc and chosen by researchers to optimize performance for a particular test dataset. It is hard to conclude (from solely performance metrics) whether these priors are useful or accurate, e.g. good results are functions of these priors, resolution, the optimization techniques, the test set, and so on.
In this talk, we describe a different approach. We consider the SFS problem on image patches modeled as Taylor polynomials of any order and seek to recover a solution for that patch. We build a bootstrapping tensor framework that allows us to relate a smooth image patch to all of the polynomial surface solutions (under any light source). We then use a generic constraint on the light source to restrict these solutions to a 2D subspace, plus an unknown rotation matrix. We then investigate several special cases where the ambiguity reduces and the solution can be anchored. Interestingly, these anchor solutions relate to those situations in which human performance is also veridical.
September 23, 2015
121pm, 182 George St, Room 110
Continuum limit of total variation on point clouds
Nicolas GarciaTrillos
Prager Assistant Professor
Division of Applied Mathematics
Brown University
We consider point clouds obtained as random samples of a measure on a Euclidean domain. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. We study when is the cut capacity, and more generally total variation, on these graphs a good approximation of the perimeter (total variation) in the continuum setting. We address this question in the setting of $
September 9, 2015
121pm, 182 George St, Room 110
Random matrix theory based analysis of the correlation structure of protein sequences
Lucy Colwell
Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK
Extracting interesting biological information from protein sequences is a grand challenge, with the pace of sequencing increasing by the day. There are many problems in biology where correlations between protein sequences could lead to important information and new tools; my research program aims to find new ways of using these correlations to make biological discoveries. Recently it has been shown by us and others that the correlation structure of large protein sequence alignments contains sufficient information to accurately predict protein 3D structure using approximate graphical modeling and probabilistic inference. The methodology requires inferring a large number of parameter values from a dataset that is highly undersampled, including samples with varying degrees of correlation to one another, violating the assumptions made by the inference methods. For this approach to be useful more generally  both for protein structure prediction and for solving other biological problems  it is essential that we understand how prediction quality depends on the input data available for the protein family of interest. This is because those families for which no homologous crystal structure has been solved typically contain few sequences.
In this talk we first briefly compare the different inference procedures on both exact data simulated from models, and protein sequence data. We then identify a phenomena for protein sequence data in which iterating the model inference procedure allows the inferred model to converge to a more accurate solution, and show how this can be used to accurately predict interacting protein pairs from sequence data alone. This convergence suggests that there is scope to improve the accuracy of model inference, for example by 'cleaning' the sample correlation matrix. The standard random matrix theory approach identifies those few eigenvalues associated with a low rank signal by comparing the spectrum of the sample covariance matrix to the MarcenkoPastur distribution. In the context of protein sequence data the extensive correlations between samples mean that this approach will not work, but if these correlations are known (via a phylogeny) I will show how the Marcenko Pastur equation can be solved to yield the expected empirical spectral distribution caused by phylogeny alone.
May 6, 2015
121pm, 182 George St, Room 110
Two Recent Information Theoretic Variations on the Theme of Patterns in Security
Muriel Medard
Cecil H. Green Professor
Electrical Engineering and Computer Science Department
Massachusetts Institute of Technology
We overview two different sets of results based upon the effect of patterns in security. In the first part, we consider limits of inference, a problem that emerges when we seek to ascertain what limits to privacy we can expect when machine learning algorithms, whose theoretical basis often relies on principal inertia components, are applied to mining publicly available data that may be related, in loosely known ways, to private data. Lower bounds for the average probability of error of estimating a hidden variable X given an observation of a correlated random variable Y, and Fano’s inequality in particular, play a central role in information theory. We present a lower bound for the average estimation error based on the marginal distribution of X and the principal inertias of the joint distribution matrix of X and Y, providing thus limits to privacy. Furthermore, we investigate how to answer a fundamental question in inference and privacy: given an observation Y, can we estimate a function f(X) of the hidden random variable X with an average error below a certain threshold? We provide a general method for answering this question using an approach based on ratedistortion theory.
In the second part, we consider recent results on guesswork, the characterization of the process sequences such as passwords. We note that, what may appear as being even slight differences in distributions of these sequences may lead to differences that are exponential in guesswork, leading to possibly surprising results, such as the failure of the oftassumed uniformity of compressed sources, and the fact that friendly jamming of an intended user may be advantageous. We conclude with our recently defined notion of inscrutability rate, used to quantify the asymptotic difficulty of guessing U out of V secret strings. Unexpectedly, the inscrutability rate of any finiteorder Markov stringsource with hidden statistics remains the same as the unhidden case, i.e., the asymptotic value of hiding the statistics per each symbol is vanishing.
Joint work with Ahmad Beirami, Robert Calderbank, Mark Christiansen, Ken Duffy, Flavio du Pin Calmon, Stefano Tessaro, Mayank Varia
April 1, 2015
121pm, 182 George St, Room 110
Adaptive Bayesian Estimation of Conditional Densities
Andriy Norets
Associate Professor
Department of Economics
Brown University
Nonparametric estimation of conditional distributions is important in empirical work across many fields. The Bayesian approach to this problem has several attractive properties. It does not require fixing a bandwidth or similar tuning parameters. Instead, it provides estimates of the objects of interest where the tuning parameters are averaged out with respect to their posterior distribution. Also, the Bayesian approach performs well in outofsample prediction and Monte Carlo exercises. In this talk, I will discuss theoretical properties of Bayesian nonparametric models and provide an explanation for their excellent performance in applications.
I will focus on mixtures of Gaussian densities with covariate dependent mixing weights and a variable number of mixture components for which a prior on positive integers is specified. Conditional on the number of mixture components, the mixing weights are modelled by a multinomial logit with a common scale parameter. This model is closely related to mixtureofexperts also known as smooth mixtures in econometrics. The main theoretical result of the talk is that the posterior in this model contracts at a minimax rate up to a logarithmic factor. The assumed prior distribution does not depend on the smoothness level of the true conditional density. Thus, the obtained posterior contraction rate is adaptive across all smoothness levels.
I will also briefly discuss some applications and estimation methods for nonparametric Bayesian modelling of conditional densities.
Background papers:
1. "Adaptive Bayesian Estimation of Conditional Densities" by Norets and Pati,
http://arxiv.org/pdf/1408.5355.pdf
2. "Posterior Consistency in Conditional Density Estimation by Covariate Dependent Mixtures" by Norets and Pelenis, Econometric Theory, 2014
http://www.econ.brown.edu/fac/Andriy_Norets/papers/consmixreg.pdf
March 18, 2015
121pm, 182 George St, Room 110
Using evolutionary sequence variation to make inferences about protein structure and function
Lucy Colwell
Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data.
The challenge is to distinguish true interactions from the noisy and undersampled set of observed correlations in a large multiple sequence alignment. To address this we build a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair interactions. We translate these interactions into pairwise distance constraints between amino acids and use them to generate all atom structural models. Using proteins of known structure we show that correlations between amino acids at different sites in a protein contain sufficient information to predict low resolution tertiary protein structure of both globular and transmembrane proteins. We then apply our method to predict de novo the structure of 11 medically important transmembrane proteins of unknown structure. In addition we are able to predict protein quaternary structure and alternative conformations. The next step requires development of a theoretical inference framework that enables the relationship between the amount of available input data and the reliability of structural predictions to be better understood.
March 4, 2015
121pm, 182 George St, Room 110
From Pixels to Local Layers: Exploring Flexible Representations for Motion Estimation
Deqing Sun
Postdoctoral Research Fellow in Computer Science
Harvard University
We live in a dynamic world where motion is ubiquitous. To exist in such an environment, robots and other intelligent agents should have the ability to perceive and understand motion. Estimating image motion and segmenting the scenes into coherently moving regions are two closely related problems but are often treated separately. Motion actually provides an important cue to identify surfaces in a scene, while segmentation may provide the proper support for motion estimation. Despite decades of research efforts, current methods still tend to produce large errors near motion boundaries and in occlusion regions and falsely merge foreground objects with the background.
I will start from a probabilistic layered model for joint motion estimation and segmentation. We order each moving object (layer) in depth and explicitly model the occlusions between layers. We model the segmentation using thresholded spatiotemporally coherent support functions and the motion using globally coherently but locally flexible priors. Our model enforces that scene structures (segmentation), instead of motion, should persist over time. Our method achieves promising results on both the Middlebury optical flow benchmark and the MIT layer segmentation dataset, particularly in occlusion regions.
Noting that "global" layered models cannot capture mutual or selfocclusions or deal with too many layers, I will introduce a local layering representation that breaks the scenes into local layers and jointly models the motion and occlusion relationship between local layers. By retaining uncertainty on both the motion and the occlusion relationship, we can avoid local minima common to motiononly or occlusiononly approaches. Our method can handle motion and occlusion well for both challenging synthetic and real sequences.
Finally, I will show an application of motion estimation to interactive intrinsic video editing. We introduce a fast and temporally consistent algorithm to decompose video sequences into their reflectance and illumination components. One key observation is that reflectance is an intrinsic property of physical surfaces and tend to persist over time, while lighting may vary. The temporally consistent decomposition results allow illuminationaware video editing, such as retexturing and lightingaware compositing.
Joint work with Michael J. Black, Erik B. Sudderth, Ce Liu, Hanspeter Pfister, Nicolas Bonneel, Kalyan Sunkavalli, James Tompkin, and Sylvain Paris
February 25, 2015
121pm, 182 George St, Room 110
Scaling and Generalizing Variational Inference
David Blei
Professor of Statistics and Computer Science
Columbia University
Latent variable models have become a key tool for the modern statistician, letting us express complex assumptions about the hidden structures that underlie our data. Latent variable models have been successfully applied in numerous fields including natural language processing, computer vision, electronic medical records, genetics, neuroscience, astronomy, political science, sociology, the digital humanities, and many others.
The central computational problem in latent variable modeling is posterior inference, the problem of approximating the conditional distribution of the latent variables given the observations. Posterior inference is central to both exploratory tasks, where we investigate hidden structures that underlie our data, and predictive tasks, where we use the inferred structures to generalize about future data. Approximate posterior inference algorithms have revolutionized Bayesian statistics, revealing its potential as a usable and generalpurpose language for data analysis.
Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing approximate inference algorithms do not scale well. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand. This often requires significant modelspecific analysis, which precludes us from easily exploring a variety of models.
In this talk I will discuss our recent research on addressing these two limitations. First I will describe stochastic variational inference, an approximate inference algorithm for handling massive data sets. Stochastic inference is easily applied to a large class of Bayesian models, including timeseries models, factor models, and Bayesian nonparametric models. I will demonstrate its application to probabilistic topic models of text conditioned on millions of articles. Stochastic inference opens the door to scalable Bayesian computation for modern data analysis.
Then I will discuss black box variational inference. Black box inference is a generic algorithm for approximating the posterior. We can easily apply it to many models with little modelspecific derivation and few restrictions on their properties. Black box inference performs better than similarly generic sampling algorithms, such as MetropolisHastings inside Gibbs, and can be composed with stochastic inference to handle massive data. I will demonstrate its use on a suite of nonconjugate models of longitudinal healthcare data.
This is joint work based on these two papers:
M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. Journal of Machine Learning Research, 14:13031347.
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiWangPaisley2013.pdf
R. Ranganath, S. Gerrish, and D. Blei. Black box variational inference. Artificial Intelligence and Statistics, 2014.
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiWangPaisley2013.pdf
February 4, 2015
121pm, 182 George St, Room 110
Computational reconstruction and modeling of multicellular dynamics from 3D+time in vivo imaging of animal early embryogenesis. Extension to Artificial Life.
Rene Doursat
Research Scientist and Lecturer
(a) Complex Systems Institute, Paris IledeFrance (ISCPIF), Paris, France
(b) BioEmergences Lab, CNRS (USR 3695), GifsurYvette (Paris), France
(c) Erasmus Mundus Master's Program in Complex Systems Science, Ecole Polytechnique, Palaiseau (Paris), France
The BioEmergences platform [1,2], designed at Nadine Peyrieras's lab (b), provides automated analysis and reconstruction of collective cell movements based on timelapse microscopy of organism development. It offers biologists advanced software tools capable of handling large amounts of 4D data through a workflow of image processing algorithms. Raw voxelbased movies of embryos (zebrafish, sea urchin), containing fluorescent proteins to highlight cell membranes and nuclei, are filtered by an edgepreserving smoothing method, then cell positions are extracted from the local maxima of these images and passed on to shape segmentation and celltracking modules. The output is a cell lineage tree annotated with various quantitative measurements. In parallel to this data reconstruction effort, the MecaGen platform [3], directed by myself, supports agentbased modeling and simulation of morphogenesis. Centered on the physicochemical coupling of cell mechanics with gene expression and molecular signaling, embryonic development is viewed here a selforganized phenomenon emerging from a myriad of cells via their genetically regulated and regulating behavior. Cells' mechanical properties (division, adhesion, motility) are closely correlated with their spatial location and temporal state of genetic and molecular dynamics (protein and ligand concentrations), and affect each other concurrently. MecaGen is illustrated on morphogenetic episodes occuring during zebrafish and sea urchin early development. Exploration of parameter space is supported by experimental data from BioEmergences, which allows measuring the "fitness" of the virtual embryo and validating hypotheses. Finally, I will also briefly introduce Morphogenetic Engineering [4], an "Artificial Life" spinoff field that I founded, which takes its inspiration from biological development to create programmable and reproducible robotic, software or network architectures by decentralized selfassembly of elementary agents [5].
Selected References
[1] http://www2.die.upm.es/im/papers/Science2010.pdf
[2] http://doursat.free.fr/docs/Castro_et_al_2014_atlas_PLoS.pdf
[3] http://doursat.free.fr/docs/Delile_Doursat_Peyrieras_2013_MecaGen_chapter.pdf
[4] http://doursat.free.fr/docs/Doursat_Sayama_Michel_2013_MorphEng_NACO.pdf
[5] http://doursat.free.fr/docs/Doursat_Sanchez_2014_mapdevo_SoRo.pdf .
December 3, 2014
121pm, 182 George St, Room 110
Hypothesisguided dimensionality reduction and its application to largescale neuroscience
John Cunningham
Assistant Professor
Department of Statistics
Columbia University
Like many fields, neuroscience is experiencing dramatic increases in the quantity and complexity of recorded data, with the goal of similarly dramatic scientific and technological advances. Critical to realizing these ambitions are the analysis methods to interrogate large datasets jointly. Many questions in the field reduce to a question of "hypothesisguided" dimensionality reduction: given a large number of noisy timeseries, what shared lowdimensional process best describes the data in terms of that hypothesis? I will begin by briefly describing a few applications of this dimensionality reduction in neuroscience. I will then introduce a generic, hypothesisguided linear dimensionality reduction solver using an optimization program over the Stiefel and Grassman manifolds. To demonstrate the utility of this solver, I will apply it to neural recordings from the motor cortex. Here we show that the seemingly hopeless complexity of neural responses can be simply described by a lowdimensional dynamical system at the level of the neural population. Finally, I will describe a critical statistical test of the significance of this hypothesis, verifying that the inferred structure is not a trivial consequence of highdimensional data.
http://stat.columbia.edu/~cunningham/
November 20, 2014 (Thursday)
45pm, Wilson Hall Room 102
Joint Pattern Theory / LCDS Seminar
Geometric graphbased methods for high dimensional data
Andrea Bertozzi
Professor
Department of Mathematics
UCLA
We present new methods for segmentation of large datasets with graph based structure. The method combines ideas from classical nonlinear PDEbased image segmentation with fast and accessible linear algebra methods for computing information about the spectrum of the graph Laplacian. The goal of the algorithms is to solve semisupervised and unsupervised graph cut optimization problems. I will present results for image processing applications such as image labeling and hyperspectral video segmentation, and results from machine learning and community detection in social networks, including modularity optimization posed as a graph total variation minimization problem.
November 12, 2014
121pm, 182 George St, Room 110
Multiscale models for shapes and images
Pedro Felzenszwalb
Associate Professor
School of Engineering and Department of Computer Science
Brown University
Markov models are widely used in a variety of applications in many different fields. In computer vision Markov models are commonly used as priors for shapes and images. Markov models capture regularities of objects such as sequences or images by modeling their local properties. This often leads to tractable learning and inference problems. However it is clear that in many applications realistic priors need to capture nontrivial longrange correlations and highorder properties of objects. Such models can be difficult to estimate and often lead to intractable inference problems. In this talk I will discuss an approach for defining models using a multiscale representation of an object. The idea involves modeling local properties of an object at multiple resolutions. This leads to a natural lowdimensional parameterization for highorder models. The resulting models also have structure that can lead to efficient inference algorithms. I will discuss two instances of the approach in the context of computer vision. One instance involves modeling curves for recognizing shapes and for detecting shapes in images. Another instance involves modeling images to recover hidden structure from noisy measurements.
November 5, 2014
121pm, 182 George St, Room 110
CrossStudy Validation versus Randomized CrossValidation as Data Accumulate
LoBin Chang
Visiting Assistant Professor
Department of Applied Mathematics and Statistics & The Center for Imaging Science
Johns Hopkins University
In recent years "reproducibility" has emerged as a key factor in evaluating predictors of disease phenotypes. In particular, "validation" is undermined when error rates on data collected from new studies exceed those originally reported, which is hardly surprising for a heterogeneous population. In this talk, I will provide a statistical formulation in the large sample limit: subpopulations are modeled as components of a mixture and all error rates are optimal (Bayes) for a twoclass problem. For any number m of studies, the error rate in crossstudy validation exceeds that in ordinary randomized crossvalidation, the latter (averaged) increases with m, and both converge to the optimal rate.
October 22, 2014
121pm, 182 George St, Room 110
Algorithms for Interpretable Machine Learning
Cynthia Rudin
Associate Professor of Statistics
Sloan School of Management and
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
Possibly *the* most important obstacle in the deployment of predictive models is the fact that humans simply do not trust them. If it is known exactly which variables were important for the prediction and how they were combined, this information can be very powerful in helping to convince people to believe (or not believe) the prediction and make the right decision. In this talk I will discuss algorithms for making these nonblack box predictions including:
1) "Bayesian Rule Lists"  This algorithm builds a decision list using a probabilistic model over permutations of IFTHEN rules. It competes with the CART algorithm.
2) "Supersparse Linear Integer Models"  This algorithm produces scoring systems, which are a type of model that is widely used in the medical community. It proposes an alternative to the Lasso method.
I will show applications to healthcare, including an alternative to the CHADS_2 score, which is one of the most widely used scoring systems in medicine. Our model was trained on over 1000 times the amount of data as CHADS_2 and is more accurate, but is just as interpretable. I will also show preliminary joint work with the MGH Sleep Lab on diagnosing forms of sleep apnea.
At the end of the talk, I will discuss random cool applications of machine learning being worked on in the Prediction Analysis Lab.
links:
Building Interpretable Classifiers with Rules using Bayesian Analysis
http://web.mit.edu/rudin/www/LethamRuMcMa14.pdf
This is joint work with my student Ben Letham and colleagues Tyler McCormick and David Madigan.
Method and Models for Interpretable Linear Classification
http://arxiv.org/abs/1405.4047
This is joint work with my student Berk Ustun.
Other links are on my webpage under "Interpretable Predictive Models"
http://web.mit.edu/rudin/www/PredicticsPapers.html
October 8, 2014
121pm, 182 George St, Room 110
A Simple PatternTheoretic Representation
Oren Freifeld
Postdoctoral Associate
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
Reasoning over realworld signals consists of three conceptual steps: representation, modeling and inference. Here, we focus on a specific choice of representation, but one that is strongly motivated by modeling and inference considerations.
The transformational approach, a cornerstone in Ulf Grenander's Pattern Theory, replaces the representation of objects with the representation of transformations acting on the objects. Particularly, diffeomorphisms, a special class of transformations, often play a pivotal role. Typically, the choice of a diffeomorphism space involves a tradeoff: such spaces range from ones that are simple (that is a good thing) but of limited expressiveness, to those that are highlyexpressive (i.e., are able to capture a wide range of patterns) but are complicated and computationallydemanding. Thus, despite their mathematical elegance and sophistication, the applicability of the latter has been somewhat limited, especially for large data sets or when time is of the essence. Moreover, owing to their complexity, the application of modern approaches to modeling (e.g., Bayesian nonparametric models) and inference (e.g., MCMC) to highlyexpressive spaces of diffeomorphisms presents significant mathematical and algorithmic challenges. This is typified by the case of analysisbysynthesis methods, wherein multiple evaluations of likelihood functions (or cost functions in the deterministic setting) rely on computing multiple transformations.
In this talk I will present our recent, workinprogress, efforts to facilitate tractable modeling and inference for such spaces. We propose a novel space of transformations which is not only highlyexpressive but also (embarrassingly) simple. Moreover, it supports highlyaccurate and fast computations of transformations thus enabling rapid likelihood evaluations. Finally, the representation also lends itself to multiscale modeling and easy constructions and evaluations of priors. While the representation has certain interesting differentialgeometric aspects, the talk will focus on the practical issues of implementation (requiring only basic ideas from linear algebra and firstorder differential equations) and the implications for modeling and inference. Preliminary results demonstrate the applicability of the proposed approach in computervision tasks such as image warping or registration, and machinelearning tasks such as monotonic regression, timewarp analysis, and modeling CDF or histogramvalued data.
This is joint work with Soren Hauberg, Julian Straub, Kayhan Batmanghelich, and John Fisher.
October 1, 2014
121pm, 182 George St, Room 110
Statistical mechanics for real biological networks
William Bialek
Professor
Department of Physics
Princeton University
It is an old dream that ideas from statistical physics could help us understand the phenomena that emerge from biological networks, be they networks of genes, networks of neurons, or networks of organisms. In recent years, it has become possible to make increasingly accurate, simultaneous measurements on the states of (almost) all the nodes in such networks. I'll discuss the efforts that my colleagues and I are making to connect these data to statistical physics models. The key idea is the (quite old) maximum entropy principle: we try to build models that are consistent with some measured properties of the network (e.g., the correlations among the states of pairs of elements) but otherwise have as little structure as possible. I will use the example of a flock of birds to explain how this works, and to explain our surprise that it works so well. Statistical mechanics teaches us that, as systems become large, the parameter space breaks up into phases, and this also is true for families of maximum entropy models. Thus, we can ask where real networks are in the phase diagram of possible networks. For the flock, we'll see that the system is poised very close to a critical surface. We can go though a similar (but much more complex) analysis for a network of neurons in the vertebrate retina, and surprisingly we find the same answer  the system seems to be close to criticality, and we can detect hints of criticality in other systems as well. It seems that we are either seeing signs of something general, or we are fooling ourselves, and I'll outline a path to telling the difference.
April 30, 2014
121pm, 182 George St, Room 110
Division of Applied Mathematics / Center for Vision Research Seminar
Computational Anatomy, High Throughput NeuroImaging Informatics, and the BrainCloud
Michael Miller
Professor
Biomedical Engineering and Electrical and Computer Engineering
Johns Hopkins University
Brain parsing is a core technology in Computational Anatomy supporting high throughput indexing of brains. This is allowing us, with the Susumu Mori laboratory, to build several brain clouds in which brains can be searched, retrieved and machine learning can be performed. We will discuss the notions of brain parsing and "geodesic positioning" central to Computational Anatomy, allowing for the understanding of complex medical imagery in terms of the neuroscience ontology at 1mm scale.
The second half of the talk will focus on our medical findings in two of the several neurodegenerative diseases being studied worldwide, dementia of the Alzheimer’s type, and Huntington’s disease. We will describe our results in preclinical dementia in the BIOCARD project led by Marilyn Albert, attempting to characterize the onset and trajectory of biomarker change in temporal lobe structures  amygdala, hippocampus, and entorhinal cortex – prior to conversion to clinical Alzheimer’s symptomology. Secondly, we will discuss results in Huntington’s disease in the PREDICTHD project, led by University of Iowa, examining motor and subcortical structure change as a function of genetic load.
April 23, 2014
121pm, 182 George St, Room 110
Climate Informatics: Recent Advances and Challenge Problems for Machine Learning in Climate Science
Claire Monteleoni
Assistant Professor
Department of Computer Science
George Washington University
The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied, such as the field of bioinformatics, machine learning is poised to accelerate discovery in climate science. Our recent progress on climate informatics reveals that collaborations with climate scientists also open interesting new problems for machine learning. I will give an overview of challenge problems in climate informatics, and present recent work from my research group in this nascent field.
A key problem in climate science is how to combine the predictions of the multimodel ensemble of global climate models that inform the Intergovernmental Panel on Climate Change (IPCC). I will present three approaches to this problem. Our Tracking Climate Models (TCM) work demonstrated the promise of an algorithm for online learning with expert advice, for this task. Given temperature predictions from 20 IPCC global climate models, and over 100 years of historical temperature data, TCM generated predictions that tracked the changing sequence of which model currently predicts best. On historical data, at both annual and monthly timescales, and in future simulations, TCM consistently outperformed the average over climate models, the existing benchmark in climate science, at both global and continental scales. We then extended TCM to take into account climate model predictions at higher spatial resolutions, and to model geospatial neighborhood influence between regions. Our second algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model from which TCM is derived, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best climate model at a given location. We recently applied a third technique, sparse matrix completion, in which we create a sparse (incomplete) matrix from climate model predictions and observed temperature data, and apply a matrix completion algorithm to recover it, yielding predictions of the unobserved temperatures.
April 16, 2014
121pm, 182 George St, Room 110
The Blended Paradigm: A Bayesian approach to handling outliers and misspecified models
Steven MacEachern
Professor
Department of Statistics
Ohio State University
Bayesian methods have proven themselves to be enormously successful across a wide range of scientific problems, with analyses ranging from the simple onesample problem to complicated hierarchical models. They have many welldocumented advantages over competing methods. However, Bayesian methods run into difficulties for two major and prevalent classes of problemshandling data sets with outliers and dealing with model misspecification. In both cases, standard Bayesian analyses fall prey to the hubris that is an integral part of the Bayesian paradigm. The large sample behavior of the analysis is driven by the likelihood. We propose the use of restricted likelihood as a single solution to both of these problems. When working with restricted likelihood, we summarize the data, x, through a set of (insufficient) statistics T(x) and update our prior distribution with the likelihood of T(x) rather than the likelihood of x. By choice of T(x), we retain the main benefits of Bayesian methods while reducing the sensitivity of the analysis to selected features of the data. The talk will motivate the blended paradigm, discuss properties of the method and choice of T(x), cover the main computational strategies for its implementation, and illustrate its benefits.
April 9, 2014
121pm, 182 George St, Room 110
Division of Applied Mathematics / Center for Vision Research Seminar
The computational magic of the ventral stream: a theory (and why some deep architectures work)
Tomaso Poggio
Professor
NSF Center for Brains, Minds and Machines
McGovern Institute
Computer Science and Artificial Intelligence Laboratory
Brain Sciences Department
Massachusetts Institute of Technology
Following on an idea of Stu Geman, I believe that the present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples (n → ∞). The next phase is likely to focus on algorithms capable of learning from very few labeled examples (n → 1), like humans seem able to do. I will introduce a new approach to this problem describing the underlying theory, based on the unsupervised, automatic learning of a "good" representation for supervised learning, characterized by small sample complexity (n).
We consider the case of visual object recognition, though the theory applies to other domains. The starting point is the conjecture, that we can prove in specific cases, that image representations which are invariant to translation, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and unique (discriminative) signature can be computed for each image patch. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such estimates. The theory extends existing deep learning convolutional architectures for image and speech recognition. It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition — and that this representation may be continuously learned in an unsupervised way during development and visual experience.
April 2, 2014
121pm, 182 George St, Room 110
Can connectomics help us understand neural computation? Insights from the fly visual system
Dmitri Chklovskii
Group Leader
Janelia Farm Research Campus
Animal behavior arises from computations in neuronal circuits, but our understanding of these computations has been frustrated by the lack of detailed synaptic connection maps, or connectomes. For example, despite intensive investigations over half a century, the neuronal implementation of local motion detection in the insect visual system remains elusive. We developed a semiautomated pipeline using electron microscopy to reconstruct a connectome, containing 379 neurons and 8,637 chemical synaptic contacts, within the Drosophila optic medulla. By matching reconstructed neurons to examples from light microscopy, we assigned neurons to cell types and assembled a connectome of the repeating module of the medulla. Within this module, we identified cell types constituting a motion detection circuit, and showed that the connections onto individual motionsensitive neurons in this circuit were consistent with their direction selectivity. Our identification of cell types involved in motion detection allowed targeting of extremely demanding electrophysiological recordings by other labs. Preliminary results from such recordings show time delays confirming our findings. This demonstrates that connectomes can provide key insights into neuronal computations.
March 12, 2014
121pm, 182 George St, Room 110
Robust inference on parameters via particle filters and sandwich covariance matrices
Neil Shephard
Professor
Departments of Economics and Statistics
Harvard University
Likelihood based estimation of the parameters of state space models can be carried out via a particle filter. In this paper we show how to make valid inference on such parameters when the model is incorrect. In particular we develop a simulation strategy for computing sandwich covariance matrices which can be used for asymptotic likelihood based inference. These methods are illustrated on some simulated data.
February 26, 2014
121pm, 182 George St, Room 110
Productivity and Reuse in Language: Nonparametric Bayesian Models of Lexical Acquisition
Timothy O'Donnell
Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
A muchcelebrated aspect of language is the way in which it allows us to express and comprehend an unbounded number of thoughts. This property is made possible because language consists of several combinatorial systems which can productively build novel forms using a large inventory of stored, reusable parts: the lexicon.
For any given language, however, there are many potentially storable units of structure, each giving rise to many more ways of forming novel expressions than are actually used in practice. For example, English contains suffixes which are highly productive and generalizable (e.g., ness; LadyGagaesqueness, pinescentedness) and suffixes which can only be reused in specific words, and cannot be generalized (e.g., th; truth, width, warmth). How are such differences in generalizability and reusability represented? What are the basic, stored building blocks at each level of linguistic structure? When is productive computation licensed and when is it not? How can the child acquire these systems of knowledge?
I will discuss several mathematical models of productivity (computation) and reuse (storage), at different levels of linguistic structure. These models all treat the the problem as a tradeoff between storage of frequent (sub)structures and the need to productively generalize to novel cases. The computation/storage tradeoff provides an illuminating new perspective on standard Bayesian tradeoff between fit to data and simplicity/generalizability of hypotheses. This tradeoff is formalized in these models using tools from Bayesian nonparametric statistics.
The distinction between rules and forms that generalize and those that don't gives rise to several deep questions about the nature of the linguistic system. While the tradeoff between productivity and reuse is consistent across different levels of linguistic structure, and can be modeled using Bayesian nonparametrics, linguisticallyinformed representations at each level are crucial for correctly capturing the empirical patterns in the data. Working with such rich, structured representations leads to significant inference challenges, which I will also briefly discuss.
February 12, 2014
121pm, 182 George St, Room 110
Challenging issues in Likelihood Inference on Mixture Models
Daeyoung Kim
Assistant Professor
Department of Mathematics and Statistics
University of Massachusetts Amherst
Modern computing power and algorithms have greatly increased interest in mixture models as an effective tool for modeling heterogeneity in data. Statistical inference for finite mixture models can be done by the method of maximum likelihood (ML) which provides, in a single package, methods for point estimation, hypothesis testing and construction of confidence sets. Despite attractive features of finite mixture models, there are several challenges for likelihood based inference. In this talk we will address two problems: empirical identifiability on the mixture parameters and multiple local maximizers in the mixture likelihood.
December 4, 2013
121pm, 182 George St, Room 110
Big data in neuroscience: Where is the information?
Joachim Buhmann
Professor
Department of Computer Science
ETH Zurich
Neuroscience provides data at very different scales: microscopic imagery (ssTEM) measures information on the structure of neurons, diffusion tensor imaging tracks fibers between brain areas and the hemispheres, fMRI data measures spatial activity patterns to estimate dynamic causal models. At all three length scales, machine learning methods extract complex structures from vast amount of data. Model selection plays an important role in controlling the complexity of the explanation. Furthermore, the amount of supervision information is very scarce since human annotators display a high error rate on these ambiguous data sources. We advocate a maximum entropy approach to interpret such a data source and their model complexity is regularized by an information theoretic model selection principle.
November 20, 2013
121pm, CIT 134
Interactive Demonstrations in the Brown Robotics Lab
Chad Jenkins
Associate Professor
Department of Computer Science
Brown University
This meeting will be a tour of the Brown Robotics Lab in the CIT Building. This tour will consist of interactive demonstrations of the PR2 robot for perception and physical manipulation of tabletop objects using data from visual streams. This demo will be complemented with a description of the underlying computational methods, such as processing of point clouds from RGBD data, object recognition, and adjustable autonomy for humanrobot interaction. Time permitting, demos of other robots, such as the Rethink Baxter, Suitable Beam, and AR.Drone quadrotor helicopter, will occur with a description of their visionrelated technical challenges.
November 6, 2013
121pm, 182 George St, Room 110
Dirichlet process mixture inconsistency for the number of components, and dimension mixture models
Jeffrey Miller
Brown University
Note: The first half will essentially be a practice talk for my NIPS 2013 presentation.
For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of components occurring so far, that is, the posterior on the number of clusters in the observed data. However, it turns out that this posterior is not consistent; it does not converge to the true number of components.
Motivated by this finding, we examine an alternative approach to Bayesian nonparametric models. Many of the commonlyused nonparametric models are infinitedimensional, such as Dirichlet or PitmanYor process mixtures (DPMs/PYMs), Hierarchical Dirichlet processes (HDPs), and Indian buffet processes (IBPs). A less common but very natural type of Bayesian nonparametric model is constructed by taking a family of finitedimensional models and putting a prior on the dimension  that is, taking a mixture of finitedimensional models. Interestingly, this approach gives rise to combinatorial stochastic processes that closely parallel those of DPMs/PYMs, HDPs, and IBPs, and has certain advantages over the infinitedimensional approach.
October 23, 2013
121pm, 182 George St, Room 110
Highdimensional statistics
Sahand Negahban
Assistant Professor
Department of Statistics
Yale University
The focus of my research is to develop theoretically sound methods, which are both computationally and statistically efficient, for extracting information from large datasets. A salient feature of my work has been to understand how hidden lowcomplexity structure in large datasets can be used to develop computationally and statistically efficient methods for extracting meaningful information for highdimensional estimation problems. My work borrows from and improves upon tools of statistical signal processing, machine learning, probability and convex optimization.
October 9, 2013
121pm, 182 George St, Room 110
Model selection in a large compositional space
Roger Grosse
Massachusetts Institute of Technology
We often build complex probabilistic models by "composing" simpler models  using one model to generate the latent variables for another model. This allows us to express complex distributions over the observed data and to share statistical structure between different parts of a model. I'll present a space of matrix decomposition models defined by the composition of a small number of motifs of probabilistic modeling, including clustering, low rank factorizations, and binary latent factor models. This compositional structure can be represented by a contextfree grammar whose production rules correspond to these motifs. By exploiting the structure of this grammar, we can generically and efficiently infer latent components and estimate predictive likelihood for nearly 2500 model structures using a small toolbox of reusable algorithms. Using a greedy search over this grammar, we automatically choose the decomposition structure from raw data by evaluating only a small fraction of all models. The proposed method typically finds the correct structure for synthetic data and backs off gracefully to simpler models under heavy noise. It learns sensible structures for datasets as diverse as image patches, motion capture, 20 Questions, and U.S. Senate votes, all using exactly the same code.
September 11, 2013
121pm, 182 George St, Room 110
Parameter estimation robust to lowfrequency contamination
Adam McCloskey
Assistant Professor
Department of Economics
Brown University
We provide methods to robustly estimate the parameters of stationary ergodic shortmemory time series models in the potential presence of additive lowfrequency contamination. The types of contamination covered include level shifts (changes in mean) and monotone or smooth time trends, both of which have been shown to bias parameter estimates towards regions of persistence in a variety of contexts. The estimators presented here minimize trimmed frequency domain quasimaximum likelihood (FDQML) objective functions without requiring specification of the lowfrequency contaminating component. We provide two approaches, allowing for either thin or heavytailed data. When proper sample sizedependent trimmings are used, the FDQML estimators are consistent and asymptotically normal, asymptotically eliminating the presence of any spurious persistence. These asymptotic results also hold in the absence of additive lowfrequency contamination, enabling the practitioner to robustly estimate model parameters without prior knowledge of whether contamination is present. Popular time series models that fit into the framework of this article include ARMA, stochastic volatility, GARCH and ARCH models. We explore the finite sample properties of the trimmed FDQML estimators of the parameters of some of these models, providing practical guidance on trimming choice. Empirical estimation results suggest that a large portion of the apparent persistence in certain volatility time series may indeed be spurious.
Joint work with Jonathan B. Hill (UNCChapel Hill).
May 8, 2013
121pm, 182 George St, Room 110
Scaleinvariance and metrics on spaces of plane curves
Matt Feiszli
The study of metrics and distances on spaces of curves and surfaces is an active area of research with applications in vision and imaging. We will present metrics on spaces of plane curves using techniques from harmonic, complex, and geometric analysis. Their behavior can be best understood through notion of scale, and we will discuss geometric ways to understand their multiscale behavior. We will discuss their application to problems in shape deformation and shape matching, and demonstrate scaleinvariant approaches to curve matching on standard test datasets.
April 17, 2013
121pm, 182 George St, Room 110
Towards a general theory of human learning and reasoning
Charles Kemp
Assistant Professor
Department of Psychology
Carnegie Mellon University
People learn and reason about animals, spatial relations, kinsfolk, and many other domains, and solve a broad range of inductive problems within each of these domains. The full set of domains and the full set of inductive problems within these domains can be collectively described as the conceptual universe. I will present a systematic characterization of the conceptual universe that helps to clarify the relationships between familiar inductive problems such as property induction, categorization, and stimulus generalization, and that introduces new inductive problems for psychological investigation. I will illustrate the framework using case studies that include behavioral and computational studies of inductive reasoning, and a formal analysis of kinship classification across cultures.
April 10, 2013
121pm, 182 George St, Room 110
Encoding binary neural codes in networks of thresholdlinear neurons
Carina Curto
Assistant Professor
Department of Mathematics
University of NebraskaLincoln (UNL)
Networks of neurons in the brain encode preferred patterns of neural activity via their synaptic connections. Despite receiving considerable attention, the precise relationship between network connectivity and encoded patterns is still poorly understood. Here we consider this problem for networks of thresholdlinear neurons whose computational function is to learn and store a set of binary patterns (e.g., a neural code) as "permitted sets" of the network. We introduce a simple Encoding Rule that selectively turns "on" synapses between neurons that coappear in one or more patterns. The rule uses synapses that are binary, in the sense of having only two states ("on" or "off"), but also heterogeneous, with weights drawn from an underlying synaptic strength matrix S. Our main results precisely describe the stored patterns that result from the Encoding Rule  including unintended "spurious" states  and give an explicit characterization of the dependence on S. As a consequence, we find that certain neural codes are "natural" in the context of these networks; i.e., the structure of the code closely matches the structure of emerging spurious states, allowing the full code to be accurately learned from a highly undersampled set of patterns. Interestingly, many commonly observed neural codes in cortical and hippocampal areas are natural in this sense. As an application, we construct networks that encode hippocampal place field codes nearly exactly, following presentation of only a small fraction of patterns. To obtain our results, we prove new theorems using classical ideas from convex and distance geometry, such as CayleyMenger determinants, revealing a novel connection between these areas of mathematics and coding properties of neural networks.
This is joint work with Anda Degeratu and Vladimir Itskov.
April 3, 2013
121pm, 182 George St, Room 110
Learning to behave by reading
Regina Barzilay
Associate Professor
Department of Electrical Engineering and Computer Science
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
In this talk, I will address the problem of grounding linguistic analysis in control applications, such as game playing and robot navigation. We assume access to natural language documents that describe the desired behavior of a control algorithm (such as game strategy guides). Our goal is to demonstrate that knowledge automatically extracted from such documents can dramatically improve performance of the target application. First, I will present a reinforcement learning algorithm for learning to map natural language instructions to executable actions. This technique has enabled automation of tasks that until now have required human participation  for example, automatically configuring software by consulting howto guides. Next, I will present a MonteCarlo search algorithm for game playing that incorporates information from game strategy guides. In this framework, the task of text interpretation is formulated as a probabilistic model that is trained based on feedback from MonteCarlo search. When applied to the Civilization strategy game, a languageempowered player outperforms its traditional counterpart by a significant margin.
March 20, 2013
121pm, 182 George St, Room 110
Sparse superposition codes: Communication by regression
Andrew Barron
Professor
Department of Statistics
Yale University
A solution is presented to the problem of provable, fast, reliable, capacityachieving communications for the additive Gaussian noise channel. This is an important real communications problem for which the fundamental limits of reliability were initially established in the capacity theorem of Shannon in 1948. Not till the 1990s were practical highrate schemes developed and these form an essential part of the cell phone revolution. However, the demonstration of their reliability is only empirical. The codes we develop here are sparse superposition codes based on a highdimensional regression model, and they have fast encoders and decoders based on iterative regression fits. In this presentation we describe the framework permitting theoretical demonstration of the desired properties. The error probability is shown to scale favorably with the size of the code, namely, the error probability is exponentially small for every fixed communication rate below capacity. Remaining challenges are discussed in understanding how the complexity scales when the rate approaches capacity. This work is joint with Antony Joseph and Sanghee Cho.
February 27, 2013
121pm, 182 George St, Room 110
Bayesian model sampling in reinforcement learning
Michael Littman
Professor
Department of Computer Science
Brown University
Reinforcement learning (RL) is a subfield of machine learning concerned with decision makers that adapt their behavior given utilitybased feedback. While temporal difference methods dominate the field, there are a growing number of researchers examining Bayesian statistics as a way of learning about unfamiliar environments while modeling their own uncertainty and addressing the classic exploration/exploitation dilemma. I will introduce the RL problem and provide some background into existing Bayesian methods, as well as presenting current efforts from our lab to generalize these approaches to broader model classes and more efficient computation.
February 20, 2013
121pm, 182 George St, Room 110
Nonstationary Modeling Through Dimension Expansion
Luke Bornn
Assistant Professor
Department of Statistics
Harvard University
If atmospheric, agricultural, and other environmental systems share one underlying theme it is complex spatial structures, being influenced by such features as topography and weather. Ideally we might model these effects directly; however, information on the underlying causes is often not routinely available. Hence, when modeling environmental systems there exists a need for a class of spatial models which does not rely on the assumption of stationarity.
In this talk, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multidimensional scaling, group lasso, and latent variable models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic testbed data set historically used to study nonstationary models. Following this, we explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which gives rise to a nonstationary field.
January 30, 2013
121pm, 182 George St, Room 110
Composite Likelihood
Nancy Reid
University Professor
Department of Statistics
University of Toronto
In complex multivariate settings, it is sometimes much simpler to construct a 'likelihood' from lower dimensional marginal or conditional distributions: the general construction was called composite likelihood in Lindsay (1988). Under mild conditions the resulting composite maximum likelihood estimate is consistent and asymptotically normal, with asymptotic variance of the 'sandwich' form that also arises in misspecified models. A recent workshop at the Banff International Research Station on composite likelihood had as one goal to identify important directions for future research in the theory of composite likelihood, in software development, and in methodology for different areas of application. I will give an overview of the progress made at this workshop and highlight areas where more research is needed.
November 14, 2012
121pm, 182 George St, Room 110
Linking signaling pathways and dynamic regulatory networks
Anthony Gitter
Postdoctoral Researcher
Microsoft Research New England
Adaptation to everchanging environmental conditions is vital to the survival of all organisms. In many stress responses upstream proteins detect an environmental stimulus and propagate signals to the nucleus where transcription factors activate or inhibit genes. Although these upstream proteins and temporal changes in gene expression can be readily observed, the pathways and transcription factors involved remain hidden. We present a strategy for integrating largescale data to discover the signaling and transcriptional components of stress response. Our model captures the dynamics of transcription factor activity and the directed pathways that activate these regulators, which are inferred from an undirected protein interaction network. Because our approach requires little conditionspecific data, it is widely applicable to many conditions and species. Joint work with Miri Carmi, Naama Barkai, and Ziv BarJoseph.
November 7, 2012
121pm, 182 George St, Room 110
Natural image statistics and image restoration
Yair Weiss
Professor
School of Computer Science and Engineering
The Hebrew University of Jerusalem
Learning a statistical model of natural images is a longstanding topic of research in disciplines ranging from engineering to computational neuroscience. Somewhat embarrassingly, however, algorithms based on these statistical models do not give stateof theart performance in image restoration, and are outperformed by "block matching" methods that do not have an explicit probabilistic model. In this work, we show that many widely used statistical models of images are actually not very good density models and a simple, unconstrained, Gaussian Mixture Model (GMM) can give much higher likelihood to unseen images. Using the learned GMM, we obtain stateoftheart image restoration performance and by examining what the GMM has learned we obtain new insights into the statistics of natural images that are not captured by most existing models.
Joint work with Daniel Zoran.
October 24, 2012
121pm, 182 George St, Room 110
Active Learning under Margin Assumptions
Sivan Sabato
Postdoctoral Researcher
Microsoft Research New England
We derive and analyze a new, efficient, poolbased active learning algorithm for halfspaces. Most previous algorithms show exponential improvement in the label complexity assuming that the distribution over the instance space is close to uniform. This assumption rarely holds in practical applications. Instead, we study the label complexity under a largemargin assumptiona much more realistic condition, as evident by the success of marginbased algorithms such as SVM. Our algorithm is computationally efficient and comes with formal guarantees on its label complexity. The analysis is based on new results for submodular optimization. The algorithm also naturally extends to the nonseparable case and to nonlinear kernels. We further show experimentally that our approach yields superior label complexity to previous active learners on real problems.
Joint work with Alon Gonen and Shai ShalevShwartz.
October 3, 2012
121pm, 182 George St, Room 110
Recovering Large Networks via Optimizing Nonlikelihood Functions
Rossi Luo
Assistant Professor
Department of Biostatistics and Center for Statistical Sciences
Brown University
Graphical models are used to describe the relationships between multiple variables, and a class of undirected graphical models can be uncovered from estimating covariance matrices. Popular approaches include optimizing (regularized) likelihood functions, especially in the setting when the sample size is much smaller than the number of variables. In this high dimensional setting, I will describe a few nonlikelihood optimization approaches to estimate large covariance and inverse covariance matrices. These approaches are based on exploring algebraic properties of these matrices and low dimensional assumptions, such as sparsity and low rank. Convergence rates of these estimators are obtained for finite samples, when the underlying distribution has either exponential or polynomial tails. All the proposed methods can be formulated as convex optimization problems. A firstorder algorithm is proposed to solve one of the nonsmooth optimization problems, and its global iterative complexity is also proved. Numerical merits of these methods are demonstrated using simulation and real datasets from brain sciences.
May 16, 2012
121pm, Metcalf Auditorium (101)
Division of Applied Mathematics & Center for Vision Research Seminar
Lessons from Photographing and Identifying the Word's Plant Species
Peter Belhumeur
Professor
Department of Computer Science
Columbia University
Columbia University, the University of Maryland, and the Smithsonian Institution are working on visual recognition software to help identify species from photographs. I will discuss our work on developing Leafsnap  the first in a series of electronic field guides. As part of this work, we have completed photographing close to one third of the world's plant species and have begun capturing beautiful highresolution images of live specimens. Our work has led us in new research directions for the visual recognition of human faces, dog breeds, and bird species, including the adoption of centuriesold techniques from taxonomy for the process of labeling images with visual attributes and object parts. In particular, I will show that it is possible to automatically recognize a wide range of visual attributes and parts in images and use them in numerous applications of computer vision.
May 9, 2012
121pm, 182 George St, Room 110
Computational Challenges in Molecular Medicine
Donald Geman
Professor
Department of Applied Mathematics and Statistics
The Johns Hopkins University
Despite the "omics" revolution, advances in clinical practice through fundamental research in systems biology are rare. First, I will talk about the technological, mathematical and translational barriers encountered in attempting to apply statistical learning to cancer genomics using highthroughput transcriptional data. Then I will propose a methodology based on comparisons of molecular counts, and argue that these are easy to interpret, can account for combinatorial interactions among genes and gene products and might even support a mechanistic interpretation for the underlying decision rules. These ideas will be illustrating by predicting disease phenotypes, the site of origin of metastatic tumors and pathway deregulation.
April 25, 2012
121pm, 182 George St, Room 110
Hallucinating scene detail and recognizing scene attributes
James Hays
Assistant Professor
Department of Computer Science
Brown University
In this talk I will discuss two recent projects. First I will present a massively data driven approach to superresolution. In contrast to recent image enhancement literature, we show that image databases can be more useful than internal image statistics if the right representations are used. Our superresolution method can insert scenespecific textures and transitions beyond the capabilities of prior work. Second, I will introduce the new SUN Attribute Database. We use crowdsourcing to discover a taxonomy of attributes and label 14,000 images from 700 scene categories with more than one hundred attributes. We show the first results for attribute recognition and argue that attributes are a natural way to describe the space of scenes.
April 18, 2012
121pm, 182 George St, Room 110
LP Relaxations for Global Models in Natural Language Processing
Sebastian Riedel
Research Scientist
University of Massachusetts Amherst
Much work in NLP relies on pipelines of classifiers. For example, when mining biomedical text we usually first extract proteins and other entities, and then determine their relations to each other. Such pipelines suffer from cascading errors: once an upstream processor errs, downstream processors often cannot recover. This has led to work that tackles NLP problems in a more global and joint manner.
In this talk I will first present our work on addressing biomedical event extraction as global optimization problem. This approach achieved stateoftheart results on several benchmark datasets, and ranked first in the latest international competition on this task. I will illustrate the enabling technology behind our global approach: Linear Programming (LP) relaxations of the global optimization problem. In particular, I will show how we divide event extraction into tractable subproblems and apply dual decomposition to efficiently find LP optima.
In the second part of my talk I will illustrate our ongoing work on addressing the computational challenge that arises from a superlinear number of variables in our LP relaxations. I will show how we use row and column generation to reduce runtime of a global dependency parsing model by 8 times, without loss of optimality guarantees. Our method relies on upper bounds for the reduced costs of variables which we derive from "grammar constants" of our probabilistic model.
March 28, 2012
121pm, 182 George St, Room 110
What is the neural foundation of the vision process?
Christoph von der Malsburg
Senior Fellow
Frankfurt Institute of Advanced Studies
The current consensus version of the neural basis of vision  or of cognition in general  makes it difficult to model some of the central operations to be performed during perception and learning. I will offer the dynamic link architecture as an alternative physical basis for cognitive processes, will discuss it as a natural framework for vision and other cognitive processes, and will illustrate and support the argument with the help of a concrete model of invariant object recognition.
March 21, 2012
121pm, 182 George St, Room 110
The tree of life and the evolution of genome function
Casey Dunn
Manning Assistant Professor
Ecology and Evolutionary Biology
Brown University
There are estimated to be about 10 million species on the planet, though much of what we know about genome function is the result of work in about 10 species  the laboratory model organisms. Work on model organisms has been extraordinarily productive over the past 100 years, but our focus on such a small number of organisms means we have only looked through a keyhole at the diversity genome function in life on earth. Many interesting biological phenomena remain out of reach, we know little about gene function in the context of macroevolution and ecosystem interactions, and we have only observed gene function within narrow environmental conditions. We are now on the cusp of being able to do functional genomics research in the other 10 million species on the plant, right in their natural environment without growing them in the lab. But to do so will require a complete retooling of data acquisition, entirely new analysis approaches, and a shift in perspective that embraces differences between species rather than obsess over global invariants. Here I outline some of these challenges, including the new analysis challenges we face.
February 8, 2012
121pm, 182 George St, Room 110
Exploiting Sparse Structure by Spectral Connectivity Analysis
Ann Lee
Associate Professor
Department of Statistics
Carnegie Mellon University
For naturally occurring data, the dimension of the given input space is often very large while the data themselves have a low intrinsic dimensionality. Spectral kernel methods are nonlinear techniques for transforming data into a coordinate system that efficiently reveals the underlying structure  in particular, the "connectivity"  of the data. In this talk, I will focus on one particular technique  diffusion maps  but the analysis can be used for other spectral methods as well. I will give examples of various applications of the method in highdimensional regression and density estimation. I will also present a new extension of the diffusion framework to comparing distributions in highdimensional spaces with an application to image retrieval and texture discrimination. (Part of this work is joint with R.R. Coifman, D. Liu, C. Schafer and L. Wasserman.)
February 1, 2012
121pm, 182 George St, Room 110
Statistical analysis of populations with interacting and interfering units
Edo Airoldi
Assistant Professor
Department of Statistics, and FAS Center for Systems Biology
Harvard University
A number of scientific endeavors of current national and international interest involve populations with interacting and/or interfering units. In these problems, a collection of partial measurements about patterns of interaction and interference (e.g., social structure and familial relations) is available, in addition to the more traditional measurements about unitlevel outcomes and covariates. Formal statistical models for the analysis of this type of data have emerged as a major topic of interest in diverse areas of study. Probability models on networks date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online social networking websites such as Facebook and LinkedIn, and a host of more specialized professional networking communities has intensified interest in the study of networks, structured measurements and interference. In this talk, I will review a few ideas and open areas of research that are central to this burgeoning literature, placing emphasis on inference and other core statistical issues. Topics include elements of sampling and inference from nonignorable (network sampling) designs, and semiparametric modeling, with hints to the applications to social, biological and information networks that motivate these statistical problems.
November 30, 2011
121pm, 182 George St, Room 110
Exploring the role of ventral premotor cortex in reachtograsp movements: neural trajectories through spike train similarity space
Carlos VargasIrwin
Postdoctoral Research Associate in Neuroscience
Brown University
Dimensionality reduction applied to neural ensemble data has led to the concept of a 'neural trajectory', a lowdimensional representation of how the state of the network evolves over time. Here we present a novel neural trajectory extraction algorithm which combines spike train distance metrics (Victor and Purpura, 1996) with dimensionality reduction based on local neighborhood statistics (van der Maaten and Hinton, 2008.) . We apply this technique to describe and quantify the activity of primate ventral premotor cortex neuronal ensebles in the context of a cued reaching and grasping task with instructed delay.
References:
Victor and Purpura. Nature and precision of temporal coding in visual cortex: a metricspace analysis. J Neurophysiol. 1996 Aug;76(2):131026.
L.J.P. van der Maaten and G.E. Hinton. Visualizing Data using tSNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
November 2, 2011
121pm, Barus & Holley Room 190
Perceptual Fragments: BottomUp and TopDown Use of Shape in Object Recognition
Benjamin Kimia
Professor
Division of Engineering
Brown University
The bottomup “segmentation followed by recognition” strategy has for some time now given way to featurebased discriminative recognition with significant success. As the number of categories and exemplars per category increases, however, lowlevel features are no longer sufficiently discriminative, motivating the construction and use of more complex features. It is argued here that these complex features will necessarily be encoding shape and this in turn requires curves and regions, thus reviving aspects of bottomup segmentation strategies. We suggest that the demise of segmentation was due to prematurely committing to a grouping in face of ambiguities and propose a framework for representing multiple grouping options in a containment graph. Specifically, we use contour symmetry to partition the image into atomic fragments and define transforms to iteratively grow these atomic fragments into mote distinctive perceptual fragments, the nodes of the containment graph. We also briefly present a fragmentbased language for generating shapes and the use of fragments in topdown category recognition. The bottomup and topdown processes are then integrated by interaction through the midlevel representation of perceptual fragments.
October 19, 2011
121pm, 182 George St, Room 110
Statistical challenges in neural data analysis
Liam Paninski
Associate Professor
Department of Statistics
Columbia University
Systems and circuit neuroscience have recently experienced something of a renaissance, driven by a remarkable explosion in the development of groundbreaking new experimental tools. These methods have in turn opened up a variety of challenging statistical problems, in which we must often perform some highdimensional inference under strong computational constraints (for example, in some cases realtime processing is required). This talk will review some recent progress on three exemplary problems, each of fundamental neuroscientific importance: 1) Optimal filtering and smoothing of voltage signals on large, complex dendritic trees; 2) Optimal decoding of sensory information from the activity of large neural populations, and 3) Inference of connectivity in large neuronal networks given limited, noisy observations.
October 12, 2011
121pm, 182 George St, Room 110
Spectral Methods for Learning Graphical Models
Sham Kakade
Senior Research Scientist
Microsoft Research New England
Associate Professor of Statistics
The Wharton School, University of Pennsylvania
This work presents a methodology for learning graphical models with hidden nodes through algebraic techniques (in particular, matrix decomposition and spectral methods), using independent samples of the observed variables. The talk focuses on tree models, and covers two aspects of the underlying learning problem: parameter estimation and structural learning. The underlying idea is to utilize the spectral decomposition of the second moment matrix to reveal the latent structure.
The first part is concerned with parameter estimation. Here, we present an an efficient and provably correct algorithm for learning HMMs (i.e. recovering the correct HMM dynamics), with a sample complexity depending on some mild conditions of the underlying system. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications, and does not suffer from local minimum issues in nonconvex optimization (such as for more traditional approaches, including the EM algorithm), and it handles high dimensional observations and long range dependencies more easily. The method can be extended to estimating parameters for nonlinear systems and general tree structured graphical models with unobserved nodes.
The second part is concerned with structural learning, where we provide the Spectral Recursive Grouping algorithm, an efficient and simple procedure for recovering the underlying tree topology of a broad class of multivariate tree models with hidden nodes. Exact recovery of the tree structure can be established based on certain natural dependencies on statistical and structural properties of the underlying joint distribution.
Join work with: Daniel Hsu, Tong Zhang; Anima Anandkumar, Kamalika Chaudhuri, Le Song
October 5, 2011
121pm, 182 George St, Room 110
Dynamic regulation of decision threshold by frontal cortex and basal ganglia
Michael Frank
Associate Professor
Department of Cognitive, Linguistic, & Psychological Sciences
Department of Psychiatry and Human Behavior
Brown University
The basal ganglia and frontal cortex interact to support rewardbased decision making. While medial prefrontal cortex (mPFC) is implicated in processing uncertainty and reinforcement conflict, the subthalamic nucleus (STN) is thought to act as a brake on corticostriatal function, preventing premature impulsive responding. Simulations using a neural circuit model combined with drift diffusion model analysis show that the STN acts to temporarily increase the decision threshold as a function of conflicting cortical decision plans, and that this same decision threshold modulation accounts for human behavioral data. We tested these posited neural mechanisms in experiments with electrophysiological recordings over mPFC and in the STN (intraoperative recordings from Parkinson's patients undergoing deep brain stimulation surgery). We show a relationship between mPFC activity and decision threshold adjustment which is altered by STN manipulation. These data support the hypothesis that frontalbasal ganglia communication regulates decision processes as a function of reinforcement conflict.
September 14, 2011
121pm, 182 George St, Room 110
Nonparametric Priors for Segmentation of Medical Images
Polina Golland
Associate Professor
Department of Electrical Engineering and Computer Science, and CSAIL
Massachusetts Institute of Technology
We propose a nonparametric probabilistic model for prior knowledge in segmentation of medical images. The resulting inference algorithms register individual training images to the new image, transfer the segmentation labels and fuse them to obtain the final segmentation of the test subject. Our generative model yields previously proposed label fusion algorithms as special cases, and also leads to a new variant that aggregates evidence for the segmentation label locally. We demonstrate the advantages of our approach in two clinical applications: segmentation of neuroanatomical structures and segmentation of the left heart atrium whose shape varies significantly across the population.
May 11, 2011
121pm, 182 George St, Rm 110
Division of Applied Mathematics & Center for Vision Research Seminar
Statistical structures of natural scenes and neural ensembles activities
TaiSing Lee
Associate Professor
Computer Science Department and Center for the Neural Basis of Cognition
Carnegie Mellon University
Statistical structures of the natural environment could provide important clues for understanding the neural representations and functional circuitry underlying visual perception. We studied the statistical structures of 3D scenes, and evaluated whether these structures are encoded in the tuning properties and functional connectivities of neurons in the early visual cortex. We found that information encoded in the distribution of disparitytuned neurons at the population level matches the statistical structures of natural scenes in a number of aspects: the distribution of tuning curves, the correlation in tunings to different visual cues, and the functional connectivity of neurons. We also found that the ensemble activities of neurons support the notion of a probabilistic population code that allows the encoding of statistical priors of the natural scenes, the uncertainty and the posterior distribution of inferred perceptual attributes. These findings provide preliminary empirical insights on how pattern theory can be realized at the level of neuronal population and neuronal circuits.
May 4, 2011
121pm, 182 George St, Rm 110
Logistic Regression on Data Streams
Kevin Kochanek
Research Mathematician
US Department of Defense
The emergence of highvolume data sources in fields such as finance, telecommunications, and biomedicine has led to a surge of interest in streaming data analysis. In this talk we consider the problem of constructing generalized linear models on data streams. We also present a novel streaming adaptive histogramming algorithm to facilitate our approach that is also useful in other contexts.
April 20, 2011
121pm, 182 George St, Rm 110
Selffolding of polyhedra experiments and a little theory
Govind Menon
Associate Professor
Division of Applied Mathematics
Brown University
A fascinating development in materials chemistry is the construction of `large' components by selfassembly. I will describe some fascinating experiments of David Gracias' lab, and some preliminary mathematical work on mesoscale polyhedra that selffold. Many problems are open, and the main purpose of this talk is to advertise some of them.
April 6, 2011
121pm, 182 George St, Rm 110
Division of Applied Mathematics & Center for Vision Research Seminar
A HighThroughput Screening Approach to BiologicallyInspired Object Recognition
David Cox
Principal Investigator, Visual Neuroscience Laboratory
The Rowland Institute at Harvard
Biological visual systems are currently unrivaled by artificial systems in their ability to recognize faces and objects in highly variable and cluttered realworld environments. Biologicallyinspired computer vision systems seek to capture key aspects of the computational architecture of the brain, and such approaches have proven successful across a range of standard object and face recognition tasks. However, while many models of biological object recognition share a common set of "broadstroke" properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model. Since the number of such parameters (explicit or implicit) is typically large, and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea, or because the correct "parts" have not been tuned correctly, assembled at sufficient scale, or provided with enough training.
In this talk, I'll present a highthroughput search approach for exploring the range of biologicallyinspired visual models, that emphasizes simplicity, careful evaluation procedures, and scale. Systems discovered using this approach are shown to perform surprisingly well across a range of visual problem domains and achieve stateoftheart performance on a collection of challenging realworld face recognition datasets. Finally, I'll discuss current limitations and future opportunities of this approach and argue for renewed attention to data set quality and model scalability.
March 16, 2011
121pm, 182 George St, Rm 110
Solving Inference, Optimization, and Constraint Satisfaction Problems with the Divide & Concur and Belief Propagation MessagePassing Algorithms
Jonathan Yedidia
Distinguished Member Research Staff
Mitsubishi Electric Research Labs, Cambridge, MA
In "probabilistic inference" problems one tries to estimate the true state of some quantities of interest, given only noisy or ambiguous sensor measurements of related quantities. Such problems arise in a very wide variety of fields, with applications in communications, signal processing, computer vision, speech and audio recognition, machine learning, physics, and robotics.
In this talk, I will describe and compare two particularly important algorithms that can solve probabilistic inference problems, as well as related constraint satisfaction and optimization problems: the celebrated "belief propagation" (BP) algorithm; and the "divide and concur" (D&C) algorithm. I will show that the D&C algorithm, which was recently introduced by Gravel and Elser for solving nonconvex constraint satisfaction problems, can also be understood as an messagepassing algorithm that can be used on an even wider class of inference and optimization problems than BP. Although less wellknown, the D&C algorithm has some notable advantages compared with BP, in that it more naturally deals with problems with continuous valued variables, or with variables that lack local evidence. Another advantage of D&C is that its "differencemap" dynamics enables it to avoid "traps."
If time permits, I will also describe a new decoder (developed with Yige Wang and Stark Draper) for lowdensity parity check codes that combines the ideas of the D&C and BP algorithms. This "differencemap belief propagation" (DMBP) decoder significantly improves errorfloor performance compared with stateoftheart BP decoders, while maintaining a similar computational complexity.
March 9, 2011
121pm, 182 George St, Rm 110
A history of applying principal component analyses to human population genetic data
Sohini Ramachandran
Assistant Professor
Department of Ecology and Evolutionary Biology, and
Center for Computational Molecular Biology
Brown University
Nearly 30 years ago, Luca CavalliSforza and colleagues pioneered the use of PCA in population genetics, producing maps that summarized observed human genetic variation across many continental regions. Patterns in these maps were interpreted as signatures of specific migration events. As genetic datasets have grown in the number of markers used and as sequencing technology has improved, PCA has been widely applied to population genetic datasets. I will present a review of how PCA is used in population genetics and discuss the conclusions that have been drawn via this method, as well as other observed genetic patterns in geographic space in my research on both the inference of past migrations using extant human genetic data and the genetic underpinnings of disease.
February 16, 2011
121pm, 182 George St, Rm 110
Patterns of Thought (in humanoid robots)
Yiannis Aloimonos
Professor
Department of Computer Science
University of Maryland
Patterns of thought, for a cognitive system that interprets a scene or performs a task, consist of perceptual, motor and linguistic representations of the world, controlled by an attention mechanism. The role of the attention mechanism is to facilitate parsing of the different representations according to a grammar of action. In this talk I will introduce, for the first time, an action grammar. It is really a disguised version of Chomsky’s Universal Grammar adapted in a minimalist framework. Parsing requires an attention mechanism comprising visual human and object filters, active segmentation routines and a context implementing system utilizing language. I will describe our developments in learning visual filters, achieving robust segmentation techniques and integrating vision with language and our implementations of these ideas in the field of Humanoid Robotics and the general area of video interpretation.
February 9, 2011
121pm, 182 George St, Rm 110
Division of Applied Mathematics & Center for Vision Research Seminar
Learning on Analytic Manifolds
Fatih Porikli
Senior Principal Research Scientist and Technical Manager, Imaging Group
Mitsubishi Electric Research Labs, Cambridge, MA
A large number of natural phenomena can be formulated on analytic manifolds. More specifically in computer vision, such underlying notions emerge in multifactor analysis including feature selection, pose estimation, structure from motion, appearance tracking, and shape embedding. Unlike Euclidean spaces, analytic manifolds does not exhibit global homeomorphism, thus, differential geometry is applicable only locally. This prevents application of conventional inference and learning methods, which require vector norms.
Recently we introduced appearance based object descriptors and motion transformations that exhibit Riemannian manifold structure on positive definite matrices that enable projections of the original problems onto tangent spaces. In this manner, we do not need to flatten the underlying manifold or discover its topology. This talk will demonstrate entertaining results of manifold learning on human detection, regression tracking, unusual event analysis, and affine pose estimation.
November 17, 2010
121pm, 182 George St, Room 110
Point process adaptive filters and the analysis of ensemble neural spiking activity
Uri Eden
Assistant Professor
Department of Mathematics and Statistics
Boston University
Although it is well known that brain areas receive, process and transmit information via sequences of sudden, stereotyped electrical impulses, called action potentials or spikes, most analyses of neural data ignore the localized nature of these events. The theory of point processes offers a unified approach to modeling the firing properties of spiking neural systems, and assessing goodnessoffit between a neural model and observed spiking data.
We develop a point process modeling framework and state space estimation algorithms to describe and track the evolution of dynamic representations from large neural ensembles. This allows us to derive a toolbox of estimation algorithms and adaptive filters to address questions of static and dynamic encoding and decoding. In our analysis of these filtering algorithms, we draw analogies to wellstudied linear estimation algorithms for continuous valued processes, such as the Kalman filter and its discrete and continuous time extensions.
These methods will be illustrated in the analysis of spatially specific spiking activity in rat hippocampus. Using simple point process models, we are able to accurately characterize the localized spiking activity of these neurons as a function of the animal's position in its environment, track changes in their firing properties, reconstruct the animal's past movements, and predict its future behavior from a population of spiking neurons.
November 3, 2010
121pm, 182 George St, Room 110
Bottomup and topdown processing in visual perception
Thomas Serre
Assistant Professor
Department of Cognitive, Linguistic, & Psychological Sciences
Brown University
Perception involves a complex interaction between feedforward sensorydriven information and feedback attentional, memory, and executive processes that modulate such feedforward processing. A mechanistic understanding of feedforward and feedback integration is a necessary step towards elucidating key aspects of visual and cognitive functions and dysfunctions.
In this talk, I will describe a computational framework for the study of visual perception. I will present computational as well as experimental evidence suggesting that bottomup and topdown processes make a distinct and essential contribution to the recognition of complex visual scenes. A feedforward hierarchical architecture may provide a satisfactory account of "immediate recognition" corresponding to the first few hundred milliseconds of visual processing. However, such an architecture may be limited in recognizing complex visual scenes. I will show how attentional mechanisms and cortical feedback may help improve object recognition performance in complex cluttered scenes.
October 20, 2010
121pm, 182 George St, Room 110
Discovering Influential Variables: A Partition Retention Approach
Herman Chernoff
Professor Emeritus
Department of Statistics, Harvard University
Department of Applied Mathematics, MIT
A general approach, pioneered by S.H. Lo and T. Zheng, the method of Partition Retention, is designed to identify those variables among many possible explanatory variables, which have an influence on a dependent variable. This method can be successful when the influence depends on the interaction of several of these variables and the marginal influences of the individual variables are negligible. In those cases where the number of candidates is huge, thousands, this method involves the idea of resuscitating apparently "dead" or useless variables which depend on interactions. The method is computer intensive and involves measuring influence of a small group of candidates, from which the least influential are dropped one at a time until only promising ones are retained. This procedure is repeated many times with small randomly selected subgroups of the candidate variables, and attention is focused on those which are retained most often.
October 6, 2010
121pm, CIT 241 (Swig Boardroom) *** change in location ***
Learning Probabilistic Models with Deep Hierarchical Structures
Ruslan Salakhutdinov
Postdoctoral Fellow
Department of Brain and Cognitive Sciences and CSAIL
Massachusetts Institute of Technology
Building intelligent systems that are capable of extracting higherorder knowledge from highdimensional data and successfully transferring that knowledge to learning new concepts lies at the core of solving many AI related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing.
In this talk I will first introduce a broad class of probabilistic generative models called Deep Boltzmann Machines (DBMs) that contain many layers of latent variables. I will describe a new learning algorithm for this class of models that uses variational methods and Markov chain Monte Carlo (MCMC). This new learning algorithm, in addition to a bottomup pass, can incorporate topdown feedback, which allows DBMs to better propagate uncertainty about ambiguous inputs. I will show that these deep models can learn interesting representations and can be successfully applied in many application domains, including information retrieval, object recognition, and nonlinear dimensionality reduction. In the second part of the talk, I will describe new ways of developing more complex systems that combine Deep Boltzmann Machines with more structured hierarchical Bayesian models. I will show how these hybrid models can learn a deep hierarchical structure for sharing knowledge across hundreds of visual categories, which allows efficient learning of new categories from few, even just one, examples  a problem known as 'oneshot learning'.
*May 10, 2010 (Monday)
4:005:30pm, 182 George St, Rm 110
Shared Segmentation of Natural Scenes using Dependent PitmanYor Processes
Erik Sudderth
Assistant Professor
Department of Computer Science
Brown University
We explore statistical frameworks for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the PitmanYor (PY) process. This generalization of the Dirichlet process leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY priors, we use nonMarkov Gaussian processes to infer spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to stateoftheart methods, while simultaneously discovering categories shared among natural scenes.
*April 23, 2010 (Friday)
45:30pm, 182 George St, Rm 110
A Generative Model Approach to Fraud Detection
Brian Lucena
Chief Mathematician
Guardian Analytics Inc.
Abstract: Standard discriminative approaches rely on seeing lots of examples of both classes in order to distinguish between the two. However, in the specific case of online banking fraud, these techniques are hampered by two major practical concerns. First, there is a huge asymmetry in the amount of data available about each class – hundreds of millions of legitimate transactions and perhaps a few hundred fraudulent ones. Second, fraudsters quickly change their tactics, and so even those few data points on fraud may not be very useful in predicting future fraud. Consequently, we take a modelbased approach to exploit the natural differences between individual users. This talk will describe the FraudMAP system and discuss several interesting problems that arise in its development.
April 21, 2010
121pm, 182 George St, Rm 110
Segmentation of Image Ensembles via Latent Atlases
Tammy RiklinRaviv
Postdoctoral Associate
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
The images acquired via medical imaging modalities are frequently subject to low signaltonoise ratio, bias field and partial volume effects. These artifacts, together with the naturally low contrast between image intensities of some neighboring structures, make the extraction of regions of interest (ROIs) in clinical images a challenging problem. Probabilistic atlases, typically generated from comprehensive sets of manually labeled examples, facilitate the analysis by providing statistical priors for tissue classification and structure segmentation. However, the limited availability of training examples that are compatible with the images to be segmented renders the atlasbased approaches impractical in many cases. In the talk I will present a generative model for joint segmentation of corresponding regions of interest in a collection of aligned images that does not require labeled training data. Instead, the evolving segmentation of the entire image set supports each of the individual segmentations. This is made possible by iteratively inferring a subset of the model parameters, called the spatial parameters, as part of the joint segmentation processes. These spatial parameters are defined in the image domain and can be viewed as a latent atlas. Our latent atlas formulation is based on probabilistic principles, but we solve it using partial differential equations and energy minimization criteria. We evaluate the method successfully for the segmentation of cortical and subcortical structures within different populations and of brain tumors in a singlesubject multimodal longitudinal experiment.
*April 16, 2010 (Friday)
45:30pm, 182 George St, Rm 110
Division of Applied Mathematics / Center for Statistical Sciences Seminar
MultiResolution Inference of Stochastic Models from Partially Observed Data
Samuel Kou
Professor
Department of Statistics
Harvard University
Stochastic models, diffusion models in particular, are widely used in science, engineering and economics. Inferring the parameter values from data is often complicated by the fact that the underlying stochastic processes are only partially observed. Examples include inference of discretely observed diffusion processes, stochastic volatility models, and double stochastic Poisson (Cox) processes. Likelihood based inference faces the difficulty that the likelihood is usually not available even numerically. Conventional approach discretizes the stochastic model to approximate the likelihood. In order to have desirable accuracy, one has to use highly dense discretization. However, dense discretization usually imposes unbearable computation burden. In this talk we will introduce the framework of Bayesian multiresolution inference to address this difficulty. By working on different resolution (discretization) levels simultaneously and by letting the resolutions talk to each other, we substantially improve not only the computational efficiency, but also the estimation accuracy. We will illustrate the strength of the multiresolution approach by examples.
April 14, 2010
121pm, 182 George St, Rm 110
Preserving knowledge through media transitions: ushering the heritage of India into the digital age
Peter Scharf
Director of The Sanskrit Library and Senior Lecturer in Sanskrit
Department of Classics
Brown University
Human beings express knowledge in various modes: through images in visual art; through movement in dance, theatrical performance, and gestures; and through speech in spoken language. Each of these means of expression includes means to encode knowledge, and each is used to express knowledge originally encoded in one of the others. In particular, writing represents knowledge originally encoded in speech. Certain media dominate as the primary methods for the transmission of detailed information at different times and places. Oral tradition dominated the tradition of Sanskrit in India in the first and second millennia B.C.E. Writing overtook orality in the first millennium C.E. and dominated until replaced by printing in the 19th century. At the dawn of the 21st century, the digital medium is replacing printing as the dominant means of knowledge transmission.
While brittle editions of printed Sanskrit books are removed from easy access to library annexes, the Sanskrit Library is working to usher the heritage of India into the digital age. Scharf designed an encoding scheme to represent Sanskrit accurately and revised the Unicode Standard to include characters necessary for the representation of the ancient Vedic texts. He directed an NSFfunded project 20062009 to build a digital Sanskrit library that integrates lexical resources, linguistic software, and machinereadable texts. Now he directs an NEHfunded project to produce highquality digital images of manuscripts at Brown University and the University of Pennsylvania and to develop wordspotting technology to facilitate searching even in digital images not amenable to optical character recognition (OCR).
*April 9, 2010 (Friday)
45:30pm, 182 George St, Rm 110
Analysis of Molecular Networks
Mark Gerstein
Albert L. Williams Professor of Biomedical Informatics
Departments of Molecular Biophysics and Biochemistry and of Computer Science
Yale University
My talk will be concerned with understanding protein function on a genomic scale. My lab approaches this through the prediction and analysis of biological networks, focusing on proteinprotein interaction and transcriptionfactortarget ones. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various topological statistics. In particular, I will discuss a number of recent analyses: (1) Improving the prediction of molecular networks through systematic trainingset expansion; (2) Showing how the analysis of pathways across environments potentially allows them to act as biosensors; (3a) Analyzing the structure of the regulatory network indicates that it has a hierarchical layout with the "middlemanagers" acting as information bottlenecks; (3b) Showing these middle managers tend be arranged in various "partnership" structures giving the hierarchy a "democratic character"; (4) Showing that most human variation occurs on the periphery of the protein interaction network; and (5) Developing useful webbased tools for the analysis of networks (TopNet and tYNA).
http://networks.gersteinlab.org
http://topnet.gersteinlab.org
The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 296870.
Analysis of Diverse Regulatory Networks in a Hierarchical Context: Consistent Tendencies for Collaboration in the Middle Levels. N Bhardwaj et al. PNAS (2010, in press)
Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. PM Kim, JO Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 202749.
Training Set Expansion: An Approach to Improving the Reconstruction of Biological Networks from Limited and Uneven Reliable Interactions. KY Yip, M Gerstein (2008) Bioinformatics
Quantifying environmental adaptation of metabolic pathways in metagenomics. T Gianoulisa, J Raes, P Patel, R Bjornson, J Korbel, I Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein (2009) PNAS
April 7, 2010
121pm, 182 George St, Rm 110
Geometry of the space of 2D shapes equipped with the WeilPetersson metric
Sergey Kushnarev
Graduate Student
Division of Applied Mathematics
Brown University
The study of planar simple closed curves (or "2D shapes") and their similarities is a central problem in the field of computer vision. It arises in the task of characterizing and classifying objects from their observed silhouette. Defining natural distance between 2D shapes creates a metric on the infinitedimensional space of shapes. In this talk I will describe one particular metric, which comes from the conformal mapping of the 2D shapes, via the theory of Teichmuller spaces. In this space every simple closed curve (or a 2D shape) is represented by a smooth selfmap of a circle. I will talk about a specific class of solitonlike geodesics on the space of shapes, called teichons. Some numerical examples of geodesics and effects of the curvature will be demonstrated.
March 24, 2010
121pm, 182 George St, Rm 110
Unsupervised Part of Speech Tagging: From graphical models to statistical models to "biological" models
Michael Lamar
Graduate student
Division of Applied Mathematics
Brown University
A longstanding problem in computational linguistics is the automatic extraction of part of speech categories from a corpus. We introduce this problem and discuss several purely contextual, unsupervised approaches to this problem beginning with Hidden Markov Models which, until recently, have set the standard for performance. We then formulate two new alternative approaches. The first is a purely statistical treatment which incorporates singular value decomposition and clustering. The second is a selforganizing model inspired by simple biological processes. We compare these new approaches to the HMM's and discuss why they achieve better performance.
March 17, 2010
121pm, 182 George St, Rm 110
Learning in Social Networks with Signals of Bounded Informativeness
Ilan Lobel
Postdoctoral Researcher
Microsoft Research New England
We study the (perfect Bayesian) equilibrium of a model of learning over a general social network. Each individual receives a signal about the underlying state of the world, observes the past actions of a stochasticallygenerated neighborhood of individuals, and chooses one of two possible actions. The stochastic process generating the neighborhoods defines the network topology (social network). We characterize purestrategy equilibria for arbitrary stochastic and deterministic social networks and determine conditions under which there will be asymptotic learning, that is, the conditions under which, as the social network becomes large, individuals converge (in probability) to taking the right action. In previous work, we showed that when the likelihood ratio of the signals is unbounded, there is asymptotic learning when there is some minimal amount of "expansion in observations". In this talk, we show that with signals of bounded likelihood ratio, asymptotic learning is achieved if three conditions hold: expanding observations holds, there exists a subsequence of agents with unpersuasive neighborhoods and there exists a uniform lower bound on the probability of observing the history of past actions. This talk is based on joint work with Daron Acemoglu (MIT Economics), Munther Dahleh (MIT EECS) and Asu Ozdaglar (MIT EECS).
*March 12, 2010 (Friday)
45:30pm, 182 George St, Rm 110
Graphs and polytopes: learning structures with linear programming relaxations
Tommi Jaakkola
Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Combinatorial problems are commonplace in modeling. They arise in predicting likely values for variables (e.g., selection/orientation of residues in protein design) or learning the model structure itself. In recent years, a lot of effort has gone into developing linear programming (LP) relaxations for finding the most likely values for the variables in a given model. In this talk, I will focus on the related combinatorial problem of learning the model structure, i.e., solving for the highest scoring Bayesian network (BN) graph given data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal graph structure. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. I will also illustrate the strengths and weaknesses of the approach in the context of sample structure learning problems.
The talk covers joint work with David Sontag, Amir Globerson, and Marina Meila.
March 10, 2010
121pm, 182 George St, Rm 110
RespondentDriven Sampling for Networks: Degrees of Uncertainty with Uncertain Degrees
Joe Blitzstein
Assistant Professor
Department of Statistics
Harvard University
Respondentdriven sampling (RDS) is a network sampling design (introduced by Heckathorn) which is widely and increasingly being used to study properties of individuals in a social network, e.g., HIV prevalance. The standard estimators make many strong assumptions, including that the degree d of an individual in the sample is known exactly and that this individual is equally likely to recruit any of his or her d neighbors in the social network.
Classical issues of modelbased vs. designbased inference arise here, with new challenges from having also to account for the network structure. We explore the biasvariance tradeoff encountered in this setting, and show how the uncertainty in the measurement of degrees propagates into uncertainty in the RDS estimates. We then show how Bayesian and importance sampling ideas can be used to give obtain more reliable estimates.
March 3, 2010
121pm, 182 George St, Rm 110
Topic Models: Priors, Stop Words and Languages
Hanna Wallach
Senior Postdoctoral Research Associate
Department of Computer Science
University of Massachusetts Amherst
In this talk, I will compare several classes of structured priors for topic models, and show that an asymmetric Dirichlet prior over the documenttopic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topicword distributions provides no real benefit. This combination of priors substantially increases the robustness of topic models to the highly skewed word frequency distributions common in natural language. I will also present two topic models that cannot easily rely on standard stop word pruning techniques and therefore require such robustness. The first is a topicbased predictive language model that incorporates both ngram statistics and latent topic variables, while the second is a polylingual topic model that discovers topics aligned across multiple languages from massive collections of interlinked documents.
February 17, 2010
121pm, 182 George St, Rm 110
Message Passing Algorithms for Compressed Sensing
Andrea Montanari
Assistant Professor
Department of Electrical Engineering and Department of Statistics
Stanford University
Compressed sensing aims to undersample certain highdimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsityundersampling tradeoff is achieved when reconstructing by convex optimization  which is expensive in important largescale applications.
Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for largescale problems. Unfortunately known fast algorithms offer substantially worse sparsityundersampling tradeoffs than convex optimization.
We introduce a simple costless modification to iterative thresholding making the sparsityundersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterativethresholding algorithms are inspired by belief propagation in graphical models.
Our empirical measurements of the sparsityundersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsityundersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.
This is based on joint work with D.L. Donoho, A. Maleki and with M. Bayati.
*February 12, 2010 (Friday)
4:005:30pm, Barus & Holley 190 [refreshments begin at 3:45pm]
Division of Applied Mathematics / Center for Vision Research Seminar
Learning Hierarchies of Sparse Visual Features
Yann LeCun
Silver Professor of Computer Science and Neural Science
The Courant Institute of Mathematical Sciences and Center for Neural Science
New York University
Intelligent tasks such object recognition, auditory scene analysis, or language understanding require the construction of good internal representations of the world. Internal representations (or "features") must be invariant (or robust) to irrelevant variations of the input, but must preserve the information relevant to the task. An important goal of our research, and an important challenge for Machine Learning over the next few years, is to devise methods that can automatically learn good internal representations from labeled and unlabeled data. Theoretical and empirical evidence suggest that the visual world is best represented by a multistage hierarchy, in which features in successive stages are increasingly global, invariant, and abstract. The main question is how can one train such deep architectures from unlabeled data and limited amounts of labeled data.
We describe a class of methods to train multistage systems in which each stage performs a series of convolutions followed by simple nonlinearities. The unsupervised learning phase is based on sparse coding methods, but includes a feedforward predictor that gives a quick approximation of the sparse code. A number of such stages are stacked and trained sequentially in an unsupervised manner. The entire system is then refined in a supervised manner.
An application to categorylevel object recognition with invariance to pose and illumination will be described. By stacking multiple stages of sparse features, and refining the whole system with supervised training, statetheart accuracy can be achieved on standard datasets with very few labeled samples. A realtime demo will be shown. Another application to visionbased navigation for offroad mobile robots will be shown. After a phase of offline unsupervised learning, the system autonomously learns to discriminate obstacles from traversable areas at long range using labels produced with stereo vision for nearby areas.
This is joint work with YLan Boureau, Karol Gregor, Raia Hadsell, Koray Kavakcuoglu, and Marc'Aurelio Ranzato.
February 3, 2010
121pm, 182 George St, Rm 110
An analysis of connectivity in neuronal population recordings
Asohan Amarasingham
Postdoctoral Fellow
Center for Molecular and Behavioral Neuroscience
Rutgers University
Little is known about how neurons interact through their spike trains in natural conditions. A chief barrier is experimental: simultaneous neuronal recordings are extremely sparse. But recent advances are expanding the number of neurons that can be observed simultaneously in a behaving animal. We will describe a recent study in which we combined simultaneous recordings of medial prefrontal cortex neurons in behaving rats with careful statistical analysis to assess the evidence for functional signatures of anatomical and physiological connectivity in that neuronal population. We will use the scientific setting as a context in which to explore some basic issues in neurophysiological data analysis, with particular emphasis on spike trainresampling techniques. This is joint work with Shigeyoshi Fujisawa, Gyorgy Buzsaki, Matthew Harrison, and Stuart Geman.
*November 20, 2009 (Friday)
4:005:30 pm, 182 George St, Rm 110 [refreshments beginning at 3:45]
Applied Mathematics / Center for Statistical Sciences Seminar
Error controls for multiple hypothesis testing
Zhiyi Chi
Associate Professor
Department of Statistics
University of Connecticut
Multiple hypothesis testing is challenging when signals in data are weak, the distributions of noise and signals are not completely known, or there is statistical dependency in data. I will present some theoretical results on these issues. I will show that selfnormalized large deviations can be useful in dealing with experiment designs for weak signals, linear programming can be used to evaluate pvalues when data distributions are not completely known, and techniques similar to nonlinear filtering can be used to analyze the performance of multiple testing on hidden states of Markov models that may be nonstationary.
November 18, 2009
121pm, 182 George St, Rm 110
Spectral Filtering Approaches to Machine Learning
Lorenzo Rosasco
Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
In this talk we discuss a computational framework for learning complex high dimensional data. The proposed methods rely on estimating regularized version of datadependent kernel matrices, where regularization is achieved filtering out the unstable components corresponding to small eigenvalues. In many cases these matrices can be interpreted as empirical versions of underlying covariance matrices, integral operators or closely related objects, such as diffusion operators. We will describe different strategies to implement regularization and discuss the corresponding computational properties. The sample complexity of the methods is analyzed and shown to be optimal for a large class of problems. Experimental results in the context of supervised learning classification and vector fields estimation will be presented.
November 11, 2009
121pm, 182 George St, Rm 110
Generative Models for Image Analysis
LoBin Chang
Graduate Student
Division of Applied Mathematics
Brown University
A probabilistic grammar for the grouping and labeling of parts and objects, when taken together with pose and partdependent appearance models, constitutes a generative scene model and a Bayesian framework for image analysis. To the extent that the generative model generates features, as opposed to pixel intensities, the posterior distribution (i.e. the conditional distribution on part and object labels given the image) is based on incomplete information; feature vectors are generally insufficient to recover the original intensities. I will propose a way to learn pixellevel models for the appearances of parts. I will demonstrate the utility of the models with some experiments in Bayesian image classification.
*November 6, 2009 (Friday)
4:005:30 pm [refreshments beginning at 3:45 pm] Marcuvitz Auditiorium (Room 220), Sidney Frank Hall, 185 Meeting St.
Division of Applied Mathematics / Center for Vision Research Seminar
Understanding Visual Scenes
Antonio Torralba
Esther and Harold E. Edgerton Associate Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Human visual scene understanding is remarkable: with only a brief glance at an image, an abundance of information is available  spatial structure, scene category and the identity of main objects in the scene. In traditional computer vision, scene and object recognition are two visual tasks generally studied separately. However, it is unclear whether it is possible to build robust systems for scene and object recognition, matching human performance, based only on local representations. Another key component of machine vision algorithms is the access to data that describe the content of images. As the field moves into integrated systems that try to recognize many object classes and learn about contextual relationships between objects, the lack of large annotated datasets hinders the fast development of robust solutions. In the early days, the first challenge a computer vision researcher would encounter would be the difficult task of digitizing a photograph. Even once a picture was in digital form, storing a large number of pictures (say six) consumed most of the available computational resources. In addition to the algorithmic advances required to solve object recognition, a key component to progress is access to data in order to train computational models for the different object classes. This situation has dramatically changed in the last decade, especially via the internet, which has given computer vision researchers access to billions of images and videos. In this talk I will describe recent work on visual scene understanding that try to build integrated models for scene and object recognition, emphasizing the power of large database of annotated images in computer vision.
*November 2, 2009 (Monday)
3:304:30 pm [refreshments beginning at 3:15], 121 South Main St, Rm 245
Center for Statistical Sciences / Division of Applied Mathematics Seminar
Application of Heteroskedastic Spatial Models to Computer Experiments
Richard A. Davis
Howard Levene Professor of Statistics
Department of Statistics
Columbia University
We consider modeling a deterministic computer response as a realization from a stochastic heteroskedastic process (SHP), which incorporates a spatiallycorrelated volatility process into the traditional spatiallycorrelated Gaussian process (GP) model. Unconditionally, the SHP is a stationary nonGaussian process, with stationary GP as a special case. Conditional on a latent process, the SHP is a nonstationary GP. The sample paths of this process offer more modeling flexibility than those produced by a traditional GP, and can better reflect prediction uncertainty. GP prediction error variances depend only on the locations of inputs, while SHP can reflect local inhomogeneities in a response surface through prediction error variances that depend on both input locations and output responses.
We use maximum likelihood for inference, which is complicated by the high dimensionality of the latent process. Accordingly, we develop an importance sampling method for likelihood computation and use a lowrank kriging approximation to reconstruct the latent process. Responses at unobserved locations can be predicted using empirical best predictors or by empirical best linear unbiased predictors. Prediction error variances are also obtained. In examples with simulated and real computer experiment data, the SHP model is superior to traditional GP models. (This is joint work with Jay Breidt, Wenying Huang, and Ke Wang.)
October 28, 2009
121pm, 182 George St, Rm 110
Statistical Analysis of Climate Ecosystem Dynamics
Surajit Ray
Assistant Professor
Department of Mathematics & Statistics
Boston University
Functional approaches to modeling dynamics of biological systems, trends in financial cycle, seasonal measurements of spectral bands in remote sensing, are becoming increasingly popular as a data analysis tool.Clustering and classification is often an important final objective of functional data analysis. But most model based clustering techniques rely on the assumptions of normal or T distributions and as such are not appropriate for clustering functional data, which often lie on nonlinear manifolds. In this talk we will present a novel model based clustering techniques for analyzing functional data. Our method is based on the modal clustering methodology developed by Li, Ray and Lindsay (2007), but new visualization, computational and inferential techniques need to be designed especially for functional data. The talk will also focus on parallelization of modal clustering. Application of functional clustering to remote sensing data will also be discussed.
October 21, 2009
121pm, 182 George St, Rm 110
Recent Advances on the Geometry of the Riemannian Manifold of Landmarks
Mario Micheli
Assistant Adjunct Professor
Department of Mathematics
University of California, Los Angeles
In the past few years there has been a growing interest, in diverse scientific communities, in endowing "shape spaces" with Riemannian metrics, so to be able to measure similarities between shapes and perform statistical analysis on data sets (e.g. for object recognition, target detection and tracking, classification, and automated medical diagnostics). The geometry of such spaces has started to emerge only very recently; in this talk we will explore the sectional curvature for the Riemannian manifold of landmark points (which is one of the simplest, in that it is finitedimensional) and discuss its effects on applications.
October 14, 2009
121pm, 182 George St, Rm 110
SIFT Flow: Dense Scene Alignment and Its Applications
Ce Liu
Postdoctoral Researcher
Microsoft Research New England
In this talk I will introduce dense scene alignment and how it can be applied to a number of computer vision problems varying from satellite image registration to object recognition and scene parsing. We propose SIFT flow that establishes dense, semantically meaningful correspondence between two images across scenes by matching pixelwise SIFT features. Using SIFT flow, we develop a new framework for image parsing by transferring metadata information, such as annotation, motion and depth, from the images in a large database to an unknown query image. We demonstrate this framework using new applications such as predicting motion from a single image and motion synthesis via object transfer. Based on SIFT flow, we introduce a nonparametric scene parsing system using label transfer, with very promising experimental results suggesting that our system outperforms stateoftheart techniques based on training classifiers.
October 7, 2009
121pm, 182 George St, Rm 110
The probabilistic language of thought
Noah Goodman
Research Scientist
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology
Logic and probability (aka. 'symbols and statistics', 'structured representation and graded inference') are key themes of cognitive science that have long had an uneasy coexistence. I will describe a Probabilistic Language of Thought approach that brings them together into compositional representations with probabilistic meaning. This provides a view of cognition in which mental representations describe causal "working models" of the world, that can be used for reasoning and learning by probabilistic inference. Using this framework I will investigate human concept learning, beginning with simple categorization tasks and extending to acquisition of abstract concepts. I will describe a model of learning the meanings of number words that predicts the staged progression exhibited by children (including the striking conceptual reorganization associated with the "cardinal principle" transition). I will then briefly describe how a probabilistic language of thought explains phenomena of human reasoning, including both classic syllogistic reasoning tasks and more complex reasoning involving causality and agency.
September 30, 2009
121pm, 182 George St, Rm 110
Learning maximumentropy models of salience via EM
Micha Elsner
Graduate Student
Department of Computer Science
Brown University
Coreference resolution is the task of linking a nominal expression (such as a pronoun) to its antecedent in text or speech data. We propose a maximum entropy model which ranks the syntactic environments in which antecedents tend to occur. We learn this model in an unsupervised way using the Expectation/Maximization (EM) algorithm. As an initial application, we describe a consistent, probabilistic reformulation of the pronoun resolution model given in (Charniak+Elsner 09); we then extend this to a highprecision model predicting coreference between noun phrases with the same head word. This is joint work with Eugene Charniak and Mark Johnson; it is very much still in progress, so comments and suggestions are welcome.
September 23, 2009
121pm, 182 George St, Rm 110
Latent SocioSpatial Process Model for Social Networks
Crystal Linkletter
Assistant Professor
Department of Community Health and Center for Statistical Sciences
Brown University
With concerns of bioterrorism, the advent of new epidemics that spread with persontoperson contact, such as SARS, and the rapid growth of online social networking websites, there is currently great interest in building statistical models that emulate social networks. Stochastic network models can provide insight into social interactions and increase understanding of dynamic processes that evolve through society. A major challenge in developing any stochastic social network model is the fact that social connections tend to exhibit unique inherent dependencies. For example, they tend to show a lot of clustering and transitive behavior, heuristically described as “a friend of a friend is a friend.” It might be reasonable to expect that covariate similarities, or “closeness” in social space, should somehow be related to the probability of connection for some social network data. The relationship between covariates and relations is likely to be complex, however, and may in fact be different in different regions of the covariate space. Here, we present a new sociospatial process model that smoothes the relationship between covariates and connections in a sample network using relatively few parameters, so the probabilities of connection for a population can be inferred and likely social network structures generated. Having a predictive social network model is an important step toward the exploration of disease transmission models that depend on an underlying social network.
September 16, 2009
121pm, 182 George St, Rm 110
Organizational meeting & Conditional inference for nonstationary data
Matthew Harrison
Assistant Professor
Division of Applied Mathematics
Brown University
This is a new time and a new format for the pattern theory seminar, so we will begin with an organizational meeting. Bring ideas for things you'd like to see in our seminar series. With the remaining time, I will discuss a semiparametric conditional inference perspective for modeling data that have fast (and interesting) dynamics confounded with slow, nonstationary (and not so interesting) dynamics. This is work in progress with Asohan Amarasingham and Stuart Geman.
