Pattern Theory Lunch Seminar Series

Brown University

To receive email reminders about upcoming pattern theory seminars, send an email to:

   LISTSERV@LISTSERV.BROWN.EDU

with the following line in the body of the email:

   SUBSCRIBE PTG-SEMINAR-L


Abstracts


Wednesday, Oct 9, 2019

12-1pm, 182 George St, Room 110

Knockoffs or perturbations, that is a question

Jun S Liu

Professor
Department of Statistics
Harvard University

Simultaneously finding multiple influential variables and controlling the false discovery rate (FDR) for linear regression models is a fundamental problem with a long history. Researchers have recently proposed and examined a few innovative approaches built around the idea of creating “knockoff” variables (like spike-ins in biological experiments) to control the FDR. As opposed to creating knockoffs, a classical statistical idea is to introduce perturbations and examine their impact. We introduce here a perturbation-based Gaussian Mirror (GM) method, which creates for each predictor variable a pair of perturbed “mirror variables” by adding and subtracting a randomly generated Gaussian random variable, and then proceeds with a chosen regression method, such as ordinary least squares or the Lasso. The mirror variables naturally lead to a test statistic that is highly effective for controlling the FDR. The proposed GM method does not require strong conditions on the covariates, nor any knowledge of the noise level or the relative magnitudes of the dimension p and sample size n. We observe that the GM method is more powerful than many existing methods in selecting important variables subject to FDR control, especially when the covariates are highly correlated. Additionally, we provide a method to reliably estimate a confidence interval and upper bound for the number of false discoveries. If time permits, I will also discuss a simpler bootstrap-type perturbation method for estimating FDRs, which is also more powerful than knockoff methods when the predictors are reasonably correlated. The presentation is based on joint work with Xing Xin and Chenguang Dai.
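
For intuition, here is a minimal sketch of the mirror construction described above, in the ordinary-least-squares setting; the perturbation scale c and the knockoff-style thresholding rule below are simplified placeholder choices rather than the tuned versions from the talk.

    import numpy as np

    def gaussian_mirror_select(X, y, q=0.1, c=1.0, seed=0):
        """Toy Gaussian Mirror variable selection at nominal FDR level q."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        M = np.zeros(p)
        for j in range(p):
            z = rng.standard_normal(n)
            # replace column j by its two perturbed "mirror" copies
            X_aug = np.column_stack([np.delete(X, j, axis=1),
                                     X[:, j] + c * z,
                                     X[:, j] - c * z])
            beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
            b_plus, b_minus = beta[-2], beta[-1]
            # mirror statistic: roughly symmetric about 0 for null variables,
            # large and positive for true signals
            M[j] = abs(b_plus + b_minus) - abs(b_plus - b_minus)
        # data-driven threshold in the style of knockoff/mirror FDR control
        for t in np.sort(np.abs(M[M != 0])):
            if np.sum(M < -t) / max(np.sum(M > t), 1) <= q:
                return np.flatnonzero(M > t)
        return np.array([], dtype=int)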


Apr 16, 2019 (Tuesday)

11am-12pm, 170 Hope St, Room 108

A Representational Theory of Grid Cells

Ying Nian Wu

Professor
Department of Statistics
UCLA

Imagine walking in your living room in the dark. Purely based on your self-motion, you can calculate your self-position, and you can also plan your path to the light switch. The grid cells in your brain play a key role in these tasks. Grid cells were discovered by Dr. May-Britt Moser and Dr. Edvard Moser, who won the 2014 Nobel Prize for the surprising discovery that such cells fire on a regular hexagonal grid imposed on the spatial environment. In this talk, I shall present a representational theory of grid cells, in which the self-position is represented by a vector and the self-displacement is represented by a matrix that rotates the vector. I will show that hexagonal grid patterns can be learned from simulated trajectories. I will also discuss a similar representational scheme for the primary visual cortex. The talk is based on joint work with Ruiqi Gao, Jianwen Xie and Song-Chun Zhu.
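
In schematic form (the notation here is illustrative, following the abstract rather than the talk's exact formulation), the representational scheme is

    v(x + \Delta x) = M(\Delta x)\, v(x), \qquad M(\Delta x)^{\top} M(\Delta x) \approx I,

so that self-position is carried by the vector v(x), each self-displacement acts on it by a (near-rotation) matrix, and path integration amounts to repeated matrix-vector multiplication.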


Feb 13, 2019

12-1pm, 182 George St, Room 110

A combinatorial/algebraic topological approach to nonlinear dynamics

Konstantin Mischaikow

Professor
Department of Mathematics
Rutgers University

Motivated by the rise of data-driven science, I will discuss a combinatorial/algebraic topological approach to characterizing nonlinear dynamics. In particular, I will describe how order theory can be used to efficiently and effectively organize the decomposition of dynamics, and how algebraic topological tools can be used to characterize the structure of the dynamics. I will then propose a definition of nonlinear dynamics based on these structures.

To demonstrate the effectiveness of this approach I will consider several problems from systems and synthetic biology. I will focus on identification and rejection of network models for these types of systems based on functional form and time series data.


Jan 30, 2019

12-1pm, 182 George St, Room 110

Statistical models of large graphs and networks

Peter Orbanz

Associate Professor
Department of Statistics
Columbia University

Relational data is, roughly speaking, any form of data that can be represented as a graph: a social network, user preference data, protein-protein interactions, etc. A recent body of work, by myself and others, aims to develop a statistical theory of such data for problems where a single graph is observed (such as a small part of a large social network). Keywords include graphons, edge-exchangeable and sparse exchangeable graphs, and many latent variable models used in machine learning. I will summarize the main ideas and results of this theory: how and why the exchangeability assumptions implicit in commonly used models for such data may fail; what can be done about it; what we know about convergence; and implications of these results for methods popular in machine learning, such as graph embeddings and empirical risk minimization.


Dec 5, 2018

12-1pm, 182 George St, Room 110

Hidden Physics Models: Machine Learning of Non-Linear Partial Differential Equations

Maziar Raissi

Assistant Professor (Research)
Division of Applied Mathematics
Brown University

A grand challenge with great opportunities is to develop a coherent framework that enables blending conservation laws, physical principles, and/or phenomenological behaviours expressed by differential equations with the vast data sets available in many fields of engineering, science, and technology. At the intersection of probabilistic machine learning, deep learning, and scientific computations, this work is pursuing the overall vision to establish promising new directions for harnessing the long-standing developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data. To materialize this vision, this work is exploring two complementary directions: (1) designing data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time dependent and non-linear differential equations, to extract patterns from high-dimensional data generated from experiments, and (2) designing novel numerical algorithms that can seamlessly blend equations and noisy multi-fidelity data, infer latent quantities of interest (e.g., the solution to a differential equation), and naturally quantify uncertainty in computations. The latter is aligned in spirit with the emerging field of probabilistic numerics.


Nov 7, 2018

12-1pm, 182 George St, Room 110

Visual Program Induction

Daniel Ritchie

Assistant Professor of Computer Science
Brown University

Many elements of our visual world can be compactly described by programs: the repetitive structures in graphical patterns, the symmetric parts of furniture and other human-made objects, and so on. In this talk, I'll present some recent work on automatically inferring such programs from raw perceptual inputs (2D images or unstructured 3D volumes). Our methods use deep neural networks which learn to perform this inference from a large number of supervised examples. As it is rare to find (and expensive to construct) large corpora of paired (image, program) data in the real world, we train these models primarily using synthetic data. I'll show visual program inference results for three domains: 2D figures, part-based 3D objects, and simple 3D scenes.


Oct 24, 2018

12-1pm, 182 George St, Room 110

Approximate Message Passing Algorithms for High Dimensional Statistical Estimation

Cynthia Rush

Assistant Professor of Statistics
Columbia University

Approximate Message Passing, or AMP, is a class of low-complexity, scalable algorithms used to solve high-dimensional noisy linear regression problems in which the goal is to estimate a vector x from a noisy measurement y = Ax + noise. AMP has the attractive feature that its performance, measured for example by the squared error loss, can be tracked accurately by a scalar iteration referred to as state evolution. In this talk, I will present recent performance guarantees for this algorithm under various problem conditions, and I will introduce recent applications in statistical inference and estimation.
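
As a concrete illustration, here is a minimal sketch of an AMP iteration for sparse regression with a soft-thresholding denoiser (the classical Donoho-Maleki-Montanari form); the threshold schedule alpha is a simple placeholder, and the matrix A is assumed to have roughly unit-norm columns.

    import numpy as np

    def soft(u, t):
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    def amp(A, y, n_iter=30, alpha=1.5):
        """Toy AMP for y = A x + noise; returns an estimate of x."""
        n, p = A.shape
        x = np.zeros(p)
        z = y.copy()
        for _ in range(n_iter):
            tau = alpha * np.sqrt(np.mean(z ** 2))    # noise level tracked by state evolution
            r = x + A.T @ z                           # effective observation: approx. x + Gaussian noise
            x_new = soft(r, tau)
            onsager = z * np.sum(np.abs(r) > tau) / n # Onsager correction term
            z = y - A @ x_new + onsager
            x = x_new
        return x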


Oct 17, 2018

12-1pm, 182 George St, Room 110

Unsupervised Learning for Large Scale Medical Image Analysis

Adrian Dalca

Postdoctoral Fellow
CSAIL, Massachusetts Institute of Technology
Massachusetts General Hospital, Harvard Medical School

New approaches to the analysis of medical images can lead to new insights into complex diseases. In particular, there is a wealth of untapped clinical knowledge to be gained from the millions of diagnostic, low resolution clinical scans acquired every day as part of standard medical practice. In this talk, I will present machine learning techniques that can enable the use of clinical image collections in broad medical research studies. First, I define image imputation – the statistical inference of unobserved image anatomy. I describe an imputation method that exploits anatomical structure in collections of clinically-acquired low resolution scans, and learns to synthesize missing anatomy. This method enables subsequent studies not previously possible with clinical scans. Second, I present an unsupervised learning algorithm for deformable medical image registration that is up to 20,000 times faster than the current state of the art, while preserving topological guarantees. This enables, for the first time, efficient processing of image collections comprised of tens of thousands of scans. Finally, I demonstrate analysis methods enabled by these algorithms that are currently being used in studies across twelve hospital sites to derive new insights about patients who have suffered a stroke.


Oct 3, 2018

12-1pm, 182 George St, Room 110

Variable Prioritization in “Black Box” Statistical Methods

Lorin Crawford

Assistant Professor of Biostatistics
Brown University

A consistent theme of the work done in the Crawford Lab is to take modern computational approaches and develop theory that enables their interpretations to be related back to classical genomic principles. The central aim of this talk is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel, interpretable, and computationally efficient way to summarize the relative importance of predictor variables. Methodologically, we develop the “RelATive cEntrality” (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We will illustrate RATE through Bayesian Gaussian process regression, although the proposed innovations apply to other nonlinear methods (e.g. deep neural networks). It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and applications to real genome-wide association mapping studies, we show that applying RATE enables an explanation for this improved performance.


May 2, 2018

12-1pm, 182 George St, Room 110

The synchronization problem for Kuramoto oscillators and beyond

Javier Morales

U Maryland

Collective phenomena such as aggregation, flocking, and synchronization are ubiquitous in natural biological, chemical, and mechanical systems--e.g., the flashing of fireflies, chorusing of crickets, synchronous firing of cardiac pacemakers, and metabolic synchrony in yeast cell suspensions. The Kuramoto model introduced by Yoshiki Kuramoto is one of the first theoretical tools developed to understand such phenomena and has recently gained extensive attention in the physical and mathematical communities. Moreover, it has become the starting point of several generalizations, with applications ranging from opinion dynamics to human-made interacting multi-agent systems of UAVs and data clustering. In this talk, we will review the state of the art for the synchronization problem of the Kuramoto model at the kinetic and particle levels. Additionally, we will introduce new developments and variational techniques for the dynamics of this model, some of its variants, and its generalizations.
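
For reference, the classical Kuramoto model couples N phase oscillators with natural frequencies \omega_i through

    \dot{\theta}_i = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \dots, N,

with coupling strength K; synchronization is usually quantified through the order parameter r e^{i\psi} = \frac{1}{N}\sum_j e^{i\theta_j}, where r close to 1 indicates phase locking.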


April 18, 2018

12-1pm, 182 George St, Room 110

Machine Learning in a Setting of Ordinal Distance Information

Matthaus Kleindessner

Rutgers

In a typical machine learning scenario we are given numerical dissimilarity values between objects (or feature representations of objects, from which such dissimilarity values can readily be computed). In the relaxed setting of ordinal distance information we are only provided with binary answers to comparisons of dissimilarity values such as d(A,B)<d(A,C) instead. This setting has attracted interest in recent years, mainly due to the possibility of simple collection of such data by means of crowdsourcing.

My talk will have two parts. First, I will talk about a result that states the asymptotic uniqueness of ordinal embeddings. Ordinal embedding, up to now, is the standard approach to machine learning in a setting of ordinal distance information. The idea is to map the objects of a data set to points in a Euclidean space such that the points preserve the given ordinal distance information as well as possible (with respect to the Euclidean interpoint distances). Second, I will introduce data-dependent kernel functions that can be evaluated given only ordinal distance information about a data set. They provide a generic alternative to the ordinal embedding approach and avoid some of its drawbacks. For both works, I want to address a number of open questions.
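
The following is a minimal numerical sketch of the ordinal-embedding idea in its generic hinge-loss form (not the specific algorithms or kernels from the talk): given triplets (i, j, k) encoding d(i,j) < d(i,k), it searches for points in Euclidean space that respect as many comparisons as possible.

    import numpy as np

    def ordinal_embedding(n, triplets, dim=2, margin=0.1, lr=0.05, n_iter=2000, seed=0):
        """Gradient descent on a hinge loss over ordinal constraints d(i,j) < d(i,k)."""
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((n, dim))
        for _ in range(n_iter):
            grad = np.zeros_like(X)
            for i, j, k in triplets:
                d_ij = np.sum((X[i] - X[j]) ** 2)
                d_ik = np.sum((X[i] - X[k]) ** 2)
                if d_ij + margin > d_ik:              # constraint violated (or within the margin)
                    grad[i] += 2 * ((X[i] - X[j]) - (X[i] - X[k]))
                    grad[j] += -2 * (X[i] - X[j])
                    grad[k] += 2 * (X[i] - X[k])
            X -= lr * grad / max(len(triplets), 1)
        return X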


April 11, 2018

12-1pm, 182 George St, Room 110

Recent Advances in Elastic Functional and Shape Data Analysis

Anuj Srivastava

Florida State University

Functional and shape data analysis (FSDA) is fast becoming an important research area, due to its broad applications in many branches of science, including biostatistics and bioinformatics. An essential component of FSDA is the registration of points across functional objects. Without proper registration, the results are often inferior and difficult to interpret. The current practice in the FSDA community is to treat registration as a pre-processing step, using off-the-shelf alignment procedures, and follow it up with statistical analysis of the resulting data. In contrast, Elastic FSDA is a more comprehensive approach, in which one solves for the registration and the statistical inferences simultaneously. The key idea is to use Riemannian metrics with appropriate invariance properties to form objective functions for alignment and to develop statistical models for functional data. While these elastic metrics are complicated in general, we have developed a family of square-root transformations that map these metrics into simpler Euclidean representations, thus enabling more standard statistical procedures. Specifically, we have developed techniques for elastic functional PCA and elastic regression models involving functional variables. I will demonstrate these ideas using imaging data from neuroscience and bioinformatics, where biological structures can often be represented as functions (curves or surfaces) on intervals or spheres. Examples of curves include DTI fiber tracts and chromosomes, while examples of surfaces include subcortical structures (hippocampus, thalamus, putamen, etc.). Statistical goals here include the shape analysis and modeling of these structures and the use of their shapes in medical diagnosis.
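
One widely used square-root transformation of this kind is the square-root velocity function, which maps a function f on an interval to

    q(t) = \frac{\dot{f}(t)}{\sqrt{|\dot{f}(t)|}},

under which a natural elastic (Fisher-Rao type) metric becomes the ordinary L^2 distance and re-parameterizations act by isometries, so registration and statistics can be carried out with standard Euclidean tools. (This is the function/curve case; analogous square-root representations are used for surfaces.)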


March 21, 2018

12-1pm, 182 George St, Room 110

Measure Transport for Bayesian Inference: theory and applications.

Daniele Bigoni

MIT

Measure transport is a valuable tool for characterizing, sampling and manipulating multivariate non-Gaussian target distributions. This method has a broad range of applications -- e.g., the solution of Bayesian inverse problems, as well as filtering and smoothing of dynamical systems. The transport maps framework seeks a deterministic parametric map that pushes forward a tractable reference distribution to a potentially complex target distribution of interest. The construction of high-dimensional maps may be challenging due to the curse of dimensionality. In many cases, though, one can leverage a number of sources of low-dimensional structure: marginal independence, smoothness, separability, and conditional independence, to name just a few. In this seminar we will outline the transport map framework and some of the key ingredients useful to tackle high-dimensional problems. The presentation will be accompanied by examples of Bayesian inference problems in geophysics and finance.
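
Concretely, the framework seeks a map T pushing a reference density \rho (e.g., a standard Gaussian) forward to the target \pi, i.e. T_{\sharp}\rho = \pi; by the change-of-variables formula,

    \pi(x) = \rho\!\left(T^{-1}(x)\right)\,\left|\det \nabla T^{-1}(x)\right|,

so once T is found (typically within a parametric, monotone class), independent samples from \pi are obtained simply as x = T(z) with z drawn from \rho.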


February 7, 2018

12-1pm, 182 George St, Room 110

Causes and Consequences of Human Genomic Variation

Sohini Ramachandran

Brown University

Research in the Ramachandran lab addresses problems in population genetics and evolutionary theory, generally using humans as a study system. Our goal is to infer modern human evolutionary history — how mutation, natural selection, and population histories have interacted to produce observed genetic variation in present-day humans — from genomic data alone. Because of advances in sequencing technology, population-genetic problems are inherently quantitative problems: How can we identify subpopulations in the human species? Which of the millions of common DNA variants predict an individual’s risk for a disease of interest? Can we classify beneficial mutations versus selectively neutral (so-called “junk”) mutations? I’ll discuss our recent approaches for answering these questions, in which we draw on topic models, optimization, information theory, and supervised learning.


December 6, 2017

12-1pm, 182 George St, Room 110

Advanced Editing, Exploration, and Interaction for Video

James Tompkin

Brown

Video best represents how we visually perceive, and with ubiquitous cameras everyone can capture imagery of our world. However, our tools to edit, explore, and interact with video often require laborious work or expert skill, which makes it merely a consumption medium for most people. With advances in image understanding through computer graphics, vision, and interaction, we can change how we think about video: First, to transform video from a rigid and inaccessible medium into a malleable and creative one, so that non-experts can make sophisticated content-based edits. Second, to transform video from a linear medium viewed sequentially into novel interactive content-based and context-aware visual explorations. Third, to transform video cameras from passive observers into essential tools for real-time creation and interaction with digital imagery.


November 29, 2017

12-1pm, 182 George St, Room 110

Sequential Bayesian inference via low-dimensional couplings

Youssef Marzouk

MIT

Integration against an intractable probability measure is among the fundamental challenges of Bayesian inference. A useful approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Approximate transports can also be used to "precondition" standard Monte Carlo schemes. Yet characterizing a suitable transport map---e.g., representing, constructing, and evaluating it---grows challenging in high dimensions.

We establish links between the conditional independence structure of the target measure and the existence of certain low-dimensional couplings, induced by transport maps that are sparse or decomposable. Our analysis not only facilitates the construction of couplings in high-dimensional settings, but also suggests new inference methodologies. We will highlight applications of our theory to nonlinear state-space models, where sparse and/or decomposable transports yield new variational algorithms for filtering, smoothing, and sequential parameter inference.

This is joint work with Alessio Spantini and Daniele Bigoni.


November 8, 2017

12-1pm, 182 George St, Room 110

Toward perceptually consistent stereo

Todd Zickler

Harvard University

There are two sources of shape information in a pair of binocular stereo images. One is the correlation (matching) signal from surfaces that are visible to both cameras. The other is the decorrelation (anti-matching) signal from regions that are visible to one camera but occluded in the other. Vision science has repeatedly shown that both types of information are used in the visual cortex, and that people can perceive depth even when correlation cues are absent or very weak, a capability that remains absent from most computational stereo systems.

I will describe two research directions that (hopefully) move us toward computational stereo algorithms that are more consistent with these perceptual phenomena. Both directions are based on representing a depth map as a piecewise smooth function over the visual field, with a flexible notion of smoothness. One is a scanline algorithm that naturally combines correlation and decorrelation cues, and that matches human perception on a collection of well-known perceptual stimuli. The other is a 2D algorithm that efficiently exploits piecewise smoothness, but so far without incorporating decorrelation cues. At the end of the talk, I hope to discuss how insights from these two different directions might be combined. The talk is based on two papers:

1. Jialiang Wang, Daniel Glasner and Todd Zickler. ICCV 2017. http://vision.seas.harvard.edu/stereo/

2. Ayan Chakrabarti, Ying Xiong, Steven J. Gortler and Todd Zickler. CVPR 2015. http://ttic.uchicago.edu/~ayanc/consensus/


November 1, 2017

12-1pm, 182 George St, Room 110

Sifting through Measures on Networks: From a Theoretical Framework to an Empirical Guide

Tina Eliassi-Rad

Northeastern University

In this talk, I will discuss two problems on network data. (1) Measuring tie-strength: Given a set of people and a set of events attended by them, how should we measure connectedness or tie strength between each pair of persons? The underlying assumption is that attendance at mutual events produces an implicit social network between people. I will describe an axiomatic solution to this problem. (2) Measuring similarity between networks: Given two networks (without known node-correspondences), how should we measure similarity between them? This problem occurs frequently in many real-world applications such as transfer learning, re-identification, and change detection. I will present an empirical guide on how to select a network-similarity method, and discuss some promising results based on topological data analysis.


October 25, 2017

12-1pm, 182 George St, Room 110

Feature Driven Exploration of Images Using Topology

Attila Gyulassy

Scientific Computing and Imaging Institute
University of Utah

Two- and three-dimensional images arise in a multitude of disciplines, from computer simulations to experimental sources such as microscopy. A key step toward scientific discovery is extracting semantic information from image data. This talk presents practical applications of Morse theory to extract and reason with features intrinsic to images. We will discuss both practical algorithms for computing the Morse-Smale complex and real-world applications drawing on imaged and simulated examples from materials science, basic energy research, geologic exploration, and neuroscience.


October 11, 2017

12-1pm, 182 George St, Room 110

Multireference Alignment with Nonperiodic Distribution is Easier

Nir Sharon

Program of Applied and Computational Mathematics
Princeton University

Multireference alignment is the problem of estimating a signal from its circularly translated copies in the presence of noise. Recently, it was shown that if the translations are drawn from the uniform distribution, the number of samples needed is proportional to 1/SNR^3. We prove that for any nonperiodic distribution, the sample complexity is actually of order 1/SNR^2. This rate is optimal and can be achieved by a simple spectral algorithm. We further propose and analyze two additional algorithms based on non-convex optimization and semidefinite programming.


September 20, 2017

12-1pm, 182 George St, Room 110

Graph-based Bayesian learning: continuum limits and algorithms

Daniel Sanz-Alonso

Division of Applied Mathematics
Brown University

The principled learning of functions from data is at the core of statistics, machine learning and artificial intelligence. The aim of this talk is to present some new theoretical and methodological developments concerning the graph-based, Bayesian approach to semi-supervised learning. I will show suitable scalings of graph parameters that provably lead to robust Bayesian solutions in the limit of a large number of unlabeled data points. The analysis relies on a careful choice of topology and on the study of the spectrum of graph Laplacians. Besides guaranteeing the consistency of graph-based methods, our theory explains the robustness of discretized function-space MCMC methods in semi-supervised learning settings. This is joint work with Nicolas Garcia Trillos, Zachary Kaplan, and Thabo Samakhoana.


May 3, 2017

12-1pm, 182 George St, Room 110

PCA from noisy linearly transformed measurements

Amit Singer

Professor
Department of Mathematics
Princeton University

We consider the problem of estimating the covariance of X from measurements of the form y_i = A_i x_i + e_i (for i = 1, ..., n), where x_i are i.i.d. unobserved samples of X, A_i are given linear operators, and e_i represent noise. Our estimator is constructed efficiently via a simple linear inversion using conjugate gradient, performed after eigenvalue shrinkage motivated by the spiked model in high-dimensional PCA. Applications to low-rank matrix completion, 2D image denoising, and 3D structure classification in single-particle cryo-electron microscopy will be discussed.
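
One natural way to set up such an estimator (a schematic paraphrase of the abstract, not necessarily the exact objective used in the talk) is as the least-squares fit

    \widehat{\Sigma} = \arg\min_{\Sigma} \; \sum_{i=1}^{n} \left\| y_i y_i^{\top} - A_i \Sigma A_i^{\top} - \Sigma_{e} \right\|_F^2,

with \Sigma_e the noise covariance: a linear inverse problem in \Sigma that can be solved by conjugate gradient and then regularized by eigenvalue shrinkage, as described above.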


April 26, 2017

12-1pm, 182 George St, Room 110

Numerical Gaussian Processes (Physics Informed Learning Machines)

Maziar Raissi

Postdoctoral Research Associate
Division of Applied Mathematics
Brown University

We introduce the concept of numerical Gaussian processes, which we define as Gaussian processes with covariance functions resulting from temporal discretization of time-dependent partial differential equations. Numerical Gaussian processes, by construction, are designed to deal with cases where: (1) all we observe are noisy data on black-box initial conditions, and (2) we are interested in quantifying the uncertainty associated with such noisy data in our solutions to time-dependent partial differential equations. Our method circumvents the need for spatial discretization of the differential operators by proper placement of Gaussian process priors. This is an attempt to construct structured and data-efficient learning machines, which are explicitly informed by the underlying physics that possibly generated the observed data. The effectiveness of the proposed approach is demonstrated through several benchmark problems involving linear and nonlinear time-dependent operators. In all examples, we are able to recover accurate approximations of the latent solutions, and consistently propagate uncertainty, even in cases involving very long time integration.


April 19, 2017

12-1pm, 182 George St, Room 110

Inference in Dynamical Systems

Sayan Mukherjee

Professor
Department of Statistical Science
Duke University

We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures. We also relate Bayesian inference to the thermodynamic formalism in tracking dynamical systems. We state conditions for consistency of a Gibbs distribution using classic ideas from dynamical systems such as topological entropy and pressure.


March 8, 2017

12-1pm, 182 George St, Room 110

Tropical Coordinates on the Space of Persistence Barcodes

Sara Kalisnik Verovsek

Postdoctoral Research Associate
Department of Mathematics
Data Science Initiative
Brown University

In the last two decades applied topologists have developed numerous methods for ‘measuring’ and building combinatorial representations of the shape of the data. The most famous example of the former is persistent homology. This adaptation of classical homology assigns a barcode, i.e. a collection of intervals with endpoints on the real line, to a finite metric space. Unfortunately, barcodes are not well-adapted for use by practitioners in machine learning tasks. We can circumvent this problem by assigning numerical quantities to barcodes and these outputs can then be used as input to standard algorithms. I will talk about max-plus polynomials and tropical rational functions that can be used as coordinates on the space of barcodes. All of these are stable with respect to the standard distance functions (Bottleneck, Wasserstein) used on the barcode space.
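
To give the flavor (an illustrative example rather than the talk's full construction): writing \lambda_i = d_i - b_i for the lengths of the bars in a barcode, the simplest max-plus "symmetric" coordinates are

    E_1 = \max_i \lambda_i, \qquad E_2 = \max_{i < j} (\lambda_i + \lambda_j), \qquad E_k = \max_{i_1 < \cdots < i_k} (\lambda_{i_1} + \cdots + \lambda_{i_k}),

the tropical analogues of the elementary symmetric polynomials; each E_k is a single number that can be fed directly into standard machine learning algorithms.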


February 8, 2017

12-1pm, 182 George St, Room 110

Deformation models for image and shape matching

Ronen Basri

Professor
Department of Computer Science and Applied Mathematics
Weizmann Institute of Science

Modeling deformations is important for various applications in computer vision, graphics and geometry processing. In this talk I will describe our recent progress in modeling deformations. In particular, I will describe methods for computing bounded distortion transformations, locally injective maps whose differentials' conformal distortion is bounded. Toward this end, we developed a convex framework for solving optimization problems over matrices that involve functionals and constraints expressed in terms of their extremal singular values. In addition, I will describe methods for computing physically-motivated elastic maps between shapes. We have applied these methods in a number of challenging problems, including feature matching between images related by non-rigid deformation, non-rigid registration of shape models, and computing extremal quasi-conformal maps.


February 1, 2017

12-1pm, 182 George St, Room 110

Non-equilibrium transitions between metastable patterns in populations of motile bacteria

Eric Vanden-Eijnden

Professor
Courant Institute of Mathematical Sciences
New York University

Active materials can self-organize in many more ways than their equilibrium counterparts. For example, self-propelled particles whose velocity decreases with their density can display motility-induced phase separation (MIPS), a phenomenon building on a positive feedback loop in which patterns emerge in locations where the particles slow down. Here, we investigate the effects of intrinsic fluctuations in the system's dynamics on MIPS, using a field theoretic description building on results by Cates and collaborators. We show that these fluctuations can lead to transitions between metastable patterns. The pathway and rate of these transitions are analyzed within the realm of large deviation theory, and they are shown to proceed in a very different way than one would predict from arguments based on detailed balance and microscopic reversibility. Specifically, we show that these transitions involve fluctuations in the diffusivity of the bacteria followed by fluctuations in their population, in a specific sequence. The method of analysis proposed here, including its numerical component, can be used to study noise-induced non-equilibrium transitions in a variety of other non-equilibrium set-ups, and leads to predictions that are verifiable experimentally.


November 30, 2016

12-1pm, 182 George St, Room 110

How high-order image statistics shape cortical visual processing

Jonathan Victor

Professor
Brain and Mind Research Institute
Department of Neurology
Cornell University

Several decades of work have suggested that Barlow's principle of efficient coding is a powerful framework for understanding retinal design principles. Whether a similar notion extends to cortical visual processing is less clear, as there is no "bottleneck" comparable to the optic nerve, and much redundancy has already been removed. Here, we present convergent psychophysical and physiological evidence that regularities of high-order image statistics are indeed exploited by central visual processing, and at a surprising level of detail.

The starting point is a study of natural image statistics (Tkacic et al., 2010), in which we showed that high-order correlations in certain specific spatial configurations are informative, while high-order correlations in other spatial configurations are not: they can be accurately guessed from lower-order ones. We then construct artificial images (visual textures) composed either of informative or uninformative correlations. We find that informative high-order correlations are visually salient, while the uninformative correlations are nearly imperceptible. Physiological studies in macaque visual cortex identify the locus of the underlying computations. First, neuronal responses in macaque V1 and V2 mirror the psychophysical findings, in that many neurons respond differentially to the informative statistics, while few respond to the uninformative ones. Moreover, the differential responses largely arise in the supragranular layers, indicating that the computations are the result of intracortical processing.

We then consider low- and high-order local image statistics together, and apply a dimension reduction (binarization) to cast them into a 10-dimensional space. We determine the perceptual isodiscrimination surfaces within this space. These are well-approximated by ellipsoids, and the principal axes of the ellipsoids correspond to the distribution of the local statistics in natural images. Interestingly, this correspondence differs in specific ways from the predictions of a model that implements efficient coding in an unrestricted manner. These deviations provide insights into the strategies that underlie the representation of image statistics.


November 16, 2016

12-1pm, 182 George St, Room 110

Single-Channel Multi-Speaker Separation using Deep Clustering

John Hershey

Senior Principal Research Scientist
Mitsubishi Electric Research Laboratories (MERL)
Cambridge, Massachusetts

The human auditory system gives us the extraordinary ability to converse in the midst of a noisy throng of party goers. Solving this so-called cocktail party problem has proven extremely challenging for computers, and separating and recognizing speech in such conditions has been the holy grail of speech processing for more than 50 years. Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering, producing unprecedented speaker-independent single-channel separation performance on two-speaker and three-speaker mixtures. In this framework, a neural network is trained to assign an embedding vector to each element of a multi-dimensional signal, such that clustering the embeddings yields a desired segmentation of the signal. In the cocktail-party problem, the embeddings are assigned to each time-frequency (TF) index of the short-time Fourier transform (STFT) of the mixture of speech signals. Clustering these embeddings yields an assignment of each TF bin to one of the inferred sources. These assignments are used as a masking function to extract the dominant parts of each source. This method has produced remarkable performance, reaching levels of improvement -- over 10 dB SNR gain -- that were previously unobtainable even in simpler speech enhancement tasks. Extensions to the model allow for end-to-end training to optimize the whole architecture for best signal quality. We also evaluate our method using automatic speech recognition (ASR), and show that it can reduce the word error rate (WER) from 89.1
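
A minimal sketch of the separation (inference) step described above is given here. It assumes a trained embedding network, represented by a placeholder function embed that maps the mixture's TF bins to D-dimensional embedding vectors; the training objective and network architecture are omitted.

    import numpy as np
    from sklearn.cluster import KMeans

    def separate(mixture_stft, embed, n_sources=2):
        """Cluster per-TF embeddings and use the cluster labels as binary masks."""
        F, T = mixture_stft.shape
        V = embed(mixture_stft)                      # (F*T, D): one embedding vector per TF bin
        labels = KMeans(n_clusters=n_sources, n_init=10).fit_predict(V)
        masks = [(labels == s).reshape(F, T) for s in range(n_sources)]
        # apply each binary mask to the mixture STFT; invert with an ISTFT afterwards
        return [mask * mixture_stft for mask in masks]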


November 9, 2016

12-1pm, 182 George St, Room 110

Joint Scientific Computing and Pattern Theory Seminar

Solution uncertainty quantification for differential equations

Oksana Chkrebtii

Assistant Professor
Department of Statistics
Ohio State University

When models are defined implicitly by systems of differential equations without a closed form solution, small local errors in finite-dimensional solution approximations can propagate into large deviations from the true underlying state trajectory. Inference for such models relies on a likelihood approximation constructed around a numerical solution, which underestimates posterior uncertainty. This talk will introduce and discuss progress in a new formalism for modeling and propagating discretization uncertainty through the Bayesian inferential framework, allowing exact inference and uncertainty quantification for discretized differential equation models.


November 2, 2016

12-1pm, 182 George St, Room 110

Robust Bayesian inference via coarsening

Tamara Broderick

Assistant Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice---at least in part due to the difficulty of calculating robustness measures from MCMC draws. We demonstrate that, by contrast to MCMC, variational Bayes (VB) techniques are readily amenable to robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. However, MFVB has a well-known major failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).


October 5, 2016

12-1pm, 182 George St, Room 110

Robust Bayesian inference via coarsening

Jeffrey Miller

Assistant Professor
Department of Biostatistics
Harvard University

The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure, particularly when the data set is large. We introduce a simple, coherent approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the observed data exactly, one conditions on the event that the model generates data close to the observed data, with respect to a given statistical distance. When closeness is defined in terms of relative entropy, the resulting "coarsened posterior" can be approximated by simply raising the likelihood to a certain fractional power, making the method computationally efficient and easy to implement in practice. We illustrate with real and simulated data, and provide theoretical results.
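
In the relative-entropy case mentioned above, the coarsened posterior takes the simple power-likelihood form

    \pi_{\zeta}(\theta \mid x_{1:n}) \;\propto\; \pi(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)^{\zeta}, \qquad \zeta \in (0, 1),

so any existing sampler or variational routine for the standard posterior can be reused by simply tempering the likelihood, with \zeta controlling how much departure from the model is tolerated.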


September 28, 2016

12-1pm, 182 George St, Room 110

New Perspectives on Importance Sampling

Daniel Sanz-Alonso

Postdoctoral Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University

Importance sampling is a building block of many algorithms in computational statistics, perhaps most notably particle filters. It is the importance sampling step that often limits the accuracy of these algorithms. In this talk I will introduce a new way of understanding importance sampling based on information theory. I will argue that the fundamental problem facing algorithms based on importance sampling can be understood in terms of the distance between certain measures. The results give new understanding of the potential use of importance sampling and particle filters in high (possibly infinite) dimensional spaces.
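
For orientation, the autonormalized importance sampling estimator underlying these results is

    \mathbb{E}_{\pi}[f] \;\approx\; \sum_{i=1}^{N} w_i\, f(x_i), \qquad w_i = \frac{\tfrac{d\pi}{dq}(x_i)}{\sum_{j=1}^{N} \tfrac{d\pi}{dq}(x_j)}, \qquad x_i \sim q \ \text{i.i.d.},

and its accuracy is governed by how far the target \pi is from the proposal q, which is the distance between measures referred to in the abstract.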


September 21, 2016

12-1pm, 182 George St, Room 110

Structure-based comparisons for sequential data

Katherine Kinnaird

Postdoctoral Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University

We present aligned hierarchies, a low-dimensional representation for sequential data streams. The aligned hierarchies encode all hierarchical decompositions of repeated elements from a high-dimensional and noisy sequential data stream in one object. These aligned hierarchies can be embedded into a classification space with a natural notion of distance. We motivate our discussion through the lens of Music Information Retrieval (MIR), constructing aligned hierarchies by finding, encoding, and synthesizing all repeated structure present in a song. For a data set of digitized scores, we conducted experiments addressing the fingerprint task, a song comparison task in MIR, that achieved perfect precision-recall values and provide a proof of concept for the aligned hierarchies.

We also introduce aligned sub-hierarchies and aligned sub-decompositions. Both derived from the aligned hierarchies, these structure-based representations for songs can be embedded into classification spaces and can address additional MIR tasks. We will compare properties of the aligned hierarchies, the aligned sub-hierarchies, and the aligned sub-decompositions.


September 14, 2016

12-1pm, 182 George St, Room 110

Discovering the Nature of Nonlinear Relationships

Joshua Vogelstein

Assistant Professor
Department of Biomedical Engineering
Institute for Computational Medicine
Johns Hopkins University

As data collection is becoming easier, it is becoming increasingly difficult and important to answer questions of the form: is one property (e.g., clouds) related to another (e.g., grass wetness)? Only if we can determine that these two properties are related to---or statistically dependent on---one another does it make sense to further investigate the nature of this relationship. Unfortunately, reliably establishing such a relationship can be challenging, especially when the properties themselves are complex and the relationship is nonlinear. We here describe a procedure, called Multiscale Generalized Correlation (MGC), that addresses these challenges. Our key insight is that if two properties are related, comparisons between measurements of similar pairs of the first property (e.g., similarly shaped clouds) should be correlated with the comparisons between corresponding pairs of the second property (grass wetness under those clouds). We demonstrate the statistical and computational efficiency of MGC in both simulations and theory. We then apply it to detect the presence and nature of the relationships between brain activity and personality, brain shape and disorder, and brain connectivity and creativity. Finally, we demonstrate that MGC does not suffer from the false positives that have plagued conventional dependence tests. Our open source implementation of MGC is applicable to fundamental questions confronting science, government, finance, and many other disciplines.
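
For concreteness, here is the classical (single-scale) distance correlation that MGC builds on and generalizes to multiple neighborhood scales; the multiscale and smoothing steps of MGC itself are omitted.

    import numpy as np
    from scipy.spatial.distance import cdist

    def distance_correlation(X, Y):
        """Classical distance correlation between samples X (n x p) and Y (n x q)."""
        def centered(D):
            # double-center a pairwise distance matrix
            return D - D.mean(axis=0) - D.mean(axis=1, keepdims=True) + D.mean()
        A = centered(cdist(X, X))
        B = centered(cdist(Y, Y))
        dcov2_xy = max((A * B).mean(), 0.0)
        denom = np.sqrt((A * A).mean() * (B * B).mean())
        return np.sqrt(dcov2_xy / denom) if denom > 0 else 0.0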


May 18, 2016

12-1pm, 182 George St, Room 110

Learning Lattice Operators used in Computer Vision Tasks

Junior Barrera

Professor of Computer Science
University of Sao Paulo, Brazil

The transformation of images by lattice operators has some interesting properties: lattice operators stay within the discrete and quantized image domain; theoretical results guarantee that any transformation can be implemented by a lattice operator; lattice operators can be implemented through simple elementary lattice operators; and lattice operators can be learned through discrete machine learning techniques. In this lecture, we present the general algebraic representation of lattice operators and its application in the learning of these operators. Several approaches for efficient learning (i.e., from a relatively small sample in a reasonable processing time) of complex operators are presented. Among them is the U-curve problem, a new combinatorial optimization problem useful for approaching the classical feature selection problem. Another remarkable result is multi-resolution learning, which, for a fixed sample, avoids overfitting and gives decreasing estimated errors for an increasing number of parameters. The techniques are illustrated by the learning of several kinds of binary and gray-scale image transformations: noise reduction, interpolation, enhancement, segmentation, texture recognition, shape recognition, etc. All the results presented refer to already published papers, but some open problems are pointed out and discussed.
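
As a toy illustration of learning an image operator from examples (a plain frequency-based estimate of a binary window operator on 0/1 numpy arrays; the algebraic representations, feature selection, and multi-resolution machinery from the talk are not shown):

    import numpy as np

    def learn_window_operator(inp, out, win=3):
        """For each observed window pattern in the training input, store the most
        frequent output pixel value in the training pair (inp, out)."""
        r = win // 2
        counts = {}
        H, W = inp.shape
        for i in range(r, H - r):
            for j in range(r, W - r):
                patt = tuple(inp[i - r:i + r + 1, j - r:j + r + 1].ravel())
                c = counts.setdefault(patt, [0, 0])
                c[int(out[i, j])] += 1
        return {patt: int(c[1] > c[0]) for patt, c in counts.items()}

    def apply_window_operator(op, img, win=3, default=0):
        r = win // 2
        res = np.zeros_like(img)
        H, W = img.shape
        for i in range(r, H - r):
            for j in range(r, W - r):
                patt = tuple(img[i - r:i + r + 1, j - r:j + r + 1].ravel())
                res[i, j] = op.get(patt, default)   # unseen patterns fall back to a default value
        return res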


April 20, 2016

12-1pm, 182 George St, Room 110

Learning Causal Graphical Models of Large-Scale Systems

David Jensen

Professor of Computer Science
Director of the Knowledge Discovery Laboratory
University of Massachusetts Amherst

Effective methods for inferring causal dependence from observational data have been developed within both computer science and quantitative social science. Methods in computer science have focused on the correspondence between causal graphical models and observed patterns of statistical association. Methods in social science have focused on templates for causal inference often called quasi-experimental designs, including instrumental variables, propensity score matching, regression discontinuity designs, and interrupted time-series designs. Recent work has begun to unify these methods using the framework of causal graphical models, but barriers remain because the formal framework of causal graphical models has been insufficiently expressive to represent some quasi-experimental designs. In this talk, I will introduce many of the known experimental and quasi-experimental designs in the language of directed graphical models, and I will present very recent work on defining additional designs in terms of classes of graphical models that can represent relational and temporal dependence. Finally, I will present two novel designs that have resulted from our work on causal inference in relational data.


March 23, 2016

12-1pm, 182 George St, Room 110

The Dynamics of the Unconscious Brain Under General Anesthesia

Emery Brown

Warren M. Zapol Professor of Anaesthesia
Massachusetts General Hospital and Harvard Medical School
Edward Hood Taplin Professor of Medical Engineering & Computational Neuroscience
Massachusetts Institute of Technology

General anesthesia is a drug-induced, reversible condition comprised of five behavioral states: unconsciousness, amnesia (loss of memory), analgesia (loss of pain sensation), akinesia (immobility), and hemodynamic stability with control of the stress response. Our work shows that a primary mechanism through which anesthetics create these altered states of arousal is by initiating and maintaining highly structured oscillations. We present findings from our human studies of general anesthesia using high-density EEG recordings and intracranial recordings which have allowed us to give a detailed characterization of the neurophysiology of loss and recovery of consciousness due to propofol. We show how these dynamics change with other anesthetics and with age. We present a neuro-metabolic model of burst suppression, the profound state of brain inactivation seen in deep states of general anesthesia. We use our characterization of burst suppression to implement a closed-loop anesthesia delivery system for control of a medically-induced coma. Finally, we demonstrate that the state of general anesthesia can be rapidly reversed by activating specific brain circuits. The success of our research has depended critically on tight coupling of experiments, signal processing research and mathematical modeling.


March 16, 2016

12-1pm, 182 George St, Room 110

A simple network model for a variety of Delay Match to Sample tasks

Yali Amit

Professor
Department of Statistics and Department of Computer Science
University of Chicago

Delay match to sample experiments (DMS) have inspired much of the modeling work on attractor neural networks. The basic experiment involves showing a target image, removing it, and after a delay showing a cue image: either the original image or a different one. The monkey needs to indicate if the cue is the same or different than the target. Electrophysiological recordings have shown that if the target is a learned one (has been observed multiple times) neurons selective for it maintain activity during the delay between target and cue presentation. This persistent activity is hypothesized to represent 'working' or 'short term' memory. The attractor network model posits that stimulation with learned patterns leads to sustained activity due to the learned recurrent synaptic connections in the network. There are a number of variations on the basic DMS paradigm involving distractors in between target and cue, or repetition detection experiments where a sequence of images is shown and one of them chosen at random is repeated. These different experiments raise several questions that are rarely addressed in the literature. With distractors, how does the network ensure the target sample pattern stays in working memory? How does the network figure out whether the cue matches the target? How does the network avoid distractor repetitions? Additional interesting phenomena have been observed in more recent experiments on repetition detection, including better performance with novel patterns than with learned patterns.

I will present a parsimonious network model of binary neurons and binary synapses and show how all these phenomena can be handled within this framework, using simple adjustments of certain global parameters such as inhibition, noise level and depression rate. Recent experiments also show that average responses to novel stimuli in IT are higher than to learned stimuli. I will show what adjustment needs to be made to accommodate this phenomenon. Time permitting I will discuss how these issues can be reconciled in a two layer network that can also perform classification on real data.


March 2, 2016

12-1pm, 182 George St, Room 110

The shape space defined by the Gromov-Wasserstein distance

Facundo Memoli

Assistant Professor
Department of Mathematics and Department of Computer Science and Engineering
The Ohio State University

The Gromov-Wasserstein distance --a variant of the Gromov-Hausdorff distance based on ideas from mass transport-- provides an intrinsic metric on the collection of all metric measure spaces. I will give an overview of its construction, main properties, lower bounds, and their computation.
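
For reference, between two metric measure spaces (X, d_X, \mu_X) and (Y, d_Y, \mu_Y) the Gromov-Wasserstein distance of order p can be written as

    GW_p(X, Y) = \frac{1}{2} \inf_{\pi \in \Pi(\mu_X, \mu_Y)} \left( \iint \left| d_X(x, x') - d_Y(y, y') \right|^p \, d\pi(x, y)\, d\pi(x', y') \right)^{1/p},

where \Pi(\mu_X, \mu_Y) is the set of couplings of the two measures; the lower bounds mentioned above replace the full coupling problem by cheaper comparisons of invariants such as distributions of distances.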


February 17, 2016

12-1pm, 182 George St, Room 110

New applications and algorithms for submodular probabilistic models

Stefanie Jegelka

Assistant Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Many real-world inference problems are, at their core, subset selection problems. Probabilistic models for such scenarios rely on having sufficiently accurate yet tractable distributions over discrete sets. We focus on sub-families of such distributions whose special mathematical properties are the basis for fast algorithms. As a specific example, Determinantal Point Processes (DPPs) have recently become popular in machine learning, as elegant and tractable probabilistic models of diversity. We explore new applications of DPPs for variational inference over combinatorial objects, such as coupled cascades in a collection of networks, where we are able to leverage combinatorial and convex structure in the problem.

While sampling from DPPs is possible in polynomial time, the associated algorithm is not practical for large data. In the second part of the talk, I will outline ideas for faster sampling that build on new insights for algorithms that compute bilinear inverse forms. These results have applications beyond DPPs, including sensing with Gaussian Processes and submodular maximization.

This is joint work with Chengtao Li, Josip Djolonga, Suvrit Sra and Andreas Krause.
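
For background, an L-ensemble determinantal point process over a ground set of N items with positive semidefinite kernel matrix L assigns to every subset S the probability

    P(S) \;=\; \frac{\det(L_S)}{\det(L + I)},

where L_S is the submatrix of L indexed by S; the determinant is larger when the selected items are diverse (nearly orthogonal rows of L), which is what makes DPPs natural models of diversity.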


February 3, 2016

12-1pm, 182 George St, Room 110

Constructing representations using Bayesian nonparametrics and connections between human knowledge, optimal foraging, and random walks on graphs

Joseph Austerweil

Assistant Professor
Department of Cognitive, Linguistic, and Psychological Sciences
Brown University

This talk has two parts (how much detail we go into each will be guided by audience interest).

In the first part of the talk, I outline a statistical framework for constructing representations in a human-like manner. People still outperform computational methods on a diverse set of problems, ranging from language learning to recognizing objects in a scene. To understand human success on these problems, cognitive scientists appeal to representation as a key explanatory device. From a statistical perspective, representations can be interpreted as structure within a probability distribution, which provide biases to prevent overfitting. Puzzlingly, people are remarkably flexible, changing their representation of the same data depending a number of different factors. I will describe a statistical framework for this intriguing phenomenon, which uses Bayesian nonparametric processes.

In the second part of the talk, I describe recent work relating how optimal animals forage for food in a patchy environment to how people search semantic memory. I will discuss how representing semantic memory as a graph enables random walks to mimic optimal foraging. I will conclude with some recent (and ongoing) work that exploits this relation to perform Bayesian inference for a graph from the first hitting times of multiple walks on that graph.


November 18, 2015

12-1pm, 182 George St, Room 110

Big data, Google and disease detection: the statistical story

Samuel Kou

Professor
Department of Statistics
Harvard University

Big data collected from the internet have generated significant interest in not only the academic community but also industry and government agencies. They bring great potential in tracking and predicting massive social activities. We focus on tracking disease epidemics in this talk. We will discuss the applications, in particular Google Flu Trends, some of the fallacies, and the statistical implications. We will propose a new model that utilizes publicly available online data to estimate disease epidemics. Our model outperforms all previous real-time tracking models for influenza epidemics at the national level of the US. We will also draw some lessons for big data applications.


October 28, 2015

12-1pm, 182 George St, Room 110

A stochastic model for Gompertzian growth

Benar Svaiter

Professor
IMPA, Brazil

The Gompertz curve has been successfully used for modeling mortality of an aging population since its creation in the 1820s. In the 1960s, a time-reversed version of this curve was used by Laird to model tumoral and somatic growth. Since then, it has been the standard model for tumoral and somatic growth, and it has also been used for modeling regeneration. Why is it that a model for mortality has been so apt to model growth? We propose a stochastic model for cell divisions based on recent findings on DNA structure and show that Gompertzian growth is the thermodynamic limit of this model after a suitable renormalization.
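
For reference, the Gompertz curve in its growth form can be written as

    N(t) = K \exp\!\left(-b\, e^{-ct}\right), \qquad \text{equivalently} \qquad \frac{dN}{dt} = c\, N \ln\!\frac{K}{N},

with carrying capacity K and rate constants b, c > 0; in the original mortality setting the same law appears through a hazard rate that grows exponentially with age.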


October 14, 2015

12-1pm, 182 George St, Room 110

Importance sampling in large-scale machine learning problems: why it works and how it can help

Rachel Ward

Assistant Professor
Department of Mathematics
University of Texas at Austin

A recent trend in signal processing and machine learning research is that exact reconstruction is achievable from highly subsampled data by passing to nonlinear, sparsity-inducing, reconstruction methods such as l1 minimization. Such guarantees often require strong structural conditions on the data in addition to sparsity, such as incoherence, which render the theory unusable on problems of practical importance. Here, we show that many of these strong assumptions are tied to i.i.d uniform sampling, and can be dropped by allowing weighted, or importance sampling. First, we explain why importance sampling works in this context: it aims to make the inverse problem as well-conditioned as possible given a fixed sample budget. We then discuss several problem domains where importance sampling strategies can be derived explicitly, and outperform state-of-the-art sampling strategies used in practice: medical imaging, collaborative filtering, uncertainty quantification, and stochastic gradient methods. Along the way, we derive results at the intersection of applied harmonic analysis and random matrix theory that are of independent interest.
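
As a small self-contained illustration of the importance-sampling idea in the stochastic gradient setting (a generic sketch on a synthetic least-squares problem, not the specific schemes analyzed in the talk), rows can be sampled with probability proportional to their squared norms and the resulting gradients reweighted so that the estimate stays unbiased:

    # Sketch of importance-sampled SGD for least squares ||Ax - b||^2.
    # Rows are sampled with probability proportional to their squared norm and the
    # stochastic gradient is reweighted by 1/(n p_i) so it stays unbiased for the
    # gradient of the mean squared residual. Generic illustration only.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 500, 20
    A = rng.standard_normal((n, d))
    A[:10] *= 10.0                      # a few "important" high-energy rows
    x_true = rng.standard_normal(d)
    b = A @ x_true

    row_norms2 = np.sum(A**2, axis=1)
    S = row_norms2.sum()
    p = row_norms2 / S                  # importance-sampling probabilities
    step = 0.5 * n / S

    x = np.zeros(d)
    for t in range(5000):
        i = rng.choice(n, p=p)
        g = 2 * (A[i] @ x - b[i]) * A[i]    # gradient of the i-th squared residual
        x -= step * g / (n * p[i])          # reweight to keep the estimate unbiased
    print(np.linalg.norm(x - x_true))

With this particular step size the update reduces to the randomized Kaczmarz iteration, one of the best-known instances of row sampling proportional to squared norms.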


October 7, 2015

12-1pm, 182 George St, Room 110

Local Shape from Shading with a Generic Constraint

Ben Kunsberg

Prager Assistant Professor
Division of Applied Mathematics
Brown University

Humans have a remarkable ability to infer shape from shading (SFS) information. In computer vision this is often formulated with a Lambertian reflectance function, but it remains ill-posed and incompletely solved. Abstractly, the intensity in an image is a single-valued function and the goal is to uncover the vector-valued normal function. This ill-posedness has resulted in many proposed techniques that are either regularizations or propagations from known values. Our goal is to understand, mathematically and computationally, how we solve this problem.
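
For concreteness, under the Lambertian model the image irradiance equation for a surface $z = f(x,y)$ with albedo $\rho$ and light direction $\ell$ reads

    $I(x,y) = \rho\, \langle N(x,y), \ell \rangle$,   with unit normal   $N(x,y) = \frac{(-f_x, -f_y, 1)}{\sqrt{1 + f_x^2 + f_y^2}}$,

so a single scalar measurement per pixel must constrain the two degrees of freedom of the normal, which is the source of the ill-posedness.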

First, it has been shown psychophysically that our perception (via gauge figure estimates) is remarkably accurate even when the boundary is masked. Thus classical propagating approaches requiring known values along a boundary, such as characteristic curves or fast marching methods, are unlikely to model the visual system's solution.

An alternative approach requires regularization priors (in a Bayesian framework) or energy terms (in a variational framework). However, many of the proposed priors are ad-hoc and chosen by researchers to optimize performance for a particular test dataset. It is hard to conclude (from solely performance metrics) whether these priors are useful or accurate, e.g. good results are functions of these priors, resolution, the optimization techniques, the test set, and so on.

In this talk, we describe a different approach. We consider the SFS problem on image patches modeled as Taylor polynomials of any order and seek to recover a solution for that patch. We build a boot-strapping tensor framework that allows us to relate a smooth image patch to all of the polynomial surface solutions (under any light source). We then use a generic constraint on the light source to restrict these solutions to a 2-D subspace, plus an unknown rotation matrix. We then investigate several special cases where the ambiguity reduces and the solution can be anchored. Interestingly, these anchor solutions relate to those situations in which human performance is also veridical.


September 23, 2015

12-1pm, 182 George St, Room 110

Continuum limit of total variation on point clouds

Nicolas Garcia-Trillos

Prager Assistant Professor
Division of Applied Mathematics
Brown University

We consider point clouds obtained as random samples of a measure on a Euclidean domain. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. We study when the cut capacity, and more generally the total variation, on these graphs is a good approximation of the perimeter (total variation) in the continuum setting. We address this question in the setting of $\Gamma$-convergence.
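
Schematically (up to the normalization conventions used in this work), the graph total variation of a function $u$ defined on the sample points $x_1, \dots, x_n$ has the form

    $GTV_{n,\varepsilon}(u) = \frac{1}{\varepsilon n^2} \sum_{i,j} \eta_\varepsilon(x_i - x_j)\, |u(x_i) - u(x_j)|$,

where $\eta_\varepsilon$ is the rescaled kernel assigning edge weights at length scale $\varepsilon$, and the question is in what sense such functionals converge, as $n \to \infty$ and $\varepsilon \to 0$, to a weighted continuum total variation of the form $\int |\nabla u|\, \rho^2\, dx$ (up to a kernel-dependent constant) for the sampling density $\rho$.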


September 9, 2015

12-1pm, 182 George St, Room 110

Random matrix theory based analysis of the correlation structure of protein sequences

Lucy Colwell

Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK

Extracting interesting biological information from protein sequences is a grand challenge, with the pace of sequencing increasing by the day. There are many problems in biology where correlations between protein sequences could lead to important information and new tools; my research program aims to find new ways of using these correlations to make biological discoveries. Recently it has been shown by us and others that the correlation structure of large protein sequence alignments contains sufficient information to accurately predict protein 3D structure using approximate graphical modeling and probabilistic inference. The methodology requires inferring a large number of parameter values from a dataset that is highly under-sampled, including samples with varying degrees of correlation to one another, violating the assumptions made by the inference methods. For this approach to be useful more generally -- both for protein structure prediction and for solving other biological problems -- it is essential that we understand how prediction quality depends on the input data available for the protein family of interest. This is because those families for which no homologous crystal structure has been solved typically contain few sequences.

In this talk we first briefly compare the different inference procedures on both exact data simulated from models and protein sequence data. We then identify a phenomenon for protein sequence data in which iterating the model inference procedure allows the inferred model to converge to a more accurate solution, and show how this can be used to accurately predict interacting protein pairs from sequence data alone. This convergence suggests that there is scope to improve the accuracy of model inference, for example by 'cleaning' the sample correlation matrix. The standard random matrix theory approach identifies those few eigenvalues associated with a low rank signal by comparing the spectrum of the sample covariance matrix to the Marcenko-Pastur distribution. In the context of protein sequence data the extensive correlations between samples mean that this approach will not work, but if these correlations are known (via a phylogeny) I will show how the Marcenko-Pastur equation can be solved to yield the expected empirical spectral distribution caused by phylogeny alone.
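
For reference, the Marcenko-Pastur law describes the limiting eigenvalue density of a sample covariance matrix built from $n$ independent samples of $p$ uncorrelated variables of variance $\sigma^2$, with $p/n \to \lambda \in (0,1]$:

    $\rho(x) = \frac{1}{2\pi \sigma^2 \lambda x} \sqrt{(\lambda_+ - x)(x - \lambda_-)}$,   $\lambda_\pm = \sigma^2 (1 \pm \sqrt{\lambda})^2$,

so eigenvalues escaping the interval $[\lambda_-, \lambda_+]$ are the candidates for genuine low-rank signal; as noted above, phylogenetic correlations between sequences violate the independence assumption behind this baseline, which is what the modified calculation accounts for.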


May 6, 2015

12-1pm, 182 George St, Room 110

Two Recent Information Theoretic Variations on the Theme of Patterns in Security

Muriel Medard

Cecil H. Green Professor
Electrical Engineering and Computer Science Department
Massachusetts Institute of Technology

We overview two different sets of results based upon the effect of patterns in security. In the first part, we consider limits of inference, a problem that emerges when we seek to ascertain what limits to privacy we can expect when machine learning algorithms, whose theoretical basis often relies on principal inertia components, are applied to mining publicly available data that may be related, in loosely known ways, to private data. Lower bounds for the average probability of error of estimating a hidden variable X given an observation of a correlated random variable Y, and Fano's inequality in particular, play a central role in information theory. We present a lower bound for the average estimation error based on the marginal distribution of X and the principal inertias of the joint distribution matrix of X and Y, thus providing limits to privacy. Furthermore, we investigate how to answer a fundamental question in inference and privacy: given an observation Y, can we estimate a function f(X) of the hidden random variable X with an average error below a certain threshold? We provide a general method for answering this question using an approach based on rate-distortion theory.
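
For reference, Fano's inequality states that for any estimator $\hat{X}(Y)$ of a random variable $X$ taking values in a finite alphabet $\mathcal{X}$,

    $H(X \mid Y) \le h(P_e) + P_e \log(|\mathcal{X}| - 1)$,   where $P_e = \Pr[\hat{X}(Y) \ne X]$ and $h(\cdot)$ is the binary entropy;

the bounds described above refine this type of statement using the principal inertias of the joint distribution of X and Y.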

In the second part, we consider recent results on guesswork, the characterization of the process of guessing sequences such as passwords. We note that what may appear to be even slight differences in the distributions of these sequences may lead to differences that are exponential in guesswork, leading to possibly surprising results, such as the failure of the oft-assumed uniformity of compressed sources, and the fact that friendly jamming of an intended user may be advantageous. We conclude with our recently defined notion of inscrutability rate, used to quantify the asymptotic difficulty of guessing U out of V secret strings. Unexpectedly, the inscrutability rate of any finite-order Markov string-source with hidden statistics remains the same as in the unhidden case, i.e., the asymptotic value of hiding the statistics is vanishing per symbol.

Joint work with Ahmad Beirami, Robert Calderbank, Mark Christiansen, Ken Duffy, Flavio du Pin Calmon, Stefano Tessaro, Mayank Varia


April 1, 2015

12-1pm, 182 George St, Room 110

Adaptive Bayesian Estimation of Conditional Densities

Andriy Norets

Associate Professor
Department of Economics
Brown University

Nonparametric estimation of conditional distributions is important in empirical work across many fields. The Bayesian approach to this problem has several attractive properties. It does not require fixing a bandwidth or similar tuning parameters. Instead, it provides estimates of the objects of interest where the tuning parameters are averaged out with respect to their posterior distribution. Also, the Bayesian approach performs well in out-of-sample prediction and Monte Carlo exercises. In this talk, I will discuss theoretical properties of Bayesian nonparametric models and provide an explanation for their excellent performance in applications.

I will focus on mixtures of Gaussian densities with covariate dependent mixing weights and a variable number of mixture components for which a prior on positive integers is specified. Conditional on the number of mixture components, the mixing weights are modelled by a multinomial logit with a common scale parameter. This model is closely related to mixtures-of-experts, also known as smooth mixtures in econometrics. The main theoretical result of the talk is that the posterior in this model contracts at a minimax rate up to a logarithmic factor. The assumed prior distribution does not depend on the smoothness level of the true conditional density. Thus, the obtained posterior contraction rate is adaptive across all smoothness levels.
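
Schematically (in notation not taken from the talk, and reading the common scale parameter as a shared temperature $\tau$ in the logit weights), the model class has the form

    $p(y \mid x) = \sum_{j=1}^{m} \pi_j(x)\, \phi(y;\, \mu_j, \sigma_j^2)$,   $\pi_j(x) = \frac{\exp\{(\alpha_j + x'\beta_j)/\tau\}}{\sum_{k=1}^{m} \exp\{(\alpha_k + x'\beta_k)/\tau\}}$,

with a prior on the number of components $m$ and, depending on the specification, component means that may themselves depend on covariates.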

I will also briefly discuss some applications and estimation methods for nonparametric Bayesian modelling of conditional densities.

Background papers:
1. "Adaptive Bayesian Estimation of Conditional Densities" by Norets and Pati,
http://arxiv.org/pdf/1408.5355.pdf
2. "Posterior Consistency in Conditional Density Estimation by Covariate Dependent Mixtures" by Norets and Pelenis, Econometric Theory, 2014
http://www.econ.brown.edu/fac/Andriy_Norets/papers/consmixreg.pdf


March 18, 2015

12-1pm, 182 George St, Room 110

Using evolutionary sequence variation to make inferences about protein structure and function

Lucy Colwell

Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data.

The challenge is to distinguish true interactions from the noisy and under-sampled set of observed correlations in a large multiple sequence alignment. To address this we build a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair interactions. We translate these interactions into pairwise distance constraints between amino acids and use them to generate all atom structural models. Using proteins of known structure we show that correlations between amino acids at different sites in a protein contain sufficient information to predict low resolution tertiary protein structure of both globular and transmembrane proteins. We then apply our method to predict de novo the structure of 11 medically important transmembrane proteins of unknown structure. In addition we are able to predict protein quaternary structure and alternative conformations. The next step requires development of a theoretical inference framework that enables the relationship between the amount of available input data and the reliability of structural predictions to be better understood.
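
The maximum entropy model referred to above is, schematically, a pairwise (Potts-type) distribution over aligned sequences $\sigma = (\sigma_1, \dots, \sigma_L)$,

    $P(\sigma) = \frac{1}{Z} \exp\Big( \sum_i h_i(\sigma_i) + \sum_{i<j} J_{ij}(\sigma_i, \sigma_j) \Big)$,

whose fields $h_i$ and couplings $J_{ij}$ are chosen so that the model reproduces the single-site and pairwise amino acid frequencies of the alignment; strongly coupled position pairs are then translated into the distance constraints used for structure prediction.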


March 4, 2015

12-1pm, 182 George St, Room 110

From Pixels to Local Layers: Exploring Flexible Representations for Motion Estimation

Deqing Sun

Postdoctoral Research Fellow in Computer Science
Harvard University

We live in a dynamic world where motion is ubiquitous. To exist in such an environment, robots and other intelligent agents should have the ability to perceive and understand motion. Estimating image motion and segmenting the scenes into coherently moving regions are two closely related problems but are often treated separately. Motion actually provides an important cue to identify surfaces in a scene, while segmentation may provide the proper support for motion estimation. Despite decades of research efforts, current methods still tend to produce large errors near motion boundaries and in occlusion regions and falsely merge foreground objects with the background.

I will start from a probabilistic layered model for joint motion estimation and segmentation. We order each moving object (layer) in depth and explicitly model the occlusions between layers. We model the segmentation using thresholded spatio-temporally coherent support functions and the motion using globally coherent but locally flexible priors. Our model enforces that scene structures (segmentation), instead of motion, should persist over time. Our method achieves promising results on both the Middlebury optical flow benchmark and the MIT layer segmentation dataset, particularly in occlusion regions.

Noting that "global" layered models cannot capture mutual or self-occlusions or deal with too many layers, I will introduce a local layering representation that breaks the scenes into local layers and jointly models the motion and occlusion relationship between local layers. By retaining uncertainty on both the motion and the occlusion relationship, we can avoid local minima common to motion-only or occlusion-only approaches. Our method can handle motion and occlusion well for both challenging synthetic and real sequences.

Finally, I will show an application of motion estimation to interactive intrinsic video editing. We introduce a fast and temporally consistent algorithm to decompose video sequences into their reflectance and illumination components. One key observation is that reflectance is an intrinsic property of physical surfaces and tends to persist over time, while lighting may vary. The temporally consistent decomposition results allow illumination-aware video editing, such as retexturing and lighting-aware compositing.

Joint work with Michael J. Black, Erik B. Sudderth, Ce Liu, Hanspeter Pfister, Nicolas Bonneel, Kalyan Sunkavalli, James Tompkin, and Sylvain Paris


February 25, 2015

12-1pm, 182 George St, Room 110

Scaling and Generalizing Variational Inference

David Blei

Professor of Statistics and Computer Science
Columbia University

Latent variable models have become a key tool for the modern statistician, letting us express complex assumptions about the hidden structures that underlie our data. Latent variable models have been successfully applied in numerous fields including natural language processing, computer vision, electronic medical records, genetics, neuroscience, astronomy, political science, sociology, the digital humanities, and many others.

The central computational problem in latent variable modeling is posterior inference, the problem of approximating the conditional distribution of the latent variables given the observations. Posterior inference is central to both exploratory tasks, where we investigate hidden structures that underlie our data, and predictive tasks, where we use the inferred structures to generalize about future data. Approximate posterior inference algorithms have revolutionized Bayesian statistics, revealing its potential as a usable and general-purpose language for data analysis.

Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing approximate inference algorithms do not scale well. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand. This often requires significant model-specific analysis, which precludes us from easily exploring a variety of models.

In this talk I will discuss our recent research on addressing these two limitations. First I will describe stochastic variational inference, an approximate inference algorithm for handling massive data sets. Stochastic inference is easily applied to a large class of Bayesian models, including time-series models, factor models, and Bayesian nonparametric models. I will demonstrate its application to probabilistic topic models of text conditioned on millions of articles. Stochastic inference opens the door to scalable Bayesian computation for modern data analysis.

Then I will discuss black box variational inference. Black box inference is a generic algorithm for approximating the posterior. We can easily apply it to many models with little model-specific derivation and few restrictions on their properties. Black box inference performs better than similarly generic sampling algorithms, such as Metropolis-Hastings inside Gibbs, and can be composed with stochastic inference to handle massive data. I will demonstrate its use on a suite of nonconjugate models of longitudinal healthcare data.
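
The key identity exploited by black box variational inference is the score-function form of the ELBO gradient, which requires only evaluations of the log-joint and samples from the variational family:

    $\nabla_\lambda \mathcal{L}(\lambda) = E_{q_\lambda(z)}\big[\, \nabla_\lambda \log q_\lambda(z)\, \big( \log p(x, z) - \log q_\lambda(z) \big)\, \big]$,

which can be estimated with Monte Carlo samples from $q_\lambda$ and combined with the noisy, data-subsampled gradients of stochastic variational inference.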

This is joint work based on these two papers:

M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. Journal of Machine Learning Research, 14:1303-1347, 2013.
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiWangPaisley2013.pdf

R. Ranganath, S. Gerrish, and D. Blei. Black box variational inference. Artificial Intelligence and Statistics, 2014.


February 4, 2015

12-1pm, 182 George St, Room 110

Computational reconstruction and modeling of multicellular dynamics from 3D+time in vivo imaging of animal early embryogenesis. Extension to Artificial Life.

Rene Doursat

Research Scientist and Lecturer
(a) Complex Systems Institute, Paris Ile-de-France (ISC-PIF), Paris, France
(b) BioEmergences Lab, CNRS (USR 3695), Gif-sur-Yvette (Paris), France
(c) Erasmus Mundus Master's Program in Complex Systems Science, Ecole Polytechnique, Palaiseau (Paris), France

The BioEmergences platform [1,2], designed at Nadine Peyrieras's lab (b), provides automated analysis and reconstruction of collective cell movements based on time-lapse microscopy of organism development. It offers biologists advanced software tools capable of handling large amounts of 4D data through a workflow of image processing algorithms. Raw voxel-based movies of embryos (zebrafish, sea urchin), containing fluorescent proteins to highlight cell membranes and nuclei, are filtered by an edge-preserving smoothing method, then cell positions are extracted from the local maxima of these images and passed on to shape segmentation and cell-tracking modules. The output is a cell lineage tree annotated with various quantitative measurements. In parallel to this data reconstruction effort, the MecaGen platform [3], directed by myself, supports agent-based modeling and simulation of morphogenesis. Centered on the physicochemical coupling of cell mechanics with gene expression and molecular signaling, embryonic development is viewed here as a self-organized phenomenon emerging from a myriad of cells via their genetically regulated and regulating behavior. Cells' mechanical properties (division, adhesion, motility) are closely correlated with their spatial location and temporal state of genetic and molecular dynamics (protein and ligand concentrations), and affect each other concurrently. MecaGen is illustrated on morphogenetic episodes occurring during zebrafish and sea urchin early development. Exploration of parameter space is supported by experimental data from BioEmergences, which allows measuring the "fitness" of the virtual embryo and validating hypotheses. Finally, I will also briefly introduce Morphogenetic Engineering [4], an "Artificial Life" spinoff field that I founded, which takes its inspiration from biological development to create programmable and reproducible robotic, software or network architectures by decentralized self-assembly of elementary agents [5].

Selected References
[1] http://www2.die.upm.es/im/papers/Science2010.pdf
[2] http://doursat.free.fr/docs/Castro_et_al_2014_atlas_PLoS.pdf
[3] http://doursat.free.fr/docs/Delile_Doursat_Peyrieras_2013_MecaGen_chapter.pdf
[4] http://doursat.free.fr/docs/Doursat_Sayama_Michel_2013_MorphEng_NACO.pdf
[5] http://doursat.free.fr/docs/Doursat_Sanchez_2014_mapdevo_SoRo.pdf


December 3, 2014

12-1pm, 182 George St, Room 110

Hypothesis-guided dimensionality reduction and its application to large-scale neuroscience

John Cunningham

Assistant Professor
Department of Statistics
Columbia University

Like many fields, neuroscience is experiencing dramatic increases in the quantity and complexity of recorded data, with the goal of similarly dramatic scientific and technological advances. Critical to realizing these ambitions are the analysis methods to interrogate large datasets jointly. Many questions in the field reduce to a question of "hypothesis-guided" dimensionality reduction: given a large number of noisy time-series, what shared low-dimensional process best describes the data in terms of that hypothesis? I will begin by briefly describing a few applications of this dimensionality reduction in neuroscience. I will then introduce a generic, hypothesis-guided linear dimensionality reduction solver using an optimization program over the Stiefel and Grassmann manifolds. To demonstrate the utility of this solver, I will apply it to neural recordings from the motor cortex. Here we show that the seemingly hopeless complexity of neural responses can be simply described by a low-dimensional dynamical system at the level of the neural population. Finally, I will describe a critical statistical test of the significance of this hypothesis, verifying that the inferred structure is not a trivial consequence of high-dimensional data.
http://stat.columbia.edu/~cunningham/
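
A schematic reading of this framework is that each hypothesis-guided method reduces to a program of the form

    maximize $f(U)$   over   $U \in \mathbb{R}^{d \times k}$ with $U^\top U = I_k$,

i.e., an optimization over the Stiefel manifold of orthonormal $k$-frames, where the objective $f$ encodes the scientific hypothesis (for instance, that the projected population activity follows low-dimensional linear dynamics); classical methods such as PCA correspond to particular choices of $f$.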


November 20, 2014 (Thursday)

4-5pm, Wilson Hall Room 102

Joint Pattern Theory / LCDS Seminar

Geometric graph-based methods for high dimensional data

Andrea Bertozzi

Professor
Department of Mathematics
UCLA

We present new methods for segmentation of large datasets with graph based structure. The method combines ideas from classical nonlinear PDE-based image segmentation with fast and accessible linear algebra methods for computing information about the spectrum of the graph Laplacian. The goal of the algorithms is to solve semi-supervised and unsupervised graph cut optimization problems. I will present results for image processing applications such as image labeling and hyperspectral video segmentation, and results from machine learning and community detection in social networks, including modularity optimization posed as a graph total variation minimization problem.
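
As a minimal illustration of the linear-algebra ingredient (a generic sketch on a synthetic similarity graph, not the algorithms from the talk), the low-lying spectrum of the graph Laplacian already separates well-connected groups of nodes:

    # Minimal sketch: build a weighted similarity graph on synthetic 2-D points,
    # form the graph Laplacian L = D - W, and use its second eigenvector
    # (the Fiedler vector) to split the data into two clusters.
    # Generic illustration only, not the segmentation algorithms from the talk.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)),        # cluster A
                   rng.normal(3, 0.3, (50, 2))])       # cluster B

    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    W = np.exp(-d2 / 0.5)                               # Gaussian edge weights
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                      # unnormalized graph Laplacian

    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                                # eigenvector of 2nd smallest eigenvalue
    labels = (fiedler > 0).astype(int)                  # sign gives a two-way cut
    print(labels)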


November 12, 2014

12-1pm, 182 George St, Room 110

Multiscale models for shapes and images

Pedro Felzenszwalb

Associate Professor
School of Engineering and Department of Computer Science
Brown University

Markov models are widely used in a variety of applications in many different fields. In computer vision Markov models are commonly used as priors for shapes and images. Markov models capture regularities of objects such as sequences or images by modeling their local properties. This often leads to tractable learning and inference problems. However it is clear that in many applications realistic priors need to capture non-trivial long-range correlations and high-order properties of objects. Such models can be difficult to estimate and often lead to intractable inference problems. In this talk I will discuss an approach for defining models using a multiscale representation of an object. The idea involves modeling local properties of an object at multiple resolutions. This leads to a natural low-dimensional parameterization for high-order models. The resulting models also have structure that can lead to efficient inference algorithms. I will discuss two instances of the approach in the context of computer vision. One instance involves modeling curves for recognizing shapes and for detecting shapes in images. Another instance involves modeling images to recover hidden structure from noisy measurements.


November 5, 2014

12-1pm, 182 George St, Room 110

Cross-Study Validation versus Randomized Cross-Validation as Data Accumulate

Lo-Bin Chang

Visiting Assistant Professor
Department of Applied Mathematics and Statistics & The Center for Imaging Science
Johns Hopkins University

In recent years "reproducibility" has emerged as a key factor in evaluating predictors of disease phenotypes. In particular, "validation" is undermined when error rates on data collected from new studies exceed those originally reported, which is hardly surprising for a heterogeneous population. In this talk, I will provide a statistical formulation in the large sample limit: sub-populations are modeled as components of a mixture and all error rates are optimal (Bayes) for a two-class problem. For any number m of studies, the error rate in cross-study validation exceeds that in ordinary randomized cross-validation, the latter (averaged) increases with m, and both converge to the optimal rate.


October 22, 2014

12-1pm, 182 George St, Room 110

Algorithms for Interpretable Machine Learning

Cynthia Rudin

Associate Professor of Statistics
Sloan School of Management and
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Possibly *the* most important obstacle in the deployment of predictive models is the fact that humans simply do not trust them. If it is known exactly which variables were important for the prediction and how they were combined, this information can be very powerful in helping to convince people to believe (or not believe) the prediction and make the right decision. In this talk I will discuss algorithms for making these non-black box predictions including:

1) "Bayesian Rule Lists" - This algorithm builds a decision list using a probabilistic model over permutations of IF-THEN rules. It competes with the CART algorithm.

2) "Supersparse Linear Integer Models" - This algorithm produces scoring systems, which are a type of model that is widely used in the medical community. It proposes an alternative to the Lasso method.

I will show applications to healthcare, including an alternative to the CHADS_2 score, which is one of the most widely used scoring systems in medicine. Our model was trained on over 1000 times the amount of data as CHADS_2 and is more accurate, but is just as interpretable. I will also show preliminary joint work with the MGH Sleep Lab on diagnosing forms of sleep apnea.

At the end of the talk, I will discuss random cool applications of machine learning being worked on in the Prediction Analysis Lab.
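
As a toy illustration of what these two model classes look like operationally (the rules and point values below are made up and are not the models from the talk, only the form of their output), a decision list and an integer scoring system are each only a few lines of logic:

    # Toy illustration of the two interpretable model classes mentioned above.
    # The rules and point values are invented for illustration; they are NOT the
    # Bayesian Rule Lists or Supersparse Linear Integer Models from the talk.

    def decision_list(patient):
        # IF-THEN rules evaluated in order; the first matching rule fires.
        if patient["age"] >= 75 and patient["hypertension"]:
            return 0.60            # predicted risk
        if patient["prior_stroke"]:
            return 0.45
        return 0.05                # default rule

    def scoring_system(patient):
        # Small integer weights summed into a score, then mapped to a risk estimate.
        score = 0
        score += 2 if patient["age"] >= 75 else 0
        score += 1 if patient["hypertension"] else 0
        score += 2 if patient["prior_stroke"] else 0
        return score

    p = {"age": 80, "hypertension": True, "prior_stroke": False}
    print(decision_list(p), scoring_system(p))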

links:

Building Interpretable Classifiers with Rules using Bayesian Analysis
http://web.mit.edu/rudin/www/LethamRuMcMa14.pdf
This is joint work with my student Ben Letham and colleagues Tyler McCormick and David Madigan.

Method and Models for Interpretable Linear Classification
http://arxiv.org/abs/1405.4047
This is joint work with my student Berk Ustun.

Other links are on my webpage under "Interpretable Predictive Models"
http://web.mit.edu/rudin/www/PredicticsPapers.html


October 8, 2014

12-1pm, 182 George St, Room 110

A Simple Pattern-Theoretic Representation

Oren Freifeld

Postdoctoral Associate
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Reasoning over real-world signals consists of three conceptual steps: representation, modeling and inference. Here, we focus on a specific choice of representation, but one that is strongly motivated by modeling and inference considerations.

The transformational approach, a cornerstone in Ulf Grenander's Pattern Theory, replaces the representation of objects with the representation of transformations acting on the objects. Particularly, diffeomorphisms, a special class of transformations, often play a pivotal role. Typically, the choice of a diffeomorphism space involves a trade-off: such spaces range from ones that are simple (that is a good thing) but of limited expressiveness, to those that are highly-expressive (i.e., are able to capture a wide range of patterns) but are complicated and computationally-demanding. Thus, despite their mathematical elegance and sophistication, the applicability of the latter has been somewhat limited, especially for large data sets or when time is of the essence. Moreover, owing to their complexity, the application of modern approaches to modeling (e.g., Bayesian nonparametric models) and inference (e.g., MCMC) to highly-expressive spaces of diffeomorphisms presents significant mathematical and algorithmic challenges. This is typified by the case of analysis-by-synthesis methods, wherein multiple evaluations of likelihood functions (or cost functions in the deterministic setting) rely on computing multiple transformations.

In this talk I will present our recent, work-in-progress, efforts to facilitate tractable modeling and inference for such spaces. We propose a novel space of transformations which is not only highly-expressive but also (embarrassingly-) simple. Moreover, it supports highly-accurate and fast computations of transformations thus enabling rapid likelihood evaluations. Finally, the representation also lends itself to multi-scale modeling and easy constructions and evaluations of priors. While the representation has certain interesting differential-geometric aspects, the talk will focus on the practical issues of implementation (requiring only basic ideas from linear algebra and first-order differential equations) and the implications for modeling and inference. Preliminary results demonstrate the applicability of the proposed approach in computer-vision tasks such as image warping or registration, and machine-learning tasks such as monotonic regression, time-warp analysis, and modeling CDF- or histogram-valued data.

This is joint work with Soren Hauberg, Julian Straub, Kayhan Batmanghelich, and John Fisher.


October 1, 2014

12-1pm, 182 George St, Room 110

Statistical mechanics for real biological networks

William Bialek

Professor
Department of Physics
Princeton University

It is an old dream that ideas from statistical physics could help us understand the phenomena that emerge from biological networks, be they networks of genes, networks of neurons, or networks of organisms. In recent years, it has become possible to make increasingly accurate, simultaneous measurements on the states of (almost) all the nodes in such networks. I'll discuss the efforts that my colleagues and I are making to connect these data to statistical physics models. The key idea is the (quite old) maximum entropy principle: we try to build models that are consistent with some measured properties of the network (e.g., the correlations among the states of pairs of elements) but otherwise have as little structure as possible. I will use the example of a flock of birds to explain how this works, and to explain our surprise that it works so well. Statistical mechanics teaches us that, as systems become large, the parameter space breaks up into phases, and this also is true for families of maximum entropy models. Thus, we can ask where real networks are in the phase diagram of possible networks. For the flock, we'll see that the system is poised very close to a critical surface. We can go through a similar (but much more complex) analysis for a network of neurons in the vertebrate retina, and surprisingly we find the same answer - the system seems to be close to criticality, and we can detect hints of criticality in other systems as well. It seems that we are either seeing signs of something general, or we are fooling ourselves, and I'll outline a path to telling the difference.
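
As a small numerical illustration of the maximum entropy construction (a generic pairwise binary model fit by moment matching on synthetic data, not the flock or retina analyses themselves):

    # Toy pairwise maximum entropy (Ising-type) fit: choose fields h and couplings J
    # so that the model's means and pairwise correlations match "measured" ones.
    # For n = 8 binary units all 2^n states can be enumerated exactly.
    # Purely illustrative; not the flock or retina models from the talk.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n = 8
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)

    # "Measured" statistics, here taken from synthetic data.
    data = np.sign(rng.standard_normal((2000, n)) + 0.3)
    mean_target = data.mean(axis=0)
    corr_target = (data.T @ data) / len(data)

    h = np.zeros(n)
    J = np.zeros((n, n))
    for it in range(2000):
        energy = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
        p = np.exp(energy - energy.max())
        p /= p.sum()
        mean_model = p @ states
        corr_model = states.T @ (states * p[:, None])
        # Gradient ascent on the log-likelihood = moment matching.
        h += 0.1 * (mean_target - mean_model)
        J += 0.1 * (corr_target - corr_model)
        np.fill_diagonal(J, 0.0)

    print(np.abs(mean_target - mean_model).max(), np.abs(corr_target - corr_model).max())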


April 30, 2014

12-1pm, 182 George St, Room 110

Division of Applied Mathematics / Center for Vision Research Seminar

Computational Anatomy, High Throughput Neuro-Imaging Informatics, and the BrainCloud

Michael Miller

Professor
Biomedical Engineering and Electrical and Computer Engineering
Johns Hopkins University

Brain parsing is a core technology in Computational Anatomy supporting high throughput indexing of brains. This is allowing us, with the Susumu Mori laboratory, to build several brain clouds in which brains can be searched and retrieved, and machine learning can be performed. We will discuss the notions of brain parsing and "geodesic positioning" central to Computational Anatomy, allowing for the understanding of complex medical imagery in terms of the neuroscience ontology at 1mm scale.

The second half of the talk will focus on our medical findings in two of the several neurodegenerative diseases being studied worldwide, dementia of the Alzheimer’s type and Huntington’s disease. We will describe our results in preclinical dementia in the BIOCARD project led by Marilyn Albert, attempting to characterize the onset and trajectory of biomarker change in temporal lobe structures (amygdala, hippocampus, and entorhinal cortex) prior to conversion to clinical Alzheimer’s symptomology. Secondly, we will discuss results in Huntington’s disease in the PREDICT-HD project, led by the University of Iowa, examining motor and subcortical structure change as a function of genetic load.


April 23, 2014

12-1pm, 182 George St, Room 110

Climate Informatics: Recent Advances and Challenge Problems for Machine Learning in Climate Science

Claire Monteleoni

Assistant Professor
Department of Computer Science
George Washington University

The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied, such as the field of bioinformatics, machine learning is poised to accelerate discovery in climate science. Our recent progress on climate informatics reveals that collaborations with climate scientists also open interesting new problems for machine learning. I will give an overview of challenge problems in climate informatics, and present recent work from my research group in this nascent field.

A key problem in climate science is how to combine the predictions of the multi-model ensemble of global climate models that inform the Intergovernmental Panel on Climate Change (IPCC). I will present three approaches to this problem. Our Tracking Climate Models (TCM) work demonstrated the promise of an algorithm for online learning with expert advice, for this task. Given temperature predictions from 20 IPCC global climate models, and over 100 years of historical temperature data, TCM generated predictions that tracked the changing sequence of which model currently predicts best. On historical data, at both annual and monthly time-scales, and in future simulations, TCM consistently outperformed the average over climate models, the existing benchmark in climate science, at both global and continental scales. We then extended TCM to take into account climate model predictions at higher spatial resolutions, and to model geospatial neighborhood influence between regions. Our second algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model from which TCM is derived, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best climate model at a given location. We recently applied a third technique, sparse matrix completion, in which we create a sparse (incomplete) matrix from climate model predictions and observed temperature data, and apply a matrix completion algorithm to recover it, yielding predictions of the unobserved temperatures.
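
As a minimal sketch of the "learning with expert advice" primitive underlying this line of work (a generic exponentially weighted forecaster on synthetic data; the TCM algorithm itself is more elaborate):

    # Minimal exponentially weighted average forecaster ("experts" = toy climate models).
    # Each round we predict a weighted average of the experts' predictions, observe the
    # truth, and down-weight experts by their squared error. Synthetic data only;
    # this is a generic sketch, not the TCM algorithm from the talk.
    import numpy as np

    rng = np.random.default_rng(0)
    T, n_experts = 300, 5
    truth = np.sin(np.linspace(0, 6, T)) + 0.05 * rng.standard_normal(T)
    experts = truth[None, :] + rng.normal(0, [[0.1], [0.3], [0.5], [0.8], [1.2]], (n_experts, T))

    eta = 2.0                       # learning rate
    w = np.ones(n_experts) / n_experts
    loss_forecaster, loss_experts = 0.0, np.zeros(n_experts)
    for t in range(T):
        pred = w @ experts[:, t]                    # weighted prediction
        loss_forecaster += (pred - truth[t])**2
        inst = (experts[:, t] - truth[t])**2        # each expert's loss this round
        loss_experts += inst
        w *= np.exp(-eta * inst)                    # exponential reweighting
        w /= w.sum()

    print(loss_forecaster, loss_experts.min())      # forecaster vs best single expert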


April 16, 2014

12-1pm, 182 George St, Room 110

The Blended Paradigm: A Bayesian approach to handling outliers and misspecified models

Steven MacEachern

Professor
Department of Statistics
Ohio State University

Bayesian methods have proven themselves to be enormously successful across a wide range of scientific problems, with analyses ranging from the simple one-sample problem to complicated hierarchical models. They have many well-documented advantages over competing methods. However, Bayesian methods run into difficulties for two major and prevalent classes of problems--handling data sets with outliers and dealing with model misspecification. In both cases, standard Bayesian analyses fall prey to the hubris that is an integral part of the Bayesian paradigm. The large sample behavior of the analysis is driven by the likelihood. We propose the use of restricted likelihood as a single solution to both of these problems. When working with restricted likelihood, we summarize the data, x, through a set of (insufficient) statistics T(x) and update our prior distribution with the likelihood of T(x) rather than the likelihood of x. By choice of T(x), we retain the main benefits of Bayesian methods while reducing the sensitivity of the analysis to selected features of the data. The talk will motivate the blended paradigm, discuss properties of the method and choice of T(x), cover the main computational strategies for its implementation, and illustrate its benefits.
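
Schematically, instead of the usual posterior $\pi(\theta \mid x) \propto \pi(\theta)\, p(x \mid \theta)$, the restricted-likelihood posterior conditions only on the chosen summary,

    $\pi(\theta \mid T(x)) \propto \pi(\theta)\, p(T(x) \mid \theta)$,

where $T(x)$ might be, for example, a robust location and scale estimate rather than the full data; evaluating $p(T(x) \mid \theta)$ is typically the difficult step, which is where the computational strategies mentioned above come in.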


April 9, 2014

12-1pm, 182 George St, Room 110

Division of Applied Mathematics / Center for Vision Research Seminar

The computational magic of the ventral stream: a theory (and why some deep architectures work)

Tomaso Poggio

Professor
NSF Center for Brains, Minds and Machines
McGovern Institute
Computer Science and Artificial Intelligence Laboratory
Brain Sciences Department
Massachusetts Institute of Technology

Following on an idea of Stu Geman, I believe that the present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples (n → ∞). The next phase is likely to focus on algorithms capable of learning from very few labeled examples (n → 1), like humans seem able to do. I will introduce a new approach to this problem describing the underlying theory, based on the unsupervised, automatic learning of a "good" representation for supervised learning, characterized by small sample complexity (n).

We consider the case of visual object recognition, though the theory applies to other domains. The starting point is the conjecture, which we can prove in specific cases, that image representations which are invariant to translation, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and unique (discriminative) signature can be computed for each image patch. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such estimates. The theory extends existing deep learning convolutional architectures for image and speech recognition. It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition — and that this representation may be continuously learned in an unsupervised way during development and visual experience.


April 2, 2014

12-1pm, 182 George St, Room 110

Can connectomics help us understand neural computation? Insights from the fly visual system

Dmitri Chklovskii

Group Leader
Janelia Farm Research Campus

Animal behavior arises from computations in neuronal circuits, but our understanding of these computations has been frustrated by the lack of detailed synaptic connection maps, or connectomes. For example, despite intensive investigations over half a century, the neuronal implementation of local motion detection in the insect visual system remains elusive. We developed a semi-automated pipeline using electron microscopy to reconstruct a connectome, containing 379 neurons and 8,637 chemical synaptic contacts, within the Drosophila optic medulla. By matching reconstructed neurons to examples from light microscopy, we assigned neurons to cell types and assembled a connectome of the repeating module of the medulla. Within this module, we identified cell types constituting a motion detection circuit, and showed that the connections onto individual motion-sensitive neurons in this circuit were consistent with their direction selectivity. Our identification of cell types involved in motion detection allowed targeting of extremely demanding electrophysiological recordings by other labs. Preliminary results from such recordings show time delays confirming our findings. This demonstrates that connectomes can provide key insights into neuronal computations.


March 12, 2014

12-1pm, 182 George St, Room 110

Robust inference on parameters via particle filters and sandwich covariance matrices

Neil Shephard

Professor
Departments of Economics and Statistics
Harvard University

Likelihood-based estimation of the parameters of state space models can be carried out via a particle filter. In this work we show how to make valid inference on such parameters when the model is incorrect. In particular we develop a simulation strategy for computing sandwich covariance matrices which can be used for asymptotic likelihood-based inference. These methods are illustrated on some simulated data.
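
For reference, the sandwich (robust) covariance estimator has the familiar form

    $\widehat{\mathrm{Var}}(\hat{\theta}) \approx \frac{1}{n}\, \hat{H}^{-1} \hat{J}\, \hat{H}^{-1}$,

where $\hat{H}$ estimates the expected negative Hessian of the per-observation log-likelihood and $\hat{J}$ the (long-run) variance of the score; under correct specification the two coincide and the expression collapses to the usual inverse information, while under misspecification both pieces must be estimated, here via particle-filter simulation.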


February 26, 2014

12-1pm, 182 George St, Room 110

Productivity and Reuse in Language: Non-parametric Bayesian Models of Lexical Acquisition

Timothy O'Donnell

Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology

A much-celebrated aspect of language is the way in which it allows us to express and comprehend an unbounded number of thoughts. This property is made possible because language consists of several combinatorial systems which can productively build novel forms using a large inventory of stored, reusable parts: the lexicon.

For any given language, however, there are many potentially storable units of structure, each giving rise to many more ways of forming novel expressions than are actually used in practice. For example, English contains suffixes which are highly productive and generalizable (e.g., -ness; Lady-Gagaesqueness, pine-scentedness) and suffixes which can only be reused in specific words, and cannot be generalized (e.g., -th; truth, width, warmth). How are such differences in generalizability and reusability represented? What are the basic, stored building blocks at each level of linguistic structure? When is productive computation licensed and when is it not? How can the child acquire these systems of knowledge?

I will discuss several mathematical models of productivity (computation) and reuse (storage), at different levels of linguistic structure. These models all treat the problem as a tradeoff between storage of frequent (sub)structures and the need to productively generalize to novel cases. The computation/storage tradeoff provides an illuminating new perspective on the standard Bayesian tradeoff between fit to data and simplicity/generalizability of hypotheses. This tradeoff is formalized in these models using tools from Bayesian non-parametric statistics.

The distinction between rules and forms that generalize and those that don't gives rise to several deep questions about the nature of the linguistic system. While the tradeoff between productivity and reuse is consistent across different levels of linguistic structure, and can be modeled using Bayesian non-parametrics, linguistically-informed representations at each level are crucial for correctly capturing the empirical patterns in the data. Working with such rich, structured representations leads to significant inference challenges, which I will also briefly discuss.


February 12, 2014

12-1pm, 182 George St, Room 110

Challenging issues in Likelihood Inference on Mixture Models

Daeyoung Kim

Assistant Professor
Department of Mathematics and Statistics
University of Massachusetts Amherst

Modern computing power and algorithms have greatly increased interest in mixture models as an effective tool for modeling heterogeneity in data. Statistical inference for finite mixture models can be done by the method of maximum likelihood (ML), which provides, in a single package, methods for point estimation, hypothesis testing and construction of confidence sets. Despite the attractive features of finite mixture models, there are several challenges for likelihood-based inference. In this talk we will address two problems: empirical identifiability of the mixture parameters and multiple local maximizers of the mixture likelihood.
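
The second issue is easy to see numerically: fitting the same two-component Gaussian mixture from different random starting points can land on different local maximizers of the likelihood. The sketch below is a generic scikit-learn example, not tied to the talk's analysis:

    # Sketch: the mixture likelihood has multiple local maximizers, so EM started from
    # different random initializations can converge to different solutions.
    # Generic illustration only.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Two well-separated components plus a few outliers, which encourage spurious optima.
    X = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200),
                        rng.normal(0, 8, 10)]).reshape(-1, 1)

    for seed in range(5):
        gm = GaussianMixture(n_components=2, n_init=1, init_params='random',
                             random_state=seed).fit(X)
        # per-sample average log-likelihood and fitted component means
        print(seed, round(gm.score(X), 4), np.round(gm.means_.ravel(), 2))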


December 4, 2013

12-1pm, 182 George St, Room 110

Big data in neuroscience: Where is the information?

Joachim Buhmann

Professor
Department of Computer Science
ETH Zurich

Neuroscience provides data at very different scales: microscopic imagery (ssTEM) measures information on the structure of neurons, diffusion tensor imaging tracks fibers between brain areas and the hemispheres, and fMRI data measure spatial activity patterns to estimate dynamic causal models. At all three length scales, machine learning methods extract complex structures from vast amounts of data. Model selection plays an important role in controlling the complexity of the explanation. Furthermore, supervision information is very scarce, since human annotators display a high error rate on these ambiguous data sources. We advocate a maximum entropy approach to interpreting such data sources, with model complexity regularized by an information-theoretic model selection principle.


November 20, 2013

12-1pm, CIT 134

Interactive Demonstrations in the Brown Robotics Lab

Chad Jenkins

Associate Professor
Department of Computer Science
Brown University

This meeting will be a tour of the Brown Robotics Lab in the CIT Building. This tour will consist of interactive demonstrations of the PR2 robot for perception and physical manipulation of tabletop objects using data from visual streams. This demo will be complemented with a description of the underlying computational methods, such as processing of point clouds from RGBD data, object recognition, and adjustable autonomy for human-robot interaction. Time permitting, demos of other robots, such as the Rethink Baxter, Suitable Beam, and AR.Drone quad-rotor helicopter, will occur with a description of their vision-related technical challenges.


November 6, 2013

12-1pm, 182 George St, Room 110

Dirichlet process mixture inconsistency for the number of components, and dimension mixture models

Jeffrey Miller

Brown University

Note: The first half will essentially be a practice talk for my NIPS 2013 presentation.

For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of components occurring so far, that is, the posterior on the number of clusters in the observed data. However, it turns out that this posterior is not consistent; it does not converge to the true number of components.

Motivated by this finding, we examine an alternative approach to Bayesian nonparametric models. Many of the commonly-used nonparametric models are infinite-dimensional, such as Dirichlet or Pitman-Yor process mixtures (DPMs/PYMs), Hierarchical Dirichlet processes (HDPs), and Indian buffet processes (IBPs). A less common but very natural type of Bayesian nonparametric model is constructed by taking a family of finite-dimensional models and putting a prior on the dimension --- that is, taking a mixture of finite-dimensional models. Interestingly, this approach gives rise to combinatorial stochastic processes that closely parallel those of DPMs/PYMs, HDPs, and IBPs, and has certain advantages over the infinite-dimensional approach.
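
The alternative construction referred to above, a mixture of finite mixtures, has the schematic form

    $K \sim p_K$,   $(\pi_1, \dots, \pi_K) \mid K \sim \mathrm{Dirichlet}(\gamma, \dots, \gamma)$,   $z_i \mid \pi \sim \pi$,   $\theta_k \sim H$,   $x_i \sim f(\cdot \mid \theta_{z_i})$,

i.e., an ordinary finite mixture with a prior $p_K$ on its number of components; marginalizing over $K$ yields an exchangeable partition structure that closely parallels that of Dirichlet process mixtures.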


October 23, 2013

12-1pm, 182 George St, Room 110

High-dimensional statistics

Sahand Negahban

Assistant Professor
Department of Statistics
Yale University

The focus of my research is to develop theoretically sound methods, which are both computationally and statistically efficient, for extracting information from large datasets. A salient feature of my work has been to understand how hidden low-complexity structure in large datasets can be used to develop computationally and statistically efficient methods for extracting meaningful information for high-dimensional estimation problems. My work borrows from and improves upon tools of statistical signal processing, machine learning, probability and convex optimization.


October 9, 2013

12-1pm, 182 George St, Room 110

Model selection in a large compositional space

Roger Grosse

Massachusetts Institute of Technology

We often build complex probabilistic models by "composing" simpler models -- using one model to generate the latent variables for another model. This allows us to express complex distributions over the observed data and to share statistical structure between different parts of a model. I'll present a space of matrix decomposition models defined by the composition of a small number of motifs of probabilistic modeling, including clustering, low rank factorizations, and binary latent factor models. This compositional structure can be represented by a context-free grammar whose production rules correspond to these motifs. By exploiting the structure of this grammar, we can generically and efficiently infer latent components and estimate predictive likelihood for nearly 2500 model structures using a small toolbox of reusable algorithms. Using a greedy search over this grammar, we automatically choose the decomposition structure from raw data by evaluating only a small fraction of all models. The proposed method typically finds the correct structure for synthetic data and backs off gracefully to simpler models under heavy noise. It learns sensible structures for datasets as diverse as image patches, motion capture, 20 Questions, and U.S. Senate votes, all using exactly the same code.


September 11, 2013

12-1pm, 182 George St, Room 110

Parameter estimation robust to low-frequency contamination

Adam McCloskey

Assistant Professor
Department of Economics
Brown University

We provide methods to robustly estimate the parameters of stationary ergodic short-memory time series models in the potential presence of additive low-frequency contamination. The types of contamination covered include level shifts (changes in mean) and monotone or smooth time trends, both of which have been shown to bias parameter estimates towards regions of persistence in a variety of contexts. The estimators presented here minimize trimmed frequency domain quasi-maximum likelihood (FDQML) objective functions without requiring specification of the low-frequency contaminating component. We provide two approaches, allowing for either thin or heavy-tailed data. When proper sample size-dependent trimmings are used, the FDQML estimators are consistent and asymptotically normal, asymptotically eliminating the presence of any spurious persistence. These asymptotic results also hold in the absence of additive low-frequency contamination, enabling the practitioner to robustly estimate model parameters without prior knowledge of whether contamination is present. Popular time series models that fit into the framework of this article include ARMA, stochastic volatility, GARCH and ARCH models. We explore the finite sample properties of the trimmed FDQML estimators of the parameters of some of these models, providing practical guidance on trimming choice. Empirical estimation results suggest that a large portion of the apparent persistence in certain volatility time series may indeed be spurious.

Joint work with Jonathan B. Hill (UNC-Chapel Hill).
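
Schematically (in notation not taken from the paper), the objective functions in question are trimmed versions of the Whittle frequency-domain quasi-likelihood: with periodogram $I_n(\lambda_j)$ at Fourier frequencies $\lambda_j = 2\pi j / n$ and parametric spectral density $f_\theta$,

    $\hat{\theta} = \arg\min_\theta \sum_{j=l_n}^{\lfloor n/2 \rfloor} \Big[ \log f_\theta(\lambda_j) + \frac{I_n(\lambda_j)}{f_\theta(\lambda_j)} \Big]$,

where the lower trimming $l_n \to \infty$ slowly relative to $n$, discarding the lowest frequencies, which is where level shifts and smooth trends concentrate their energy.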


May 8, 2013

12-1pm, 182 George St, Room 110

Scale-invariance and metrics on spaces of plane curves

Matt Feiszli


The study of metrics and distances on spaces of curves and surfaces is an active area of research with applications in vision and imaging. We will present metrics on spaces of plane curves using techniques from harmonic, complex, and geometric analysis. Their behavior can be best understood through the notion of scale, and we will discuss geometric ways to understand their multiscale behavior. We will discuss their application to problems in shape deformation and shape matching, and demonstrate scale-invariant approaches to curve matching on standard test datasets.


April 17, 2013

12-1pm, 182 George St, Room 110

Towards a general theory of human learning and reasoning

Charles Kemp

Assistant Professor
Department of Psychology
Carnegie Mellon University

People learn and reason about animals, spatial relations, kinsfolk, and many other domains, and solve a broad range of inductive problems within each of these domains. The full set of domains and the full set of inductive problems within these domains can be collectively described as the conceptual universe. I will present a systematic characterization of the conceptual universe that helps to clarify the relationships between familiar inductive problems such as property induction, categorization, and stimulus generalization, and that introduces new inductive problems for psychological investigation. I will illustrate the framework using case studies that include behavioral and computational studies of inductive reasoning, and a formal analysis of kinship classification across cultures.


April 10, 2013

12-1pm, 182 George St, Room 110

Encoding binary neural codes in networks of threshold-linear neurons

Carina Curto

Assistant Professor
Department of Mathematics
University of Nebraska-Lincoln (UNL)

Networks of neurons in the brain encode preferred patterns of neural activity via their synaptic connections. Despite receiving considerable attention, the precise relationship between network connectivity and encoded patterns is still poorly understood. Here we consider this problem for networks of threshold-linear neurons whose computational function is to learn and store a set of binary patterns (e.g., a neural code) as "permitted sets" of the network. We introduce a simple Encoding Rule that selectively turns "on" synapses between neurons that co-appear in one or more patterns. The rule uses synapses that are binary, in the sense of having only two states ("on" or "off"), but also heterogeneous, with weights drawn from an underlying synaptic strength matrix S. Our main results precisely describe the stored patterns that result from the Encoding Rule -- including unintended "spurious" states -- and give an explicit characterization of the dependence on S. As a consequence, we find that certain neural codes are "natural" in the context of these networks; i.e., the structure of the code closely matches the structure of emerging spurious states, allowing the full code to be accurately learned from a highly undersampled set of patterns. Interestingly, many commonly observed neural codes in cortical and hippocampal areas are natural in this sense. As an application, we construct networks that encode hippocampal place field codes nearly exactly, following presentation of only a small fraction of patterns. To obtain our results, we prove new theorems using classical ideas from convex and distance geometry, such as Cayley-Menger determinants, revealing a novel connection between these areas of mathematics and coding properties of neural networks.

This is joint work with Anda Degeratu and Vladimir Itskov.
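
A minimal numpy sketch of the Encoding Rule as described above, under the simplest reading: a synapse between neurons i and j is turned "on" with weight S[i, j] whenever the two neurons co-appear in at least one stored pattern, and stays "off" otherwise. The function name and toy data are illustrative, not taken from the talk.

    import numpy as np

    def encode(patterns, S):
        """Binary-but-heterogeneous encoding rule (illustrative sketch): synapse (i, j)
        is set to strength S[i, j] iff neurons i and j co-appear in some binary pattern."""
        n = S.shape[0]
        on = np.zeros((n, n), dtype=bool)
        for p in patterns:                      # p is a 0/1 vector of length n
            active = np.flatnonzero(p)
            on[np.ix_(active, active)] = True
        np.fill_diagonal(on, False)             # no self-synapses
        return np.where(on, S, 0.0)

    # toy example: 5 neurons, 2 binary patterns, random symmetric strength matrix
    rng = np.random.default_rng(0)
    S = rng.uniform(0.1, 1.0, size=(5, 5)); S = (S + S.T) / 2
    patterns = [np.array([1, 1, 0, 0, 1]), np.array([0, 1, 1, 0, 0])]
    print(encode(patterns, S))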


April 3, 2013

12-1pm, 182 George St, Room 110

Learning to behave by reading

Regina Barzilay

Associate Professor
Department of Electrical Engineering and Computer Science
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

In this talk, I will address the problem of grounding linguistic analysis in control applications, such as game playing and robot navigation. We assume access to natural language documents that describe the desired behavior of a control algorithm (such as game strategy guides). Our goal is to demonstrate that knowledge automatically extracted from such documents can dramatically improve performance of the target application. First, I will present a reinforcement learning algorithm for learning to map natural language instructions to executable actions. This technique has enabled automation of tasks that until now have required human participation -- for example, automatically configuring software by consulting how-to guides. Next, I will present a Monte-Carlo search algorithm for game playing that incorporates information from game strategy guides. In this framework, the task of text interpretation is formulated as a probabilistic model that is trained based on feedback from Monte-Carlo search. When applied to the Civilization strategy game, a language-empowered player outperforms its traditional counterpart by a significant margin.


March 20, 2013

12-1pm, 182 George St, Room 110

Sparse superposition codes: Communication by regression

Andrew Barron

Professor
Department of Statistics
Yale University

A solution is presented to the problem of provable, fast, reliable, capacity-achieving communications for the additive Gaussian noise channel. This is an important real communications problem for which the fundamental limits of reliability were initially established in the capacity theorem of Shannon in 1948. Not until the 1990s were practical high-rate schemes developed, and these form an essential part of the cell phone revolution. However, the demonstration of their reliability is only empirical. The codes we develop here are sparse superposition codes based on a high-dimensional regression model, and they have fast encoders and decoders based on iterative regression fits. In this presentation we describe the framework permitting theoretical demonstration of the desired properties. The error probability is shown to scale favorably with the size of the code, namely, the error probability is exponentially small for every fixed communication rate below capacity. Remaining challenges in understanding how the complexity scales when the rate approaches capacity are discussed. This work is joint with Antony Joseph and Sanghee Cho.
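
A toy sketch of the encoding side of a sparse superposition code, with a deliberately simplified one-pass decoder standing in for the iterative regression fits described above; the dimensions and power level are arbitrary illustrative choices, not parameters from the talk.

    import numpy as np

    rng = np.random.default_rng(1)
    n, L, B = 256, 16, 32        # codeword length, sections, columns per section
    P = 4.0                      # signal power (noise variance is 1), so snr = 4

    X = rng.normal(size=(n, L * B))                 # random regression dictionary
    msg = rng.integers(0, B, size=L)                # message: one column index per section
    beta = np.zeros(L * B)
    beta[np.arange(L) * B + msg] = np.sqrt(P / L)   # sparse coefficient vector
    y = X @ beta + rng.normal(size=n)               # additive white Gaussian noise channel

    # Simplified single-pass decoder: in each section keep the column most correlated
    # with y.  (The codes in the talk use iterative regression fits, which successively
    # cancel interference from already-decoded sections and drive the error down.)
    corr = X.T @ y
    decoded = np.array([np.argmax(corr[l * B:(l + 1) * B]) for l in range(L)])
    print("section error rate of the crude one-pass decoder:", np.mean(decoded != msg))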


February 27, 2013

12-1pm, 182 George St, Room 110

Bayesian model sampling in reinforcement learning

Michael Littman

Professor
Department of Computer Science
Brown University

Reinforcement learning (RL) is a subfield of machine learning concerned with decision makers that adapt their behavior given utility-based feedback. While temporal difference methods dominate the field, there are a growing number of researchers examining Bayesian statistics as a way of learning about unfamiliar environments while modeling their own uncertainty and addressing the classic exploration/exploitation dilemma. I will introduce the RL problem and provide some background into existing Bayesian methods, as well as presenting current efforts from our lab to generalize these approaches to broader model classes and more efficient computation.
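
As one concrete example of Bayesian model sampling, a short sketch of posterior (Thompson) sampling on a Bernoulli bandit; this is a generic illustration of the idea, not the specific methods from the speaker's lab, and the arm probabilities are invented.

    import numpy as np

    rng = np.random.default_rng(0)
    true_p = np.array([0.3, 0.5, 0.7])          # unknown arm reward probabilities
    alpha = np.ones(3); beta = np.ones(3)       # Beta(1,1) priors on each arm

    for t in range(2000):
        theta = rng.beta(alpha, beta)           # sample a model from the posterior
        a = int(np.argmax(theta))               # act greedily w.r.t. the sampled model
        r = rng.random() < true_p[a]            # observe utility-based feedback
        alpha[a] += r; beta[a] += 1 - r         # Bayesian update

    print("posterior means:", alpha / (alpha + beta))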


February 20, 2013

12-1pm, 182 George St, Room 110

Nonstationary Modeling Through Dimension Expansion

Luke Bornn

Assistant Professor
Department of Statistics
Harvard University

If atmospheric, agricultural, and other environmental systems share one underlying theme, it is complex spatial structure, shaped by features such as topography and weather. Ideally we might model these effects directly; however, information on the underlying causes is often not routinely available. Hence, when modeling environmental systems there exists a need for a class of spatial models that does not rely on the assumption of stationarity.

In this talk, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multi-dimensional scaling, group lasso, and latent variable models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic test-bed data set historically used to study nonstationary models. Following this, we explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which gives rise to a nonstationary field.


January 30, 2013

12-1pm, 182 George St, Room 110

Composite Likelihood

Nancy Reid

University Professor
Department of Statistics
University of Toronto

In complex multivariate settings, it is sometimes much simpler to construct a 'likelihood' from lower dimensional marginal or conditional distributions: the general construction was called composite likelihood in Lindsay (1988). Under mild conditions the resulting composite maximum likelihood estimate is consistent and asymptotically normal, with asymptotic variance of the 'sandwich' form that also arises in mis-specified models. A recent workshop at the Banff International Research Station on composite likelihood had as one goal to identify important directions for future research in the theory of composite likelihood, in software development, and in methodology for different areas of application. I will give an overview of the progress made at this workshop and highlight areas where more research is needed.
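
A small illustration of the construction, assuming a pairwise composite likelihood for an equicorrelated multivariate normal; the toy model, parameter names, and bounds are ours, not from the talk.

    import numpy as np
    from itertools import combinations
    from scipy.stats import multivariate_normal
    from scipy.optimize import minimize_scalar

    # Toy setting: a d-variate normal with unit variances and common correlation rho;
    # the pairwise composite likelihood sums bivariate marginal log-likelihoods over all pairs.
    rng = np.random.default_rng(0)
    d, n, rho_true = 5, 500, 0.4
    Sigma = (1 - rho_true) * np.eye(d) + rho_true * np.ones((d, d))
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

    def neg_pairwise_cl(rho):
        S2 = np.array([[1.0, rho], [rho, 1.0]])
        return -sum(multivariate_normal(mean=np.zeros(2), cov=S2).logpdf(X[:, [i, j]]).sum()
                    for i, j in combinations(range(d), 2))

    rho_hat = minimize_scalar(neg_pairwise_cl, bounds=(-0.9, 0.95), method="bounded").x
    print("pairwise composite-likelihood estimate of rho:", round(rho_hat, 3))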


November 14, 2012

12-1pm, 182 George St, Room 110

Linking signaling pathways and dynamic regulatory networks

Anthony Gitter

Postdoctoral Researcher
Microsoft Research New England

Adaptation to ever-changing environmental conditions is vital to the survival of all organisms. In many stress responses upstream proteins detect an environmental stimulus and propagate signals to the nucleus where transcription factors activate or inhibit genes. Although these upstream proteins and temporal changes in gene expression can be readily observed, the pathways and transcription factors involved remain hidden. We present a strategy for integrating large-scale data to discover the signaling and transcriptional components of stress response. Our model captures the dynamics of transcription factor activity and the directed pathways that activate these regulators, which are inferred from an undirected protein interaction network. Because our approach requires little condition-specific data, it is widely applicable to many conditions and species. Joint work with Miri Carmi, Naama Barkai, and Ziv Bar-Joseph.


November 7, 2012

12-1pm, 182 George St, Room 110

Natural image statistics and image restoration

Yair Weiss

Professor
School of Computer Science and Engineering
The Hebrew University of Jerusalem

Learning a statistical model of natural images is a longstanding topic of research in disciplines ranging from engineering to computational neuroscience. Somewhat embarrassingly, however, algorithms based on these statistical models do not give state-of-the-art performance in image restoration, and are outperformed by "block matching" methods that do not have an explicit probabilistic model. In this work, we show that many widely used statistical models of images are actually not very good density models and a simple, unconstrained, Gaussian Mixture Model (GMM) can give much higher likelihood to unseen images. Using the learned GMM, we obtain state-of-the-art image restoration performance and by examining what the GMM has learned we obtain new insights into the statistics of natural images that are not captured by most existing models.

Joint work with Daniel Zoran.
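
A minimal sketch in the spirit of this result: fit a Gaussian mixture to mean-removed image patches and report held-out log-likelihood. The stand-in "image" and the small number of components are illustrative assumptions; the actual work uses natural images and a much larger model.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    def extract_patches(img, size=8, n=3000):
        """Sample random size x size patches and remove each patch's mean (DC component)."""
        H, W = img.shape
        ys, xs = rng.integers(0, H - size, n), rng.integers(0, W - size, n)
        P = np.stack([img[y:y + size, x:x + size].ravel() for y, x in zip(ys, xs)])
        return P - P.mean(axis=1, keepdims=True)

    # Stand-in "image": smoothed noise with long-range correlations
    # (a real experiment would of course use natural images).
    img = np.cumsum(np.cumsum(rng.normal(size=(256, 256)), axis=0), axis=1)
    train, test = extract_patches(img), extract_patches(img)

    gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0).fit(train)
    print("mean held-out log-likelihood per patch:", round(gmm.score(test), 2))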


October 24, 2012

12-1pm, 182 George St, Room 110

Active Learning under Margin Assumptions

Sivan Sabato

Postdoctoral Researcher
Microsoft Research New England

We derive and analyze a new, efficient, pool-based active learning algorithm for half-spaces. Most previous algorithms show exponential improvement in the label complexity assuming that the distribution over the instance space is close to uniform. This assumption rarely holds in practical applications. Instead, we study the label complexity under a large-margin assumption---a much more realistic condition, as evidenced by the success of margin-based algorithms such as SVM. Our algorithm is computationally efficient and comes with formal guarantees on its label complexity. The analysis is based on new results for submodular optimization. The algorithm also naturally extends to the non-separable case and to non-linear kernels. We further show experimentally that our approach yields superior label complexity to previous active learners on real problems.

Joint work with Alon Gonen and Shai Shalev-Shwartz.
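
For orientation, a generic margin-based (uncertainty-sampling) active learner for halfspaces, sketched with scikit-learn; this is not the algorithm analyzed in the talk and carries none of its guarantees, and the data are synthetic.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    w = rng.normal(size=10)
    y = (X @ w > 0).astype(int)                        # labels from a separable halfspace

    # Seed the labeled set with one example of each class plus a few random points.
    labeled = [int(np.flatnonzero(y == 0)[0]), int(np.flatnonzero(y == 1)[0])]
    labeled += list(rng.choice(len(X), 8, replace=False))
    pool = [i for i in range(len(X)) if i not in set(labeled)]

    for _ in range(40):                                # 40 label queries
        clf = SVC(kernel="linear", C=10.0).fit(X[labeled], y[labeled])
        margins = np.abs(clf.decision_function(X[pool]))
        labeled.append(pool.pop(int(np.argmin(margins))))   # query the least-certain point

    print("accuracy after ~50 labels:", clf.score(X, y))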


October 3, 2012

12-1pm, 182 George St, Room 110

Recovering Large Networks via Optimizing Non-likelihood Functions

Rossi Luo

Assistant Professor
Department of Biostatistics and Center for Statistical Sciences
Brown University

Graphical models are used to describe the relationships between multiple variables, and a class of undirected graphical models can be uncovered from estimating covariance matrices. Popular approaches include optimizing (regularized) likelihood functions, especially in the setting when the sample size is much smaller than the number of variables. In this high dimensional setting, I will describe a few non-likelihood optimization approaches to estimate large covariance and inverse covariance matrices. These approaches are based on exploring algebraic properties of these matrices and low dimensional assumptions, such as sparsity and low rank. Convergence rates of these estimators are obtained for finite samples, when the underlying distribution has either exponential or polynomial tails. All the proposed methods can be formulated as convex optimization problems. A first-order algorithm is proposed to solve one of the non-smooth optimization problems, and its global iterative complexity is also proved. Numerical merits of these methods are demonstrated using simulation and real datasets from brain sciences.
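
One simple example of a non-likelihood estimator in this spirit is hard thresholding of the sample covariance under a sparsity assumption. The sketch below, with the threshold set at the usual sqrt(log p / n) theoretical rate, is purely illustrative and is not one of the estimators from the talk.

    import numpy as np

    def threshold_cov(X, lam):
        """Hard-threshold the off-diagonal entries of the sample covariance matrix:
        a simple non-likelihood estimator appropriate when the truth is sparse."""
        S = np.cov(X, rowvar=False)
        T = np.where(np.abs(S) >= lam, S, 0.0)
        np.fill_diagonal(T, np.diag(S))          # never threshold the variances
        return T

    # p >> n toy example with a sparse (banded) true covariance
    rng = np.random.default_rng(0)
    p, n = 200, 50
    Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    lam = 2.0 * np.sqrt(np.log(p) / n)           # threshold at the usual theoretical rate
    est = threshold_cov(X, lam)
    print("estimation error (operator norm):", np.linalg.norm(est - Sigma, 2))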


May 16, 2012

12-1pm, Metcalf Auditorium (101)

Division of Applied Mathematics & Center for Vision Research Seminar

Lessons from Photographing and Identifying the World's Plant Species

Peter Belhumeur

Professor
Department of Computer Science
Columbia University

Columbia University, the University of Maryland, and the Smithsonian Institution are working on visual recognition software to help identify species from photographs. I will discuss our work on developing Leafsnap -- the first in a series of electronic field guides. As part of this work, we have completed photographing close to one third of the world's plant species and have begun capturing beautiful high-resolution images of live specimens. Our work has led us in new research directions for the visual recognition of human faces, dog breeds, and bird species, including the adoption of centuries-old techniques from taxonomy for the process of labeling images with visual attributes and object parts. In particular, I will show that it is possible to automatically recognize a wide range of visual attributes and parts in images and use them in numerous applications of computer vision.


May 9, 2012

12-1pm, 182 George St, Room 110

Computational Challenges in Molecular Medicine

Donald Geman

Professor
Department of Applied Mathematics and Statistics
The Johns Hopkins University

Despite the "omics" revolution, advances in clinical practice through fundamental research in systems biology are rare. First, I will talk about the technological, mathematical and translational barriers encountered in attempting to apply statistical learning to cancer genomics using high-throughput transcriptional data. Then I will propose a methodology based on comparisons of molecular counts, and argue that these are easy to interpret, can account for combinatorial interactions among genes and gene products and might even support a mechanistic interpretation for the underlying decision rules. These ideas will be illustrating by predicting disease phenotypes, the site of origin of metastatic tumors and pathway deregulation.


April 25, 2012

12-1pm, 182 George St, Room 110

Hallucinating scene detail and recognizing scene attributes

James Hays

Assistant Professor
Department of Computer Science
Brown University

In this talk I will discuss two recent projects. First I will present a massively data driven approach to super-resolution. In contrast to recent image enhancement literature, we show that image databases can be more useful than internal image statistics if the right representations are used. Our super-resolution method can insert scene-specific textures and transitions beyond the capabilities of prior work. Second, I will introduce the new SUN Attribute Database. We use crowd-sourcing to discover a taxonomy of attributes and label 14,000 images from 700 scene categories with more than one hundred attributes. We show the first results for attribute recognition and argue that attributes are a natural way to describe the space of scenes.


April 18, 2012

12-1pm, 182 George St, Room 110

LP Relaxations for Global Models in Natural Language Processing

Sebastian Riedel

Research Scientist
University of Massachusetts Amherst

Much work in NLP relies on pipelines of classifiers. For example, when mining biomedical text we usually first extract proteins and other entities, and then determine their relations to each other. Such pipelines suffer from cascading errors: once an upstream processor errs, downstream processors often cannot recover. This has led to work that tackles NLP problems in a more global and joint manner.

In this talk I will first present our work on addressing biomedical event extraction as a global optimization problem. This approach achieved state-of-the-art results on several benchmark datasets, and ranked first in the latest international competition on this task. I will illustrate the enabling technology behind our global approach: Linear Programming (LP) relaxations of the global optimization problem. In particular, I will show how we divide event extraction into tractable subproblems and apply dual decomposition to efficiently find LP optima.

In the second part of my talk I will illustrate our ongoing work on addressing the computational challenge that arises from a superlinear number of variables in our LP relaxations. I will show how we use row and column generation to reduce the runtime of a global dependency parsing model by a factor of 8, without loss of optimality guarantees. Our method relies on upper bounds for the reduced costs of variables which we derive from "grammar constants" of our probabilistic model.
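
A toy sketch of dual decomposition itself, the enabling technique mentioned above: two subproblems keep their own copies of a shared discrete variable, and subgradient updates on the agreement multipliers drive the copies to agree. The tiny cost tables are invented for illustration and have nothing to do with event extraction or parsing.

    import numpy as np

    # Toy: minimize f1(x) + f2(x) over x in {0,...,4} by splitting x into two copies
    # that must agree; lam is the Lagrange multiplier vector on the agreement constraint.
    f1 = np.array([4.0, 2.0, 3.0, 5.0, 1.0])
    f2 = np.array([1.0, 3.0, 0.5, 2.0, 4.0])

    lam = np.zeros(5)
    for t in range(1, 200):
        x1 = int(np.argmin(f1 + lam))       # each subproblem solved independently
        x2 = int(np.argmin(f2 - lam))
        if x1 == x2:
            break                           # copies agree: the relaxation is tight here
        g = np.zeros(5); g[x1] += 1; g[x2] -= 1
        lam += (1.0 / t) * g                # subgradient ascent on the dual

    print("agreed solution:", x1, "objective:", f1[x1] + f2[x1])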


March 28, 2012

12-1pm, 182 George St, Room 110

What is the neural foundation of the vision process?

Christoph von der Malsburg

Senior Fellow
Frankfurt Institute of Advanced Studies

The current consensus version of the neural basis of vision -- or of cognition in general -- makes it difficult to model some of the central operations to be performed during perception and learning. I will offer the dynamic link architecture as an alternative physical basis for cognitive processes, will discuss it as a natural framework for vision and other cognitive processes, and will illustrate and support the argument with the help of a concrete model of invariant object recognition.


March 21, 2012

12-1pm, 182 George St, Room 110

The tree of life and the evolution of genome function

Casey Dunn

Manning Assistant Professor
Ecology and Evolutionary Biology
Brown University

There are estimated to be about 10 million species on the planet, though much of what we know about genome function is the result of work in about 10 species - the laboratory model organisms. Work on model organisms has been extraordinarily productive over the past 100 years, but our focus on such a small number of organisms means we have only looked through a keyhole at the diversity of genome function in life on earth. Many interesting biological phenomena remain out of reach: we know little about gene function in the context of macro-evolution and ecosystem interactions, and we have only observed gene function within narrow environmental conditions. We are now on the cusp of being able to do functional genomics research in the other 10 million species on the planet, right in their natural environment without growing them in the lab. But to do so will require a complete retooling of data acquisition, entirely new analysis approaches, and a shift in perspective that embraces differences between species rather than obsessing over global invariants. Here I outline some of these challenges, including the new analysis challenges we face.


February 8, 2012

12-1pm, 182 George St, Room 110

Exploiting Sparse Structure by Spectral Connectivity Analysis

Ann Lee

Associate Professor
Department of Statistics
Carnegie Mellon University

For naturally occurring data, the dimension of the given input space is often very large while the data themselves have a low intrinsic dimensionality. Spectral kernel methods are non-linear techniques for transforming data into a coordinate system that efficiently reveals the underlying structure -- in particular, the "connectivity" -- of the data. In this talk, I will focus on one particular technique -- diffusion maps -- but the analysis can be used for other spectral methods as well. I will give examples of various applications of the method in high-dimensional regression and density estimation. I will also present a new extension of the diffusion framework to comparing distributions in high-dimensional spaces with an application to image retrieval and texture discrimination. (Part of this work is joint with R.R. Coifman, D. Liu, C. Schafer and L. Wasserman.)
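
A bare-bones diffusion map, assuming the standard construction (Gaussian kernel, row normalization, eigen-decomposition of the resulting Markov matrix); the dataset and parameter values are toy choices for illustration.

    import numpy as np

    def diffusion_map(X, eps, n_coords=2, t=1):
        """Basic diffusion map: Gaussian kernel -> row-normalized Markov matrix ->
        embed with the top non-trivial eigenvectors scaled by eigenvalues**t."""
        D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        K = np.exp(-D2 / eps)
        P = K / K.sum(axis=1, keepdims=True)                  # Markov transition matrix
        evals, evecs = np.linalg.eig(P)
        order = np.argsort(-evals.real)
        evals, evecs = evals.real[order], evecs.real[:, order]
        # skip the trivial constant eigenvector (eigenvalue 1)
        return evecs[:, 1:n_coords + 1] * (evals[1:n_coords + 1] ** t)

    # Example: a noisy circle; the leading diffusion coordinates recover its connectivity.
    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 300)
    X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(300, 2))
    print(diffusion_map(X, eps=0.2)[:5])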


February 1, 2012

12-1pm, 182 George St, Room 110

Statistical analysis of populations with interacting and interfering units

Edo Airoldi

Assistant Professor
Department of Statistics, and FAS Center for Systems Biology
Harvard University

A number of scientific endeavors of current national and international interest involve populations with interacting and/or interfering units. In these problems, a collection of partial measurements about patterns of interaction and interference (e.g., social structure and familial relations) is available, in addition to the more traditional measurements about unit-level outcomes and covariates. Formal statistical models for the analysis of this type of data have emerged as a major topic of interest in diverse areas of study. Probability models on networks date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online social networking websites such as Facebook and LinkedIn, and a host of more specialized professional networking communities has intensified interest in the study of networks, structured measurements and interference. In this talk, I will review a few ideas and open areas of research that are central to this burgeoning literature, placing emphasis on inference and other core statistical issues. Topics include elements of sampling and inference from non-ignorable (network sampling) designs, and semi-parametric modeling, with hints to the applications to social, biological and information networks that motivate these statistical problems.


November 30, 2011

12-1pm, 182 George St, Room 110

Exploring the role of ventral premotor cortex in reach-to-grasp movements: neural trajectories through spike train similarity space

Carlos Vargas-Irwin

Postdoctoral Research Associate in Neuroscience
Brown University

Dimensionality reduction applied to neural ensemble data has led to the concept of a 'neural trajectory', a low-dimensional representation of how the state of the network evolves over time. Here we present a novel neural trajectory extraction algorithm which combines spike train distance metrics (Victor and Purpura, 1996) with dimensionality reduction based on local neighborhood statistics (van der Maaten and Hinton, 2008). We apply this technique to describe and quantify the activity of primate ventral premotor cortex neuronal ensembles in the context of a cued reaching and grasping task with instructed delay.

References:
Victor and Purpura. Nature and precision of temporal coding in visual cortex: a metric-space analysis. J Neurophysiol. 1996 Aug;76(2):1310-26.
L.J.P. van der Maaten and G.E. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
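
A small sketch combining these two ingredients: a dynamic-programming implementation of the Victor-Purpura spike train distance and a t-SNE embedding of the resulting distance matrix. The Poisson "trials" and the cost parameter q are toy choices, not data or settings from the study.

    import numpy as np
    from sklearn.manifold import TSNE

    def victor_purpura(a, b, q=1.0):
        """Victor-Purpura distance: minimal cost of transforming spike train a into b,
        where inserting/deleting a spike costs 1 and moving a spike by dt costs q*|dt|."""
        na, nb = len(a), len(b)
        G = np.zeros((na + 1, nb + 1))
        G[:, 0] = np.arange(na + 1); G[0, :] = np.arange(nb + 1)
        for i in range(1, na + 1):
            for j in range(1, nb + 1):
                G[i, j] = min(G[i - 1, j] + 1, G[i, j - 1] + 1,
                              G[i - 1, j - 1] + q * abs(a[i - 1] - b[j - 1]))
        return G[na, nb]

    # Toy "trials": Poisson spike trains from two conditions with different rates.
    rng = np.random.default_rng(0)
    trains = [np.sort(rng.uniform(0, 1, rng.poisson(rate)))
              for rate in [5] * 20 + [15] * 20]
    D = np.array([[victor_purpura(a, b, q=4.0) for b in trains] for a in trains])

    # Low-dimensional embedding of the precomputed distance matrix.
    emb = TSNE(n_components=2, metric="precomputed", init="random",
               perplexity=10, random_state=0).fit_transform(D)
    print(emb.shape)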


November 2, 2011

12-1pm, Barus & Holley Room 190

Perceptual Fragments: Bottom-Up and Top-Down Use of Shape in Object Recognition

Benjamin Kimia

Professor
Division of Engineering
Brown University

The bottom-up “segmentation followed by recognition” strategy has for some time now given way to feature-based discriminative recognition with significant success. As the number of categories and exemplars per category increases, however, low-level features are no longer sufficiently discriminative, motivating the construction and use of more complex features. It is argued here that these complex features will necessarily be encoding shape and this in turn requires curves and regions, thus reviving aspects of bottom-up segmentation strategies. We suggest that the demise of segmentation was due to prematurely committing to a grouping in the face of ambiguities and propose a framework for representing multiple grouping options in a containment graph. Specifically, we use contour symmetry to partition the image into atomic fragments and define transforms to iteratively grow these atomic fragments into more distinctive perceptual fragments, the nodes of the containment graph. We also briefly present a fragment-based language for generating shapes and the use of fragments in top-down category recognition. The bottom-up and top-down processes are then integrated by interaction through the mid-level representation of perceptual fragments.


October 19, 2011

12-1pm, 182 George St, Room 110

Statistical challenges in neural data analysis

Liam Paninski

Associate Professor
Department of Statistics
Columbia University

Systems and circuit neuroscience have recently experienced something of a renaissance, driven by a remarkable explosion in the development of groundbreaking new experimental tools. These methods have in turn opened up a variety of challenging statistical problems, in which we must often perform some high-dimensional inference under strong computational constraints (for example, in some cases real-time processing is required). This talk will review some recent progress on three exemplary problems, each of fundamental neuroscientific importance: 1) Optimal filtering and smoothing of voltage signals on large, complex dendritic trees; 2) Optimal decoding of sensory information from the activity of large neural populations, and 3) Inference of connectivity in large neuronal networks given limited, noisy observations.


October 12, 2011

12-1pm, 182 George St, Room 110

Spectral Methods for Learning Graphical Models

Sham Kakade

Senior Research Scientist
Microsoft Research New England
Associate Professor of Statistics
The Wharton School, University of Pennsylvania

This work presents a methodology for learning graphical models with hidden nodes through algebraic techniques (in particular, matrix decomposition and spectral methods), using independent samples of the observed variables. The talk focuses on tree models, and covers two aspects of the underlying learning problem: parameter estimation and structural learning. The underlying idea is to utilize the spectral decomposition of the second moment matrix to reveal the latent structure.

The first part is concerned with parameter estimation. Here, we present an efficient and provably correct algorithm for learning HMMs (i.e. recovering the correct HMM dynamics), with a sample complexity depending on some mild conditions of the underlying system. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications, does not suffer from the local minimum issues in non-convex optimization that affect more traditional approaches such as the EM algorithm, and handles high dimensional observations and long range dependencies more easily. The method can be extended to estimating parameters for nonlinear systems and general tree structured graphical models with unobserved nodes.

The second part is concerned with structural learning, where we provide the Spectral Recursive Grouping algorithm, an efficient and simple procedure for recovering the underlying tree topology of a broad class of multivariate tree models with hidden nodes. Exact recovery of the tree structure can be established based on certain natural dependencies on statistical and structural properties of the underlying joint distribution.

Joint work with: Daniel Hsu, Tong Zhang; Anima Anandkumar, Kamalika Chaudhuri, Le Song
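
A fragment of the parameter-estimation idea, under simplifying assumptions: for an HMM, the pairwise co-occurrence matrix of consecutive observations has rank at most the number of hidden states, so its SVD both reveals that number and supplies the subspace used by the spectral (observable-operator) parameter estimates. The toy HMM below is invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # A 3-state HMM with 8 observation symbols (full-rank transitions and emissions).
    m, k = 3, 8
    T = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])                    # state transition matrix
    O = rng.dirichlet(0.5 * np.ones(k), size=m).T       # k x m emission matrix

    # Sample a long observation sequence from the HMM.
    n = 100_000
    h = np.zeros(n, dtype=int)
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        h[t] = rng.choice(m, p=T[h[t - 1]])
    for t in range(n):
        x[t] = rng.choice(k, p=O[:, h[t]])

    # Empirical co-occurrence matrix P21[i, j] ~ Pr(x_{t+1} = i, x_t = j).
    P21 = np.zeros((k, k))
    np.add.at(P21, (x[1:], x[:-1]), 1.0)
    P21 /= P21.sum()

    # Rank of P21 is at most the number of hidden states, so the singular values show
    # a clear gap after the first m = 3; the corresponding singular vectors give the
    # subspace used by the observable-operator construction.
    print(np.round(np.linalg.svd(P21, compute_uv=False), 4))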


October 5, 2011

12-1pm, 182 George St, Room 110

Dynamic regulation of decision threshold by frontal cortex and basal ganglia

Michael Frank

Associate Professor
Department of Cognitive, Linguistic, & Psychological Sciences
Department of Psychiatry and Human Behavior
Brown University

The basal ganglia and frontal cortex interact to support reward-based decision making. While medial prefrontal cortex (mPFC) is implicated in processing uncertainty and reinforcement conflict, the subthalamic nucleus (STN) is thought to act as a brake on corticostriatal function, preventing premature impulsive responding. Simulations using a neural circuit model combined with drift diffusion model analysis show that the STN acts to temporarily increase the decision threshold as a function of conflicting cortical decision plans, and that this same decision threshold modulation accounts for human behavioral data. We tested these posited neural mechanisms in experiments with electrophysiological recordings over mPFC and in the STN (intraoperative recordings from Parkinson's patients undergoing deep brain stimulation surgery). We show a relationship between mPFC activity and decision threshold adjustment which is altered by STN manipulation. These data support the hypothesis that frontal-basal ganglia communication regulates decision processes as a function of reinforcement conflict.
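
A minimal simulation of the threshold-adjustment idea, assuming a standard drift-diffusion model in which conflict simply raises the decision bound; all parameter values are arbitrary illustrative choices, not fits to the data discussed in the talk.

    import numpy as np

    def simulate_ddm(drift, threshold, n_trials=2000, dt=0.005, noise=1.0, rng=None):
        """Drift-diffusion decision process: evidence accumulates with the given drift
        plus Gaussian noise until it crosses +threshold (correct) or -threshold (error)."""
        if rng is None:
            rng = np.random.default_rng(0)
        rts, correct = [], []
        for _ in range(n_trials):
            x, t = 0.0, 0.0
            while abs(x) < threshold:
                x += drift * dt + noise * np.sqrt(dt) * rng.normal()
                t += dt
            rts.append(t)
            correct.append(x > 0)
        return np.mean(rts), np.mean(correct)

    # Hypothesized STN effect: under response conflict the decision threshold is
    # temporarily raised, trading speed for accuracy.
    rng = np.random.default_rng(1)
    for label, thresh in [("low conflict (baseline threshold)", 1.0),
                          ("high conflict (raised threshold) ", 1.6)]:
        rt, acc = simulate_ddm(drift=0.8, threshold=thresh, rng=rng)
        print(f"{label}: mean RT = {rt:.2f} s, accuracy = {acc:.3f}")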


September 14, 2011

12-1pm, 182 George St, Room 110

Non-parametric Priors for Segmentation of Medical Images

Polina Golland

Associate Professor
Department of Electrical Engineering and Computer Science, and CSAIL
Massachusetts Institute of Technology

We propose a non-parametric probabilistic model for prior knowledge in segmentation of medical images. The resulting inference algorithms register individual training images to the new image, transfer the segmentation labels and fuse them to obtain the final segmentation of the test subject. Our generative model yields previously proposed label fusion algorithms as special cases, and also leads to a new variant that aggregates evidence for the segmentation label locally. We demonstrate the advantages of our approach in two clinical applications: segmentation of neuroanatomical structures and segmentation of the left heart atrium whose shape varies significantly across the population.


May 11, 2011

12-1pm, 182 George St, Rm 110

Division of Applied Mathematics & Center for Vision Research Seminar

Statistical structures of natural scenes and neural ensembles activities

Tai-Sing Lee

Associate Professor
Computer Science Department and Center for the Neural Basis of Cognition
Carnegie Mellon University

Statistical structures of the natural environment could provide important clues for understanding the neural representations and functional circuitry underlying visual perception. We studied the statistical structures of 3D scenes, and evaluated whether these structures are encoded in the tuning properties and functional connectivities of neurons in the early visual cortex. We found that information encoded in the distribution of disparity-tuned neurons at the population level matches the statistical structures of natural scenes in a number of aspects: the distribution of tuning curves, the correlation in tunings to different visual cues, and the functional connectivity of neurons. We also found that the ensemble activities of neurons support the notion of a probabilistic population code that allows the encoding of statistical priors of the natural scenes, the uncertainty and the posterior distribution of inferred perceptual attributes. These findings provide preliminary empirical insights on how pattern theory can be realized at the level of neuronal population and neuronal circuits.


May 4, 2011

12-1pm, 182 George St, Rm 110

Logistic Regression on Data Streams

Kevin Kochanek

Research Mathematician
US Department of Defense

The emergence of high-volume data sources in fields such as finance, telecommunications, and bio-medicine has led to a surge of interest in streaming data analysis. In this talk we consider the problem of constructing generalized linear models on data streams. We also present a novel streaming adaptive histogramming algorithm that facilitates our approach and is also useful in other contexts.
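
As a sketch of the streaming setting, ordinary stochastic-gradient logistic regression on mini-batches that are seen once and then discarded (the adaptive histogramming component is not shown); the class name, data, and learning rate are illustrative assumptions.

    import numpy as np

    class StreamingLogisticRegression:
        """Online logistic regression fit by stochastic gradient descent: each
        mini-batch from the stream is seen exactly once."""
        def __init__(self, dim, lr=0.1):
            self.w = np.zeros(dim)
            self.b = 0.0
            self.lr = lr

        def partial_fit(self, X, y):
            p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))   # predicted probabilities
            err = p - y                                          # gradient of the log-loss
            self.w -= self.lr * X.T @ err / len(y)
            self.b -= self.lr * err.mean()

        def predict(self, X):
            return (X @ self.w + self.b > 0).astype(int)

    # Simulated stream: mini-batches arrive one at a time and are never revisited.
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=20)
    model = StreamingLogisticRegression(dim=20)
    for _ in range(500):
        X = rng.normal(size=(100, 20))
        y = (X @ w_true + 0.5 * rng.normal(size=100) > 0).astype(int)
        model.partial_fit(X, y)

    X_test = rng.normal(size=(2000, 20))
    y_test = (X_test @ w_true > 0).astype(int)
    print("held-out accuracy:", (model.predict(X_test) == y_test).mean())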


April 20, 2011

12-1pm, 182 George St, Rm 110

Self-folding of polyhedra-- experiments and a little theory

Govind Menon

Associate Professor
Division of Applied Mathematics
Brown University

A fascinating development in materials chemistry is the construction of `large' components by self-assembly. I will describe some striking experiments from David Gracias' lab, and some preliminary mathematical work on mesoscale polyhedra that self-fold. Many problems are open, and the main purpose of this talk is to advertise some of them.


April 6, 2011

12-1pm, 182 George St, Rm 110

Division of Applied Mathematics & Center for Vision Research Seminar

A High-Throughput Screening Approach to Biologically-Inspired Object Recognition

David Cox

Principal Investigator, Visual Neuroscience Laboratory
The Rowland Institute at Harvard

Biological visual systems are currently unrivaled by artificial systems in their ability to recognize faces and objects in highly variable and cluttered real-world environments. Biologically-inspired computer vision systems seek to capture key aspects of the computational architecture of the brain, and such approaches have proven successful across a range of standard object and face recognition tasks. However, while many models of biological object recognition share a common set of "broad-stroke" properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model. Since the number of such parameters (explicit or implicit) is typically large, and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea, or because the correct "parts" have not been tuned correctly, assembled at sufficient scale, or provided with enough training.

In this talk, I'll present a high-throughput search approach for exploring the range of biologically-inspired visual models that emphasizes simplicity, careful evaluation procedures, and scale. Systems discovered using this approach are shown to perform surprisingly well across a range of visual problem domains and achieve state-of-the-art performance on a collection of challenging real-world face recognition datasets. Finally, I'll discuss current limitations and future opportunities of this approach and argue for renewed attention to data set quality and model scalability.


March 16, 2011

12-1pm, 182 George St, Rm 110

Solving Inference, Optimization, and Constraint Satisfaction Problems with the Divide & Concur and Belief Propagation Message-Passing Algorithms

Jonathan Yedidia

Distinguished Member Research Staff
Mitsubishi Electric Research Labs, Cambridge, MA

In "probabilistic inference" problems one tries to estimate the true state of some quantities of interest, given only noisy or ambiguous sensor measurements of related quantities. Such problems arise in a very wide variety of fields, with applications in communications, signal processing, computer vision, speech and audio recognition, machine learning, physics, and robotics.

In this talk, I will describe and compare two particularly important algorithms that can solve probabilistic inference problems, as well as related constraint satisfaction and optimization problems: the celebrated "belief propagation" (BP) algorithm; and the "divide and concur" (D&C) algorithm. I will show that the D&C algorithm, which was recently introduced by Gravel and Elser for solving non-convex constraint satisfaction problems, can also be understood as a message-passing algorithm that can be used on an even wider class of inference and optimization problems than BP. Although less well-known, the D&C algorithm has some notable advantages compared with BP, in that it more naturally deals with problems with continuous valued variables, or with variables that lack local evidence. Another advantage of D&C is that its "difference-map" dynamics enables it to avoid "traps."

If time permits, I will also describe a new decoder (developed with Yige Wang and Stark Draper) for low-density parity check codes that combines the ideas of the D&C and BP algorithms. This "difference-map belief propagation" (DMBP) decoder significantly improves error-floor performance compared with state-of-the-art BP decoders, while maintaining a similar computational complexity.
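
A toy illustration of the difference-map iteration at the heart of this family of methods, assuming the usual beta = 1 form and two simple constraint sets (the unit circle and a line); this is a sketch of the dynamics only, not the decoder or solvers discussed in the talk.

    import numpy as np

    # Find a point lying on both the unit circle (set A) and the line x + y = 1.4
    # (set B) using the difference-map iteration
    #     x <- x + P_A(2 P_B(x) - x) - P_B(x),
    # whose fixed points yield solutions P_B(x*) in the intersection A and B.

    def P_A(x):                      # project onto the unit circle
        return x / np.linalg.norm(x)

    def P_B(x):                      # project onto the line x + y = 1.4
        n = np.array([1.0, 1.0]) / np.sqrt(2)
        return x + (1.4 / np.sqrt(2) - x @ n) * n

    x = np.array([2.0, -1.0])
    for _ in range(200):
        pb = P_B(x)
        x = x + P_A(2 * pb - x) - pb
    sol = P_B(x)
    print("solution:", sol, "on circle:", bool(np.isclose(np.linalg.norm(sol), 1.0)))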


March 9, 2011

12-1pm, 182 George St, Rm 110

A history of applying principal component analyses to human population genetic data

Sohini Ramachandran

Assistant Professor
Department of Ecology and Evolutionary Biology, and
Center for Computational Molecular Biology
Brown University

Nearly 30 years ago, Luca Cavalli-Sforza and colleagues pioneered the use of PCA in population genetics, producing maps that summarized observed human genetic variation across many continental regions. Patterns in these maps were interpreted as signatures of specific migration events. As genetic datasets have grown in the number of markers used and as sequencing technology has improved, PCA has been widely applied to population genetic datasets. I will present a review of how PCA is used in population genetics and discuss the conclusions that have been drawn via this method, as well as other observed genetic patterns in geographic space in my research on both the inference of past migrations using extant human genetic data and the genetic underpinnings of disease.
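
A minimal illustration of the basic computation, assuming simulated genotypes from two drifted populations; real analyses of course use genome-wide data and more careful normalization.

    import numpy as np

    # Simulate genotypes (0/1/2 minor-allele counts) for two diverged populations
    # and show that the leading principal component separates them.
    rng = np.random.default_rng(0)
    n_per_pop, n_snps = 100, 2000
    f1 = rng.uniform(0.05, 0.5, n_snps)                           # allele freqs, pop 1
    f2 = np.clip(f1 + rng.normal(0, 0.05, n_snps), 0.01, 0.99)    # drifted freqs, pop 2

    G = np.vstack([rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in (f1, f2)])
    Gc = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-9)            # center/scale each SNP

    U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    pc1 = U[:, 0] * s[0]
    print("mean PC1, population 1:", round(pc1[:n_per_pop].mean(), 2))
    print("mean PC1, population 2:", round(pc1[n_per_pop:].mean(), 2))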


February 16, 2011

12-1pm, 182 George St, Rm 110

Patterns of Thought (in humanoid robots)

Yiannis Aloimonos

Professor
Department of Computer Science
University of Maryland

Patterns of thought, for a cognitive system that interprets a scene or performs a task, consist of perceptual, motor and linguistic representations of the world, controlled by an attention mechanism. The role of the attention mechanism is to facilitate parsing of the different representations according to a grammar of action. In this talk I will introduce, for the first time, an action grammar. It is really a disguised version of Chomsky’s Universal Grammar adapted in a minimalist framework. Parsing requires an attention mechanism comprising visual human and object filters, active segmentation routines and a context implementing system utilizing language. I will describe our developments in learning visual filters, achieving robust segmentation techniques and integrating vision with language and our implementations of these ideas in the field of Humanoid Robotics and the general area of video interpretation.


February 9, 2011

12-1pm, 182 George St, Rm 110

Division of Applied Mathematics & Center for Vision Research Seminar

Learning on Analytic Manifolds

Fatih Porikli

Senior Principal Research Scientist and Technical Manager, Imaging Group
Mitsubishi Electric Research Labs, Cambridge, MA

A large number of natural phenomena can be formulated on analytic manifolds. More specifically in computer vision, such underlying notions emerge in multi-factor analysis including feature selection, pose estimation, structure from motion, appearance tracking, and shape embedding. Unlike Euclidean spaces, analytic manifolds do not exhibit global homeomorphism; thus, differential geometry is applicable only locally. This prevents application of conventional inference and learning methods, which require vector norms.

Recently we introduced appearance-based object descriptors and motion transformations that exhibit a Riemannian manifold structure on positive definite matrices, enabling projections of the original problems onto tangent spaces. In this manner, we do not need to flatten the underlying manifold or discover its topology. This talk will demonstrate entertaining results of manifold learning on human detection, regression tracking, unusual event analysis, and affine pose estimation.
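
A small sketch of the kind of construction involved, assuming region-covariance descriptors and the standard matrix log-map to a tangent space; the feature set, reference point, and patch data are illustrative choices, not the descriptors or applications from the talk.

    import numpy as np
    from scipy.linalg import sqrtm, logm, inv

    def region_covariance(patch):
        """Covariance descriptor of an image patch: covariance of simple per-pixel
        features (x, y, intensity, |Ix|, |Iy|) -- a positive definite matrix."""
        H, W = patch.shape
        ys, xs = np.mgrid[0:H, 0:W]
        Iy, Ix = np.gradient(patch)
        F = np.stack([xs.ravel(), ys.ravel(), patch.ravel(),
                      np.abs(Ix).ravel(), np.abs(Iy).ravel()])
        return np.cov(F) + 1e-6 * np.eye(5)

    def log_map(X, M):
        """Map an SPD matrix X to the tangent space at the reference point M."""
        Ms = sqrtm(M); Msi = inv(Ms)
        return np.real(Ms @ logm(Msi @ X @ Msi) @ Ms)

    # Two patches -> two SPD descriptors -> tangent vectors at a reference point,
    # where ordinary vector-space learning machinery can be applied.
    rng = np.random.default_rng(0)
    C1 = region_covariance(rng.normal(size=(32, 32)))
    C2 = region_covariance(np.cumsum(rng.normal(size=(32, 32)), axis=1))
    M = (C1 + C2) / 2                 # simple reference point (a sketch, not the Karcher mean)
    v1, v2 = log_map(C1, M), log_map(C2, M)
    print("tangent-space distance:", np.linalg.norm(v1 - v2))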


November 17, 2010

12-1pm, 182 George St, Room 110

Point process adaptive filters and the analysis of ensemble neural spiking activity

Uri Eden

Assistant Professor
Department of Mathematics and Statistics
Boston University

Although it is well known that brain areas receive, process and transmit information via sequences of sudden, stereotyped electrical impulses, called action potentials or spikes, most analyses of neural data ignore the localized nature of these events. The theory of point processes offers a unified approach to modeling the firing properties of spiking neural systems, and assessing goodness-of-fit between a neural model and observed spiking data.

We develop a point process modeling framework and state space estimation algorithms to describe and track the evolution of dynamic representations from large neural ensembles. This allows us to derive a toolbox of estimation algorithms and adaptive filters to address questions of static and dynamic encoding and decoding. In our analysis of these filtering algorithms, we draw analogies to well-studied linear estimation algorithms for continuous valued processes, such as the Kalman filter and its discrete and continuous time extensions.

These methods will be illustrated in the analysis of spatially specific spiking activity in rat hippocampus. Using simple point process models, we are able to accurately characterize the localized spiking activity of these neurons as a function of the animal's position in its environment, track changes in their firing properties, reconstruct the animal's past movements, and predict its future behavior from a population of spiking neurons.
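
A compact sketch of point-process decoding in this spirit, assuming Gaussian place fields, Poisson spiking, and a discretized (grid) Bayesian filter rather than the adaptive filters developed in the talk; all parameter values are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Simulate a rat on a 1D track with place cells ---
    T, dt = 2000, 0.01                          # time bins, bin size (s)
    track = np.linspace(0, 1, 100)              # discretized positions
    centers = np.linspace(0, 1, 20)             # place-field centers (20 cells)
    max_rate, width = 20.0, 0.08                # peak rate (Hz) and field width

    pos = np.zeros(T)
    for t in range(1, T):                       # random-walk trajectory, clipped at the ends
        pos[t] = np.clip(pos[t - 1] + 0.02 * rng.normal(), 0, 1)

    rates = max_rate * np.exp(-(pos[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
    spikes = rng.poisson(rates * dt)            # T x 20 spike counts

    # --- Point-process (Poisson) grid filter: recursively update p(position | spikes) ---
    lam = max_rate * np.exp(-(track[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
    log_lik = spikes @ np.log(lam * dt).T - (lam * dt).sum(axis=1)   # T x 100 Poisson log-lik

    dx = track[:, None] - track[None, :]        # Gaussian random-walk transition kernel
    trans = np.exp(-dx ** 2 / (2 * 0.02 ** 2)); trans /= trans.sum(axis=0, keepdims=True)

    belief = np.full(len(track), 1.0 / len(track))
    estimate = np.zeros(T)
    for t in range(T):
        belief = trans @ belief                              # predict
        belief *= np.exp(log_lik[t] - log_lik[t].max())      # update with spiking observation
        belief /= belief.sum()
        estimate[t] = track @ belief                         # posterior mean position

    print("mean decoding error:", np.mean(np.abs(estimate - pos)))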


November 3, 2010

12-1pm, 182 George St, Room 110

Bottom-up and top-down processing in visual perception

Thomas Serre

Assistant Professor
Department of Cognitive, Linguistic, & Psychological Sciences
Brown University

Perception involves a complex interaction between feedforward sensory-driven information and feedback attentional, memory, and executive processes that modulate such feedforward processing. A mechanistic understanding of feedforward and feedback integration is a necessary step towards elucidating key aspects of visual and cognitive functions and dysfunctions.

In this talk, I will describe a computational framework for the study of visual perception. I will present computational as well as experimental evidence suggesting that bottom-up and top-down processes make a distinct and essential contribution to the recognition of complex visual scenes. A feedforward hierarchical architecture may provide a satisfactory account of "immediate recognition" corresponding to the first few hundred milliseconds of visual processing. However, such an architecture may be limited in recognizing complex visual scenes. I will show how attentional mechanisms and cortical feedback may help improve object recognition performance in complex cluttered scenes.


October 20, 2010

12-1pm, 182 George St, Room 110

Discovering Influential Variables: A Partition Retention Approach

Herman Chernoff

Professor Emeritus
Department of Statistics, Harvard University
Department of Applied Mathematics, MIT

A general approach pioneered by S.H. Lo and T. Zheng, the method of Partition Retention, is designed to identify those variables, among many possible explanatory variables, that have an influence on a dependent variable. This method can be successful when the influence depends on the interaction of several of these variables and the marginal influences of the individual variables are negligible. In those cases where the number of candidates is huge (in the thousands), this method involves the idea of resuscitating apparently "dead" or useless variables which depend on interactions. The method is computer intensive and involves measuring the influence of a small group of candidates, from which the least influential are dropped one at a time until only promising ones are retained. This procedure is repeated many times with small randomly selected subgroups of the candidate variables, and attention is focused on those which are retained most often.


October 6, 2010

12-1pm, CIT 241 (Swig Boardroom) *** change in location ***

Learning Probabilistic Models with Deep Hierarchical Structures

Ruslan Salakhutdinov

Postdoctoral Fellow
Department of Brain and Cognitive Sciences and CSAIL
Massachusetts Institute of Technology

Building intelligent systems that are capable of extracting higher-order knowledge from high-dimensional data and successfully transferring that knowledge to learning new concepts lies at the core of solving many AI related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing.

In this talk I will first introduce a broad class of probabilistic generative models called Deep Boltzmann Machines (DBMs) that contain many layers of latent variables. I will describe a new learning algorithm for this class of models that uses variational methods and Markov chain Monte Carlo (MCMC). This new learning algorithm, in addition to a bottom-up pass, can incorporate top-down feedback, which allows DBMs to better propagate uncertainty about ambiguous inputs. I will show that these deep models can learn interesting representations and can be successfully applied in many application domains, including information retrieval, object recognition, and nonlinear dimensionality reduction. In the second part of the talk, I will describe new ways of developing more complex systems that combine Deep Boltzmann Machines with more structured hierarchical Bayesian models. I will show how these hybrid models can learn a deep hierarchical structure for sharing knowledge across hundreds of visual categories, which allows efficient learning of new categories from a few examples, or even just one -- a problem known as 'one-shot learning'.


*May 10, 2010 (Monday)

4:00-5:30pm, 182 George St, Rm 110

Shared Segmentation of Natural Scenes using Dependent Pitman-Yor Processes

Erik Sudderth

Assistant Professor
Department of Computer Science
Brown University

We explore statistical frameworks for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman-Yor (PY) process. This generalization of the Dirichlet process leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY priors, we use non-Markov Gaussian processes to infer spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes.
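
A quick illustration of why the Pitman-Yor process fits power-law segment statistics better than the Dirichlet process, using the Chinese-restaurant construction: a positive discount d produces many more clusters with heavy-tailed sizes. The values of alpha, d, and n are arbitrary toy choices.

    import numpy as np

    def crp_sample(n, alpha, d, rng):
        """Sample a partition of n items from the Chinese-restaurant view of a
        Pitman-Yor process with concentration alpha and discount d (d = 0 gives the DP)."""
        counts = []
        for i in range(n):
            probs = np.array([c - d for c in counts] + [alpha + d * len(counts)], dtype=float)
            probs /= probs.sum()
            k = rng.choice(len(probs), p=probs)
            if k == len(counts):
                counts.append(1)            # open a new cluster
            else:
                counts[k] += 1              # join an existing cluster
        return counts

    rng = np.random.default_rng(0)
    for name, d in [("Dirichlet process (d=0)  ", 0.0), ("Pitman-Yor     (d=0.8)", 0.8)]:
        sizes = crp_sample(5000, alpha=5.0, d=d, rng=rng)
        print(name, "-> clusters:", len(sizes), " largest sizes:", sorted(sizes)[-5:])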


*April 23, 2010 (Friday)

4-5:30pm, 182 George St, Rm 110

A Generative Model Approach to Fraud Detection

Brian Lucena

Chief Mathematician
Guardian Analytics Inc.

Standard discriminative approaches rely on seeing lots of examples of both classes in order to distinguish between the two. However, in the specific case of online banking fraud, these techniques are hampered by two major practical concerns. First, there is a huge asymmetry in the amount of data available about each class – hundreds of millions of legitimate transactions and perhaps a few hundred fraudulent ones. Second, fraudsters quickly change their tactics, and so even those few data points on fraud may not be very useful in predicting future fraud. Consequently, we take a model-based approach to exploit the natural differences between individual users. This talk will describe the FraudMAP system and discuss several interesting problems that arise in its development.


April 21, 2010

12-1pm, 182 George St, Rm 110

Segmentation of Image Ensembles via Latent Atlases

Tammy Riklin-Raviv

Postdoctoral Associate
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

The images acquired via medical imaging modalities are frequently subject to low signal-to-noise ratio, bias field and partial volume effects. These artifacts, together with the naturally low contrast between image intensities of some neighboring structures, make the extraction of regions of interest (ROIs) in clinical images a challenging problem. Probabilistic atlases, typically generated from comprehensive sets of manually labeled examples, facilitate the analysis by providing statistical priors for tissue classification and structure segmentation. However, the limited availability of training examples that are compatible with the images to be segmented renders the atlas-based approaches impractical in many cases. In the talk I will present a generative model for joint segmentation of corresponding regions of interest in a collection of aligned images that does not require labeled training data. Instead, the evolving segmentation of the entire image set supports each of the individual segmentations. This is made possible by iteratively inferring a subset of the model parameters, called the spatial parameters, as part of the joint segmentation processes. These spatial parameters are defined in the image domain and can be viewed as a latent atlas. Our latent atlas formulation is based on probabilistic principles, but we solve it using partial differential equations and energy minimization criteria. We evaluate the method successfully for the segmentation of cortical and subcortical structures within different populations and of brain tumors in a single-subject multi-modal longitudinal experiment.


*April 16, 2010 (Friday)

4-5:30pm, 182 George St, Rm 110

Division of Applied Mathematics / Center for Statistical Sciences Seminar

Multi-Resolution Inference of Stochastic Models from Partially Observed Data

Samuel Kou

Professor
Department of Statistics
Harvard University

Stochastic models, diffusion models in particular, are widely used in science, engineering and economics. Inferring the parameter values from data is often complicated by the fact that the underlying stochastic processes are only partially observed. Examples include inference of discretely observed diffusion processes, stochastic volatility models, and doubly stochastic Poisson (Cox) processes. Likelihood-based inference faces the difficulty that the likelihood is usually not available even numerically. The conventional approach is to discretize the stochastic model to approximate the likelihood. To achieve the desired accuracy, one has to use a highly dense discretization. However, dense discretization usually imposes an unbearable computational burden. In this talk we will introduce the framework of Bayesian multi-resolution inference to address this difficulty. By working on different resolution (discretization) levels simultaneously and by letting the resolutions talk to each other, we substantially improve not only the computational efficiency, but also the estimation accuracy. We will illustrate the strength of the multi-resolution approach by examples.


April 14, 2010

12-1pm, 182 George St, Rm 110

Preserving knowledge through media transitions: ushering the heritage of India into the digital age

Peter Scharf

Director of The Sanskrit Library and Senior Lecturer in Sanskrit
Department of Classics
Brown University

Human beings express knowledge in various modes: through images in visual art; through movement in dance, theatrical performance, and gestures; and through speech in spoken language. Each of these means of expression includes means to encode knowledge, and each is used to express knowledge originally encoded in one of the others. In particular, writing represents knowledge originally encoded in speech. Certain media dominate as the primary methods for the transmission of detailed information at different times and places. Oral tradition dominated the tradition of Sanskrit in India in the first and second millennia B.C.E. Writing overtook orality in the first millennium C.E. and dominated until replaced by printing in the 19th century. At the dawn of the 21st century, the digital medium is replacing printing as the dominant means of knowledge transmission.

While brittle editions of printed Sanskrit books are removed from easy access to library annexes, the Sanskrit Library is working to usher the heritage of India into the digital age. Scharf designed an encoding scheme to represent Sanskrit accurately and revised the Unicode Standard to include characters necessary for the representation of the ancient Vedic texts. He directed an NSF-funded project 2006-2009 to build a digital Sanskrit library that integrates lexical resources, linguistic software, and machine-readable texts. Now he directs an NEH-funded project to produce high-quality digital images of manuscripts at Brown University and the University of Pennsylvania and to develop word-spotting technology to facilitate searching even in digital images not amenable to optical character recognition (OCR).


*April 9, 2010 (Friday)

4-5:30pm, 182 George St, Rm 110

Analysis of Molecular Networks

Mark Gerstein

Albert L. Williams Professor of Biomedical Informatics
Departments of Molecular Biophysics and Biochemistry and of Computer Science
Yale University

My talk will be concerned with understanding protein function on a genomic scale. My lab approaches this through the prediction and analysis of biological networks, focusing on protein-protein interaction and transcription-factor-target ones. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various topological statistics. In particular, I will discuss a number of recent analyses: (1) Improving the prediction of molecular networks through systematic training-set expansion; (2) Showing how the analysis of pathways across environments potentially allows them to act as biosensors; (3a) Analyzing the structure of the regulatory network, which indicates that it has a hierarchical layout with the "middle-managers" acting as information bottlenecks; (3b) Showing these middle managers tend to be arranged in various "partnership" structures giving the hierarchy a "democratic character"; (4) Showing that most human variation occurs on the periphery of the protein interaction network; and (5) Developing useful web-based tools for the analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context: Consistent Tendencies for Collaboration in the Middle Levels. N Bhardwaj et al. PNAS (2010, in press)

Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. PM Kim, JO Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of Biological Networks from Limited and Uneven Reliable Interactions. KY Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein (2009) PNAS


April 7, 2010

12-1pm, 182 George St, Rm 110

Geometry of the space of 2D shapes equipped with the Weil-Petersson metric

Sergey Kushnarev

Graduate Student
Division of Applied Mathematics
Brown University

The study of planar simple closed curves (or "2D shapes") and their similarities is a central problem in the field of computer vision. It arises in the task of characterizing and classifying objects from their observed silhouette. Defining a natural distance between 2D shapes creates a metric on the infinite-dimensional space of shapes. In this talk I will describe one particular metric, which comes from the conformal mapping of the 2D shapes, via the theory of Teichmuller spaces. In this space every simple closed curve (or a 2D shape) is represented by a smooth self-map of a circle. I will talk about a specific class of soliton-like geodesics on the space of shapes, called teichons. Some numerical examples of geodesics and the effects of curvature will be demonstrated.
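
For reference, in the conformal-mapping picture above, shapes correspond to elements of Diff(S^1)/PSL(2,R), and the Weil-Petersson metric has a simple Fourier-side expression on tangent vectors. The formula below is the standard one from this literature, stated up to an overall constant; it is included only for orientation and is not quoted from the abstract. Writing a tangent vector as $v(\theta)\,\partial_\theta$ with $v(\theta) = \sum_n a_n e^{in\theta}$ (real, so $a_{-n} = \bar{a}_n$, with the $n = -1, 0, 1$ modes quotiented out by the Mobius group),

    \[ \| v \|_{WP}^2 \;=\; \sum_{|n| \ge 2} (n^3 - n)\, |a_n|^2 . \]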


March 24, 2010

12-1pm, 182 George St, Rm 110

Unsupervised Part of Speech Tagging: From graphical models to statistical models to "biological" models

Michael Lamar

Graduate student
Division of Applied Mathematics
Brown University

A longstanding problem in computational linguistics is the automatic extraction of part of speech categories from a corpus. We introduce this problem and discuss several purely contextual, unsupervised approaches to it, beginning with Hidden Markov Models, which until recently have set the standard for performance. We then formulate two new alternative approaches. The first is a purely statistical treatment which incorporates singular value decomposition and clustering. The second is a self-organizing model inspired by simple biological processes. We compare these new approaches to the HMMs and discuss why they achieve better performance.
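
A compact sketch of the purely statistical route described above (context counts, SVD, clustering) on a toy corpus; the corpus, number of retained dimensions, and number of clusters are illustrative choices, not the settings from the work.

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy corpus; real experiments use a large corpus and the full vocabulary.
    corpus = ("the cat sat on the mat . a dog ran to the park . "
              "the dog saw a cat . a cat ran on the mat .").split()
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Left/right context co-occurrence counts for each word type.
    C = np.zeros((V, 2 * V))
    for i in range(1, len(corpus) - 1):
        w = idx[corpus[i]]
        C[w, idx[corpus[i - 1]]] += 1            # left neighbor
        C[w, V + idx[corpus[i + 1]]] += 1        # right neighbor

    # Low-rank word descriptors via SVD of the row-normalized context matrix, then cluster.
    R = C / np.maximum(C.sum(axis=1, keepdims=True), 1)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    desc = U[:, :5] * s[:5]
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(desc)
    for k in range(4):
        print("cluster", k, ":", [w for w in vocab if labels[idx[w]] == k])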


March 17, 2010

12-1pm, 182 George St, Rm 110

Learning in Social Networks with Signals of Bounded Informativeness

Ilan Lobel

Postdoctoral Researcher
Microsoft Research New England

We study the (perfect Bayesian) equilibrium of a model of learning over a general social network. Each individual receives a signal about the underlying state of the world, observes the past actions of a stochastically-generated neighborhood of individuals, and chooses one of two possible actions. The stochastic process generating the neighborhoods defines the network topology (social network). We characterize pure-strategy equilibria for arbitrary stochastic and deterministic social networks and determine conditions under which there will be asymptotic learning, that is, the conditions under which, as the social network becomes large, individuals converge (in probability) to taking the right action. In previous work, we showed that when the likelihood ratio of the signals is unbounded, there is asymptotic learning when there is some minimal amount of "expansion in observations". In this talk, we show that with signals of bounded likelihood ratio, asymptotic learning is achieved if three conditions hold: expanding observations holds, there exists a subsequence of agents with unpersuasive neighborhoods and there exists a uniform lower bound on the probability of observing the history of past actions. This talk is based on joint work with Daron Acemoglu (MIT Economics), Munther Dahleh (MIT EECS) and Asu Ozdaglar (MIT EECS).


*March 12, 2010 (Friday)

4-5:30pm, 182 George St, Rm 110

Graphs and polytopes: learning structures with linear programming relaxations

Tommi Jaakkola

Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Combinatorial problems are commonplace in modeling. They arise in predicting likely values for variables (e.g., selection/orientation of residues in protein design) or learning the model structure itself. In recent years, a lot of effort has gone into developing linear programming (LP) relaxations for finding the most likely values for the variables in a given model. In this talk, I will focus on the related combinatorial problem of learning the model structure, i.e., solving for the highest scoring Bayesian network (BN) graph given data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal graph structure. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. I will also illustrate the strengths and weaknesses of the approach in the context of sample structure learning problems.

The talk covers joint work with David Sontag, Amir Globerson, and Marina Meila.
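
For concreteness, one common way to cast structure learning as an integer program (a schematic of my own, not necessarily the exact polytope or constraint class used in the talk) introduces an indicator $z_{i,S} \in \{0,1\}$ meaning "node $i$ has parent set $S$":

    \max_{z} \sum_{i} \sum_{S} s_i(S)\, z_{i,S}
    \quad \text{s.t.} \quad \sum_{S} z_{i,S} = 1 \;\; \forall i,
    \qquad \sum_{i \in C} \; \sum_{S :\, S \cap C = \emptyset} z_{i,S} \;\ge\; 1 \;\; \forall\, \emptyset \ne C \subseteq V,
    \qquad z_{i,S} \in \{0,1\},

where $s_i(S)$ is the local score of assigning parents $S$ to node $i$. The second ("cluster") family of constraints rules out directed cycles; the LP relaxation drops integrality, keeps an outer bound on the polytope, and is tightened by adding violated constraints, as described above.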


March 10, 2010

12-1pm, 182 George St, Rm 110

Respondent-Driven Sampling for Networks: Degrees of Uncertainty with Uncertain Degrees

Joe Blitzstein

Assistant Professor
Department of Statistics
Harvard University

Respondent-driven sampling (RDS) is a network sampling design (introduced by Heckathorn) which is widely and increasingly being used to study properties of individuals in a social network, e.g., HIV prevalence. The standard estimators make many strong assumptions, including that the degree d of an individual in the sample is known exactly and that this individual is equally likely to recruit any of his or her d neighbors in the social network.

Classical issues of model-based vs. design-based inference arise here, with new challenges from having also to account for the network structure. We explore the bias-variance tradeoff encountered in this setting, and show how the uncertainty in the measurement of degrees propagates into uncertainty in the RDS estimates. We then show how Bayesian and importance sampling ideas can be used to obtain more reliable estimates.
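
As a point of reference for what the "standard estimators" look like, the sketch below computes the usual inverse-degree-weighted estimate of a trait proportion, plus a crude perturbation check of its sensitivity to misreported degrees. The noise model and function names are mine; the Bayesian and importance-sampling corrections from the talk are not shown.

    import numpy as np

    def rds_estimate(y, d):
        """Inverse-degree-weighted estimate of a population proportion.

        y : 0/1 trait indicators for the sampled individuals
        d : their reported network degrees (assumed known exactly by the
            standard estimator; that assumption is what the talk questions)
        """
        y, d = np.asarray(y, float), np.asarray(d, float)
        w = 1.0 / d
        return np.sum(w * y) / np.sum(w)

    def degree_noise_sensitivity(y, d, noise_sd=0.3, n_rep=2000, rng=None):
        """Crude illustration: perturb reported degrees multiplicatively and
        see how much the estimate moves."""
        rng = np.random.default_rng(rng)
        d = np.asarray(d, float)
        reps = [rds_estimate(y, d * np.exp(noise_sd * rng.standard_normal(len(d))))
                for _ in range(n_rep)]
        return np.mean(reps), np.std(reps)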


March 3, 2010

12-1pm, 182 George St, Rm 110

Topic Models: Priors, Stop Words and Languages

Hanna Wallach

Senior Postdoctoral Research Associate
Department of Computer Science
University of Massachusetts Amherst

In this talk, I will compare several classes of structured priors for topic models, and show that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. This combination of priors substantially increases the robustness of topic models to the highly skewed word frequency distributions common in natural language. I will also present two topic models that cannot easily rely on standard stop word pruning techniques and therefore require such robustness. The first is a topic-based predictive language model that incorporates both n-gram statistics and latent topic variables, while the second is a polylingual topic model that discovers topics aligned across multiple languages from massive collections of interlinked documents.
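
A toy numerical illustration of the prior contrast (my own example, not the models or corpora from the talk): with an asymmetric Dirichlet over document-topic distributions, a few high-prior topics are used heavily across documents and can soak up high-frequency, stop-word-like vocabulary, while a symmetric prior spreads usage evenly.

    import numpy as np

    rng = np.random.default_rng(0)
    K = 20  # number of topics

    # Symmetric prior: every topic is a priori equally likely in every document.
    alpha_sym = np.full(K, 0.1)

    # Asymmetric prior: a few topics get most of the prior mass.
    alpha_asym = np.sort(rng.gamma(shape=0.5, size=K))[::-1]
    alpha_asym = 2.0 * alpha_asym / alpha_asym.sum()

    theta_sym = rng.dirichlet(alpha_sym, size=2000)    # document-topic draws
    theta_asym = rng.dirichlet(alpha_asym, size=2000)

    for name, theta in [("symmetric", theta_sym), ("asymmetric", theta_asym)]:
        usage = theta.mean(axis=0)  # average share of each topic across documents
        print(name, "prior: topic usage ranges from",
              round(float(usage.min()), 3), "to", round(float(usage.max()), 3))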


February 17, 2010

12-1pm, 182 George St, Rm 110

Message Passing Algorithms for Compressed Sensing

Andrea Montanari

Assistant Professor
Department of Electrical Engineering and Department of Statistics
Stanford University

Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in important large-scale applications.

Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization.

We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models.
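
A minimal sketch of such an iteration: soft thresholding plus a correction (Onsager) term added to the residual. The threshold schedule and tuning constant below are placeholders of mine rather than the choices analyzed in the work.

    import numpy as np

    def soft(x, t):
        """Soft-thresholding nonlinearity."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def amp(y, A, n_iter=30, c=1.5):
        """Iterative soft thresholding with an Onsager correction term.

        y : measurements, shape (n,)
        A : sensing matrix, shape (n, N), columns roughly unit norm
        c : threshold multiplier (a placeholder tuning parameter)
        """
        n, N = A.shape
        delta = n / N
        x = np.zeros(N)
        z = y.copy()
        for _ in range(n_iter):
            pseudo = x + A.T @ z                    # effective observation
            sigma = np.linalg.norm(z) / np.sqrt(n)  # residual noise-level estimate
            theta = c * sigma                       # threshold
            x_new = soft(pseudo, theta)
            # Onsager term: average derivative of the threshold function
            # (fraction of surviving coordinates), scaled by 1/delta,
            # times the previous residual.
            onsager = (np.mean(np.abs(pseudo) > theta) / delta) * z
            z = y - A @ x_new + onsager
            x = x_new
        return x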

Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.

This is based on joint work with D.L. Donoho, A. Maleki and with M. Bayati.


*February 12, 2010 (Friday)

4:00-5:30pm, Barus & Holley 190 [refreshments begin at 3:45pm]

Division of Applied Mathematics / Center for Vision Research Seminar

Learning Hierarchies of Sparse Visual Features

Yann LeCun

Silver Professor of Computer Science and Neural Science
The Courant Institute of Mathematical Sciences and Center for Neural Science
New York University

Intelligent tasks such as object recognition, auditory scene analysis, or language understanding require the construction of good internal representations of the world. Internal representations (or "features") must be invariant (or robust) to irrelevant variations of the input, but must preserve the information relevant to the task. An important goal of our research, and an important challenge for Machine Learning over the next few years, is to devise methods that can automatically learn good internal representations from labeled and unlabeled data. Theoretical and empirical evidence suggests that the visual world is best represented by a multi-stage hierarchy, in which features in successive stages are increasingly global, invariant, and abstract. The main question is how one can train such deep architectures from unlabeled data and limited amounts of labeled data.

We describe a class of methods to train multi-stage systems in which each stage performs a series of convolutions followed by simple non-linearities. The unsupervised learning phase is based on sparse coding methods, but includes a feed-forward predictor that gives a quick approximation of the sparse code. A number of such stages are stacked and trained sequentially in an unsupervised manner. The entire system is then refined in a supervised manner.
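
A skeletal version of the feed-forward path of one such system (filter bank, pointwise nonlinearity, spatial pooling, stacked twice), written in PyTorch purely for illustration; the unsupervised sparse-coding training of the filters described above is not shown.

    import torch
    import torch.nn as nn

    class Stage(nn.Module):
        """One feature stage: filter bank -> nonlinearity -> spatial pooling."""
        def __init__(self, in_ch, out_ch, kernel=7, pool=2):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=kernel // 2)
            self.nonlin = nn.Tanh()
            self.pool = nn.MaxPool2d(pool)

        def forward(self, x):
            return self.pool(self.nonlin(self.conv(x)))

    # Two stacked stages; in the approach described above, each stage's filters
    # would first be learned with an unsupervised sparse-coding objective and
    # the whole stack then fine-tuned with supervision.
    encoder = nn.Sequential(Stage(1, 32), Stage(32, 64))

    features = encoder(torch.randn(8, 1, 96, 96))  # e.g. a batch of gray images
    print(features.shape)  # torch.Size([8, 64, 24, 24])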

An application to category-level object recognition with invariance to pose and illumination will be described. By stacking multiple stages of sparse features, and refining the whole system with supervised training, state-of-the-art accuracy can be achieved on standard datasets with very few labeled samples. A real-time demo will be shown. Another application, to vision-based navigation for off-road mobile robots, will also be presented. After a phase of off-line unsupervised learning, the system autonomously learns to discriminate obstacles from traversable areas at long range using labels produced with stereo vision for nearby areas.

This is joint work with Y-Lan Boureau, Karol Gregor, Raia Hadsell, Koray Kavukcuoglu, and Marc'Aurelio Ranzato.


February 3, 2010

12-1pm, 182 George St, Rm 110

An analysis of connectivity in neuronal population recordings

Asohan Amarasingham

Postdoctoral Fellow
Center for Molecular and Behavioral Neuroscience
Rutgers University

Little is known about how neurons interact through their spike trains in natural conditions. A chief barrier is experimental: simultaneous neuronal recordings are extremely sparse. But recent advances are expanding the number of neurons that can be observed simultaneously in a behaving animal. We will describe a recent study in which we combined simultaneous recordings of medial prefrontal cortex neurons in behaving rats with careful statistical analysis to assess the evidence for functional signatures of anatomical and physiological connectivity in that neuronal population. We will use the scientific setting as a context in which to explore some basic issues in neurophysiological data analysis, with particular emphasis on spike train-resampling techniques. This is joint work with Shigeyoshi Fujisawa, Gyorgy Buzsaki, Matthew Harrison, and Stuart Geman.
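
One of the simplest spike-train resampling ideas in this line of work is interval jitter: perturb each spike within a small window, recompute a fine-timescale statistic, and repeat to build a null distribution that preserves slow rate fluctuations while destroying fine-timescale coordination. The window width, synchrony statistic, and counts below are illustrative choices of mine, not those of the study.

    import numpy as np

    def sync_count(s1, s2, tol=0.005):
        """Number of spikes in s1 with a spike in s2 within +/- tol seconds."""
        s1, s2 = np.asarray(s1, float), np.sort(np.asarray(s2, float))
        j = np.searchsorted(s2, s1)
        left = np.abs(s1 - s2[np.clip(j - 1, 0, len(s2) - 1)])
        right = np.abs(s1 - s2[np.clip(j, 0, len(s2) - 1)])
        return int(np.sum(np.minimum(left, right) <= tol))

    def jitter_null(s1, s2, window=0.025, n_surr=1000, rng=None):
        """Jitter each spike of s1 uniformly within its own window of width
        `window` and recompute the synchrony count."""
        rng = np.random.default_rng(rng)
        s1 = np.asarray(s1, float)
        bins = np.floor(s1 / window)  # which window each spike lives in
        null = np.empty(n_surr)
        for k in range(n_surr):
            surr = (bins + rng.random(len(s1))) * window
            null[k] = sync_count(surr, s2)
        return null

    # Example use: p-value as the fraction of surrogates at least as
    # synchronous as the observed spike trains (s1, s2 are hypothetical arrays
    # of spike times in seconds).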


*November 20, 2009 (Friday)

4:00-5:30 pm, 182 George St, Rm 110 [refreshments beginning at 3:45]

Applied Mathematics / Center for Statistical Sciences Seminar

Error controls for multiple hypothesis testing

Zhiyi Chi

Associate Professor
Department of Statistics
University of Connecticut

Multiple hypothesis testing is challenging when signals in data are weak, the distributions of noise and signals are not completely known, or there is statistical dependency in data. I will present some theoretical results on these issues. I will show that self-normalized large deviations can be useful in dealing with experiment designs for weak signals, linear programming can be used to evaluate p-values when data distributions are not completely known, and techniques similar to nonlinear filtering can be used to analyze the performance of multiple testing on hidden states of Markov models that may be nonstationary.
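
For orientation only, here is the textbook Benjamini-Hochberg step-up procedure that such error-control questions are usually measured against; it is a baseline, not one of the methods discussed in the talk, and it assumes independent (or positively dependent) p-values.

    import numpy as np

    def benjamini_hochberg(pvals, q=0.05):
        """Boolean rejection mask controlling the FDR at level q."""
        p = np.asarray(pvals, float)
        m = len(p)
        order = np.argsort(p)
        thresh = q * np.arange(1, m + 1) / m
        below = p[order] <= thresh
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below)[0])   # largest index meeting the bound
            reject[order[: k + 1]] = True
        return reject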


November 18, 2009

12-1pm, 182 George St, Rm 110

Spectral Filtering Approaches to Machine Learning

Lorenzo Rosasco

Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology

In this talk we discuss a computational framework for learning from complex high-dimensional data. The proposed methods rely on estimating regularized versions of data-dependent kernel matrices, where regularization is achieved by filtering out the unstable components corresponding to small eigenvalues. In many cases these matrices can be interpreted as empirical versions of underlying covariance matrices, integral operators, or closely related objects, such as diffusion operators. We will describe different strategies to implement regularization and discuss the corresponding computational properties. The sample complexity of the methods is analyzed and shown to be optimal for a large class of problems. Experimental results in the context of supervised classification and vector field estimation will be presented.
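
A minimal numpy sketch of the filtering idea for kernel-based regression: eigendecompose the kernel matrix and damp, or cut off, the directions with small eigenvalues before solving. The Gaussian kernel and the two filters shown are generic choices of mine, not necessarily those analyzed in the talk.

    import numpy as np

    def gaussian_kernel(X, Z, sigma=1.0):
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def spectral_fit(X, y, sigma=1.0, lam=1e-2, filt="tikhonov"):
        """Fit coefficients c so that f(x) = sum_i c_i k(x, x_i)."""
        K = gaussian_kernel(X, X, sigma)
        evals, evecs = np.linalg.eigh(K)
        if filt == "tikhonov":        # damp small eigenvalues smoothly
            g = 1.0 / (evals + len(y) * lam)
        elif filt == "cutoff":        # truncated eigendecomposition
            g = np.where(evals > len(y) * lam, 1.0 / np.maximum(evals, 1e-12), 0.0)
        else:
            raise ValueError(filt)
        return evecs @ (g * (evecs.T @ y))

    def predict(X_train, c, X_new, sigma=1.0):
        return gaussian_kernel(X_new, X_train, sigma) @ c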


November 11, 2009

12-1pm, 182 George St, Rm 110

Generative Models for Image Analysis

Lo-Bin Chang

Graduate Student
Division of Applied Mathematics
Brown University

A probabilistic grammar for the grouping and labeling of parts and objects, when taken together with pose and part-dependent appearance models, constitutes a generative scene model and a Bayesian framework for image analysis. To the extent that the generative model generates features, as opposed to pixel intensities, the posterior distribution (i.e. the conditional distribution on part and object labels given the image) is based on incomplete information; feature vectors are generally insufficient to recover the original intensities. I will propose a way to learn pixel-level models for the appearances of parts. I will demonstrate the utility of the models with some experiments in Bayesian image classification.


*November 6, 2009 (Friday)

4:00-5:30 pm [refreshments beginning at 3:45 pm] Marcuvitz Auditorium (Room 220), Sidney Frank Hall, 185 Meeting St.

Division of Applied Mathematics / Center for Vision Research Seminar

Understanding Visual Scenes

Antonio Torralba

Esther and Harold E. Edgerton Associate Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Human visual scene understanding is remarkable: with only a brief glance at an image, an abundance of information is available - spatial structure, scene category and the identity of main objects in the scene. In traditional computer vision, scene and object recognition are two visual tasks generally studied separately. However, it is unclear whether it is possible to build robust systems for scene and object recognition, matching human performance, based only on local representations. Another key component of machine vision algorithms is the access to data that describe the content of images. As the field moves into integrated systems that try to recognize many object classes and learn about contextual relationships between objects, the lack of large annotated datasets hinders the fast development of robust solutions. In the early days, the first challenge a computer vision researcher would encounter would be the difficult task of digitizing a photograph. Even once a picture was in digital form, storing a large number of pictures (say six) consumed most of the available computational resources. In addition to the algorithmic advances required to solve object recognition, a key component to progress is access to data in order to train computational models for the different object classes. This situation has dramatically changed in the last decade, especially via the internet, which has given computer vision researchers access to billions of images and videos. In this talk I will describe recent work on visual scene understanding that tries to build integrated models for scene and object recognition, emphasizing the power of large databases of annotated images in computer vision.


*November 2, 2009 (Monday)

3:30-4:30 pm [refreshments beginning at 3:15], 121 South Main St, Rm 245

Center for Statistical Sciences / Division of Applied Mathematics Seminar

Application of Heteroskedastic Spatial Models to Computer Experiments

Richard A. Davis

Howard Levene Professor of Statistics
Department of Statistics
Columbia University

We consider modeling a deterministic computer response as a realization from a stochastic heteroskedastic process (SHP), which incorporates a spatially-correlated volatility process into the traditional spatially-correlated Gaussian process (GP) model. Unconditionally, the SHP is a stationary non-Gaussian process, with stationary GP as a special case. Conditional on a latent process, the SHP is a non-stationary GP. The sample paths of this process offer more modeling flexibility than those produced by a traditional GP, and can better reflect prediction uncertainty. GP prediction error variances depend only on the locations of inputs, while SHP can reflect local inhomogeneities in a response surface through prediction error variances that depend on both input locations and output responses.
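
A toy simulation of the general construction (a spatially correlated latent log-volatility process modulating an otherwise standard GP draw); the exact parameterization and covariance choices of the SHP model may differ from the placeholders used here.

    import numpy as np

    def sq_exp_cov(x, ell):
        """Squared-exponential covariance matrix for 1-D inputs x."""
        d2 = (x[:, None] - x[None, :]) ** 2
        return np.exp(-d2 / (2 * ell ** 2))

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 200)
    jitter = 1e-8 * np.eye(len(x))

    # Latent volatility process (spatially correlated) ...
    z = rng.multivariate_normal(np.zeros(len(x)), sq_exp_cov(x, ell=0.2) + jitter)
    # ... modulating an otherwise standard Gaussian process draw.
    v = rng.multivariate_normal(np.zeros(len(x)), sq_exp_cov(x, ell=0.05) + jitter)
    y_gp = v                        # ordinary GP sample path
    y_shp = np.exp(z / 2.0) * v     # heteroskedastic (SHP-style) sample path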

We use maximum likelihood for inference, which is complicated by the high dimensionality of the latent process. Accordingly, we develop an importance sampling method for likelihood computation and use a low-rank kriging approximation to reconstruct the latent process. Responses at unobserved locations can be predicted using empirical best predictors or empirical best linear unbiased predictors. Prediction error variances are also obtained. In examples with simulated and real computer experiment data, the SHP model is superior to traditional GP models. (This is joint work with Jay Breidt, Wenying Huang, and Ke Wang.)


October 28, 2009

12-1pm, 182 George St, Rm 110

Statistical Analysis of Climate Ecosystem Dynamics

Surajit Ray

Assistant Professor
Department of Mathematics & Statistics
Boston University

Functional approaches to modeling the dynamics of biological systems, trends in financial cycles, and seasonal measurements of spectral bands in remote sensing are becoming increasingly popular as data analysis tools. Clustering and classification are often important final objectives of functional data analysis. But most model-based clustering techniques rely on the assumption of normal or t distributions and as such are not appropriate for clustering functional data, which often lie on non-linear manifolds. In this talk we will present a novel model-based clustering technique for analyzing functional data. Our method is based on the modal clustering methodology developed by Li, Ray and Lindsay (2007), but new visualization, computational and inferential techniques need to be designed especially for functional data. The talk will also focus on parallelization of modal clustering. Application of functional clustering to remote sensing data will also be discussed.
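
As a generic illustration of mode-based clustering in a Euclidean feature space, the mean-shift style sketch below assigns points to the modes of a kernel density estimate; the functional-data version discussed in the talk requires the additional visualization, computational, and inferential machinery described above.

    import numpy as np

    def mean_shift_modes(X, bandwidth=0.5, n_iter=100, tol=1e-5):
        """Move each point uphill on a Gaussian kernel density estimate;
        points that converge to the same mode form one cluster."""
        X = np.asarray(X, float)
        Y = X.copy()
        for _ in range(n_iter):
            d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            W = np.exp(-d2 / (2 * bandwidth ** 2))
            Y_new = (W @ X) / W.sum(axis=1, keepdims=True)
            converged = np.max(np.abs(Y_new - Y)) < tol
            Y = Y_new
            if converged:
                break
        # Group points whose fixed points agree up to the bandwidth scale.
        labels = -np.ones(len(X), dtype=int)
        modes = []
        for i, y in enumerate(Y):
            for k, m in enumerate(modes):
                if np.linalg.norm(y - m) < bandwidth / 2:
                    labels[i] = k
                    break
            else:
                modes.append(y)
                labels[i] = len(modes) - 1
        return labels, np.array(modes)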


October 21, 2009

12-1pm, 182 George St, Rm 110

Recent Advances on the Geometry of the Riemannian Manifold of Landmarks

Mario Micheli

Assistant Adjunct Professor
Department of Mathematics
University of California, Los Angeles

In the past few years there has been a growing interest, in diverse scientific communities, in endowing "shape spaces" with Riemannian metrics, so as to be able to measure similarities between shapes and perform statistical analysis on data sets (e.g. for object recognition, target detection and tracking, classification, and automated medical diagnostics). The geometry of such spaces has started to emerge only very recently; in this talk we will explore the sectional curvature for the Riemannian manifold of landmark points (which is one of the simplest, in that it is finite-dimensional) and discuss its effects on applications.
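
For context, in the landmark setting one commonly works with the kernel cometric of the LDDMM framework: the Riemannian structure on n landmarks $q = (q_1, \dots, q_n)$ is induced by a Hamiltonian of the form

    H(q, p) \;=\; \tfrac{1}{2} \sum_{a,b=1}^{n} p_a^{\top} K(q_a, q_b)\, p_b ,

with geodesics given by Hamilton's equations and the metric tensor given by the inverse of the kernel matrix. This is a standard construction in the literature; the kernel choice and normalization used in the talk may differ.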


October 14, 2009

12-1pm, 182 George St, Rm 110

SIFT Flow: Dense Scene Alignment and Its Applications

Ce Liu

Postdoctoral Researcher
Microsoft Research New England

In this talk I will introduce dense scene alignment and how it can be applied to a number of computer vision problems varying from satellite image registration to object recognition and scene parsing. We propose SIFT flow that establishes dense, semantically meaningful correspondence between two images across scenes by matching pixel-wise SIFT features. Using SIFT flow, we develop a new framework for image parsing by transferring metadata information, such as annotation, motion and depth, from the images in a large database to an unknown query image. We demonstrate this framework using new applications such as predicting motion from a single image and motion synthesis via object transfer. Based on SIFT flow, we introduce a nonparametric scene parsing system using label transfer, with very promising experimental results suggesting that our system outperforms state-of-the-art techniques based on training classifiers.
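
Schematically, dense correspondence of this kind is posed as minimizing a discrete flow energy over the flow field $w$, with a descriptor-matching data term and a (typically truncated) smoothness term; the exact energy used in SIFT flow may differ in its truncation and small-displacement terms:

    E(w) \;=\; \sum_{p} \big\| s_1(p) - s_2(p + w(p)) \big\|_1
    \;+\; \lambda \sum_{(p,q) \in \mathcal{N}} \min\!\big( \| w(p) - w(q) \|_1,\, d \big),

where $s_1$ and $s_2$ are the per-pixel SIFT descriptors of the two images and $\mathcal{N}$ is the pixel grid's neighborhood system.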


October 7, 2009

12-1pm, 182 George St, Rm 110

The probabilistic language of thought

Noah Goodman

Research Scientist
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology

Logic and probability (aka. 'symbols and statistics', 'structured representation and graded inference') are key themes of cognitive science that have long had an uneasy coexistence. I will describe a Probabilistic Language of Thought approach that brings them together into compositional representations with probabilistic meaning. This provides a view of cognition in which mental representations describe causal "working models" of the world, that can be used for reasoning and learning by probabilistic inference. Using this framework I will investigate human concept learning, beginning with simple categorization tasks and extending to acquisition of abstract concepts. I will describe a model of learning the meanings of number words that predicts the staged progression exhibited by children (including the striking conceptual reorganization associated with the "cardinal principle" transition). I will then briefly describe how a probabilistic language of thought explains phenomena of human reasoning, including both classic syllogistic reasoning tasks and more complex reasoning involving causality and agency.


September 30, 2009

12-1pm, 182 George St, Rm 110

Learning maximum-entropy models of salience via EM

Micha Elsner

Graduate Student
Department of Computer Science
Brown University

Coreference resolution is the task of linking a nominal expression (such as a pronoun) to its antecedent in text or speech data. We propose a maximum entropy model which ranks the syntactic environments in which antecedents tend to occur. We learn this model in an unsupervised way using the Expectation/Maximization (EM) algorithm. As an initial application, we describe a consistent, probabilistic reformulation of the pronoun resolution model given in (Charniak+Elsner 09); we then extend this to a high-precision model predicting coreference between noun phrases with the same head word. This is joint work with Eugene Charniak and Mark Johnson; it is very much still in progress, so comments and suggestions are welcome.


September 23, 2009

12-1pm, 182 George St, Rm 110

Latent Socio-Spatial Process Model for Social Networks

Crystal Linkletter

Assistant Professor
Department of Community Health and Center for Statistical Sciences
Brown University

With concerns of bioterrorism, the advent of new epidemics that spread with person-to-person contact, such as SARS, and the rapid growth of on-line social networking websites, there is currently great interest in building statistical models that emulate social networks. Stochastic network models can provide insight into social interactions and increase understanding of dynamic processes that evolve through society. A major challenge in developing any stochastic social network model is the fact that social connections tend to exhibit unique inherent dependencies. For example, they tend to show a lot of clustering and transitive behavior, heuristically described as “a friend of a friend is a friend.” It might be reasonable to expect that covariate similarities, or “closeness” in social space, should somehow be related to the probability of connection for some social network data. The relationship between covariates and relations is likely to be complex, however, and may in fact be different in different regions of the covariate space. Here, we present a new socio-spatial process model that smoothes the relationship between covariates and connections in a sample network using relatively few parameters, so the probabilities of connection for a population can be inferred and likely social network structures generated. Having a predictive social network model is an important step toward the exploration of disease transmission models that depend on an underlying social network.
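
A minimal sketch of the basic ingredient such models build on: tie probabilities that decay with distance in a covariate or latent "social space". The logistic, distance-based form below is a common choice in this literature and is only an illustration, not the socio-spatial smoothing model of the talk.

    import numpy as np

    def simulate_network(Z, alpha=1.0, rng=None):
        """Simulate an undirected network whose tie probabilities decay with
        distance in the covariate/social space Z (n individuals x d coordinates)."""
        rng = np.random.default_rng(rng)
        Z = np.asarray(Z, float)
        n = len(Z)
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        P = 1.0 / (1.0 + np.exp(-(alpha - D)))   # closer in social space -> likelier tie
        A = (rng.random((n, n)) < P).astype(int)
        A = np.triu(A, 1)
        A = A + A.T                              # symmetrize, no self-ties
        return A, P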


September 16, 2009

12-1pm, 182 George St, Rm 110

Organizational meeting & Conditional inference for non-stationary data

Matthew Harrison

Assistant Professor
Division of Applied Mathematics
Brown University

This is a new time and a new format for the pattern theory seminar, so we will begin with an organizational meeting. Bring ideas for things you'd like to see in our seminar series. With the remaining time, I will discuss a semi-parametric conditional inference perspective for modeling data that have fast (and interesting) dynamics confounded with slow, non-stationary (and not so interesting) dynamics. This is work in progress with Asohan Amarasingham and Stuart Geman.