Pattern Theory Seminars
To receive email reminders about upcoming pattern theory seminars, send an email to:
   LISTSERV@LISTSERV.BROWN.EDU
with the following line in the body of the email:
   SUBSCRIBE PTG-SEMINAR-L



Spring 2017


February 1, 2017
12-1pm, 182 George St, Room 110
Non-equilibrium transitions between metastable patterns in populations of motile bacteria
Eric Vanden-Eijnden
Professor
Courant Institute of Mathematical Sciences
New York University

Active materials can self-organize in many more ways than their equilibrium counterparts. For example, self-propelled particles whose velocity decreases with their density can display motility-induced phase separation (MIPS), a phenomenon building on a positive feedback loop in which patterns emerge in locations where the particles slow down. Here, we investigate the effects of intrinsic fluctuations in the system's dynamics on MIPS, using a field theoretic description building on results by Cates and collaborators. We show that these fluctuations can lead to transitions between metastable patterns. The pathway and rate of these transitions are analyzed within the realm of large deviation theory, and they are shown to proceed in a very different way than one would predict from arguments based on detailed balance and microscopic reversibility. Specifically, we show that these transitions involve fluctuations in the diffusivity of the bacteria followed by fluctuations in their population, in a specific sequence. The method of analysis proposed here, including its numerical component, can be used to study noise-induced transitions in a variety of other non-equilibrium set-ups, and leads to predictions that are verifiable experimentally.




February 8, 2017
12-1pm, 182 George St, Room 110
Deformation models for image and shape matching
Ronen Basri
Professor
Department of Computer Science and Applied Mathematics
Weizmann Institute of Science

Modeling deformations is important for various applications in computer vision, graphics and geometry processing. In this talk I will describe our recent progress in modeling deformations. In particular, I will describe methods for computing bounded distortion transformations, locally injective maps whose differentials' conformal distortion is bounded. Toward this end, we developed a convex framework for solving optimization problems over matrices that involve functionals and constraints expressed in terms of their extremal singular values. In addition, I will describe methods for computing physically-motivated elastic maps between shapes. We have applied these methods in a number of challenging problems, including feature matching between images related by non-rigid deformation, non-rigid registration of shape models, and computing extremal quasi-conformal maps.




March 8, 2017
12-1pm, 182 George St, Room 110
Tropical Coordinates on the Space of Persistence Barcodes
Sara Kalisnik Verovsek
Postdoctoral Research Associate
Department of Mathematics
Data Science Initiative
Brown University

In the last two decades applied topologists have developed numerous methods for ‘measuring’ and building combinatorial representations of the shape of the data. The most famous example of the former is persistent homology. This adaptation of classical homology assigns a barcode, i.e. a collection of intervals with endpoints on the real line, to a finite metric space. Unfortunately, barcodes are not well suited for use by practitioners in machine learning tasks. We can circumvent this problem by assigning numerical quantities to barcodes; these outputs can then be used as input to standard algorithms. I will talk about max-plus polynomials and tropical rational functions that can be used as coordinates on the space of barcodes. All of these are stable with respect to the standard distance functions (Bottleneck, Wasserstein) used on the barcode space.
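A minimal illustration of the idea, without presuming the talk's specific coordinates: sums of the k largest bar lengths form one simple family of symmetric, max-plus-style functions of a barcode (the helper name `tropical_coords` and the choice k = 3 are ours):

```python
def tropical_coords(barcode, k=3):
    """Map a barcode (list of (birth, death) pairs) to k numbers:
    the sums of the 1, 2, ..., k largest bar lengths."""
    lengths = sorted((d - b for b, d in barcode), reverse=True)
    lengths += [0.0] * k                      # pad short barcodes
    return [sum(lengths[:i]) for i in range(1, k + 1)]
```

Because each coordinate changes by at most k times the perturbation of any single endpoint, functions of this kind behave stably under small changes to the barcode, which is the property required for downstream machine learning.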




April 19, 2017
12-1pm, 182 George St, Room 110
Inference in Dynamical Systems
Sayan Mukherjee
Professor
Department of Statistical Science
Duke University

We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures. We also relate Bayesian inference to the thermodynamic formalism in tracking dynamical systems. We state conditions for consistency of a Gibbs distribution using classic ideas from dynamical systems such as topological entropy and pressure.



April 26, 2017
12-1pm, 182 George St, Room 110
Numerical Gaussian Processes (Physics Informed Learning Machines)
Maziar Raissi
Postdoctoral Research Associate
Department of Applied Mathematics
Brown University

We introduce the concept of numerical Gaussian processes, which we define as Gaussian processes with covariance functions resulting from temporal discretization of time-dependent partial differential equations. Numerical Gaussian processes, by construction, are designed to deal with cases where: (1) all we observe are noisy data on black-box initial conditions, and (2) we are interested in quantifying the uncertainty associated with such noisy data in our solutions to time-dependent partial differential equations. Our method circumvents the need for spatial discretization of the differential operators by proper placement of Gaussian process priors. This is an attempt to construct structured and data-efficient learning machines, which are explicitly informed by the underlying physics that possibly generated the observed data. The effectiveness of the proposed approach is demonstrated through several benchmark problems involving linear and nonlinear time-dependent operators. In all examples, we are able to recover accurate approximations of the latent solutions, and consistently propagate uncertainty, even in cases involving very long time integration.
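A finite-dimensional analogue of the uncertainty-propagation idea only, not the method itself (the actual construction keeps the GP prior in continuous space and avoids the spatial grid used here; the grid size, explicit-Euler step, and noise level below are our choices):

```python
import numpy as np

m = 50
x = np.linspace(0, 1, m)
K = np.exp(-(x[:, None] - x[None]) ** 2 / (2 * 0.1 ** 2))  # squared-exp prior on u^n
dt, nu = 1e-4, 1.0
h = x[1] - x[0]
L = np.diag(np.full(m - 1, 1.0), 1) + np.diag(np.full(m - 1, 1.0), -1) - 2 * np.eye(m)
A = np.eye(m) + dt * nu * L / h ** 2          # one explicit-Euler heat-equation step
sigma = 0.05                                  # noise level on the initial data
u0_true = np.sin(2 * np.pi * x)
y = u0_true + sigma * np.random.default_rng(5).normal(size=m)
# u^{n+1} = A u^n makes (u^n, u^{n+1}) jointly Gaussian, so noisy data on u^n
# yields a closed-form posterior mean and covariance for u^{n+1}
S = K + sigma ** 2 * np.eye(m)
mean_u1 = A @ K @ np.linalg.solve(S, y)
cov_u1 = A @ (K - K @ np.linalg.solve(S, K)) @ A.T
```

The point of the sketch is the last two lines: the time-stepping scheme is absorbed into the covariance structure, so uncertainty from the noisy initial data propagates to the solution at the next time level in closed form.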




May 3, 2017
12-1pm, 182 George St, Room 110
PCA from noisy linearly transformed measurements
Amit Singer
Professor
Department of Mathematics
Princeton University

We consider the problem of estimating the covariance of X from measurements of the form y_i = A_i*x_i + e_i (for i = 1, ..., n) where x_i are i.i.d. unobserved samples of X, A_i are given linear operators, and e_i represent noise. Our estimator is constructed efficiently via a simple linear inversion using conjugate gradient performed after eigenvalue shrinkage motivated by the spiked model in high dimensional PCA. Applications to low-rank matrix completion, 2D image denoising, and 3D structure classification in single particle cryo-electron microscopy will be discussed.
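A toy version of the measurement model, for the special case where the A_i are random coordinate masks (the matrix-completion setting mentioned above). This is only the naive unbiased moment estimator; the talk's estimator additionally applies eigenvalue shrinkage and a conjugate-gradient inversion, which we omit:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sig = 8, 20000, 0.1
U = rng.normal(size=(d, 2))
Sigma = U @ U.T + 0.01 * np.eye(d)            # ground-truth low-rank-plus-ridge covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
M = rng.random((n, d)) < 0.5                  # A_i = random coordinate masks
Y = np.where(M, X + sig * rng.normal(size=(n, d)), 0.0)   # y_i = A_i x_i + e_i
# average y_j * y_k over the samples where both coordinates were observed
num = Y.T @ Y
cnt = M.astype(float).T @ M.astype(float)
S = num / np.maximum(cnt, 1.0)
np.fill_diagonal(S, np.diag(S) - sig ** 2)    # remove the noise bias on the diagonal
err = np.linalg.norm(S - Sigma) / np.linalg.norm(Sigma)
```

Even this crude estimator is consistent here; shrinkage matters when the dimension is comparable to the number of samples, as in cryo-EM.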


Fall 2016


September 14, 2016
12-1pm, 182 George St, Room 110
Discovering the Nature of Nonlinear Relationships
Joshua Vogelstein
Assistant Professor
Department of Biomedical Engineering
Institute for Computational Medicine
Johns Hopkins University

As data collection is becoming easier, it is becoming increasingly difficult and important to answer questions of the form: is one property (e.g., clouds) related to another (e.g., grass wetness)? Only if we can determine that these two properties are related to---or statistically dependent on---one another does it make sense to further investigate the nature of this relationship. Unfortunately, reliably establishing such a relationship can be challenging, especially when the properties themselves are complex and the relationship is nonlinear. We here describe a procedure, called Multiscale Generalized Correlation (MGC), that addresses these challenges. Our key insight is that if two properties are related, comparisons between measurements of similar pairs of the first property (e.g., similarly shaped clouds) should be correlated with the comparisons between corresponding pairs of the second property (grass wetness under those clouds). We demonstrate the statistical and computational efficiency of MGC in both simulations and theory. We then apply it to detect the presence and nature of the relationships between brain activity and personality, brain shape and disorder, and brain connectivity and creativity. Finally, we demonstrate that MGC does not suffer from the false positives that have plagued conventional dependence tests. Our open source implementation of MGC is applicable to fundamental questions confronting science, government, finance, and many other disciplines.
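A crude, global version of that key insight can be sketched as a Mantel-style permutation test: correlate the pairwise distances among the x's with the pairwise distances among the corresponding y's (MGC itself is multiscale and considerably more powerful; all sizes and the example relationship below are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)        # nonlinear, non-monotone dependence
def pdist(v):
    return np.abs(v[:, None] - v[None])
A, B = pdist(x), pdist(y)
iu = np.triu_indices(n, 1)
stat = np.corrcoef(A[iu], B[iu])[0, 1]       # do x-distances track y-distances?
# permutation null: break the pairing between the x's and the y's
null = [np.corrcoef(A[iu], pdist(y[rng.permutation(n)])[iu])[0, 1]
        for _ in range(200)]
pval = (np.sum(np.array(null) >= stat) + 1) / 201
```

Comparing distances rather than raw values is what lets such tests see nonlinear relationships that ordinary correlation misses.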




September 21, 2016
12-1pm, 182 George St, Room 110
Structure-based comparisons for sequential data
Katherine Kinnaird
Postdoctoral Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University

We present aligned hierarchies, a low-dimensional representation for sequential data streams. The aligned hierarchies encode all hierarchical decompositions of repeated elements from a high-dimensional and noisy sequential data stream in one object. These aligned hierarchies can be embedded into a classification space with a natural notion of distance. We motivate our discussion through the lens of Music Information Retrieval (MIR), constructing aligned hierarchies by finding, encoding, and synthesizing all repeated structure present in a song. For a data set of digitized scores, we conducted experiments addressing the fingerprint task, a song comparison task in MIR; these experiments achieved perfect precision-recall values and provide a proof of concept for the aligned hierarchies.

We also introduce aligned sub-hierarchies and aligned sub-decompositions. Both derived from the aligned hierarchies, these structure-based representations for songs can be embedded into classification spaces and can address additional MIR tasks. We will compare properties of the aligned hierarchies, aligned sub-hierarchies, and aligned sub-decompositions.




September 28, 2016
12-1pm, 182 George St, Room 110
New Perspectives on Importance Sampling
Daniel Sanz Alonso
Postdoctoral Research Associate
Division of Applied Mathematics
Data Science Initiative
Brown University

Importance sampling is a building block of many algorithms in computational statistics, perhaps most notably particle filters. It is the importance sampling step that often limits the accuracy of these algorithms. In this talk I will introduce a new way of understanding importance sampling based on information theory. I will argue that the fundamental problem facing algorithms based on importance sampling can be understood in terms of the distance between certain measures. The results give new insight into the potential use of importance sampling and particle filters in high (possibly infinite) dimensional spaces.




October 5, 2016
12-1pm, 182 George St, Room 110
Robust Bayesian inference via coarsening
Jeffrey Miller
Assistant Professor
Department of Biostatistics
Harvard University

The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class.  However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure, particularly when the data set is large.  We introduce a simple, coherent approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the observed data exactly, one conditions on the event that the model generates data close to the observed data, with respect to a given statistical distance.  When closeness is defined in terms of relative entropy, the resulting "coarsened posterior" can be approximated by simply raising the likelihood to a certain fractional power, making the method computationally efficient and easy to implement in practice.  We illustrate with real and simulated data, and provide theoretical results.
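In a conjugate normal model, raising the likelihood to a fractional power only rescales the effective sample size, so the coarsened posterior is trivial to compute. A sketch under assumptions of our own (known unit variance, prior N(0, 10^2), and a chosen robustness parameter alpha, with the power zeta = 1/(1 + n/alpha) suggested by the relative-entropy coarsening):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(1.0, 1.3, size=n)              # data from a *misspecified* model
# conjugate posterior for the mean of an assumed N(mu, 1) model, prior N(0, 10^2),
# with the likelihood raised to a fractional power
def posterior(power):
    prec = 1 / 10.0 ** 2 + power * n          # posterior precision
    return power * x.sum() / prec, 1 / np.sqrt(prec)
m_std, s_std = posterior(1.0)                 # standard posterior
alpha = 50.0                                  # robustness parameter (our choice)
zeta = 1.0 / (1.0 + n / alpha)                # fractional power from the coarsening
m_c, s_c = posterior(zeta)
# the coarsened posterior stays centered near the truth but is much wider,
# reflecting residual doubt about the model as n grows
```

The standard posterior concentrates tightly around a point even though the model is wrong; the coarsened posterior's width stabilizes, which is the intended robustness.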



November 2, 2016
12-1pm, 182 George St, Room 110
Robust Bayesian inference via coarsening
Tamara Broderick
Assistant Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice---at least in part due to the difficulty of calculating robustness measures from MCMC draws. We demonstrate that, in contrast to MCMC, variational Bayes (VB) techniques are readily amenable to robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. MFVB, however, has a well-known failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).



November 9, 2016
12-1pm, 182 George St, Room 110
Joint Scientific Computing and Pattern Theory Seminar
Solution uncertainty quantification for differential equations
Oksana Chkrebtii
Assistant Professor
Department of Statistics
Ohio State University

When models are defined implicitly by systems of differential equations without a closed form solution, small local errors in finite-dimensional solution approximations can propagate into large deviations from the true underlying state trajectory. Inference for such models relies on a likelihood approximation constructed around a numerical solution, which underestimates posterior uncertainty. This talk will introduce and discuss progress in a new formalism for modeling and propagating discretization uncertainty through the Bayesian inferential framework, allowing exact inference and uncertainty quantification for discretized differential equation models.



November 16, 2016
12-1pm, 182 George St, Room 110
Single-Channel Multi-Speaker Separation using Deep Clustering
John Hershey
Senior Principal Research Scientist
Mitsubishi Electric Research Laboratories (MERL)
Cambridge, Massachusetts

The human auditory system gives us the extraordinary ability to converse in the midst of a noisy throng of partygoers. Solving this so-called cocktail party problem has proven extremely challenging for computers, and separating and recognizing speech in such conditions has been the holy grail of speech processing for more than 50 years. Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering, producing unprecedented speaker-independent single-channel separation performance on two-speaker and three-speaker mixtures. In this framework, a neural network is trained to assign an embedding vector to each element of a multi-dimensional signal, such that clustering the embeddings yields a desired segmentation of the signal. In the cocktail-party problem, the embeddings are assigned to each time-frequency (TF) index of the short-time Fourier transform (STFT) of the mixture of speech signals. Clustering these embeddings yields an assignment of each TF bin to one of the inferred sources. These assignments are used as a masking function to extract the dominant parts of each source. This method has produced remarkable performance, reaching levels of improvement -- over 10 dB SNR gain -- that were previously unobtainable even in simpler speech enhancement tasks. Extensions to the model allow for end-to-end training to optimize the whole architecture for best signal quality. We also evaluate our method using automatic speech recognition (ASR), and show that it can reduce the word error rate (WER) from 89.1% down to 30.8%, representing important progress towards solving the cocktail party problem.
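The final step of the pipeline (cluster per-bin embeddings, turn clusters into binary masks) can be sketched with random stand-in "embeddings" in place of network outputs; the sizes and the small two-means routine below are our own choices, not part of the deep clustering system:

```python
import numpy as np

rng = np.random.default_rng(2)
T, F, D = 10, 6, 4                            # frames, frequency bins, embedding dim
true = rng.integers(0, 2, size=(T, F))        # which speaker dominates each TF bin
emb = 2.0 * true[..., None] + 0.1 * rng.normal(size=(T, F, D))  # fake embeddings
V = emb.reshape(-1, D)
# minimal two-means (Lloyd's algorithm), initialized at two extreme points
C = np.stack([V[V.sum(1).argmin()], V[V.sum(1).argmax()]])
for _ in range(20):
    assign = ((V[:, None] - C[None]) ** 2).sum(-1).argmin(1)
    C = np.stack([V[assign == k].mean(0) for k in range(2)])
masks = assign.reshape(T, F)                  # binary mask per inferred source
agreement = max((masks == true).mean(), (masks != true).mean())
```

The hard part, of course, is training the network so that embeddings of bins dominated by the same speaker actually cluster; once they do, source extraction reduces to this clustering-and-masking step.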




November 30, 2016
12-1pm, 182 George St, Room 110
How high-order image statistics shape cortical visual processing
Jonathan Victor
Professor

Brain and Mind Research Institute
Department of Neurology
Cornell University

Several decades of work have suggested that Barlow's principle of efficient coding is a powerful framework for understanding retinal  design principles. Whether a similar notion extends to cortical visual processing is less clear, as there is no "bottleneck" comparable to the optic nerve, and much redundancy has already been removed.  Here, we present convergent psychophysical  and physiological evidence that regularities of high-order image  statistics are indeed exploited by central visual processing, and at a surprising level of detail.

The starting point is a study of natural image statistics (Tkacic et al., 2010),  in which we showed that high-order correlations in certain specific spatial  configurations are informative, while high-order correlations in  other spatial configurations are not:  they can be accurately guessed from lower-order ones.  We then construct artificial images (visual textures) composed either of informative or uninformative  correlations. We find that informative high-order correlations are visually salient, while the uninformative correlations are nearly imperceptible. Physiological studies in macaque visual cortex  identify the locus of the underlying computations.  First, neuronal responses in macaque V1 and V2 mirror the psychophysical findings, in that many neurons respond differentially to the informative statistics, while few respond to the uninformative ones. Moreover, the differential responses largely arise in the supragranular layers, indicating that the computations are the  result of intracortical processing.

We then consider low- and high-order local image statistics together, and apply a dimension reduction (binarization) to cast them into a 10-dimensional space. We determine the perceptual isodiscrimination surfaces within this space. These are well approximated by ellipsoids, and the principal axes of the ellipsoids correspond to the distribution of the local statistics in natural images. Interestingly, this correspondence differs in specific ways from the predictions of a model that implements efficient coding in an unrestricted manner. These deviations provide insights into the strategies that underlie the representation of image statistics.


Spring 2016


February 3, 2016
12-1pm, 182 George St, Room 110
Constructing representations using Bayesian nonparametrics and connections between human knowledge, optimal foraging, and random walks on graphs
Joseph Austerweil
Assistant Professor
Department of Cognitive, Linguistic, and Psychological Sciences
Brown University

This talk has two parts (how much detail we go into on each will be guided by audience interest).

In the first part of the talk, I outline a statistical framework for constructing representations in a human-like manner. People still outperform computational methods on a diverse set of problems, ranging from language learning to recognizing objects in a scene. To understand human success on these problems, cognitive scientists appeal to representation as a key explanatory device. From a statistical perspective, representations can be interpreted as structure within a probability distribution, which provides biases to prevent overfitting. Puzzlingly, people are remarkably flexible, changing their representation of the same data depending on a number of different factors. I will describe a statistical framework for this intriguing phenomenon, which uses Bayesian nonparametric processes.

In the second part of the talk, I describe recent work relating how optimal animals forage for food in a patchy environment to how people search semantic memory. I will discuss how representing semantic memory as a graph enables random walks to mimic optimal foraging. I will conclude with some recent (and ongoing) work that exploits this relation to perform Bayesian inference for a graph from the first hitting times of multiple walks on that graph.




February 17, 2016
12-1pm, 182 George St, Room 110
New applications and algorithms for submodular probabilistic models
Stefanie Jegelka
Assistant Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Many real-world inference problems are, at their core, subset selection problems. Probabilistic models for such scenarios rely on having sufficiently accurate yet tractable distributions over discrete sets. We focus on sub-families of such distributions whose special mathematical properties are the basis for fast algorithms. As a specific example, Determinantal Point Processes (DPPs) have recently become popular in machine learning, as elegant and tractable probabilistic models of diversity. We explore new applications of DPPs for variational inference over combinatorial objects, such as coupled cascades in a collection of networks, where we are able to leverage combinatorial and convex structure in the problem.
While sampling from DPPs is possible in polynomial time, the associated algorithm is not practical for large data. In the second part of the talk, I will outline ideas for faster sampling that build on new insights for algorithms that compute bilinear inverse forms. These results have applications beyond DPPs, including sensing with Gaussian Processes and submodular maximization.
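A tiny enumeration illustrates the determinantal structure behind the talk: a DPP over a ground set assigns P(Y = S) proportional to det(L_S), so similar items (nearly parallel feature vectors) make the determinant small and are unlikely to co-occur. The helper `detL` is ours; the closed-form normalization det(L + I) is a standard DPP identity:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
B = rng.normal(size=(4, 3))
L = B @ B.T                                   # PSD L-ensemble kernel on 4 items
def detL(S):
    """det of the principal submatrix of L indexed by S (1 for the empty set)."""
    S = list(S)
    return np.linalg.det(L[np.ix_(S, S)]) if S else 1.0
# P(Y = S) is proportional to det(L_S); enumerate all 2^4 subsets
Z = sum(detL(S) for r in range(5) for S in combinations(range(4), r))
p01 = detL([0, 1]) / Z                        # probability Y = {0, 1} exactly
```

Enumeration is exponential; the tractability the abstract refers to comes from identities like Z = det(L + I) and from spectral sampling algorithms, which is where the fast bilinear-inverse-form computations enter.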

This is joint work with Chengtao Li, Josip Djolonga, Suvrit Sra and Andreas Krause.




March 2, 2016
12-1pm, 182 George St, Room 110
The shape space defined by the Gromov-Wasserstein distance
Facundo Memoli
Assistant Professor
Department of Mathematics and Department of Computer Science and Engineering
The Ohio State University

The Gromov-Wasserstein distance --a variant of the Gromov-Hausdorff distance based on ideas from mass transport-- provides an intrinsic metric on the collection of all metric measure spaces. I will give an overview of its construction, main properties, lower bounds, and their computation.



March 16, 2016
12-1pm, 182 George St, Room 110
A simple network model for a variety of Delay Match to Sample tasks
Yali Amit
Professor
Department of Statistics and Department of Computer Science
University of Chicago

Delay match to sample experiments (DMS) have inspired much of the modeling work on attractor neural networks. The basic experiment involves showing a target image, removing it, and after a delay showing a cue image: either the original image or a different one. The monkey needs to indicate whether the cue is the same as or different from the target. Electrophysiological recordings have shown that if the target is a learned one (has been observed multiple times), neurons selective for it maintain activity during the delay between target and cue presentation. This persistent activity is hypothesized to represent 'working' or 'short term' memory. The attractor network model posits that stimulation with learned patterns leads to sustained activity due to the learned recurrent synaptic connections in the network. There are a number of variations on the basic DMS paradigm involving distractors in between target and cue, or repetition detection experiments where a sequence of images is shown and one of them chosen at random is repeated. These different experiments raise several questions that are rarely addressed in the literature. With distractors, how does the network ensure the target sample pattern stays in working memory? How does the network figure out whether the cue matches the target? How does the network avoid being misled by distractor repetitions? Additional interesting phenomena have been observed in more recent experiments on repetition detection, including better performance with novel patterns than with learned patterns.

I will present a parsimonious network model of binary neurons and binary synapses and show how all these phenomena can be handled within this framework, using simple adjustments of certain global parameters such as inhibition, noise level and depression rate. Recent experiments also show that average responses to novel stimuli in IT are higher than to learned stimuli. I will show what adjustment needs to be made to accommodate this phenomenon. Time permitting I will discuss how these issues can be reconciled in a two layer network that can also perform classification on real data.
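The attractor hypothesis underlying all of this can be sketched with the classical Hopfield-style network (this is not the talk's binary-synapse model; real-valued Hebbian weights and all sizes below are our simplifications):

```python
import numpy as np

rng = np.random.default_rng(7)
N, P = 200, 5
pats = rng.choice([-1, 1], size=(P, N))          # learned binary patterns
W = (pats.T @ pats) / N                          # Hebbian recurrent weights
np.fill_diagonal(W, 0)
# present a degraded version of pattern 0, then remove the stimulus
s = np.where(rng.random(N) < 0.1, -pats[0], pats[0])
for _ in range(10):                              # delay-period recurrent dynamics
    s = np.where(W @ s >= 0, 1, -1)
overlap = float((s * pats[0]).mean())            # activity stays on the target
```

The recurrent dynamics pull the degraded cue back to the stored pattern and hold it there, which is the network-level account of persistent delay-period activity.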




March 23, 2016
12-1pm, 182 George St, Room 110
The Dynamics of the Unconscious Brain Under General Anesthesia
Emery Brown
Warren M. Zapol Professor of Anaesthesia
Massachusetts General Hospital and Harvard Medical School
Edward Hood Taplin Professor of Medical Engineering & Computational Neuroscience
Massachusetts Institute of Technology 

General anesthesia is a drug-induced, reversible condition comprised of five behavioral states: unconsciousness, amnesia (loss of memory), analgesia (loss of pain sensation), akinesia (immobility), and hemodynamic stability with control of the stress response. Our work shows that a primary mechanism through which anesthetics create these altered states of arousal is by initiating and maintaining highly structured oscillations. We present findings from our human studies of general anesthesia using high-density EEG recordings and intracranial recordings which have allowed us to give a detailed characterization of the neurophysiology of loss and recovery of consciousness due to propofol. We show how these dynamics change with other anesthetics and with age. We present a neuro-metabolic model of burst suppression, the profound state of brain inactivation seen in deep states of general anesthesia. We use our characterization of burst suppression to implement a closed-loop anesthesia delivery system for control of a medically-induced coma. Finally, we demonstrate that the state of general anesthesia can be rapidly reversed by activating specific brain circuits. The success of our research has depended critically on tight coupling of experiments, signal processing research and mathematical modeling.



April 20, 2016
12-1pm, 182 George St, Room 110
Learning Causal Graphical Models of Large-Scale Systems
David Jensen
Professor of Computer Science
Director of the Knowledge Discovery Laboratory
University of Massachusetts Amherst 

Effective methods for inferring causal dependence from observational data have been developed within both computer science and quantitative social science. Methods in computer science have focused on the correspondence between causal graphical models and observed patterns of statistical association. Methods in social science have focused on templates for causal inference often called quasi-experimental designs, including instrumental variables, propensity score matching, regression discontinuity designs, and interrupted time-series designs. Recent work has begun to unify these methods using the framework of causal graphical models, but barriers remain because the formal framework of causal graphical models has been insufficiently expressive to represent some quasi-experimental designs. In this talk, I will introduce many of the known experimental and quasi-experimental designs in the language of directed graphical models, and I will present very recent work on defining additional designs in terms of classes of graphical models that can represent relational and temporal dependence. Finally, I will present two novel designs that have resulted from our work on causal inference in relational data.



May 18, 2016
12-1pm, 182 George St, Room 110
Learning Lattice Operators used in Computer Vision Tasks
Junior Barrera
Professor of Computer Science
University of Sao Paulo, Brazil

The transformation of images by lattice operators has some interesting properties: they stay in the discrete and quantized image domain; theoretical results guarantee that any transformation can be implemented by a lattice operator; lattice operators can be implemented through simple elementary lattice operators; and lattice operators can be learned through discrete machine learning techniques. In this lecture, we present the general algebraic representation of lattice operators and its application in the learning of these operators. Several approaches for efficient learning (i.e., from a relatively small sample in a reasonable processing time) of complex operators are presented. Among them is the U-curve problem, which is a new combinatorial optimization problem useful for approaching the classical feature selection problem. Another remarkable result is multi-resolution learning, which, for a fixed sample, avoids overfitting and gives decreasing estimated errors for an increasing number of parameters. The techniques are illustrated by the learning of several kinds of binary and gray-scale image transformations: noise reduction, interpolation, enhancement, segmentation, texture recognition, shape recognition, etc. All the results presented refer to already published papers, but some open problems are pointed out and discussed.
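A one-dimensional caricature of learning a window (lattice) operator from input/ideal-output pairs, in the spirit described above: for each observed window pattern, take a majority vote on the desired output (the 3-point window, signal length, and target operator below are our choices):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(8)
x = (rng.random(5000) < 0.5).astype(int)
ideal = (np.convolve(x, [1, 1, 1], "same") >= 2).astype(int)   # target: 3-point median
# count (window pattern, ideal output) co-occurrences ...
votes = Counter()
for i in range(1, len(x) - 1):
    votes[(tuple(x[i - 1:i + 2]), int(ideal[i]))] += 1
# ... and pick the majority output for each observed window pattern
table = {w: int(votes[(w, 1)] >= votes[(w, 0)]) for w, _ in votes}
```

With a 3-pixel window the hypothesis space has only 2^8 operators, so a small sample suffices; the combinatorial difficulties the abstract mentions (hence the U-curve and multi-resolution ideas) arise when the window, and thus the table, grows.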


Fall 2015


September 9, 2015
12-1pm, 182 George St, Room 110
Random matrix theory based analysis of the correlation structure of protein sequences
Lucy Colwell
Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK

Extracting interesting biological information from protein sequences is a grand challenge, with the pace of sequencing increasing by the day. There are many problems in biology where correlations between protein sequences could lead to important information and new tools; my research program aims to find new ways of using these correlations to make biological discoveries. Recently it has been shown by us and others that the correlation structure of large protein sequence alignments contains sufficient information to accurately predict protein 3D structure using approximate graphical modeling and probabilistic inference. The methodology requires inferring a large number of parameter values from a dataset that is highly under-sampled, including samples with varying degrees of correlation to one another, violating the assumptions made by the inference methods. For this approach to be useful more generally -- both for protein structure prediction and for solving other biological problems -- it is essential that we understand how prediction quality depends on the input data available for the protein family of interest. This is because those families for which no homologous crystal structure has been solved typically contain few sequences.

In this talk we first briefly compare the different inference procedures on both exact data simulated from models and protein sequence data. We then identify a phenomenon for protein sequence data in which iterating the model inference procedure allows the inferred model to converge to a more accurate solution, and show how this can be used to accurately predict interacting protein pairs from sequence data alone. This convergence suggests that there is scope to improve the accuracy of model inference, for example by 'cleaning' the sample correlation matrix. The standard random matrix theory approach identifies those few eigenvalues associated with a low rank signal by comparing the spectrum of the sample covariance matrix to the Marcenko-Pastur distribution. In the context of protein sequence data the extensive correlations between samples mean that this approach will not work, but if these correlations are known (via a phylogeny) I will show how the Marcenko-Pastur equation can be solved to yield the expected empirical spectral distribution caused by phylogeny alone.
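The standard spiked-model picture invoked above can be sketched in the clean i.i.d. case (no phylogenetic correlations between samples; the sizes and spike strength are our choices): pure-noise sample eigenvalues stay below the Marcenko-Pastur upper edge (1 + sqrt(p/n))^2, while a planted low-rank signal pushes a few eigenvalues past it.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 2000, 400
v = rng.normal(size=p)
v /= np.linalg.norm(v)
X = rng.normal(size=(n, p)) + 3.0 * rng.normal(size=(n, 1)) * v   # noise + rank-1 spike
evals = np.linalg.eigvalsh(X.T @ X / n)
edge = (1 + np.sqrt(p / n)) ** 2              # Marcenko-Pastur upper edge for pure noise
n_signal = int((evals > edge).sum())          # eigenvalues carrying low-rank signal
```

The talk's point is that correlated samples distort this null spectrum, so the edge must be recomputed from the phylogeny rather than from the i.i.d. formula used here.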




September 23, 2015
12-1pm, 182 George St, Room 110
Continuum limit of total variation on point clouds
Nicolas Garcia-Trillos
Prager Assistant Professor
Division of Applied Mathematics
Brown University

We consider point clouds obtained as random samples of a measure on a Euclidean domain. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. We study when the cut capacity, and more generally the total variation, on these graphs is a good approximation of the perimeter (total variation) in the continuum setting. We address this question in the setting of $\Gamma$-convergence. Applications to the study of consistency of cut-based clustering procedures will be discussed.
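For readers unfamiliar with the discrete object: given a weighted graph on the sample points, the graph total variation of an indicator function is exactly the capacity of the cut it induces. A small sketch with an illustrative Gaussian kernel and scaling (the talk concerns the regime in which such functionals $\Gamma$-converge, up to a kernel-dependent constant, to a continuum perimeter):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 500, 0.15
pts = rng.random((n, 2))                  # random sample of the unit square

# Edge weights from a Gaussian kernel of bandwidth eps (illustrative choice)
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
W = np.exp(-(dists / eps) ** 2)
np.fill_diagonal(W, 0.0)

def graph_tv(u):
    """Graph total variation sum_{ij} w_ij |u_i - u_j|; for an indicator
    function this is (twice) the capacity of the induced cut."""
    return np.sum(W * np.abs(u[:, None] - u[None, :]))

# Indicator of the half-square {x < 1/2}; with the scaling 1/(eps * n^2) this
# quantity approximates a weighted perimeter of the interface as n grows
u = (pts[:, 0] < 0.5).astype(float)
cut = graph_tv(u) / (eps * n ** 2)
```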



October 7, 2015
12-1pm, 182 George St, Room 110
Local Shape from Shading with a Generic Constraint
Ben Kunsberg
Prager Assistant Professor
Division of Applied Mathematics
Brown University

Humans have a remarkable ability to infer shape from shading (SFS) information. In computer vision this is often formulated with a Lambertian reflectance function, but it remains ill-posed and incompletely solved. Abstractly, the intensity in an image is a single-valued function and the goal is to uncover the vector-valued normal function. This ill-posedness has resulted in many proposed techniques that are either regularizations or propagations from known values. Our goal is to understand, mathematically and computationally, how we solve this problem.

First, it has been shown psychophysically that our perception (via gauge figure estimates) is remarkably accurate even when the boundary is masked. Thus classical propagation approaches requiring known values along a boundary, such as characteristic curves or fast marching methods, are unlikely to model the visual system's solution.

An alternative approach requires regularization priors (in a Bayesian framework) or energy terms (in a variational framework).  However, many of the proposed priors are ad-hoc and chosen by researchers to optimize performance for a particular test dataset.  It is hard to conclude (from solely performance metrics) whether these priors are useful or accurate, e.g.  good results are functions of these priors, resolution, the optimization techniques, the test set, and so on.

In this talk, we describe a different approach. We consider the SFS problem on image patches modeled as Taylor polynomials of any order and seek to recover a solution for that patch.  We build a boot-strapping tensor framework that allows us to relate a smooth image patch to all of the polynomial surface solutions (under any light source).  We then use a generic constraint on the light source to restrict these solutions to a 2-D subspace, plus an unknown rotation matrix.  We then investigate several special cases where the ambiguity reduces and the solution can be anchored. Interestingly, these anchor solutions relate to those situations in which human performance is also veridical.




October 14, 2015
12-1pm, 182 George St, Room 110
Importance sampling in large-scale machine learning problems: why it works and how it can help
Rachel Ward
Assistant Professor
Department of Mathematics
University of Texas at Austin
 
A recent trend in signal processing and machine learning research is that exact reconstruction is achievable from highly subsampled data by passing to nonlinear, sparsity-inducing reconstruction methods such as l1 minimization. Such guarantees often require strong structural conditions on the data in addition to sparsity, such as incoherence, which render the theory unusable on problems of practical importance. Here, we show that many of these strong assumptions are tied to i.i.d. uniform sampling, and can be dropped by allowing weighted, or importance, sampling. First, we explain why importance sampling works in this context: it aims to make the inverse problem as well-conditioned as possible given a fixed sample budget. We then discuss several problem domains where importance sampling strategies can be derived explicitly, and outperform state-of-the-art sampling strategies used in practice: medical imaging, collaborative filtering, uncertainty quantification, and stochastic gradient methods. Along the way, we derive results at the intersection of applied harmonic analysis and random matrix theory that are of independent interest.
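One concrete instance of this idea (a well-known example, not necessarily the speaker's formulation) is the randomized Kaczmarz method of Strohmer and Vershynin, later interpreted by Needell, Srebro and Ward as importance-sampled SGD: sampling rows of a linear system with probability proportional to their squared norms makes the expected convergence rate depend on average conditioning rather than on the worst row. A sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 300, 10
scales = rng.uniform(0.5, 2.0, (n, 1))
A = scales * rng.standard_normal((n, d))   # rows with differing norms
x_true = rng.standard_normal(d)
b = A @ x_true                             # a consistent linear system

# Importance sampling a la Strohmer-Vershynin: pick row i with
# probability proportional to ||a_i||^2
p = np.linalg.norm(A, axis=1) ** 2
p /= p.sum()

x = np.zeros(d)
for _ in range(3000):
    i = rng.choice(n, p=p)
    a = A[i]
    x += (b[i] - a @ x) / (a @ a) * a      # project onto the i-th hyperplane

err = np.linalg.norm(x - x_true)
```

With uniform sampling the rate can be dominated by the worst-conditioned rows; with the weighted scheme above the expected rate depends only on the Frobenius-norm conditioning of A.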



October 28, 2015
12-1pm, 182 George St, Room 110
A stochastic model for Gompertzian growth
Benar Svaiter
Professor
IMPA, Brazil

The Gompertz curve has been successfully used for modeling mortality of an aging population since its creation in the 1820s. In the 1960s a time-reversed version of this curve was used by Laird to model tumoral and somatic growth. Since then, it has been the standard model for tumoral and somatic growth and has also been used for modeling regeneration. Why is it that a model for mortality has been so apt to model growth? We propose a stochastic model for cell divisions based on recent findings on DNA structure and show that Gompertzian growth is the thermodynamic limit of this model after a suitable renormalization.
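For reference, the (time-reversed) Gompertzian growth law can be written as the ODE dN/dt = r N log(K/N), which approaches the carrying capacity K double-exponentially fast. A quick numerical check of the closed form against a forward-Euler integration (parameter values are illustrative):

```python
import numpy as np

# Gompertz growth: dN/dt = r * N * log(K / N), carrying capacity K
r, K, N0 = 0.5, 1000.0, 10.0

def gompertz(t):
    """Closed-form solution N(t) = K * exp(log(N0/K) * exp(-r t))."""
    return K * np.exp(np.log(N0 / K) * np.exp(-r * t))

# Forward-Euler integration of the ODE, to be checked against the closed form
dt, N = 1e-3, N0
for _ in range(int(20 / dt)):
    N += dt * r * N * np.log(K / N)
```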



November 18, 2015
12-1pm, 182 George St, Room 110
Big data, Google and disease detection: the statistical story
Samuel Kou
Professor
Department of Statistics
Harvard University

Big data collected from the internet have generated significant interest in not only the academic community but also industry and government agencies. They bring great potential in tracking and predicting massive social activities. We focus on tracking disease epidemics in this talk. We will discuss the applications, in particular Google Flu Trends, some of their fallacies, and the statistical implications. We will propose a new model that utilizes publicly available online data to estimate disease epidemics. Our model outperforms all previous real-time tracking models for influenza epidemics at the national level of the US. We will also draw some lessons for big data applications.


Spring 2015


February 4, 2015
12-1pm, 182 George St, Room 110
Computational reconstruction and modeling of multicellular dynamics from 3D+time in vivo imaging of animal early embryogenesis. Extension to Artificial Life.
Rene Doursat
Research Scientist and Lecturer

(a) Complex Systems Institute, Paris Ile-de-France (ISC-PIF), Paris, France
(b) BioEmergences Lab, CNRS (USR 3695), Gif-sur-Yvette (Paris), France
(c) Erasmus Mundus Master's Program in Complex Systems Science, Ecole Polytechnique, Palaiseau (Paris), France

The BioEmergences platform [1,2], designed at Nadine Peyrieras's lab (b), provides automated analysis and reconstruction of collective cell movements based on time-lapse microscopy of organism development. It offers biologists advanced software tools capable of handling large amounts of 4D data through a workflow of image processing algorithms. Raw voxel-based movies of embryos (zebrafish, sea urchin), containing fluorescent proteins to highlight cell membranes and nuclei, are filtered by an edge-preserving smoothing method, then cell positions are extracted from the local maxima of these images and passed on to shape segmentation and cell-tracking modules. The output is a cell lineage tree annotated with various quantitative measurements. In parallel to this data reconstruction effort, the MecaGen platform [3], directed by myself, supports agent-based modeling and simulation of morphogenesis. Centered on the physicochemical coupling of cell mechanics with gene expression and molecular signaling, embryonic development is viewed here as a self-organized phenomenon emerging from a myriad of cells via their genetically regulated and regulating behavior. Cells' mechanical properties (division, adhesion, motility) are closely correlated with their spatial location and temporal state of genetic and molecular dynamics (protein and ligand concentrations), and affect each other concurrently. MecaGen is illustrated on morphogenetic episodes occurring during zebrafish and sea urchin early development. Exploration of parameter space is supported by experimental data from BioEmergences, which allows measuring the "fitness" of the virtual embryo and validating hypotheses. Finally, I will also briefly introduce Morphogenetic Engineering [4], an "Artificial Life" spinoff field that I founded, which takes its inspiration from biological development to create programmable and reproducible robotic, software or network architectures by decentralized self-assembly of elementary agents [5].

Selected References
[1] http://www2.die.upm.es/im/papers/Science2010.pdf
[2] http://doursat.free.fr/docs/Castro_et_al_2014_atlas_PLoS.pdf
[3] http://doursat.free.fr/docs/Delile_Doursat_Peyrieras_2013_MecaGen_chapter.pdf
[4] http://doursat.free.fr/docs/Doursat_Sayama_Michel_2013_MorphEng_NACO.pdf
[5] http://doursat.free.fr/docs/Doursat_Sanchez_2014_mapdevo_SoRo.pdf



February 25, 2015
12-1pm, 182 George St, Room 110
Scaling and Generalizing Variational Inference
David Blei
Professor of Statistics and Computer Science
Columbia University

Latent variable models have become a key tool for the modern statistician, letting us express complex assumptions about the hidden structures that underlie our data. Latent variable models have been successfully applied in numerous fields including natural language processing, computer vision, electronic medical records, genetics, neuroscience, astronomy, political science, sociology, the digital humanities, and many others.

The central computational problem in latent variable modeling is posterior inference, the problem of approximating the conditional distribution of the latent variables given the observations. Posterior inference is central to both exploratory tasks, where we investigate hidden structures that underlie our data, and predictive tasks, where we use the inferred structures to generalize about future data. Approximate posterior inference algorithms have revolutionized Bayesian statistics, revealing its potential as a usable and general-purpose language for data analysis.

Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing approximate inference algorithms do not scale well. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand. This often requires significant model-specific analysis, which precludes us from easily exploring a variety of models.

In this talk I will discuss our recent research on addressing these two limitations. First I will describe stochastic variational inference, an approximate inference algorithm for handling massive data sets. Stochastic inference is easily applied to a large class of Bayesian models, including time-series models, factor models, and Bayesian nonparametric models. I will demonstrate its application to probabilistic topic models of text conditioned on millions of articles. Stochastic inference opens the door to scalable Bayesian computation for modern data analysis.

Then I will discuss black box variational inference. Black box inference is a generic algorithm for approximating the posterior. We can easily apply it to many models with little model-specific derivation and few restrictions on their properties. Black box inference performs better than similarly generic sampling algorithms, such as Metropolis-Hastings inside Gibbs, and can be composed with stochastic inference to handle massive data. I will demonstrate its use on a suite of nonconjugate models of longitudinal healthcare data.
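The engine behind black box variational inference is the score-function gradient identity, which needs only samples from q and pointwise evaluations of the model's log density. A toy sketch for a one-parameter Gaussian family (the target, step size, and sample count are illustrative; the actual algorithm adds variance-reduction devices such as Rao-Blackwellization and control variates):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy target: an unnormalized log-density proportional to N(3, 1)
def log_p(z):
    return -0.5 * (z - 3.0) ** 2

# Variational family q(z; mu) = N(mu, 1); its score is d/dmu log q = z - mu.
# Black-box ELBO gradient: E_q[(z - mu) * (log p(z) - log q(z))],
# estimated from samples of q alone -- no model-specific derivation needed.
mu, step, S = 0.0, 0.05, 64
for _ in range(2000):
    z = mu + rng.standard_normal(S)          # samples from q(z; mu)
    log_q = -0.5 * (z - mu) ** 2
    grad = np.mean((z - mu) * (log_p(z) - log_q))
    mu += step * grad                        # stochastic gradient ascent on the ELBO
```

Here the variational mean converges to the target mean (3.0), since for this family the expected gradient is exactly 3 - mu.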

This is joint work based on these two papers:

M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. Journal of Machine Learning Research, 14:1303-1347.
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiWangPaisley2013.pdf

R. Ranganath, S. Gerrish, and D. Blei. Black box variational inference. Artificial Intelligence and Statistics, 2014.
http://arxiv.org/abs/1401.0118




March 4, 2015
12-1pm, 182 George St, Room 110
From Pixels to Local Layers: Exploring Flexible Representations for Motion Estimation
Deqing Sun
Postdoctoral Research Fellow in Computer Science
Harvard University

We live in a dynamic world where motion is ubiquitous. To exist in such an environment, robots and other intelligent agents should have the ability to perceive and understand motion. Estimating image motion and segmenting the scenes into coherently moving regions are two closely related problems but are often treated separately. Motion actually provides an important cue to identify surfaces in a scene, while segmentation may provide the proper support for motion estimation. Despite decades of research efforts, current methods still tend to produce large errors near motion boundaries and in occlusion regions and falsely merge foreground objects with the background.

I will start from a probabilistic layered model for joint motion estimation and segmentation. We order each moving object (layer) in depth and explicitly model the occlusions between layers. We model the segmentation using thresholded spatio-temporally coherent support functions and the motion using globally coherent but locally flexible priors. Our model enforces that scene structures (segmentation), instead of motion, should persist over time. Our method achieves promising results on both the Middlebury optical flow benchmark and the MIT layer segmentation dataset, particularly in occlusion regions.

Noting that "global" layered models cannot capture mutual or self-occlusions or deal with too many layers, I will introduce a local layering representation that breaks the scenes into local layers and jointly models the motion and occlusion relationship between local layers. By retaining uncertainty on both the motion and the occlusion relationship, we can avoid local minima common to motion-only or occlusion-only approaches. Our method can handle motion and occlusion well for both challenging synthetic and real sequences.

Finally, I will show an application of motion estimation to interactive intrinsic video editing. We introduce a fast and temporally consistent algorithm to decompose video sequences into their reflectance and illumination components. One key observation is that reflectance is an intrinsic property of physical surfaces and tends to persist over time, while lighting may vary. The temporally consistent decomposition results allow illumination-aware video editing, such as retexturing and lighting-aware compositing.

Joint work with Michael J. Black, Erik B. Sudderth, Ce Liu, Hanspeter Pfister, Nicolas Bonneel, Kalyan Sunkavalli, James Tompkin, and Sylvain Paris




March 18, 2015
12-1pm, 182 George St, Room 110
Using evolutionary sequence variation to make inferences about protein structure and function
Lucy Colwell
Lecturer in Molecular Informatics
Department of Chemistry
University of Cambridge, UK

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data.

The challenge is to distinguish true interactions from the noisy and under-sampled set of observed correlations in a large multiple sequence alignment. To address this we build a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair interactions. We translate these interactions into pairwise distance constraints between amino acids and use them to generate all atom structural models. Using proteins of known structure we show that correlations between amino acids at different sites in a protein contain sufficient information to predict low resolution tertiary protein structure of both globular and transmembrane proteins. We then apply our method to predict de novo the structure of 11 medically important transmembrane proteins of unknown structure. In addition we are able to predict protein quaternary structure and alternative conformations. The next step requires development of a theoretical inference framework that enables the relationship between the amount of available input data and the reliability of structural predictions to be better understood.
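A minimal sketch of the inference step, using the mean-field approximation in which maximum entropy pair couplings are read off from the inverse of the regularized sample covariance of the encoded alignment. Everything below is synthetic and illustrative (a binary toy "alignment" with one planted interacting pair), not the pipeline of the talk:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic binary "alignment": L sites, with sites 0 and 3 strongly coupled
L, N = 6, 5000
seqs = rng.integers(0, 2, size=(N, L))
flips = (rng.random(N) < 0.1).astype(int)
seqs[:, 3] = seqs[:, 0] ^ flips            # site 3 copies site 0, with 10% noise

# Mean-field maximum entropy inference: with a binary alphabet one indicator
# column per site suffices, and couplings are proportional (up to sign) to
# the inverse of the regularized sample covariance
X = seqs.astype(float)
C = np.cov(X.T) + 1e-3 * np.eye(L)
J = -np.linalg.inv(C)
np.fill_diagonal(J, 0.0)

# The strongest inferred interaction should be the planted pair (0, 3)
i, j = np.unravel_index(np.abs(J).argmax(), J.shape)
```

The real problem uses a 21-letter alphabet and phylogenetically correlated sequences, which is exactly why the under-sampling issues discussed above arise.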




April 1, 2015
12-1pm, 182 George St, Room 110
Adaptive Bayesian Estimation of Conditional Densities
Andriy Norets
Associate Professor
Department of Economics
Brown University

Nonparametric estimation of conditional distributions is important in empirical work across many fields. The Bayesian approach to this problem has several attractive properties. It does not require fixing a bandwidth or similar tuning parameters. Instead, it provides estimates of the objects of interest where the tuning parameters are averaged out with respect to their posterior distribution. Also, the Bayesian approach performs well in out-of-sample prediction and Monte Carlo exercises. In this talk, I will discuss theoretical properties of Bayesian nonparametric models and provide an explanation for their excellent performance in applications.

I will focus on mixtures of Gaussian densities with covariate dependent mixing weights and a variable number of mixture components for which a prior on positive integers is specified. Conditional on the number of mixture components, the mixing weights are modelled by a multinomial logit with a common scale parameter. This model is closely related to mixture-of-experts also known as smooth mixtures in econometrics. The main theoretical result of the talk is that the posterior in this model contracts at a minimax rate up to a logarithmic factor. The assumed prior distribution does not depend on the smoothness level of the true conditional density. Thus, the obtained posterior contraction rate is adaptive across all smoothness levels.
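The conditional density being modelled has an explicit form: multinomial-logit weights that vary with the covariate, mixing Gaussian components. A sketch with made-up parameter values and fixed component means (the actual model also lets the number of components vary, with a prior on positive integers):

```python
import numpy as np

def softmax(a):
    a = a - a.max()
    e = np.exp(a)
    return e / e.sum()

def cond_density(y, x, alpha, beta, mu, sigma):
    """p(y|x) = sum_k w_k(x) N(y; mu_k, sigma_k^2), with multinomial-logit
    mixing weights w(x) = softmax(alpha + beta * x)."""
    w = softmax(alpha + beta * x)
    comps = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(w @ comps)

# Illustrative two-component model evaluated at one covariate value
alpha = np.array([0.0, 1.0]); beta = np.array([1.0, -1.0])
mu = np.array([-1.0, 2.0]);   sigma = np.array([0.5, 1.0])
dens = cond_density(0.5, 0.3, alpha, beta, mu, sigma)
```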

I will also briefly discuss some applications and estimation methods for nonparametric Bayesian modelling of conditional densities.

Background papers:
1. "Adaptive Bayesian Estimation of Conditional Densities" by Norets and Pati,
http://arxiv.org/pdf/1408.5355.pdf
2. "Posterior Consistency in Conditional Density Estimation by Covariate Dependent Mixtures" by Norets and Pelenis, Econometric Theory, 2014
http://www.econ.brown.edu/fac/Andriy_Norets/papers/consmixreg.pdf




May 6, 2015
12-1pm, 182 George St, Room 110
Two Recent Information Theoretic Variations on the Theme of Patterns in Security
Muriel Medard
Cecil H. Green Professor
Electrical Engineering and Computer Science Department
Massachusetts Institute of Technology

We overview two different sets of results based upon the effect of patterns in security. In the first part, we consider limits of inference, a problem that emerges when we seek to ascertain what limits to privacy we can expect when machine learning algorithms, whose theoretical basis often relies on principal inertia components, are applied to mining publicly available data that may be related, in loosely known ways, to private data. Lower bounds for the average probability of error of estimating a hidden variable X given an observation of a correlated random variable Y, and Fano's inequality in particular, play a central role in information theory. We present a lower bound for the average estimation error based on the marginal distribution of X and the principal inertias of the joint distribution matrix of X and Y, thus providing limits to privacy. Furthermore, we investigate how to answer a fundamental question in inference and privacy: given an observation Y, can we estimate a function f(X) of the hidden random variable X with an average error below a certain threshold? We provide a general method for answering this question using an approach based on rate-distortion theory.
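For orientation, the classical Fano bound referred to here (which the talk refines via principal inertias) reads:

```latex
P_e \;\geq\; \frac{H(X \mid Y) - 1}{\log_2 |\mathcal{X}|},
\qquad P_e = \Pr\bigl[\hat{X}(Y) \neq X\bigr],
```

where $H(X \mid Y)$ is the conditional entropy in bits and $\mathcal{X}$ is the alphabet of $X$: whatever estimator $\hat{X}$ is used, residual uncertainty about $X$ given $Y$ forces a minimum average error.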

In the second part, we consider recent results on guesswork, the characterization of the process of guessing sequences such as passwords. We note that what may appear to be even slight differences in the distributions of these sequences may lead to differences that are exponential in guesswork, leading to possibly surprising results, such as the failure of the oft-assumed uniformity of compressed sources, and the fact that friendly jamming of an intended user may be advantageous. We conclude with our recently defined notion of inscrutability rate, used to quantify the asymptotic difficulty of guessing U out of V secret strings. Unexpectedly, the inscrutability rate of any finite-order Markov string-source with hidden statistics remains the same as in the unhidden case, i.e., the asymptotic value per symbol of hiding the statistics is vanishing.

Joint work with Ahmad Beirami, Robert Calderbank, Mark Christiansen, Ken Duffy, Flavio du Pin Calmon, Stefano Tessaro, Mayank Varia


Fall 2014


October 1, 2014
12-1pm, 182 George St, Room 110
Statistical mechanics for real biological networks
William Bialek
Professor
Department of Physics
Princeton University

It is an old dream that ideas from statistical physics could help us understand the phenomena that emerge from biological networks, be they networks of genes, networks of neurons, or networks of organisms. In recent years, it has become possible to make increasingly accurate, simultaneous measurements on the states of (almost) all the nodes in such networks. I'll discuss the efforts that my colleagues and I are making to connect these data to statistical physics models. The key idea is the (quite old) maximum entropy principle: we try to build models that are consistent with some measured properties of the network (e.g., the correlations among the states of pairs of elements) but otherwise have as little structure as possible. I will use the example of a flock of birds to explain how this works, and to explain our surprise that it works so well. Statistical mechanics teaches us that, as systems become large, the parameter space breaks up into phases, and this is also true for families of maximum entropy models. Thus, we can ask where real networks are in the phase diagram of possible networks. For the flock, we'll see that the system is poised very close to a critical surface. We can go through a similar (but much more complex) analysis for a network of neurons in the vertebrate retina, and surprisingly we find the same answer - the system seems to be close to criticality, and we can detect hints of criticality in other systems as well. It seems that we are either seeing signs of something general, or we are fooling ourselves, and I'll outline a path to telling the difference.



October 8, 2014
12-1pm, 182 George St, Room 110
A Simple Pattern-Theoretic Representation
Oren Freifeld
Postdoctoral Associate
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Reasoning over real-world signals consists of three conceptual steps: representation, modeling and inference. Here, we focus on a specific choice of representation, but one that is strongly motivated by modeling and inference considerations.

The transformational approach, a cornerstone in Ulf Grenander's Pattern Theory, replaces the representation of objects with the representation of transformations acting on the objects. Particularly, diffeomorphisms, a special class of transformations, often play a pivotal role. Typically, the choice of a diffeomorphism space involves a trade-off: such spaces range from ones that are simple (that is a good thing) but of limited expressiveness, to those that are highly-expressive (i.e., are able to capture a wide range of patterns) but are complicated and computationally-demanding. Thus, despite their mathematical elegance and sophistication, the applicability of the latter has been somewhat limited, especially for large data sets or when time is of the essence. Moreover, owing to their complexity, the application of modern approaches to modeling (e.g., Bayesian nonparametric models) and inference (e.g., MCMC) to highly-expressive spaces of diffeomorphisms presents significant mathematical and algorithmic challenges. This is typified by the case of analysis-by-synthesis methods, wherein multiple evaluations of likelihood functions (or cost functions in the deterministic setting) rely on computing multiple transformations.

In this talk I will present our recent, work-in-progress, efforts to facilitate tractable modeling and inference for such spaces. We propose a novel space of transformations which is not only highly-expressive but also (embarrassingly-) simple. Moreover, it supports highly-accurate and fast computations of transformations thus enabling rapid likelihood evaluations. Finally, the representation also lends itself to multi-scale modeling and easy constructions and evaluations of priors.  While the representation has certain interesting differential-geometric aspects, the talk will focus on the practical issues of implementation (requiring only basic ideas from linear algebra and first-order differential equations) and the implications for modeling and inference.  Preliminary results demonstrate the applicability of the proposed approach in computer-vision tasks such as image warping or registration, and machine-learning tasks such as monotonic regression, time-warp analysis, and modeling CDF- or histogram-valued data.

This is joint work with Soren Hauberg, Julian Straub, Kayhan Batmanghelich, and John Fisher.




October 22, 2014
12-1pm, 182 George St, Room 110
Algorithms for Interpretable Machine Learning
Cynthia Rudin
Associate Professor of Statistics
Sloan School of Management and
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Possibly *the* most important obstacle in the deployment of predictive models is the fact that humans simply do not trust them. If it is known exactly which variables were important for the prediction and how they were combined, this information can be very powerful in helping to convince people to believe (or not believe) the prediction and make the right decision. In this talk I will discuss algorithms for making these non-black box predictions including:

1) "Bayesian Rule Lists" - This algorithm builds a decision list using a probabilistic model over permutations of IF-THEN rules. It competes with the CART algorithm.

2) "Supersparse Linear Integer Models" - This algorithm produces scoring systems, which are a type of model that is widely used in the medical community. It proposes an alternative to the Lasso method.

I will show applications to healthcare, including an alternative to the CHADS_2 score, which is one of the most widely used scoring systems in medicine. Our model was trained on over 1000 times as much data as CHADS_2 and is more accurate, but just as interpretable. I will also show preliminary joint work with the MGH Sleep Lab on diagnosing forms of sleep apnea.

At the end of the talk, I will discuss random cool applications of machine learning being worked on in the Prediction Analysis Lab.

links:

Building Interpretable Classifiers with Rules using Bayesian Analysis
http://web.mit.edu/rudin/www/LethamRuMcMa14.pdf
This is joint work with my student Ben Letham and colleagues Tyler McCormick and David Madigan.

Method and Models for Interpretable Linear Classification
http://arxiv.org/abs/1405.4047
This is joint work with my student Berk Ustun.

Other links are on my webpage under "Interpretable Predictive Models"
http://web.mit.edu/rudin/www/PredicticsPapers.html



November 5, 2014
12-1pm, 182 George St, Room 110
Cross-Study Validation versus Randomized Cross-Validation as Data Accumulate
Lo-Bin Chang
Visiting Assistant Professor
Department of Applied Mathematics and Statistics & The Center for Imaging Science
Johns Hopkins University

In recent years "reproducibility" has emerged as a key factor in evaluating predictors of disease phenotypes. In particular, "validation" is undermined when error rates on data collected from new studies exceed those originally reported, which is hardly surprising for a heterogeneous population. In this talk, I will provide a statistical formulation in the large sample limit: sub-populations are modeled as components of a mixture and all error rates are optimal (Bayes) for a two-class problem. For any number m of studies, the error rate in cross-study validation exceeds that in ordinary randomized cross-validation, the latter (averaged) increases with m, and both converge to the optimal rate.



November 12, 2014
12-1pm, 182 George St, Room 110
Multiscale models for shapes and images
Pedro Felzenszwalb
Associate Professor
School of Engineering and Department of Computer Science
Brown University

Markov models are widely used in a variety of applications in many different fields.  In computer vision Markov models are commonly used as priors for shapes and images.  Markov models capture regularities of objects such as sequences or images by modeling their local properties.  This often leads to tractable learning and inference problems.  However it is clear that in many applications realistic priors need to capture non-trivial long-range correlations and high-order properties of objects.  Such models can be difficult to estimate and often lead to intractable inference problems.  In this talk I will discuss an approach for defining models using a multiscale representation of an object.  The idea involves modeling local properties of an object at multiple resolutions.  This leads to a natural low-dimensional parameterization for high-order models.  The resulting models also have structure that can lead to efficient inference algorithms.  I will discuss two instances of the approach in the context of computer vision.  One instance involves modeling curves for recognizing shapes and for detecting shapes in images.  Another instance involves modeling images to recover hidden structure from noisy measurements.




November 20, 2014 (Thursday)
Joint Pattern Theory / LCDS Seminar
4-5pm, Wilson Hall Room 102

Geometric graph-based methods for high dimensional data
Andrea Bertozzi
Professor
Department of Mathematics
UCLA

We present new methods for segmentation of large datasets with graph based structure. The method combines ideas from classical nonlinear PDE-based image segmentation with fast and accessible linear algebra methods for computing information about the spectrum of the graph Laplacian. The goal of the algorithms is to solve semi-supervised and unsupervised graph cut optimization problems. I will present results for image processing applications such as image labeling and hyperspectral video segmentation, and results from machine learning and community detection in social networks, including modularity optimization posed as a graph total variation minimization problem.
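As a minimal illustration of the spectral ingredient (not the speaker's full method, which couples such spectra with PDE-style dynamics on the graph): for a similarity graph built on two well-separated clusters, the sign of the graph Laplacian's second eigenvector, the Fiedler vector, already recovers the binary cut. All sizes and kernels below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two well-separated Gaussian clusters in the plane (synthetic data)
m = 100
pts = np.vstack([rng.normal(0.0, 0.3, (m, 2)), rng.normal(3.0, 0.3, (m, 2))])

# Fully connected similarity graph with Gaussian weights
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)
Lap = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian

# Second-smallest eigenvector (Fiedler vector); its sign pattern is a 2-way cut
evals, evecs = np.linalg.eigh(Lap)
labels = evecs[:, 1] > 0
```

For large graphs, the methods discussed compute only a few such eigenvectors with fast linear algebra rather than a full eigendecomposition.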





December 3, 2014
12-1pm, 182 George St, Room 110

Hypothesis-guided dimensionality reduction and its application to large-scale neuroscience
John Cunningham
Assistant Professor
Department of Statistics
Columbia University

Like many fields, neuroscience is experiencing dramatic increases in the quantity and complexity of recorded data, with the goal of similarly dramatic scientific and technological advances. Critical to realizing these ambitions are the analysis methods to interrogate large datasets jointly. Many questions in the field reduce to a question of "hypothesis-guided" dimensionality reduction: given a large number of noisy time-series, what shared low-dimensional process best describes the data in terms of that hypothesis? I will begin by briefly describing a few applications of this dimensionality reduction in neuroscience. I will then introduce a generic, hypothesis-guided linear dimensionality reduction solver using an optimization program over the Stiefel and Grassmann manifolds. To demonstrate the utility of this solver, I will apply it to neural recordings from the motor cortex. Here we show that the seemingly hopeless complexity of neural responses can be simply described by a low-dimensional dynamical system at the level of the neural population. Finally, I will describe a critical statistical test of the significance of this hypothesis, verifying that the inferred structure is not a trivial consequence of high-dimensional data.
http://stat.columbia.edu/~cunningham/
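The manifold optimization at the heart of such solvers can be sketched in a toy setting. The snippet below (a minimal illustration, not Cunningham's actual solver) maximizes the PCA-style objective tr(QᵀCQ) over orthonormal 10x2 matrices Q by gradient ascent with a QR retraction onto the Stiefel manifold; the synthetic data, step size, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "population recording": 50 time points in 10 dimensions,
# with two directions of much higher variance than the rest.
X = rng.standard_normal((50, 10)) @ np.diag([5.0, 3.0] + [0.5] * 8)
C = np.cov(X, rowvar=False)

def retract(A):
    """Map a matrix back onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(A)
    return Q * np.sign(np.diag(R))   # fix column signs for uniqueness

# Gradient ascent on f(Q) = tr(Q^T C Q) over orthonormal 10x2 matrices:
# step along the Euclidean gradient 2*C@Q, then retract.
Q = retract(rng.standard_normal((10, 2)))
for _ in range(500):
    Q = retract(Q + 0.01 * (2.0 * C @ Q))

captured = float(np.trace(Q.T @ C @ Q))
pca_optimum = float(np.sort(np.linalg.eigvalsh(C))[-2:].sum())
```

For this particular objective the optimum coincides with the top two principal components; hypothesis-guided variants keep the same manifold machinery but swap in an objective encoding the scientific hypothesis.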


Spring 2014


February 12, 2014
12-1pm, 182 George St, Room 110
Challenging issues in Likelihood Inference on Mixture Models
Daeyoung Kim
Assistant Professor
Department of Mathematics and Statistics
University of Massachusetts Amherst

Modern computing power and algorithms have greatly increased interest in mixture models as an effective tool for modeling heterogeneity in data. Statistical inference for finite mixture models can be done by the method of maximum likelihood (ML) which provides, in a single package, methods for point estimation, hypothesis testing and construction of confidence sets. Despite attractive features of finite mixture models, there are several challenges for likelihood based inference. In this talk we will address two problems: empirical identifiability on the mixture parameters and multiple local maximizers in the mixture likelihood.



February 26, 2014
12-1pm, 182 George St, Room 110
Productivity and Reuse in Language: Non-parametric Bayesian Models of Lexical Acquisition
Timothy O'Donnell
Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology

A much-celebrated aspect of language is the way in which it allows us to express and comprehend an unbounded number of thoughts. This property is made possible because language consists of several combinatorial systems which can productively build novel forms using a large inventory of stored, reusable parts: the lexicon.

For any given language, however, there are many potentially storable units of structure, each giving rise to many more ways of forming novel expressions than are actually used in practice. For example, English contains suffixes which are highly productive and generalizable (e.g., -ness; Lady-Gagaesqueness, pine-scentedness) and suffixes which can only be reused in specific words, and cannot be generalized (e.g., -th; truth, width, warmth). How are such differences in generalizability and reusability represented? What are the basic, stored building blocks at each level of linguistic structure? When is productive computation licensed and when is it not? How can the child acquire these systems of knowledge?

I will discuss several mathematical models of productivity (computation) and reuse (storage) at different levels of linguistic structure. These models all treat the problem as a tradeoff between storage of frequent (sub)structures and the need to productively generalize to novel cases. The computation/storage tradeoff provides an illuminating new perspective on the standard Bayesian tradeoff between fit to data and simplicity/generalizability of hypotheses. In these models, the tradeoff is formalized using tools from Bayesian non-parametric statistics.

The distinction between rules and forms that generalize and those that don't gives rise to several deep questions about the nature of the linguistic system. While the tradeoff between productivity and reuse is consistent across different levels of linguistic structure, and can be modeled using Bayesian non-parametrics, linguistically-informed representations at each level are crucial for correctly capturing the empirical patterns in the data. Working with such rich, structured representations leads to significant inference challenges, which I will also briefly discuss.



March 12, 2014
12-1pm, 182 George St, Room 110
Robust inference on parameters via particle filters and sandwich covariance matrices
Neil Shephard
Professor
Departments of Economics and Statistics
Harvard University

Likelihood based estimation of the parameters of state space models can be carried out via a particle filter. In this paper we show how to make valid inference on such parameters when the model is incorrect. In particular we develop a simulation strategy for computing sandwich covariance matrices which can be used for asymptotic likelihood based inference. These methods are illustrated on some simulated data.




April 2, 2014
12-1pm, 182 George St, Room 110
Can connectomics help us understand neural computation? Insights from the fly visual system
Dmitri Chklovskii
Group Leader
Janelia Farm Research Campus

Animal behavior arises from computations in neuronal circuits, but our understanding of these computations has been frustrated by the lack of detailed synaptic connection maps, or connectomes. For example, despite intensive investigations over half a century, the neuronal implementation of local motion detection in the insect visual system remains elusive. We developed a semi-automated pipeline using electron microscopy to reconstruct a connectome, containing 379 neurons and 8,637 chemical synaptic contacts, within the Drosophila optic medulla. By matching reconstructed neurons to examples from light microscopy, we assigned neurons to cell types and assembled a connectome of the repeating module of the medulla. Within this module, we identified cell types constituting a motion detection circuit, and showed that the connections onto individual motion-sensitive neurons in this circuit were consistent with their direction selectivity. Our identification of cell types involved in motion detection allowed targeting of extremely demanding electrophysiological recordings by other labs. Preliminary results from such recordings show time delays confirming our findings. This demonstrates that connectomes can provide key insights into neuronal computations.



April 9, 2014
12-1pm, 182 George St, Room 110
Division of Applied Mathematics / Center for Vision Research Seminar

The computational magic of the ventral stream: a theory (and why some deep architectures work)
Tomaso Poggio
Professor
NSF Center for Brains, Minds and Machines
McGovern Institute
Computer Science and Artificial Intelligence Laboratory
Brain Sciences Department
Massachusetts Institute of Technology

Following on an idea of Stu Geman, I believe that the present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples (n → ∞). The next phase is likely to focus on algorithms capable of learning from very few labeled examples (n → 1), like humans seem able to do. I will introduce a new approach to this problem describing the underlying theory, based on the unsupervised, automatic learning of a "good" representation for supervised learning, characterized by small sample complexity (n).

We consider the case of visual object recognition, though the theory applies to other domains. The starting point is the conjecture, which we can prove in specific cases, that image representations which are invariant to translation, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and unique (discriminative) signature can be computed for each image patch. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such estimates. The theory extends existing deep learning convolutional architectures for image and speech recognition. It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition — and that this representation may be continuously learned in an unsupervised way during development and visual experience.



April 16, 2014
12-1pm, 182 George St, Room 110
The Blended Paradigm:  A Bayesian approach to handling outliers and misspecified models
Steven MacEachern
Professor
Department of Statistics
Ohio State University

Bayesian methods have proven themselves to be enormously successful across a wide range of scientific problems, with analyses ranging from the simple one-sample problem to complicated hierarchical models.  They have many well-documented advantages over competing methods.  However, Bayesian methods run into difficulties for two major and prevalent classes of problems--handling data sets with outliers and dealing with model misspecification.  In both cases, standard Bayesian analyses fall prey to the hubris that is an integral part of the Bayesian paradigm.  The large sample behavior of the analysis is driven by the likelihood.  We propose the use of restricted likelihood as a single solution to both of these problems.  When working with restricted likelihood, we summarize the data, x, through a set of (insufficient) statistics T(x) and update our prior distribution with the likelihood of T(x) rather than the likelihood of x.  By choice of T(x), we retain the main benefits of Bayesian methods while reducing the sensitivity of the analysis to selected features of the data.  The talk will motivate the blended paradigm, discuss properties of the method and choice of T(x), cover the main computational strategies for its implementation, and illustrate its benefits.
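As an illustration of the restricted-likelihood idea, here is a minimal sketch under assumptions not taken from the talk: a normal location model with one gross outlier, T(x) chosen as the sample median, and the likelihood of T(x) approximated by the median's asymptotic normal sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0.0, 1.0, 99), [50.0]])  # one gross outlier

# Full-likelihood Bayes for a normal mean (flat prior, known variance):
# the posterior mean equals the sample mean, dragged by the outlier.
full_post_mean = float(x.mean())

# Restricted likelihood: condition only on T(x) = median(x), whose
# sampling distribution is approximately N(mu, pi * sigma^2 / (2 n)).
t = np.median(x)
n, sigma2 = len(x), 1.0
grid = np.linspace(-3.0, 3.0, 1201)
loglik = -(t - grid) ** 2 / (2.0 * np.pi * sigma2 / (2.0 * n))
post = np.exp(loglik - loglik.max())
post /= post.sum()
restricted_post_mean = float(grid @ post)
```

The full-data posterior mean is pulled toward the outlier, while the restricted-likelihood posterior stays near the bulk of the data, at the cost of conditioning on an insufficient statistic.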



April 23, 2014
12-1pm, 182 George St, Room 110
Climate Informatics: Recent Advances and Challenge Problems for Machine Learning in Climate Science
Claire Monteleoni
Assistant Professor
Department of Computer Science
George Washington University

The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied, such as the field of bioinformatics, machine learning is poised to accelerate discovery in climate science. Our recent progress on climate informatics reveals that collaborations with climate scientists also open interesting new problems for machine learning. I will give an overview of challenge problems in climate informatics, and present recent work from my research group in this nascent field.

A key problem in climate science is how to combine the predictions of the multi-model ensemble of global climate models that inform the Intergovernmental Panel on Climate Change (IPCC). I will present three approaches to this problem. Our Tracking Climate Models (TCM) work demonstrated the promise of an algorithm for online learning with expert advice, for this task. Given temperature predictions from 20 IPCC global climate models, and over 100 years of historical temperature data, TCM generated predictions that tracked the changing sequence of which model currently predicts best. On historical data, at both annual and monthly time-scales, and in future simulations, TCM consistently outperformed the average over climate models, the existing benchmark in climate science, at both global and continental scales. We then extended TCM to take into account climate model predictions at higher spatial resolutions, and to model geospatial neighborhood influence between regions. Our second algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model from which TCM is derived, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best climate model at a given location. We recently applied a third technique, sparse matrix completion, in which we create a sparse (incomplete) matrix from climate model predictions and observed temperature data, and apply a matrix completion algorithm to recover it, yielding predictions of the unobserved temperatures.
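The abstract does not give TCM's update rules, but the underlying idea of online learning with expert advice can be sketched with a generic Fixed-Share-style learner. Everything below is hypothetical: three constant "climate models", a best model that switches halfway through the record, and arbitrarily chosen learning and switching rates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical "climate models" as constant predictors; the best
# one switches from 1.0 to 2.0 halfway through the observation record.
experts = np.array([0.0, 1.0, 2.0])
truth = np.concatenate([np.full(100, 1.0), np.full(100, 2.0)])
truth = truth + 0.1 * rng.standard_normal(200)

eta, alpha = 2.0, 0.02          # learning rate and switching rate
w = np.ones(3) / 3
preds = []
for y in truth:
    preds.append(w @ experts)            # predict before seeing y
    losses = (experts - y) ** 2
    w = w * np.exp(-eta * losses)        # multiplicative weight update
    w = w / w.sum()
    w = (1 - alpha) * w + alpha / 3      # Fixed-Share: mix toward uniform

tracking_mse = float(np.mean((np.array(preds) - truth) ** 2))
average_mse = float(np.mean((experts.mean() - truth) ** 2))
```

The Fixed-Share mixing step keeps a small floor under every model's weight, so the learner can re-identify the best model within a few steps of a switch; a plain average over models cannot adapt at all.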



April 30, 2014
12-1pm, 182 George St, Room 110
Division of Applied Mathematics / Center for Vision Research Seminar

Computational Anatomy, High Throughput Neuro-Imaging Informatics, and the BrainCloud
Michael Miller
Professor
Biomedical Engineering and Electrical and Computer Engineering
Johns Hopkins University

Brain parsing is a core technology in Computational Anatomy supporting high-throughput indexing of brains. It is allowing us, together with the Susumu Mori laboratory, to build several brain clouds in which brains can be searched and retrieved, and on which machine learning can be performed.  We will discuss the notions of brain parsing and "geodesic positioning" central to Computational Anatomy, which allow complex medical imagery to be understood in terms of the neuroscience ontology at 1mm scale.

The second half of the talk will focus on our medical findings in two of the several neurodegenerative diseases being studied worldwide: dementia of the Alzheimer’s type and Huntington’s disease. We will describe our results in preclinical dementia in the BIOCARD project led by Marilyn Albert, attempting to characterize the onset and trajectory of biomarker change in temporal lobe structures (amygdala, hippocampus, and entorhinal cortex) prior to conversion to clinical Alzheimer’s symptomatology. Second, we will discuss results in Huntington’s disease from the PREDICT-HD project, led by the University of Iowa, examining motor and subcortical structure change as a function of genetic load.


Fall 2013


September 11, 2013
12-1pm, 182 George St, Room 110
Parameter estimation robust to low-frequency contamination
Adam McCloskey
Assistant Professor
Department of Economics
Brown University

We provide methods to robustly estimate the parameters of stationary ergodic short-memory time series models in the potential presence of additive low-frequency contamination. The types of contamination covered include level shifts (changes in mean) and monotone or smooth time trends, both of which have been shown to bias parameter estimates towards regions of persistence in a variety of contexts. The estimators presented here minimize trimmed frequency domain quasi-maximum likelihood (FDQML) objective functions without requiring specification of the low-frequency contaminating component. We provide two approaches, allowing for either thin or heavy-tailed data. When proper sample size-dependent trimmings are used, the FDQML estimators are consistent and asymptotically normal, asymptotically eliminating the presence of any spurious persistence. These asymptotic results also hold in the absence of additive low-frequency contamination, enabling the practitioner to robustly estimate model parameters without prior knowledge of whether contamination is present. Popular time series models that fit into the framework of this article include ARMA, stochastic volatility, GARCH and ARCH models. We explore the finite sample properties of the trimmed FDQML estimators of the parameters of some of these models, providing practical guidance on trimming choice. Empirical estimation results suggest that a large portion of the apparent persistence in certain volatility time series may indeed be spurious.

Joint work with Jonathan B. Hill (UNC-Chapel Hill).



October 9, 2013
12-1pm, 182 George St, Room 110
Model selection in a large compositional space
Roger Grosse
Massachusetts Institute of Technology

We often build complex probabilistic models by "composing" simpler models -- using one model to generate the latent variables for another model. This allows us to express complex distributions over the observed data and to share statistical structure between different parts of a model. I'll present a space of matrix decomposition models defined by the composition of a small number of motifs of probabilistic modeling, including clustering, low rank factorizations, and binary latent factor models. This compositional structure can be represented by a context-free grammar whose production rules correspond to these motifs. By exploiting the structure of this grammar, we can generically and efficiently infer latent components and estimate predictive likelihood for nearly 2500 model structures using a small toolbox of reusable algorithms. Using a greedy search over this grammar, we automatically choose the decomposition structure from raw data by evaluating only a small fraction of all models. The proposed method typically finds the correct structure for synthetic data and backs off gracefully to simpler models under heavy noise. It learns sensible structures for datasets as diverse as image patches, motion capture, 20 Questions, and U.S. Senate votes, all using exactly the same code.
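The grammar-over-motifs idea can be sketched with a toy context-free grammar. The symbols below (G for a Gaussian matrix, M for multinomial cluster assignments, B for binary factors) loosely echo the notation of this line of work, but the production rules and bracketing here are simplified illustrations.

```python
# Toy context-free grammar over modeling motifs: a component G (Gaussian
# noise) can be refined by a low-rank factorization, a clustering, or a
# binary latent factor model.
productions = {
    "G": ["GG+G",   # low-rank factorization
          "MG+G",   # clustering (M: multinomial assignments)
          "BG+G"],  # binary latent factors
}

def expand(structure, depth):
    """Enumerate all structures reachable in at most `depth` expansions."""
    results = {structure}
    if depth == 0:
        return results
    for i, sym in enumerate(structure):
        for rhs in productions.get(sym, []):
            child = structure[:i] + "(" + rhs + ")" + structure[i + 1:]
            results |= expand(child, depth - 1)
    return results

models = expand("G", 2)   # 25 distinct structures within two expansions
```

Even this tiny grammar grows quickly with depth, which is why a greedy search guided by predictive likelihood, rather than exhaustive enumeration, is needed in practice.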




October 23, 2013
12-1pm, 182 George St, Room 110
High-dimensional statistics
Sahand Negahban
Assistant Professor
Department of Statistics
Yale University

The focus of my research is to develop theoretically sound methods, which are both computationally and statistically efficient, for extracting information from large datasets. A salient feature of my work has been to understand how hidden low-complexity structure in large datasets can be used to develop computationally and statistically efficient methods for extracting meaningful information for high-dimensional estimation problems. My work borrows from and improves upon tools of statistical signal processing, machine learning, probability and convex optimization.



November 6, 2013
12-1pm, 182 George St, Room 110
Dirichlet process mixture inconsistency for the number of components, and dimension mixture models
Jeffrey Miller
Brown University

Note: The first half will essentially be a practice talk for my NIPS 2013 presentation.

For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of components occurring so far, that is, the posterior on the number of clusters in the observed data. However, it turns out that this posterior is not consistent; it does not converge to the true number of components.

Motivated by this finding, we examine an alternative approach to Bayesian nonparametric models. Many of the commonly-used nonparametric models are infinite-dimensional, such as Dirichlet or Pitman-Yor process mixtures (DPMs/PYMs), Hierarchical Dirichlet processes (HDPs), and Indian buffet processes (IBPs). A less common but very natural type of Bayesian nonparametric model is constructed by taking a family of finite-dimensional models and putting a prior on the dimension --- that is, taking a mixture of finite-dimensional models. Interestingly, this approach gives rise to combinatorial stochastic processes that closely parallel those of DPMs/PYMs, HDPs, and IBPs, and has certain advantages over the infinite-dimensional approach.
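A quick way to build intuition for the first result is to simulate the Chinese restaurant process underlying the DPM prior: the number of occupied tables grows roughly like α log n instead of settling at a fixed value. (This simulation illustrates only the prior's behavior; the talk's inconsistency result concerns the posterior.)

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_num_tables(n, alpha, rng):
    """Number of occupied tables after seating n customers in a CRP."""
    k = 0
    for i in range(n):
        # Customer i+1 opens a new table with probability alpha / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k

alpha = 1.0
sizes = [100, 1000, 10000]
tables = [np.mean([crp_num_tables(n, alpha, rng) for _ in range(50)])
          for n in sizes]
# The expected count is ~ alpha * log(n): it keeps growing with n rather
# than stabilizing at any finite "true" number of components.
```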



November 20, 2013
12-1pm, CIT 134
Interactive Demonstrations in the Brown Robotics Lab
Chad Jenkins
Associate Professor
Department of Computer Science
Brown University

This meeting will be a tour of the Brown Robotics Lab in the CIT Building.  This tour will consist of interactive demonstrations of the PR2 robot for perception and physical manipulation of tabletop objects using data from visual streams.  This demo will be complemented with a description of the underlying computational methods, such as processing of point clouds from RGBD data, object recognition, and adjustable autonomy for human-robot interaction.  Time permitting, demos of other robots, such as the Rethink Baxter, Suitable Beam, and AR.Drone quad-rotor helicopter, will occur with a description of their vision-related technical challenges.



December 4, 2013
12-1pm, 182 George St, Room 110
Big data in neuroscience: Where is the information?
Joachim Buhmann
Professor
Department of Computer Science
ETH Zurich

Neuroscience provides data at very different scales: microscopic imagery (ssTEM) measures information on the structure of neurons, diffusion tensor imaging tracks fibers between brain areas and the hemispheres, and fMRI data measures spatial activity patterns to estimate dynamic causal models. At all three length scales, machine learning methods extract complex structures from vast amounts of data. Model selection plays an important role in controlling the complexity of the explanation. Furthermore, supervision information is very scarce, since human annotators display a high error rate on these ambiguous data sources. We advocate a maximum entropy approach to interpreting such data sources, with model complexity regularized by an information-theoretic model selection principle.


Spring 2013


January 30, 2013
12-1pm, 182 George St, Room 110
Composite Likelihood
Nancy Reid
University Professor
Department of Statistics
University of Toronto

In complex multivariate settings, it is sometimes much simpler to construct a 'likelihood' from lower dimensional marginal or conditional distributions: the general construction was called composite likelihood in Lindsay (1988). Under mild conditions the resulting composite maximum likelihood estimate is consistent and asymptotically normal, with asymptotic variance of the 'sandwich' form that also arises in mis-specified models. A recent workshop at the Banff International Research Station on composite likelihood had as one goal to identify important directions for future research in the theory of composite likelihood, in software development, and in methodology for different areas of application. I will give an overview of the progress made at this workshop and highlight areas where more research is needed.
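A minimal numerical sketch of the construction, under assumptions chosen purely for illustration: data from an equicorrelated trivariate normal with unit variances, and a pairwise composite likelihood formed by summing bivariate marginal log-likelihoods over all three pairs.

```python
import numpy as np

rng = np.random.default_rng(5)

# Equicorrelated trivariate normal, unit variances, true correlation 0.5.
rho_true = 0.5
Sigma = np.full((3, 3), rho_true) + np.diag([1 - rho_true] * 3)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=2000)

def pair_loglik(x, y, r):
    """Bivariate normal log-density with unit variances and correlation r."""
    q = (x**2 - 2 * r * x * y + y**2) / (1 - r**2)
    return -np.log(2 * np.pi) - 0.5 * np.log(1 - r**2) - 0.5 * q

def composite_loglik(r):
    """Sum of bivariate marginal log-likelihoods over all pairs."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    return sum(pair_loglik(X[:, i], X[:, j], r).sum() for i, j in pairs)

grid = np.linspace(-0.9, 0.9, 181)
rhat = grid[np.argmax([composite_loglik(r) for r in grid])]
```

The maximizer recovers the true correlation without ever forming the trivariate density; honest standard errors for such an estimator require the sandwich-form asymptotic variance mentioned above, which this sketch does not compute.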



February 20, 2013
12-1pm, 182 George St, Room 110
Nonstationary Modeling Through Dimension Expansion
Luke Bornn
Assistant Professor
Department of Statistics
Harvard University

If atmospheric, agricultural, and other environmental systems share one underlying theme, it is complex spatial structure, shaped by features such as topography and weather. Ideally we might model these effects directly; however, information on the underlying causes is often not routinely available. Hence, when modeling environmental systems there is a need for a class of spatial models that does not rely on the assumption of stationarity.

In this talk, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multi-dimensional scaling, group lasso, and latent variable models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic test-bed data set historically used to study nonstationary models. Following this, we explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which gives rise to a nonstationary field.




February 27, 2013
12-1pm, 182 George St, Room 110
Bayesian model sampling in reinforcement learning
Michael Littman
Professor
Department of Computer Science
Brown University

Reinforcement learning (RL) is a subfield of machine learning concerned with decision makers that adapt their behavior given utility-based feedback.   While temporal difference methods dominate the field, there are a growing number of researchers examining Bayesian statistics as a way of learning about unfamiliar environments while modeling their own uncertainty and addressing the classic exploration/exploitation dilemma.  I will introduce the RL problem and provide some background into existing Bayesian methods, as well as presenting current efforts from our lab to generalize these approaches to broader model classes and more efficient computation.



March 20, 2013
12-1pm, 182 George St, Room 110
Sparse superposition codes: Communication by regression
Andrew Barron
Professor
Department of Statistics
Yale University

A solution is presented to the problem of provable, fast, reliable, capacity-achieving communications for the additive Gaussian noise channel. This is an important real communications problem for which the fundamental limits of reliability were established in Shannon's 1948 capacity theorem. Not until the 1990s were practical high-rate schemes developed; these now form an essential part of the cell phone revolution, yet their reliability has been demonstrated only empirically. The codes we develop here are sparse superposition codes based on a high-dimensional regression model, and they have fast encoders and decoders based on iterative regression fits. In this presentation we describe the framework permitting theoretical demonstration of the desired properties. The error probability is shown to scale favorably with the size of the code: it is exponentially small for every fixed communication rate below capacity. Remaining challenges are discussed in understanding how the complexity scales when the rate approaches capacity. This work is joint with Antony Joseph and Sanghee Cho.



April 3, 2013
12-1pm, 182 George St, Room 110
Learning to behave by reading
Regina Barzilay
Associate Professor
Department of Electrical Engineering and Computer Science
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

In this talk, I will address the problem of grounding linguistic analysis in control applications, such as game playing and robot navigation. We assume access to natural language documents that describe the desired behavior of a control algorithm (such as game strategy guides). Our goal is to demonstrate that knowledge automatically extracted from such documents can dramatically improve performance of the target application. First, I will present a reinforcement learning algorithm for learning to map natural language instructions to executable actions. This technique has enabled automation of tasks that until now have required human participation -- for example, automatically configuring software by consulting how-to guides. Next, I will present a Monte-Carlo search algorithm for game playing that incorporates information from game strategy guides. In this framework, the task of text interpretation is formulated as a probabilistic model that is trained based on feedback from Monte-Carlo search. When applied to the Civilization strategy game, a language-empowered player outperforms its traditional counterpart by a significant margin.



April 10, 2013
12-1pm, 182 George St, Room 110
Encoding binary neural codes in networks of threshold-linear neurons
Carina Curto
Assistant Professor
Department of Mathematics
University of Nebraska-Lincoln (UNL)

Networks of neurons in the brain encode preferred patterns of neural activity via their synaptic connections. Despite receiving considerable attention, the precise relationship between network connectivity and encoded patterns is still poorly understood. Here we consider this problem for networks of threshold-linear neurons whose computational function is to learn and store a set of binary patterns (e.g., a neural code) as "permitted sets" of the network.  We introduce a simple Encoding Rule that selectively turns "on" synapses between neurons that co-appear in one or more patterns.  The rule uses synapses that are binary, in the sense of having only two states ("on" or "off"), but also heterogeneous, with weights drawn from an underlying synaptic strength matrix S.  Our main results precisely describe the stored patterns that result from the Encoding Rule -- including unintended "spurious" states -- and give an explicit characterization of the dependence on S. As a consequence, we find that certain neural codes are "natural" in the context of these networks; i.e., the structure of the code closely matches the structure of emerging spurious states, allowing the full code to be accurately learned from a highly undersampled set of patterns. Interestingly, many commonly observed neural codes in cortical and hippocampal areas are natural in this sense. As an application, we construct networks that encode hippocampal place field codes nearly exactly, following presentation of only a small fraction of patterns. To obtain our results, we prove new theorems using classical ideas from convex and distance geometry, such as Cayley-Menger determinants, revealing a novel connection between these areas of mathematics and coding properties of neural networks.

This is joint work with Anda Degeratu and Vladimir Itskov.
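The Encoding Rule itself is simple enough to state in a few lines of code. The sketch below builds only the synaptic matrix (the patterns and the strength matrix S are made up, and the network dynamics and permitted-set analysis from the talk are not reproduced): synapses are binary in the on/off sense, but an "on" synapse takes its heterogeneous weight from S.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# Hypothetical symmetric synaptic strength matrix S with zero diagonal.
S = rng.uniform(0.1, 1.0, size=(n, n))
S = (S + S.T) / 2
np.fill_diagonal(S, 0.0)

# Binary patterns to store, each a set of co-active neurons.
patterns = [{0, 1, 2}, {2, 3}, {3, 4, 5}]

# Encoding Rule: a synapse (i, j) is turned "on" -- taking its weight
# from S -- exactly when neurons i and j co-appear in at least one
# stored pattern; otherwise it stays "off" (zero).
W = np.zeros((n, n))
for p in patterns:
    for i in p:
        for j in p:
            if i != j:
                W[i, j] = S[i, j]
```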



April 17, 2013
12-1pm, 182 George St, Room 110
Towards a general theory of human learning and reasoning
Charles Kemp
Assistant Professor
Department of Psychology
Carnegie Mellon University

People learn and reason about animals, spatial relations, kinsfolk, and many other domains, and solve a broad range of inductive problems within each of these domains.  The full set of domains and the full set of inductive problems within these domains can be collectively described as the conceptual universe. I will present a systematic characterization of the conceptual universe that helps to clarify the relationships between familiar inductive problems such as property induction, categorization, and stimulus generalization, and that introduces new inductive problems for psychological investigation. I will illustrate the framework using case studies that include behavioral and computational studies of inductive reasoning, and a formal analysis of kinship classification across cultures.



May 8, 2013
12-1pm, 182 George St, Room 110
Scale-invariance and metrics on spaces of plane curves
Matt Feiszli

The study of metrics and distances on spaces of curves and surfaces is an active area of research with applications in vision and imaging. We will present metrics on spaces of plane curves using techniques from harmonic, complex, and geometric analysis. Their behavior is best understood through the notion of scale, and we will discuss geometric ways to characterize their multiscale behavior. We will discuss their application to problems in shape deformation and shape matching, and demonstrate scale-invariant approaches to curve matching on standard test datasets.


Fall 2012


October 3, 2012
12-1pm, 182 George St, Room 110
Recovering Large Networks via Optimizing Non-likelihood Functions
Rossi Luo
Assistant Professor
Department of Biostatistics and Center for Statistical Sciences
Brown University

Graphical models are used to describe the relationships between multiple variables, and a class of undirected graphical models can be uncovered from estimating covariance matrices. Popular approaches include optimizing (regularized) likelihood functions, especially when the sample size is much smaller than the number of variables. In this high dimensional setting, I will describe a few non-likelihood optimization approaches to estimate large covariance and inverse covariance matrices. These approaches are based on exploring algebraic properties of these matrices and low dimensional assumptions, such as sparsity and low rank. Convergence rates of these estimators are obtained for finite samples, when the underlying distribution has either exponential or polynomial tails. All the proposed methods can be formulated as convex optimization problems. A first-order algorithm is proposed to solve one of the non-smooth optimization problems, and its global iterative complexity is also proved. Numerical merits of these methods are demonstrated using simulated and real datasets from the brain sciences.



October 24, 2012
12-1pm, 182 George St, Room 110
Active Learning under Margin Assumptions
Sivan Sabato
Postdoctoral Researcher
Microsoft Research New England

We derive and analyze a new, efficient, pool-based active learning algorithm for half-spaces. Most previous algorithms show exponential improvement in the label complexity assuming that the distribution over the instance space is close to uniform. This assumption rarely holds in practical applications. Instead, we study the label complexity under a large-margin assumption---a much more realistic condition, as evidenced by the success of margin-based algorithms such as SVM. Our algorithm is computationally efficient and comes with formal guarantees on its label complexity. The analysis is based on new results for submodular optimization. The algorithm also naturally extends to the non-separable case and to non-linear kernels. We further show experimentally that our approach yields superior label complexity to previous active learners on real problems.

Joint work with Alon Gonen and Shai Shalev-Shwartz.




November 7, 2012
12-1pm, 182 George St, Room 110
Natural image statistics and image restoration
Yair Weiss
Professor

School of Computer Science and Engineering
The Hebrew University of Jerusalem

Learning a statistical model of natural images is a longstanding topic of research in disciplines ranging from engineering to computational neuroscience. Somewhat embarrassingly, however, algorithms based on these statistical models do not give state-of-the-art performance in image restoration, and are outperformed by "block matching" methods that do not have an explicit probabilistic model. In this work, we show that many widely used statistical models of images are actually not very good density models and that a simple, unconstrained Gaussian Mixture Model (GMM) can give much higher likelihood to unseen images. Using the learned GMM, we obtain state-of-the-art image restoration performance and, by examining what the GMM has learned, we gain new insights into the statistics of natural images that are not captured by most existing models.

Joint work with Daniel Zoran.
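The core object in this line of work is a full-covariance GMM fit to mean-subtracted image patches, scored by average log-likelihood. A toy sketch using scikit-learn follows; the synthetic "image", patch size, and component count are illustrative stand-ins for the natural-image corpora and settings actually used:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Stand-in "image": smooth horizontal gradients plus noise.
img = np.cumsum(rng.normal(size=(64, 64)), axis=1)

# Extract 8x8 patches and remove each patch's mean (DC component).
ps = 8
patches = np.array([img[i:i + ps, j:j + ps].ravel()
                    for i in range(0, 64 - ps, 4)
                    for j in range(0, 64 - ps, 4)])
patches -= patches.mean(axis=1, keepdims=True)

# An unconstrained, full-covariance GMM over patches.
gmm = GaussianMixture(n_components=5, covariance_type='full',
                      random_state=0).fit(patches)
# Average log-likelihood per patch: the figure of merit compared across priors.
print(gmm.score(patches))
```

Comparing this score on held-out patches across candidate priors is the kind of density-model comparison the abstract describes.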




November 14, 2012
12-1pm, 182 George St, Room 110
Linking signaling pathways and dynamic regulatory networks
Anthony Gitter
Postdoctoral Researcher
Microsoft Research New England

Adaptation to ever-changing environmental conditions is vital to the survival of all organisms. In many stress responses upstream proteins detect an environmental stimulus and propagate signals to the nucleus where transcription factors activate or inhibit genes. Although these upstream proteins and temporal changes in gene expression can be readily observed, the pathways and transcription factors involved remain hidden. We present a strategy for integrating large-scale data to discover the signaling and transcriptional components of stress response. Our model captures the dynamics of transcription factor activity and the directed pathways that activate these regulators, which are inferred from an undirected protein interaction network. Because our approach requires little condition-specific data, it is widely applicable to many conditions and species. Joint work with Miri Carmi, Naama Barkai, and Ziv Bar-Joseph.



Spring 2012


February 1, 2012
12-1pm, 182 George St, Room 110
Statistical analysis of populations with interacting and interfering units
Edo Airoldi
Assistant Professor
Department of Statistics, and FAS Center for Systems Biology
Harvard University

A number of scientific endeavors of current national and international interest involve populations with interacting and/or interfering units. In these problems, a collection of partial measurements about patterns of interaction and interference (e.g., social structure and familial relations) is available, in addition to the more traditional measurements about unit-level outcomes and covariates. Formal statistical models for the analysis of this type of data have emerged as a major topic of interest in diverse areas of study. Probability models on networks date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online social networking websites such as Facebook and LinkedIn, and a host of more specialized professional networking communities, have intensified interest in the study of networks, structured measurements and interference. In this talk, I will review a few ideas and open areas of research that are central to this burgeoning literature, placing emphasis on inference and other core statistical issues. Topics include elements of sampling and inference from non-ignorable (network sampling) designs, and semi-parametric modeling, with hints to the applications to social, biological and information networks that motivate these statistical problems.



February 8, 2012
12-1pm, 182 George St, Room 110
Exploiting Sparse Structure by Spectral Connectivity Analysis
Ann Lee
Associate Professor
Department of Statistics
Carnegie Mellon University

For naturally occurring data, the dimension of the given input space is often very large while the data themselves have a low intrinsic dimensionality. Spectral kernel methods are non-linear techniques for transforming data into a coordinate system that efficiently reveals the underlying structure -- in particular, the "connectivity" -- of the data. In this talk, I will focus on one particular technique -- diffusion maps -- but the analysis can be used for other spectral methods as well. I will give examples of various applications of the method in high-dimensional regression and density estimation. I will also present a new extension of the diffusion framework to comparing distributions in high-dimensional spaces with an application to image retrieval and texture discrimination. (Part of this work is joint with R.R. Coifman, D. Liu, C. Schafer and L. Wasserman.)
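The diffusion-map construction mentioned above follows a standard recipe: build a Gaussian affinity kernel, normalize it to a Markov transition matrix, and embed using its top non-trivial eigenvectors. A compact sketch (the bandwidth `eps` and the toy circle dataset are illustrative):

```python
import numpy as np

def diffusion_map(X, eps=0.5, n_coords=2, t=1):
    """Toy diffusion-map embedding: Gaussian kernel, Markov normalization
    via the symmetric conjugate, then the top non-trivial eigenvectors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))           # symmetric conjugate of P = D^-1 K
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-vals)[1:1 + n_coords]   # skip the trivial top eigenvector
    psi = vecs[:, idx] / np.sqrt(d)[:, None]  # right eigenvectors of P
    return (vals[idx] ** t) * psi

# Points on a noisy circle: the leading diffusion coordinates recover the loop.
rng = np.random.default_rng(2)
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(60, 2))
Y = diffusion_map(X)
print(Y.shape)
```

The diffusion time `t` controls how much the embedding emphasizes coarse connectivity over fine local structure.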



March 21, 2012
12-1pm, 182 George St, Room 110
The tree of life and the evolution of genome function
Casey Dunn
Manning Assistant Professor
Ecology and Evolutionary Biology
Brown University

There are estimated to be about 10 million species on the planet, though much of what we know about genome function is the result of work in about 10 species - the laboratory model organisms. Work on model organisms has been extraordinarily productive over the past 100 years, but our focus on such a small number of organisms means we have only looked through a keyhole at the diversity of genome function in life on earth. Many interesting biological phenomena remain out of reach, we know little about gene function in the context of macro-evolution and ecosystem interactions, and we have only observed gene function within narrow environmental conditions. We are now on the cusp of being able to do functional genomics research in the other 10 million species on the planet, right in their natural environments, without growing them in the lab. But to do so will require a complete retooling of data acquisition, entirely new analysis approaches, and a shift in perspective that embraces differences between species rather than obsessing over global invariants. Here I outline some of these challenges, including the new analysis challenges we face.



March 28, 2012
12-1pm, 182 George St, Room 110
What is the neural foundation of the vision process?
Christoph von der Malsburg
Senior Fellow
Frankfurt Institute of Advanced Studies

The current consensus version of the neural basis of vision -- or of cognition in general -- makes it difficult to model some of the central operations to be performed during perception and learning.  I will offer the dynamic link architecture as an alternative physical basis for cognitive processes, will discuss it as a natural framework for vision and other cognitive processes, and will illustrate and support the argument with the help of a concrete model of invariant object recognition.



April 18, 2012
12-1pm, 182 George St, Room 110
LP Relaxations for Global Models in Natural Language Processing
Sebastian Riedel
Research Scientist
University of Massachusetts Amherst

Much work in NLP relies on pipelines of classifiers. For example, when mining biomedical text we usually first extract proteins and other entities, and then determine their relations to each other.  Such pipelines suffer from cascading errors: once an upstream processor errs, downstream processors often cannot recover. This has led to work that tackles NLP problems in a more global and joint manner.

In this talk I will first present our work on addressing biomedical event extraction as a global optimization problem. This approach achieved state-of-the-art results on several benchmark datasets, and ranked first in the latest international competition on this task. I will illustrate the enabling technology behind our global approach: Linear Programming (LP) relaxations of the global optimization problem. In particular, I will show how we divide event extraction into tractable subproblems and apply dual decomposition to efficiently find LP optima.

In the second part of my talk I will illustrate our ongoing work on addressing the computational challenge that arises from a superlinear number of variables in our LP relaxations. I will show how we use row and column generation to reduce the runtime of a global dependency parsing model by a factor of 8, without loss of optimality guarantees. Our method relies on upper bounds for the reduced costs of variables, which we derive from "grammar constants" of our probabilistic model.



April 25, 2012
12-1pm, 182 George St, Room 110
Hallucinating scene detail and recognizing scene attributes
James Hays
Assistant Professor
Department of Computer Science
Brown University

In this talk I will discuss two recent projects. First I will present a massively data-driven approach to super-resolution. In contrast to the recent image enhancement literature, we show that image databases can be more useful than internal image statistics if the right representations are used. Our super-resolution method can insert scene-specific textures and transitions beyond the capabilities of prior work. Second, I will introduce the new SUN Attribute Database. We use crowd-sourcing to discover a taxonomy of attributes and label 14,000 images from 700 scene categories with more than one hundred attributes. We show the first results for attribute recognition and argue that attributes are a natural way to describe the space of scenes.



May 9, 2012
12-1pm, 182 George St, Room 110
Computational Challenges in Molecular Medicine
Donald Geman
Professor
Department of Applied Mathematics and Statistics
The Johns Hopkins University

Despite the "omics" revolution, advances in clinical practice through fundamental research in systems biology are rare.  First, I will talk about the technological, mathematical and translational barriers encountered in attempting to apply statistical learning to cancer genomics using high-throughput transcriptional data.  Then I will propose a methodology based on comparisons of molecular counts, and argue that these are easy to interpret, can account for combinatorial interactions among genes and gene products and might even support a mechanistic interpretation for the underlying decision rules.  These ideas will be illustrating by predicting disease phenotypes, the site of origin of metastatic tumors and pathway deregulation.



May 16, 2012
12-1pm, Metcalf Auditorium (101)
Division of Applied Mathematics & Center for Vision Research Seminar
Lessons from Photographing and Identifying the World's Plant Species
Peter Belhumeur
Professor
Department of Computer Science
Columbia University

Columbia University, the University of Maryland, and the Smithsonian Institution are working on visual recognition software to help identify species from photographs. I will discuss our work on developing Leafsnap -- the first in a series of electronic field guides. As part of this work, we have completed photographing close to one third of the world's plant species and have begun capturing beautiful high-resolution images of live specimens. Our work has led us in new research directions for the visual recognition of human faces, dog breeds, and bird species, including the adoption of centuries-old techniques from taxonomy for the process of labeling images with visual attributes and object parts. In particular, I will show that it is possible to automatically recognize a wide range of visual attributes and parts in images and use them in numerous applications of computer vision.


Fall 2011


September 14, 2011
12-1pm, 182 George St, Room 110
Non-parametric Priors for Segmentation of Medical Images
Polina Golland
Associate Professor
Department of Electrical Engineering and Computer Science, and CSAIL
Massachusetts Institute of Technology

We propose a non-parametric probabilistic model for prior knowledge in segmentation of medical images. The resulting inference algorithms register individual training images to the new image, transfer the segmentation labels and fuse them to obtain the final segmentation of the test subject. Our generative model yields previously proposed label fusion algorithms as special cases, and also leads to a new variant that aggregates evidence for the segmentation label locally. We demonstrate the advantages of our approach in two clinical applications: segmentation of neuroanatomical structures and segmentation of the left heart atrium whose shape varies significantly across the population.



October 5, 2011
12-1pm, 182 George St, Room 110
Dynamic regulation of decision threshold by frontal cortex and basal ganglia
Michael Frank
Associate Professor
Department of Cognitive, Linguistic, & Psychological Sciences
Department of Psychiatry and Human Behavior
Brown University

The basal ganglia and frontal cortex interact to support reward-based decision making. While medial prefrontal cortex (mPFC) is implicated in processing uncertainty and reinforcement conflict,  the subthalamic nucleus (STN) is thought to act as a brake on corticostriatal function, preventing premature impulsive responding. Simulations using a neural circuit model combined with drift diffusion model analysis show that the STN acts to temporarily increase the decision threshold as a function of conflicting  cortical decision plans, and that this same decision threshold modulation accounts for human behavioral data. We tested these posited neural mechanisms in experiments with electrophysiological recordings over mPFC and in the STN (intraoperative recordings from Parkinson's patients undergoing deep brain stimulation surgery). We show a relationship between mPFC activity and decision threshold adjustment which is altered by STN manipulation.  These data support the hypothesis that frontal-basal ganglia communication regulates decision processes as a function of reinforcement conflict.



October 12, 2011
12-1pm, 182 George St, Room 110
Spectral Methods for Learning Graphical Models
Sham Kakade
Senior Research Scientist
Microsoft Research New England
Associate Professor of Statistics
The Wharton School, University of Pennsylvania

This work presents a methodology for learning graphical models with hidden nodes through algebraic techniques (in particular, matrix decomposition and spectral methods), using independent samples of the observed variables. The talk focuses on tree models, and covers two aspects of the underlying learning problem: parameter estimation and structural learning. The underlying idea is to utilize the spectral decomposition of the second moment matrix to reveal the latent structure.

The first part is concerned with parameter estimation. Here, we present an efficient and provably correct algorithm for learning HMMs (i.e. recovering the correct HMM dynamics), with a sample complexity depending on some mild conditions of the underlying system. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications; it does not suffer from the local-minimum issues of non-convex optimization (as do more traditional approaches, including the EM algorithm), and it handles high dimensional observations and long range dependencies more easily. The method can be extended to estimating parameters for nonlinear systems and general tree structured graphical models with unobserved nodes.

The second part is concerned with structural learning, where we provide the Spectral Recursive Grouping algorithm, an efficient and simple procedure for recovering the underlying tree topology of a broad class of multivariate tree models with hidden nodes. Exact recovery of the tree structure can be established based on certain natural dependencies on statistical and structural properties of the underlying joint distribution.

Joint work with: Daniel Hsu, Tong Zhang; Anima Anandkumar, Kamalika Chaudhuri, Le Song
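The core primitive behind the spectral approach described above is that the second moment matrix of consecutive observations has rank equal to the number of hidden states, which an SVD exposes. A small simulation of that idea (the 2-state chain and its parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# A 2-state HMM emitting one of 6 observation symbols per step.
T = np.array([[0.9, 0.1], [0.2, 0.8]])    # state-transition matrix
O = rng.dirichlet(np.ones(6), size=2)     # one emission distribution per state

# Sample a long observation sequence.
n, s, obs = 50_000, 0, []
for _ in range(n):
    obs.append(rng.choice(6, p=O[s]))
    s = rng.choice(2, p=T[s])

# Empirical second moment: P21[j, i] ~ Pr(x_{t+1} = j, x_t = i).
P21 = np.zeros((6, 6))
for a, b in zip(obs[:-1], obs[1:]):
    P21[b, a] += 1
P21 /= n - 1

# The singular values of P21 reveal the latent rank: for a 2-state chain,
# only ~2 of them stand clearly above the sampling noise.
sv = np.linalg.svd(P21, compute_uv=False)
print(np.round(sv, 4))
```

The full algorithm goes further, using the SVD's subspaces (together with third moments) to recover observable representations of the HMM dynamics; this sketch shows only the rank-revealing step.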




October 19, 2011
12-1pm, 182 George St, Room 110
Statistical challenges in neural data analysis
Liam Paninski
Associate Professor
Department of Statistics
Columbia University

Systems and circuit neuroscience have recently experienced something of a renaissance, driven by a remarkable explosion in the development of groundbreaking new experimental tools. These methods have in turn opened up a variety of challenging statistical problems, in which we must often perform some high-dimensional inference under strong computational constraints (for example, in some cases real-time processing is required). This talk will review some recent progress on three exemplary problems, each of fundamental neuroscientific importance: 1) Optimal filtering and smoothing of voltage signals on large, complex dendritic trees; 2) Optimal decoding of sensory information from the activity of large neural populations, and 3) Inference of connectivity in large neuronal networks given limited, noisy observations.



November 2, 2011
12-1pm, Barus & Holley Room 190
Perceptual Fragments: Bottom-Up and Top-Down Use of Shape in Object Recognition
Benjamin Kimia
Professor
Division of Engineering
Brown University

The bottom-up “segmentation followed by recognition” strategy has for some time now given way to feature-based discriminative recognition with significant success.  As the number of categories and exemplars per category increases, however, low-level features are no longer sufficiently discriminative, motivating the construction and use of more complex features. It is argued here that these complex features will necessarily be encoding shape and this in turn requires curves and regions, thus reviving aspects of bottom-up segmentation strategies. We suggest that the demise of segmentation was due to prematurely committing to a grouping in the face of ambiguities and propose a framework for representing multiple grouping options in a containment graph. Specifically, we use contour symmetry to partition the image into atomic fragments and define transforms to iteratively grow these atomic fragments into more distinctive perceptual fragments, the nodes of the containment graph. We also briefly present a fragment-based language for generating shapes and the use of fragments in top-down category recognition. The bottom-up and top-down processes are then integrated by interaction through the mid-level representation of perceptual fragments.



November 30, 2011
12-1pm, 182 George St, Room 110
Exploring the role of ventral premotor cortex in reach-to-grasp movements: neural trajectories through spike train similarity space
Carlos Vargas-Irwin
Postdoctoral Research Associate in Neuroscience
Brown University

Dimensionality reduction applied to neural ensemble data has led to the concept of a 'neural trajectory', a low-dimensional representation of how the state of the network evolves over time. Here we present a novel neural trajectory extraction algorithm which combines spike train distance metrics (Victor and Purpura, 1996) with dimensionality reduction based on local neighborhood statistics (van der Maaten and Hinton, 2008). We apply this technique to describe and quantify the activity of primate ventral premotor cortex neuronal ensembles in the context of a cued reaching and grasping task with instructed delay.

References:
Victor and Purpura. Nature and precision of temporal coding in visual cortex: a metric-space analysis. J Neurophysiol. 1996 Aug;76(2):1310-26.
L.J.P. van der Maaten and G.E. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
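The cited Victor-Purpura metric is an edit distance on spike trains: inserting or deleting a spike costs 1, and shifting a spike by dt costs q·|dt|. It admits a short dynamic-programming implementation (the example spike times are arbitrary):

```python
import numpy as np

def victor_purpura(t1, t2, q=1.0):
    """Victor-Purpura spike-train distance (Victor & Purpura, 1996):
    minimal cost of transforming one spike train into the other, where
    insert/delete costs 1 and moving a spike by dt costs q*|dt|."""
    n, m = len(t1), len(t2)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)   # delete all spikes of train 1
    D[0, :] = np.arange(m + 1)   # insert all spikes of train 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + 1,
                          D[i, j - 1] + 1,
                          D[i - 1, j - 1] + q * abs(t1[i - 1] - t2[j - 1]))
    return D[n, m]

print(victor_purpura([0.1, 0.5], [0.1, 0.5]))   # identical trains: 0.0
print(victor_purpura([0.1, 0.5], [0.1, 0.6]))   # one shifted spike: ≈ 0.1
```

A pairwise distance matrix built this way can then be handed to a neighborhood-statistics embedding such as t-SNE (e.g. scikit-learn's `TSNE` with `metric='precomputed'`), which mirrors, though does not reproduce, the combination described in the abstract.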



2010-2011


October 6, 2010
12-1pm, CIT 241 (Swig Boardroom)  *** change in location ***
Learning Probabilistic Models with Deep Hierarchical Structures
Ruslan Salakhutdinov
Postdoctoral Fellow
Department of Brain and Cognitive Sciences and CSAIL
Massachusetts Institute of Technology

Building intelligent systems that are capable of extracting higher-order knowledge from high-dimensional data and successfully transferring that knowledge to learning new concepts lies at the core of solving many AI related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing.

In this talk I will first introduce a broad class of probabilistic generative models called Deep Boltzmann Machines (DBMs) that contain many layers of latent variables. I will describe a new learning algorithm for this class of models that uses variational methods and Markov chain Monte Carlo (MCMC). This new learning algorithm, in addition to a bottom-up pass, can incorporate top-down feedback, which allows DBMs to better propagate uncertainty about ambiguous inputs. I will show that these deep models can learn interesting representations and can be successfully applied in many application domains, including information retrieval, object recognition, and nonlinear dimensionality reduction. In the second part of the talk, I will describe new ways of developing more complex systems that combine Deep Boltzmann Machines with more structured hierarchical Bayesian models. I will show how these hybrid models can learn a deep hierarchical structure for sharing knowledge across hundreds of visual categories, which allows efficient learning of new categories from few, even just one, examples -- a problem known as 'one-shot learning'.




October 20, 2010
12-1pm, 182 George St, Room 110
Discovering Influential Variables: A Partition Retention Approach
Herman Chernoff
Professor Emeritus
Department of Statistics, Harvard University
Department of Applied Mathematics, MIT

A general approach pioneered by S.H. Lo and T. Zheng, the method of Partition Retention, is designed to identify those variables, among many possible explanatory variables, that have an influence on a dependent variable. This method can be successful when the influence depends on the interaction of several of these variables and the marginal influences of the individual variables are negligible.  In cases where the number of candidates is huge (in the thousands), this method involves the idea of resuscitating apparently "dead" or useless variables whose influence depends on interactions.  The method is computer-intensive and involves measuring the influence of a small group of candidates, from which the least influential are dropped one at a time until only promising ones are retained.  This procedure is repeated many times with small randomly selected subgroups of the candidate variables, and attention is focused on those which are retained most often.
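The repeated draw / backward-eliminate / count-retentions loop can be sketched as follows. The influence score below is a stand-in (a between-cell sum of squares over the partition induced by a variable subset); the actual Lo-Zheng I-score uses a different weighting, and the XOR example is only meant to show a pure interaction with no marginal effect:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)

def influence(X, y, subset):
    """Stand-in influence score: between-cell sum of squares when samples
    are partitioned by the joint values of the variables in `subset`."""
    cells = defaultdict(list)
    for row, yi in zip(X[:, subset], y):
        cells[tuple(row)].append(yi)
    gm = y.mean()
    return sum(len(v) * (np.mean(v) - gm) ** 2 for v in cells.values())

def partition_retention(X, y, n_rounds=200, k=4, keep=2):
    """Draw k random candidates per round, drop the least influential
    variable one at a time down to `keep`, and count survivors."""
    p = X.shape[1]
    counts = np.zeros(p, dtype=int)
    for _ in range(n_rounds):
        S = list(rng.choice(p, size=k, replace=False))
        while len(S) > keep:
            # Drop the variable whose removal leaves the highest score.
            drop = max(S, key=lambda v: influence(X, y, [u for u in S if u != v]))
            S.remove(drop)
        counts[S] += 1
    return counts

# y depends only on the interaction of X0 and X1 (XOR): marginals are useless.
X = rng.integers(0, 2, size=(400, 10))
y = (X[:, 0] ^ X[:, 1]).astype(float)
print(partition_retention(X, y))
```

Variables 0 and 1 tend to accumulate the most retentions even though neither has any marginal effect, which is the point of the method.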




Wednesday, November 3, 2010
12-1pm, 182 George St, Room 110
Bottom-up and top-down processing in visual perception
Thomas Serre
Assistant Professor
Department of Cognitive, Linguistic, & Psychological Sciences
Brown University

Perception involves a complex interaction between feedforward sensory-driven information and feedback attentional, memory, and executive processes that modulate such feedforward processing. A mechanistic understanding of feedforward and feedback integration is a necessary step towards elucidating key aspects of visual and cognitive functions and dysfunctions.

In this talk, I will describe a computational framework for the study of visual perception. I will present computational as well as experimental evidence suggesting that bottom-up and top-down processes make a distinct and essential contribution to the recognition of complex visual scenes. A feedforward hierarchical architecture may provide a satisfactory account of "immediate recognition" corresponding to the first few hundred milliseconds of visual processing. However, such an architecture may be limited in recognizing complex visual scenes. I will show how attentional mechanisms and cortical feedback may help improve object recognition performance in complex cluttered scenes.




Wednesday, November 17, 2010
12-1pm, 182 George St, Room 110
Point process adaptive filters and the analysis of ensemble neural spiking activity
Uri Eden
Assistant Professor
Department of Mathematics and Statistics
Boston University

Although it is well known that brain areas receive, process and transmit information via sequences of sudden, stereotyped electrical impulses, called action potentials or spikes, most analyses of neural data ignore the localized nature of these events. The theory of point processes offers a unified approach to modeling the firing properties of spiking neural systems, and assessing goodness-of-fit between a neural model and observed spiking data.

We develop a point process modeling framework and state space estimation algorithms to describe and track the evolution of dynamic representations from large neural ensembles.  This allows us to derive a toolbox of estimation algorithms and adaptive filters to address questions of static and dynamic encoding and decoding.  In our analysis of these filtering algorithms, we draw analogies to well-studied linear estimation algorithms for continuous valued processes, such as the Kalman filter and its discrete and continuous time extensions.

These methods will be illustrated in the analysis of spatially specific spiking activity in rat hippocampus. Using simple point process models, we are able to accurately characterize the localized spiking activity of these neurons as a function of the animal's position in its environment, track changes in their firing properties, reconstruct the animal's past movements, and predict its future behavior from a population of spiking neurons.
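A heavily simplified sketch of the decoding idea above: a discretized 1-D track, Poisson place-cell observations, and a predict-then-update recursion on the posterior over position. All parameters (track size, place fields, random-walk prior) are invented for illustration and are not the models used in the talk:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy 1-D place-cell decoding with a Poisson observation model.
positions = np.linspace(0, 1, 50)                 # discretized track
centers = np.linspace(0, 1, 8)                    # one place field per neuron
rates = 10 * np.exp(-(positions[:, None] - centers[None, :])**2 / 0.01) + 0.5

dt = 0.05
true_idx, decoded = 25, []
posterior = np.ones(50) / 50
for t in range(100):
    true_idx = np.clip(true_idx + rng.integers(-1, 2), 0, 49)  # random walk
    spikes = rng.poisson(rates[true_idx] * dt)                 # observed counts
    # One-step prediction: diffuse the posterior under a random-walk prior...
    prior = np.convolve(posterior, [0.25, 0.5, 0.25], mode='same')
    # ...then update with the Poisson likelihood of the spike counts.
    like = np.exp((spikes * np.log(rates * dt)).sum(1) - (rates * dt).sum(1))
    posterior = prior * like
    posterior /= posterior.sum()
    decoded.append(np.argmax(posterior))
print(decoded[-1])
```

The point process filters in the talk replace this grid recursion with continuous state-space estimates and analogies to the Kalman filter, but the predict/update structure is the same.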




February 9, 2011
12-1pm,
182 George St, Rm 110
Division of Applied Mathematics & Center for Vision Research Seminar
Learning on Analytic Manifolds
Fatih Porikli
Senior Principal Research Scientist and Technical Manager, Imaging Group
Mitsubishi Electric Research Labs, Cambridge, MA

A large number of natural phenomena can be formulated on analytic manifolds. More specifically in computer vision, such underlying notions emerge in multi-factor analysis including feature selection, pose estimation, structure from motion, appearance tracking, and shape embedding. Unlike Euclidean spaces, analytic manifolds do not exhibit global homeomorphism; thus, differential geometry is applicable only locally. This prevents the application of conventional inference and learning methods, which require vector norms.
 
Recently we introduced appearance-based object descriptors and motion transformations that exhibit Riemannian manifold structure on positive definite matrices, enabling projection of the original problems onto tangent spaces. In this manner, we do not need to flatten the underlying manifold or discover its topology. This talk will demonstrate entertaining results of manifold learning on human detection, regression tracking, unusual event analysis, and affine pose estimation.



February 16, 2011
12-1pm,
182 George St, Rm 110
Patterns of Thought (in humanoid robots)
Yiannis Aloimonos
Professor
Department of Computer Science
University of Maryland

Patterns of thought, for a cognitive system that interprets a scene or performs a task, consist of perceptual, motor and linguistic representations of the world, controlled by an attention mechanism.  The role of the attention mechanism is to facilitate parsing of the different representations according to a grammar of action.  In this talk I will introduce, for the first time, an action grammar.  It is really a disguised version of Chomsky’s Universal Grammar adapted in a minimalist framework.  Parsing requires an attention mechanism comprising visual human and object filters, active segmentation routines and a context implementing system utilizing language.  I will describe our developments in learning visual filters, achieving robust segmentation techniques and integrating vision with language and our implementations of these ideas in the field of Humanoid Robotics and the general area of video interpretation.



March 9, 2011
12-1pm,
182 George St, Rm 110
A history of applying principal component analyses to human population genetic data
Sohini Ramachandran
Assistant Professor
Department of Ecology and Evolutionary Biology, and
Center for Computational Molecular Biology
Brown University

Nearly 30 years ago, Luca Cavalli-Sforza and colleagues pioneered the use of PCA in population genetics, producing maps that summarized observed human genetic variation across many continental regions.  Patterns in these maps were interpreted as signatures of specific migration events.  As genetic datasets have grown in the number of markers used and as sequencing technology has improved, PCA has been widely applied to population genetic datasets.  I will present a review of how PCA is used in population genetics and discuss the conclusions that have been drawn via this method, as well as other observed genetic patterns in geographic space in my research on both the inference of past migrations using extant human genetic data and the genetic underpinnings of disease.



March 16, 2011
12-1pm,
182 George St, Rm 110
Solving Inference, Optimization, and Constraint Satisfaction Problems with the Divide & Concur and Belief Propagation Message-Passing Algorithms
Jonathan Yedidia
Distinguished Member Research Staff
Mitsubishi Electric Research Labs, Cambridge, MA

In "probabilistic inference" problems one tries to estimate the true state of some quantities of interest, given only noisy or ambiguous sensor measurements of related quantities.  Such problems arise in a very wide variety of fields, with applications in communications, signal processing, computer vision, speech and audio recognition, machine learning, physics, and robotics.

In this talk, I will describe and compare two particularly important algorithms that can solve probabilistic inference problems, as well as related constraint satisfaction and optimization problems: the celebrated "belief propagation" (BP) algorithm; and the "divide and concur" (D&C) algorithm.  I will show that the D&C algorithm, which was recently introduced by Gravel and Elser for solving non-convex constraint satisfaction problems, can also be understood as a message-passing algorithm that can be used on an even wider class of inference and optimization problems than BP.  Although less well-known, the D&C algorithm has some notable advantages compared with BP, in that it more naturally deals with problems with continuous valued variables, or with variables that lack local evidence.  Another advantage of D&C is that its "difference-map" dynamics enables it to avoid "traps."

If time permits, I will also describe a new decoder (developed with Yige Wang and Stark Draper) for low-density parity check codes that combines the ideas of the D&C and BP algorithms. This "difference-map belief propagation" (DMBP) decoder significantly improves error-floor performance compared with state-of-the-art BP decoders, while maintaining a similar computational complexity.



April 6, 2011
12-1pm,
182 George St, Rm 110
Division of Applied Mathematics & Center for Vision Research Seminar
A High-Throughput Screening Approach to Biologically-Inspired Object Recognition
David Cox
Principal Investigator, Visual Neuroscience Laboratory
The Rowland Institute at Harvard

Biological visual systems are currently unrivaled by artificial systems in their ability to recognize faces and objects in highly variable and cluttered real-world environments. Biologically-inspired computer vision systems seek to capture key aspects of the computational architecture of the brain, and such approaches have proven successful across a range of standard object and face  recognition tasks. However, while many models of biological object recognition share a common set of "broad-stroke" properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model. Since the number of such parameters (explicit or implicit) is typically large, and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea, or because the correct "parts" have not been tuned correctly, assembled at sufficient scale, or provided with enough training.

In this talk, I'll present a high-throughput search approach for exploring the range of biologically-inspired visual models that emphasizes simplicity, careful evaluation procedures, and scale.  Systems discovered using this approach are shown to perform surprisingly well across a range of visual problem domains and achieve state-of-the-art performance on a collection of challenging real-world face recognition datasets.  Finally, I'll discuss current limitations and future opportunities of this approach and argue for renewed attention to data set quality and model scalability.



April 20, 2011
12-1pm,
182 George St, Rm 110
Self-folding of polyhedra-- experiments and a little theory
Govind Menon
Associate Professor
Division of Applied Mathematics
Brown University

A fascinating development in materials chemistry is the construction of `large' components by self-assembly. I will describe some striking experiments from David Gracias' lab, and some preliminary mathematical work on mesoscale polyhedra that self-fold. Many problems are open, and the main purpose of this talk is to advertise some of them.



May 4, 2011
12-1pm,
182 George St, Rm 110
Logistic Regression on Data Streams
Kevin Kochanek
Research Mathematician
US Department of Defense

The emergence of high-volume data sources in fields such as finance, telecommunications, and biomedicine has led to a surge of interest in streaming data analysis. In this talk we consider the problem of constructing generalized linear models on data streams. We also present a novel streaming adaptive histogramming algorithm that facilitates our approach and is useful in other contexts as well.
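As background for the GLM-on-streams setting, here is a minimal sketch of one standard approach, not the algorithm of the talk: a logistic regression fitted with a single stochastic-gradient update per arriving observation, so each data point is seen once and discarded. The synthetic stream, true weights, and learning rate are all invented for illustration.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic stream: labels follow a logistic model with weights (2, -1).
true_w = (2.0, -1.0)

def stream(n):
    for _ in range(n):
        x = (random.uniform(-1, 1), random.uniform(-1, 1))
        p = sigmoid(true_w[0] * x[0] + true_w[1] * x[1])
        yield x, (1 if random.random() < p else 0)

# One pass over the stream: each observation updates the weights once,
# so memory use is constant regardless of stream length.
w = [0.0, 0.0]
lr = 0.1
for x, y in stream(20000):
    err = y - sigmoid(w[0] * x[0] + w[1] * x[1])
    w[0] += lr * err * x[0]
    w[1] += lr * err * x[1]
# w now fluctuates around the true weights (2, -1)
```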



May 11, 2011
12-1pm,
182 George St, Rm 110
Division of Applied Mathematics & Center for Vision Research Seminar
Statistical structures of natural scenes and neural ensemble activities
Tai-Sing Lee
Associate Professor
Computer Science Department and Center for the Neural Basis of Cognition
Carnegie Mellon University

Statistical structures of the natural environment could provide important clues for understanding the neural representations and functional circuitry underlying visual perception. We studied the statistical structures of 3D scenes, and evaluated whether these structures are encoded in the tuning properties and functional connectivities of neurons in the early visual cortex.  We found that information encoded in the distribution of disparity-tuned neurons at the population level matches the statistical structures of natural scenes in a number of aspects: the distribution of tuning curves, the correlation in tunings to different visual cues, and the functional connectivity of neurons. We also found that the ensemble activities of neurons support the notion of a probabilistic population code that allows the encoding of statistical priors of the natural scenes, the uncertainty, and the posterior distribution of inferred perceptual attributes.  These findings provide preliminary empirical insights into how pattern theory can be realized at the level of neuronal populations and circuits.


2009-2010


September 16, 2009
12-1pm,
182 George St, Rm 110
Organizational meeting & Conditional inference for non-stationary data
Matthew Harrison
Assistant Professor
Division of Applied Mathematics
Brown University

This is a new time and a new format for the pattern theory seminar, so we will begin with an organizational meeting.  Bring ideas for things you'd like to see in our seminar series.  With the remaining time, I will discuss a semi-parametric conditional inference perspective for modeling data that have fast (and interesting) dynamics confounded with slow, non-stationary (and not so interesting) dynamics.  This is work in progress with Asohan Amarasingham and Stuart Geman.



September 23, 2009
12-1pm, 182 George St, Rm 110
Latent Socio-Spatial Process Model for Social Networks
Crystal Linkletter
Assistant Professor
Department of Community Health and Center for Statistical Sciences
Brown University

With concerns of bioterrorism, the advent of new epidemics that spread with person-to-person contact, such as SARS, and the rapid growth of on-line social networking websites, there is currently great interest in building statistical models that emulate social networks. Stochastic network models can provide insight into social interactions and increase understanding of dynamic processes that evolve through society. A major challenge in developing any stochastic social network model is the fact that social connections tend to exhibit unique inherent dependencies. For example, they tend to show a lot of clustering and transitive behavior, heuristically described as “a friend of a friend is a friend.” It might be reasonable to expect that covariate similarities, or “closeness” in social space, should somehow be related to the probability of connection for some social network data. The relationship between covariates and relations is likely to be complex, however, and may in fact be different in different regions of the covariate space. Here, we present a new socio-spatial process model that smoothes the relationship between covariates and connections in a sample network using relatively few parameters, so the probabilities of connection for a population can be inferred and likely social network structures generated. Having a predictive social network model is an important step toward the exploration of disease transmission models that depend on an underlying social network.



September 30, 2009
12-1pm, 182 George St, Rm 110
Learning maximum-entropy models of salience via EM
Micha Elsner
Graduate Student
Department of Computer Science
Brown University

Coreference resolution is the task of linking a nominal expression (such as a pronoun) to its antecedent in text or speech data. We propose a maximum entropy model which ranks the syntactic environments in which antecedents tend to occur. We learn this model in an unsupervised way using the Expectation/Maximization (EM) algorithm. As an initial application, we describe a consistent, probabilistic reformulation of the pronoun resolution model given in (Charniak+Elsner 09); we then extend this to a high-precision model predicting coreference between noun phrases with the same head word. This is joint work with Eugene Charniak and Mark Johnson; it is very much still in progress, so comments and suggestions are welcome.



October 7, 2009
12-1pm, 182 George St, Rm 110
The probabilistic language of thought
Noah Goodman
Research Scientist
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology

Logic and probability (aka. 'symbols and statistics', 'structured representation and graded inference') are key themes of cognitive science that have long had an uneasy coexistence. I will describe a Probabilistic Language of Thought approach that brings them together into compositional representations with probabilistic meaning. This provides a view of cognition in which mental representations describe causal "working models" of the world, that can be used for reasoning and learning by probabilistic inference. Using this framework I will investigate human concept learning, beginning with simple categorization tasks and extending to acquisition of abstract concepts. I will describe a model of learning the meanings of number words that predicts the staged progression exhibited by children (including the striking conceptual reorganization associated with the "cardinal principle" transition). I will then briefly describe how a probabilistic language of thought explains phenomena of human reasoning, including both classic syllogistic reasoning tasks and more complex reasoning involving causality and agency.



October 14, 2009
12-1pm, 182 George St, Rm 110
SIFT Flow: Dense Scene Alignment and Its Applications
Ce Liu
Postdoctoral Researcher
Microsoft Research New England

In this talk I will introduce dense scene alignment and how it can be applied to a number of computer vision problems ranging from satellite image registration to object recognition and scene parsing. We propose SIFT flow, which establishes dense, semantically meaningful correspondence between two images across scenes by matching pixel-wise SIFT features. Using SIFT flow, we develop a new framework for image parsing by transferring metadata, such as annotations, motion and depth, from the images in a large database to an unknown query image. We demonstrate this framework using new applications such as predicting motion from a single image and motion synthesis via object transfer. Based on SIFT flow, we introduce a nonparametric scene parsing system using label transfer, with very promising experimental results suggesting that our system outperforms state-of-the-art techniques based on training classifiers.




October 21, 2009
12-1pm, 182 George St, Rm 110
Recent Advances on the Geometry of the Riemannian Manifold of Landmarks
Mario Micheli

Assistant Adjunct Professor
Department of Mathematics
University of California, Los Angeles

In the past few years there has been a growing interest, in diverse scientific communities, in endowing "shape spaces" with Riemannian metrics, so as to be able to measure similarities between shapes and perform statistical analysis on data sets (e.g. for object recognition, target detection and tracking, classification, and automated medical diagnostics). The geometry of such spaces has started to emerge only very recently; in this talk we will explore the sectional curvature for the Riemannian manifold of landmark points (which is one of the simplest, in that it is finite-dimensional) and discuss its effects on applications.



October 28, 2009
12-1pm, 182 George St, Rm 110
Statistical Analysis of Climate Ecosystem Dynamics
Surajit Ray
Assistant Professor
Department of Mathematics & Statistics
Boston University

Functional approaches to modeling the dynamics of biological systems, trends in financial cycles, and seasonal measurements of spectral bands in remote sensing are becoming increasingly popular as data analysis tools. Clustering and classification are often important final objectives of functional data analysis. But most model-based clustering techniques rely on assumptions of normal or t distributions and as such are not appropriate for clustering functional data, which often lie on non-linear manifolds. In this talk we will present a novel model-based clustering technique for analyzing functional data. Our method is based on the modal clustering methodology developed by Li, Ray and Lindsay (2007), but new visualization, computational, and inferential techniques need to be designed especially for functional data. The talk will also focus on parallelization of modal clustering. Application of functional clustering to remote sensing data will also be discussed.




*November 2, 2009 (Monday)
3:30-4:30 pm  [refreshments beginning at 3:15]
121 South Main St, Rm 245  
Center for Statistical Sciences / Division of Applied Mathematics Seminar
Application of Heteroskedastic Spatial Models to Computer Experiments
Richard A. Davis
Howard Levene Professor of Statistics
Department of Statistics
Columbia University

We consider modeling a deterministic computer response as a realization from a stochastic heteroskedastic process (SHP), which incorporates a spatially-correlated volatility process into the traditional spatially-correlated Gaussian process (GP) model. Unconditionally, the SHP is a stationary non-Gaussian process, with stationary GP as a special case. Conditional on a latent process, the SHP is a non-stationary GP. The sample paths of this process offer more modeling flexibility than those produced by a traditional GP, and can better reflect prediction uncertainty. GP prediction error variances depend only on the locations of inputs, while SHP can reflect local inhomogeneities in a response surface through prediction error variances that depend on both input locations and output responses.

We use maximum likelihood for inference, which is complicated by the high dimensionality of the latent process. Accordingly, we develop an importance sampling method for likelihood computation and use a low-rank kriging approximation to reconstruct the latent process. Responses at unobserved locations can be predicted using empirical best predictors or by empirical best linear unbiased predictors. Prediction error variances are also obtained. In examples with simulated and real computer experiment data, the SHP model is superior to traditional GP models. (This is joint work with Jay Breidt, Wenying Huang, and Ke Wang.)



*November 6, 2009 (Friday)
4:00-5:30 pm  [refreshments beginning at 3:45 pm]
Marcuvitz Auditorium (Room 220), Sidney Frank Hall, 185 Meeting St.
Division of Applied Mathematics / Center for Vision Research Seminar
Understanding Visual Scenes
Antonio Torralba

Esther and Harold E. Edgerton Associate Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Human visual scene understanding is remarkable: with only a brief glance at an image, an abundance of information is available - spatial structure, scene category and the identity of main objects in the scene. In traditional computer vision, scene and object recognition are two visual tasks generally studied separately. However, it is unclear whether it is possible to build robust systems for scene and object recognition, matching human performance, based only on local representations. Another key component of machine vision algorithms is the access to data that describe the content of images. As the field moves into integrated systems that try to recognize many object classes and learn about contextual relationships between objects, the lack of large annotated datasets hinders the fast development of robust solutions. In the early days, the first challenge a computer vision researcher would encounter would be the difficult task of digitizing a photograph. Even once a picture was in digital form, storing a large number of pictures (say six) consumed most of the available computational resources. In addition to the algorithmic advances required to solve object recognition, a key component to progress is access to data in order to train computational models for the different object classes. This situation has dramatically changed in the last decade, especially via the internet, which has given computer vision researchers access to billions of images and videos. In this talk I will describe recent work on visual scene understanding that tries to build integrated models for scene and object recognition, emphasizing the power of large databases of annotated images in computer vision.



November 11, 2009
12-1pm, 182 George St, Rm 110
Generative Models for Image Analysis
Lo-Bin Chang

Graduate Student
Division of Applied Mathematics
Brown University

A probabilistic grammar for the grouping and labeling of parts and objects, when taken together with pose and part-dependent appearance models, constitutes a generative scene model and a Bayesian framework for image analysis. To the extent that the generative model generates features, as opposed to pixel intensities, the posterior distribution (i.e. the conditional distribution on part and object labels given the image) is based on incomplete information; feature vectors are generally insufficient to recover the original intensities. I will propose a way to learn pixel-level models for the appearances of parts.  I will demonstrate the utility of the models with some experiments in Bayesian image classification.



November 18, 2009
12-1pm, 182 George St, Rm 110
Spectral Filtering Approaches to Machine Learning
Lorenzo Rosasco
Postdoctoral Fellow
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology

In this talk we discuss a computational framework for learning complex high dimensional data.  The proposed methods rely on estimating regularized versions of data-dependent kernel matrices, where regularization is achieved by filtering out the unstable components corresponding to small eigenvalues.  In many cases these matrices can be interpreted as empirical versions of underlying covariance matrices, integral operators or closely related objects, such as diffusion operators.  We will describe different strategies to implement regularization and discuss the corresponding computational properties.  The sample complexity of the methods is analyzed and shown to be optimal for a large class of problems.  Experimental results in the context of supervised classification and vector field estimation will be presented.
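Tikhonov regularization (kernel ridge regression) is the simplest instance of such a spectral filter: each eigenvalue s of the kernel matrix is rescaled by s / (s + n*lambda), which damps the unstable small-eigenvalue directions while leaving the large ones essentially intact. The sketch below illustrates this view on a toy problem; the Gaussian kernel, data, and regularization parameter are illustrative choices, not those of the talk.

```python
import numpy as np

np.random.seed(0)

# Toy 1-D regression data: noisy samples of a sine wave.
n = 30
X = np.linspace(0.0, 2.0 * np.pi, n)
y = np.sin(X) + 0.05 * np.random.randn(n)

# Gaussian kernel matrix on the inputs.
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)

# Spectral filtering view of Tikhonov regularization: eigendecompose K
# and rescale each eigenvalue s by s / (s + n*lam).  Other filters
# (truncation, Landweber iteration) just replace this rescaling function.
lam = 1e-3
s, U = np.linalg.eigh(K)
filt = s / (s + n * lam)
fitted = U @ (filt * (U.T @ y))   # regularized fit at the training points

train_mse = float(np.mean((fitted - y) ** 2))
```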



*November 20, 2009 (Friday)
4:00-5:30 pm, 182 George St, Rm 110   [refreshments beginning at 3:45]
Applied Mathematics / Center for Statistical Sciences Seminar
Error controls for multiple hypothesis testing
Zhiyi Chi

Associate Professor
Department of Statistics
University of Connecticut

Multiple hypothesis testing is challenging when signals in data are weak, the distributions of noise and signals are not completely known, or there is statistical dependency in data.  I will present some theoretical results on these issues.  I will show that self-normalized large deviations can be useful in dealing with experiment designs for weak signals, linear programming can be used to evaluate p-values when data distributions are not completely known, and techniques similar to nonlinear filtering can be used to analyze the performance of multiple testing on hidden states of Markov models that may be nonstationary.
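The canonical example of such an error control is the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The talk addresses harder settings (weak signals, incompletely known distributions, dependence), but BH fixes the baseline; the p-values below are made up for illustration.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q:
    reject the k smallest p-values, where k is the largest index i
    (1-based) with sorted p_(i) <= (i / m) * q."""
    m = len(pvals)
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= q * i / m:
            k = i
    return k

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
rejections = benjamini_hochberg(pvals)  # rejects the 2 smallest p-values
```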




February 3, 2010
12-1pm,
182 George St, Rm 110
An analysis of connectivity in neuronal population recordings
Asohan Amarasingham
Postdoctoral Fellow
Center for Molecular and Behavioral Neuroscience
Rutgers University

Little is known about how neurons interact through their spike trains in natural conditions. A chief barrier is experimental: simultaneous neuronal recordings are extremely sparse. But recent advances are expanding the number of neurons that can be observed simultaneously in a behaving animal. We will describe a recent study in which we combined simultaneous recordings of medial prefrontal cortex neurons in behaving rats with careful statistical analysis to assess the evidence for functional signatures of anatomical and physiological connectivity in that neuronal population. We will use the scientific setting as a context in which to explore some basic issues in neurophysiological data analysis, with particular emphasis on spike train-resampling techniques. This is joint work with Shigeyoshi Fujisawa, Gyorgy Buzsaki, Matthew Harrison, and Stuart Geman.



Cancelled (winter storm)
February 10, 2010
12-1pm,
182 George St, Rm 110
Detailed Human Shape and Pose from Images
Alex Balan
Graduate Student
Department of Computer Science
Brown University

Automating the process of measuring human shape characteristics and estimating body postures from images lies at the core of many applications in computer vision, computer graphics, robotics and bio-mechanics. In its most general form, this problem is severely under-constrained. It can be made more tractable by employing simplifying assumptions and relying on domain specific knowledge, or by engineering the environment appropriately.

In this thesis we demonstrate that using a data-driven model of the human body supports the recovery of both human shape and articulated pose from images, and has many benefits over previous body models. Specifically, we represent the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations. We show that the parameters of the SCAPE model can be estimated directly from image data in a variety of imaging conditions.

We first consider the case of multiple calibrated and synchronized camera views and assume the subject wears tight-fitting clothing. We define a cost function between image silhouettes and a hypothesized mesh and formulate the problem as an optimization over the body shape and pose parameters. Second, we relax the tight-fitting clothing assumption and develop a robust method that accounts for the fact that observed silhouettes of clothed people do not provide tight bounds on the true 3D shape. We think of these silhouettes as providing only weak constraints, but collect many of them while observing the subject in many poses. These, together with strong constraints from regions detected as skin, can be combined with a prior expectation of typical shapes to infer the most likely shape model under the clothes. Results on a novel database of thousands of images of clothed and "naked" subjects suggest that this method may be accurate enough for biometric shape analysis from images.



*February 12, 2010 (Friday)
4:00-5:30pm, Barus & Holley 190 [refreshments begin at 3:45pm]

Division of Applied Mathematics / Center for Vision Research Seminar
Learning Hierarchies of Sparse Visual Features
Yann LeCun
Silver Professor of Computer Science and Neural Science
The Courant Institute of Mathematical Sciences and Center for Neural Science
New York University

Intelligent tasks such as object recognition, auditory scene analysis, or language understanding require the construction of good internal representations of the world. Internal representations (or "features") must be invariant (or robust) to irrelevant variations of the input, but must preserve the information relevant to the task. An important goal of our research, and an important challenge for Machine Learning over the next few years, is to devise methods that can automatically learn good internal representations from labeled and unlabeled data. Theoretical and empirical evidence suggest that the visual world is best represented by a multi-stage hierarchy, in which features in successive stages are increasingly global, invariant, and abstract. The main question is how one can train such deep architectures from unlabeled data and limited amounts of labeled data.

We describe a class of methods to train multi-stage systems in which each stage performs a series of convolutions followed by simple non-linearities. The unsupervised learning phase is based on sparse coding methods, but includes a feed-forward predictor that gives a quick approximation of the sparse code. A number of such stages are stacked and trained sequentially in an unsupervised manner. The entire system is then refined in a supervised manner.

An application to category-level object recognition with invariance to pose and illumination will be described.  By stacking multiple stages of sparse features, and refining the whole system with supervised training, state-of-the-art accuracy can be achieved on standard datasets with very few labeled samples. A real-time demo will be shown. Another application to vision-based navigation for off-road mobile robots will be shown. After a phase of off-line unsupervised learning, the system autonomously learns to discriminate obstacles from traversable areas at long range using labels produced with stereo vision for nearby areas.

This is joint work with Y-Lan Boureau, Karol Gregor, Raia Hadsell, Koray Kavakcuoglu, and Marc'Aurelio Ranzato.



February 17, 2010
12-1pm,
182 George St, Rm 110
Message Passing Algorithms for Compressed Sensing
Andrea Montanari
Assistant Professor
Department of Electrical Engineering and Department of Statistics
Stanford University

Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics.  Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis.  Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in important large-scale applications.

Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization.

We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models.
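The flavor of such an iteration can be sketched as follows. This is a generic approximate-message-passing-style loop, soft thresholding plus an extra "Onsager" correction term in the residual update, with invented problem sizes and a heuristic threshold schedule; it is a sketch of the idea, not the authors' exact algorithm or tuning.

```python
import numpy as np

np.random.seed(0)

n, m, k = 200, 100, 10                    # signal length, measurements, sparsity
x0 = np.zeros(n)
x0[:k] = np.random.choice([-1.0, 1.0], k)
A = np.random.randn(m, n) / np.sqrt(m)    # normalized Gaussian sensing matrix
y = A @ x0                                # noiseless undersampled measurements

def soft(v, t):
    """Soft-thresholding: the scalar denoiser of iterative thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, z = np.zeros(n), y.copy()
for _ in range(50):
    theta = 2.0 * np.linalg.norm(z) / np.sqrt(m)   # heuristic threshold level
    x_new = soft(x + A.T @ z, theta)
    # The "costless modification": a memory term in the residual update,
    # inspired by belief propagation (the Onsager reaction term).
    onsager = (np.count_nonzero(x_new) / m) * z
    z = y - A @ x_new + onsager
    x = x_new

rel_err = np.linalg.norm(x - x0) / np.linalg.norm(x0)
```

Dropping the `onsager` term recovers plain iterative soft thresholding, which converges more slowly and tolerates less undersampling in this regime.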

Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.

This is based on joint work with D.L. Donoho, A. Maleki and with M. Bayati.



March 3, 2010
12-1pm,
182 George St, Rm 110
Topic Models: Priors, Stop Words and Languages
Hanna Wallach
Senior Postdoctoral Research Associate
Department of Computer Science
University of Massachusetts Amherst

In this talk, I will compare several classes of structured priors for topic models, and show that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. This combination of priors substantially increases the robustness of topic models to the highly skewed word frequency distributions common in natural language. I will also present two topic models that cannot easily rely on standard stop word pruning techniques and therefore require such robustness. The first is a topic-based predictive language model that incorporates both n-gram statistics and latent topic variables, while the second is a polylingual topic model that discovers topics aligned across multiple languages from massive collections of interlinked documents.
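The practical difference between symmetric and asymmetric priors can be seen directly by sampling. Below is a hedged illustration with invented parameter values: an asymmetric Dirichlet over document-topic proportions makes a designated "background" topic a priori dominant in every document, which is what lets it absorb skewed high-frequency words without explicit stop word pruning.

```python
import random

random.seed(0)

def dirichlet(alpha):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

# Asymmetric prior: topic 0 plays a 'background' role (alpha values invented).
alpha = [5.0, 0.5, 0.5, 0.5]
draws = [dirichlet(alpha) for _ in range(2000)]
mean_props = [sum(d[i] for d in draws) / len(draws) for i in range(4)]
# mean_props[0] is near 5/6.5 (about 0.77); the others are near 0.077 each
```

A symmetric prior (all alpha values equal) would instead treat every topic as a priori exchangeable.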



March 10, 2010
12-1pm,
182 George St, Rm 110
Respondent-Driven Sampling for Networks: Degrees of Uncertainty with Uncertain Degrees
Joe Blitzstein
Assistant Professor
Department of Statistics
Harvard University

Respondent-driven sampling (RDS) is a network sampling design (introduced by Heckathorn) which is widely and increasingly being used to study properties of individuals in a social network, e.g., HIV prevalence. The standard estimators make many strong assumptions, including that the degree d of an individual in the sample is known exactly and that this individual is equally likely to recruit any of his or her d neighbors in the social network.

Classical issues of model-based vs. design-based inference arise here, with new challenges from having also to account for the network structure. We explore the bias-variance tradeoff encountered in this setting, and show how the uncertainty in the measurement of degrees propagates into uncertainty in the RDS estimates. We then show how Bayesian and importance sampling ideas can be used to obtain more reliable estimates.
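One of the standard estimators referred to above is the Volz-Heckathorn inverse-degree-weighted mean, which reweights each respondent by 1/d to correct for the fact that high-degree individuals are more likely to be recruited. The toy numbers below are invented; the second computation illustrates the talk's point that an error in a single reported degree shifts the estimate.

```python
def rds_vh_estimate(outcomes, degrees):
    """Volz-Heckathorn RDS estimator: inverse-degree-weighted prevalence."""
    num = sum(y / d for y, d in zip(outcomes, degrees))
    den = sum(1.0 / d for d in degrees)
    return num / den

y = [1, 0, 1, 0]      # e.g. infection indicators for four respondents
d = [2, 4, 5, 10]     # their self-reported network degrees

est = rds_vh_estimate(y, d)                          # 0.7 / 1.05 = 2/3
# If respondent 0 misreports degree 4 instead of 2, the estimate drops:
est_misreported = rds_vh_estimate(y, [4, 4, 5, 10])  # 0.45 / 0.8 = 0.5625
```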



*March 12, 2010 (Friday)
4-5:30pm,
182 George St, Rm 110
Graphs and polytopes: learning structures with linear programming relaxations
Tommi Jaakkola
Professor
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Combinatorial problems are commonplace in modeling. They arise in predicting likely values for variables (e.g., selection/orientation of residues in protein design) or learning the model structure itself. In recent years, a lot of effort has gone into developing linear programming (LP) relaxations for finding the most likely values for the variables in a given model. In this talk, I will focus on the related combinatorial problem of learning the model structure, i.e., solving for the highest scoring Bayesian network (BN) graph given data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal graph structure. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. I will also illustrate the strengths and weaknesses of the approach in the context of sample structure learning problems.

The talk covers joint work with David Sontag, Amir Globerson, and Marina Meila.



March 17, 2010
12-1pm,
182 George St, Rm 110
Learning in Social Networks with Signals of Bounded Informativeness
Ilan Lobel
Postdoctoral Researcher
Microsoft Research New England

We study the (perfect Bayesian) equilibrium of a model of learning over a general social network. Each individual receives a signal about the underlying state of the world, observes the past actions of a stochastically-generated neighborhood of individuals, and chooses one of two possible actions. The stochastic process generating the neighborhoods defines the network topology (social network). We characterize pure-strategy equilibria for arbitrary stochastic and deterministic social networks and determine conditions under which there will be asymptotic learning, that is, the conditions under which, as the social network becomes large, individuals converge (in probability) to taking the right action. In previous work, we showed that when the likelihood ratio of the signals is unbounded, there is asymptotic learning when there is some minimal amount of "expansion in observations". In this talk, we show that with signals of bounded likelihood ratio, asymptotic learning is achieved if three conditions hold: expanding observations holds, there exists a subsequence of agents with unpersuasive neighborhoods and there exists a uniform lower bound on the probability of observing the history of past actions. This talk is based on joint work with Daron Acemoglu (MIT Economics), Munther Dahleh (MIT EECS) and Asu Ozdaglar (MIT EECS).



March 24, 2010
12-1pm,
182 George St, Rm 110
Unsupervised Part of Speech Tagging: From graphical models to statistical models to "biological" models
Michael Lamar
Graduate student
Division of Applied Mathematics
Brown University

A longstanding problem in computational linguistics is the automatic extraction of part of speech categories from a corpus.  We introduce this problem and discuss several purely contextual, unsupervised approaches to this problem, beginning with Hidden Markov Models which, until recently, have set the standard for performance.  We then formulate two new alternative approaches.  The first is a purely statistical treatment which incorporates singular value decomposition and clustering.  The second is a self-organizing model inspired by simple biological processes.  We compare these new approaches to HMMs and discuss why they achieve better performance.
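The statistical pipeline described, context counts, then SVD, then clustering, can be sketched on a toy corpus. Everything below (the corpus, the embedding dimension) is invented for illustration; a real system would use a large corpus and cluster the reduced vectors, e.g. with k-means.

```python
import numpy as np

corpus = ("the cat sat on the mat . the dog ran on the rug . "
          "a cat ran on a rug .").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Context count matrix: one row per word type, concatenating counts of
# the immediately preceding and immediately following words.
C = np.zeros((V, 2 * V))
for i in range(1, len(corpus) - 1):
    w = idx[corpus[i]]
    C[w, idx[corpus[i - 1]]] += 1          # left context
    C[w, V + idx[corpus[i + 1]]] += 1      # right context

# Reduce with SVD; word types with similar contexts land nearby, and
# clustering these rows yields induced part-of-speech classes.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
emb = U[:, :3] * s[:3]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nouns 'cat' and 'dog' share contexts; noun 'cat' and verb 'ran' do not.
noun_noun = cos(C[idx["cat"]], C[idx["dog"]])
noun_verb = cos(C[idx["cat"]], C[idx["ran"]])
```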



April 7, 2010
12-1pm,
182 George St, Rm 110
Geometry of the space of 2D shapes equipped with the Weil-Petersson metric
Sergey Kushnarev
Graduate Student
Division of Applied Mathematics
Brown University

The study of planar simple closed curves (or "2D shapes") and their similarities is a central problem in the field of computer vision. It arises in the task of characterizing and classifying objects from their observed silhouettes. Defining a natural distance between 2D shapes creates a metric on the infinite-dimensional space of shapes. In this talk I will describe one particular metric, which comes from the conformal mapping of 2D shapes, via the theory of Teichmuller spaces. In this space every simple closed curve (or 2D shape) is represented by a smooth self-map of a circle.  I will talk about a specific class of soliton-like geodesics on the space of shapes, called teichons. Some numerical examples of geodesics and effects of the curvature will be demonstrated.



*April 9, 2010 (Friday)
4-5:30pm,
182 George St, Rm 110
Analysis of Molecular Networks
Mark Gerstein
Albert L. Williams Professor of Biomedical Informatics
Departments of Molecular Biophysics and Biochemistry and of Computer Science
Yale University

My talk will be concerned with understanding protein function on a genomic scale. My lab approaches this through the prediction and analysis of biological networks, focusing on protein-protein interaction and transcription-factor-target networks. I will describe how these networks can be determined through the integration of many genomic features and how they can be analyzed in terms of various topological statistics. In particular, I will discuss a number of recent analyses: (1) Improving the prediction of molecular networks through systematic training-set expansion; (2) Showing how the analysis of pathways across environments potentially allows them to act as biosensors; (3a) Showing that the regulatory network has a hierarchical layout in which "middle-managers" act as information bottlenecks; (3b) Showing that these middle managers tend to be arranged in various "partnership" structures, giving the hierarchy a "democratic" character; (4) Showing that most human variation occurs on the periphery of the protein interaction network; and (5) Developing useful web-based tools for the analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context: Consistent Tendencies for Collaboration in the Middle Levels. N Bhardwaj et al. (2010) PNAS, in press.

Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context.  PM Kim, JO Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of Biological Networks from Limited and Uneven Reliable Interactions.  KY Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein (2009) PNAS




April 14, 2010
12-1pm,
182 George St, Rm 110
Preserving knowledge through media transitions: ushering the heritage of India into the digital age
Peter Scharf
Director of The Sanskrit Library and Senior Lecturer in Sanskrit
Department of Classics
Brown University

Human beings express knowledge in various modes: through images in visual art; through movement in dance, theatrical performance, and gesture; and through speech in spoken language.  Each of these means of expression includes means to encode knowledge, and each is used to express knowledge originally encoded in one of the others.  In particular, writing represents knowledge originally encoded in speech.  Certain media dominate as the primary methods for the transmission of detailed information at different times and places.  Oral transmission dominated the Sanskrit tradition in India in the first and second millennia B.C.E.  Writing overtook orality in the first millennium C.E. and dominated until replaced by printing in the 19th century.  At the dawn of the 21st century, the digital medium is replacing printing as the dominant means of knowledge transmission.

While brittle editions of printed Sanskrit books are removed from easy access to library annexes, the Sanskrit Library is working to usher the heritage of India into the digital age.  Scharf designed an encoding scheme to represent Sanskrit accurately and revised the Unicode Standard to include characters necessary for the representation of ancient Vedic texts.  He directed an NSF-funded project (2006-2009) to build a digital Sanskrit library that integrates lexical resources, linguistic software, and machine-readable texts.  Now he directs an NEH-funded project to produce high-quality digital images of manuscripts at Brown University and the University of Pennsylvania and to develop word-spotting technology to facilitate searching even in digital images not amenable to optical character recognition (OCR).



*April 16, 2010 (Friday)
4-5:30pm,
182 George St, Rm 110
Division of Applied Mathematics / Center for Statistical Sciences Seminar

Multi-Resolution Inference of Stochastic Models from Partially Observed Data
Samuel Kou
Professor
Department of Statistics
Harvard University

Stochastic models, diffusion models in particular, are widely used in science, engineering and economics. Inferring parameter values from data is often complicated by the fact that the underlying stochastic processes are only partially observed. Examples include inference for discretely observed diffusion processes, stochastic volatility models, and doubly stochastic Poisson (Cox) processes. Likelihood-based inference faces the difficulty that the likelihood is usually unavailable even numerically. The conventional approach discretizes the stochastic model to approximate the likelihood, but achieving the desired accuracy requires a highly dense discretization, which usually imposes an unbearable computational burden. In this talk we will introduce the framework of Bayesian multi-resolution inference to address this difficulty. By working on different resolution (discretization) levels simultaneously and by letting the resolutions talk to each other, we substantially improve not only the computational efficiency but also the estimation accuracy. We will illustrate the strength of the multi-resolution approach with examples.
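To illustrate why discretization enters at all, here is a minimal sketch under stated assumptions (an Ornstein-Uhlenbeck model; this is not the talk's multi-resolution machinery): the Euler-Maruyama scheme replaces each unknown transition density with a Gaussian, giving a pseudo-likelihood that is accurate only when the spacing dt is small.

```python
import math
import random

def euler_loglik(x, dt, theta, sigma):
    """Euler--Maruyama pseudo-log-likelihood for dX = -theta*X dt + sigma dW
    observed at spacing dt: each increment is approximated as Gaussian with
    mean -theta*x*dt and variance sigma^2*dt."""
    ll = 0.0
    var = sigma ** 2 * dt
    for a, b in zip(x[:-1], x[1:]):
        mean = a - theta * a * dt
        ll += -0.5 * math.log(2 * math.pi * var) - (b - mean) ** 2 / (2 * var)
    return ll

def simulate_ou(n, dt, theta, sigma, x0=1.0, seed=0):
    """Simulate n Euler steps of the same OU process, for demonstration."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n):
        x.append(x[-1] - theta * x[-1] * dt
                 + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return x
```

Refining the grid below the observation spacing requires imputing latent values between observations, and doing that densely is the computational burden the multi-resolution framework is designed to relieve.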



April 21, 2010
12-1pm,
182 George St, Rm 110
Segmentation of Image Ensembles via Latent Atlases
Tammy Riklin-Raviv
Postdoctoral Associate
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Images acquired via medical imaging modalities are frequently subject to low signal-to-noise ratios, bias fields, and partial-volume effects. These artifacts, together with the naturally low contrast between the image intensities of some neighboring structures, make the extraction of regions of interest (ROIs) from clinical images a challenging problem.  Probabilistic atlases, typically generated from comprehensive sets of manually labeled examples, facilitate the analysis by providing statistical priors for tissue classification and structure segmentation. However, the limited availability of training examples compatible with the images to be segmented renders atlas-based approaches impractical in many cases.  In this talk I will present a generative model for the joint segmentation of corresponding regions of interest in a collection of aligned images that does not require labeled training data. Instead, the evolving segmentation of the entire image set supports each of the individual segmentations. This is made possible by iteratively inferring a subset of the model parameters, called the spatial parameters, as part of the joint segmentation process. These spatial parameters are defined in the image domain and can be viewed as a latent atlas. Our latent-atlas formulation is based on probabilistic principles, but we solve it using partial differential equations and energy minimization criteria. We evaluate the method successfully on the segmentation of cortical and subcortical structures within different populations and of brain tumors in a single-subject multi-modal longitudinal experiment.
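A minimal sketch of the alternating idea, under strong simplifying assumptions (1-D "images", binary labels, a fixed mixing weight; all names hypothetical, and nothing like the PDE/energy-minimization machinery of the talk): the atlas is re-estimated as the pixelwise mean of the current segmentations, and each image is then re-segmented using the atlas as a spatial prior.

```python
def joint_segment(images, iters=5, prior_weight=0.5):
    """Toy latent-atlas loop: alternate between (1) estimating a pixelwise
    'atlas' as the mean of the current binary segmentations and
    (2) re-segmenting each image by blending intensity with the atlas prior.
    `images` are lists of floats in [0, 1]; returns (segmentations, atlas)."""
    # Initialize each segmentation by thresholding its own image.
    segs = [[1 if v > 0.5 else 0 for v in img] for img in images]
    n = len(images[0])
    atlas = [0.5] * n
    for _ in range(iters):
        # Atlas step: pixelwise frequency of the foreground label.
        atlas = [sum(s[j] for s in segs) / len(segs) for j in range(n)]
        # Segmentation step: intensity evidence blended with the atlas prior.
        segs = [[1 if (1 - prior_weight) * img[j] + prior_weight * atlas[j] > 0.5
                 else 0 for j in range(n)]
                for img in images]
    return segs, atlas
```

The point of the toy: a spurious bright pixel in one image is overruled by the atlas built from the other images, so the ensemble supports each individual segmentation without any labeled training data.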



*April 23, 2010 (Friday)
4-5:30pm,
182 George St, Rm 110
A Generative Model Approach to Fraud Detection
Brian Lucena
Chief Mathematician
Guardian Analytics Inc.

Standard discriminative approaches rely on seeing many examples of both classes in order to distinguish between them.  However, in the specific case of online banking fraud, these techniques are hampered by two major practical concerns.  First, there is a huge asymmetry in the amount of data available about each class – hundreds of millions of legitimate transactions and perhaps a few hundred fraudulent ones.  Second, fraudsters quickly change their tactics, so even those few data points on fraud may not be very useful in predicting future fraud.  Consequently, we take a model-based approach that exploits the natural differences between individual users.  This talk will describe the FraudMAP system and discuss several interesting problems that arise in its development.
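A hypothetical miniature of the generative, per-user idea (not Guardian Analytics' actual FraudMAP model): fit a simple distribution to each user's own history and score new activity by its likelihood under that user's model, so no labeled fraud examples are needed at all.

```python
import math

class UserModel:
    """Toy per-user generative model: log-amounts modelled as a Gaussian.
    New activity is scored by its likelihood under the user's *own*
    history, so anomalies are 'unlike this user', not 'like known fraud'."""
    def __init__(self, amounts):
        logs = [math.log(a) for a in amounts]
        self.mu = sum(logs) / len(logs)
        var = sum((v - self.mu) ** 2 for v in logs) / len(logs)
        self.sigma = math.sqrt(var) if var > 0 else 1.0

    def log_score(self, amount):
        """Gaussian log-density of log(amount); lower means more anomalous."""
        z = (math.log(amount) - self.mu) / self.sigma
        return -0.5 * math.log(2 * math.pi * self.sigma ** 2) - 0.5 * z * z
```

A transaction far outside a user's habitual range scores much lower than a typical one, even though no fraudulent example was ever observed; this sidesteps both the class asymmetry and the shifting-tactics problem described above.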



*May 10, 2010 (Monday)
4-5:30pm,
182 George St, Rm 110
Shared Segmentation of Natural Scenes using Dependent Pitman-Yor Processes
Erik Sudderth
Assistant Professor
Department of Computer Science
Brown University

We explore statistical frameworks for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases.  Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman-Yor (PY) process.  This generalization of the Dirichlet process leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image.  Generalizing previous applications of PY priors, we use non-Markov Gaussian processes to infer spatially contiguous segments which respect image boundaries.  Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes.
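The power-law behavior can be seen directly from the Chinese-restaurant construction of the Pitman-Yor process; the sketch below (a generic sampler, not the paper's image model) draws a random partition whose block sizes are heavy-tailed for discount d in (0, 1), recovering the Dirichlet process at d = 0.

```python
import random

def pitman_yor_partition(n, alpha=1.0, d=0.5, seed=0):
    """Chinese-restaurant sampler for the Pitman-Yor process: customer i
    joins existing table t with probability proportional to (size_t - d),
    or opens a new table with probability proportional to
    (alpha + d * #tables). The discount d yields power-law table sizes."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        weights = [s - d for s in sizes] + [alpha + d * len(sizes)]
        r = rng.random() * (i + alpha)   # the weights sum to i + alpha
        t = 0
        while t < len(weights) - 1 and r > weights[t]:
            r -= weights[t]
            t += 1
        if t == len(sizes):
            sizes.append(1)              # new table (new object category)
        else:
            sizes[t] += 1
    return sizes
```

With d = 0.5 the number of tables grows roughly like the square root of the number of customers and a few tables hold most of the mass, mirroring the power laws reported above for object frequencies and segment sizes.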




[ back to main page ]