My group is currently studying statistical methods that are agnostic about certain aspects of the underlying distribution of the data. This seems increasingly desirable in many modern problems because of the inherent difficulties that arise when modeling high-dimensional distributions. Applications in neuroscience are an area of special focus. Here we would like statistical methods that can resolve scientific questions about processes operating on a given time scale, while being agnostic about processes operating at different time scales.
Some of the interesting side projects that have arisen along the way include sampling algorithms for graphs with specified degree sequences (or tables with specified margins), a simple procedure to provably control the type I error rate of Monte Carlo hypothesis tests based on importance sampling, and improved algorithms for brain-machine interfaces for people with tetraplegia.
Dirichlet process mixtures (DPMs) are often applied when the data are assumed to come from a mixture with finitely many components, but the number of components s is unknown. In many such cases, one wants to make inferences about s, and it is common practice to use the posterior distribution on the number of components represented in the data so far. It turns out that this posterior is not consistent for s. That is, we have proven that given unlimited i.i.d. data from a finite mixture with s0 components, the posterior probability of s0 does not converge to 1. The same result holds for Pitman-Yor process mixtures and many other related nonparametric Bayesian priors. Motivated by this finding, we examine an alternative approach to Bayesian nonparametric mixtures, which we refer to as a mixture of finite mixtures (MFM). In addition to being consistent for the number of components, MFMs are very natural and possess many of the attractive features of DPMs, including: efficient approximate inference (with MCMC), consistency for the density (at the optimal rate, under certain conditions), and appealing equivalent formulations ("restaurant process", distribution on partitions, stick-breaking, and random discrete measures). Our findings suggest that MFMs are preferable to DPMs when the data come from a finite mixture.
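For intuition about where the extra components come from, recall that under the Dirichlet process prior the number of occupied clusters among n observations keeps growing (roughly like the concentration parameter times log n), so the prior never settles on a fixed finite number. A minimal sketch of the Chinese restaurant process that makes this visible (the function name and seed are our own, purely for illustration):

```python
import random

def sample_crp_partition(n, alpha, seed=0):
    """Sample table sizes for n customers from the Chinese restaurant
    process with concentration parameter alpha."""
    rng = random.Random(seed)
    counts = []  # counts[c] = number of customers at table c
    for i in range(n):
        # New table with probability alpha/(i+alpha);
        # existing table c with probability counts[c]/(i+alpha).
        r = rng.uniform(0, i + alpha)
        if r < alpha or not counts:
            counts.append(1)
        else:
            r -= alpha
            for c in range(len(counts)):
                if r < counts[c]:
                    counts[c] += 1
                    break
                r -= counts[c]
            else:
                counts[-1] += 1  # guard against floating-point edge cases
    return counts

sizes = sample_crp_partition(1000, alpha=1.0)
print(len(sizes), "occupied tables among 1000 customers")
```

Running this for increasing n shows the cluster count drifting upward, which is the heuristic behind the small extra clusters that prevent the DPM posterior from concentrating on s0.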
Here is a slide from Jeff's ICERM presentation comparing inference in a toy example. The posterior (blue curve) concentrates around the true number of components (s0 = 5) for the MFM (right), but not for the DPM (left). More data does not change things: the DPM posterior is not consistent, but the MFM posterior is. Varying the concentration parameter of the DPM prior with the amount of data does not help (not shown). Pruning small components does appear to help (not shown), but we do not have a proof, and using MFMs instead of DPMs seems more natural in this case.
We are studying the conditional distribution of independent integer-valued random variables arranged in a matrix given the sequences of row and column sums (the margins). An important special case is the uniform distribution over binary matrices with specified margins.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0
1 1 1 1 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1 0 1 0 0 0
1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 0 1 1 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
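For context, the standard MCMC baseline for sampling from this uniform distribution is the swap (checkerboard) chain. The sketch below is our own illustrative code of that classic move, not the exact or importance sampling algorithms developed in this work:

```python
import random

def checkerboard_swap(A, steps, seed=0):
    """Run the classic swap chain on a 0/1 matrix A (list of lists).
    Each step picks two rows and two columns; if the 2x2 submatrix is a
    checkerboard ([[1,0],[0,1]] or [[0,1],[1,0]]), exchanging its 0s and
    1s changes the matrix while preserving every row and column sum."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    for _ in range(steps):
        i, k = rng.sample(range(m), 2)
        j, l = rng.sample(range(n), 2)
        # Checkerboard test: diagonal entries equal, off-diagonal equal,
        # and the two values differ.
        if A[i][j] == A[k][l] and A[i][l] == A[k][j] and A[i][j] != A[i][l]:
            A[i][j], A[i][l] = A[i][l], A[i][j]
            A[k][j], A[k][l] = A[k][l], A[k][j]
    return A
```

Each accepted swap moves within the set of binary matrices with the given margins; a long run of such moves is the usual (slow-mixing) way to explore this set, which is part of what motivates better sampling algorithms.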
In many cases p-values for a hypothesis test can be approximated by Monte Carlo methods. The resulting approximate p-values are just that: approximate. They need not control the type I error rate at the nominal level. There are two cases where this is a big deal: (1) if the Monte Carlo sampling algorithms have high variability, so that the approximation errors could be large; or (2) if one wishes to adjust the p-values to account for multiple hypothesis tests, which typically magnifies the approximation errors. This work shows how to easily create Monte Carlo p-values using importance sampling that, although still approximations of the original p-value, nevertheless control the type I error rate at the nominal level. I would argue that this technique should always be used when reporting p-values based on importance sampling. Interesting applications include (1) the construction of Monte Carlo confidence intervals that preserve the nominal coverage probabilities, and (2) accelerated Monte Carlo multiple testing, including neuroscience applications.
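The plain-Monte-Carlo version of this guarantee is classical (the "add-one" correction); the contribution here is extending it to importance sampling, where the null draws are reweighted. As a reference point, a sketch of the classical unweighted version:

```python
def mc_pvalue(t_obs, null_draws):
    """Conservative Monte Carlo p-value.  With N i.i.d. draws of the test
    statistic under the null, p_hat = (1 + #{T_i >= t_obs}) / (N + 1) is
    itself a valid p-value: P(p_hat <= u) <= u under the null, so
    rejecting when p_hat <= alpha controls type I error at level alpha,
    no matter how small N is."""
    n_ge = sum(1 for t in null_draws if t >= t_obs)
    return (1 + n_ge) / (len(null_draws) + 1)

# With 4 null draws all below the observed statistic, p_hat = 1/5 = 0.2.
print(mc_pvalue(5.0, [1.0, 2.0, 3.0, 4.0]))
```

Note that the naive estimate #{T_i >= t_obs}/N lacks this property (it can be exactly 0 by chance), which is why the +1 terms matter; the importance-sampling case requires a different correction because the draws carry unequal weights.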
The spiking electrical activity of many synaptically coupled neurons can reveal evidence about their underlying circuitry. Statistical approaches to inferring this circuitry are hampered by the myriad other processes that influence neural firing. We are exploring conditional inference as a promising tool for addressing these challenges. This is joint work with CCNY Professor Han Amarasingham and in collaboration with Professor Gyuri Buzsaki's neurophysiology lab at NYU and Professor Shige Fujisawa's neurophysiology lab at RIKEN. The figure below shows an example network inferred from data recorded from the prefrontal cortex of an awake, behaving rodent (triangle=excitatory; circle=inhibitory; square=unknown).
This has been a research topic of mine for many years, beginning (for me) with parts of my PhD dissertation under the supervision of Stu Geman and in collaboration with Han Amarasingham (also a former student of Stu Geman). The idea has matured significantly in recent years and is a direct precursor to our work on inferring lag-lead relationships in spike trains. The gist of the idea is summarized in this figure:
The original spike train data is X, with neurons i and j; each 1 ms time bin records the number of spikes in that bin. We can easily see that neuron j tends to fire 2 ms (time bins) after neuron i. If we had instead used 4 ms time bins (shown in S), that information about precise spike timing would be obscured. However, before asserting that the relationship between the spikes is precisely timed, we should check whether S already encodes this apparently precise relationship between i and j. Of all possible spike trains that are consistent with S, what fraction have most of neuron j's spikes following exactly 2 ms after neuron i's spikes? If this fraction is small, then we would conclude the spikes are indeed precisely timed. But if this fraction is large, then we prefer to withhold judgment: the apparent precise timing can easily be explained by coarser modulations in firing rate. The name "jitter" comes from the observation that a spike train chosen uniformly at random subject to S can be created by randomly perturbing (or jittering) the original spike times within their respective coarse time bins. Much of our original work was motivated by neurophysiological questions related to synchrony, i.e., zero-lag alignments of spikes, rather than the nonzero lags in the above example.
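The jitter resampler itself is a one-liner. The sketch below is our own minimal illustration (names are ours; it assumes spike times are integers in ms and at most one spike per 1 ms bin is not required):

```python
import random

def interval_jitter(spike_times, width, seed=0):
    """Interval jitter: move each spike (integer time, in ms) to a
    uniformly random 1 ms position within its own coarse bin of the
    given width.  The coarse bin counts S are preserved exactly, so
    repeated calls sample surrogate spike trains consistent with S."""
    rng = random.Random(seed)
    out = [(t // width) * width + rng.randrange(width) for t in spike_times]
    return sorted(out)

surrogate = interval_jitter([3, 7, 12, 13, 25], width=4)
```

Comparing a timing statistic (e.g., the count of 2 ms lags between neurons i and j) on the original train against its distribution over many such surrogates is the conditional test described above.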
Last updated January 28, 2013
© Matthew Harrison
Any opinions, findings, conclusions or recommendations expressed in this material or material obtained from this website are solely those of the author and do not necessarily reflect the views of Brown University, NSF, DARPA, NIH or any other sponsor.