STUART GEMAN | James Manning Professor of Applied Mathematics
Learning in biological systems, measured by performance as a function of the number of training samples, is strikingly efficient when compared to artificial systems. These observations apply equally to individuals (children learn to recognize tens of thousands of categories in their first eight years) and to species (evolution outpaces our best models of selection and fitness). A prototype problem is computer vision. Humans outperform computers, even though computer-vision training sets contain far more examples than any human being will see in a lifetime. My hypothesis is that the dual principles of re-usability and hierarchy, or what cognitive scientists call compositionality, form the foundation for efficient learning in biological systems. Re-usability and hierarchy are prominent architectural themes of the world around us, and it is logical that they would form the basis for our internal generative representations ("the mind's eye") as well. Using the tools of probability modeling and statistical inference, I study the implications of these ideas for representation and computation in the microcircuitry of the brain as well as their applications to artificial vision systems.
Statistical Analysis of Neurophysiological Data
The statistical analysis of neuronal data in awake animals presents unique challenges. The status of tens of thousands of pre-synaptic neurons, not directly influenced by the experimental paradigm, is largely out of the control of the experimenter. Time-honored assumptions about “repeated” samples are untenable. These observations essentially preclude the gathering of statistical evidence for a lack of precision in the cortical microcircuitry, but do not preclude collecting statistical evidence against any given limitation on precision or repeatability. Statistical methods are being devised to support the systematic search for fine temporal structure in stable multi-unit recordings.
Neural Representation and Neural Modeling
We can imagine our house or apartment with the furniture rearranged, the walls repainted, and the floors resurfaced or re-covered. We can rehearse a tennis stroke, review a favorite hike, replay a favorite melody, or recall a celebrated speech “in our mind’s eye,” without moving a limb or receiving a sound. It is a mistake to model cortical function without acknowledging the cortical capacity for manipulating structured representations and simulating elaborate motor actions and perceptual stimuli.
It is tempting to model networks of neurons as networks of integrate-and-fire units, but integration is linear, and overwhelming evidence demonstrates the highly nonlinear, and in fact space- and time-dependent, nature of dendritic processing. An argument can be made that these nonlinearities, by their nature, promote a rich and local-correlative structure within the microcircuits, as anticipated by Abeles, von der Malsburg, and others. These spatio-temporal patterns, with their correlation-induced topologies, would be good candidates for the basic units of cognitive processing.
Statistical Analysis of Natural Images
Take a digital photo of a natural outdoor scene. For simplicity, convert the photo from color to black and white. The photo can be reduced, or scaled, to make a new (smaller) picture, say half the size in both dimensions. In comparison to the original picture, the new picture is of a scene in which each of the original objects, and in fact every imaged point, has been relocated twice as far from the camera. This “stretching” is artificial in that it does not correspond to any movement of the camera in the real world. Yet the picture looks perfectly normal, and the local spatial statistical structure (e.g. the distribution of values of horizontal or vertical derivatives) is almost indistinguishable from the local spatial statistical structure of the original. “Images of natural scenes are scale invariant.” The source of scale invariance in natural images is an enduring mystery.
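The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the underlying research: the image is assumed to be a grayscale NumPy array with values in [0, 1], the reduction is assumed to be a plain 2x2 block average, and the function names (`halve`, `derivative_histogram`, `compare_scales`) and the total-variation summary are hypothetical choices made for this sketch.

```python
import numpy as np

def halve(image):
    """Reduce a grayscale image to half size in both dimensions
    by averaging non-overlapping 2x2 blocks (an assumed choice of reduction)."""
    h, w = image.shape
    h, w = h - h % 2, w - w % 2              # drop an odd last row or column
    blocks = image[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def derivative_histogram(image, bins):
    """Normalized histogram of horizontal differences between adjacent pixels."""
    dx = np.diff(image, axis=1).ravel()
    counts, _ = np.histogram(dx, bins=bins)
    return counts / counts.sum()

def compare_scales(image, n_bins=41, limit=1.0):
    """Return the derivative histograms of the original and the half-size image,
    plus their total variation distance (0 means identical histograms)."""
    bins = np.linspace(-limit, limit, n_bins + 1)
    p = derivative_histogram(image, bins)
    q = derivative_histogram(halve(image), bins)
    return p, q, 0.5 * np.abs(p - q).sum()
```

Calling `compare_scales(photo)` on a grayscale photograph loaded as an array returns the two histograms and a single number summarizing how far apart they are. For a natural scene the text above suggests the two histograms should nearly coincide, whereas an array of independent random pixels would be expected to give a much larger distance, since block averaging shrinks its pixel-to-pixel differences.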
Timing and Rare Events in the Markets
When it comes to the prices of stocks and other securities, it seems that rare events are never rare enough. But they are too rare for meaningful statistical study. To test financial models of price fluctuations with a focus on excursions, the issue of small samples can be sidestepped by declaring an event “rare” if it is unusual relative to the interval of observation. Every interval has its own rare events, by fiat, and in fact as many as we need. Different classes of models imply different invariants for the timings of these “rare” events. These invariants open the door to combinatorial-type hypothesis tests, under which many of the usual models do not hold up very well.
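One way to make this idea concrete is sketched below. It is a minimal illustration under stated assumptions, not the tests used in the underlying research. A return is declared “rare” when it falls in the top or bottom tenth of the returns observed in the interval. If the returns were exchangeable (for example, independent and identically distributed), every arrangement of the rare-event positions would be equally likely, which is the invariant tested here by permutation. The function names, the 10% threshold, and the choice of statistic (the fraction of rare events that immediately follow another rare event) are all hypothetical choices for this sketch, and the series is assumed to contain at least two rare events.

```python
import numpy as np

def excursion_flags(returns, fraction=0.1):
    """Flag returns in the lowest or highest `fraction` of this interval's values."""
    lo, hi = np.quantile(returns, [fraction, 1 - fraction])
    return (returns <= lo) | (returns >= hi)

def clustering_statistic(flags):
    """Fraction of waiting times between consecutive rare events that equal 1."""
    gaps = np.diff(np.flatnonzero(flags))
    return np.mean(gaps == 1)

def permutation_p_value(returns, fraction=0.1, n_perm=999, seed=0):
    """One-sided permutation test of exchangeability against clustering
    of rare events. Shuffling the flags is equivalent to shuffling the
    returns, because the quantile thresholds do not depend on order."""
    rng = np.random.default_rng(seed)
    flags = excursion_flags(np.asarray(returns, dtype=float), fraction)
    observed = clustering_statistic(flags)
    null = np.array([clustering_statistic(rng.permutation(flags))
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

A small p-value from `permutation_p_value(returns)` indicates that the rare events arrive in tighter clusters than any exchangeable model of returns would allow, whatever its marginal distribution. Because the threshold is defined relative to the interval, the number of “rare” events is fixed by the chosen fraction rather than by the data.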
Summary: The “ROC gap” that separates biological from machine vision performance is largely due to the problem of reusability: parts and subparts of objects of interest form parts and subparts of “background” objects. Hierarchical models can avoid most false detections by explaining background in terms of the parts and subparts of the objects of interest. In hierarchical models, objects come equipped with their own background models.
Summary: The important questions are about structure and representation, not about learning per se.
Summary: There is no such thing as a repeated trial in cortical neurophysiology; hence we can test for an excess, but never a lack, of precision.
Summary: Models of price trajectories should fit observed trajectories; it is not enough to fit the marginal distributions on prices and returns. But market mechanics are nonstationary at large time scales and market volatility fluctuates at extremely short time scales, both of which make testing for fit a challenging statistical problem. There are striking empirical invariants to time scale, which can be used to devise statistical tests for a variety of models of price movement.
Summary: Some tutorials, some results about estimation, some extensions of probabilistic context-free grammars to context-sensitive grammars.
Summary: Some ideas about using Markov random fields for Bayesian image analysis, and Monte Carlo methods for computing, including a first proof of convergence for simulated annealing.
Summary: A straightforward look at dynamic programming on general dependency graphs, with applications in image processing and algebraic coding.
Summary: A theoretical justification of the much-used mode estimator in predictive coding.
Summary: Natural images scale because the world is flat.
Summary: Some asymptotic results on non-parametric (tabula rasa) estimation, mostly using metric entropy and Grenander's Method of Sieves. A proof of the (sometimes) consistency of "ordinary cross validation."
Summary: First strong limits for the norm and spectral radius of random matrices; applications to regular behavior in random systems, such as (near) limit cycles in a high-dimensional dynamical system with random coefficients.
Division of Applied Mathematics - Brown University - Providence - Rhode Island 02912