2008 Bahadur Lectures,
Rare Events in the Financial Markets
When it
comes to the prices of stocks and other securities, it seems that rare events are never rare enough. But
they are too rare for meaningful statistical study. In order to test financial
models of price fluctuations, focused on excursions, I will sidestep the issue
of small samples by declaring an event rare if it is unusual relative to the interval of observation.
Every interval has its own rare events, by fiat, and in fact as many as we
need. Different classes of models imply
different invariants for the timings of these rare
events. These invariants open the door
to combinatorial-type hypothesis tests, under which many of the usual models do
not hold up very well. I will give evidence for very rapidly changing dynamics
and discuss the implications for model building.
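As a rough illustration of the idea, the sketch below declares a return "rare" when it falls in the lowest or highest 10% of the returns observed in the interval, then asks whether rare events sit next to each other more often than chance allows. It assumes a model class under which the returns are exchangeable (for example, independent and identically distributed increments), so that every arrangement of the rare-event flags is equally likely. The 10% thresholds, the adjacency statistic, and all function names are illustrative assumptions, not taken from the lectures.

```python
import numpy as np

def excursion_flags(returns, lower_q=0.1, upper_q=0.9):
    """Flag returns that are rare relative to this interval of observation."""
    lo, hi = np.quantile(returns, [lower_q, upper_q])
    return (returns <= lo) | (returns >= hi)

def adjacent_excursions(flags):
    """Count pairs of consecutive time steps that are both flagged."""
    return int(np.sum(flags[1:] & flags[:-1]))

def permutation_p_value(returns, n_perm=2000, seed=0):
    """One-sided permutation test for clustering of rare events.

    Assumes exchangeable returns under the null hypothesis, so that
    shuffling the flags samples from the null distribution of timings.
    """
    rng = np.random.default_rng(seed)
    flags = excursion_flags(np.asarray(returns, dtype=float))
    observed = adjacent_excursions(flags)
    exceed = 0
    for _ in range(n_perm):
        if adjacent_excursions(rng.permutation(flags)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value valid and strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

A small p-value says that the observed timings are more clustered than an exchangeable model would produce. Other statistics of the timings (for instance, waiting times between rare events) could be substituted for the adjacency count without changing the structure of the test.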
On the Peculiar Statistics of Natural Images
Take a
digital photo of a natural outdoor scene. For simplicity, convert the photo
from color to black and white. The photo can be reduced, or scaled, to make a
new (smaller) picture, say half the size in both dimensions. The new picture is
of a scene in which each of the original objects, and in fact every imaged
point, has been relocated twice as far from the camera. This stretching is
artificial in that it does not correspond to any movement of the camera in the
real world. Yet the picture looks perfectly normal, and the local spatial
statistical structure (e.g. the distribution of values of horizontal or
vertical derivatives) is largely indistinguishable from the local spatial
statistical structure of the original. Images of
natural scenes are scale invariant. On the
other hand, mathematical models of images, or more generally of spatial
processes, are never scale invariant unless they are trivial (constant gray
level, i.e. blank pictures) or exotic (lacking a direct definition in terms of
image intensities). The source of scale invariance in natural images is an
enduring mystery. I will propose some explanations and make some connections to
perception and image coding.
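The comparison described above can be tried on any grayscale image. The sketch below halves an image by averaging 2x2 blocks, histograms the horizontal derivatives at both scales, and reports the total variation distance between the two histograms. It assumes intensities scaled to [0, 1], at least some derivative values inside the histogram range, and block averaging as the reduction method; the function names and bin count are hypothetical choices for illustration.

```python
import numpy as np

def downsample_by_two(image):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks."""
    h, w = image.shape
    h2, w2 = h - h % 2, w - w % 2  # drop a trailing odd row or column
    blocks = image[:h2, :w2].astype(float).reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))

def horizontal_derivative_histogram(image, bins):
    """Normalized histogram of differences between horizontal neighbors."""
    diffs = np.diff(image.astype(float), axis=1).ravel()
    hist, _ = np.histogram(diffs, bins=bins, density=True)
    return hist

def derivative_histogram_distance(image, n_bins=51, max_abs=1.0):
    """Total variation distance between derivative histograms at two scales."""
    bins = np.linspace(-max_abs, max_abs, n_bins + 1)
    full = horizontal_derivative_histogram(image, bins)
    half = horizontal_derivative_histogram(downsample_by_two(image), bins)
    bin_width = bins[1] - bins[0]
    return 0.5 * float(np.sum(np.abs(full - half))) * bin_width
```

A distance near zero means the local derivative statistics barely change under scaling. White noise, by contrast, gives a clearly positive distance, because block averaging shrinks its variance.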
Remarks on Learning Theory and Compositional Vision
Google and the Vapnik-Chervonenkis Dimension
Google
engineers routinely train query classifiers, for ranking advertisements or
search results, on more words than any human being sees or hears in a
lifetime. A human being who sees a
meaningfully new image every second for one-hundred years will not see as many
images as Google has in its libraries, all of which are available for training
object detectors and image classifiers.
Yet by human standards the state of the art in computer understanding
of language and computer-generated image analysis is primitive. What explains the gap? Why can't learning theory tell us how to
make machines that learn as efficiently as humans? Upper bounds on the number of training
samples needed to learn a classifier as rich and
competent as the human visual system can be derived using the Vapnik-Chervonenkis dimension, or the metric entropy, but
these suggest that not only does Google need more examples, but all of
evolution might fall short. I will make
some proposals for efficient learning and offer some mathematics to support
them.
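For orientation, one standard form of such an upper bound is written out below. It is the textbook statement for binary classification in the realizable case, given here as an assumption about the kind of bound meant, not as the specific bound used in the lecture. The symbols d, ε, δ, m, and C are introduced only for this illustration.

```latex
% Assumptions: binary classification, realizable case,
% hypothesis class H with VC dimension d, i.i.d. training samples.
% With probability at least 1 - \delta, every hypothesis in H that is
% consistent with the m training samples has error at most \epsilon, provided
\[
  m \;\ge\; \frac{C}{\epsilon}
  \left( d \,\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta} \right),
\]
% where C is an absolute constant.
```

The required number of samples grows linearly in d, so a classifier rich enough to have an enormous VC dimension is one for which this bound demands an enormous training set.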
Recent Papers
T.-L. Chen and S. Geman. On the Minimum Entropy of a Mixture of Unimodal and Symmetric Distributions. IEEE Trans. Inf. Theory, 54(7), 2008, 3166-3174. (pdf)
(An
obvious and much-used result is surprisingly hard to prove. We used Hardy and Littlewood's
"rearrangement of functions" to turn unimodal
densities into monotone densities, which are much easier to work with.)
S. Geman, A. Amarasingham, M.T. Harrison, and N. Hatsopoulos. The Statistical Analysis of Temporal Resolution in the Nervous System. (pdf)
(When it
comes to neurophysiological data, there's no such
thing as "repeated trials," at least not in any of the usual statistical
senses. This makes for some unique
challenges. We propose a suite of model-free
and estimation-free statistical methods for exploring the much-debated issue of
effective resolution of spike timing in the nervous system.)
M.T. Harrison and S. Geman. A Rate and History-Preserving Resampling Algorithm for Neural Spike Trains. Neural Computation, Vol. 21(5), 2009, 1244-1258. (pdf)
(From the
same suite: a way to produce surrogate spike trains that preserve, exactly,
observed inter-spike intervals up to any specified amount of time. This corrects for absolute and relative
refractory periods, bursting, and all other possible spike-to-spike
interactions across a chosen temporal extent.
Arbitrary statistics (e.g. number of synchronies) can then be calibrated
from the ensemble of surrogate spike trains.)
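The calibration step can be sketched with a much simpler surrogate generator than the one in the paper. The code below uses plain interval jitter: each spike is redrawn uniformly within the fixed window that contains it. This preserves the spike count in each window but, unlike the history-preserving algorithm described above, it does not preserve inter-spike intervals. The synchrony statistic, window width, tolerance, and all function names are illustrative assumptions.

```python
import numpy as np

def interval_jitter(spike_times, window, rng):
    """Redraw each spike uniformly within the fixed window that contains it."""
    starts = np.floor(spike_times / window) * window
    return np.sort(starts + rng.uniform(0.0, window, size=spike_times.shape))

def synchrony_count(train_a, train_b, tolerance):
    """Number of spikes in train_a with a train_b spike within the tolerance."""
    b = np.sort(train_b)
    left = np.searchsorted(b, train_a - tolerance, side="left")
    right = np.searchsorted(b, train_a + tolerance, side="right")
    return int(np.sum(right > left))

def jitter_p_value(train_a, train_b, window, tolerance,
                   n_surrogates=999, seed=0):
    """Calibrate the observed synchrony count against jittered surrogates."""
    rng = np.random.default_rng(seed)
    train_a = np.asarray(train_a, dtype=float)
    train_b = np.asarray(train_b, dtype=float)
    observed = synchrony_count(train_a, train_b, tolerance)
    exceed = 0
    for _ in range(n_surrogates):
        surrogate = interval_jitter(train_a, window, rng)
        if synchrony_count(surrogate, train_b, tolerance) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_surrogates + 1)
```

A small p-value indicates more synchrony than can be explained by spike timing that is only as precise as the jitter window. Any other statistic of the spike trains could replace the synchrony count.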
L.-B. Chang and S. Geman. Stock prices and the peculiar statistics of large returns. (submitted for publication) 2009. (pdf)
(Theories and
models about market dynamics can be critically examined using methods from
conditional inference. Some explanations
for volatility clustering and models for stochastic volatility do not hold up
to the data.)