2008 Bahadur Lectures, University of Chicago

Rare Events in the Financial Markets

When it comes to the prices of stocks and other “securities,” it seems that rare events are never rare enough. Yet they are too rare for meaningful statistical study. To test financial models of price fluctuations, with a focus on excursions, I will sidestep the issue of small samples by declaring an event “rare” if it is unusual relative to the interval of observation. Every interval has its own rare events, by fiat, and in fact as many as we need. Different classes of models have different invariants with respect to the timing of these “rare” events. These invariants open the door to combinatorial-type hypothesis tests, under which many of the usual models do not hold up very well. I will give evidence for very rapidly changing dynamics and discuss the implications for model building.
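The device in this abstract can be illustrated with a small permutation test. This is a minimal sketch under stated assumptions, not the method of the lecture: an event counts as “rare” if its absolute return is among the largest 10% within the interval; the null model is that the returns are exchangeable; and the test statistic, the number of adjacent pairs of rare events, is a hypothetical measure of clustering. Under exchangeability, every placement of the k rare events among the n time points is equally likely, so shuffling the 0/1 indicator sequence samples the exact null distribution of any statistic of the timings. All function names are hypothetical.

```python
import random


def excursion_indicators(returns, fraction=0.1):
    """Mark the largest `fraction` of absolute returns in the interval as rare (1), others 0.

    Assumption: "rare" means top-`fraction` by absolute value within this interval only.
    """
    n = len(returns)
    k = max(1, round(fraction * n))
    # Rank positions by absolute return, largest first (ties keep time order).
    ranked = sorted(range(n), key=lambda i: abs(returns[i]), reverse=True)
    rare = set(ranked[:k])
    return [1 if i in rare else 0 for i in range(n)]


def adjacent_pairs(indicators):
    """Clustering statistic: count of consecutive time points that are both rare."""
    return sum(a & b for a, b in zip(indicators, indicators[1:]))


def permutation_test(returns, fraction=0.1, n_perm=999, seed=0):
    """Return (observed statistic, Monte Carlo p-value) for clustering of rare events.

    If returns are exchangeable, all arrangements of the rare positions are
    equally likely, so shuffling the indicators samples the exact null.
    """
    rng = random.Random(seed)
    indicators = excursion_indicators(returns, fraction)
    observed = adjacent_pairs(indicators)
    hits = 0
    for _ in range(n_perm):
        shuffled = indicators[:]
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value valid for a finite number of shuffles.
    return observed, (hits + 1) / (n_perm + 1)
```

For example, `permutation_test([5.0] * 10 + [0.1] * 90)` puts all ten rare events in a single run, giving 9 adjacent pairs and a p-value of 0.001, the smallest value possible with 999 shuffles. The clustering statistic could be swapped for waiting-time or run-length summaries without changing the logic of the test.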

On the Peculiar Statistics of Natural Images

Take a digital photo of a natural outdoor scene. For simplicity, convert the photo from color to black and white. The photo can be reduced, or scaled, to make a new (smaller) picture, say half the size in both dimensions. The new picture is of a scene in which each of the original objects, and in fact every imaged point, has been relocated twice as far from the camera. This “stretching” is artificial in that it does not correspond to any movement of the camera in the real world. Yet the picture looks perfectly normal, and its local spatial statistical structure (e.g., the distribution of values of horizontal or vertical derivatives) is largely indistinguishable from that of the original. “Images of natural scenes are scale invariant.” On the other hand, mathematical models of images, or more generally of spatial processes, are never scale invariant unless they are trivial (constant gray level, i.e., blank pictures) or exotic (lacking a direct definition in terms of image intensities). The source of scale invariance in natural images is an enduring mystery. I will propose some explanations and make some connections to perception and image coding.
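The comparison described here can be made concrete. This is a minimal sketch, with assumptions that are not from the lecture: the image is a list of rows of gray levels; the reduction averages non-overlapping 2×2 blocks; each set of horizontal differences is divided by its own standard deviation, so that only the shape of the distribution is compared; and the two shapes are compared by total-variation distance between histograms. All function names are hypothetical. A value near 0 means the derivative statistics look the same at both scales.

```python
import math


def downsample2(img):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def horizontal_derivatives(img):
    """All differences between horizontally adjacent pixels."""
    return [row[j + 1] - row[j] for row in img for j in range(len(row) - 1)]


def standardized_histogram(values, bins=21, limit=4.0):
    """Histogram of values divided by their standard deviation, clipped to [-limit, limit)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    counts = [0] * bins
    width = 2 * limit / bins
    for v in values:
        z = min(max(v / sd, -limit), limit - 1e-9)
        counts[int((z + limit) / width)] += 1
    return [c / len(values) for c in counts]


def scale_discrepancy(img):
    """Total-variation distance between derivative histograms at full and half scale."""
    p = standardized_histogram(horizontal_derivatives(img))
    q = standardized_histogram(horizontal_derivatives(downsample2(img)))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

A constant image gives exactly 0, the trivial scale-invariant case mentioned above. Applied to a real photograph (loaded by some other means and converted to gray levels), the same function gives one rough, single-number check of the observation in the abstract.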

Remarks on Learning Theory and Compositional Vision

Google and the Vapnik-Chervonenkis Dimension

Google engineers routinely train query classifiers, for ranking advertisements or search results, on more words than any human being sees or hears in a lifetime. A human being who sees a meaningfully new image every second for one hundred years will not see as many images as Google has in its libraries, all of which are available for training object detectors and image classifiers. Yet by human standards, the state of the art in computer understanding of language and in computer image analysis is primitive. What explains the gap? Why can’t learning theory tell us how to make machines that learn as efficiently as humans? Upper bounds on the number of training samples needed to learn a classifier as rich and competent as the human visual system can be derived using the Vapnik-Chervonenkis dimension, or the metric entropy, but these bounds suggest not only that Google needs more examples, but that all of evolution might fall short. I will make some proposals for efficient learning and offer some mathematics to support them.
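The arithmetic behind these comparisons can be sketched in a few lines. The lifetime count follows directly from the abstract's scenario (one new image per second for one hundred years, using 365.25-day years) and comes to about 3.2 billion images. For the learning-theory side, the sketch uses one classical, commonly quoted sufficient sample size for a consistent learner in the realizable PAC setting, with VC dimension d, accuracy ε and confidence 1 − δ: max((4/ε) log2(2/δ), (8d/ε) log2(13/ε)). It is not the bound derived in the lecture, and the function names are hypothetical.

```python
import math

# Assumption: a year of 365.25 days, as in the Julian calendar.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lifetime_images(years=100, per_second=1):
    """Images seen at `per_second` new images per second, nonstop, for `years` years."""
    return years * SECONDS_PER_YEAR * per_second


def vc_sample_bound(d, eps, delta):
    """Sufficient sample size for a consistent learner (realizable PAC setting).

    m >= max((4/eps) log2(2/delta), (8d/eps) log2(13/eps)), where d is the
    VC dimension, eps the target error, and 1 - delta the confidence.
    """
    return math.ceil(max(4 / eps * math.log2(2 / delta),
                         8 * d / eps * math.log2(13 / eps)))


print(f"{lifetime_images():.3e} images in 100 years")
print(vc_sample_bound(d=10, eps=0.1, delta=0.05), "samples for d = 10")
```

The bound grows linearly in d, so the conclusion rests on how large a d one assigns to a classifier as rich as the human visual system. That value is unknown, and any number plugged in here is purely illustrative.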

Recent Papers

T.-L. Chen and S. Geman. On the Minimum Entropy of a Mixture of Unimodal and Symmetric Distributions. IEEE Trans. Inf. Theory, 54(7), 2008, 3166–3174.

(An obvious and much-used result is surprisingly hard to prove. We used Hardy and Littlewood's “rearrangement of functions” to turn unimodal densities into monotone densities, which are much easier to work with.)
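The rearrangement step can be pictured in discrete form. This is a minimal sketch, assuming a density sampled on a uniform grid: the decreasing rearrangement keeps the same multiset of values and sorts them from high to low, so the total mass and the entropy are unchanged while the result is monotone. It shows only the equimeasurability property behind the technique, not the paper's proof, and the function names and the triangular example are hypothetical.

```python
import math


def decreasing_rearrangement(values):
    """Discrete decreasing rearrangement: the same multiset of values, sorted high to low."""
    return sorted(values, reverse=True)


def grid_entropy(density, dx):
    """Riemann-sum approximation of the differential entropy -∫ f log f (in nats)."""
    return -sum(f * math.log(f) * dx for f in density if f > 0)


# Hypothetical example: a triangular (unimodal) density on [0, 2], sampled every 0.01.
dx = 0.01
xs = [i * dx for i in range(201)]
triangle = [max(0.0, 1.0 - abs(x - 1.0)) for x in xs]
monotone = decreasing_rearrangement(triangle)

# Rearranging only permutes the sampled values, so mass and entropy are unchanged.
print(sum(triangle) * dx, sum(monotone) * dx)
print(grid_entropy(triangle, dx), grid_entropy(monotone, dx))
```

Both printed pairs agree up to floating-point summation order, and the triangle's entropy is close to its exact value of 1/2 nat.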

 

 

S. Geman, A. Amarasingham, M.T. Harrison, and N. Hatsopoulos. The Statistical Analysis of Temporal Resolution in the Nervous System.

(When it comes to neurophysiological data, there's no such thing as “repeated trials,” at least not in any of the usual statistical senses. This makes for some unique challenges. We propose a suite of model-free and estimation-free statistical methods for exploring the much-debated question of the effective temporal resolution of spike timing in the nervous system.)
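One simple member of this family of methods is interval jitter. It conditions on spike counts within fixed windows of width Δ and, under the null hypothesis, treats every placement of each spike inside its window as equally likely, so no firing-rate model has to be fitted. The sketch below is an illustrative assumption rather than the paper's exact procedure: the function names are hypothetical, times are in milliseconds, and the statistic is a cross-train synchrony count.

```python
import random


def interval_jitter(spikes, width, rng):
    """Move each spike uniformly within its fixed window [k*width, (k+1)*width).

    Spike counts per window are preserved; timing finer than `width` is randomized.
    """
    return sorted((t // width) * width + rng.random() * width for t in spikes)


def synchrony_count(a, b, tol):
    """Number of spike pairs, one from each train, within `tol` of each other."""
    return sum(1 for s in a for t in b if abs(s - t) <= tol)


def jitter_p_value(a, b, width, tol, n_surrogates=999, seed=0):
    """Calibrate the observed synchrony count against interval-jittered surrogates."""
    rng = random.Random(seed)
    observed = synchrony_count(a, b, tol)
    hits = 0
    for _ in range(n_surrogates):
        ja = interval_jitter(a, width, rng)
        jb = interval_jitter(b, width, rng)
        if synchrony_count(ja, jb, tol) >= observed:
            hits += 1
    # Add-one correction keeps the p-value valid for a finite number of surrogates.
    return observed, (hits + 1) / (n_surrogates + 1)
```

For instance, `jitter_p_value(a, b, width=5.0, tol=1.0)` asks whether near-coincident spikes between two trains are more common than they would be if timing finer than 5 ms were random. Repeating the test over a range of `width` values is one way to probe the effective resolution.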

 

 

M.T. Harrison and S. Geman. A Rate and History-Preserving Resampling Algorithm for Neural Spike Trains. Neural Computation, 21(5), 2009, 1244–1258.

(From the same suite: a way to produce surrogate spike trains that exactly preserve the observed inter-spike intervals up to any specified duration. This accounts for absolute and relative refractory periods, bursting, and all other possible spike-to-spike interactions across a chosen temporal extent. Arbitrary statistics (e.g., the number of synchronies) can then be calibrated against the ensemble of surrogate spike trains.)

L.-B. Chang and S. Geman. Stock Prices and the Peculiar Statistics of Large Returns. Submitted for publication, 2009.

(Theories and models of market dynamics can be critically examined using methods from conditional inference. Some explanations for volatility clustering, and some models of stochastic volatility, do not hold up against the data.)