Statistics of Natural Images & Clutter

The Statistics of Images and Models for Clutter: A Basis for Evaluating the Power of Decision Methods

by David Mumford

In his first papers setting up information theory, Shannon used empirical data and tabulated statistics for strings of characters encountered in English text, arriving at an estimate of the entropy of English. For analyzing real-valued signals, however, Gaussian models, which ignore all statistics except the power spectrum, were used almost exclusively until very recently. In some engineering applications, colored-noise Gaussian models are reasonable approximations to the truth. Ten years ago, however, David Field noted that differences of pixel values in images have distributions with large kurtosis and hence are highly non-Gaussian. This has been the starting point for major new theories of "sparse coding" and image compression. But even after a decade of work in this direction, a full description of the nature of local image statistics, i.e. of the joint distribution of pixels in a small neighborhood, is still missing.
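As a rough illustration of Field's observation (a sketch of my own, not code from this work), one can compute the excess kurtosis of nearest-neighbour pixel differences: for Gaussian white noise it is close to 0, while for a calibrated natural image it is typically far larger. The image filename in the comment is a placeholder.

```python
# Minimal numpy sketch: heavy tails of pixel-difference distributions.
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (equal to 0 for a Gaussian)."""
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean()**2 - 3.0

def difference_statistics(image):
    """Excess kurtosis of horizontal and vertical pixel differences."""
    dx = np.diff(image, axis=1).ravel()   # horizontal differences
    dy = np.diff(image, axis=0).ravel()   # vertical differences
    return excess_kurtosis(dx), excess_kurtosis(dy)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # For white Gaussian noise the differences are again Gaussian: kurtosis ~ 0.
    noise = rng.normal(size=(512, 512))
    print("Gaussian noise:", difference_statistics(noise))
    # A natural image loaded as a grey-level array would typically give values
    # well above 0 (placeholder path, uncomment to try):
    # from PIL import Image
    # img = np.asarray(Image.open("some_image.png").convert("L"), dtype=float)
    # print("natural image:", difference_statistics(img))
```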

Such a description is needed in the design and evaluation of object recognition algorithms. To recognize any object, one needs to know what images of that object typically look like; but one also needs to know what the rest of the scene looks like, in order to know the probability that a part of the image represents the object and not the background. Object recognition is an example of hypothesis testing, and the power of a test is determined by the tradeoff between false alarms in the background and missed correct examples (errors of type I and type II respectively). Assuming that the background image is given by a typical Gaussian noise model always leads to large underestimates of the probability of false alarms.
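The consequence for detection can be seen in a toy simulation (my illustration, not an experiment from this work): fix a threshold using a Gaussian background model, then measure the actual false-alarm rate when the background responses are heavy-tailed, here taken to be Laplacian with the same variance.

```python
# Toy sketch: a "4-sigma" threshold chosen from a Gaussian background model,
# compared against the empirical false-alarm rate on a heavy-tailed background.
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)
n = 5_000_000
sigma = 1.0
threshold = 4.0 * sigma   # threshold set by trusting the Gaussian model

# Predicted one-sided false-alarm probability under the Gaussian model.
p_gauss = 0.5 * erfc(threshold / (sigma * sqrt(2)))

# Empirical false-alarm rate when the background is actually Laplacian with the
# same variance (scale b chosen so that 2*b**2 = sigma**2).
b = sigma / np.sqrt(2)
laplace = rng.laplace(scale=b, size=n)
p_laplace = np.mean(laplace > threshold)

print(f"Gaussian prediction : {p_gauss:.2e}")
print(f"Heavy-tailed reality: {p_laplace:.2e}")  # typically orders of magnitude larger
```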

A basic reason why modeling backgrounds is hard is that they are virtually always filled with "clutter". This is a consequence of the second fundamental observation about image statistics: they are very nearly scale invariant. This means that if we look at a reasonably diverse sample of images made up of arrays of 2n by 2n pixels, then there is no difference between the statistics of the n by n images obtained i) by looking at windows in the large images and ii) by averaging the large images over disjoint 2 by 2 blocks. A corollary of this scale invariance is that images are filled with objects of all sizes, with more smaller ones than larger ones; the total effect is summarized by the word "clutter". In order to evaluate error rates of type I (false alarms), we need a good model of clutter. This model must be more sophisticated than simple marginals on wavelet coefficients: it must incorporate all the local structure which may accidentally combine into a structure resembling the sought-for object.
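The scale-invariance check described above can be made concrete with a short sketch (my construction; the choice of test statistic and of a central half-size window are assumptions): compute a statistic on windows cut from the large images and on the same images after 2 by 2 block averaging, and compare the two empirical distributions.

```python
# Sketch of the two-scale comparison: windowing vs. 2x2 block averaging.
import numpy as np

def block_average_2x2(img):
    """Average an image over disjoint 2x2 blocks, halving each dimension."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def central_window(img):
    """Central window of half the side length."""
    h, w = img.shape
    return img[h // 4: h // 4 + h // 2, w // 4: w // 4 + w // 2]

def gradient_kurtosis(img):
    """Excess kurtosis of horizontal differences, used here as the test statistic."""
    d = np.diff(img, axis=1).ravel()
    d = d - d.mean()
    return (d**4).mean() / (d**2).mean()**2 - 3.0

def compare_scales(images):
    """images: iterable of 2-D grey-level arrays (e.g. calibrated natural images).
    For a diverse natural-image sample the two averages should nearly coincide."""
    windowed = [gradient_kurtosis(central_window(im)) for im in images]
    averaged = [gradient_kurtosis(block_average_2x2(im)) for im in images]
    return np.mean(windowed), np.mean(averaged)
```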

The metric pattern theory group has been pursuing the creation of such models as one of its primary objectives. We have two large databases of calibrated natural images, courtesy of Prof. van Hateren and of British Aerospace. One aspect of our research has been the empirical mining of these databases. A second thrust has been the study of synthetic image models obtained by Poisson processes of elementary objects varying in position, scale, color, etc. A third has been the theoretical analysis of image models using infinite divisibility and using maximum entropy models based on filter responses. Preprints on all these thrusts are listed below.
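The second thrust can be pictured with a schematic example (all details here, the disc-shaped objects, the power-law exponent, and the occlusion rule, are my assumptions rather than the group's actual models): a Poisson number of elementary objects with random positions, power-law distributed sizes, and random grey levels, painted with occlusion.

```python
# Schematic "clutter" image from a Poisson process of elementary objects.
import numpy as np

def synthetic_clutter(size=256, intensity=500, r_min=2.0, r_max=64.0, alpha=3.0, seed=0):
    """Random discs with power-law radii p(r) ~ r**(-alpha), painted with occlusion."""
    rng = np.random.default_rng(seed)
    img = np.full((size, size), 0.5)          # uniform grey background
    yy, xx = np.mgrid[0:size, 0:size]
    n_objects = rng.poisson(intensity)        # Poisson number of objects
    # Sample radii from the truncated power law by inverse-transform sampling.
    u = rng.uniform(size=n_objects)
    a = 1.0 - alpha
    radii = (u * (r_max**a - r_min**a) + r_min**a) ** (1.0 / a)
    for r in radii:
        cx, cy = rng.uniform(0, size, 2)      # uniform position
        grey = rng.uniform(0.0, 1.0)          # random grey level
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        img[mask] = grey                      # later objects occlude earlier ones
    return img

if __name__ == "__main__":
    clutter = synthetic_clutter()
    print(clutter.shape, float(clutter.min()), float(clutter.max()))
```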