Vision Introductory Comments

What is "vision"? It is not usually regarded as a standard field of applied mathematics, but in the last few decades it has assumed an identity of its own as a multi-disciplinary area drawing in engineers, computer scientists, statisticians, psychologists and biologists as well as mathematicians. For me, its importance is that it is a point of entry into the larger problem of the scientific modeling of thought and the brain. Vision is a cognitive skill that, on the one hand, is mastered by many lower animals and, on the other hand, has proved very hard to duplicate on a computer. This level of difficulty makes it an ideal test bed for theorizing about the subtler talents manifested by humans.

What are the key mathematical tools appropriate for modeling the brain and cognitive skills? I have argued that the central tool is Bayesian inference based on graphical structures inferred from patterns observed in the world. This approach is called "Pattern Theory" and is based on the ideas of Ulf Grenander and his school at Brown, starting in the 1970s. His aim was to analyze from a statistical point of view the patterns in all 'signals' generated by the world, whether they be images, sounds, written text, DNA or protein strings, spike trains in neurons, time series of prices or weather, etc. The sunflower in the background here is a classic example of startling patterns in nature. The theory seeks classes of stochastic models that can capture all the patterns we see in nature along with their natural variability, so that random samples from these models have the same 'look and feel' as samples from the world itself. The detection of patterns in noisy and ambiguous signals can then be achieved by the use of Bayes's rule, a method that can be described as 'analysis by synthesis'. We describe these ideas in more detail on the page "Pattern Theory".
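As a rough illustration of 'analysis by synthesis', the toy sketch below scores a noisy 1D signal against a few candidate patterns by synthesizing what each pattern would look like and applying Bayes's rule. Everything in it is an assumption made for illustration: the three pattern names, the `synthesize` function, the independent Gaussian noise model, the uniform prior, and the sample values. It is not drawn from Grenander's work or from any particular vision system.

```python
import math

def synthesize(pattern, length):
    """Render a hypothetical pattern as the clean signal it would produce."""
    if pattern == "edge":   # step from dark to bright at the midpoint
        return [0.0] * (length // 2) + [1.0] * (length - length // 2)
    if pattern == "bar":    # bright band in the middle half
        return [1.0 if length // 4 <= i < 3 * length // 4 else 0.0
                for i in range(length)]
    return [0.5] * length   # "flat": uniform gray

def log_likelihood(observed, predicted, sigma):
    """Log p(observed | pattern), assuming independent Gaussian noise."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((o - p) / sigma) ** 2 - norm
               for o, p in zip(observed, predicted))

def posterior(observed, prior, sigma=0.5):
    """Bayes's rule: p(pattern | observed) ∝ p(observed | pattern) p(pattern)."""
    log_post = {
        name: math.log(p)
        + log_likelihood(observed, synthesize(name, len(observed)), sigma)
        for name, p in prior.items()
    }
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(v - top) for name, v in log_post.items()}
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}

# Made-up noisy observation that loosely resembles a step edge.
observed = [0.2, -0.1, 0.3, 0.1, 0.8, 1.2, 0.9, 0.7]
prior = {"edge": 1 / 3, "bar": 1 / 3, "flat": 1 / 3}

post = posterior(observed, prior)
best = max(post, key=post.get)
print(best, post)
```

The point of the sketch is the direction of the computation: each hypothesis is turned into a predicted signal (synthesis), and the observation is explained by whichever hypothesis makes it most probable once the prior is taken into account (analysis). Real pattern-theoretic models replace the three fixed templates with rich stochastic models over graphical structures.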

Vision has proved to be a perfect area in which to develop the ideas of Pattern Theory. In vision, a 3D world is observed under highly variable illumination and in a 2D projection onto the retina or camera sensors. Its objects have highly variable shapes and surface coloring, all of which greatly modify the ultimate image. Unwinding all these intertwined factors in order to infer the nature of the world around you has proved to be very challenging and, as of this writing, is only partially solved.

Below, a photo of my inspiration, Ulf Grenander, at his summer house in Sweden. He was the first to understand that Bayesian inference and graphical models were the best mathematical tools with which to model virtually all cognitive processes.

Ulf Grenander

David Mumford

Blog and Archive for Reprints, Notes, Talks
