Work on Segmentation and Parsing of Images

To reconstruct a 3D world from a 2D image, the most obvious need is to find the distinct objects in the image, that is, to decompose the domain of the image into pieces, each showing part of the surface of a different object. In many situations, you can expect that each object has a distinctive brightness, color and texture; that this brightness, color and texture vary smoothly across its surface; and, finally, that they change abruptly at the edges, that is, at the boundaries between these pieces.
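A minimal 1D sketch of this characterization rests on simplifying assumptions of our own. Brightness is taken to be constant (not merely smooth) on each piece, and a fixed cost is charged for every boundary. The energy is the sum of squared deviations from each piece's mean plus `jump_penalty` times the number of boundaries, which is a 1D Potts-type model. Dynamic programming over the start of the last piece finds its exact minimum in O(n²) time. The function name `segment_1d` and its parameters are hypothetical and illustrative. They do not come from the papers listed on this page.

```python
from itertools import accumulate


def segment_1d(signal, jump_penalty):
    """Exactly minimize a 1D piecewise-constant energy by dynamic programming.

    Energy = sum over pieces of squared deviation from the piece's mean
             + jump_penalty * (number of boundaries between pieces).
    Returns (pieces, energy); pieces are half-open (start, end) index pairs.
    """
    n = len(signal)
    # Prefix sums of values and squared values give each piece's cost in O(1).
    s = [0.0] + list(accumulate(signal))
    s2 = [0.0] + list(accumulate(x * x for x in signal))

    def piece_cost(i, j):
        # Sum of squared deviations from the mean over signal[i:j], with i < j.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: minimal energy of signal[:j]
    last_start = [0] * (n + 1)         # start of the final piece in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            # Every piece except the first one is preceded by a boundary.
            cand = best[i] + piece_cost(i, j) + (jump_penalty if i > 0 else 0.0)
            if cand < best[j]:
                best[j], last_start[j] = cand, i

    # Walk back through the stored start indices to recover the pieces.
    pieces, j = [], n
    while j > 0:
        pieces.append((last_start[j], j))
        j = last_start[j]
    return pieces[::-1], best[n]


# Two nearly flat regions of brightness with one abrupt jump between them.
signal = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.8, 5.0]
pieces, energy = segment_1d(signal, jump_penalty=1.0)
print(pieces)            # [(0, 4), (4, 8)]
print(round(energy, 3))  # 1.1
```

A larger `jump_penalty` trades fidelity to the data for fewer pieces. With `jump_penalty=1000.0`, the same signal comes back as a single piece.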

For example, the cobblestone paving in the background of this page breaks up into two regions with quite distinct textures, where the stones are larger or smaller. This was the idea behind my work with Jayant Shah (using only brightness, not color or texture). We translated the above characterization into a variational problem and analyzed it as best we could. The variational problem turned out to be identical to one De Giorgi had proposed for fractured solids, but on its own it is too simple for real vision use. However, the technique can be adapted to many situations and made into an effective tool if used properly (faster algorithms also had to be discovered). Some of this is discussed in Ch. 4 of Agnes's and my book. A suitable variant seems to work very well for stereo vision, using jumps in 'disparity' (the displacement between the left- and right-eye images) to locate edges.
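For reference, this is the variational problem with Shah in the form in which it is usually quoted. Treat the symbol names here as an assumption. $g$ is the image on a domain $R$, $f$ is its piecewise smooth approximation, $\Gamma$ is the union of boundary curves with total length $|\Gamma|$, and $\mu, \nu$ are positive weights. Restricting $f$ to be constant on each connected component $R_i$ of $R \setminus \Gamma$ gives the simpler piecewise-constant energy $E_0$ in the second line.

```latex
E(f, \Gamma) = \mu^2 \iint_R (f - g)^2 \, dx \, dy
             + \iint_{R \setminus \Gamma} \lVert \nabla f \rVert^2 \, dx \, dy
             + \nu \, |\Gamma|

E_0(\Gamma) = \sum_i \iint_{R_i} \bigl( g - \operatorname{mean}_{R_i} g \bigr)^2 \, dx \, dy
            + \nu_0 \, |\Gamma|
```

The three terms of $E$ match the three expectations stated above. $f$ stays close to the image, varies smoothly away from the edges $\Gamma$, and the edges themselves are kept short.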

But does an image really have one best segmentation? In fact, objects typically have parts, or other objects attached, laid or even painted on them. A more complete model of the objects in an image puts them in a parse tree, a grammatically constrained structure whose vertical edges connect parts to wholes. This is also discussed in Ch. 3 of Agnes's and my book and in my monograph with Song-Chun Zhu (last item below).
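A minimal sketch of a parse tree as a data structure uses a hypothetical toy grammar (`GRAMMAR`) that lists which parts each whole may contain. Each node's image region is represented as a set of pixel indices. It checks only two simple constraints. First, each part must be a grammatically allowed part of its whole. Second, each part must lie inside its whole's region; that containment rule is our own assumption. All labels and helper names are illustrative. This is not the stochastic grammar formalism of the monograph with Zhu.

```python
from dataclasses import dataclass, field

# Hypothetical toy grammar: which part labels each whole may contain.
# Purely illustrative; not the grammar from the Zhu-Mumford monograph.
GRAMMAR = {
    "scene": {"house", "tree"},
    "house": {"wall", "roof"},
    "wall": {"window", "painting"},  # things attached to, or painted on, a wall
}


@dataclass
class Node:
    label: str
    region: frozenset                      # pixel indices covered by this part
    children: list = field(default_factory=list)


def is_valid(node: Node) -> bool:
    """True if every child is a grammatically allowed part of its parent
    and lies inside the parent's region, recursively down the tree."""
    allowed = GRAMMAR.get(node.label, set())
    return all(
        child.label in allowed
        and child.region <= node.region    # subset test on pixel sets
        and is_valid(child)
        for child in node.children
    )


def walk(node: Node, depth: int = 0):
    """Yield (depth, label) pairs in pre-order: each whole before its parts."""
    yield depth, node.label
    for child in node.children:
        yield from walk(child, depth + 1)


window = Node("window", frozenset(range(10, 20)))
wall = Node("wall", frozenset(range(0, 50)), [window])
roof = Node("roof", frozenset(range(50, 80)))
house = Node("house", frozenset(range(0, 80)), [wall, roof])
scene = Node("scene", frozenset(range(100)), [house])

print(is_valid(scene))  # True
for depth, label in walk(scene):
    print("  " * depth + label)
```

Nothing here checks whether sibling parts overlap. A fuller model would have to decide how occlusion between parts is represented.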

Boundary Detection by Minimizing Functionals I (with J. Shah), in Image Understanding 1989, Ablex Press; preliminary version in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1985. Scanned reprint.
Optimal Approximations of Piecewise Smooth Functions and Associated Variational Problems (with J. Shah), Comm. Pure Appl. Math., 42, 1989, pp. 577-685. Scanned reprint and DASH reprint.
The 2.1D Sketch (with M. Nitzberg), Proc. 3rd IEEE International Conf. on Computer Vision (ICCV), 1990, pp. 138-144. Scanned reprint and DASH reprint.
Texture Segmentation by Minimizing Vector-Valued Energy Functionals: The Coupled-Membrane Model (with Tai Sing Lee and Alan Yuille), Proc. European Conf. on Computer Vision (ECCV), 1992, Lecture Notes in Computer Science 588, pp. 165-173. Scanned reprint.
A Bayesian Treatment of the Stereo Correspondence Problem Using Half-Occluded Regions (with P. Belhumeur), Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1992, pp. 506-512. Scanned reprint and DASH reprint.
Filtering, Segmentation and Depth (with Mark Nitzberg and Takahiro Shiota), Springer Lecture Notes in Computer Science 662, 1993.
Chordal Completions of Planar Graphs (with F.R.K. Chung), J. of Combinatorics, 62, 1994, pp. 96-106. Scanned reprint.
Review of Variational Methods in Image Segmentation, by J.-M. Morel and S. Solimini, Bull. Amer. Math. Soc., 33, 1996, pp. 211-216. Digital reprint.
A Stochastic Grammar of Images (with Song-Chun Zhu), Foundations and Trends in Computer Graphics and Vision, 2, 2007, pp. 259-362. Digital reprint and DASH reprint.

David Mumford

Blog and Archive for Reprints, Notes, Talks