What are Textons?

Song-Chun Zhu, Cheng-en Guo, Yingnian Wu, and Yizhou Wang
Textons refer to fundamental micro-structures in generic natural images and thus constitute the basic elements in early (pre-attentive) visual perception. However, the word "texton" remains a vague concept in the literature of computer vision and visual perception, and a precise mathematical definition has yet to be found. In this article, we argue that the definition of texton should be governed by a sound mathematical model of images, and the set of textons must be learned from, or best tuned to, an image ensemble. We adopt a generative image model in which an image is a superposition of bases from an over-complete dictionary; a texton is then defined as a mini-template that consists of a varying number of image bases with some geometric and photometric configurations. By analogy to physics, if image bases are like protons, neutrons and electrons, then textons are like atoms. A small number of textons can then be learned from training images as repeating micro-structures. We report four experiments for comparison. The first experiment computes clusters in the feature space of filter responses. The second uses transformed component analysis in both feature space and image patches. The third adopts a two-layer generative model where an image is generated by image bases and image bases are generated by textons. The fourth experiment shows textons from motion image sequences, which we call movetons.
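The two-layer idea in the abstract (textons generate image bases, and image bases superpose into an image) can be illustrated with a toy one-dimensional sketch. This is not the authors' model or learning algorithm: the base shapes, the signal length, and the names `BASE_SHAPES`, `render`, `TEXTON`, and `place_texton` are all hypothetical, chosen only to show a texton as a mini-template of bases with a geometric (offset) and photometric (amplitude) configuration.

```python
# Illustrative sketch only: a 1-D "image" built as a sparse superposition of
# bases from an over-complete dictionary, with a texton defined as a small
# template of bases. All names and values here are hypothetical.

SIGNAL_LEN = 16

# Two base shapes; placing each at every position gives more candidate
# bases (shape x position) than signal samples, i.e. an over-complete set.
BASE_SHAPES = {
    "blob": [1.0, 2.0, 1.0],
    "edge": [-1.0, 0.0, 1.0],
}


def render(bases, length=SIGNAL_LEN):
    """Superpose (shape, position, amplitude) bases into one signal."""
    signal = [0.0] * length
    for shape, pos, amp in bases:
        for k, value in enumerate(BASE_SHAPES[shape]):
            if 0 <= pos + k < length:
                signal[pos + k] += amp * value
    return signal


# A texton: bases with relative offsets (geometric configuration)
# and relative amplitudes (photometric configuration).
TEXTON = [("blob", 0, 1.0), ("edge", 2, 0.5)]


def place_texton(texton, origin, gain):
    """Instantiate a texton at a given position with an overall contrast."""
    return [(shape, origin + off, gain * amp) for shape, off, amp in texton]


# Two instances of the same texton repeat the same micro-structure.
bases = place_texton(TEXTON, 1, 1.0) + place_texton(TEXTON, 9, 2.0)
image = render(bases)
```

In this toy setting the second instance is the first one shifted by eight samples and doubled in contrast, which is the kind of repeating micro-structure that a learning procedure would need to group into a single texton.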
2002-09-01