Learning Descriptive and Generative Models with Short-Run MCMC
Erik Lennart Nijkamp
PhD, 2021
Zhu, Song-Chun
What is vision? The mystery of how the visual cortex extracts abstract concepts from a plethora of visual sensory stimuli has captivated pioneers such as Hermann von Helmholtz and David Marr for the past century. \textit{Helmholtz} states that what we see is the solution to a computational problem: our brains compute the most likely causes of the photon absorptions within our eyes. In his monumental work ``Vision'', \textit{Marr} conceptualizes the process of vision as a set of representations, starting from a description of the input image and culminating in a description of three-dimensional objects in the surrounding environment. \textit{David Bryant Mumford} proposes hierarchical Bayesian inference as a means to understand the visual cortex. In the context of predictive coding theory, Mumford argues that the function of the hierarchical structure in the cortex is to reconcile representations and predictions of sensory stimuli at multiple levels. The assumption is that the dynamics of neural activity are guided towards minimizing the discrepancy, or error, between the input representation at each level and the prediction originating from a higher-level representation. \textit{Song-Chun Zhu} and \textit{Ying Nian Wu} propose a holistic realization of Marr's paradigm with rigorous statistical modeling in their work ``Computer Vision – Statistical Models for Marr's Paradigm''.
2021