Video and Image Analysis Using Local Information

Xiaochen Lian
Ph.D., 2017
Alan Loddon Yuille
Local information is crucial in many image and video analysis tasks. In this thesis, we present four representative works that exploit local information. We first introduce a set of per-pixel labeling datasets, which provide a good platform for studying the use of local information in image analysis. Based on these datasets, we propose a novel segmentation method that utilizes local appearance consistency for the task of semantic part parsing of cars. We then address the attention issue in video action recognition by designing a latent attention module that is learned jointly with the video recognition components. Finally, we improve the attention mechanism so that it explicitly detects action-related spatial and spatio-temporal regions of interest (ROIs).