Gaining Justified Human Trust by Improving Explainability in Vision and Language Reasoning Models
Arjun Reddy Akula
PhD, 2021
Zhu, Song-Chun
In recent decades, artificial intelligence (AI) systems have become increasingly ubiquitous, in settings ranging from low-risk environments such as chatbots to high-risk environments such as medical diagnosis and treatment, self-driving cars, drones, and military applications. However, understanding the behavior of AI systems built on black-box machine learning (ML) models such as deep neural networks remains a significant challenge, because these models cannot explain why they reached a specific recommendation or decision. Explainable AI (XAI) models address this issue by generating explanations that make the underlying inference mechanism of AI systems transparent and interpretable to both expert users (system developers) and non-expert users (end users). Moreover, as decision making shifts from humans to machines, the transparency and interpretability achieved through reliable explanations are central to solving AI problems such as operating self-driving cars safely, detecting and mitigating bias in ML models, increasing justified human trust in AI models, debugging models efficiently, and ensuring that ML models reflect our values. In this thesis, we propose new methods to gain justified human trust in vision and language reasoning models, both by generating adaptive, human-understandable explanations and by improving the interpretability, faithfulness, and robustness of existing models. Specifically, we make the following four major contributions: (1) Motivated by Song-Chun Zhu's work on generating abstract art from photographs, we pose an explanation as a procedure, or path, for explaining the image interpretation, which is represented as a parse graph. In addition, whereas current XAI methods generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e., a dialog, between the machine and the human user.
To do this, we use Theory of Mind (ToM), which lets us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. These explicit ToM mental representations are incorporated to learn an optimal explanation path that takes the human's perception and beliefs into account. We call this framework X-ToM; (2) We propose a Conceptual and Counterfactual Explanation framework, which we call CoCo-X, for explaining decisions made by a deep convolutional neural network (CNN). In cognitive psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCo-X model explains decisions made by a CNN using fault-lines; (3) In addition to proposing explanation frameworks such as X-ToM and CoCo-X, we evaluate existing deep learning models, such as Transformers and Compositional Modular Networks, on their ability to provide interpretable visual and language representations and to make robust predictions on out-of-distribution samples. We show that state-of-the-art end-to-end modular network implementations, although they offer high model interpretability through their transparent, hierarchical, and semantically motivated architecture, require a large amount of training data and are less effective at generalizing to unseen but known language constructs. We propose several extensions to modular networks that mitigate bias in training and improve the robustness and faithfulness of these models; (4) The research culminates in a visual question and answer generation framework, a semi-automatic framework for generating out-of-distribution data that helps us explicitly understand model biases and improve model robustness and fairness.