Interpreting Nonlinear Black-box Models Globally: A Comparative Study of Different Techniques

Yingqi Li
MS, 2020
Li, Jingyi
In machine learning, black-box models that discover underlying relationships in data are typically more predictive than white-box models, but also more challenging to interpret. The difficulty of interpreting nonlinear black-box models stems largely from interaction effects between features. Depending on modeling needs, a white-box model may be sufficient in some circumstances; otherwise, understanding how a black-box model generates its predictions is essential for both accuracy and faithfulness. In this paper, we use three global, model-agnostic interpretability methods (Partial Dependence Plots with Individual Conditional Expectation Plots, Global Surrogate models, and Global SHAP via discretization) to explain a diverse set of black-box models and compare their behavior quantitatively and qualitatively. The experimental results demonstrate that the methods' performance varies across datasets and across the black-box models being interpreted. We show that the methods provide explanations from different perspectives and, accordingly, present a strategy for selecting the most appropriate interpretability method for a new model on a new dataset.
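Not part of the original abstract: a minimal Python sketch, assuming scikit-learn and a synthetic Friedman regression task (both illustrative choices, not the thesis's data or models), showing two of the named methods. It draws Partial Dependence and Individual Conditional Expectation curves for a gradient-boosted black-box model and fits a shallow decision tree as a global surrogate, measuring fidelity against the black-box predictions.

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import PartialDependenceDisplay
    from sklearn.metrics import r2_score
    from sklearn.tree import DecisionTreeRegressor

    # Synthetic data and a nonlinear black-box model (illustrative only).
    X, y = make_friedman1(n_samples=500, random_state=0)
    black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

    # Partial Dependence + Individual Conditional Expectation curves for the
    # first two features; kind="both" overlays ICE curves on the PDP average.
    PartialDependenceDisplay.from_estimator(black_box, X, features=[0, 1], kind="both")
    plt.show()

    # Global surrogate: a shallow decision tree trained to mimic the black-box
    # predictions; R^2 against those predictions measures surrogate fidelity.
    surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))
    fidelity = r2_score(black_box.predict(X), surrogate.predict(X))
    print(f"surrogate fidelity (R^2 vs black box): {fidelity:.3f}")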