Trees vs Neurons: Comparison between Denoising Autoencoders and Random Forest for Imputation of Mixed Data from Electronic Medical Records

Qin Peng
MAS, 2018
Wu, Yingnian
Missing data is a significant challenge in almost all studies, and it is especially acute in analyses of electronic health records (EHRs). We propose a multiple imputation model based on multi-layer denoising autoencoders. This nonparametric model handles mixed data types and makes no assumptions about the missingness mechanism. Evaluation on simulated datasets derived from real-life EHR data showed that the proposed model outperforms the current Random Forest method and median/mode imputation.
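For readers unfamiliar with the median/mode baseline named in the abstract, the following is a minimal sketch of that baseline only, not of the autoencoder or Random Forest models. The function names, the toy records, and the uniformly random masking used to simulate missingness are illustrative assumptions and are not taken from the thesis.

```python
import random
from statistics import median, mode


def median_mode_impute(rows, numeric_cols):
    """Fill None entries: column median for numeric columns, column mode otherwise.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        fill.append(median(observed) if j in numeric_cols else mode(observed))
    return [[fill[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def mask_at_random(rows, rate, seed=0):
    """Simulate missingness by blanking each cell independently with probability `rate`.

    Uniform random masking is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in r] for r in rows]


# Hypothetical mixed-type records: age, systolic blood pressure, smoker flag.
records = [
    [34, 120.0, "no"],
    [51, None, "yes"],
    [None, 135.0, "no"],
    [47, 128.0, None],
]

imputed = median_mode_impute(records, numeric_cols={0, 1})
```

In an evaluation of the kind the abstract describes, one would typically mask known values with something like `mask_at_random`, impute them, and compare the imputed values against the held-out originals for each method.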
2018