Alternative Multiple Imputation Inference for Categorical Structural Equation Modeling

Seungwon Chung
MS, 2019
Wu, Yingnian
Questionnaire responses are ubiquitous in social and behavioral science research. One side effect of using such data is that researchers must often account for item-level missingness. Multiple imputation (MI) is one of the most widely used techniques for handling missing data: missing values are replaced by plausible values drawn from their proper posterior distribution given the observed data. The standard MI procedure in structural equation modeling (SEM) requires researchers to fit their model once per imputed data set and then combine the parameter estimates and standard errors at the end. Instead, we propose a new and simpler approach that is computationally more convenient and has additional benefits, such as the availability of fit indices. Motivated by Lee and Cai (2012), who proposed an alternative method for statistical inference under MI in SEM with continuous variables, we extend their approach to categorical variables. For ordered categorical data, the main idea is as follows. Suppose thresholds and polychoric correlations have been computed from each of M imputed data sets. Our goal is to perform estimation and inference with these M sets of thresholds and polychoric correlations. The thresholds and polychoric correlations can simply be averaged. However, to obtain a correct test statistic in categorical SEM (CSEM), the weight matrix must reflect the between-imputation variance in addition to the simple average of the asymptotic covariance matrices of the thresholds and polychoric correlations. Applying Proposition 4 of Browne (1984) then yields the correct test statistic, $\tilde{T}_B$. We further consider $\tilde{T}_{YB}$, a small-sample adjustment of $\tilde{T}_B$ (Yuan & Bentler, 1997). We demonstrate the performance of the proposed statistics and their power to detect model misspecification through simulation studies, and we illustrate our findings with two empirical data sets.
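The two computations described above, pooling M sets of thresholds and polychoric correlations and then evaluating a residual-based statistic of the form given in Browne (1984, Proposition 4), can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes that the between-imputation variance B enters through Rubin's rules, that is, (1 + 1/M)B is added to the average within-imputation covariance, and that each input covariance is the sampling covariance of the estimates themselves. Under the second assumption no separate sample-size factor appears, because with V = Γ/n the quadratic form in V equals n times the one in Γ. The function names `pool_imputations` and `residual_statistic` are hypothetical. The exact weight matrix in the thesis may differ, and the Yuan-Bentler adjustment is not shown.

```python
import numpy as np


def pool_imputations(estimates, acovs):
    """Pool M sets of stacked thresholds/polychoric correlations.

    estimates: (M, k) array; row m holds the statistics from imputation m.
    acovs: (M, k, k) array; sampling covariance of each row of `estimates`.
    Returns the averaged statistics and a total covariance that adds the
    between-imputation variance to the averaged within-imputation one.
    The (1 + 1/M) factor follows Rubin's rules (an assumption here).
    """
    estimates = np.asarray(estimates, dtype=float)
    acovs = np.asarray(acovs, dtype=float)
    m = estimates.shape[0]
    kappa_bar = estimates.mean(axis=0)      # simple average across imputations
    within = acovs.mean(axis=0)             # average within-imputation covariance
    dev = estimates - kappa_bar
    between = dev.T @ dev / (m - 1)         # between-imputation covariance
    total = within + (1.0 + 1.0 / m) * between
    return kappa_bar, total


def residual_statistic(kappa_bar, sigma_hat, delta, cov):
    """Residual-based statistic in the form of Browne's (1984) Proposition 4.

    kappa_bar: pooled sample statistics, shape (k,).
    sigma_hat: model-implied statistics at the fitted parameters, shape (k,).
    delta: Jacobian d sigma / d theta' at the fitted parameters, shape (k, q).
    cov: sampling covariance of kappa_bar, shape (k, k).
    Computes e' [V^-1 - V^-1 D (D' V^-1 D)^-1 D' V^-1] e with e = kappa_bar - sigma_hat.
    """
    e = np.asarray(kappa_bar, dtype=float) - np.asarray(sigma_hat, dtype=float)
    delta = np.asarray(delta, dtype=float)
    cov_inv_e = np.linalg.solve(cov, e)            # V^-1 e
    cov_inv_delta = np.linalg.solve(cov, delta)    # V^-1 D
    middle = delta.T @ cov_inv_delta               # D' V^-1 D, shape (q, q)
    # Part of the residual that the free parameters could absorb.
    proj = cov_inv_delta @ np.linalg.solve(middle, cov_inv_delta.T @ e)
    stat = float(e @ cov_inv_e - e @ proj)
    df = e.shape[0] - delta.shape[1]               # k statistics minus q parameters
    return stat, df


if __name__ == "__main__":
    # Toy example: M = 3 imputations, k = 3 polychoric correlations, and a
    # hypothetical one-parameter model in which all three correlations are equal.
    est = np.array([[0.30, 0.35, 0.40],
                    [0.32, 0.33, 0.42],
                    [0.28, 0.36, 0.39]])
    acov = np.stack([0.002 * np.eye(3)] * 3)
    kappa_bar, total = pool_imputations(est, acov)
    delta = np.ones((3, 1))
    theta_hat = kappa_bar.mean()                   # unweighted least-squares fit
    stat, df = residual_statistic(kappa_bar, theta_hat * np.ones(3), delta, total)
    print(round(stat, 3), df)
```

Computing the statistic from the projection residual, rather than from a single weighted refit, is what allows the model to be estimated once, with any consistent estimator, on the pooled statistics. The pooled covariance then carries the imputation uncertainty into the test.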