Planned Missing Designs and Diagnostic Classification Models
Yon Soo Suh
MS, 2022
Wu, Yingnian
Missing responses are often inevitable in assessments, whether intended or not. The problem is not the missing data itself but how it is handled. In fact, as in planned missing (PM) data designs, missing data can even be used to advantage, promoting cost-effectiveness and design efficiency in test development. Over the years there has been active research on the impact, treatment, and use of different kinds of missing data in psychometric models for assessments, focusing first on classical test theory (CTT) and then on item response theory (IRT) models. IRT models have become one of the most popular classes of statistical models in psychometrics and are widely used in educational settings. Nonetheless, in an era of school accountability, with increased emphasis on providing detailed, formative feedback on individual students, a different flavor of IRT models, termed diagnostic classification models (DCMs), has been rapidly gaining popularity in the same settings. DCMs specialize in classifying respondents according to their mastery of a predefined set of underlying cognitive processes called attributes, and they are well suited for obtaining diagnostic information about individual attributes as well as their combinations. However, research tailored specifically to DCMs on the impact of any kind of missing data is scant. As a step toward filling this gap, this study uses simulated data to investigate the effect of a maximum likelihood (ML)-based approach to treating missing data assumed to be missing completely at random (MCAR), specifically under PM design scenarios. Four key factors (the type of PM design, the number of attributes, the structural model of the DCM, and sample size) were experimentally manipulated to examine how well DCM item parameters can be recovered and to compare the effects of the design factors.
This project adds to the empirical knowledge base on the statistical properties of DCMs in the face of missing data, which in turn is expected to improve the design and use of DCMs in practical settings.
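To make the simulation setup concrete, the following is a minimal sketch of how PM data for a DCM might be generated and scored by an observed-data likelihood. It is not the study's actual code. Everything specific here is an assumption chosen for illustration: the DINA model as the DCM, a three-form PM design, a 12-item and 3-attribute Q-matrix, independent attributes with mastery probability 0.5, uniform class weights, and the helper names `simulate_dina`, `three_form_mask`, and `observed_loglik`.

```python
import itertools
import numpy as np


def simulate_dina(n_persons, q_matrix, guess, slip, rng):
    """Simulate attribute profiles and complete DINA item responses."""
    n_items, n_attrs = q_matrix.shape
    # Assumption: independent attributes, each mastered with probability 0.5.
    alpha = rng.integers(0, 2, size=(n_persons, n_attrs))
    # eta[i, j] is True when person i has mastered every attribute item j requires.
    eta = alpha @ q_matrix.T == q_matrix.sum(axis=1)
    p_correct = np.where(eta, 1.0 - slip, guess)
    responses = (rng.random((n_persons, n_items)) < p_correct).astype(float)
    return alpha, responses


def three_form_mask(n_persons, n_items, rng):
    """Return form labels and a mask (True = item administered) for a three-form design."""
    # Items are split into a common block X and rotating blocks A, B, C.
    blocks = np.array_split(np.arange(n_items), 4)
    omitted_block = {0: 3, 1: 2, 2: 1}  # form -> index of the block it leaves out
    # Forms are assigned at random, so missingness is MCAR by design.
    forms = rng.integers(0, 3, size=n_persons)
    mask = np.ones((n_persons, n_items), dtype=bool)
    for form, block_idx in omitted_block.items():
        rows = np.flatnonzero(forms == form)
        mask[np.ix_(rows, blocks[block_idx])] = False
    return forms, mask


def observed_loglik(observed, q_matrix, guess, slip):
    """Marginal DINA log-likelihood using only administered items (NaN = not administered)."""
    n_attrs = q_matrix.shape[1]
    profiles = np.array(list(itertools.product([0, 1], repeat=n_attrs)))
    eta = profiles @ q_matrix.T == q_matrix.sum(axis=1)
    p = np.where(eta, 1.0 - slip, guess)  # shape: (classes, items)
    seen = ~np.isnan(observed)
    x = np.where(seen, observed, 0.0)
    # Items that were not administered contribute nothing to the likelihood.
    log_lik = (seen * x) @ np.log(p).T + (seen * (1.0 - x)) @ np.log(1.0 - p).T
    # Assumption: uniform weights over the 2^K latent classes.
    log_prior = -np.log(len(profiles))
    peak = log_lik.max(axis=1, keepdims=True)
    per_person = peak[:, 0] + np.log(np.exp(log_lik - peak).sum(axis=1)) + log_prior
    return float(per_person.sum())


rng = np.random.default_rng(0)
# Hypothetical Q-matrix: 12 items measuring 3 attributes.
q_matrix = np.tile(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]), (3, 1))
guess = np.full(12, 0.2)
slip = np.full(12, 0.1)

alpha, complete = simulate_dina(600, q_matrix, guess, slip, rng)
forms, mask = three_form_mask(600, 12, rng)
observed = np.where(mask, complete, np.nan)

print("proportion missing:", np.isnan(observed).mean())
print("observed-data log-likelihood:", observed_loglik(observed, q_matrix, guess, slip))
```

In this sketch the likelihood is evaluated at the generating parameters only. A recovery study would instead maximize it over the guessing and slipping parameters (for example with an EM algorithm) and compare the estimates against the generating values across design conditions.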