Chapter 6 Conclusions

The present study uses Monte Carlo simulation to evaluate the performance of various model reduction processes for diagnostic classification models (DCMs). There are two components of DCMs that can be reduced during the estimation process: the measurement model and the structural model. The measurement model governs how items relate to the attributes, and the structural model controls the base rate probabilities of inclusion in each class. In this study, the relative strengths and weaknesses of reducing each model, and the order in which they are reduced, were examined. Specifically, convergence rates, the bias and mean square error of the parameter estimates, and the attribute and profile mastery classifications were used to evaluate the processes.

Parameters were chosen for reduction based on their p-values. If, after estimation of the prior model in the reduction process, a parameter was non-significant (i.e., p-value greater than .05), it was removed from the model in the next iteration. In addition, parameters could also be reduced using a heuristic if the saturated model failed to converge. In practice, practitioners are unlikely to stop if the saturated model fails to converge, and instead remove parameters in an effort to achieve a converged model (e.g., Bradshaw et al., 2014). Thus, if the saturated model failed to converge, three- and four-way interaction terms were removed from the measurement and/or structural models for the reduction. Following the initial heuristic decision, subsequent reductions were performed based on the p-values of converged models. These two processes (p-value only reduction, and heuristic and p-value reduction) were analyzed separately to evaluate possible differences in reduction preference.
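
The p-value based reduction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: `fit` is a hypothetical callable standing in for the estimation software, assumed to return a convergence flag and a mapping from parameter name to p-value, and the sketch assumes that every non-significant parameter is dropped at once in each iteration.

```python
def reduce_by_p_value(fit, parameters, alpha=0.05):
    """Iteratively drop non-significant parameters until none remain to drop.

    `fit` is a hypothetical callable (an assumption of this sketch): it takes
    a list of parameter names and returns (converged, p_values), where
    p_values maps each name to its p-value in the fitted model.
    Returns the retained parameter names and whether the last fit converged.
    """
    current = list(parameters)
    while True:
        converged, p_values = fit(current)
        if not converged:
            # p-values from a non-converged model are not used for reduction.
            return current, False
        # Keep only parameters that are significant at the chosen cutoff.
        keep = [name for name in current if p_values[name] <= alpha]
        if len(keep) == len(current):
            # Nothing was removed, so the reduction process is finished.
            return current, True
        current = keep
```

Because each pass either removes at least one parameter or stops, the loop runs at most one more time than there are parameters. In a real application some parameters (for example, item intercepts) would presumably be protected from removal; that detail is omitted here.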

Overall, the results suggest that all model reduction processes provide reasonably unbiased estimates of measurement model and structural model parameters, and mean square error decreased as the sample size increased. Additionally, all reduction processes showed high levels of agreement between the true and estimated attribute mastery of the respondents. This was especially true at the attribute level as compared to the profile level, and the pattern was consistent across both p-value based and heuristic based reduction.

However, a key difference found in this study was the convergence rates of different model reduction processes, and how those were affected by the initial convergence of the saturated model. When the saturated model converged, reducing the measurement model first was most likely to lead to a converged model, especially if the Q-matrix was over specified. In contrast, when the saturated model failed to converge, reducing the structural model was far more likely to lead to model convergence. This was most pronounced in the correctly specified Q-matrix conditions. Regardless of whether or not the saturated model converged, model reductions were unlikely to result in a converged solution when the Q-matrix was over specified. The overall low rate of convergence for the over specified Q-matrices, across all conditions, indicates that the model is highly unlikely to converge if the Q-matrix has even low rates of over specification.

Additionally, the model fit results suggest other differences. When the saturated model successfully converged, it was most often the preferred model. When the saturated model fails to converge, this is not an option. In this scenario, structural reduction was the preferred method of model reduction. It is possible that this finding for the heuristic reduction is an artifact of the structural reduction converging the most frequently. However, structural reduction was also the second most preferred reduction process when reducing with p-values. Together, these results suggest that reduction of the structural model is less likely to negatively impact model fit on average.

Taken in totality, the results of this study suggest that the path of model reduction should be determined by the convergence of the saturated model. Should the saturated model converge, then the measurement model should be reduced first in order to create a more parsimonious model, while still maintaining a converged solution. On the other hand, if the saturated model fails to converge, reducing the structural model is most likely to provide a converged solution to evaluate. If this too fails to converge, then the most likely scenario is that the Q-matrix has been misspecified, and should be revised with input from content area experts, just as in the process for developing the Q-matrix (Bradshaw, 2017).

6.1 Limitations and future directions

There are several limitations of this study that deserve additional investigation in future work. First, reduction of converged models was based on p-values. The reliance on p-values is problematic for several reasons. For example, p-values do not give direct inferences about the parameter values, and they are unable to provide information about the actual size of the effect (see Wasserstein & Lazar, 2016, for a more complete summary). In addition, because of the .05 cutoff used to identify non-significant parameters for reduction, there is likely a non-negligible amount of Type I error. Instead, it would likely be better to use Bayesian estimation, where credible intervals could be formed around the parameter estimates and parameters could be reduced based on the proportion of the posterior distribution that is below a predefined cutoff of practical significance. Improvements in available software for estimating DCMs are likely necessary in order for this to be a viable path forward.
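
As a rough illustration of the posterior-based rule suggested above, the following Python sketch flags parameters whose posterior mass lies mostly below a cutoff of practical significance. Everything specific here is an assumption made for illustration: the function names, the default cutoff of 0.1, the 95% share required for removal, and the choice to compare the absolute value of each draw to the cutoff (so that negative interaction terms are treated symmetrically). The posterior draws are assumed to come from some Bayesian estimation routine that is not shown.

```python
def proportion_below_cutoff(draws, cutoff):
    """Share of posterior draws whose absolute value is below `cutoff`."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    return sum(abs(d) < cutoff for d in draws) / len(draws)


def flag_for_removal(posterior, cutoff=0.1, min_share=0.95):
    """Return names of parameters that are practically negligible.

    `posterior` maps a parameter name to a list of posterior draws.
    `cutoff` and `min_share` are illustrative defaults, not recommendations.
    """
    return sorted(
        name
        for name, draws in posterior.items()
        if proportion_below_cutoff(draws, cutoff) >= min_share
    )
```

Unlike a p-value, the quantity computed here speaks directly to the size of the parameter, which is the property that motivates the suggestion in the text.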

Second, the random generation of structural parameters resulted in respondents being unevenly placed into classes. Although this may be more reflective of reality, where some classes are less likely than others (for example, Figure 3.2), it also likely contributed to convergence problems. Thus, future research may benefit from having a fixed structural model, or at least utilizing some mechanism for ensuring that respondents are present in all classes. As an example, it would be relatively straightforward when generating the structural parameters to keep regenerating parameters until a set is generated that results in all classes having a base rate probability above a given threshold.
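
The regeneration idea described above amounts to rejection sampling, and a minimal Python sketch is given below. It does not reproduce the generating scheme used in this study. As a simplifying assumption, each class receives a normally distributed kernel value that is converted to a base rate with a softmax, rather than using the full log-linear structural parameterization. The function names, the standard deviation, and the retry limit are likewise hypothetical.

```python
import math
import random


def draw_base_rates(n_classes, rng, sd=1.0):
    """Draw one set of class base rates via a softmax of normal draws.

    Assumption: this stands in for the study's actual generating scheme.
    """
    kernels = [rng.gauss(0.0, sd) for _ in range(n_classes)]
    peak = max(kernels)  # subtract the maximum for numerical stability
    weights = [math.exp(k - peak) for k in kernels]
    total = sum(weights)
    return [w / total for w in weights]


def draw_until_all_classes_populated(n_attributes, min_rate, seed=0,
                                     max_tries=10_000):
    """Regenerate base rates until every class exceeds `min_rate`.

    Returns the accepted base rates and the number of draws it took.
    """
    n_classes = 2 ** n_attributes
    if min_rate * n_classes >= 1:
        # The rates sum to 1, so they cannot all exceed 1 / n_classes.
        raise ValueError("min_rate is too large for this many classes")
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        rates = draw_base_rates(n_classes, rng)
        if min(rates) > min_rate:
            return rates, attempt
    raise RuntimeError("no acceptable draw found; lower min_rate or raise max_tries")
```

For example, `draw_until_all_classes_populated(3, 0.02)` returns base rates for the eight classes implied by three attributes, each above 2%. One design consequence worth noting is that rejecting draws changes the distribution of the retained structural parameters, so the threshold itself becomes part of the simulation design.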

Related to the issue of underrepresented classes, part of the model reduction process could involve removing classes that have a low number of respondents assigned to them (e.g., Templin & Bradshaw, 2014a). The implementation of this process would present several challenges. For instance, without theoretical support for why a given profile may or should not exist in the population, there is no straightforward method for assessing whether or not a profile should be removed from the model. In other words, there is a problem in deciding how large a class must be in order for it to be retained. The base rate probabilities could be used, but then a determination must be made about what percentage constitutes a non-significant proportion of the population. Despite these complications, this remains an important area of future research.

Finally, it should be noted that the model selection process should not merely be an exercise in finding a path to convergence. The findings of this study are largely driven by the convergence rates of various model reduction processes. Although this is an important outcome measure, a model should not be selected for use due solely to the fact that it was able to converge. Rather, additional and more accurate measures of model fit are necessary to support the use of a model. Mplus provides $χ^{2}$ statistics for univariate and bivariate sets of items; however, these are unable to sufficiently assess model fit due to the violations of the asymptotic assumptions of the distributions. Instead, model fit may be better assessed through posterior predictive model checks from a Bayesian estimation (e.g., Gelman & Hill, 2006; Gelman et al., 2014). In this way, the practitioner can evaluate the overall fit of the model to data, not just compare between models that were able to successfully converge.

Despite these limitations, this study provides the first empirical evaluation of model reduction processes in DCMs. As the operational use of these models continues to grow, continued research into the practical applications of DCMs will need to keep pace. This study not only demonstrates a framework for future research into the application of DCMs, but also provides guidance to researchers and practitioners as to how best to proceed with model estimation.