Agresti, A. (2012). Categorical Data Analysis (3rd ed.). New York, NY: John Wiley & Sons.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

Allaire, J., Cheng, J., Xie, Y., McPherson, J., Chang, W., Allen, J., … Arslan, R. (2017). rmarkdown: Dynamic documents for R. Retrieved from

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian Networks in Educational Assessment (1st ed.). New York, NY: Springer.

Amazon Web Services. (2018). Amazon Elastic Compute Cloud (EC2). Retrieved from

Ayala, R. J. de. (2009). The Theory and Practice of Item Response Theory. New York, NY: Guilford Press.

Bradshaw, L. (2017). Diagnostic Classification Models. In A. A. Rupp & J. Leighton (Eds.), The Handbook of Cognition and Assessment: Frameworks, Methodologies, and Applications (1st ed., pp. 297–327). New York, NY: John Wiley & Sons.

Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79(3), 403–425.

Bradshaw, L., Izsák, A., Templin, J., & Jacobson, E. (2014). Diagnosing teachers’ understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33(1), 2–14.

Brown, T. A. (2006). Confirmatory Factor Analysis for Applied Research. New York, NY: Guilford.

Browne, M., Rockloff, M., & Rawat, V. (2016). An SEM algorithm for scale reduction incorporating evaluation of multiple psychometric criteria. Sociological Methods & Research, (Advance online publication).

Burkholder, G. J., & Harlow, L. L. (2003). An illustration of a longitudinal cross-lagged design for larger structural equation models. Structural Equation Modeling, 10(3), 465–486.

Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.

Chinn, S. (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine, 19, 3127–3131.

Cizek, G. J. (2006). Standard Setting. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of Test Development. Mahwah, NJ: Taylor & Francis.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.

de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45(4), 343–362.

de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.

de la Torre, J., Ark, L. A. van der, & Rossi, G. (2015). Analysis of clinical data from cognitive diagnosis modeling framework. Measurement and Evaluation in Counseling and Development, 0748175615569110.

DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35(1), 8–26.

DeVellis, R. F. (2006). Classical test theory. Medical Care, 44(11), S50–S59. Retrieved from

DiBello, L. V., Stout, W. F., & Roussos, L. (1995). Unified cognitive psychometric assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively Diagnostic Assessment (pp. 361–390). Hillsdale, NJ: Erlbaum.

Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36–49.

Feinberg, R. A., & Wainer, H. (2014). When can we improve subscores by making them shorter?: The case against subscores with overlapping items. Educational Measurement: Issues and Practice, 33(3), 47–54.

Fukuhara, H., & Kamata, A. (2011). A bifactor multidimensional item response theory model for differential item functioning analysis on testlet-based items. Applied Psychological Measurement, 35(8), 604–622.

Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models (1st ed.). Cambridge, England: Cambridge University Press.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: CRC Press.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321. Retrieved from

Hallquist, M., & Wiley, J. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Retrieved from

Halonen, J., Harris, C. M., Pastor, D. A., Abrahamson, C. E., & Huffman, C. J. (2005). Assessing general education outcomes in introductory psychology. In D. S. Dunn & S. Chew (Eds.), Best Practices in Teaching Introduction to Psychology (pp. 195–210). Mahwah, NJ: Erlbaum.

Hambleton, R. (2006). Setting Performance Standards. In Educational Measurement (4th ed., pp. 433–470). Rowman & Littlefield.

Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Urbana-Champaign, IL.

Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125.

Henry, L., & Wickham, H. (2017). Purrr: Functional programming tools. Retrieved from

Henry, L., & Wickham, H. (2018). Tidyselect: Select from a set of strings. Retrieved from

Henson, R. A., Templin, J. L., & Willse, J. T. (2008). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191.

Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262–277.

Henson, R., & Templin, J. (2005). Extending cognitive diagnosis models to evaluate the validity of DSM criteria for the diagnosis of pathological gambling. In. Las Vegas, NV.

Hester, J. (2017). Glue: Interpreted string literals. Retrieved from

Johnson, P. E. (2016). PortableParallelSeeds: Allow replication of simulations on parallel and serial computers. Retrieved from

Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.

Jurich, D. P., & Bradshaw, L. (2013). An illustration of diagnostic classification modeling in student learning outcomes assessment. International Journal of Testing, 14(1), 49–72.

Kaplan, D. (2009). Structural Equation Modeling: Foundations and Extensions (2nd ed.). Thousand Oaks, CA: Sage.

Kingston, N. M., & McKinley, R. L. (1988). Assessing the structure of the GRE General Test using confirmatory multidimensional item response theory. In Symposium: Item response theory meets multidimensional tests. New Orleans, LA.

Kline, R. B. (2002). Principles and Practice of Structural Equation Modeling (2nd ed.). New York, NY: Guilford.

Kunina-Habenicht, O., Rupp, A. A., & Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49(1), 59–81.

McKinley, R. L., & Kingston, N. M. (1988). Confirmatory analysis of test structure using multidimensional IRT. In. New Orleans, LA.

McWhite, C. D., & Wilke, C. O. (2018). Colorblindr: Simulate colorblindness in R figures. Retrieved from

Millon, T., Millon, C., Davis, R., & Grossman, S. (2009). MCMI-III Manual (4th ed.). Minneapolis, MN: Pearson.

Muthén, L. K., & Muthén, B. O. (1998). Mplus User’s Guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

Müller, K., & Wickham, H. (2018). Tibble: Simple data frames. Retrieved from

Netlify. (2018). Netlify. Retrieved from

Pandoc. (2017). Retrieved from

Pedersen, T. L. (2016). Ggforce: Accelerating ’ggplot2’. Retrieved from

Perry, J. L., Nicholls, A. R., Clough, P. J., & Crust, L. (2015). Assessing model fit: Caveats and recommendations for confirmatory factor analysis and exploratory structural equation modeling. Measurement in Physical Education and Exercise Science, 19(1), 12–21.

Preston-Werner, T. (2018). Jekyll. Retrieved from

R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21(1), 25–36.

Reckase, M. D. (2009). Multidimensional Item Response Theory. New York, NY: Springer-Verlag.

Rojas, G., de la Torre, J., & Olea, J. (2012). Choosing between general and specific cognitive diagnosis models when the sample size is small. In. Vancouver, British Columbia, Canada.

RStudio. (2018). RStudio. Retrieved from

Rubinstein, R. Y., & Kroese, D. P. (2017). Simulation and the Monte Carlo Method (3rd ed.). Hoboken, NJ: Wiley.

Rupp, A. A., & Templin, J. (2008a). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78–96.

Rupp, A. A., & Templin, J. L. (2008b). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement: Interdisciplinary Research & Perspective, 6(4), 219–262.

Rupp, A. A., & Wilhelm, O. (2012). Files for Mplus input file generation. College Park, MD. Retrieved from

Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic Measurement: Theory, Methods, and Applications (1st ed.). New York, NY: Guilford Press. Retrieved from

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3), 333–343.

Sinharay, S. (2010). When can subscores be expected to have added value? Results from operational and simulated data (No. RR-10-16). Princeton, NJ: Educational Testing Service.

Sinharay, S., Haberman, S. J., & Wainer, H. (2011). Do adjusted subscores lack validity? Don’t blame the messenger. Educational and Psychological Measurement, 71(5), 789–797.

Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. New York, NY: Chapman & Hall/CRC.

Spinu, V., Grolemund, G., & Wickham, H. (2018). Lubridate: Make dealing with dates a little easier. Retrieved from

Stroup, W. W. (2012). Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. New York, NY: Chapman & Hall/CRC.

Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354.

Tatsuoka, K. K. (1990). Toward an Integration of Item-Response Theory and Cognitive Error Diagnosis. In N. Frederiksen, R. Glaser, A. Lesgold, & M. Shafto (Eds.), Diagnostic Monitoring of Skill and Knowledge Acquisition (pp. 453–488). Hillsdale, NJ: Erlbaum.

Templin, J. (2010). Classification model based standard setting methods. In. Denver, CO.

Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305.

Templin, J., & Bradshaw, L. (2014a). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339.

Templin, J., & Bradshaw, L. (2014b). The use and misuse of psychometric models. Psychometrika, 79(2), 347–354.

Templin, J., & Hoffman, L. (2013). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice, 32(2), 37–50.

Templin, J., Henson, R., Rupp, A. A., Jang, E., & Ahmed, M. (2008). Diagnostic models for nominal response data. In. New York, NY.

Templin, J., Poggio, A., Irwin, P., & Henson, R. (2007). Latent class model based approaches to standard setting. In. Chicago, IL.

Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51(4), 567–577. Retrieved from

Thompson, W. J., & Johnson, P. E. (2017). jayhawkdown: A bookdown template for University of Kansas dissertations. Retrieved from

Travis CI. (2018). Travis CI. Retrieved from

Ullman, J. B. (2012). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell (Eds.), Using Multivariate Statistics (6th ed., pp. 681–785). New York, NY: Pearson.

Ullman, J. B., & Bentler, P. M. (2003). Structural Equation Modeling. In Handbook of Psychology. John Wiley & Sons.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-Values: Context, process, and purpose. The American Statistician, 70(2), 129–133.

Wickham, H. (2018a). Forcats: Tools for working with categorical variables (factors). Retrieved from

Wickham, H. (2018b). Stringr: Simple, consistent wrappers for common string operations. Retrieved from

Wickham, H., & Chang, W. (2018). Ggplot2: Create elegant data visualisations using the grammar of graphics.

Wickham, H., & Henry, L. (2018). Tidyr: Easily tidy data with ’spread()’ and ’gather()’ functions. Retrieved from

Wickham, H., François, R., Henry, L., & Müller, K. (2018). Dplyr: A grammar of data manipulation.

Wickham, H., Hester, J., & Francois, R. (2017). Readr: Read rectangular text data. Retrieved from

Wit, E., van den Heuvel, E., & Romeijn, J.-W. (2012). ’All models are wrong...’: An introduction to model uncertainty. Statistica Neerlandica, 66(3), 217–236.

Xie, Y. (2017a). bookdown: Authoring books and technical documents with R markdown. Retrieved from

Xie, Y. (2017b). knitr: A general-purpose package for dynamic report generation in R. Retrieved from

Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data (No. RR-08-27). Princeton, NJ: Educational Testing Service.

Zhu, H. (2018). KableExtra: Construct complex table with ’kable’ and pipe syntax. Retrieved from