References

Agresti, A. (2012). Categorical Data Analysis (3rd ed.). New York, NY: John Wiley & Sons.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705

Allaire, J., Cheng, J., Xie, Y., McPherson, J., Chang, W., Allen, J., … Arslan, R. (2017). rmarkdown: Dynamic documents for R. Retrieved from http://rmarkdown.rstudio.com

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian Networks in Educational Assessment (1st ed.). New York, NY: Springer.

Amazon Web Services. (2018). Amazon Elastic Compute Cloud (EC2). Retrieved from https://aws.amazon.com/ec2/

Ayala, R. J. de. (2009). The Theory and Practice of Item Response Theory. New York, NY: Guilford Press.

Bradshaw, L. (2017). Diagnostic Classification Models. In A. A. Rupp & J. Leighton (Eds.), The Handbook of Cognition and Assessment: Frameworks, Methodologies, and Applications (1st ed., pp. 297–327). New York, NY: John Wiley & Sons.

Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79(3), 403–425. https://doi.org/10.1007/s11336-013-9350-4

Bradshaw, L., Izsák, A., Templin, J., & Jacobson, E. (2014). Diagnosing teachers’ understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33(1), 2–14.

Brown, T. A. (2006). Confirmatory Factor Analysis for Applied Research. New York, NY: Guilford.

Browne, M., Rockloff, M., & Rawat, V. (2016). An SEM algorithm for scale reduction incorporating evaluation of multiple psychometric criteria. Sociological Methods & Research, (Advance online publication). https://doi.org/10.1177/0049124116661580

Burkholder, G. J., & Harlow, L. L. (2003). An illustration of longitudinal cross-lagged design for larger structural equation models. Structural Equation Modeling, 10(3), 465–486. https://doi.org/10.1207/S15328007SEM1003_8

Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866. https://doi.org/10.1080/01621459.2014.934827

Chinn, S. (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine, 19, 3127–3131.

Cizek, G. J. (2006). Standard Setting. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of Test Development. Mahwah, NJ: Taylor & Francis.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.

de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45(4), 343–362. https://doi.org/10.1111/j.1745-3984.2008.00069.x

de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353. https://doi.org/10.1007/BF02295640

de la Torre, J., Ark, L. A. van der, & Rossi, G. (2015). Analysis of clinical data from cognitive diagnosis modeling framework. Measurement and Evaluation in Counseling and Development, 0748175615569110. https://doi.org/10.1177/0748175615569110

DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35(1), 8–26. https://doi.org/10.1177/0146621610377081

DeVellis, R. F. (2006). Classical test theory. Medical Care, 44(11), S50–S59. Retrieved from http://journals.lww.com/lww-medicalcare/Fulltext/2006/11001/Classical_Test_Theory.11.aspx

DiBello, L. V., Stout, W. F., & Roussos, L. (1995). Unified cognitive psychometric assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively Diagnostic Assessment (pp. 361–390). Hillsdale, NJ: Erlbaum.

Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36–49. https://doi.org/10.1111/emip.12111

Feinberg, R. A., & Wainer, H. (2014). When can we improve subscores by making them shorter?: The case against subscores with overlapping items. Educational Measurement: Issues and Practice, 33(3), 47–54.

Fukuhara, H., & Kamata, A. (2011). A bifactor multidimensional item response theory model for differential item functioning analysis on testlet-based items. Applied Psychological Measurement, 35(8), 604–622. https://doi.org/10.1177/0146621611428447

Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models (1st ed.). Cambridge, England: Cambridge University Press.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: CRC Press.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321. Retrieved from http://www.jstor.org/stable/1434756

Hallquist, M., & Wiley, J. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Retrieved from https://CRAN.R-project.org/package=MplusAutomation

Halonen, J., Harris, C. M., Pastor, D. A., Abrahamson, C. E., & Huffman, C. J. (2005). Assessing general education outcomes in introductory psychology. In D. S. Dunn & S. Chew (Eds.), Best Practices in Teaching Introduction to Psychology (pp. 195–210). Mahwah, NJ: Erlbaum.

Hambleton, R. (2006). Setting Performance Standards. In Educational Measurement (4th ed., pp. 433–470). Rowman & Littlefield.

Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Urbana-Champaign, IL.

Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125.

Henry, L., & Wickham, H. (2017). purrr: Functional programming tools. Retrieved from https://CRAN.R-project.org/package=purrr

Henry, L., & Wickham, H. (2018). tidyselect: Select from a set of strings. Retrieved from https://CRAN.R-project.org/package=tidyselect

Henson, R. A., Templin, J. L., & Willse, J. T. (2008). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191. https://doi.org/10.1007/s11336-008-9089-5

Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262–277. https://doi.org/10.1177/0146621604272623

Henson, R., & Templin, J. (2005). Extending cognitive diagnosis models to evaluate the validity of DSM criteria for the diagnosis of pathological gambling. In. Las Vegas, NV.

Hester, J. (2017). glue: Interpreted string literals. Retrieved from https://CRAN.R-project.org/package=glue

Johnson, P. E. (2016). portableParallelSeeds: Allow replication of simulations on parallel and serial computers. Retrieved from https://CRAN.R-project.org/package=portableParallelSeeds

Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272. https://doi.org/10.1177/01466210122032064

Jurich, D. P., & Bradshaw, L. (2013). An illustration of diagnostic classification modeling in student learning outcomes assessment. International Journal of Testing, 14(1), 49–72.

Kaplan, D. (2009). Structural Equation Modeling: Foundations and Extensions (2nd ed.). Thousand Oaks, CA: Sage.

Kingston, N. M., & McKinley, R. L. (1988). Assessing the structure of the GRE General Test using confirmatory multidimensional item response theory. In Symposium: Item response theory meets multidimensional tests. New Orleans, LA.

Kline, R. B. (2002). Principles and Practice of Structural Equation Modeling (2nd ed.). New York, NY: Guilford.

Kunina-Habenicht, O., Rupp, A. A., & Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49(1), 59–81. https://doi.org/10.1111/j.1745-3984.2011.00160.x

McKinley, R. L., & Kingston, N. M. (1988). Confirmatory analysis of test structure using multidimensional IRT. In. New Orleans, LA.

McWhite, C. D., & Wilke, C. O. (2018). colorblindr: Simulate colorblindness in R figures. Retrieved from https://github.com/clauswilke/colorblindr

Millon, T., Millon, C., Davis, R., & Grossman, S. (2009). MCMI-III Manual (4th ed.). Minneapolis, MN: Pearson.

Muthén, L. K., & Muthén, B. O. (1998). Mplus User’s Guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

Müller, K., & Wickham, H. (2018). tibble: Simple data frames. Retrieved from https://CRAN.R-project.org/package=tibble

Netlify. (2018). Netlify. Retrieved from https://www.netlify.com/

Pandoc. (2017). Retrieved from https://pandoc.org/

Pedersen, T. L. (2016). ggforce: Accelerating ‘ggplot2’. Retrieved from https://github.com/thomasp85/ggforce

Perry, J. L., Nicholls, A. R., Clough, P. J., & Crust, L. (2015). Assessing model fit: Caveats and recommendations for confirmatory factor analysis and exploratory structural equation modeling. Measurement in Physical Education and Exercise Science, 19(1), 12–21.

Preston-Werner, T. (2018). Jekyll. Retrieved from https://jekyllrb.com/

R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21(1), 25–36. https://doi.org/10.1177/0146621697211002

Reckase, M. D. (2009). Multidimensional Item Response Theory. New York, NY: Springer-Verlag.

Rojas, G., de la Torre, J., & Olea, J. (2012). Choosing between general and specific cognitive diagnosis models when the sample size is small. In. Vancouver, British Columbia, Canada.

RStudio. (2018). RStudio. Retrieved from https://www.rstudio.com/products/rstudio/

Rubinstein, R. Y., & Kroese, D. P. (2017). Simulation and the Monte Carlo Method (3rd ed.). Hoboken, NJ: Wiley.

Rupp, A. A., & Templin, J. (2008a). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78–96. https://doi.org/10.1177/0013164407301545

Rupp, A. A., & Templin, J. L. (2008b). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement: Interdisciplinary Research & Perspective, 6(4), 219–262. https://doi.org/10.1080/15366360802490866

Rupp, A. A., & Wilhelm, O. (2012). Files for Mplus input file generation. College Park, MD. Retrieved from http://www.education.umd.edu/EDMS/fac/Rupp/R%20Files%20for%20Mplus%20Input%20File%20Generation.zip

Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic Measurement: Theory, Methods, and Applications (1st ed.). New York, NY: Guilford Press. Retrieved from https://www.guilford.com/books/Diagnostic-Measurement/Rupp-Templin-Henson/9781606235270

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3), 333–343. https://doi.org/10.1007/BF02294360

Sinharay, S. (2010). When can subscores be expected to have added value? Results from operational and simulated data (No. RR-10-16). Princeton, NJ: Educational Testing Service.

Sinharay, S., Haberman, S. J., & Wainer, H. (2011). Do adjusted subscores lack validity? Don’t blame the messenger. Educational and Psychological Measurement, 71(5), 789–797. https://doi.org/10.1177/0013164410391782

Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. New York, NY: Chapman & Hall/CRC.

Spinu, V., Grolemund, G., & Wickham, H. (2018). lubridate: Make dealing with dates a little easier. Retrieved from https://CRAN.R-project.org/package=lubridate

Stroup, W. W. (2012). Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. New York, NY: Chapman & Hall/CRC.

Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354.

Tatsuoka, K. K. (1990). Toward an Integration of Item-Response Theory and Cognitive Error Diagnosis. In N. Frederiksen, R. Glaser, A. Lesgold, & M. Shafto (Eds.), Diagnostic Monitoring of Skill and Knowledge Acquisition (pp. 453–488). Hillsdale, NJ: Erlbaum.

Templin, J. (2010). Classification model based standard setting methods. In. Denver, CO.

Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305. https://doi.org/10.1037/1082-989X.11.3.287

Templin, J., & Bradshaw, L. (2014a). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339.

Templin, J., & Bradshaw, L. (2014b). The use and misuse of psychometric models. Psychometrika, 79(2), 347–354.

Templin, J., & Hoffman, L. (2013). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice, 32(2), 37–50. https://doi.org/10.1111/emip.12010

Templin, J., Henson, R., Rupp, A. A., Jang, E., & Ahmed, M. (2008). Diagnostic models for nominal response data. In. New York, NY.

Templin, J., Poggio, A., Irwin, P., & Henson, R. (2007). Latent class model based approaches to standard setting. In. Chicago, IL.

Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51(4), 567–577. Retrieved from http://www.springerlink.com/index/CMM2M3T213U5683R.pdf

Thompson, W. J., & Johnson, P. E. (2017). jayhawkdown: A bookdown template for University of Kansas dissertations. Retrieved from https://github.com/wjakethompson/jayhawkdown

Travis CI. (2018). Travis CI. Retrieved from https://travis-ci.org/

Ullman, J. B. (2012). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell (Eds.), Using Multivariate Statistics (6th ed., pp. 681–785). New York, NY: Pearson.

Ullman, J. B., & Bentler, P. M. (2003). Structural Equation Modeling. In Handbook of Psychology. John Wiley & Sons.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-Values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108

Wickham, H. (2018a). forcats: Tools for working with categorical variables (factors). Retrieved from https://CRAN.R-project.org/package=forcats

Wickham, H. (2018b). stringr: Simple, consistent wrappers for common string operations. Retrieved from https://CRAN.R-project.org/package=stringr

Wickham, H., & Chang, W. (2018). ggplot2: Create elegant data visualisations using the grammar of graphics.

Wickham, H., & Henry, L. (2018). tidyr: Easily tidy data with ‘spread()’ and ‘gather()’ functions. Retrieved from https://CRAN.R-project.org/package=tidyr

Wickham, H., François, R., Henry, L., & Müller, K. (2018). dplyr: A grammar of data manipulation.

Wickham, H., Hester, J., & François, R. (2017). readr: Read rectangular text data. Retrieved from https://CRAN.R-project.org/package=readr

Wit, E., van den Heuvel, E., & Romeijn, J.-W. (2012). ‘All models are wrong...’: An introduction to model uncertainty. Statistica Neerlandica, 66(3), 217–236. https://doi.org/10.1111/j.1467-9574.2012.00530.x

Xie, Y. (2017a). bookdown: Authoring books and technical documents with R Markdown. Retrieved from https://CRAN.R-project.org/package=bookdown

Xie, Y. (2017b). knitr: A general-purpose package for dynamic report generation in R. Retrieved from https://CRAN.R-project.org/package=knitr

Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data (No. RR-08-27). Princeton, NJ: Educational Testing Service.

Zhu, H. (2018). kableExtra: Construct complex table with ‘kable’ and pipe syntax. Retrieved from https://CRAN.R-project.org/package=kableExtra