IRTPRO (Item Response Theory for Patient-Reported Outcomes) is an entirely new application for item calibration and test scoring using IRT.
IRTPRO implements item calibration and scoring for item response theory (IRT) models based on unidimensional and multidimensional [confirmatory factor analysis (CFA) or exploratory factor analysis (EFA)] versions of the following widely used response functions:
 Two-parameter logistic (2PL) (Birnbaum, 1968) [which, with equality constraints, includes the one-parameter logistic (1PL) (Thissen, 1982)]
 Three-parameter logistic (3PL) (Birnbaum, 1968)
 Graded (Samejima, 1969, 1997)
 Generalized Partial Credit (Muraki, 1992, 1997)
 Nominal (Bock, 1972, 1997; Thissen, Cai, & Bock, 2010)
These item response models may be mixed in any combination within a test or scale, and the user may optionally impose equality constraints among parameters or fix parameters at specified values.
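As an illustration of the dichotomous and graded response functions listed above, the following minimal Python sketch evaluates them at a single latent trait value. The function names and the slope-threshold parameterization (`a`, `b`, guessing parameter `g`) are assumptions made for this example; they are not taken from IRTPRO, whose internal parameterization may differ.

```python
import math

def p_2pl(theta, a, b):
    """2PL: probability of a positive response given slope a and threshold b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, g):
    """3PL: a 2PL curve with a lower asymptote g (pseudo-guessing parameter)."""
    return g + (1.0 - g) * p_2pl(theta, a, b)

def graded_probs(theta, a, thresholds):
    """Graded model: category probabilities are differences between adjacent
    cumulative 2PL curves. `thresholds` must be in increasing order."""
    cum = [1.0] + [p_2pl(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Illustrative values only (hypothetical item parameters).
print(p_2pl(0.0, 1.0, 0.0))
print(p_3pl(0.0, 1.0, 0.0, 0.2))
print(graded_probs(0.0, 1.5, [-1.0, 0.0, 1.0]))
```

Setting the slope `a` equal across items in `p_2pl` gives the 1PL special case mentioned in the list.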
IRTPRO implements the method of Maximum Likelihood (ML) for item parameter estimation (item calibration), or it computes Maximum a posteriori (MAP) estimates if (optional) prior distributions are specified for the item parameters. Several alternative computational methods are available, each of which performs best for some combinations of dimensionality and model structure:
 Bock-Aitkin (BAEM) (Bock & Aitkin, 1981)
 Bifactor EM (Gibbons & Hedeker, 1992; Gibbons et al., 2007; Cai, Yang, & Hansen, 2011)
 Generalized Dimension Reduction EM (Cai, 2010a)
 Adaptive Quadrature (ADQEM) (Schilling & Bock, 2005)
 Metropolis-Hastings Robbins-Monro (MHRM) (Cai, 2010b, 2010c)
 Markov Chain Monte Carlo (MCMC) (Patz & Junker, 1999a, 1999b)
The computation of IRT scale scores in IRTPRO may be done using any of the following methods:
 Maximum a posteriori (MAP) for response patterns
 Expected a posteriori (EAP) for response patterns (Bock & Mislevy, 1982)
 Expected a posteriori (EAP) for summed scores (Thissen & Orlando, 2001; Thissen, Nelson, Rosa, & McLeod, 2001)
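To make the idea of EAP scoring for a response pattern concrete, here is a minimal sketch for a unidimensional test of 2PL items. It assumes a standard normal population distribution and a simple equally spaced quadrature grid; the function name `eap_score`, the grid settings, and the item parameters are hypothetical choices for this example and do not describe IRTPRO's implementation.

```python
import math

def eap_score(responses, items, n_points=61, lo=-6.0, hi=6.0):
    """Return (EAP estimate, posterior SD) for a 0/1 response pattern.

    responses: list of 0/1 item scores
    items:     list of (a, b) 2PL parameters, one pair per item
    """
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    posterior = []
    for t in nodes:
        weight = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) density
        like = 1.0
        for u, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (t - b)))
            like *= p if u == 1 else 1.0 - p
        posterior.append(weight * like)
    total = sum(posterior)
    mean = sum(t * w for t, w in zip(nodes, posterior)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, posterior)) / total
    return mean, math.sqrt(var)

# Hypothetical three-item test and response pattern.
items = [(1.2, -0.5), (1.0, 0.0), (0.8, 0.7)]
print(eap_score([1, 1, 0], items))
```

The EAP estimate is the mean of the posterior distribution over the latent variable, and the posterior standard deviation serves as its standard error. A MAP estimate would instead locate the mode of the same posterior.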
Data structures in IRTPRO may categorize the item respondents into groups, and the population latent variable means and variance-covariance matrices may be estimated for multiple groups (Mislevy, 1984, 1985). [Most often, if there is only one group, the population latent variable mean(s) and variance(s) are fixed (usually at 0 and 1) to specify the scale; for multiple groups, one group is usually denoted the "reference group" with standardized latent values.]
To detect differential item functioning (DIF), IRTPRO uses Wald tests, modeled after a proposal by Lord (1977), but with accurate item parameter error variance-covariance matrices computed using the Supplemented EM (SEM) algorithm (Cai, 2008).
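The general shape of a Wald test for DIF can be sketched for the simplest case: one item parameter compared between a reference and a focal group. This is a simplified illustration only. It assumes the two estimates are independent, so the variance of their difference is the sum of the two error variances. The general test uses the full parameter error variance-covariance matrices. The function name and the numerical values are hypothetical.

```python
import math

def wald_dif_1df(est_ref, est_focal, var_ref, var_focal):
    """Wald chi-square (1 df) for the difference in one item parameter.

    Returns (chi_square, p_value). For 1 df the upper-tail chi-square
    probability equals erfc(sqrt(x / 2)).
    """
    diff = est_ref - est_focal
    chi2 = diff * diff / (var_ref + var_focal)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical threshold estimates and error variances for one item.
chi2, p = wald_dif_1df(est_ref=0.10, est_focal=0.55, var_ref=0.02, var_focal=0.03)
print(chi2, p)
```

A small p-value suggests that the parameter differs between groups, that is, that the item may function differently for the two groups.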
Depending on the number of items, response categories, and respondents, IRTPRO reports several varieties of goodness-of-fit and diagnostic statistics after item calibration. The values of −2 log likelihood, the Akaike Information Criterion (AIC) (Akaike, 1974), and the Bayesian Information Criterion (BIC) (Schwarz, 1978) are always reported. If the sample size sufficiently exceeds the number of cells in the complete cross-classification of the respondents based on item response patterns, the overall likelihood ratio test against the general multinomial alternative is reported. For some models, the M₂ statistic (Maydeu-Olivares & Joe, 2005, 2006; Cai, Maydeu-Olivares, Coffman, & Thissen, 2006) is also computed. Diagnostic statistics include generalizations for polytomous responses of the local dependence (LD) statistic described by Chen & Thissen (1997) and the S-X² item-fit statistic suggested by Orlando & Thissen (2000, 2003).
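For reference, the standard definitions of AIC and BIC can be computed directly from the −2 log likelihood, the number of free parameters, and the number of respondents. The sketch below uses those textbook formulas; the function name and the input values are hypothetical and are not output from IRTPRO.

```python
import math

def information_criteria(neg2loglik, n_params, n_respondents):
    """Return (AIC, BIC) from -2 log likelihood.

    AIC = -2logL + 2k
    BIC = -2logL + k * ln(N)
    """
    aic = neg2loglik + 2 * n_params
    bic = neg2loglik + n_params * math.log(n_respondents)
    return aic, bic

# Hypothetical calibration: 10 free parameters, 500 respondents.
print(information_criteria(1000.0, 10, 500))
```

Lower values indicate a better trade-off between fit and model complexity when comparing models fitted to the same data.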
IRTPRO 4.20 is compatible with Windows 10, Windows 8, and Windows 7. It has been tested on each of these versions and no problems were reported.
