Title: | Method of Successive Dichotomizations |
---|---|
Description: | Implements the method of successive dichotomizations by Bradley and Massof (2018) <doi:10.1371/journal.pone.0206106>, which estimates item measures, person measures and ordered rating category thresholds given ordinal rating scale data. |
Authors: | Chris Bradley |
Maintainer: | Chris Bradley |
License: | GPL |
Version: | 0.3.1 |
Built: | 2025-02-16 04:30:29 UTC |
Source: | https://github.com/cran/msd |
Expected ratings matrix given item measures, person measures and ordered rating category thresholds.
expdata(items, persons, thresholds, minRating)
items | a numeric vector of item measures with missing values set to NA. |
persons | a numeric vector of person measures with missing values set to NA. |
thresholds | a numeric vector of ordered rating category thresholds with no NA. |
minRating | integer representing the smallest ordinal rating category (see Details). |
It is assumed that the set of ordinal rating categories consists of all integers from the lowest rating category, specified by minRating, to the highest rating category, which is minRating + length(thresholds).
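The rating scale implied by this assumption can be enumerated directly. The sketch below uses only base R; the helper name rating_categories is hypothetical and is not part of the package.

```r
# Hypothetical helper: enumerate the rating categories implied by
# minRating and the number of thresholds.
rating_categories <- function(thresholds, minRating) {
  seq(from = minRating, to = minRating + length(thresholds))
}

th <- c(-1.1, -0.3, 0.5, 1.7, 2.2)   # five thresholds imply six categories
cats <- rating_categories(th, minRating = 0)
print(cats)
```

With five thresholds and minRating = 0, the categories run from 0 to 5, one more category than there are thresholds.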
A numeric matrix of expected ratings.
Expected ratings are literally the expected value of the ordinal rating categories when treated as integers. Expected ratings that cannot be calculated return as NA (e.g., if either the person or item measure is NA). Intended use is for chi-squared tests or for calculating infit and outfit statistics.
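As a rough illustration of "expected value of the rating categories", the sketch below computes expected ratings from a made-up probability matrix. It assumes the layout documented for msdprob (rows are rating categories with the lowest on top, columns are person minus item measures); the probabilities themselves are invented for the example and do not come from the package.

```r
# Hypothetical probability matrix: rows are rating categories (lowest on top),
# columns are person-minus-item measures. Each column sums to 1.
p <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.3, 0.6), nrow = 3)
minRating <- 0
cats <- minRating + seq_len(nrow(p)) - 1   # integer categories 0, 1, 2

# Expected rating per column: sum over categories of category * probability.
expected <- as.vector(cats %*% p)
print(expected)
```

The first column concentrates probability on the lowest category, so its expected rating is small; the second column leans toward the highest category.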
Chris Bradley
# Using randomly generated values with minimum rating set to zero
im <- runif(20, -2, 2)
pm <- runif(50, -2, 2)
th <- sort(runif(5, -2, 2))
m <- expdata(items = im, persons = pm, thresholds = th, minRating = 0)
Estimates item measures assuming person measures are known and all persons use the same set of rating category thresholds.
ims(data, persons, thresholds, misfit = FALSE, minRating = NULL)
data | a numeric matrix of ordinal rating scale data whose entries are integers with missing data set to NA. Rows are persons and columns are items. The ordinal rating scale is assumed to go from the smallest to largest integer in integer steps unless minRating is specified (see Details). |
persons | a numeric vector of person measures with missing values set to NA. The length of persons must equal the number of rows in data. |
thresholds | a numeric vector of ordered rating category thresholds with no NA. |
misfit | logical for calculating infit and outfit statistics. Default is FALSE. |
minRating | integer representing the smallest ordinal rating category. Default is NULL (see Details). |
minRating must be specified if either the smallest or largest possible rating category is not in data (i.e., no person used one of the extreme rating categories). If minRating is specified, the ordinal rating scale is assumed to go from minRating to minRating + length(thresholds) in integer steps.
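One way to check whether this situation applies is to compare the span of the observed ratings with the number of thresholds. The helper below is a hypothetical base R sketch, not a function from the package.

```r
# Hypothetical helper: TRUE when the observed ratings span fewer categories
# than the thresholds imply, so minRating has to be supplied explicitly.
needs_min_rating <- function(data, thresholds) {
  observed_span <- diff(range(data, na.rm = TRUE))
  observed_span < length(thresholds)
}

th <- c(-1, 0, 1, 2)                               # implies 5 categories, e.g. 0 to 4
full <- matrix(c(0, 1, 2, 3, 4, 2), nrow = 2)      # both extremes observed
clipped <- matrix(c(1, 1, 2, 3, 4, 2), nrow = 2)   # nobody used category 0

needs_min_rating(full, th)
needs_min_rating(clipped, th)
```

The check only signals that an extreme category is unobserved. It cannot tell which end of the scale is missing, so the correct minRating still has to come from knowledge of the rating scale.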
A list whose elements are:
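item_measures | a vector of item measures for each item |
item_std_errors | a vector of standard errors for the items |
infit_items | if misfit = TRUE, a vector of infit statistics for the items |
outfit_items | if misfit = TRUE, a vector of outfit statistics for the items |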
Item measures estimated with ims differ from those estimated with msd because ims assumes all persons use the same rating category thresholds while msd does not. Intended use of ims is with an anchored set of persons and thresholds. Item measures that cannot be estimated will return as NA (e.g., if all responses to an item consist of only the highest rating category, or of only the lowest rating category, that item's item measure cannot be estimated).
Chris Bradley
# Simple example with randomly generated values and lowest rating category = 0.
d <- as.numeric(sample(0:4, 500, replace = TRUE))
dm <- matrix(d, nrow = 50, ncol = 10)
pm <- runif(50, -2, 2)
th <- sort(runif(4, -2, 2))
im <- ims(data = dm, persons = pm, thresholds = th, misfit = TRUE, minRating = 0)
Calculates infit and outfit statistics for items and persons.
misfit(data, items, persons, thresholds, minRating = NULL)
data | a numeric matrix of ordinal rating scale data whose entries are integers with missing data set to NA. Rows are persons and columns are items. The ordinal rating scale is assumed to go from the smallest to largest integer in integer steps unless minRating is specified (see Details). |
items | a numeric vector of item measures with missing values set to NA. |
persons | a numeric vector of person measures with missing values set to NA. |
thresholds | a numeric vector of ordered rating category thresholds with no NA. |
minRating | integer representing the smallest ordinal rating category. Default is NULL (see Details). |
minRating must be specified if either the smallest or largest possible rating category is not in data (i.e., no person used one of the extreme rating categories). If minRating is specified, the ordinal rating scale is assumed to go from minRating to minRating + length(thresholds) in integer steps.
A list whose elements are:
infit_items | a vector of infit statistics for the items |
outfit_items | a vector of outfit statistics for the items |
infit_persons | a vector of infit statistics for the persons |
outfit_persons | a vector of outfit statistics for the persons |
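For orientation, the sketch below computes the conventional Rasch mean-square fit statistics in base R: outfit as the mean squared standardized residual and infit as the variance-weighted version. The text above does not spell out the formulas misfit uses, so treat these definitions as an assumption. The function fit_stats and its inputs (matrices of observed ratings, expected ratings and model variances) are hypothetical.

```r
# Conventional Rasch mean-square fit statistics (an assumption; the formulas
# used inside the package are not stated in this documentation).
# obs, expd, vars: persons x items matrices of observed ratings, expected
# ratings and model variances of the ratings.
# margin = 1 gives person statistics, margin = 2 gives item statistics.
fit_stats <- function(obs, expd, vars, margin) {
  sq_resid <- (obs - expd)^2
  ok <- !is.na(sq_resid) & !is.na(vars)
  sq_resid[!ok] <- NA   # drop cells with missing data from both sums
  vars[!ok] <- NA
  outfit <- apply(sq_resid / vars, margin, mean, na.rm = TRUE)
  infit <- apply(sq_resid, margin, sum, na.rm = TRUE) /
    apply(vars, margin, sum, na.rm = TRUE)
  list(infit = infit, outfit = outfit)
}

# Toy dichotomous data: every expected rating is 0.5 with variance 0.25.
obs  <- matrix(c(1, 0, 1, NA, 1, 0), nrow = 3)
expd <- matrix(0.5, nrow = 3, ncol = 2)
vars <- matrix(0.25, nrow = 3, ncol = 2)
item_fit <- fit_stats(obs, expd, vars, margin = 2)
person_fit <- fit_stats(obs, expd, vars, margin = 1)
```

In this toy case every squared residual equals the model variance, so all statistics equal 1, the value conventionally read as "fits as expected".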
Chris Bradley
# Using randomly generated values
d <- as.numeric(sample(0:5, 500, replace = TRUE))
dm <- matrix(d, nrow = 50, ncol = 10)
im <- runif(10, -2, 2)
pm <- runif(50, -2, 2)
th <- sort(runif(5, -2, 2))
m <- misfit(data = dm, items = im, persons = pm, thresholds = th)

# If the lowest or highest rating category is not in data, specify minRating
dm[dm == 0] <- NA
m2 <- misfit(data = dm, items = im, persons = pm, thresholds = th, minRating = 0)
Estimates item measures, person measures, rating category thresholds and their standard errors using the method of successive dichotomizations. Option provided for anchoring certain items and persons while estimating the rest. Option also provided for estimating infit and outfit statistics.
msd(data, items = NULL, persons = NULL, misfit = FALSE)
data | a numeric matrix of ordinal rating scale data whose entries are integers with missing data set to NA. Rows are persons and columns are items. The ordinal rating scale is assumed to go from the smallest integer to the largest integer in data. |
items | a numeric vector of anchored item measures. Item measures to be estimated are set to NA. Default is NULL (see Details). |
persons | a numeric vector of anchored person measures. Person measures to be estimated are set to NA. Default is NULL (see Details). |
misfit | logical for calculating infit and outfit statistics. Default is FALSE. |
items and persons are optional numeric vectors that specify item and person measures that are "anchored" and not estimated. The length of items must equal the number of columns in data and the length of persons must equal the number of rows in data. Only entries set to NA in items and persons are estimated. Default for both items and persons is NULL, which is equivalent to a vector of NA so that all items and persons are estimated.
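The anchoring convention can be sketched in base R as follows. The data and the anchor values are made up for illustration; only the NA-means-estimate convention and the length requirements come from the text above.

```r
# Hypothetical ratings matrix: 20 persons (rows) by 10 items (columns).
dm <- matrix(sample(0:5, 200, replace = TRUE), nrow = 20, ncol = 10)

# Start from all-NA vectors (equivalent to the NULL default), then fill in
# the measures to anchor. The anchor values below are made up.
items <- rep(NA_real_, ncol(dm))
persons <- rep(NA_real_, nrow(dm))
items[1:3] <- c(-0.8, 0.1, 0.6)
persons[c(2, 7)] <- c(1.2, -0.4)

# Lengths must match the data dimensions.
stopifnot(length(items) == ncol(dm), length(persons) == nrow(dm))

# Entries still set to NA are the ones that would be estimated.
which(is.na(items))
```

Vectors built this way could then be passed through the documented arguments, for example msd(dm, items = items, persons = persons).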
A list whose elements are:
item_measures | a vector of item measures for each item |
person_measures | a vector of person measures for each person |
thresholds | a vector of average rating category thresholds used by the persons when rating the items |
item_std_errors | a vector of standard errors for the items |
person_std_errors | a vector of standard errors for the persons |
threshold_std_errors | a vector of standard errors for the thresholds |
item_reliability | reliability of the item measures |
person_reliability | reliability of the person measures |
infit_items | if misfit = TRUE, a vector of infit statistics for the items |
outfit_items | if misfit = TRUE, a vector of outfit statistics for the items |
infit_persons | if misfit = TRUE, a vector of infit statistics for the persons |
outfit_persons | if misfit = TRUE, a vector of outfit statistics for the persons |
The axis origin is set by convention at the mean item measure. All item measures and person measures that cannot be estimated will return as NA (e.g., if a person responds with only the highest rating category, or with only the lowest rating category, to all items, that person's person measure cannot be estimated).
The accuracy of msd can be tested using the simdata function (see Examples).
Chris Bradley
Bradley, C. and Massof, R. W. (2018) Method of successive dichotomizations: An improved method for estimating measures of latent variables from rating scale data. PLoS One, 13(10) doi:10.1371/journal.pone.0206106
# Simple example using a randomly generated ratings matrix
d <- as.numeric(sample(0:5, 200, replace = TRUE))
dm <- matrix(d, nrow = 20, ncol = 10)
m1 <- msd(dm, misfit = TRUE)

# Anchor first 5 item measures and first 10 person measures
im <- m1$item_measures
im[6:length(im)] <- NA
pm <- m1$person_measures
pm[11:length(pm)] <- NA
m2 <- msd(dm, items = im, persons = pm)

# To test the accuracy of msd using simdata, set the mean item measure to zero
# (axis origin in msd is the mean item measure) and the mean threshold to
# zero (any non-zero mean threshold is reflected in the person measures).
im <- runif(100, -2, 2)
im <- im - mean(im)
pm <- runif(100, -2, 2)
th <- sort(runif(5, -2, 2))
th <- th - mean(th)
d <- simdata(im, pm, th, missingProb = 0.15, minRating = 0)
m <- msd(d)

# Compare msd parameters to true values. Linear regression should
# yield a slope very close to 1 and an intercept very close to 0.
lm(m$item_measures ~ im)
lm(m$person_measures ~ pm)
lm(m$thresholds ~ th)
Estimates the probability of observing each rating category given a set of ordered rating category thresholds.
msdprob(x, thresholds)
x | a real number or a vector of real numbers with no NA representing a set of person minus item measures. |
thresholds | a numeric vector of ordered rating category thresholds with no NA. |
It is assumed that thresholds partitions the real line into length(thresholds)+1 ordered intervals that represent the rating categories.
A matrix of probabilities where each of the length(thresholds)+1 rows represents a different rating category (lowest rating category is the top row) and each of the length(x) columns represents a different person minus item measure.
msdprob can be used to create probability curves, which represent the probability of rating an item with each rating category as a function of the person measure minus item measure (see Examples).
Chris Bradley
# Simple example
p <- msdprob(c(1.4, -2.2), thresholds = c(-1.1, -0.3, 0.5, 1.7, 2.2))

# Plot probability curves: each curve represents the probability of
# rating an item with a given rating category as a function of the
# person measure minus item measure.
x <- seq(-6, 6, 0.1)
p <- msdprob(x, thresholds = c(-3.2, -1.4, 0.5, 1.7, 3.5))
plot(0, 0, xlim = c(-6, 6), ylim = c(0, 1), type = "n",
     xlab = "Person minus item measure", ylab = "Probability")
for (i in seq(1, dim(p)[1])) {
  lines(x, p[i, ], type = "l", lwd = 2, col = rainbow(6)[i])
}
Estimates person measures assuming item measures are known and all persons use the same set of rating category thresholds.
pms(data, items, thresholds, misfit = FALSE, minRating = NULL)
data | a numeric matrix of ordinal rating scale data whose entries are integers with missing data set to NA. Rows are persons and columns are items. The ordinal rating scale is assumed to go from the smallest to largest integer in integer steps unless minRating is specified (see Details). |
items | a numeric vector of item measures with missing values set to NA. The length of items must equal the number of columns in data. |
thresholds | a numeric vector of ordered rating category thresholds with no NA. |
misfit | logical for calculating infit and outfit statistics. Default is FALSE. |
minRating | integer representing the smallest ordinal rating category. Default is NULL (see Details). |
minRating must be specified if either the smallest or largest possible rating category is not in data (i.e., no person used one of the extreme rating categories). If minRating is specified, the ordinal rating scale is assumed to go from minRating to minRating + length(thresholds) in integer steps.
A list whose elements are:
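person_measures | a vector of person measures for each person |
person_std_errors | a vector of standard errors for the persons |
infit_persons | if misfit = TRUE, a vector of infit statistics for the persons |
outfit_persons | if misfit = TRUE, a vector of outfit statistics for the persons |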
Person measures estimated with pms differ from those estimated with msd because pms assumes all persons use the same rating category thresholds while msd does not. Intended use of pms is with an anchored set of items and thresholds. Person measures that cannot be estimated will return as NA (e.g., if a person responds to all items with only the highest rating category, or with only the lowest rating category, that person's person measure cannot be estimated).
Chris Bradley
# Simple example with randomly generated values and lowest rating category = 0
d <- as.numeric(sample(0:4, 500, replace = TRUE))
dm <- matrix(d, nrow = 25, ncol = 20)
im <- runif(20, -2, 2)
th <- sort(runif(4, -2, 2))
pm <- pms(data = dm, items = im, thresholds = th, misfit = TRUE, minRating = 0)
Estimates item measures, person measures and their standard errors using the dichotomous Rasch model. This is a special case of the function msd when the rating scale consists of only two rating categories: 0 and 1. Option provided for anchoring certain items and persons while estimating the rest. Option also provided for estimating infit and outfit statistics.
rasch(data, items = NULL, persons = NULL, misfit = FALSE)
data | a numeric matrix of 0's and 1's with missing data set to NA. Rows are persons and columns are items. |
items | a numeric vector of anchored item measures. Item measures to be estimated are set to NA. Default is NULL (see Details). |
persons | a numeric vector of anchored person measures. Person measures to be estimated are set to NA. Default is NULL (see Details). |
misfit | logical for calculating infit and outfit statistics. Default is FALSE. |
items and persons are optional numeric vectors that specify item and person measures that should be "anchored" and not estimated. The length of items must equal the number of columns in data and the length of persons must equal the number of rows in data. Only entries set to NA in items and persons are estimated. Default for both items and persons is NULL, which is equivalent to a vector of NA so that all items and persons are estimated.
A list whose elements are:
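item_measures | a vector of item measures for each item |
person_measures | a vector of person measures for each person |
item_std_errors | a vector of standard errors for the items |
person_std_errors | a vector of standard errors for the persons |
item_reliability | reliability value for the items |
person_reliability | reliability value for the persons |
infit_items | if misfit = TRUE, a vector of infit statistics for the items |
outfit_items | if misfit = TRUE, a vector of outfit statistics for the items |
infit_persons | if misfit = TRUE, a vector of infit statistics for the persons |
outfit_persons | if misfit = TRUE, a vector of outfit statistics for the persons |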
The axis origin is set by convention at the mean item measure. All item measures and person measures that cannot be estimated will return as NA (e.g., if a person responds with a single rating category to all items, that person's person measure cannot be estimated).
rasch is the basis for the "successive dichotomizations" in msd and is repeatedly called by msd when there are three or more rating categories.
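For readers unfamiliar with the dichotomous Rasch model, the sketch below shows its standard response probability in base R. It assumes the usual logistic form with the person minus item sign convention described for msdprob. The function rasch_prob is hypothetical and does not reproduce the estimation procedure used by rasch.

```r
# Standard dichotomous Rasch model (assumed form, not taken from the package source):
# P(rating = 1) = plogis(person measure - item measure).
rasch_prob <- function(persons, items) {
  plogis(outer(persons, items, "-"))   # persons x items matrix of probabilities
}

persons <- c(-1, 0, 2)
items <- c(0, 1)
p1 <- rasch_prob(persons, items)

# When the person and item measures are equal, both ratings are equally likely.
p1[2, 1]
```

Higher person measures relative to the item measure push the probability of a rating of 1 toward one, which is why rows for more able persons hold larger probabilities.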
The accuracy of rasch can be tested using the simdata function (see Examples).
Chris Bradley
# Simple example using a randomly generated ratings matrix
d <- as.numeric(sample(0:1, 200, replace = TRUE))
dm <- matrix(d, nrow = 20, ncol = 10)
m1 <- rasch(dm, misfit = TRUE)

# Anchor first 5 item measures and first 10 person measures
im <- m1$item_measures
im[6:length(im)] <- NA
pm <- m1$person_measures
pm[11:length(pm)] <- NA
m2 <- rasch(dm, items = im, persons = pm)

# To test the accuracy of rasch using simdata, set the true mean item measure to
# zero (axis origin in rasch is the mean item measure). Note that the threshold for
# dichotomous data is at 0.
im <- runif(100, -2, 2)
im <- im - mean(im)
pm <- runif(100, -2, 2)
th <- 0
d <- simdata(im, pm, th, missingProb = 0.15, minRating = 0)
m <- rasch(d)

# Compare rasch parameters to true values. Linear regression should
# yield a slope very close to 1 and an intercept very close to 0.
lm(m$item_measures ~ im)
lm(m$person_measures ~ pm)
Generates simulated rating scale data given item measures, person measures and rating category thresholds.
simdata(items, persons, thresholds, missingProb = 0, minRating = 0)
items | a numeric vector of item measures with no NA. |
persons | a numeric vector of person measures with no NA. |
thresholds | a numeric vector of ordered rating category thresholds with no NA. |
missingProb | a number between 0 and 1 specifying the probability of missing data. |
minRating | integer representing the smallest ordinal rating category. Default is 0 (see Details). |
It is assumed that the set of ordinal rating categories consists of all integers from the lowest rating category, specified by minRating, to the highest rating category, which is minRating + length(thresholds).
A numeric matrix of simulated rating scale data.
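The effect of missingProb can be pictured with the base R sketch below. It assumes each cell is set to NA independently with probability missingProb; the documentation does not state the exact mechanism, and the ratings matrix here is random filler rather than output from simdata.

```r
set.seed(1)
# Hypothetical complete ratings matrix: 100 persons by 100 items, categories 0 to 5.
d <- matrix(sample(0:5, 10000, replace = TRUE), nrow = 100, ncol = 100)

# Assumed mechanism: each cell becomes NA independently with probability missingProb.
missingProb <- 0.15
d[runif(length(d)) < missingProb] <- NA

# Observed fraction of missing cells, which should be close to missingProb.
mean(is.na(d))
```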
simdata can be used to test the accuracy of msd (see Examples).
Chris Bradley
# Use simdata to test the accuracy of msd. First, randomly generate item
# measures, person measures and thresholds with 15 percent missing data and
# ordinal rating categories from 0 to 5. Then, set mean item measure to zero
# (axis origin in msd is the mean item measure) and mean threshold to zero
# (any non-zero mean threshold is reflected in the person measures).
im <- runif(100, -2, 2)
pm <- runif(100, -2, 2)
th <- sort(runif(5, -2, 2))
im <- im - mean(im)
th <- th - mean(th)
d <- simdata(im, pm, th, missingProb = 0.15, minRating = 0)
m <- msd(d)

# Compare msd parameters to true values. Linear regression should
# yield a slope very close to 1 and an intercept very close to 0.
lm(m$item_measures ~ im)
lm(m$person_measures ~ pm)
lm(m$thresholds ~ th)
Estimates rating category thresholds for msd given rating scale data, item measures and person measures.
thresh(data, items, persons)
data | a numeric matrix of ordinal rating scale data whose entries are integers with missing data set to NA. Rows are persons and columns are items. The ordinal rating scale is assumed to go from the smallest integer to the largest integer in data. |
items | a numeric vector of item measures with missing values set to NA (see Details). |
persons | a numeric vector of person measures with missing values set to NA (see Details). |
The length of items must equal the number of columns in data and the length of persons must equal the number of rows in data. Neither items nor persons can consist of only NA.
A list whose elements are:
thresholds | a vector of average rating category thresholds used by the persons when rating the items |
threshold_std_errors | a vector of standard errors for the thresholds |
thresh is a special case of msd when item measures and person measures are known.
Chris Bradley
# Using randomly generated values
d <- as.numeric(sample(0:5, 1000, replace = TRUE))
m <- matrix(d, nrow = 50, ncol = 20)
im <- runif(20, -2, 2)
pm <- runif(50, -2, 2)
th1 <- thresh(m, items = im, persons = pm)

# Anchor first 10 item measures and first 10 person measures
im[11:length(im)] <- NA
pm[11:length(pm)] <- NA
th2 <- thresh(m, items = im, persons = pm)