Sample size calculations for kappa
Two unique raters, two ratings:
kapssi kappa, { se(#) | diff(#) [level(#)] | n(#) } p1(#) [ p2(#) round ]
Two or more (non-unique) raters, two ratings:
kapssi kappa, { se(#) | diff(#) [level(#)] | n(#) } p(#) [ m(#) round ]
Description
kapssi computes the sample size required to estimate the kappa statistic of inter-rater reliability for a binary outcome (with postulated value kappa) to a given standard error, or the standard error obtained with a given sample size. If n() is specified, kapssi computes the standard error; otherwise it computes the sample size. kapssi is an immediate command; all of its arguments are numbers (see help immed).
For two raters, the results are the same as those produced by sskdlg or sskapp (except for rounding; see the round option below) and are based on the asymptotic variance presented by Fleiss, Cohen and Everitt (1969). Results for more than two raters are based on the asymptotic variance of the Fleiss-Cuzick estimator of kappa presented by Zou and Donner (2004) for the case of an equal number of ratings for each subject.
Options
se(#) specifies the standard error of kappa.
diff(#) specifies the half width of the confidence interval for kappa, as an alternative to the standard error.
level(#) specifies the confidence level, as a percentage, for the confidence interval; the default is obtained from set level (see help level), usually level(95).
n(#) specifies the sample size for which to calculate the standard error.
p1(#) specifies the proportion of positive results reported by rater 1 (of two raters).
p2(#) specifies the proportion of positive results reported by rater 2 (of two raters); if p2() is not specified, it is assumed to be equal to p1().
p(#) specifies the overall proportion of positive results (multiple raters).
m(#) specifies the number of raters; the default is m(2).
round specifies that the sample size is to be rounded to the nearest integer; the default is to round up using the function ceil(). This option allows the results for two raters to reproduce those of sskdlg and sskapp, both of which round to the nearest integer.
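The relationship between diff(), level() and se(), and the difference between the two rounding rules, can be illustrated with Stata's built-in functions. This is only a sketch: it assumes the usual normal-approximation confidence interval, in which the half width equals the standard error multiplied by the normal quantile. This help file does not state the exact conversion kapssi uses, and the unrounded sample size of 61.4 is a made-up value.

```stata
* Assumption: half width = z * se under a normal approximation
local level = c(level)
local z = invnormal(1 - (100 - `level')/200)
* z is about 1.96 when the level is 95
display "se implied by diff(.2): " .2/`z'

* Hypothetical unrounded sample size of 61.4
* Default behaviour (round up): displays 62
display ceil(61.4)
* With the round option (nearest integer): displays 61
display round(61.4)
```

The two rules agree whenever the fractional part of the unrounded sample size is .5 or more, so the round option changes the result only when rounding to the nearest integer rounds down.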
Examples
Two raters. Compute sample size given standard error:
. kapssi .8, se(.1) p(.1)
Compute sample size given half width of confidence interval:
. kapssi .6, diff(.2) p1(.15) p2(.12) round
This is equivalent to:
. sskapp, p1(.15) p2(.12) diff(.2) kapp(.6)
More than two raters. Compute sample size:
. kapssi .75, se(.12) p(.05) m(3)
Compute standard error for given sample size:
. kapssi .8, n(100) p(.12) m(4)
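Because kapssi is an immediate command, it can be called inside a loop to compare several design scenarios. The following do-file fragment is a minimal sketch that uses only the syntax documented above; the grids of standard errors and sample sizes are arbitrary illustrative values, and kapssi must be installed.

```stata
* Sample size for several target standard errors (default m(2))
foreach s of numlist .05 .1 .15 {
    kapssi .8, se(`s') p(.1)
}

* Standard error for several sample sizes (four raters)
foreach n of numlist 50 100 200 {
    kapssi .8, n(`n') p(.12) m(4)
}
```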
References
Fleiss, J. L., Cohen, J. and Everitt, B. S. 1969. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72: 323-327.
Zou, G. and Donner, A. 2004. Confidence interval estimation of the intraclass correlation coefficient for binary outcome data. Biometrics 60: 807-811.
Maintainer
David A. Harrison, Intensive Care National Audit & Research Centre, david@icnarc.org
Also see
Online: help for kappa, sskdlg, sskapp, immed