-------------------------------------------------------------------------------
help for senspec                                                 (Roger Newson)
-------------------------------------------------------------------------------

Sensitivity and specificity results saved in generated variables

senspec refvar classvar [weight] [if exp] [in range] [, posif(relational_operator) sensitivity(newvarname) specificity(newvarname) fpos(newvarname) fneg(newvarname) ntpos(newvarname) ntneg(newvarname) nfpos(newvarname) nfneg(newvarname) float ]

where refvar and classvar are the names of an existing numeric reference variable and an existing numeric classification variable, respectively, and relational_operator may be any one of the relational operators >, <, >= or <=.

aweights, fweights, iweights, and pweights are allowed, and are all treated in the same way. See help for weights.

Description

senspec inputs a reference variable with two values and a quantitative classification variable. It creates, as output, a set of new variables containing, in each observation, the numbers and/or rates of true positives, true negatives, false positives and false negatives that would be observed if the classification variable were used to define a diagnostic test with a threshold equal to the value of the classification variable in that observation. Both refvar and classvar must be numeric. The reference variable refvar indicates the true state of each observation, such as diseased or non-diseased, or normal or abnormal. It must have exactly 2 values, for example 0 and 1, of which the lower value identifies negative observations and the higher value identifies positive observations. The rating or outcome of the diagnostic test is recorded in classvar, which must be at least ordinal. senspec is similar to roctab, but produces output variables instead of plots and listings, so that users can create plots and listings in their own chosen formats.
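The coding rule for the reference variable can be sketched in Python. This is an illustration only, not senspec's own Stata code. The function name positive_status is hypothetical, and the sketch assumes that refvar has no missing values.

```python
def positive_status(refvar):
    """Code a two-valued reference variable as True (positive) or False (negative).

    Following the rule described for refvar, the lower value marks negative
    observations and the higher value marks positive observations.
    """
    values = sorted(set(refvar))
    if len(values) != 2:
        raise ValueError("reference variable must have exactly 2 distinct values")
    return [v == values[1] for v in refvar]


print(positive_status([0, 1, 1, 0, 1]))  # [False, True, True, False, True]
```

The same rule applies to any pair of values. For example, with a variable coded 1 and 2, the observations with value 2 are treated as positive.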

Options

posif(relational_operator) specifies one of the 4 relational operators >, <, >= or <=. These relational operators specify that a positive test result is defined as a value of the classification variable above the threshold, below the threshold, at or above the threshold, or at or below the threshold, respectively. If posif() is not specified, then >= is assumed, and the test is assumed to have a positive result if and only if the classification variable is at or above the threshold.

sensitivity(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the sensitivity of the diagnostic test if the threshold is equal to the value of the classification variable in that observation.

specificity(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the specificity of the diagnostic test if the threshold is equal to the value of the classification variable in that observation.

fpos(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the false positive rate of the diagnostic test if the threshold is equal to the value of the classification variable in that observation.

fneg(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the false negative rate of the diagnostic test if the threshold is equal to the value of the classification variable in that observation.

ntpos(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the number of true positives (weighted if weights are specified) if the threshold is equal to the value of the classification variable in that observation.

ntneg(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the number of true negatives (weighted if weights are specified) if the threshold is equal to the value of the classification variable in that observation.

nfpos(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the number of false positives (weighted if weights are specified) if the threshold is equal to the value of the classification variable in that observation.

nfneg(newvarname) specifies the name of a new variable to be generated, containing, in each observation, the number of false negatives (weighted if weights are specified) if the threshold is equal to the value of the classification variable in that observation.
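The counting step behind ntpos(), ntneg(), nfpos() and nfneg() can be sketched in Python. This is an illustrative O(n^2) sketch, not senspec's Stata code. The function name senspec_counts is hypothetical, and the sketch makes the following assumptions:

- There are no missing values.
- ref_positive is already coded as True/False using the lower/higher rule in the Description.
- Every weight type enters simply as a multiplier on each observation's contribution.

Each observation's own classvar value serves as the threshold.

```python
import operator

POSIF = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def senspec_counts(ref_positive, classvar, weights=None, posif=">="):
    """Return lists (ntpos, ntneg, nfpos, nfneg), one entry per observation.

    For observation i the threshold is classvar[i]; each observation j is
    test-positive if classvar[j] <posif> classvar[i] holds. Each count is a
    sum of weights (weight 1 per observation if weights is None).
    """
    n = len(classvar)
    w = [1] * n if weights is None else list(weights)
    compare = POSIF[posif]
    ntpos, ntneg, nfpos, nfneg = [], [], [], []
    for threshold in classvar:
        tp = tn = fp = fn = 0
        for pos, x, wt in zip(ref_positive, classvar, w):
            if compare(x, threshold):   # test-positive at this threshold
                if pos:
                    tp += wt
                else:
                    fp += wt
            else:                       # test-negative at this threshold
                if pos:
                    fn += wt
                else:
                    tn += wt
        ntpos.append(tp)
        ntneg.append(tn)
        nfpos.append(fp)
        nfneg.append(fn)
    return ntpos, ntneg, nfpos, nfneg


print(senspec_counts([False, False, True, True], [1, 2, 3, 4]))
# ([2, 2, 2, 1], [0, 1, 2, 2], [2, 1, 0, 0], [0, 0, 0, 1])
```

The nested loop keeps the logic visible. A practical implementation would sort on classvar and use running sums instead.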

float specifies that the derived variables specified by the sensitivity(), specificity(), fpos() and fneg() options will have storage type no higher than float. If float is not specified, then these derived variables are created initially as type double. Whether or not float is specified, the derived variables are then compressed to the lowest numeric storage type possible without loss of information.
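Stata's float type is a 4-byte IEEE single-precision value, and double is an 8-byte double-precision value. A rate such as 2/3 is therefore stored slightly differently in each type. The following Python sketch illustrates the rounding. The helper as_float is hypothetical; it uses the standard struct module to round a double to single precision.

```python
import struct


def as_float(x):
    """Round a Python float (an IEEE double) to the nearest 4-byte float."""
    return struct.unpack("f", struct.pack("f", x))[0]


rate = 2 / 3                  # e.g. a sensitivity of 2 out of 3
print(rate)                   # 0.6666666666666666
print(as_float(rate))         # 0.6666666865348816
print(as_float(0.5) == 0.5)   # True: 0.5 is exact in both types
```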

Equations and formulas

senspec starts by calculating the numbers (or weighted numbers, if weights are specified) of true positives, true negatives, false positives and false negatives. These are stored in variables that may be saved for future use by the ntpos(), ntneg(), nfpos() and nfneg() options, respectively. These variables are created as type double if aweights, iweights or pweights are specified, or as type long if fweights or no weights are specified, and are then compressed to the lowest numeric storage type possible without loss of information. senspec then calculates any derived variables requested by the options sensitivity(), specificity(), fpos() and fneg(), using the following formulas:

sensitivity=ntpos/(ntpos+nfneg)

specificity=ntneg/(ntneg+nfpos)

fpos=nfpos/(ntneg+nfpos)

fneg=nfneg/(ntpos+nfneg)
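The four formulas translate directly into code. In this Python sketch, the function rates and the example counts are hypothetical. Note that ntpos+nfneg is the total (weighted) number of positive observations, and ntneg+nfpos is the total number of negative observations. It follows that fpos = 1 - specificity and fneg = 1 - sensitivity at every threshold.

```python
def rates(ntpos, ntneg, nfpos, nfneg):
    """Apply the sensitivity, specificity, fpos and fneg formulas to one set of counts."""
    sensitivity = ntpos / (ntpos + nfneg)   # true positives / all positives
    specificity = ntneg / (ntneg + nfpos)   # true negatives / all negatives
    fpos = nfpos / (ntneg + nfpos)          # false positives / all negatives
    fneg = nfneg / (ntpos + nfneg)          # false negatives / all positives
    return sensitivity, specificity, fpos, fneg


sens, spec, fp, fn = rates(ntpos=8, ntneg=15, nfpos=5, nfneg=2)
print(sens, spec, fp, fn)  # 0.8 0.75 0.25 0.2
```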

The user can calculate other results using other formulas by specifying the options ntpos(), nfpos(), ntneg() and nfneg(), and then generating further variables from the new variables. Confidence intervals for the area under the specificity-sensitivity (or ROC) curve can be calculated using the somersd package, downloadable from SSC. The somersd package offers a choice of normalizing and/or variance-stabilizing transformations, the ability to adjust for clustering, and the ability to calculate differences between ROC areas for two alternative classification variables using lincom. (See Newson, 2002.)
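For a two-valued reference variable, the ROC area equals (D + 1)/2, where D is Somers' D of the classification variable with respect to the reference variable. The Python sketch below computes both quantities from pairwise comparisons. The function roc_area is hypothetical. The sketch is unweighted and assumes that higher classvar values indicate positivity, as with the default posif(>=). It gives point estimates only, whereas somersd also supplies the confidence intervals mentioned above.

```python
def roc_area(ref_positive, classvar):
    """Return (roc_area, somers_d) from all positive/negative pairs.

    A pair is concordant if the positive observation has the higher
    classvar value, discordant if it has the lower value, and tied otherwise.
    """
    pos = [x for p, x in zip(ref_positive, classvar) if p]
    neg = [x for p, x in zip(ref_positive, classvar) if not p]
    concordant = discordant = 0
    for xp in pos:
        for xn in neg:
            if xp > xn:
                concordant += 1
            elif xp < xn:
                discordant += 1
    pairs = len(pos) * len(neg)
    somers_d = (concordant - discordant) / pairs
    # ROC area = P(concordant) + 0.5 * P(tied) = (D + 1) / 2
    return (somers_d + 1) / 2, somers_d


print(roc_area([False, False, True, True], [1, 3, 2, 4]))  # (0.75, 0.5)
```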

Examples

. sysuse auto, clear
. senspec foreign mpg, sensitivity(sens1) fpos(fpos1)
. scatter sens1 fpos1, sort(fpos1 sens1) connect(L) mlab(mpg)

The following example will work if the user has installed xcontract (downloadable from SSC).

. senspec foreign weight, posif(<=) sens(sens2) spec(spec2)
. xcontract weight sens2 spec2, list(,) format(sens2 spec2 %8.4f)

Saved results

Scalars:

    r(N)        Number of observations
    r(N_pos)    Number of positive observations
    r(N_neg)    Number of negative observations

Author

Roger Newson, King's College, London, UK. Email: roger.newson@kcl.ac.uk

References

Newson R. 2002. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. The Stata Journal 2(1): 45-64. Also downloadable from Roger Newson's website at http://www.kcl-phs.org.uk/rogernewson.

Also see

Manual: [R] roc

Online: help for roctab; help for somersd and xcontract, if installed