
Title

scores -- creates scores (row-wise) of a set of variables allowing the specification of the number of valid values required.

Syntax

scores newvar = fcn(varlist) [if] [in] [weight] [, options]

where the syntax of fcn is

    fcn                   Description
    -------------------------------------------------------------------------
    min                   minima of varlist
    max                   maxima of varlist
    total                 total (sum) scores of varlist
    sd                    standard deviations of varlist
    mean                  mean scores of varlist
    median                medians of varlist
    pctile                percentiles of varlist
    -------------------------------------------------------------------------

    options               Description
    -------------------------------------------------------------------------
    Main
      minvalid(#)         minimum number of variables with valid values;
                            default is 1
      score(argument)     transformation of mean scores according to argument
      minval(#)           theoretically lowest value of the scale items; use
                            with transformation argument pomp, prop, or sprop;
                            default is 0
      maxval(#)           theoretically highest value of the scale items; use
                            with transformation argument pomp, prop, or sprop;
                            default is 0
      endshift(#)         value added to the lower end (0) and subtracted from
                            the upper end (1) of scores transformed to
                            proportions; use with transformation argument prop;
                            default is 0 (no shift of the ends)
      center(#)           value around which shrunken proportions are
                            centered; use with transformation argument sprop;
                            default is 0.5
      p(#)                #th percentile of the variables with valid values;
                            use with function pctile; default is 50 (median)
      auto                determine the range of values of the scale items
                            automatically; use with transformation argument
                            pomp, prop, or sprop
      replace             replace existing variable newvar

    Sub (arguments of score())
      z                   z-score transformation of mean scores
      centered            centering of mean scores at the overall (group) mean
      pomp                transformation of mean scores to POMP scores
                            (interval [0,100])
      prop                transformation of mean scores to proportions of
                            maximum possible scores (interval [0,1])
      sprop               transformation of mean scores to shrunken proportions
                            of maximum possible scores (open interval ]0,1[)
    -------------------------------------------------------------------------
    by is allowed (see by); aweights, fweights, and iweights are allowed (see
    weight).

Description

Using the row functions of [D] egen, -scores- calculates scores according to fcn from the variables in varlist and stores them in the new variable newvar. If fewer than minvalid(#) variables in varlist have valid (nonmissing) values, the resulting score is set to missing.
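The minvalid(#) rule can be illustrated outside Stata. The following short Python sketch is not the ado-file's implementation; it only mimics the behavior described above on the items v1, v3, v4, and v5 of the example data set (None stands for a missing value "."). The names row_score() and FUNCTIONS are hypothetical, and the sketch assumes that a row with a single valid value yields a missing standard deviation.

```python
import statistics

# Items v1, v3, v4, v5 of the example data set (None = missing ".").
ROWS = [
    [1, 3, 4, 5],
    [1, 3, 4, None],
    [1, 3, None, None],
    [1, 3, 4, None],
    [1, None, None, None],
    [1, None, None, None],
    [None, None, None, None],
    [1, 1, 1, 1],
    [5, 5, 5, 5],
]

# Row functions corresponding to fcn; sd is the sample SD (divisor n - 1).
FUNCTIONS = {
    "min": min,
    "max": max,
    "total": sum,
    "mean": statistics.mean,
    "sd": statistics.stdev,
    "median": statistics.median,
}


def row_score(values, fcn, minvalid=1):
    """Score one row; return None (missing) if fewer than minvalid values are valid."""
    valid = [v for v in values if v is not None]
    if not valid or len(valid) < minvalid:
        return None
    if fcn == "sd" and len(valid) < 2:
        return None  # assumption: a sample SD needs at least two values
    return FUNCTIONS[fcn](valid)


# Mimic the minvalid(3) examples: rows with fewer than 3 valid items become missing.
for fcn in ("min", "max", "total", "mean"):
    print(fcn, [row_score(r, fcn, minvalid=3) for r in ROWS])
```

With the default minvalid=1, only the row without any valid value (row 7) is set to missing.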

If fcn is mean, the option score(argument) can be used to transform the scores to

    - z-scores,
    - scores centered at the overall (group) mean,
    - POMP (percent of maximum possible) scores, with 0 and 100 as the minimum and maximum possible values (see Cohen, Cohen, Aiken, & West, 1999),
    - proportions of maximum possible scores, with 0 and 1 as the minimum and maximum possible values, or
    - shrunken proportions of maximum possible scores, with values in the open interval ]0,1[ (see Smithson & Verkuilen, 2006).

The transformation arguments pomp, prop, and sprop require the minimum and maximum possible values of the items in varlist, which are specified with the options minval(#) and maxval(#). Alternatively, if all items in varlist have value labels defining the same range of values, the minimum and maximum possible values can be determined automatically by specifying the option auto.

POMP scores are calculated as

        POMP = 100*(raw - min)/(max - min)

where raw is the original mean score of the variables (items) with valid values, min is the minimum possible value, and max is the maximum possible value (min and max need not occur in the actual data; see Cohen et al., 1999). The proportion of maximum possible scores equals POMP/100.

When used as an independent variable in nonlinear models such as logistic regression models, the proportion of maximum possible scores is preferable to POMP scores because it eases the interpretation of exponentiated coefficients such as the odds ratio: a one-unit change then spans the whole range from the minimum to the maximum possible score.

When used as a dependent variable in beta regression models (see the SSC package -betafit-), the values 0 and 1 cannot be used because their logits are undefined. There are two remedies:

    (a) Add a small value such as .001 to 0 and subtract it from 1 by using the option endshift(#), or
    (b) shrink the range of values around a center value (usually 0.5) by transforming the proportion of maximum possible scores according to

            sprop = (prop*(n-1) + c)/n

        where prop is the original proportion of maximum possible scores, n is the number of valid cases, and c is the center (see Smithson & Verkuilen, 2006, p. 54f). A different value for c can be specified by using the option center(#) (default: 0.5).
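The formulas above can be checked with a short Python sketch. It is not the ado-file's code; the helper names pomp(), prop(), and sprop() are hypothetical. The sketch assumes that endshift(#) only moves exact values of 0 and 1 inward (as in the endshift example) and that n in the sprop formula counts the observations with a valid score (which reproduces the sprop example).

```python
def pomp(raw, minval, maxval):
    """POMP = 100*(raw - min)/(max - min) (Cohen et al., 1999)."""
    return 100 * (raw - minval) / (maxval - minval)


def prop(raw, minval, maxval, endshift=0.0):
    """Proportion of maximum possible = POMP/100; exact 0 and 1 are moved inward by endshift."""
    p = (raw - minval) / (maxval - minval)
    if p == 0:
        return p + endshift
    if p == 1:
        return p - endshift
    return p


def sprop(props, center=0.5):
    """Shrink proportions: (prop*(n-1) + c)/n, with n = number of valid cases."""
    n = sum(p is not None for p in props)
    return [None if p is None else (p * (n - 1) + center) / n for p in props]


# Mean scores from the mean(v1 v3-v5) example with 3 required valid values (None = missing).
means = [3.25, 8 / 3, None, 8 / 3, None, None, None, 1, 5]
props = [None if m is None else prop(m, 1, 5) for m in means]
print([None if m is None else pomp(m, 1, 5) for m in means])
print(sprop(props))
```

Because n enters the shrinkage formula, sprop depends on how many observations have valid scores, unlike pomp and prop, which are computed row by row.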

by is allowed (see by). However, by only makes a difference when the score() arguments z, centered, or sprop are used to obtain z-scores, mean-centered scores, or shrunken proportions of maximum possible scores, respectively, because only these transformations depend on statistics computed across observations (the mean, the standard deviation, or the number of valid cases). The same is true for weighting (see weight): weights only affect z-scores, mean-centered scores, or shrunken proportions.
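Why by matters for these transformations can be seen in a minimal Python sketch of centering and z-standardization within groups. The functions centered() and zscores() are hypothetical, weights are ignored, the SD is the sample SD (divisor n - 1, which reproduces the z-score example), and each group is assumed to contain at least two valid scores.

```python
import statistics
from collections import defaultdict


def _valid_by_group(scores, groups):
    """Collect the valid (non-None) scores of each group; one group if groups is None."""
    if groups is None:
        groups = [0] * len(scores)
    by_group = defaultdict(list)
    for s, g in zip(scores, groups):
        if s is not None:
            by_group[g].append(s)
    return groups, by_group


def centered(scores, groups=None):
    """Subtract the (group) mean of the valid scores; missing stays missing."""
    groups, by_group = _valid_by_group(scores, groups)
    means = {g: statistics.mean(v) for g, v in by_group.items()}
    return [None if s is None else s - means[g] for s, g in zip(scores, groups)]


def zscores(scores, groups=None):
    """Standardize with the (group) mean and sample SD of the valid scores."""
    groups, by_group = _valid_by_group(scores, groups)
    stats = {g: (statistics.mean(v), statistics.stdev(v)) for g, v in by_group.items()}
    return [None if s is None else (s - stats[g][0]) / stats[g][1]
            for s, g in zip(scores, groups)]


means = [3.25, 8 / 3, None, 8 / 3, None, None, None, 1, 5]
print(zscores(means))
# With a hypothetical grouping variable, the same scores are centered differently.
print(centered(means, groups=["a"] * 7 + ["b"] * 2))
```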

If newvar exists already, you can use the option replace to replace it by the new variable generated.

Options

Main

minvalid(#) specifies how many variables must have valid values for a score to be calculated. If the number of valid values is less than #, the resulting score is set to missing.

score(argument) requests a transformation of the scores. This will only work if mean scores have been requested by mean(varlist). Five transformations are possible according to argument (see below).

minval(#) specifies the theoretical minimum value of the items (# = lowest possible value). It is only relevant if POMP scores or (shrunken) proportions of maximum possible scores have been requested by score(pomp), score(prop), or score(sprop).

maxval(#) specifies the theoretical maximum value of the items (# = highest possible value). It is only relevant if POMP scores or (shrunken) proportions of maximum possible scores have been requested by score(pomp), score(prop), or score(sprop).

endshift(#) specifies the value to be added to 0 (lower end) and to be subtracted from 1 (upper end) (only useful if the proportions of maximum possible scores have been requested by score(prop)).

center(#) specifies the value around which the proportions of maximum possible scores are shrunken (default: 0.5). It is only relevant if shrunken proportions of maximum possible scores have been requested by score(sprop).

p(#) specifies the #th percentile scores to be calculated (default: 50) (only useful if percentiles have been requested by pctile(varlist)).

auto requests that the range of values of the items (lowest and highest possible values) be determined automatically. auto requires that all items in varlist have value labels defining the same range of values. This option overrides values specified with minval(#) and maxval(#). It is only relevant if POMP scores or (shrunken) proportions of maximum possible scores have been requested by score(pomp), score(prop), or score(sprop).
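The requirement that all items share the same labeled range can be sketched in Python. This is a hypothetical illustration, not the ado-file's logic: value labels are modeled as a dict per item mapping values to labels, and the sketch assumes that the range is taken as the smallest and largest labeled value.

```python
def auto_range(value_labels):
    """Return (minval, maxval) shared by the value labels of all items.

    value_labels maps each item name to a dict {value: label}. Assumption:
    the range is the smallest and largest labeled value, and all items must
    share that same range; otherwise a ValueError is raised.
    """
    ranges = {(min(labels), max(labels)) for labels in value_labels.values()}
    if len(ranges) != 1:
        raise ValueError("items do not share the same range of labeled values")
    return ranges.pop()


likert = {1: "strongly disagree", 2: "disagree", 3: "neutral",
          4: "agree", 5: "strongly agree"}
print(auto_range({"v1": likert, "v3": likert, "v4": likert, "v5": likert}))
```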

replace allows newvar to be overwritten if it already exists.

Sub (option score)

score(z) transforms the mean scores to z-scores (resulting mean = 0, sd = 1).

score(centered) centers the mean scores at the overall (group) mean (resulting mean = 0).

score(pomp) transforms the mean scores to POMP (percent of maximum possible) scores with 0 and 100 as the minimum and maximum possible values. Note that score(pomp) also requires the minimum and maximum possible values of the items in varlist, specified with the options minval(#) and maxval(#) (or determined with auto).

score(prop) transforms the mean scores to proportions of maximum possible scores with 0 and 1 as the minimum and maximum possible values (0 can be raised and 1 lowered by the same amount using the option endshift(#)). Note that score(prop) also requires the minimum and maximum possible values of the items in varlist, specified with the options minval(#) and maxval(#) (or determined with auto).

score(sprop) transforms the mean scores to shrunken proportions of maximum possible scores in the open interval ]0,1[ (the proportions are shrunken around a center value that can be set with the option center(#); the default is 0.5). Note that score(sprop) also requires the minimum and maximum possible values of the items in varlist, specified with the options minval(#) and maxval(#) (or determined with auto).

Examples

The following data set shows how -scores- creates scores depending on the number of missing values allowed by the user:

    . clear all
    . input v1-v5

                v1         v2         v3         v4         v5
      1. 1 2 3 4 5
      2. 1 2 3 4 .
      3. 1 2 3 . .
      4. 1 . 3 4 .
      5. 1 2 . . .
      6. 1 . . . .
      7. . . . . .
      8. 1 1 1 1 1
      9. 5 5 5 5 5
     10. end

Generate the variable test containing the minima of the four variables v1, v3, v4, and v5 (by default, test is valid if at least one variable in varlist has a valid value):

    . scores test=min(v1 v3-v5)
    . list, sep(0)

         +-------------------------------+
         | v1   v2   v3   v4   v5   test |
         |-------------------------------|
      1. |  1    2    3    4    5      1 |
      2. |  1    2    3    4    .      1 |
      3. |  1    2    3    .    .      1 |
      4. |  1    .    3    4    .      1 |
      5. |  1    2    .    .    .      1 |
      6. |  1    .    .    .    .      1 |
      7. |  .    .    .    .    .      . |
      8. |  1    1    1    1    1      1 |
      9. |  5    5    5    5    5      5 |
         +-------------------------------+

Replace test with the minima of v1, v3, v4, and v5, this time specifying minvalid(3) so that test is valid only if at least three of the four variables in varlist have valid values (equivalently, at most 4 - 3 = 1 value may be missing):

    . scores test=min(v1 v3-v5), nv(3) replace
    . list, sep(0)

         +-------------------------------+
         | v1   v2   v3   v4   v5   test |
         |-------------------------------|
      1. |  1    2    3    4    5      1 |
      2. |  1    2    3    4    .      1 |
      3. |  1    2    3    .    .      . |
      4. |  1    .    3    4    .      1 |
      5. |  1    2    .    .    .      . |
      6. |  1    .    .    .    .      . |
      7. |  .    .    .    .    .      . |
      8. |  1    1    1    1    1      1 |
      9. |  5    5    5    5    5      5 |
         +-------------------------------+

Same as above but using the maxima instead of the minima:

    . scores test=max(v1 v3-v5), nv(3) replace
    . list, sep(0)

         +-------------------------------+
         | v1   v2   v3   v4   v5   test |
         |-------------------------------|
      1. |  1    2    3    4    5      5 |
      2. |  1    2    3    4    .      4 |
      3. |  1    2    3    .    .      . |
      4. |  1    .    3    4    .      4 |
      5. |  1    2    .    .    .      . |
      6. |  1    .    .    .    .      . |
      7. |  .    .    .    .    .      . |
      8. |  1    1    1    1    1      1 |
      9. |  5    5    5    5    5      5 |
         +-------------------------------+

Same as above but creating the total (sum) scores instead of the maxima:

    . scores test=total(v1 v3-v5), nv(3) replace
    . list, sep(0)

         +-------------------------------+
         | v1   v2   v3   v4   v5   test |
         |-------------------------------|
      1. |  1    2    3    4    5     13 |
      2. |  1    2    3    4    .      8 |
      3. |  1    2    3    .    .      . |
      4. |  1    .    3    4    .      8 |
      5. |  1    2    .    .    .      . |
      6. |  1    .    .    .    .      . |
      7. |  .    .    .    .    .      . |
      8. |  1    1    1    1    1      4 |
      9. |  5    5    5    5    5     20 |
         +-------------------------------+

Same as above but creating the standard deviations (sd) instead of total scores:

    . scores test=sd(v1 v3-v5), nv(3) replace
    . list, sep(0)

         +------------------------------------+
         | v1   v2   v3   v4   v5        test |
         |------------------------------------|
      1. |  1    2    3    4    5   1.7078251 |
      2. |  1    2    3    4    .   1.5275252 |
      3. |  1    2    3    .    .           . |
      4. |  1    .    3    4    .   1.5275252 |
      5. |  1    2    .    .    .           . |
      6. |  1    .    .    .    .           . |
      7. |  .    .    .    .    .           . |
      8. |  1    1    1    1    1           0 |
      9. |  5    5    5    5    5           0 |
         +------------------------------------+

Same as above but creating median scores instead of the standard deviations:

    . scores test=median(v1 v3-v5), nv(3) replace
    . list, sep(0)

         +-------------------------------+
         | v1   v2   v3   v4   v5   test |
         |-------------------------------|
      1. |  1    2    3    4    5    3.5 |
      2. |  1    2    3    4    .      3 |
      3. |  1    2    3    .    .      . |
      4. |  1    .    3    4    .      3 |
      5. |  1    2    .    .    .      . |
      6. |  1    .    .    .    .      . |
      7. |  .    .    .    .    .      . |
      8. |  1    1    1    1    1      1 |
      9. |  5    5    5    5    5      5 |
         +-------------------------------+

Same as above but creating 1st quartiles instead of the medians:

    . scores test=pctile(v1 v3-v5), nv(3) p(25) replace
    . list, sep(0)

         +-------------------------------+
         | v1   v2   v3   v4   v5   test |
         |-------------------------------|
      1. |  1    2    3    4    5      2 |
      2. |  1    2    3    4    .      1 |
      3. |  1    2    3    .    .      . |
      4. |  1    .    3    4    .      1 |
      5. |  1    2    .    .    .      . |
      6. |  1    .    .    .    .      . |
      7. |  .    .    .    .    .      . |
      8. |  1    1    1    1    1      1 |
      9. |  5    5    5    5    5      5 |
         +-------------------------------+
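How the 1st quartiles and medians above arise from only three or four valid values can be sketched in Python. The helper row_pctile() is hypothetical, and the sketch assumes the default percentile definition used by Stata's summarize and _pctile (with P = n*p/100: the mean of the P-th and (P+1)-th ordered values if P is an integer, otherwise the ceil(P)-th ordered value); this definition reproduces the example output, but whether -scores- uses exactly this rule is an assumption.

```python
import math


def row_pctile(values, p=50):
    """#th percentile of the valid values of one row (hypothetical helper)."""
    x = sorted(v for v in values if v is not None)
    n = len(x)
    if n == 0:
        return None
    pos = n * p / 100  # P = n*p/100
    if pos == int(pos):
        i = int(pos)
        if i == 0:
            return x[0]
        if i >= n:
            return x[-1]
        return (x[i - 1] + x[i]) / 2  # P integer: average of neighbours
    return x[math.ceil(pos) - 1]  # otherwise the next ordered value


print(row_pctile([1, 3, 4, 5], p=25))     # v1 v3 v4 v5 of row 1
print(row_pctile([1, 3, 4, None], p=25))  # v1 v3 v4 v5 of row 2
```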

Same as above but creating mean scores instead of the 1st quartiles:

    . scores test=mean(v1 v3-v5), nv(3) replace
    . list, sep(0)

         +------------------------------------+
         | v1   v2   v3   v4   v5        test |
         |------------------------------------|
      1. |  1    2    3    4    5        3.25 |
      2. |  1    2    3    4    .   2.6666667 |
      3. |  1    2    3    .    .           . |
      4. |  1    .    3    4    .   2.6666667 |
      5. |  1    2    .    .    .           . |
      6. |  1    .    .    .    .           . |
      7. |  .    .    .    .    .           . |
      8. |  1    1    1    1    1           1 |
      9. |  5    5    5    5    5           5 |
         +------------------------------------+

Same as above but creating mean scores centered at the overall mean:

    . scores test=mean(v1 v3-v5), nv(3) sc(c) replace
    . list, sep(0)

         +-------------------------------------+
         | v1   v2   v3   v4   v5         test |
         |-------------------------------------|
      1. |  1    2    3    4    5    .33333333 |
      2. |  1    2    3    4    .         -.25 |
      3. |  1    2    3    .    .            . |
      4. |  1    .    3    4    .         -.25 |
      5. |  1    2    .    .    .            . |
      6. |  1    .    .    .    .            . |
      7. |  .    .    .    .    .            . |
      8. |  1    1    1    1    1   -1.9166667 |
      9. |  5    5    5    5    5    2.0833333 |
         +-------------------------------------+

Same as above but creating z-scores instead of centered mean scores:

    . scores test=mean(v1 v3-v5), nv(3) sc(z) replace
    . list, sep(0)

         +-------------------------------------+
         | v1   v2   v3   v4   v5         test |
         |-------------------------------------|
      1. |  1    2    3    4    5    .23210354 |
      2. |  1    2    3    4    .   -.17407766 |
      3. |  1    2    3    .    .            . |
      4. |  1    .    3    4    .   -.17407766 |
      5. |  1    2    .    .    .            . |
      6. |  1    .    .    .    .            . |
      7. |  .    .    .    .    .            . |
      8. |  1    1    1    1    1   -1.3345954 |
      9. |  5    5    5    5    5    1.4506471 |
         +-------------------------------------+

Same as above but instead of transforming the mean scores to z-scores, transforming them to POMP scores assuming Likert scale items with anchors ranging from 1 (minimum possible value) to 5 (maximum possible value):

    . scores test=mean(v1 v3-v5), nv(3) sc(po) min(1) max(5) replace
    . list, sep(0)

         +------------------------------------+
         | v1   v2   v3   v4   v5        test |
         |------------------------------------|
      1. |  1    2    3    4    5       56.25 |
      2. |  1    2    3    4    .   41.666667 |
      3. |  1    2    3    .    .           . |
      4. |  1    .    3    4    .   41.666667 |
      5. |  1    2    .    .    .           . |
      6. |  1    .    .    .    .           . |
      7. |  .    .    .    .    .           . |
      8. |  1    1    1    1    1           0 |
      9. |  5    5    5    5    5         100 |
         +------------------------------------+

Same as above but instead of POMP scores transforming the mean scores to the proportions of maximum possible scores assuming Likert scale items with anchors ranging from 1 (minimum possible value) to 5 (maximum possible value):

    . scores test=mean(v1 v3-v5), nv(3) sc(pp) min(1) max(5) replace
    . list, sep(0)

         +------------------------------------+
         | v1   v2   v3   v4   v5        test |
         |------------------------------------|
      1. |  1    2    3    4    5       .5625 |
      2. |  1    2    3    4    .   .41666667 |
      3. |  1    2    3    .    .           . |
      4. |  1    .    3    4    .   .41666667 |
      5. |  1    2    .    .    .           . |
      6. |  1    .    .    .    .           . |
      7. |  .    .    .    .    .           . |
      8. |  1    1    1    1    1           0 |
      9. |  5    5    5    5    5           1 |
         +------------------------------------+

Same as above but shifting the end points inward by adding .01 to 0 and subtracting .01 from 1:

    . scores test=mean(v1 v3-v5), nv(3) sc(pp) min(1) max(5) end(.01) replace
    . list, sep(0)

         +------------------------------------+
         | v1   v2   v3   v4   v5        test |
         |------------------------------------|
      1. |  1    2    3    4    5       .5625 |
      2. |  1    2    3    4    .   .41666667 |
      3. |  1    2    3    .    .           . |
      4. |  1    .    3    4    .   .41666667 |
      5. |  1    2    .    .    .           . |
      6. |  1    .    .    .    .           . |
      7. |  .    .    .    .    .           . |
      8. |  1    1    1    1    1         .01 |
      9. |  5    5    5    5    5         .99 |
         +------------------------------------+

Same as above but shrinking the proportions around the default center of 0.5:

    . scores test=mean(v1 v3-v5), nv(3) sc(sp) min(1) max(5) replace
    . list, sep(0)

         +-----------------------------------+
         | v1   v2   v3   v4   v5       test |
         |-----------------------------------|
      1. |  1    2    3    4    5        .55 |
      2. |  1    2    3    4    .   .4333333 |
      3. |  1    2    3    .    .          . |
      4. |  1    .    3    4    .   .4333333 |
      5. |  1    2    .    .    .          . |
      6. |  1    .    .    .    .          . |
      7. |  .    .    .    .    .          . |
      8. |  1    1    1    1    1         .1 |
      9. |  5    5    5    5    5         .9 |
         +-----------------------------------+

Saved results

scores saves the following scalars in r():

    r(N)          number of nonmissing observations of newvar
    r(sum_w)      sum of the weights used in creating newvar
    r(mean)       mean of newvar
    r(Var)        variance of newvar
    r(sd)         standard deviation of newvar
    r(min)        minimum of newvar
    r(max)        maximum of newvar
    r(sum)        sum of newvar

References

Cohen, P., Cohen, J., Aiken, L.S., & West, S.G. (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34, 315-346.

Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11, 54-71.

Also see

SSC package betafit

Acknowledgements

Thanks to Kit Baum for helpful advice and to Alan Acock for suggesting the calculation of (shrunken) proportions of maximum possible scores.

Author

Dirk Enzmann
http://www2.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
dirk.enzmann@uni-hamburg.de