```help scores
-------------------------------------------------------------------------------

Title

scores -- creates row-wise scores from a set of variables, allowing the
minimum number of valid values required to be specified.

Syntax

scores newvar = fcn(varlist) [if] [in] [weight] [, options]

where the syntax of fcn is

fcn          Description
-------------------------------------------------------------------------
min          minima of varlist
max          maxima of varlist
total        total (sum) scores of varlist
sd           standard deviations of varlist
mean         mean scores of varlist
median       medians of varlist
pctile       percentiles of varlist

options            Description
-------------------------------------------------------------------------
Main
minvalid(#)      minimum number of variables with valid values;
                 default is (1)
score(argument)  transformation of mean scores according to argument
minval(#)        theoretically lowest value of scale items - use with
                 transformation argument pomp, prop, or sprop;
                 default is (0)
maxval(#)        theoretically highest value of scale items - use with
                 transformation argument pomp, prop, or sprop;
                 default is (0)
endshift(#)      value added to lower end (0) and subtracted from upper
                 end (1) of scores transformed to proportions - use
                 with transformation argument prop; default is (0)
                 (= no shift of ends)
center(#)        value around which shrunken scores transformed to
                 proportions are centered - use with transformation
                 argument sprop; default is (0.5)
p(#)             the #th percentiles of variables with valid values -
                 use with function pctile; default is (50) (= medians)
auto             determine range of values of scale items automatically
                 - use with transformation argument pomp, prop, or
                 sprop
replace          replace existing variable by newvar

Sub (arguments of score())
z                z-score transformation of mean scores
centered         centering of mean scores at the overall (group) mean
pomp             transformation of mean scores to POMP scores (interval
                 [0,100])
prop             transformation of mean scores to proportion of maximum
                 possible scores (interval [0,1])
sprop            transformation of mean scores to shrunken proportion
                 of maximum possible scores (open interval ]0,1[)
-------------------------------------------------------------------------
by is allowed (see by);
aweights, fweights, and iweights are allowed (see weight).

Description

Using row functions of [D] egen, -scores- calculates scores according to
fcn using variables listed in varlist and assigns them to the new
variable newvar. If the number of valid values of varlist is less than
minvalid(#), the resulting score will be set to missing.

If fcn is mean, the option score(argument) can be used to request a
transformation of the scores to
- z-scores,
- scores centered at the overall (group) mean,
- POMP (= percent of maximum possible) scores with 0 and 100 as minimum
and maximum possible values (see: Cohen, P., Cohen, J., Aiken, L.S., &
West, S.G. (1999). The problem of units and the circumstance for POMP.
Multivariate Behavioral Research, 34, 315-346),
- the proportion of maximum possible scores with 0 and 1 as minimum and
maximum possible scores, or
- the shrunken proportion of maximum possible scores with values in the
open interval ]0,1[ (see: Smithson, M. & Verkuilen, J. (2006). A better
lemon squeezer? Maximum-likelihood regression with beta-distributed
dependent variables. Psychological Methods, 11, 54-71).

The transformation arguments pomp, prop, and sprop require that the minimum
and maximum possible values of the items listed in varlist be specified
with the options minval(#) and maxval(#). Alternatively, if all items
listed in varlist have value labels defining the same range of values,
the minimum and maximum possible values can be determined automatically
by using the option auto.
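
A minimal sketch of the automatic variant, assuming the items v1 and
v3-v5 of the example data (see Examples) and a hypothetical value label
named likert that covers the full range from 1 to 5; the label texts and
the variable name pompauto are illustrative only:

. label define likert 1 "never" 2 "rarely" 3 "sometimes" 4 "often" 5 "always"
. label values v1 v3 v4 v5 likert
. scores pompauto = mean(v1 v3-v5), minvalid(3) score(pomp) auto

Under these assumptions the result should match the one obtained with
minval(1) maxval(5).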

POMP scores are calculated by POMP = 100*(raw - min)/(max - min) with raw
= original mean score of variables (items) with valid values, min =
minimum possible value, and max = maximum possible value (min and max
need not exist in the actual data) (see Cohen et al., 1999). The
proportion of maximum possible scores is equal to POMP/100. When the score
is used as an independent variable in nonlinear models such as logistic
regression, the proportion of maximum possible scores is preferable to
POMP scores because it eases the interpretation of exponentiated
coefficients such as odds ratios. When the score is used as a dependent
variable in beta regression models (see the SSC package -betafit-), the
values 0 and 1 cannot be used because their logits are undefined. There
are two remedies: (a) Add a small value such as .001 to 0 and subtract it
from 1 by using the option endshift(#), or (b) shrink the range of values
centered around a certain value (usually 0.5) by transforming the
proportion of maximum possible scores according to sprop = (prop*(n-1) +
c)/n with prop = original proportion of maximum possible scores, n =
number of valid cases, and c = center (see Smithson & Verkuilen, 2006, p.
54f). A different value for c can be specified by using the option
center(#) (default: 0.5).
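
As a worked illustration of these formulas (a sketch that only redoes the
arithmetic with display and does not call -scores-): assume a mean score
of 3.25 on items ranging from 1 to 5, n = 5 valid cases, and the default
center c = 0.5. These values correspond to the first observation of the
data set used in the Examples section. POMP, prop, and sprop are then:

. display 100*(3.25 - 1)/(5 - 1)
56.25
. display (3.25 - 1)/(5 - 1)
.5625
. display (.5625*(5 - 1) + .5)/5
.55

Note that sprop depends on n: with more valid cases the shrinkage toward
c becomes smaller.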

by is allowed (see by). However, by only makes a difference when using
the score() option arguments z, centered, or sprop for obtaining
z-scores, mean-centered scores, or shrunken proportions of maximum
possible scores, respectively. The same is true for weighting (see
weight): Using weights only makes a difference to z-scores, mean-centered
scores, or shrunken proportions.
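
A minimal sketch of by-group standardization, assuming the data contain a
grouping variable (here the hypothetical variable group) in addition to
the items v1 and v3-v5; the new variable name zgrp is illustrative only.
The resulting z-scores would then be standardized within each group
rather than over the whole sample:

. bysort group: scores zgrp = mean(v1 v3-v5), minvalid(3) score(z)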

If newvar exists already, you can use the option replace to replace it by
the new variable generated.

Options

+------+
----+ Main +-------------------------------------------------------------

minvalid(#) specifies how many variables must have valid values for
calculating a score. If the number of valid values is less than # the
resulting score will be set to missing.

score(argument) requests a transformation of the scores. This will only
work if mean scores have been requested by mean(varlist). Five
transformations are possible according to argument (see below).

minval(#) specifies the theoretical minimum value of the items (# =
lowest value possible) (only useful if percents or (shrunken)
proportions of maximum possible scores have been requested by
score(pomp), score(prop), or score(sprop)).

maxval(#) specifies the theoretical maximum value of the items (# =
highest value possible) (only useful if percents or (shrunken)
proportions of maximum possible scores have been requested by
score(pomp), score(prop), or score(sprop)).

endshift(#) specifies the value to be added to 0 (lower end) and to be
subtracted from 1 (upper end) (only useful if the proportions of
maximum possible scores have been requested by score(prop)).

center(#) specifies the value around which proportions of maximum
possible scores are to be shrunken (default: 0.5) (only useful if the
shrunken proportions of possible scores have been requested by
score(sprop)).

p(#) specifies the #th percentile scores to be calculated (default: 50)
(only useful if percentiles have been requested by pctile(varlist)).

auto requests that the range of values of the items (lowest and highest
possible values) be determined automatically. auto requires that all items
listed in varlist have value labels defining the same range of
values. This option overrides values specified using minval(#) and
maxval(#) (only useful if percents or (shrunken) proportions of
maximum possible scores have been requested by score(pomp),
score(prop), or score(sprop)).

replace requests that newvar replace an existing variable of the same
name.

+--------------------+
----+ Sub (option score) +-----------------------------------------------

score(z) transforms the mean scores to z-scores (resulting mean = 0,
sd = 1).

score(centered) centers the mean scores at the overall (group) mean
(resulting mean = 0).

score(pomp) transforms the mean scores to POMP (percent of
maximum possible) scores with 0 and 100 as minimum and maximum
possible values. Note that score(pomp) requires specifying the
minimum and maximum possible values of the items listed in varlist by
using the options minval(#) and maxval(#), as well.

score(prop) transforms the mean scores to the proportion of
maximum possible scores with 0 and 1 as minimum and maximum possible
values (0 can be shifted to a larger value and 1 to a smaller value by
using the option endshift(#)). Note that score(prop) requires specifying
the minimum and maximum possible values of the items listed in
varlist by using the options minval(#) and maxval(#), as well.

score(sprop) transforms the mean scores to the shrunken
proportion of maximum possible scores in the open interval ]0,1[ (the
proportions are shrunken toward a center value set by the option
center(#); the default value is 0.5). Note that score(sprop)
requires specifying the minimum and maximum possible values of the
items listed in varlist by using the options minval(#) and maxval(#),
as well.

Examples

The following data set shows how -scores- creates scores depending on the
number of valid values required by the user:

. clear all
. input v1-v5

v1         v2         v3         v4         v5
1. 1 2 3 4 5
2. 1 2 3 4 .
3. 1 2 3 . .
4. 1 . 3 4 .
5. 1 2 . . .
6. 1 . . . .
7. . . . . .
8. 1 1 1 1 1
9. 5 5 5 5 5
10. end

Generate variable test containing the minima of the four variables v1,
v3, v4, and v5 (by default the value of test is valid if at least one
variable of varlist has a valid value):

. scores test=min(v1 v3-v5)
. list, sep(0)

+-------------------------------+
| v1   v2   v3   v4   v5   test |
|-------------------------------|
1. |  1    2    3    4    5      1 |
2. |  1    2    3    4    .      1 |
3. |  1    2    3    .    .      1 |
4. |  1    .    3    4    .      1 |
5. |  1    2    .    .    .      1 |
6. |  1    .    .    .    .      1 |
7. |  .    .    .    .    .      . |
8. |  1    1    1    1    1      1 |
9. |  5    5    5    5    5      5 |
+-------------------------------+

Replace variable test by test containing the minima of variables v1, v3,
v4, and v5. Specify minvalid(3) so that test has valid values only if at
least three values of the four variables of varlist are valid (or
equivalently: at most 4-3=1 value may be missing):

. scores test=min(v1 v3-v5), nv(3) replace
. list, sep(0)

+-------------------------------+
| v1   v2   v3   v4   v5   test |
|-------------------------------|
1. |  1    2    3    4    5      1 |
2. |  1    2    3    4    .      1 |
3. |  1    2    3    .    .      . |
4. |  1    .    3    4    .      1 |
5. |  1    2    .    .    .      . |
6. |  1    .    .    .    .      . |
7. |  .    .    .    .    .      . |
8. |  1    1    1    1    1      1 |
9. |  5    5    5    5    5      5 |
+-------------------------------+

Same as above but using the maxima instead of the minima:

. scores test=max(v1 v3-v5), nv(3) replace
. list, sep(0)

+-------------------------------+
| v1   v2   v3   v4   v5   test |
|-------------------------------|
1. |  1    2    3    4    5      5 |
2. |  1    2    3    4    .      4 |
3. |  1    2    3    .    .      . |
4. |  1    .    3    4    .      4 |
5. |  1    2    .    .    .      . |
6. |  1    .    .    .    .      . |
7. |  .    .    .    .    .      . |
8. |  1    1    1    1    1      1 |
9. |  5    5    5    5    5      5 |
+-------------------------------+

Same as above but creating the total (sum) scores instead of the maxima:

. scores test=total(v1 v3-v5), nv(3) replace
. list, sep(0)

+-------------------------------+
| v1   v2   v3   v4   v5   test |
|-------------------------------|
1. |  1    2    3    4    5     13 |
2. |  1    2    3    4    .      8 |
3. |  1    2    3    .    .      . |
4. |  1    .    3    4    .      8 |
5. |  1    2    .    .    .      . |
6. |  1    .    .    .    .      . |
7. |  .    .    .    .    .      . |
8. |  1    1    1    1    1      4 |
9. |  5    5    5    5    5     20 |
+-------------------------------+

Same as above but creating the standard deviations (sd) instead of total
scores:

. scores test=sd(v1 v3-v5), nv(3) replace
. list, sep(0)

+------------------------------------+
| v1   v2   v3   v4   v5        test |
|------------------------------------|
1. |  1    2    3    4    5   1.7078251 |
2. |  1    2    3    4    .   1.5275252 |
3. |  1    2    3    .    .           . |
4. |  1    .    3    4    .   1.5275252 |
5. |  1    2    .    .    .           . |
6. |  1    .    .    .    .           . |
7. |  .    .    .    .    .           . |
8. |  1    1    1    1    1           0 |
9. |  5    5    5    5    5           0 |
+------------------------------------+

Same as above but creating median scores instead of the standard
deviations:

. scores test=median(v1 v3-v5), nv(3) replace
. list, sep(0)

+-------------------------------+
| v1   v2   v3   v4   v5   test |
|-------------------------------|
1. |  1    2    3    4    5    3.5 |
2. |  1    2    3    4    .      3 |
3. |  1    2    3    .    .      . |
4. |  1    .    3    4    .      3 |
5. |  1    2    .    .    .      . |
6. |  1    .    .    .    .      . |
7. |  .    .    .    .    .      . |
8. |  1    1    1    1    1      1 |
9. |  5    5    5    5    5      5 |
+-------------------------------+

Same as above but creating 1st quartiles instead of the medians:

. scores test=pctile(v1 v3-v5), nv(3) p(25) replace
. list, sep(0)

+-------------------------------+
| v1   v2   v3   v4   v5   test |
|-------------------------------|
1. |  1    2    3    4    5      2 |
2. |  1    2    3    4    .      1 |
3. |  1    2    3    .    .      . |
4. |  1    .    3    4    .      1 |
5. |  1    2    .    .    .      . |
6. |  1    .    .    .    .      . |
7. |  .    .    .    .    .      . |
8. |  1    1    1    1    1      1 |
9. |  5    5    5    5    5      5 |
+-------------------------------+

Same as above but creating mean scores instead of the 1st quartiles:

. scores test=mean(v1 v3-v5), nv(3) replace
. list, sep(0)

+------------------------------------+
| v1   v2   v3   v4   v5        test |
|------------------------------------|
1. |  1    2    3    4    5        3.25 |
2. |  1    2    3    4    .   2.6666667 |
3. |  1    2    3    .    .           . |
4. |  1    .    3    4    .   2.6666667 |
5. |  1    2    .    .    .           . |
6. |  1    .    .    .    .           . |
7. |  .    .    .    .    .           . |
8. |  1    1    1    1    1           1 |
9. |  5    5    5    5    5           5 |
+------------------------------------+

Same as above but creating mean scores centered at the overall mean:

. scores test=mean(v1 v3-v5), nv(3) sc(c) replace
. list, sep(0)

+-------------------------------------+
| v1   v2   v3   v4   v5         test |
|-------------------------------------|
1. |  1    2    3    4    5    .33333333 |
2. |  1    2    3    4    .         -.25 |
3. |  1    2    3    .    .            . |
4. |  1    .    3    4    .         -.25 |
5. |  1    2    .    .    .            . |
6. |  1    .    .    .    .            . |
7. |  .    .    .    .    .            . |
8. |  1    1    1    1    1   -1.9166667 |
9. |  5    5    5    5    5    2.0833333 |
+-------------------------------------+

Same as above but creating z-scores instead of centered mean scores:

. scores test=mean(v1 v3-v5), nv(3) sc(z) replace
. list, sep(0)

+-------------------------------------+
| v1   v2   v3   v4   v5         test |
|-------------------------------------|
1. |  1    2    3    4    5    .23210354 |
2. |  1    2    3    4    .   -.17407766 |
3. |  1    2    3    .    .            . |
4. |  1    .    3    4    .   -.17407766 |
5. |  1    2    .    .    .            . |
6. |  1    .    .    .    .            . |
7. |  .    .    .    .    .            . |
8. |  1    1    1    1    1   -1.3345954 |
9. |  5    5    5    5    5    1.4506471 |
+-------------------------------------+
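
The centered scores and z-scores above can be verified by hand (a sketch;
the commands below only redo the arithmetic and do not call -scores-).
The five valid mean scores are 3.25, 8/3, 8/3, 1, and 5, so their overall
mean is 35/12 (about 2.9167) and their sum of squared deviations from
that mean is 8.25:

. display 3.25 - 35/12
. display (3.25 - 35/12)/sqrt(8.25/(5 - 1))

These should reproduce, up to display precision, the values .33333333 and
.23210354 listed for the first observation.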

Same as above but instead of transforming the mean scores to z-scores,
transforming them to POMP scores assuming Likert scale items with anchors
ranging from 1 (minimum possible value) to 5 (maximum possible value):

. scores test=mean(v1 v3-v5), nv(3) sc(po) min(1) max(5) replace
. list, sep(0)

+------------------------------------+
| v1   v2   v3   v4   v5        test |
|------------------------------------|
1. |  1    2    3    4    5       56.25 |
2. |  1    2    3    4    .   41.666667 |
3. |  1    2    3    .    .           . |
4. |  1    .    3    4    .   41.666667 |
5. |  1    2    .    .    .           . |
6. |  1    .    .    .    .           . |
7. |  .    .    .    .    .           . |
8. |  1    1    1    1    1           0 |
9. |  5    5    5    5    5         100 |
+------------------------------------+

Same as above but instead of POMP scores transforming the mean scores to
the proportions of maximum possible scores assuming Likert scale items
with anchors ranging from 1 (minimum possible value) to 5 (maximum
possible value):

. scores test=mean(v1 v3-v5), nv(3) sc(pp) min(1) max(5) replace
. list, sep(0)

+------------------------------------+
| v1   v2   v3   v4   v5        test |
|------------------------------------|
1. |  1    2    3    4    5       .5625 |
2. |  1    2    3    4    .   .41666667 |
3. |  1    2    3    .    .           . |
4. |  1    .    3    4    .   .41666667 |
5. |  1    2    .    .    .           . |
6. |  1    .    .    .    .           . |
7. |  .    .    .    .    .           . |
8. |  1    1    1    1    1           0 |
9. |  5    5    5    5    5           1 |
+------------------------------------+

Same as above but shifting the end points by adding .01 to 0 and
subtracting .01 from 1:

. scores test=mean(v1 v3-v5), nv(3) sc(pp) min(1) max(5) end(.01) replace
. list, sep(0)

+------------------------------------+
| v1   v2   v3   v4   v5        test |
|------------------------------------|
1. |  1    2    3    4    5       .5625 |
2. |  1    2    3    4    .   .41666667 |
3. |  1    2    3    .    .           . |
4. |  1    .    3    4    .   .41666667 |
5. |  1    2    .    .    .           . |
6. |  1    .    .    .    .           . |
7. |  .    .    .    .    .           . |
8. |  1    1    1    1    1         .01 |
9. |  5    5    5    5    5         .99 |
+------------------------------------+

Same as above but shrinking the proportions toward the center 0.5:

. scores test=mean(v1 v3-v5), nv(3) sc(sp) min(1) max(5) replace
. list, sep(0)

+-----------------------------------+
| v1   v2   v3   v4   v5       test |
|-----------------------------------|
1. |  1    2    3    4    5        .55 |
2. |  1    2    3    4    .   .4333333 |
3. |  1    2    3    .    .          . |
4. |  1    .    3    4    .   .4333333 |
5. |  1    2    .    .    .          . |
6. |  1    .    .    .    .          . |
7. |  .    .    .    .    .          . |
8. |  1    1    1    1    1         .1 |
9. |  5    5    5    5    5         .9 |
+-----------------------------------+

Saved results

scores saves the following scalars in r():

r(N)        number of nonmissing observations of newvar
r(sum_w)    sum of the weights used in creating newvar
r(mean)     mean of newvar
r(Var)      variance of newvar
r(sd)       standard deviation of newvar
r(min)      minimum of newvar
r(max)      maximum of newvar
r(sum)      sum of newvar
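
A short sketch of how the saved results might be inspected, assuming the
example data and the variable test from the Examples section; return list
is the standard Stata command for displaying r() results:

. scores test=mean(v1 v3-v5), minvalid(3) score(centered) replace
. return list
. display r(mean)

With centered scores, r(mean) should be zero up to rounding error.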

References

Cohen, P., Cohen, J., Aiken, L.S., & West, S.G. (1999). The problem of
units and the circumstance for POMP. Multivariate Behavioral Research,
34, 315-346.

Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer?
Maximum-likelihood regression with beta-distributed dependent
variables. Psychological Methods, 11, 54-71.

Also see