help scores
-------------------------------------------------------------------------------

Title

    scores -- creates scores (row-wise) of a set of variables allowing the
              specification of the number of valid values required.


Syntax

    scores newvar = fcn(varlist) [if] [in] [weight] [, options]


    where the syntax of fcn is

    fcn          Description
    -------------------------------------------------------------------------
    min          minima of varlist
    max          maxima of varlist
    total        total (sum) scores of varlist
    sd           standard deviations of varlist
    mean         mean scores of varlist
    median       medians of varlist
    pctile       percentiles of varlist


    options            Description
    -------------------------------------------------------------------------
    Main
      minvalid(#)      minimum number of variables with valid values; default
                       is (1)
      score(argument)  transformation of mean scores according to argument
      minval(#)        theoretically lowest value of scale items - use with
                       transformation argument pomp, prop, or sprop; default
                       is (0)
      maxval(#)        theoretically highest value of scale items - use with
                       transformation argument pomp, prop, or sprop; default
                       is (0)
      endshift(#)      value added to lower end (0) and subtracted from upper
                       end (1) of scores transformed to proportions - use
                       with transformation argument prop; default is (0) (=
                       no shift of ends)
      center(#)        value around which shrunken scores transformed to
                       proportions are centered - use with transformation
                       argument sprop; default is (0.5)
      p(#)             the #th percentiles of variables with valid values -
                       use with function pctile; default is (50) (= medians)
      auto             determine range of values of scale items automatically
                       - use with transformation argument pomp, prop, or
                       sprop
      replace          replace existing variable by newvar

    Sub (arguments of score())
      z                z-score transformation of mean scores
      centered         centering of mean scores at the overall (group) mean
      pomp             transformation of mean scores to POMP scores (interval
                       [0,100])
      prop             transformation of mean scores to proportion of maximum
                       possible scores (interval [0,1])
      sprop            transformation of mean scores to shrunken proportion
                       of maximum possible scores (open interval ]0,1[)
    -------------------------------------------------------------------------
    by is allowed (see by);
    aweights, fweights, and iweights are allowed (see weight).


Description

    Using row functions of [D] egen, -scores- calculates scores according to
    fcn using variables listed in varlist and assigns them to the new
    variable newvar. If the number of valid values of varlist is less than
    minvalid(#), the resulting score will be set to missing.
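
    The rule described above can be sketched with the built-in row functions
    of egen. This is only an illustration of the logic for fcn = mean and a
    required minimum of 3 valid values; it is not taken from the -scores-
    source code, and the variable names m and nvalid are made up for the
    example.

```stata
* Sketch only: m and nvalid are illustrative names, not created by -scores-.
clear
input v1 v2 v3 v4 v5
1 2 3 4 5
1 2 3 . .
1 . 3 4 .
. . . . .
end

* row mean of the nonmissing items among v1, v3, v4, and v5
egen m = rowmean(v1 v3-v5)

* number of valid (nonmissing) values among the same items
egen nvalid = rownonmiss(v1 v3-v5)

* set the score to missing if fewer than 3 items are valid
replace m = . if nvalid < 3

list, sep(0)
```

    In this sketch only rows 1 and 3 keep a mean score (3.25 and 2.6666667),
    because rows 2 and 4 have fewer than three valid values.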

    If fcn is mean, the option score(argument) can be used to request a
    transformation of the scores to
    - z-scores,
    - scores centered at the overall (group) mean,
    - POMP (= percent of maximum possible) scores with 0 and 100 as minimum
      and maximum possible values (see: Cohen, P., Cohen, J., Aiken, L.S., &
      West, S.G. (1999). The problem of units and the circumstance for POMP.
      Multivariate Behavioral Research, 34, 315-346),
    - the proportion of maximum possible scores with 0 and 1 as minimum and
      maximum possible scores, or
    - the shrunken proportion of maximum possible scores with values in the
      open interval ]0,1[ (see: Smithson, M. & Verkuilen, J. (2006). A better
      lemon squeezer? Maximum-likelihood regression with beta-distributed
      dependent variables. Psychological Methods, 11, 54-71).

    The transformation arguments pomp, prop, and sprop require that you
    specify the minimum and maximum possible values of the items listed in
    varlist by using the options minval(#) and maxval(#). Alternatively, if
    all items listed in varlist have value labels defining the same range of
    values, the minimum and maximum possible values can be determined
    automatically by using the option auto.

    POMP scores are calculated by POMP = 100*(raw - min)/(max - min) with raw
    = original mean score of variables (items) with valid values, min =
    minimum possible value, and max = maximum possible value (min and max
    need not exist in the actual data) (see Cohen et al., 1999). The
    proportion of maximum possible scores is equal to POMP/100. When the
    score is used as an independent variable in nonlinear models such as
    logistic regression, the proportion of maximum possible scores is
    preferable to POMP scores because it eases the interpretation of
    exponentiated coefficients such as odds ratios. When the score is used
    as a dependent variable in beta regression models (see the SSC package
    -betafit-), the values 0 and 1 cannot be used because their logits are
    undefined. There are two remedies: (a) add a small value such as .001 to
    0 and subtract it from 1 by using the option endshift(#), or (b) shrink
    the range of values toward a certain value (usually 0.5) by transforming
    the proportion of maximum possible scores according to sprop =
    (prop*(n-1) + c)/n with prop = original proportion of maximum possible
    scores, n = number of valid cases, and c = center (see Smithson &
    Verkuilen, 2006, p. 54f). A different value for c can be specified by
    using the option center(#) (default: 0.5).
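
    The formulas of this paragraph can be worked through by hand. The sketch
    below assumes items ranging from 1 to 5 and starts from five mean scores
    typed in directly; all names (m, lo, hi, n, pomp, prop, prop_a, sprop)
    are illustrative and not part of -scores-. Remedy (a) is written so that
    only the exact end points 0 and 1 are moved.

```stata
* Sketch only: hand calculation of POMP, prop, and sprop from mean scores.
clear
input double m
3.25
2.6666667
2.6666667
1
5
end

* assumed theoretical item range (anchors 1 to 5)
local lo = 1
local hi = 5

* POMP = 100*(raw - min)/(max - min); prop = POMP/100
generate double pomp = 100*(m - `lo')/(`hi' - `lo')
generate double prop = pomp/100

* remedy (a): move only the end points 0 and 1 inward by .01
generate double prop_a = cond(prop == 0, .01, cond(prop == 1, .99, prop))

* remedy (b): shrink toward c = 0.5 with n = number of valid cases
quietly count if !missing(prop)
local n = r(N)
generate double sprop = (prop*(`n' - 1) + 0.5)/`n'

list, sep(0)
```

    For the first row this gives POMP = 56.25, prop = .5625, and sprop =
    (.5625*4 + .5)/5 = .55; the end points 0 and 1 become .1 and .9 after
    shrinking.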

    by is allowed (see by). However, by only makes a difference when using
    the score() option arguments z, centered, or sprop for obtaining
    z-scores, mean-centered scores, or shrunken proportions of maximum
    possible scores, respectively. The same is true for weighting (see 
    weight): Using weights only makes a difference to z-scores, mean-centered
    scores, or shrunken proportions.
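
    Why by matters only for these transformations can be seen from a hand
    calculation: z-scores and centered scores depend on the mean and standard
    deviation of the cases they are computed over, whereas POMP scores and
    proportions depend only on the fixed item range. The sketch below assumes
    that standardizing under by means standardizing within each by-group; the
    names g, m, gmean, gsd, c, and z are illustrative.

```stata
* Sketch only: group-wise centering and standardizing of mean scores m.
clear
input g m
1 1
1 2
1 3
2 10
2 20
2 30
end

* mean and standard deviation of m within each group g
bysort g (m): egen gmean = mean(m)
bysort g (m): egen gsd   = sd(m)

* mean-centered scores and z-scores within each group
generate c = m - gmean
generate z = (m - gmean)/gsd

list, sepby(g)
```

    Both groups end up with z-scores of -1, 0, and 1, although their raw
    scores differ by a factor of ten.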

    If newvar already exists, you can use the option replace to overwrite it
    with the newly generated variable.


Options

        +------+
    ----+ Main +-------------------------------------------------------------

    minvalid(#) specifies how many variables must have valid values for
        calculating a score. If the number of valid values is less than # the
        resulting score will be set to missing.

    score(argument) requests a transformation of the scores. This will only
        work if mean scores have been requested by mean(varlist). Five
        transformations are possible according to argument (see below).

    minval(#) specifies the theoretical minimum value of the items (# =
        lowest value possible) (only useful if percents or (shrunken)
        proportions of maximum possible scores have been requested by
        score(pomp), score(prop), or score(sprop)).

    maxval(#) specifies the theoretical maximum value of the items (# =
        highest value possible) (only useful if percents or (shrunken)
        proportions of maximum possible scores have been requested by
        score(pomp), score(prop), or score(sprop)).

    endshift(#) specifies the value to be added to 0 (lower end) and to be
        subtracted from 1 (upper end) (only useful if the proportions of
        maximum possible scores have been requested by score(prop)).

    center(#) specifies the value around which proportions of maximum
        possible scores are to be shrunken (default: 0.5) (only useful if the
        shrunken proportions of possible scores have been requested by
        score(sprop)).

    p(#) specifies the #th percentile scores to be calculated (default: 50)
        (only useful if percentiles have been requested by pctile(varlist)).
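
    Because -scores- is described as building on the row functions of egen,
    the effect of p(#) can be explored with egen directly. The following
    sketch uses the built-in function rowpctile(); the names q1 and med are
    illustrative, and no minimum number of valid values is enforced here.

```stata
* Sketch only: row-wise 25th and 50th percentiles with egen.
clear
input v1 v2 v3 v4 v5
1 2 3 4 5
1 2 3 4 .
end

* 25th percentile of the nonmissing values of v1, v3, v4, and v5
egen q1 = rowpctile(v1 v3-v5), p(25)

* p(50) is the default of rowpctile() and yields the row median
egen med = rowpctile(v1 v3-v5)

list, sep(0)
```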

    auto requests that the range of values of the items (lowest and highest
        possible values) be determined automatically. auto requires that all
        items listed in varlist have value labels defining the same range of
        values. This option overrides values specified using minval(#) and
        maxval(#) (only useful if percents or (shrunken) proportions of
        maximum possible scores have been requested by score(pomp),
        score(prop), or score(sprop)).
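
    How a range can be read from value labels is sketched below with built-in
    commands only. This is an assumption about the general idea, not a
    description of how -scores- implements auto; the label name agree and the
    local macro lbl are made up for the example.

```stata
* Sketch only: read the lowest and highest labeled value of an item.
clear
input v1 v2
1 5
3 2
end

* attach the same value label (codes 1 to 5) to both items
label define agree 1 "strongly disagree" 2 "disagree" 3 "neutral" ///
    4 "agree" 5 "strongly agree"
label values v1 v2 agree

* name of the value label attached to v1
local lbl : value label v1

* label list stores the smallest and largest labeled value in r()
quietly label list `lbl'
display "min = " r(min) ", max = " r(max)
```

    Note that the labeled range (1 to 5) is found even though the value 4
    does not occur in the data, which is the point of using the theoretical
    rather than the observed range.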

    replace requests that newvar replace an existing variable of the same
        name.

        +--------------------+
    ----+ Sub (option score) +-----------------------------------------------

    score(z) requests a transformation of the mean scores to z-scores
        (resulting mean = 0, sd = 1)

    score(centered) requests centering of the mean scores at the overall
        (group) mean (resulting mean = 0)

    score(pomp) requests a transformation of the mean scores to POMP (percent
        of maximum possible) scores with 0 and 100 as minimum and maximum
        possible values. Note that score(pomp) also requires the minimum and
        maximum possible values of the items listed in varlist, specified by
        using the options minval(#) and maxval(#) or determined by the option
        auto.

    score(prop) requests a transformation of the mean scores to the
        proportion of maximum possible scores with 0 and 1 as minimum and
        maximum possible values (0 and 1 can be shifted to a larger and a
        smaller value, respectively, by using the option endshift(#)). Note
        that score(prop) also requires the minimum and maximum possible
        values of the items listed in varlist, specified by using the options
        minval(#) and maxval(#) or determined by the option auto.

    score(sprop) requests a transformation of the mean scores to the shrunken
        proportion of maximum possible scores in the open interval ]0,1[ (the
        proportions are shrunken toward a value that can be set by using the
        option center(#); the default is 0.5). Note that score(sprop) also
        requires the minimum and maximum possible values of the items listed
        in varlist, specified by using the options minval(#) and maxval(#) or
        determined by the option auto.


Examples

    The following data set shows how -scores- creates scores depending on the
    number of valid values required by the user:

. clear all
. input v1-v5

            v1         v2         v3         v4         v5
  1. 1 2 3 4 5
  2. 1 2 3 4 .
  3. 1 2 3 . .
  4. 1 . 3 4 .
  5. 1 2 . . .
  6. 1 . . . .
  7. . . . . .
  8. 1 1 1 1 1
  9. 5 5 5 5 5
 10. end


    Generate variable test containing the minima of the four variables v1,
    v3, v4, and v5 (by default, the value of test is valid if at least one
    variable of varlist has a valid value):

. scores test=min(v1 v3-v5)
. list, sep(0)

     +-------------------------------+
     | v1   v2   v3   v4   v5   test |
     |-------------------------------|
  1. |  1    2    3    4    5      1 |
  2. |  1    2    3    4    .      1 |
  3. |  1    2    3    .    .      1 |
  4. |  1    .    3    4    .      1 |
  5. |  1    2    .    .    .      1 |
  6. |  1    .    .    .    .      1 |
  7. |  .    .    .    .    .      . |
  8. |  1    1    1    1    1      1 |
  9. |  5    5    5    5    5      5 |
     +-------------------------------+


    Replace variable test by test containing the minima of variables v1, v3,
    v4, and v5. Specify minvalid(3) so that test has valid values only if at
    least three values of the four variables of varlist are valid (or
    equivalently: at most 4-3=1 value may be missing):

. scores test=min(v1 v3-v5), nv(3) replace
. list, sep(0)

     +-------------------------------+
     | v1   v2   v3   v4   v5   test |
     |-------------------------------|
  1. |  1    2    3    4    5      1 |
  2. |  1    2    3    4    .      1 |
  3. |  1    2    3    .    .      . |
  4. |  1    .    3    4    .      1 |
  5. |  1    2    .    .    .      . |
  6. |  1    .    .    .    .      . |
  7. |  .    .    .    .    .      . |
  8. |  1    1    1    1    1      1 |
  9. |  5    5    5    5    5      5 |
     +-------------------------------+


    Same as above but using the maxima instead of the minima:

. scores test=max(v1 v3-v5), nv(3) replace
. list, sep(0)

     +-------------------------------+
     | v1   v2   v3   v4   v5   test |
     |-------------------------------|
  1. |  1    2    3    4    5      5 |
  2. |  1    2    3    4    .      4 |
  3. |  1    2    3    .    .      . |
  4. |  1    .    3    4    .      4 |
  5. |  1    2    .    .    .      . |
  6. |  1    .    .    .    .      . |
  7. |  .    .    .    .    .      . |
  8. |  1    1    1    1    1      1 |
  9. |  5    5    5    5    5      5 |
     +-------------------------------+


    Same as above but creating the total (sum) scores instead of the maxima:

. scores test=total(v1 v3-v5), nv(3) replace
. list, sep(0)

     +-------------------------------+
     | v1   v2   v3   v4   v5   test |
     |-------------------------------|
  1. |  1    2    3    4    5     13 |
  2. |  1    2    3    4    .      8 |
  3. |  1    2    3    .    .      . |
  4. |  1    .    3    4    .      8 |
  5. |  1    2    .    .    .      . |
  6. |  1    .    .    .    .      . |
  7. |  .    .    .    .    .      . |
  8. |  1    1    1    1    1      4 |
  9. |  5    5    5    5    5     20 |
     +-------------------------------+


    Same as above but creating the standard deviations (sd) instead of total
    scores:

. scores test=sd(v1 v3-v5), nv(3) replace
. list, sep(0)

     +------------------------------------+
     | v1   v2   v3   v4   v5        test |
     |------------------------------------|
  1. |  1    2    3    4    5   1.7078251 |
  2. |  1    2    3    4    .   1.5275252 |
  3. |  1    2    3    .    .           . |
  4. |  1    .    3    4    .   1.5275252 |
  5. |  1    2    .    .    .           . |
  6. |  1    .    .    .    .           . |
  7. |  .    .    .    .    .           . |
  8. |  1    1    1    1    1           0 |
  9. |  5    5    5    5    5           0 |
     +------------------------------------+


    Same as above but creating median scores instead of the standard
    deviations:

. scores test=median(v1 v3-v5), nv(3) replace
. list, sep(0)

     +-------------------------------+
     | v1   v2   v3   v4   v5   test |
     |-------------------------------|
  1. |  1    2    3    4    5    3.5 |
  2. |  1    2    3    4    .      3 |
  3. |  1    2    3    .    .      . |
  4. |  1    .    3    4    .      3 |
  5. |  1    2    .    .    .      . |
  6. |  1    .    .    .    .      . |
  7. |  .    .    .    .    .      . |
  8. |  1    1    1    1    1      1 |
  9. |  5    5    5    5    5      5 |
     +-------------------------------+


    Same as above but creating 1st quartiles instead of the medians:

. scores test=pctile(v1 v3-v5), nv(3) p(25) replace
. list, sep(0)

     +-------------------------------+
     | v1   v2   v3   v4   v5   test |
     |-------------------------------|
  1. |  1    2    3    4    5      2 |
  2. |  1    2    3    4    .      1 |
  3. |  1    2    3    .    .      . |
  4. |  1    .    3    4    .      1 |
  5. |  1    2    .    .    .      . |
  6. |  1    .    .    .    .      . |
  7. |  .    .    .    .    .      . |
  8. |  1    1    1    1    1      1 |
  9. |  5    5    5    5    5      5 |
     +-------------------------------+


    Same as above but creating mean scores instead of the 1st quartiles:

. scores test=mean(v1 v3-v5), nv(3) replace
. list, sep(0)

     +------------------------------------+
     | v1   v2   v3   v4   v5        test |
     |------------------------------------|
  1. |  1    2    3    4    5        3.25 |
  2. |  1    2    3    4    .   2.6666667 |
  3. |  1    2    3    .    .           . |
  4. |  1    .    3    4    .   2.6666667 |
  5. |  1    2    .    .    .           . |
  6. |  1    .    .    .    .           . |
  7. |  .    .    .    .    .           . |
  8. |  1    1    1    1    1           1 |
  9. |  5    5    5    5    5           5 |
     +------------------------------------+


    Same as above but creating mean scores centered at the overall mean:

. scores test=mean(v1 v3-v5), nv(3) sc(c) replace
. list, sep(0)

     +-------------------------------------+
     | v1   v2   v3   v4   v5         test |
     |-------------------------------------|
  1. |  1    2    3    4    5    .33333333 |
  2. |  1    2    3    4    .         -.25 |
  3. |  1    2    3    .    .            . |
  4. |  1    .    3    4    .         -.25 |
  5. |  1    2    .    .    .            . |
  6. |  1    .    .    .    .            . |
  7. |  .    .    .    .    .            . |
  8. |  1    1    1    1    1   -1.9166667 |
  9. |  5    5    5    5    5    2.0833333 |
     +-------------------------------------+


    Same as above but creating z-scores instead of centered mean scores:

. scores test=mean(v1 v3-v5), nv(3) sc(z) replace
. list, sep(0)

     +-------------------------------------+
     | v1   v2   v3   v4   v5         test |
     |-------------------------------------|
  1. |  1    2    3    4    5    .23210354 |
  2. |  1    2    3    4    .   -.17407766 |
  3. |  1    2    3    .    .            . |
  4. |  1    .    3    4    .   -.17407766 |
  5. |  1    2    .    .    .            . |
  6. |  1    .    .    .    .            . |
  7. |  .    .    .    .    .            . |
  8. |  1    1    1    1    1   -1.3345954 |
  9. |  5    5    5    5    5    1.4506471 |
     +-------------------------------------+


    Same as above but instead of transforming the mean scores to z-scores,
    transforming them to POMP scores assuming Likert scale items with anchors
    ranging from 1 (minimum possible value) to 5 (maximum possible value):

. scores test=mean(v1 v3-v5), nv(3) sc(po) min(1) max(5) replace
. list, sep(0)

     +------------------------------------+
     | v1   v2   v3   v4   v5        test |
     |------------------------------------|
  1. |  1    2    3    4    5       56.25 |
  2. |  1    2    3    4    .   41.666667 |
  3. |  1    2    3    .    .           . |
  4. |  1    .    3    4    .   41.666667 |
  5. |  1    2    .    .    .           . |
  6. |  1    .    .    .    .           . |
  7. |  .    .    .    .    .           . |
  8. |  1    1    1    1    1           0 |
  9. |  5    5    5    5    5         100 |
     +------------------------------------+


    Same as above but instead of POMP scores transforming the mean scores to
    the proportions of maximum possible scores assuming Likert scale items
    with anchors ranging from 1 (minimum possible value) to 5 (maximum
    possible value):

. scores test=mean(v1 v3-v5), nv(3) sc(pp) min(1) max(5) replace
. list, sep(0)

     +------------------------------------+
     | v1   v2   v3   v4   v5        test |
     |------------------------------------|
  1. |  1    2    3    4    5       .5625 |
  2. |  1    2    3    4    .   .41666667 |
  3. |  1    2    3    .    .           . |
  4. |  1    .    3    4    .   .41666667 |
  5. |  1    2    .    .    .           . |
  6. |  1    .    .    .    .           . |
  7. |  .    .    .    .    .           . |
  8. |  1    1    1    1    1           0 |
  9. |  5    5    5    5    5           1 |
     +------------------------------------+


    Same as above but shifting the end points by adding .01 to 0 and
    subtracting .01 from 1:

. scores test=mean(v1 v3-v5), nv(3) sc(pp) min(1) max(5) end(.01) replace
. list, sep(0)

     +------------------------------------+
     | v1   v2   v3   v4   v5        test |
     |------------------------------------|
  1. |  1    2    3    4    5       .5625 |
  2. |  1    2    3    4    .   .41666667 |
  3. |  1    2    3    .    .           . |
  4. |  1    .    3    4    .   .41666667 |
  5. |  1    2    .    .    .           . |
  6. |  1    .    .    .    .           . |
  7. |  .    .    .    .    .           . |
  8. |  1    1    1    1    1         .01 |
  9. |  5    5    5    5    5         .99 |
     +------------------------------------+


    Same as above but shrinking the proportions toward the center 0.5:

. scores test=mean(v1 v3-v5), nv(3) sc(sp) min(1) max(5) replace
. list, sep(0)

     +-----------------------------------+
     | v1   v2   v3   v4   v5       test |
     |-----------------------------------|
  1. |  1    2    3    4    5        .55 |
  2. |  1    2    3    4    .   .4333333 |
  3. |  1    2    3    .    .          . |
  4. |  1    .    3    4    .   .4333333 |
  5. |  1    2    .    .    .          . |
  6. |  1    .    .    .    .          . |
  7. |  .    .    .    .    .          . |
  8. |  1    1    1    1    1         .1 |
  9. |  5    5    5    5    5         .9 |
     +-----------------------------------+


Saved results

    scores saves the following scalars in r():

      r(N)        number of nonmissing observations of newvar
      r(sum_w)    sum of the weights used in creating newvar
      r(mean)     mean of newvar
      r(Var)      variance of newvar
      r(sd)       standard deviation of newvar
      r(min)      minimum of newvar
      r(max)      maximum of newvar
      r(sum)      sum of newvar


References

    Cohen, P., Cohen, J., Aiken, L.S., & West, S.G. (1999). The problem of
       units and the circumstance for POMP. Multivariate Behavioral Research,
       34, 315-346.

    Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer?
       Maximum-likelihood regression with beta-distributed dependent
       variables. Psychological Methods, 11, 54-71.

Also see

    SSC package betafit

Acknowledgements

    Thanks to Kit Baum for helpful advice and to Alan Acock for suggesting
    the calculation of (shrunken) proportions of maximum possible scores.

Author

    Dirk Enzmann
    http://www2.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
    dirk.enzmann@uni-hamburg.de