-------------------------------------------------------------------------------
help for byvar                                                  Patrick Royston
-------------------------------------------------------------------------------

Repeat command by variable(s)


        byvar varlist [if exp] [in range] [ , b(coeflist) e(elist) generate
                 missing nolabel pause r(rlist) return se(selist) tabulate
                 unique ] : stata_cmd[@stata_cmd ...]


Description

    byvar repeats stata_cmd (and each additional stata_cmd following each @,
    if present) for each distinct combination of values in varlist.  The
    latter may contain string variables.

    For details of storage of results, see the generate and return options.


Options

    b(coeflist) stores the regression coefficients for variables named in
        coeflist. Individual items may be labelled as with the e() option.

    e(elist) saves the E-class estimates e() named in elist which arise from
        the final stata_cmd in stata_cmd[@stata_cmd ...]. The E-class
        estimates must evaluate to numbers; strings are not allowed. The
        estimate names must be separated by space(s). You may append a label,
        preceded by an = sign, to each estimate name; this will be used to
        label the corresponding column of output (if the tabulate option is
        used) or variable (if the generate option is used). Labels longer
        than 14 characters are truncated to 14. If a label includes spaces,
        it must be enclosed within quotes (""). Commas, colons and equals
        signs are not allowed anywhere within the label.
        Example of e() option with such labelling: e(rmse="RMS error" F="F
        statistic" N).

    generate creates new variable(s) corresponding to the quantities named in
        the e(), r(), b() and se() options. The names of the new variables
        begin with the letter E, R, B or S, respectively, followed by up to
        six characters taken from the name of the e(), r(), b() or se()
        quantity or variable.  The final character is _ (or sometimes, to
        avoid overwriting, a letter).  For example, e(rmse N) generate would
        create variables called Ermse_ and EN_, containing the values of
        e(rmse) and e(N), respectively, as left behind by each execution of
        the final stata_cmd in stata_cmd[@stata_cmd ...]. Results are stored
        according to the combinations of values of the by-variables in
        varlist.

    missing causes stata_cmd(s) to be executed even when a combination of
        values of any of the variables in varlist involves a missing value.
        The idea is the same as for the missing option in Stata's tabulate
        command.

    nolabel suppresses display of value labels for categorical variables for
        which value labels are defined. Numeric values are displayed instead.

    pause pauses output after each execution of stata_cmd. Useful for graphs.

    r(rlist) saves the R-class results r() named in rlist which arise from
        the final stata_cmd in stata_cmd[@stata_cmd ...]. The results must
        evaluate to numbers; strings are not allowed. Individual items may be
        labelled as with the e() option. Example: r(W="W statistic"
        p=P-value).

    return returns the quantities named in the e(), r(), b() and se() options
        in functions of the form r(E|R|B|S#1gp#2). Here, #1 indexes the items
        in the e(), r(), b() and se() options; gp#2 indexes the subgroups
        defined by the combinations of values in varlist. For example, e(rmse
        N) return would return r(E1gp1), r(E1gp2), ... containing e(rmse) for
        subgroups 1, 2, ...  and r(E2gp1), r(E2gp2), ... containing e(N) for
        subgroups 1, 2, ... .

    se(selist) stores the standard errors of regression coefficients for
        variables named in selist.  Individual items may be labelled as with
        the e() option.

    tabulate displays the results in tabular form, suppressing the output (if
        any) from the final stata_cmd.

    unique is relevant only with generate. It specifies that results for each
        unique combination of values defined by varlist are stored only in
        the first position in the new variable(s). Values in other positions
        are set to missing. See also store().

    stata_cmd is any Stata command and its options.


Remarks

    Note that byvar acts conservatively when creating new variables with the
    generate option. It won't wipe out existing variables. You may therefore
    find your workspace becomes cluttered by variables beginning with the
    letters E, R, B or S. With caution, you can type, for example, drop E* R*
    to eliminate them in one action.
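
    As a cautious sketch of that clean-up (assuming the variables Ermse_ and
    Bweight_ were created by an earlier byvar ... generate run, as in the
    Examples, and that no other variables in memory begin with E or B), you
    could inspect the matching variables before dropping them:

    . describe E* B*
    . drop E* B*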

    Note that byvar is now declared sortpreserve: although byvar changes the
    sort order of the data while it runs, Stata restores the original sort
    order when the program concludes. See program.


Examples

    To produce a Normal Q-Q plot of weight for each non-missing value of
    rep78:

    . sysuse auto
    . byvar rep78, pause: qnorm weight
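
    To repeat a command for each distinct combination of values of two
    variables (a sketch; summarize mpg is just an illustrative choice of
    stata_cmd), for example each non-missing combination of foreign and
    rep78:

    . byvar foreign rep78: summarize mpg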

    To carry out Shapiro-Wilk tests on mpg for each of the 6 values of rep78
    including missing, store the W-statistics (r(W)) in functions
    r(R1gp1),..., r(R1gp6) and their P-values (r(p)) in functions r(R2gp1),
    ..., r(R2gp6), and display the results in tabular form, with columns
    headed W statistic and P-value:

    . byvar rep78, r(W="W statistic" p=P-value) return tabulate missing:
        swilk mpg
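
    After a run with the return option such as the one above, the stored
    values can be inspected with Stata's return list and used in expressions
    (a sketch, assuming no other r-class command has been run in between):

    . return list
    . display "W for subgroup 1 = " r(R1gp1)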

    To create two new variables, Ermse_ containing e(rmse), i.e. the root
    mean squared error of the regression, and Bweight_ containing the
    estimated coefficient on weight from regressing mpg on weight, for each
    of the two values of foreign:

    . byvar foreign, e(rmse) b(weight) generate: regress mpg weight
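
    To tabulate, rather than store, the coefficient on weight and its
    standard error for each value of foreign (a sketch; the column labels
    Slope and "SE of slope" are illustrative choices, kept within the
    14-character limit):

    . byvar foreign, b(weight=Slope) se(weight="SE of slope") tabulate:
        regress mpg weight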

    To run a logistic regression of foreign on mpg for levels 3, 4 and 5 of
    rep78, and calculate and report a Hosmer-Lemeshow chi-square statistic
    based on 5 groups:

    . byvar rep78 if rep78>=3, tabulate r(chi2):  logit foreign mpg @ estat
        gof, groups(5)

    To do the same, but save the chi-square values to a new variable called
    Rchi2_, storing only the unique result for each value of rep78:

    . byvar rep78 if rep78>=3, generate r(chi2) unique:  logit foreign mpg @
        estat gof, groups(5)
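
    Because unique leaves the result in only one observation per subgroup,
    the saved values can then be listed one row per value of rep78 (a sketch,
    assuming the previous command created Rchi2_ and that each statistic is
    non-missing):

    . list rep78 Rchi2_ if Rchi2_ < .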


Author

    Patrick Royston, MRC Clinical Trials Unit, London.
    patrick.royston@ctu.mrc.ac.uk

Also see