-------------------------------------------------------------------------------
help for byvar                                                  Patrick Royston
-------------------------------------------------------------------------------

Repeat command by variable(s)

byvar varlist [if exp] [in range] [ , b(coeflist) e(elist) generate missing nolabel pause r(rlist) return se(selist) tabulate unique ] : stata_cmd[@stata_cmd ...]

Description

byvar repeats stata_cmd (and each additional stata_cmd following each @, if present) for each distinct combination of values in varlist. The latter may contain string variables.

For details of storage of results, see the generate and return options.

Options

b(coeflist) stores the regression coefficients for variables named in coeflist. Individual items may be labelled as with the e() option.

e(elist) saves the E-class estimates e() named in elist which arise from the final stata_cmd in stata_cmd[@stata_cmd ...]. The E-class estimates must evaluate to numbers; strings are not allowed. The estimate names must be separated by space(s). You may append a label, preceded by an = sign, to each estimate name; this will be used to label the corresponding column of output (if the tabulate option is used) or variable (if the generate option is used). Labels longer than 14 characters are truncated to 14. If spaces are to be included, the label must be enclosed within quotes (""). Commas, colons or equals signs are not allowed anywhere within the label. Example of e() option with such labelling: e(rmse="RMS error" F="F statistic" N).
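
As an illustrative sketch only, the labelled e() example above could be placed in a complete command as follows. The choice of the auto dataset, the by-variable foreign and the regression of mpg on weight are assumptions made for illustration; byvar is assumed to be installed already.

. * illustration only: dataset, by-variable and model are assumed
. sysuse auto, clear
. byvar foreign, e(rmse="RMS error" F="F statistic" N) tabulate: regress mpg weight

Here N carries no label, so its column should be headed by the estimate name itself.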

generate creates new variable(s) corresponding to the quantities named in the e(), r(), b() and se() options. The names of the new variables begin with letter E, R, B and S, respectively, followed by up to six characters which represent the e(), r(), b() and se() quantity or variable name. The final character is _ (or sometimes, to avoid overwriting, a letter). For example, e(rmse N) generate would create variables called Ermse_ and EN_, containing the values of e(rmse) and e(N), respectively, as left behind by each execution of the final stata_cmd in stata_cmd[@stata_cmd ...]. Results are stored according to the combinations of values of the by-variables in varlist.

missing causes stata_cmd(s) to be executed even when a combination of values of any of the variables in varlist involves a missing value. The idea is the same as for the missing option in Stata's tabulate command.

nolabel suppresses display of value labels for categorical variables for which value labels are defined. Numeric values are used instead.

pause pauses output after each execution of stata_cmd. Useful for graphs.

r(rlist) saves the R-class results r() named in rlist which arise from the final stata_cmd in stata_cmd[@stata_cmd ...]. The results must evaluate to numbers; strings are not allowed. Individual items may be labelled as with the e() option. Example: r(W="W statistic" p=P-value).

return returns the quantities named in the e(), r(), b() and se() options in functions of the form r(E|R|B|S#1gp#2). Here, #1 indexes the items in the e(), r(), b() and se() options; gp#2 indexes the subgroups defined by the combinations of values in varlist. For example, e(rmse N) return would return r(E1gp1), r(E1gp2), ... containing e(rmse) for subgroups 1, 2, ... and r(E2gp1), r(E2gp2), ... containing e(N) for subgroups 1, 2, ... .
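
A minimal sketch of retrieving the returned values, following the naming scheme just described. The auto dataset and the regression are assumptions chosen for illustration, and which value of foreign corresponds to subgroup 1 is assumed to follow the sorted order of the by-variable.

. * illustration only: dataset and model are assumed
. sysuse auto, clear
. byvar foreign, e(rmse N) return: regress mpg weight
. * list everything byvar left behind in r()
. return list
. * e(rmse) for subgroup 1 and e(N) for subgroup 2
. display r(E1gp1)
. display r(E2gp2)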

se(selist) stores the standard errors of regression coefficients for variables named in selist. Individual items may be labelled as with the e() option.

tabulate displays the results in tabular form, suppressing the output (if any) from the final stata_cmd.

unique is relevant only with generate. It specifies that results for each unique combination of values defined by varlist are stored only in the first position in the new variable(s). Values in other positions are set to missing. See also store().

stata_cmd is any Stata command and its options.

Remarks

Note that byvar acts conservatively when creating new variables with the generate option. It won't wipe out existing variables. You may therefore find your workspace becomes cluttered by variables beginning with the letters E, R, B or S. With caution, you can type, for example, drop E* R* to eliminate them in one action.
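
One cautious way to tidy up, sketched under the assumption that the auto dataset is in memory and that no variables named Ermse_ or EN_ existed beforehand (otherwise the final character of the new names may differ), is to inspect the candidates first and then drop them by name rather than by wildcard:

. * illustration only: dataset and model are assumed
. sysuse auto, clear
. byvar foreign, e(rmse N) generate: regress mpg weight
. * check what a wildcard would match before deleting anything
. describe E*
. drop Ermse_ EN_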

Note that byvar is now declared with sortpreserve: byvar may change the sort order of the data while it runs, and Stata restores the original sort order when the program concludes. See program.

Examples

To produce a Normal Q-Q plot of weight for each non-missing value of rep78:

. sysuse auto
. byvar rep78, pause: qnorm weight

To carry out Shapiro-Wilk tests on mpg for each of the 6 values of rep78 including missing, store the W-statistics (r(W)) in functions r(R1gp1),..., r(R1gp6) and their P-values (r(p)) in functions r(R2gp1), ..., r(R2gp6), and display the results in tabular form, with columns headed W statistic and P-value:

. byvar rep78, r(W="W statistic" p=P-value) return tabulate missing: swilk mpg

To create two new variables for each of the two values of foreign: Ermse_ containing e(rmse), i.e. the root mean squared error of the regression, and Bweight_ containing the estimated coefficient of weight from regressing mpg on weight:

. byvar foreign, e(rmse) b(weight) generate: regress mpg weight

To run a logistic regression of foreign on mpg for levels 3, 4 and 5 of rep78, and calculate and report a Hosmer-Lemeshow chi-square statistic based on 5 groups:

. byvar rep78 if rep78>=3, tabulate r(chi2): logit foreign mpg @ estat gof, groups(5)

To do the same, but save the chi-square values to a new variable called Rchi2_, storing only the unique result for each value of rep78:

. byvar rep78 if rep78>=3, generate r(chi2) unique: logit foreign mpg @ estat gof, groups(5)

Author

Patrick Royston, MRC Clinical Trials Unit, London. patrick.royston@ctu.mrc.ac.uk
