-------------------------------------------------------------------------------
help for byvar                                                  Patrick Royston
-------------------------------------------------------------------------------

Repeat command by variable(s)

byvar varlist [if exp] [in range] [, b(coeflist) e(elist) generate missing
      nolabel pause r(rlist) return se(selist) tabulate unique ]:
      stata_cmd [@ stata_cmd ...]

Description

byvar repeats stata_cmd (and each additional stata_cmd following each @, if present) for each distinct combination of values in varlist. The latter may contain string variables.

For details of storage of results, see the generate and return options.

Options

b(coeflist) stores the regression coefficients for variables named in coeflist. Individual items may be labelled as with the e() option.

e(elist) saves the E-class estimates e() named in elist which arise from the final stata_cmd in stata_cmd [@ stata_cmd ...]. The E-class estimates must evaluate to numbers; strings are not allowed. The estimate names must be separated by space(s). You may append a label, preceded by an = sign, to each estimate name; this will be used to label the corresponding column of output (if the tabulate option is used) or variable (if the generate option is used). Labels longer than 14 characters are truncated to 14. If spaces are to be included, the label must be enclosed within quotes (""). Commas, colons or equals signs are not allowed anywhere within the label. Example of the e() option with such labelling: e(rmse="RMS error" F="Fstatistic" N).

generate creates new variable(s) corresponding to the quantities named in the e(), r(), b() and se() options. The names of the new variables begin with the letter E, R, B or S, respectively, followed by up to six characters which represent the e(), r(), b() or se() quantity or variable name. The final character is _ (or sometimes, to avoid overwriting, a letter). For example, e(rmse N) generate would create variables called Ermse_ and EN_, containing the values of e(rmse) and e(N), respectively, as left behind by each execution of the final stata_cmd in stata_cmd [@ stata_cmd ...]. Results are stored according to the combinations of values of the by-variables in varlist.

missing causes stata_cmd(s) to be executed even when a combination of values of any of the variables in varlist involves a missing value. The idea is the same as for the missing option in Stata's tabulate command.

nolabel suppresses display of score labels for categorical variables for which score labels are defined. Numeric values are used instead.

pause pauses output after each execution of stata_cmd. Useful for graphs.

r(rlist) saves the R-class results r() named in rlist which arise from the final stata_cmd in stata_cmd [@ stata_cmd ...]. The results must evaluate to numbers; strings are not allowed. Individual items may be labelled as with the e() option. Example: r(W="W statistic" p=P-value).

return returns the quantities named in the e(), r(), b() and se() options in functions of the form r(E|R|B|S#1gp#2). Here, #1 indexes the items in the e(), r(), b() and se() options; gp#2 indexes the subgroups defined by the combinations of values in varlist. For example, e(rmse N) return would return r(E1gp1), r(E1gp2), ... containing e(rmse) for subgroups 1, 2, ... and r(E2gp1), r(E2gp2), ... containing e(N) for subgroups 1, 2, ... .

se(selist) stores the standard errors of regression coefficients for variables named in selist. Individual items may be labelled as with the e() option.

tabulate displays the results in tabular form, suppressing the output (if any) from the final stata_cmd.

unique is relevant only with generate. It specifies that results for each unique combination of values defined by varlist are stored only in the first position in the new variable(s). Values in other positions are set to missing. See also store().

stata_cmd is any Stata command and its options.

Remarks

Note that byvar acts conservatively when creating new variables with the generate option. It won't wipe out existing variables. You may therefore find your workspace becomes cluttered by variables beginning with the letters E, R, B or S. With caution, you can type, for example, drop E* R* to eliminate them in one action.

Note that byvar now has sortpreserve: byvar may change the sort order of the data while it runs, and Stata restores the original sort order upon the program's conclusion. See program.

Examples

To produce a Normal Q-Q plot of weight for each non-missing value of rep78:

    . sysuse auto
    . byvar rep78, pause: qnorm weight

To carry out Shapiro-Wilk tests on mpg for each of the 6 values of rep78 including missing, store the W-statistics (r(W)) in functions r(R1gp1), ..., r(R1gp6) and their P-values (r(p)) in functions r(R2gp1), ..., r(R2gp6), and display the results in tabular form, with columns headed W statistic and P-value:

    . byvar rep78, r(W="W statistic" p=P-value) return tabulate missing: swilk mpg

To create two new variables: Ermse_ containing e(rmse), i.e. the root mean squared error of the regression, for each of the two values of foreign, and Bweight_ containing the estimated regression coefficients for regressing mpg on weight:

    . byvar foreign, e(rmse) b(weight) generate: regress mpg weight

To run a logistic regression of foreign on mpg for levels 3, 4 and 5 of rep78, and calculate and report a Hosmer-Lemeshow chi-square statistic based on 5 groups:

    . byvar rep78 if rep78>=3, tabulate r(chi2): logit foreign mpg @ estat gof, group(5)

To do the same, but save the chi-square values to a new variable called Rchi2_, storing only the unique result for each value of rep78:

    . byvar rep78 if rep78>=3, generate r(chi2) unique: logit foreign mpg @ estat gof, group(5)
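
To retrieve results saved by the return option once byvar has finished, the r(E|R|B|S#1gp#2) functions described under Options can be used like any other returned result. The lines below are a minimal sketch rather than a tested recipe. They rely on summarize leaving r(mean) behind, and they assume that subgroups are numbered in the sorted order of the values of foreign, so that gp1 is foreign==0 and gp2 is foreign==1; that ordering is an assumption, not something this help file states.

```stata
. sysuse auto
* Save the mean of mpg for each value of foreign as r(R1gp1), r(R1gp2)
. byvar foreign, r(mean) return: summarize mpg
* Show everything currently held in r()
. return list
* Assumed ordering: gp1 is foreign==0, gp2 is foreign==1
. display r(R1gp1)
. display r(R1gp2)
```

The returned values last only until the next r-class command is run, so copy them to scalars or locals first if they are needed later.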

Author

Patrick Royston, MRC Clinical Trials Unit, London. patrick.royston@ctu.mrc.ac.uk

Also see