-------------------------------------------------------------------------------
help for hlm, mkmdm, mdmset, hlm2, hlm3, run                    sean f. reardon
-------------------------------------------------------------------------------

Writing and Running HLM v.6 files from within Stata

hlm mkmdm using <a new file> [if exp] [in range] , id2(varname) [type(string) id3(varname) l1(varlist) l2(varlist) l3(varlist) miss(now or analysis) nodta nomdmt nosts replace norun]

hlm mdmset existing .mdm filename

hlm mdmset _drop

hlm hlm2 depvar model, cmdfile(newfile) [mdmfile(existing .mdm filename) outfile(newfile) run replace linear bernouli poisson binomial multinomial ordinal ncat(integer) evar(varname) fml prompt stop od laplace l6 lnum(integer) macit(integer) micit(integer) macstop(#) micstop(#) wgt1(varname) wgt2(varname) pwgt(varname) fsig(real) test(test_specification) deviance(real integer) store(lin|us|pa|eml|l6) robust noeq long res1 rv1(varlist) res2 rv2(varlist) title(string) graph(newfile)]

hlm hlm3 depvar model, cmdfile(newfile) [mdmfile(existing .mdm filename) outfile(newfile) run replace linear bernouli poisson binomial multinomial ordinal ncat(integer) evar(varname) fml prompt stop od laplace lnum(integer) macit(integer) micit(integer) macstop(#) micstop(#) wgt1(varname) wgt2(varname) wgt3(varname) pwgt(varname) fsig(real) test(test_specification) deviance(real integer) store(lin|us|pa|eml) robust noeq long res1 rv1(varlist) res2 rv2(varlist) res3 rv3(varlist) title(string) graph(newfile)]

hlm run existing .hlm filename [, hlm2 hlm3 mdm(mdmfile) store(lin|us|pa|eml|l6) robust noview]

In the hlm2 command, model is an expression of the form

[int[(L2equation)]] [L1var[(L2equation)] [L1var[(L2equation)] ...]] [rand]

where L2equation is an expression of the form

[int] [L2 varlist] [rand]

If L2equation is not specified, it is assumed to be (int). Any level-1 variable may be group-mean centered by preceding the variable name with a #. Any level-1 or level-2 variable may be grand-mean centered by preceding the variable name with a %.

In the hlm3 command, model is an expression of the form

[int[(L2equation)]] [L1var[(L2equation)] [L1var[(L2equation)] ...]] [rand]

where L2equation is an expression of the form

[int[(L3equation)]] [L2var[(L3equation)] [L2var[(L3equation)] ...]] [rand]

If L2equation is not specified, it is assumed to be (int(int)).

L3equation is an expression of the form

[int] [L3 varlist] [rand]

If L3equation is not specified, it is assumed to be (int). Any level-1 or level-2 variable may be group-mean centered by preceding the variable name with a #. Any level-1, level-2, or level-3 variable may be grand-mean centered by preceding the variable name with a %.

See the Examples section below for examples of the model expression.

Note: HLM requires that the level-1 variables in the model be listed in the order that they appear in the .mdm file. If int is included in the model, it must be listed first; likewise, if rand is included in the model, it must be listed last (rand can be excluded from the level-1 model, since HLM automatically assumes it is there). Likewise, within each level-2 and level-3 equation, the variables must be listed in the order they appear in the .mdm file.
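
As a sketch of how these rules combine, the following lines show a group-mean centered level-1 predictor in hlm2 and a three-level random-intercept model in hlm3. They assume the variables used in the Examples section below (score, age, private, schoolid). The level-3 id distid and the command file names model03 and model04 are hypothetical.

    * two-level model: age is group-mean centered (# prefix) and has a fixed slope
    hlm hlm2 score int(int private rand) #age(int) rand, cmd(model03) run replace

    * three-level model; assumes the current MDM file was made with type(HLM3) and id3(distid)
    * the intercept varies randomly at levels 2 and 3; age has a fixed slope
    hlm hlm3 score int(int(int rand) rand) age(int(int)) rand, cmd(model04) run replace

In the hlm3 line, int(int(int rand) rand) reads from the inside out: (int rand) is the level-3 equation for the level-2 intercept, and the trailing rand adds the level-2 random effect.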

Description

These commands allow the user to invoke and run HLM v6 from within Stata 8.2 or higher. For these commands to run properly, the HLM directory must be included in the PATH variable. See this page on how to add HLM to the PATH variable.

hlm mkmdm creates an MDM file for HLM. HLM uses an MDM file to perform analysis of univariate hierarchical models and to perform analysis of multivariate hierarchical models. By default, an MDM file for a 2-level univariate model is created. You can specify the type of MDM file by using the type option. After an MDM file is created, use hlm mdmset to set the .mdm file in memory for use in writing and running HLM command files. Use hlm mdmset _drop to drop the current .mdm file from memory. The commands hlm hlm2 and hlm hlm3 are used to write and run HLM command (.hlm) files. Finally, the command hlm run will run an existing HLM command file.

hlm hlm2, hlm hlm3, and hlm run are e-class programs. Saved results can be obtained after estimation by use of the ereturn commands.

Options

    +-----------------------+
----+ options for hlm mkmdm +--------------------------------------------

id2(varname) is required. Varname specifies the level 2 id.

type(string) specifies the type of HLM structure. The possible types are HLM2, HLM3, HMLM2 and HMLM3. The default is HLM2. Note: Currently only the HLM2 and HLM3 options are enabled.

id3(varname) specifies the level 3 id variable if the model is a 3-level model.

l1(varlist) specifies the level 1 variables to be included in the MDM file. If omitted, all variables except id2 and id3 are included at level 1.

l2(varlist) specifies the level 2 variables to be included in the MDM file. If omitted, all variables except id2 and id3 are included at level 2.

l3(varlist) specifies the level 3 variables to be included in the MDM file. If omitted, all variables except id2 and id3 are included at level 3.

miss(now or analysis) is optional in HLM2 and HLM3 to indicate how to deal with missing data. miss(now) tells HLM to delete missing data when making the MDM file; miss(analysis) tells HLM to delete missing data when running analyses. If miss is not specified, HLM assumes there is no missing data.

nodta specifies that the Stata data file used to create the MDM file should be deleted; the default is to save it.

nomdmt specifies that the mdmt file should be deleted; the default is to save it.

nosts specifies that the summary statistics file should be deleted; the default is to save it.

replace specifies that the MDM file should be overwritten if it already exists.

norun specifies that the mdmt file should be created, but not run. This may be useful for creating MDM files in batch mode or if the mdmt file needs to be manually changed prior to creating the MDM file. Note that if norun is specified, the nomdmt, nosts, and nodta options are ignored: the .mdmt file is saved, and the .dta file is saved because HLM needs it to make the .mdm file when the .mdmt file is run later. The nosts option is irrelevant in this case.
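
For instance, the options above might be combined as in the sketch below. The variable and file names are those of the Examples section below. Running the saved .mdmt file later is done outside these commands and is not shown.

    sort schoolid

    * build the .mdm file, overwriting any old copy, and delete the intermediate
    * .dta file and summary statistics file afterwards
    hlm mkmdm using new_mdm_file, id2(schoolid) l1(score age sex race1 race2 race3) l2(private) miss(now) nodta nosts replace

    * alternatively, only write the .mdmt file (the .dta file is kept) without building the .mdm file
    hlm mkmdm using new_mdm_file, id2(schoolid) l1(score age sex race1 race2 race3) l2(private) miss(now) norun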

    +------------------------+
----+ options for hlm mdmset +-------------------------------------------

hlm mdmset _drop instructs Stata to drop the current mdm file name from memory.

    +---------------------------+
----+ options for hlm2 and hlm3 +----------------------------------------

cmdfile(newfile) is required. The HLM command file is written to newfile.hlm.

mdm(mdmfile) specifies that mdmfile is the MDM file to be used. If not specified, the MDM file specified in the most recent hlm mdmset command is used.

outfile(newfile) specifies that the HLM output is to be written to newfile.txt. If outfile(newfile) is not specified, the HLM output is written to filename.txt, where filename is the filename specified in the cmdfile(newfile) command.

run specifies that the HLM command file is to be run immediately; the default is not to run the command file.

replace specifies that the command and output files are to be replaced if they already exist.

linear specifies the model is linear. This is the default.

bernouli specifies a Bernoulli model (the outcome is 0 or 1).

poisson specifies a Poisson model (constant or variable exposure).

binomial specifies a binomial model.

multinomial specifies a multinomial model.

ordinal specifies an ordinal model.

ncat(integer) specifies the number of categories of the dependent variable in a multinomial or ordinal model. It must be between 3 and 9.

evar(varname) specifies the exposure variable in a variable-exposure Poisson model or the number-of-trials variable in a binomial model. If poisson is specified and evar(varname) is omitted, a constant-exposure Poisson model is estimated; if evar(varname) is specified, a variable-exposure Poisson model is estimated.

fml is only available in hlm2. It specifies that the model is to be estimated using full maximum likelihood. The default in hlm2 is to use restricted maximum likelihood.

prompt specifies that HLM should prompt the user if the maximum number of iterations is reached without convergence. The default is to continue iterating until convergence, regardless of the number of iterations set in macit.

stop specifies that HLM should stop iterating if the maximum number of iterations is reached without convergence. The default is to continue iterating until convergence, regardless of the number of iterations set in macit.

od specifies that a Bernoulli or Poisson model should be estimated with overdispersion of the level-1 variance. This option is not allowed with any other type of model.

laplace specifies that a Bernoulli or constant-exposure Poisson model should be estimated using EM Laplace estimation. This option cannot be combined with the od option. This option is only available in hlm2.

l6 specifies that a Bernoulli model should be estimated using a sixth-order Laplace approximation to maximum likelihood. This option cannot be combined with the od option.

lnum(integer) specifies the maximum number of laplace iterations. The default is 50.

macit(integer) specifies the maximum number of macro iterations in a non-linear model. In a linear model, this option specifies the maximum number of iterations. The default is 100. If neither prompt nor stop is specified, macit(integer) is irrelevant, since the default is to continue iterating until convergence.

micit(integer) specifies the maximum number of micro iterations in a non-linear model. The default is 14.

macstop(#) specifies the stopping criterion for macro iterations. The default is 0.0001.

micstop(#) specifies the stopping criterion for micro iterations in a non-linear model. The default is 0.000001.

wgt1(varname) specifies the level-1 design weight variable.

wgt2(varname) specifies the level-2 design weight variable.

wgt3(varname) specifies the level-3 design weight variable.

pwgt(varname) specifies the level-1 precision weight variable.

fsig(#) specifies that sigma-squared is to be set to #. Sigma-squared cannot be fixed in an over-dispersed bernouli or poisson model.

test is used to specify multivariate hypothesis tests. The syntax for test_specification is

[test][[test][[test]...]]

Note: The brackets immediately enclosing each test are to be typed; the remaining brackets indicate optional arguments. In addition, while not shown in the syntax diagram, the typed brackets around test are only required with multiple test specifications.

test is an expression of the form

eqn[,eqn[,eqn...]]

eqn is an expression of the form

exp=exp[=exp...]

exp is a linear combination of the variables and/or interactions in the model. It can be any one of

    0
    [#*][var][+|-[#*]var[+|-[#*]var...]]

var is any one of

    L1var
    L2var
    L3var
    L2var*L1var
    L3var*L1var
    L3var*L2var
    L3var*L2var*L1var

For example, to test the null hypothesis that the coefficients on var1 and var2 are both equal to zero, one could write any of the following:

    test([var1=0, var2=0])
    test([var1=var2=0])
    test([var1-var2=0, var2=0])

However, to test the separate null hypotheses that the coefficient on var1 equals zero and that the coefficient on var2 equals zero, one could write the following:

test([var1=0][var2=0])

deviance(real integer) specifies a deviance test. The model will be compared to a model with deviance real and integer degrees of freedom. Note that the deviance statistic and degrees of freedom from previous models will be saved as e-class scalars (see stored results below), and so can be used in the deviance option to compare the relative fit of different models.

store(name) specifies which set of estimates is to be stored in e(b) and e(V). See also robust below. In store, name may be lin (used only with a linear model), us (unit-specific estimates, may be used with any non-linear model), pa (population-average estimates, may be used with any non-linear model), eml (EM Laplace estimates, may be used when the laplace option is specified), or l6 (Laplace-6 estimates, may be used when the l6 option is specified). If store is omitted, the default depends on the model specified. If laplace is specified, the default is eml; otherwise, if l6 is specified, the default is l6; otherwise, if the model is any other non-linear model, the default is us; finally, if the model is linear, the default is lin.

Note: The mechanism for storing the estimates is currently a bit clunky. HLM saves the estimates in files called "gamvcX.dat" and "tauvcX.dat", where X is something like "us" or "eml" and indicates which estimator the estimates come from. In doing so, HLM overwrites any existing files with these names, but since not every file is produced by each run of HLM, it can be unclear whether an existing "gamvcX.dat" file comes from the current model. To deal with this, this program renames any existing "gamvcX.dat" and "tauvcX.dat" files to unused filenames like "gamvcX_tempN.dat" and "tauvcX_tempN.dat" before HLM fits the current model. The program then reads the estimates from the newly produced "gamvcX.dat" and "tauvcX.dat" files into matrices, stores them in e(), and erases the newly produced "gamvcX.dat" and "tauvcX.dat" files. Finally, the "gamvcX_tempN.dat" and "tauvcX_tempN.dat" files are renamed back to their original ("gamvcX.dat" and "tauvcX.dat") names. In principle this works fine, but if there is an error in the program and the hlm command exits before completing, the "gamvcX_tempN.dat" and "tauvcX_tempN.dat" files may not be renamed to their original names. I tell you this so you won't wonder if a bunch of "gamvcX_tempN.dat" and "tauvcX_tempN.dat" files start populating your hard drive.

robust specifies that the variance-covariance matrix used to compute the robust standard errors should be saved in e(V), when available (robust standard errors are not available, for example, with laplace estimation).

noeq suppresses the display of the full HLM model equations in Stata. The equations are displayed so that users can verify that the model is correctly specified; they do not affect the writing of the command file.

long specifies that HLM should produce a full output file; the default is to produce only a reduced output file.

res1 specifies that HLM should produce a level-1 residual file. The residual file is written to a file named filename_res1.dta, where filename is the filename specified in the cmdfile(newfile) command.

res2 specifies that HLM should produce a level-2 residual file. The residual file is written to a file named filename_res2.dta, where filename is the filename specified in the cmdfile(newfile) command.

res3 specifies that HLM should produce a level-3 residual file. The residual file is written to a file named filename_res3.dta, where filename is the filename specified in the cmdfile(newfile) command.

rv1(varlist) specifies that the level-1 residual file should include (in addition to the variables in the model, which are included by default) the variables listed in varlist.

rv2(varlist) specifies that the level-2 residual file should include (in addition to the variables in the model, which are included by default) the variables listed in varlist.

rv3(varlist) specifies that the level-3 residual file should include (in addition to the variables in the model, which are included by default) the variables listed in varlist.

title(string) specifies the title to be given to the HLM output file.

graph(newfile) specifies that HLM should write a graph equations (.geq) file named newfile.
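
As an illustration of how the model-type and estimation options combine, the sketch below fits a two-level Bernoulli model by EM Laplace estimation. The binary outcome passed and the command file name model05 are hypothetical; passed is assumed to be among the level-1 variables in the current MDM file, alongside the variables used in the Examples section below.

    * random-intercept Bernoulli model; the level-1 rand is omitted from the model expression
    * stop and macit(200) end the run if 200 macro iterations pass without convergence
    hlm hlm2 passed int(int private rand) age sex, cmd(model05) bernouli laplace stop macit(200) run replace

Because laplace is specified and store is omitted, the EM Laplace estimates are the ones placed in e(b) and e(V).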

    +---------------------+
----+ options for hlm run +----------------------------------------------

hlm2 is the default. It specifies that the .hlm command file is an HLM2 command file.

hlm3 specifies that the .hlm command file is an HLM3 command file.

mdm(mdmfile) specifies that mdmfile is the MDM file to be used. If not specified, the MDM file specified in the most recent hlm mdmset command is used.

store(name) specifies which set of estimates is to be stored in e(b) and e(V). See also robust below. In store, name may be lin (used only with a linear model), us (unit-specific estimates, may be used with any non-linear model), pa (population-average estimates, may be used with any non-linear model), eml (EM Laplace estimates, may be used when the laplace option is specified), or l6 (Laplace-6 estimates, may be used when the l6 option is specified). If store is omitted, the default depends on the model specified. If laplace is specified, the default is eml; otherwise, if l6 is specified, the default is l6; otherwise, if the model is any other non-linear model, the default is us; finally, if the model is linear, the default is lin.

robust specifies that the variance-covariance matrix used to compute the robust standard errors should be saved in e(V), when available (robust standard errors are not available, for example, with laplace estimation).

noview suppresses the viewing of the output file in the viewer window.
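
A minimal sketch of rerunning a previously written command file follows. It assumes that model01.hlm was written by the hlm hlm2 example in the Examples section below, and that file names are given with their extensions, as in the hlm mdmset example. Both points are assumptions rather than documented requirements.

    * rerun an existing HLM2 command file without opening the output in the viewer
    hlm run model01.hlm, hlm2 mdm(new_mdm_file.mdm) noview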

Stored Results

Depending on the model specified, HLM will produce several sets of estimated coefficients and the corresponding variance-covariance matrices. Each of these is stored in Stata as an e-class result. In addition, a number of scalars and macros are stored:

Scalars:

    e(sig2)        estimated level-1 residual variance (linear model only)
    e(N_1)         number of level-1 observations
    e(N_2)         number of level-2 observations
    e(N_3)         number of level-3 observations
    e(dev)         deviance statistic from linear model
    e(dev_l6)      deviance statistic from laplace-6 estimates
    e(dev_eml)     deviance statistic from EM laplace estimates
    e(df)          model df (number of parameters estimated)

Macros:

    e(esttype)     type of estimates stored in e(b) and e(V)
    e(setype)      type of standard errors stored in e(V)
    e(cmdfile)     name of .hlm command file used
    e(depvar)      dependent variable
    e(model)       model type (distribution of outcome)
    e(cmd)         hlm2 or hlm3

Matrices:

    e(b)           1 x K vector of estimated coefficients
    e(V)           K x K variance-covariance matrix
    e(b_lin)       1 x K vector of estimated coefficients from linear model
    e(b_us)        1 x K vector of estimated unit-specific coefficients
    e(b_pa)        1 x K vector of estimated population-average coefficients
    e(b_eml)       1 x K vector of estimated EM laplace coefficients
    e(b_l6)        1 x K vector of estimated laplace-6 coefficients
    e(V_lin)       K x K variance-covariance matrix, linear model
    e(V_us)        K x K variance-covariance matrix, unit-specific estimates
    e(V_pa)        K x K variance-covariance matrix, population-average estimates
    e(V_eml)       K x K variance-covariance matrix, EM laplace estimates
    e(V_l6)        K x K variance-covariance matrix, laplace-6 estimates
    e(V_r)         K x K robust variance-covariance matrix, linear model
    e(V_usr)       K x K robust variance-covariance matrix, unit-spec. estimates
    e(V_par)       K x K robust variance-covariance matrix, pop-average estimates
    e(se_lin)      1 x K standard error vector, linear model
    e(se_us)       1 x K standard error vector, unit-specific estimates
    e(se_pa)       1 x K standard error vector, population-average estimates
    e(se_eml)      1 x K standard error vector, EM laplace estimates
    e(se_l6)       1 x K standard error vector, laplace-6 estimates
    e(se_r)        1 x K robust standard error vector, linear model
    e(se_usr)      1 x K robust standard error vector, unit-spec. estimates
    e(se_par)      1 x K robust standard error vector, pop-average estimates
    e(tau2)        R x R level-2 tau matrix
    e(tau2_eml)    R x R level-2 tau matrix, EM laplace estimates
    e(tau2_l6)     R x R level-2 tau matrix, laplace-6 estimates
    e(tau3)        R2 x R2 level-3 tau matrix
    e(invtaui)     inverse tau information matrix
    e(invtaui_l6)  inverse tau information matrix, laplace-6
    e(se_tau)      vector of standard errors of tau
    e(se_tau_l6)   vector of standard errors of tau, laplace-6

The type of estimates stored in e(b) and e(V) is determined by the store and robust options. In general, all available estimated coefficients, variance-covariance matrices, and standard errors are stored, but only those stored in e(b) and e(V) are available to postestimation commands such as estout, which can then be used to display the results of multiple models.
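
For example, after fitting a linear two-level model such as model01 in the Examples section below, the stored results might be inspected as follows. This is only a sketch; which results exist depends on the model fitted.

    * list everything stored in e()
    ereturn list

    * coefficients, their standard errors, and the level-2 tau matrix
    matrix list e(b)
    matrix list e(se_lin)
    matrix list e(tau2)

    * scalars can be used directly in expressions
    display "sigma-squared = " e(sig2) ", deviance = " e(dev) ", df = " e(df)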

Examples

Suppose we have a data file with outcome variable score, level-one variables age, sex, race1, race2, race3, and level-two variable private. We first sort the data by the level-two id variable (schoolid here), and then make an mdm file:

    sort schoolid
    hlm mkmdm using new_mdm_file, id2(schoolid) l1(score age sex race1 race2 race3) l2(private) miss(now)

Next we tell Stata to use this mdm file in subsequent hlm models:

hlm mdmset new_mdm_file.mdm

Now, to estimate the following model:

    score(ij) = B(0j) + B(1j)*age(ij) + B(2j)*sex(ij) + R(ij)
    B(0j) = G(00) + G(01)*private(j) + U(0j)
    B(1j) = G(10)
    B(2j) = G(20)

we write the command:

hlm hlm2 score int(int private rand) age(int) sex(int) rand, cmd(model01) run replace

Alternatively, we could write the same command as:

hlm hlm2 score int(int private rand) age sex rand, cmd(model01) run replace

Now, to estimate the following model:

    score(ij) = B(0j) + B(1j)*race1(ij) + B(2j)*race2(ij) + B(3j)*race3(ij) + R(ij)
    B(0j) = G(00) + G(01)*private(j) + U(0j)
    B(1j) = G(10) + G(11)*private(j)
    B(2j) = G(20) + G(21)*private(j)
    B(3j) = G(30) + G(31)*private(j)

we write the command:

hlm hlm2 score int(int private rand) race1-race3(int private) rand, cmd(model02) run replace

Finally, to test the null hypothesis that the coefficients G(11), G(21), and G(31) are all equal to 0, we would write:

hlm hlm2 score int(int private rand) race1-race3(int private) rand, cmd(model02) run replace test(private*race1=private*race2=private*race3=0)
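
The deviance option can be used in a similar spirit to compare nested models. The sketch below first fits a reduced version of the model above, without the private-by-race terms, and saves its deviance and degrees of freedom in local macros. The command file name model02a and the macro names dev0 and df0 are hypothetical. The fml option is added on the assumption that the comparison should rest on full maximum likelihood, since the two models differ in their fixed effects.

    * reduced model: race slopes do not depend on private
    hlm hlm2 score int(int private rand) race1-race3(int) rand, cmd(model02a) fml run replace
    local dev0 = e(dev)
    local df0 = e(df)

    * full model, compared against the reduced model's deviance and df
    hlm hlm2 score int(int private rand) race1-race3(int private) rand, cmd(model02) fml run replace deviance(`dev0' `df0')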

Author

    sean f. reardon
    Stanford University
    sean.reardon@stanford.edu