{smcl}
{* *! version 2.50 05jul2019}{...}
{hline}
help for {hi:xtdpdml} version 2.50
{hline}
{title:Dynamic Panel Data Models using Maximum Likelihood}
{marker syntax}{...}
{title:Syntax}
{p 8 16 2}
{opt xtdpdml} y [time-varying strictly exogenous vars]
[{cmd:,} {it:inv(time-invariant exogenous vars)} {it:pre(predetermined vars)} {it:other_options}]
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Independent variables (other than strictly exogenous)}
{synopt :{opt inv(varlist)}}Time-invariant exogenous variables, e.g. year of birth{p_end}
{synopt :{opt pre:det(varlist)}}Time-varying predetermined (sequentially exogenous) variables{p_end}
{synopt :{opt ylag:s(numlist)}}Specifies lagged values of y to be included in the model. Default is lag 1. {p_end}
{syntab:Dataset options}
{synopt :{opt wide}}Data are already in wide format (default is long format with xtset preceding the command){p_end}
{synopt :{opt stayw:ide}}Keep data in wide format after execution. May help with
some sem post-estimation commands, e.g. predict.{p_end}
{synopt :{opt tfix}}Recode the time variable to equal 1, 2, ..., T (number of waves) and set delta = 1{p_end}
{synopt :{opt std}}Standardize all variables in the model to have mean 0 and variance 1 (in long format) {p_end}
{synopt :{opt std(varlist)}}Standardize specified variables to have mean 0 and variance 1{p_end}
{syntab:Model Specification and Constraints Options}
{synopt :{opt evars}}When there are no predetermined variables in the model this sometimes helps with convergence{p_end}
{synopt :{opt alphafree}}Allow Alpha (fixed) effects to vary across time{p_end}
{synopt :{opt xfree}}All x effects free to vary across time{p_end}
{synopt :{opt xfree(varlist)}}x effects of specified variables free to vary across time{p_end}
{synopt :{opt yfree}}All lagged y effects free to vary across time{p_end}
{synopt :{opt yfree(numlist)}}effects of specified lagged ys free to vary across time{p_end}
{synopt :{opt constinv}}constrains constants to be equal across waves. Alias for {it:nocsd}{p_end}
{synopt :{opt nocsd}}Cross-sectional dependence is NOT allowed. Alias for {it:constinv}{p_end}
{synopt :{opt errorinv}}constrains error variances to be equal across waves. May cause convergence problems{p_end}
{synopt :{opt re}}Random Effects Model (Alpha uncorrelated with Xs){p_end}
{syntab:Reporting}
{synopt :{opt ti:tle(string)}}Gives a title to the analysis, e.g. {it: ti(Baseline Model)}{p_end}
{synopt :{opt detail:s}}Show all the sem output plus the highlights; otherwise only the highlights are shown{p_end}
{synopt :{opt show:cmd}}show the sem command generated by xtdpdml{p_end}
{synopt :{opt gof}}report several goodness of fit measures{p_end}
{synopt :{opt tsoff}}do not use time-series notation in the highlights output{p_end}
{synopt :{it:{help estimation options##display_options:display_options}}}Assorted display options, e.g. noci,
cformat(%8.3f){p_end}
INCLUDE help shortdes-coeflegend
{synopt :{opt dec:imals(integer)}}Specifies the number of decimal places to display for the coefficients,
SEs and CIs. {p_end}
{syntab:Other options}
{synopt :{opt mp:lus(fname, opts)}}Create Mplus input and data files. File may need some editing before running.{p_end}
{synopt :{opt lav:aan(fname, opts)}}Create Lavaan input and data files. File may need some editing before running.{p_end}
{synopt :{opt semf:ile(fname, r)}}Create do file with the generated sem commands{p_end}
{synopt :{opt sto:re(stub)}}Stores the full & highlights-only results under the names stub_f and stub_h {p_end}
{synopt :{opt dry:run}}Do not actually estimate the model.{p_end}
{synopt :{opt iter:ate(#)}}Maximum number of iterations allowed. Default is 250.{p_end}
{synopt :{opt tech:nique(options)}}Estimation technique used. Default is {it: nr 25 bhhh 25} unless {opt method(adf)} is specified.{p_end}
{synopt :{opt semopts(options)}}Additional sem options to be included in the generated sem command.{p_end}
{synopt :{opt fiml}}Full Information Maximum Likelihood is used for missing data.{p_end}
{synopt :{opt v12}}Lets xtdpdml run under Stata 12.1. Probably ok but use at own risk.{p_end}
{synopt :{opt skipcfa:transform}}Changes the way start values are computed in Stata 14.2 and later.{p_end}
{synopt :{opt skipcond:itional}}Changes the way start values are computed in Stata 14.2 and later.{p_end}
{synopt :{opt altst:art}}Convenient way to specify both {it: skipcfatransform} and {it: skipconditional}{p_end}
{synopt :{opt meth:od(method)}}Methods supported by {it: sem}, e.g. ml, mlmv, adf{p_end}
{synopt :{opt vce(vcetype)}}vcetypes supported by sem, e.g. oim, robust{p_end}
{synopt :{it:{help maximize:maximize_options}}}control the maximization process; seldom used{p_end}
{synoptline}
{p 4 6 2} Factor variable notation is NOT supported.{p_end}
{p 4 6 2}
{it:Strictly exogenous} and {it:predetermined} variables may contain time-series operators; see {help tsvarlist}.{p_end}
{p 4 6 2}
Many sem postestimation commands work after xtdpdml.
See {manhelp sem_postestimation SEM:sem postestimation} for features
available after estimation. You may need to specify {it:staywide} to get some of them to work. {p_end}
{marker description}{...}
{title:Description}
{pstd} {cmd:xtdpdml} fits Dynamic Panel Data Models using Maximum
Likelihood. It works as a shell for {it:sem}, generating the
necessary {it:sem} commands. It can also generate code for running these models
in Mplus and in R's lavaan package. It tends to work best when panels are
strongly balanced, T is relatively small (e.g. less than 10), and there
are no missing data. See the section on Special Topics below for suggestions
on what to do if your data do not meet these criteria.
{pstd} Panel data make it possible both to control for unobserved
confounders and to include lagged, endogenous regressors. Trying to do
both at the same time, however, leads to serious estimation
difficulties. In the econometric literature, these problems have been
solved by using lagged instrumental variables together with the
generalized method of moments (GMM). In Stata, commands such as xtabond
and xtdpdsys have been used for these models.
{pstd} xtdpdml addresses the same problems via maximum likelihood
estimation implemented with Stata's structural equation modeling (sem)
command. The ML (sem) method is substantially more efficient than the
GMM method when the normality assumption is met and suffers less from
finite sample biases. xtdpdml simplifies the SEM model specification
process; makes it possible to test and relax many of the constraints
that are typically embodied in dynamic panel models; unlike most related
methods, allows for the inclusion of time-invariant variables in the
model; and takes advantage of Stata's ability to use full information
maximum likelihood (FIML) for dealing with missing data. xtdpdml also provides
an overall goodness of fit measure by default and provides access to others
via the sem postestimation command {cmd:estat gof, stats(all)}. Many other
sem postestimation commands can be used as well. Since xtdpdml is a shell
for sem, you should use the {cmd:sem} command if you want to replay
the full results and {cmd:xtdpdml} to replay the highlights-only results.
{pstd} {it:Data should be xtset with both the panel id and time variable specified.}
The time variable should be coded t = 1, 2, 3, ...,
T, and delta (the period between observations) should equal 1. Other
values for t (e.g. years, or starting at 0, or skipped values of t) will
likely produce error messages or incorrect results. If necessary, recode
the time variable before running xtdpdml. Or, you can use the {it:tfix}
option and let xtdpdml recode the time variable for you (but you can
still get errors if, say, delta was not specified correctly in the
source data set, e.g. data were collected every two years and delta was
set to 1). The model assumes that time intervals are equally spaced.
{p 6 6 2} Note: unless you specify {opt fiml}, panels with missing data
will be deleted from any intermediate data files
xtdpdml creates and from data files created by the {opt mplus}
or {opt lavaan} options.
{p 6 6 2} Note: unless you specify {opt staywide}, your original data are always
restored after xtdpdml execution. If you do specify {opt staywide}, be careful if you
then save the data file. You don't want to overwrite a file you want to keep.
{pstd} {it:All variable names should start with lowercase letters.}
As the Stata sem manual points out, "In the command language,
variables are assumed to be observed if they are typed in lowercase and
are assumed to be latent if the first letter is capitalized.
Variable educ is observed, while variable Knowledge or KNOWLEDGE is
latent. If the observed variables in your dataset have uppercase names,
type {cmd:rename _all, lower} to convert them to lowercase."
{pstd} By default, most effects (with the exceptions of the constants and error variances) are
constrained to be equal across waves, making it possible to present only a single set
of parameter estimates for each variable in the model. These constraints can be relaxed
via options such as {it:xfree}, {it:yfree} and {it:alphafree}.
{pstd} The models include a latent variable ALPHA that reflects the fixed effects that are
common to all time periods. By default, the coefficient of
ALPHA is constrained to have a value of 1.0 at each time period. The alphafree option can
be used to allow the effects of ALPHA to vary across waves. Also by default, ALPHA
freely covaries with the time-varying exogenous variables. If {it:re} is specified,
a random effects model is estimated where ALPHA is uncorrelated with all of the X
variables.
{pstd} There are FOUR types of independent variables that can be
specified. There is considerable flexibility in specifying which lagged
values of variables (if any) should be included in the model, e.g. no
lags or heterogeneous lags can be specified.
{p 6 6 2} The lag 1 value of y (e.g. L1.y) is included by default. This can be changed
with the {it:ylag} option.
{p 6 6 2} Strictly exogenous variables are those that (by assumption) are uncorrelated with
the error terms at all points in time. Equivalently, we assume that
they are not affected by prior values of the dependent variable. These variables
are specified on the left side of the comma, before the options. Time series
notation can be used, e.g. {it: xtdpdml y L1.wages L2.wages} would include the first
and second lagged values of wages as independent variables.
{p 6 6 2} Predetermined variables, also known as sequentially exogenous, are
variables that can be affected by prior values of the dependent
variable. Time series notation can be used. These are specified with the {it:pre} option.
{p 6 6 2} Time-invariant exogenous variables are variables whose values are constant
across time, such as year of birth. Of course, do NOT use time series
notation with these. The ability to use time-invariant exogenous variables in the
model is one of the key advantages of the sem approach. These are
specified with the {it:inv} option. These variables are assumed to be
uncorrelated with ALPHA.
{marker options}{...}
{title:Options}
{dlgtab:Independent variables (other than strictly exogenous)}
{phang}
{opt inv(varlist)} Time-invariant exogenous variables, e.g. year of birth. {p_end}
{phang} {opt predet(varlist)} Predetermined variables, also known as
sequentially exogenous. Predetermined variables can be affected by prior
values of the dependent variable. Time series notation can be
used.{p_end}
{phang}
{opt ylag(numlist)} By default the lag 1 value of y is included as an independent variable.
Different or multiple lags can be specified, e.g. ylag(1 2) would include lags 1 and 2 of y.
ylag(0) will cause no lagged value of y to be included in the model.{p_end}
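{pmore}For instance, a sketch assuming the wages data from the Examples section are loaded and xtset: the following should include lags 1 and 2 of {cmd:wks} as independent variables.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ylag(1 2){p_end}{txt}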
{dlgtab:Dataset Options}
{phang}
{opt wide} By default, data are assumed to be in long format and xtset with both the panel id and time
variables specified. The data set is temporarily converted to wide format for use with sem.
If data are already in wide format use the {it:wide} option. However, note that the file
must have been created by a reshape wide command, using a file that is in long
format and that was xtset, or else it won't have information
that xtdpdml needs. Use of this option is generally discouraged.
{p_end}
{phang} {opt staywide} This will keep the data in wide format after
running xtdpdml. This may be necessary if you want to use post-estimation
commands like predict. If you use staywide be careful you don't accidentally
save the wide .dta file and overwrite a file you want to keep!
{p_end}
{phang} {opt tfix} Time should be coded t = 1, 2, ..., T where T =
number of waves. Without {opt tfix}, time units like years (e.g. 1990, 1991, ...) will
cause errors or incorrect results. There will also be errors or
incorrect results if delta does not equal 1, e.g. t = 1, 3, 5. The tfix
option recodes time to equal 1, 2, ..., T and sets delta = 1. You can
still have problems, though, if delta was not specified correctly in the
source data set or if interval width is not consistent. It is safest to
code time correctly yourself, but tfix should work in most cases.
{p_end}
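{pmore}A sketch for data collected every two years, assuming the wages data from the Examples section are loaded. The variable {cmd:survey_year} is hypothetical (1976, 1978, ..., 1988), delta is declared correctly as 2 in {cmd:xtset}, and {opt tfix} should then recode time to 1, 2, ..., 7 with delta = 1. The last line restores the original time setting.{p_end}
{phang2}{cmd}generate survey_year = 1974 + 2*t{p_end}
{phang2}xtset id survey_year, delta(2){p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) tfix{p_end}
{phang2}xtset id t{p_end}{txt}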
{phang} {opt std} standardizes all the variables in the model
to have mean 0 and variance 1. It does this while the data set is still
in long format. You probably will not want to use this option in most cases
but it can sometimes help when the model is having trouble converging.
{p_end}
{phang} {opt std(varlist)} standardizes only the selected variables to
have mean 0 and variance 1. Does not work if the {opt wide}
option has been specified. Do NOT use time series notation; just
list the names of the variables you want standardized.{p_end}
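{pmore}A minimal sketch, assuming the wages data from the Examples section: only {cmd:wks} and {cmd:lwage} are standardized, and they are listed without time-series operators even though {cmd:lwage} enters the model lagged.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) std(wks lwage){p_end}{txt}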
{dlgtab:Model Specification and Constraints Options}
{phang} {opt evars} sometimes helps with convergence when there are no
predetermined variables in the model. It is an alternative and usually
less efficient way of specifying the error terms. But sometimes it helps
and may be necessary for replicating results from earlier versions of
the program. {p_end}
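{pmore}For example, assuming the wages data from the Examples section, a model with no predetermined variables (both {cmd:lwage} and {cmd:union} treated as strictly exogenous) might be tried with {opt evars} if it otherwise fails to converge:{p_end}
{phang2}{cmd}xtdpdml wks L.lwage L.union, inv(ed) evars{p_end}{txt}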
{phang}
{opt alphafree} lets the Alpha (fixed) effects differ across
time. Note that, if this option is used, Alpha will be normalized by
fixing its variance at 1; otherwise the model sometimes has convergence problems.
{p_end}
{phang} {opt xfree} lets the effects of all the independent
variables (except lagged y) freely differ across time. {p_end}
{phang} {opt xfree(varlist)} lets the effects of the specified
independent variables freely differ across time. {p_end}
{phang}
{opt yfree} lets all lagged y effects freely differ across time.
{p_end}
{phang}
{opt yfree(numlist)} allows the specified lagged y effects to freely differ across time.
{p_end}
{phang} {opt nocsd} (alias {opt constinv}) Cross-sectional dependence
is NOT allowed, i.e. constants are constrained to be equal across waves.
This is equivalent to no effect of time. This option sometimes
causes convergence problems.{p_end}
{phang}
{opt errorinv} constrains error variances to be equal across waves. May cause convergence problems{p_end}
{phang}
{opt re} Random Effects Model (Alphas uncorrelated with Xs){p_end}
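{pmore}Because {opt re} adds constraints to the default model (ALPHA is no longer allowed to covary with the Xs), a likelihood-ratio test is one way to compare the two specifications. A sketch, assuming the wages data from the Examples section; the stored-result names {cmd:fem} and {cmd:rem} are arbitrary.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union){p_end}
{phang2}est store fem{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) re{p_end}
{phang2}est store rem{p_end}
{phang2}lrtest fem rem{p_end}{txt}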
{dlgtab:Reporting Options}
{phang}
{opt title(string)} Gives a title to the analysis. This title will appear in both the
highlights results and (if requested) the Mplus and lavaan code. For example, {it:ti(Baseline Model)}
{p_end}
{phang}
{opt details} This will show all the output generated by the sem command. Otherwise only a
highlights version is presented. This can be useful if you want to make sure the model
specification is correct or if you want information not contained in the highlights.
{p_end}
{phang}
{opt showcmd} This will show the sem command generated by xtdpdml. This can be useful to
make sure the estimated model is what you wanted.
{p_end}
{phang}
{opt gof} Reports several goodness of fit measures after model estimation. It has the
same effect as running the sem postestimation command {cmd:estat gof, stats(all)}
after xtdpdml.
{p_end}
{phang}
{opt tsoff} By default, when possible the highlights output produced by
xtdpdml will use time-series notation similar to what you see with
commands like xtabond, e.g. L3.xvar will represent the lag 3 value of
xvar. Since the data are reshaped wide, this is not the same as the name
of the variable that was actually used, e.g. it might be that L3.xvar
corresponds to xvar2. tsoff will turn off the use of time series
notation in the highlights printout and show the names of the variables
actually used in the reshaped wide data. This may be useful if you are
going to hand-modify the code generated by xtdpdml.
{p_end}
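{pmore}For example, assuming the wages data from the Examples section, the following should print the generated sem command and label the highlights with the names of the reshaped wide variables instead of time-series notation, which makes it easier to match the two when hand-editing:{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) tsoff showcmd{p_end}{txt}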
INCLUDE help displayopts_list
{phang}
{opt coeflegend} Display the legend instead of the statistics. This can be useful if, say,
you are trying to use post-estimation test commands to test hypotheses about effects.
{p_end}
{phang}
{opt decimals(integer)} specifies the number of decimal places to display
for the coefficients, standard errors, and confidence limits. It is a shorthand way
of specifying {opt cformat}, e.g. {opt dec(3)} is the same as specifying
{opt cformat(%9.3f)}. You will get an error if you specify both {opt dec} and
{opt cformat}. The value specified must range between 0 and 8; 3 is
often a good choice for making the output easier to read.
{p_end}
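{pmore}For example, assuming the wages data from the Examples section, either of the following should display coefficients, standard errors, and confidence limits with three decimal places (specify one or the other, not both):{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) dec(3){p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) cformat(%9.3f){p_end}{txt}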
{dlgtab:Other Options}
{phang}
{opt mplus(filenamestub, mplus options)} This will create inp (mplus commands) and data
files that can be used by Mplus (has only been tested with Mplus 7.4).
This is adapted (with permission) from UCLA's and Michael Mitchell's
stata2mplus command but does not require that it be installed. The
filenamestub must be specified; it will be used to name the Mplus .inp
and .dat files. Everything else is optional. Options {opt r:eplace},
{opt mi:ssing(#)}, {opt a:nalysis}, and {opt out:put}
are supported. {opt replace} will cause existing .inp and .dat files
to be overwritten. {opt missing} specifies the missing value for all
variables; default is -9999. {opt analysis} and {opt output}
specify options to be passed to the Mplus analysis and output options.
As is the case in Mplus, multiple analysis and output options should
be separated by semicolons. {opt xtdpdml} cannot check your Mplus syntax so
be careful.
{phang}
So, for example, if the user specified
{cmd:mplus(myfile, r missing(-999999) analysis(iterations = 2000) out(mod(3.84); sampstat))}
mpl_myfile.inp and mpl_myfile.dat would be created
(replacing any existing files by those names). All missing values
would be set to -999999. The Mplus analysis option would set iterations equal to 2000
(default is 1,000). The output option (note how ; was used to separate the two options requested)
would request that modification indices > 3.84
be printed out and that sample statistics be included in the output. Obviously you
need to understand Mplus to use the analysis and output options; if you don't use them
the default values will probably meet most of your needs. You can, of course, edit the .inp
file on your own before running Mplus.
{phang}
Include the {opt dryrun} option if you only want the mplus code.
Keep in mind that Mplus only shows the first 8 characters of variable
names; also since data are reshaped wide the names of time-varying variables should be 7
characters or less (or 6 characters or less if T > 10) if you want to see
the full variable name in the output. Some editing of the .inp file
may be required first, e.g. variable names may need to be shortened and/or long lines may have to
be split. Like Stata, the Mplus code will default to listwise deletion
unless {opt fiml} is specified. Most xtdpdml model specification options
are supported but the user should still check the coding, e.g. options like
{opt semopts(whatever)} will not be carried over into the mplus code.
{p_end}
{phang}
{opt lavaan(filenamestub, r)} This will create R (lavaan commands) and Stata dta
files that can be used by R's lavaan package. The
filenamestub must be specified; it will be used to name the lavaan .R
and .dta files. {opt replace} will cause existing .R and .dta files by those names
to be overwritten. Of course, you need to have R installed and know how to use it.
You may want to edit the generated code if you want to change or add options.
So, for example, if the user specified
{cmd:lav(myfile, r)}
lav_myfile.R and lav_myfile.dta would be created
(replacing any existing files by those names).
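{pmore}A sketch that only writes files for Mplus and lavaan without estimating the model in Stata, assuming the wages data from the Examples section. {cmd:wagemod} is a hypothetical file stub, so these commands should create (or replace) mpl_wagemod.inp, mpl_wagemod.dat, lav_wagemod.R, and lav_wagemod.dta.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) mplus(wagemod, r) dryrun{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) lav(wagemod, r) dryrun{p_end}{txt}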
{phang}
{opt semfile(filename, r)} The generated sem commands will be output to a file
called filename.do. The r option can be specified to replace an existing do file
by that name. This is useful if you want to try to modify the sem commands
in ways that are not easily done with xtdpdml. You may wish to also specify
the {opt staywide} option so that data remain correctly formatted for use
with the generated do file.
{p_end}
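{pmore}A sketch, assuming the wages data from the Examples section; {cmd:wagesem} is a hypothetical file name. The xtdpdml command should write wagesem.do and leave the data in wide format without estimating anything; after editing the do-file you can run it directly. {cmd:preserve} and {cmd:restore} protect the original long data.{p_end}
{phang2}{cmd}preserve{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) semfile(wagesem, r) staywide dryrun{p_end}
{phang2}do wagesem{p_end}
{phang2}restore{p_end}{txt}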
{phang}
{opt store(stubname)} {it: xtdpdml} generates two sets of results: the full results,
generated by sem, and a highlights-only set of results which can be used with programs
like {it: esttab}. The stored results have the names stubname_f and stubname_h, e.g.
if you specify {it: store(model1)} the results will be stored as model1_f and
model1_h. The default stubname is xtdpdml, so after running {it: xtdpdml} without the
{it: store} option you should have stored results xtdpdml_f and xtdpdml_h. You shouldn't
try to do any post-estimation commands with the highlights version
(e.g. predict, margins) because necessary information may not be stored in the file;
use the full version instead.
{p_end}
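{pmore}For example, assuming the wages data from the Examples section, the following should leave two stored results, {cmd:base_f} and {cmd:base_h}, which {cmd:estimates dir} will list:{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) store(base){p_end}
{phang2}estimates dir{p_end}{txt}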
{phang}
{opt dryrun} This will keep sem from actually being executed. This will catch some
errors immediately and can be useful
if you want to see the sem command that is generated and/or wish to specify
{it:staywide} to reformat the data from long to wide and/or just want to
generate mplus or lavaan code and data files. This will often
be combined with the {it:showcmd}, {it:mplus}, {it:lavaan}, {it:semfile}, or {it:staywide} options.
{p_end}
{phang}
{opt iterate(#)} Maximum number of iterations allowed. Default
is 250. You can increase this number and/or change the maximization
technique if the model is having trouble converging.
{p_end}
{phang}
{opt technique(methods)} Maximization techniques used. Default is
{it:technique(nr 25 bhhh 25)} unless {opt method(adf)} is specified. You can change this if the model is
having trouble converging. If you use {opt method(adf)} (asymptotic distribution free) the default technique
is set to {it:technique(nr 25 bfgs 10)} since adf and the bhhh technique do not seem to work together.
See {help maximize} for details as well as
for information on other options that can be used, e.g. {it:difficult}.
{p_end}
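{pmore}If a model stalls, a sketch of the options to try together, assuming the wages data from the Examples section: more iterations, a different technique mix, the {it:difficult} maximize option, and {opt details} to see the iteration log. The particular values are illustrative, not recommendations.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) iterate(500) technique(nr 25 bfgs 25) difficult details{p_end}{txt}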
{phang}
{opt semopts(options)} Other options allowed by sem will be included in the generated sem command.
See, for example, {help sem_reporting_options}.
{p_end}
{phang}
{opt fiml} Full Information Maximum Likelihood is used for missing data. This is the equivalent of specifying
method(mlmv) on the sem command. Use of fiml sometimes dramatically slows down execution so be patient
if you use it! Unless you specify fiml, intermediate data files created by xtdpdml will have panels
with missing data deleted.
{p_end}
{phang} {opt skipcfatransform} and {opt skipconditional} Stata 14.2
changed the way start values are computed. Usually the new procedures
work better, especially when fiml is used, but sometimes the old start
values speed up execution and/or are better for getting models to
converge. These options are ignored in Stata 14.1 or earlier.
{p_end}
{phang} {opt altstart} is a convenient way to specify both
{opt skipcfatransform} and {opt skipconditional}.
{p_end}
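{pmore}For example, in Stata 14.2 or later and assuming the wages data from the Examples section, the two commands below should request the same older start-value methods for a model estimated with fiml:{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml altstart{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml skipcfatransform skipconditional{p_end}{txt}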
{phang} {opt method()} and {opt vce()} specify the method used to obtain parameter
estimates and the technique used to obtain the variance-covariance matrix of the estimates.
See {it:{help sem_option_method}}. {opt fiml} is another way of specifying {opt method(mlmv)}.
Be patient if you specify something like {opt vce(bootstrap)} as it may take a very long
time to run.
{p_end}
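{pmore}For example, assuming the wages data from the Examples section, the first command requests robust standard errors and the second uses asymptotic distribution free estimation (for which, as noted under {opt technique()}, the default technique becomes {it:nr 25 bfgs 10}):{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) vce(robust){p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) method(adf){p_end}{txt}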
{phang}
{opt v12} xtdpdml was written and tested using Stata 13 and 14.
The v12 option will also cause it to run under
Stata 12.1. This has not been extensively tested so use at your own risk.
{p_end}
{marker "Special Topics"}{...}
{title:Special Topics}
{dlgtab:Interactions with Time}
{pstd}Users sometimes want constants and variable effects to differ across
time. xtdpdml can do this but, because data are reshaped wide, the procedure
is different from the one used with other programs.
{pstd}By default, xtdpdml lets the constants differ across time periods. In other
programs this would be like including i.time in the model. The {opt constinv} or
{opt nocsd} options can be specified if the user wants the constants to
be invariant across time. Note that using these options will sometimes cause
convergence problems.
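{pstd}Because {opt constinv} only adds constraints to the default model, a likelihood-ratio test is one way to check whether the constants really differ across waves. A sketch, assuming the wages data from the Examples section; the stored-result names {cmd:cfree} and {cmd:cinv} are arbitrary, and, as noted above, the constrained model may have trouble converging.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union){p_end}
{phang2}est store cfree{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) constinv{p_end}
{phang2}est store cinv{p_end}
{phang2}lrtest cfree cinv{p_end}{txt}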
{pstd}In other situations the user might want interactions with time where the
effect of a variable is free to differ across time periods. In other programs
this might be accomplished by specifying something like i.time#c.ses. With
xtdpdml you use the free options instead, e.g. {opt xfree(ses)} will allow the
effect of ses to differ at each time period.
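{pstd}A sketch with the wages data from the Examples section. It assumes current wages ({cmd:lwage}, with no time-series operator) are used so that the name given to {opt xfree()} matches the variable as it appears in the model. The effect of {cmd:lwage} may then differ at each wave while the other effects stay constrained.{p_end}
{phang2}{cmd}xtdpdml wks lwage, inv(ed) pre(union) xfree(lwage){p_end}{txt}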
{dlgtab:Convergence Problems}
{pstd}xtdpdml sometimes has trouble converging to a solution. Here are some
things you can try when that happens.
{pstd}xtdpdml works best when panels are strongly balanced, T is small (e.g.
less than 10), and there is no missing data. If these conditions do not apply
to your data, consider doing the following.
{p 6 6 2} The {it:fiml} option will often help when some data are missing.
{p 6 6 2} Consider restricting your data to a smaller range of time periods
where most or all cases have complete data. See the example using the abdata
given below. Or, you might consider using only every kth year, e.g. 1980, 1985,
1990, ..., 2015. Using fewer variables in the model may also help.
{p 6 6 2} Consider rescaling variables, e.g. measure income in thousands of
dollars rather than in dollars. This can help with numerical precision problems.
The {opt std} option makes rescaling and standardizing variables easy,
although it may make coefficients a little harder to interpret. If {opt std}
solves a convergence problem then you may want to rescale the variables
yourself in a more interpretable way.
{p 6 6 2} Stata 14.2 changed the way start values are computed. Our
experience is that models using fiml tend to run far more quickly now.
However, sometimes the new start values actually make the models run
more slowly or cause convergence problems. If you are running Stata
14.2 or later, you can add the option {opt altstart}
to make Stata use the old starting values
methods. This is equivalent to specifying both {opt skipcfatransform} and
{opt skipconditional}, which can also be specified separately if you want.
{p 6 6 2} Mplus sometimes succeeds when Stata has problems and is often much
faster. Try the {opt mplus} option if you have access to the program. The {opt lavaan}
option may also be worth trying. We prefer Mplus to lavaan but Mplus costs money
while lavaan is free (since R and RStudio are free).
{p 6 6 2} Finally, remember that problems with regressing Y on lagged
Y may not be that severe when T and/or N is large. Methods like xtreg may
meet your needs in such situations. But even then, features like fiml
and time-invariant independent variables may make it worth your while
to pare your dataset down so you can do at least some analyses with
xtdpdml.
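{pstd}The earlier suggestion to restrict the data to fewer time periods can be sketched as follows, assuming the wages data from the Examples section. The cutoff of five waves is arbitrary, and {cmd:preserve} and {cmd:restore} bring back the full data afterward.{p_end}
{phang2}{cmd}preserve{p_end}
{phang2}keep if t <= 5{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union){p_end}
{phang2}restore{p_end}{txt}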
{pstd}There are several other options you can try if you are having problems
achieving convergence. Much of this advice applies to many programs,
not just xtdpdml.
{p 6 6 2} The {it:difficult} option will sometimes work miracles.
There is no guarantee it will work but it is very easy to try.
{p 6 6 2} The {it:technique} option can be specified to use different maximization
techniques. See the help for {help maximize}.
{p 6 6 2} {opt evars} sometimes helps with convergence when there are no
predetermined variables in the model. It is an alternative and usually
less efficient way of specifying the error terms. But sometimes it helps
and may be necessary for replicating results from earlier versions of
xtdpdml.
{p 6 6 2} The {it:iterate} option can be used to increase or decrease the number
of iterations xtdpdml tries before giving up. The {it:details} option will
show the iteration log. You can increase or decrease the number of iterations
depending on whether it appears the program is converging to a solution.
{marker examples}{...}
{title:Examples}
{pstd}Data setup. Data should be xtset first with both panel id and time variable specified.
Run these commands before trying the other examples. NOTE: Some of the examples also require
that {opt estout} (available from SSC) be installed to run the full example. You may also need
to specify {opt set matsize} for bigger problems.{p_end}
{phang2}{cmd}
use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{txt}
{pstd}Lag 1 for the y, strictly exogenous and predetermined variables,
and a time-invariant variable{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model) show{p_end}{txt}
{pstd}Same as above, writing out the equivalent sem code.{p_end}
{phang2}{cmd}preserve{p_end}
{phang2}keep wks lwage union ed id t{p_end}
{phang2}reshape wide wks lwage union, i(id) j(t){p_end}
{phang2}sem (wks2 <- wks1@b1 lwage1@b2 union1@b3 ed@b4 Alpha@1 E2@1 ) ///{p_end}
{phang2} (wks3 <- wks2@b1 lwage2@b2 union2@b3 ed@b4 Alpha@1 E3@1) ///{p_end}
{phang2} (wks4 <- wks3@b1 lwage3@b2 union3@b3 ed@b4 Alpha@1 E4@1) ///{p_end}
{phang2} (wks5 <- wks4@b1 lwage4@b2 union4@b3 ed@b4 Alpha@1 E5@1) ///{p_end}
{phang2} (wks6 <- wks5@b1 lwage5@b2 union5@b3 ed@b4 Alpha@1 E6@1) ///{p_end}
{phang2} (wks7 <- wks6@b1 lwage6@b2 union6@b3 ed@b4 Alpha@1), ///{p_end}
{phang2} var(e.wks2@0 e.wks3@0 e.wks4@0 e.wks5@0 e.wks6@0) var(Alpha) ///{p_end}
{phang2} cov(Alpha*(ed)@0) cov(Alpha*(E2 E3 E4 E5 E6)@0) /// {p_end}
{phang2} cov(_OEx*(E2 E3 E4 E5 E6)@0) cov(E2*(E3 E4 E5 E6)@0) ///{p_end}
{phang2} cov(E3*(E4 E5 E6)@0) cov(E4*(E5 E6)@0) cov(E5*(E6)@0) ///{p_end}
{phang2} cov(union3*(E2)) cov(union4*(E2 E3)) cov(union5*(E2 E3 E4)) ///{p_end}
{phang2} cov(union6*(E2 E3 E4 E5)) ///{p_end}
{phang2} iterate(250) technique(nr 25 bhhh 25) noxconditional{p_end}
{phang2}restore{p_end}{txt}
{pstd}Lags 0 and 1 of union are included as independent variables.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L(0 1).union) ti(Baseline Model + lag 0 of union){p_end}{txt}
{pstd}No lag on Xs{p_end}
{phang2}{cmd}xtdpdml wks lwage, inv(ed) pre(union) {p_end}{txt}
{pstd}No lagged ys included in the model{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ylag(0){p_end}{txt}
{pstd}xfree and yfree options -- the effects of all lagged Ys and Xs are free to vary across time.
This is how you allow for interactions with time.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model) {p_end}
{phang2}est store m1{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) yfree xfree ti(Baseline Model + yfree xfree){p_end}
{phang2}est store m2{p_end}
{phang2}lrtest m1 m2, stats{p_end}{txt}
{pstd}Postestimation commands. Many/most sem postestimation commands work with xtdpdml.
For some commands it may be necessary to specify the staywide option so the data set is
properly formatted. In the following examples we get several goodness of
fit measures. We also replay all the results using 99% confidence levels.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) {p_end}
{phang2}estat gof, stats(all){p_end}
{phang2}sem, l(99) nocnsr {p_end}{txt}
{pstd}Missing data. The fiml (Full Information Maximum Likelihood) option can be very effective
for dealing with data that are missing at random. It is generally much
easier to use fiml than it is to use multiple imputation. This example also shows how to use
the store option and {opt esttab} (part of the user-written estout package, if it is installed)
to present a table of results.{p_end}
{phang2}{cmd}* Results with no missing data -- provides a baseline for{p_end}
{phang2}* assessing how well fiml works.{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline with no missing data) sto(nomiss){p_end}
{phang2}* Now we artificially create missing data, since there is none. But{p_end}
{phang2}* normally you would not do this!{p_end}
{phang2}replace union = . if _n/10 == int(_n/10){p_end}
{phang2}* Without fiml, 60% of cases are lost and the estimates are quite a bit off.{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline with missing data, no fiml) sto(nofiml){p_end}
{phang2}* fiml used -- works extremely well, at least in this case{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml ti(Baseline with missing data, using fiml) sto(fiml){p_end}
{phang2}esttab nomiss_h nofiml_h fiml_h, z scalars(chi2_ms df_ms p_ms BIC AIC) mtitles(nomiss nofiml fiml){p_end}
{txt}
{pstd}Bollen and Brand (2010) replication. In their 2010 Social Forces paper,
Bollen and Brand present a series of panel models with random and fixed effects.
Many, perhaps all, of their models can be easily replicated with xtdpdml (although
hand-tweaking of the code may be required in a few cases). Sometimes xtdpdml yields
a model chi-square value that differs modestly from the one they reported, but we believe
the xtdpdml value is the correct one. Here we present fixed effects model 2 from their Table 3.{p_end}
{phang2}{cmd}* Bollen & Brand Social Forces 2010 Fixed Effects Table 3 Model 2 p. 15 {p_end}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/bollenbrand, clear {p_end}
{phang2}xtdpdml lnwg hchild marr div, ylag(0) fiml tfix errorinv gof {p_end}
{txt}
{pstd}Comparisons with xtabond -- The coefficients are similar, but the xtdpdml estimates tend to
be more statistically significant. Include time dummies in xtabond, since constants are
free to vary across time (by default) in xtdpdml. Alternatively, you
could leave the time dummies out of xtabond and use the constinv option with
xtdpdml. The tfix option is necessary because year is coded in calendar years rather
than as t = 1, 2, ..., T. If you have trouble getting xtdpdml to converge with your version
of Stata, try adding the {it:evars} or {it:altstart} option. The Arellano-Bond (A/B)
data are very unbalanced, so we restrict the analysis to a shorter time frame (1978-1982).
xtabond and xtdpdml report N differently, but both analyze the same data.
{p_end}
{phang2}{cmd}webuse abdata, clear{p_end}
{phang2}keep if year >=1978 & year <= 1982{p_end}
{phang2}xtabond n l(0/1).w l(0/2).(k ys) yr1976-yr1984, lags(2){p_end}
{phang2}estimates store xtabond{p_end}
{phang2}xtdpdml n l(0/1).w l(0/2).(k ys) , ylags(1 2) tfix ti(A/B data 1978 - 1982 Only){p_end}
{phang2}esttab xtabond xtdpdml_h, mtitles(xtabond xtdpdml){p_end}{txt}
{pstd}Create files for Mplus --
The following creates Mplus .dat and .inp files, although some editing of them may be necessary.
Files are written to the current working directory, so make sure that is
the directory you want. The example below creates mpl_m1.dat and
mpl_m1.inp, replacing any existing files with those names (because r is specified). The dryrun
option keeps Stata from actually estimating the model, which can be a good idea if you only
want the Mplus files. Because out(mod; sampstat) is specified, the Mplus output will
include modification indices and descriptive sample statistics. If you specify multiple
options for either analysis or output, separate them with semicolons.
{p_end}{cmd}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) dryrun ti(Baseline Model) mplus(m1, r out(mod; sampstat)){p_end}
{phang2}* View or edit the mplus .inp file if you want {p_end}
{phang2}doedit mpl_m1.inp {p_end}
{phang2}* Run Mplus if you want to. Mplus must be installed! {p_end}
{phang2}* The correct command may depend on your OS and your computer setup. {p_end}
{phang2}!mplus mpl_m1.inp {p_end}
{phang2}* View or edit the mplus output file if you want {p_end}
{phang2}doedit mpl_m1.out {p_end}
{txt}
{pstd}Create files for lavaan --
The following creates lavaan .R and .dta files, although some editing of them may be necessary.
Files are written to the current working directory, so make sure that is
the directory you want. The example below creates lav_m1.R and
lav_m1.dta, replacing any existing files with those names. The dryrun option keeps
Stata from actually estimating the model, which can be a good idea if you only
want the lavaan files. You will of course need to install R (and probably RStudio)
along with the lavaan package, and know how to run R, but that is fairly easy to do.
{p_end}{cmd}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) dryrun ti(Baseline Model) lavaan(m1, r){p_end}
{phang2}* View or edit the lavaan .R file if you want {p_end}
{phang2}doedit lav_m1.R {p_end}
{txt}
{pstd}Generate a sem do file -- You can output the generated sem commands
to a do file. This may be useful if you want to modify the commands in ways
not easily done with xtdpdml. In this example the file
mytry.do is created and (because the r option is specified)
any existing file by that name is
overwritten. The staywide option keeps the data in the wide format
that is required by sem. {p_end}{cmd}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) staywide semfile(mytry, r){p_end}
{txt}
{marker authors}{...}
{title:Authors}
{p 5 5}
Richard Williams, University of Notre Dame, Department of Sociology{break}
Paul Allison, University of Pennsylvania, Department of Sociology{break}
Enrique Moral Benito, Banco de Espana, Madrid {break}
Support: Richard.A.Williams.5@ND.Edu{break}
Web Page: {browse "https://www3.nd.edu/~rwilliam/dynamic/index.html"}{break}
{marker acknowledgments}{...}
{title:Acknowledgments}
{p 5 5} Ken Bollen and Jennie Brand graciously provided us with the
data from their 2010 Social Forces paper to use in our examples. UCLA
and Michael Mitchell kindly allowed us to take their stata2mplus
program and adapt it for our purposes. Code from Mead Over's linewrap
program was modified for use with the semfile option. William Lisowski
and Clyde Schechter provided comments that improved program coding.
Jacob Long gave us ideas that were very useful for writing the lavaan option.
Paul von Hippel offered helpful comments on the program's
documentation. Kristin MacDonald and other StataCorp staff were very
helpful in modifying Stata so that sem and xtdpdml would execute much
more quickly.
{marker references}{...}
{title:References}
{p 5 5} Allison, Paul D., Richard Williams and Enrique Moral-Benito. 2017. "Maximum
Likelihood for Cross-Lagged Panel Models with Fixed Effects." Socius 3: 1-17.
{browse "http://journals.sagepub.com/doi/suppl/10.1177/2378023117710578"} {break}
{p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2018.
"Linear Dynamic Panel-Data Estimation using Maximum Likelihood and
Structural Equation Modeling." The Stata Journal 18(2): 293-326. A pre-publication
version is at
{browse "https://www3.nd.edu/~rwilliam/dynamic/SJPaper.pdf"}{break}
{p 5 5}Moral-Benito, Enrique, Paul D. Allison and Richard Williams. 2018. "Dynamic Panel
Data Modelling Using Maximum Likelihood: An Alternative to Arellano-Bond."
Applied Economics, DOI: 10.1080/00036846.2018.1540854. A pre-publication version is at
{browse "https://www3.nd.edu/~rwilliam/dynamic/Benito_Allison_Williams.pdf"}{break}
{p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2015.
"Linear Dynamic Panel-Data Estimation using Maximum Likelihood and
Structural Equation Modeling." Presented July 30, 2015 at the 2015 Stata
Users Conference in Columbus, Ohio.
{browse "https://www3.nd.edu/~rwilliam/dynamic/xtdpdml_Stata2015.pdf"}{break}
{p 5 5}Allison, Paul D. 2015. "Don't Put Lagged Dependent Variables in Mixed Models."
{browse "http://statisticalhorizons.com/lagged-dependent-variables"} {break}
{p 5 5}Moral-Benito, Enrique. 2013. "Likelihood-based Estimation of
Dynamic Panels with Predetermined Regressors." Journal of Business and
Economic Statistics 31(4): 451-472.
{p 5 5}Bollen, Kenneth, and Jennie Brand. 2010. "A General Panel Model with Random
and Fixed Effects: A Structural Equations Approach." Social Forces 89(1): 1-34.
{marker "suggested citation"}{...}
{title:Suggested citations if using {cmd:xtdpdml} in published work }
{p 5 5}{cmd:xtdpdml} is not an official Stata command. It is a free
contribution to the research community, like a paper. Please cite it
as such. The suggested citations are
{p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2018.
"Linear Dynamic Panel-Data Estimation using Maximum Likelihood and
Structural Equation Modeling." The Stata Journal 18(2): 293-326. A pre-publication
version is at
{browse "https://www3.nd.edu/~rwilliam/dynamic/SJPaper.pdf"}{break}
{p 5 5} Allison, Paul D., Richard Williams and Enrique Moral-Benito. 2017. "Maximum
Likelihood for Cross-Lagged Panel Models with Fixed Effects." Socius 3: 1-17.
{browse "http://journals.sagepub.com/doi/suppl/10.1177/2378023117710578"} {break}
{p 5 5}Moral-Benito, Enrique, Paul D. Allison and Richard Williams. 2018. "Dynamic Panel
Data Modelling Using Maximum Likelihood: An Alternative to Arellano-Bond."
Applied Economics, DOI: 10.1080/00036846.2018.1540854. A pre-publication version is at
{browse "https://www3.nd.edu/~rwilliam/dynamic/Benito_Allison_Williams.pdf"}{break}
{p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2015.
"Linear Dynamic Panel-Data Estimation using Maximum Likelihood and
Structural Equation Modeling." Presented July 30, 2015 at the 2015 Stata
Users Conference in Columbus, Ohio.
{browse "https://www3.nd.edu/~rwilliam/dynamic/xtdpdml_Stata2015.pdf"}{break}