{smcl}
{* *! version 2.50 05jul2019}{...}
{hline}
help for {hi:xtdpdml} version 2.50
{hline}
{title:Dynamic Panel Data Models using Maximum Likelihood}
{marker syntax}{...}
{title:Syntax}
{p 8 16 2}
{opt xtdpdml} y [time-varying strictly exogenous vars] [{cmd:,} {it:inv(time-invariant exogenous vars)} {it:pre(predetermined vars)} {it:other_options}]
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Independent variables (other than strictly exogenous)}
{synopt :{opt inv(varlist)}}Time-invariant exogenous variables, e.g. year of birth{p_end}
{synopt :{opt pre:det(varlist)}}Time-varying predetermined (sequentially exogenous) variables{p_end}
{synopt :{opt ylag:s(numlist)}}Specifies lagged values of y to be included in the model. Default is lag 1.{p_end}
{syntab:Dataset options}
{synopt :{opt wide}}Data are already in wide format (default is long format with xtset preceding the command){p_end}
{synopt :{opt stayw:ide}}Keep data in wide format after execution. May help with some sem post-estimation commands, e.g. predict.{p_end}
{synopt :{opt tfix}}Recode time variable to equal 1, 2, ..., T (number of waves). Set delta = 1.{p_end}
{synopt :{opt std}}Standardize all variables in the model to have mean 0 and variance 1 (in long format){p_end}
{synopt :{opt std(varlist)}}Standardize specified variables to have mean 0 and variance 1{p_end}
{syntab:Model Specification and Constraints Options}
{synopt :{opt evars}}When there are no predetermined variables in the model this sometimes helps with convergence{p_end}
{synopt :{opt alphafree}}Allow Alpha (fixed) effects to vary across time{p_end}
{synopt :{opt xfree}}All x effects free to vary across time{p_end}
{synopt :{opt xfree(varlist)}}x effects of specified variables free to vary across time{p_end}
{synopt :{opt yfree}}All lagged y effects free to vary across time{p_end}
{synopt :{opt yfree(numlist)}}Effects of specified lagged ys free to vary across time{p_end}
{synopt :{opt constinv}}Constrains constants to be equal across waves. Alias for {it:nocsd}{p_end}
{synopt :{opt nocsd}}Cross-sectional dependence is NOT allowed. Alias for {it:constinv}{p_end}
{synopt :{opt errorinv}}Constrains error variances to be equal across waves. May cause convergence problems{p_end}
{synopt :{opt re}}Random Effects Model (Alpha uncorrelated with Xs){p_end}
{syntab:Reporting}
{synopt :{opt ti:tle(string)}}Gives a title to the analysis, e.g. {it: ti(Baseline Model)}{p_end}
{synopt :{opt detail:s}}Shows all the sem output + highlights. Otherwise you only get highlights.{p_end}
{synopt :{opt show:cmd}}Show the sem command generated by xtdpdml{p_end}
{synopt :{opt gof}}Report several goodness of fit measures{p_end}
{synopt :{opt tsoff}}Do not use time-series notation in the highlights output{p_end}
{synopt :{it:{help estimation options##display_options:display_options}}}Assorted display options, e.g. noci, cformat(%8.3f){p_end}
INCLUDE help shortdes-coeflegend
{synopt :{opt dec:imals(integer)}}Specifies the number of decimal places to display for the coefficients, SEs and CIs.{p_end}
{syntab:Other options}
{synopt :{opt mp:lus(fname, opts)}}Create Mplus input and data files. File may need some editing before running.{p_end}
{synopt :{opt lav:aan(fname, opts)}}Create Lavaan input and data files.
File may need some editing before running.{p_end}
{synopt :{opt semf:ile(fname, r)}}Create a do file with the generated sem commands{p_end}
{synopt :{opt sto:re(stub)}}Stores the full & highlights-only results under the names stub_f and stub_h{p_end}
{synopt :{opt dry:run}}Do not actually estimate the model.{p_end}
{synopt :{opt iter:ate(#)}}Maximum number of iterations allowed. Default is 250.{p_end}
{synopt :{opt tech:nique(options)}}Estimation technique used. Default is {it: nr 25 bhhh 25} unless {opt method(adf)} is specified.{p_end}
{synopt :{opt semopts(options)}}Additional sem options to be included in the generated sem command.{p_end}
{synopt :{opt fiml}}Full Information Maximum Likelihood is used for missing data.{p_end}
{synopt :{opt v12}}Lets xtdpdml run under Stata 12.1. Probably ok but use at own risk.{p_end}
{synopt :{opt skipcfa:transform}}Changes the way start values are computed in Stata 14.2 and later.{p_end}
{synopt :{opt skipcond:itional}}Changes the way start values are computed in Stata 14.2 and later.{p_end}
{synopt :{opt altst:art}}Convenient way to specify both {it: skipcfatransform} and {it: skipconditional}{p_end}
{synopt :{opt meth:od(method)}}Methods supported by {it: sem}, e.g. ml, mlmv, adf{p_end}
{synopt :{opt vce(vcetype)}}vcetypes supported by sem, e.g. oim, robust.{p_end}
{synopt :{it:{help maximize:maximize_options}}}Control the maximization process; seldom used{p_end}
{synoptline}
{p 4 6 2}
Factor variable notation is NOT supported.{p_end}
{p 4 6 2}
{it:Strictly exogenous} and {it:predetermined} variables may contain time-series operators; see {help tsvarlist}.{p_end}
{p 4 6 2}
Many/most sem postestimation commands will work after xtdpdml. See {manhelp sem_postestimation R:sem postestimation} for features available after estimation. You may need to use {it:staywide} to get some options to work.{p_end}
{marker description}{...}
{title:Description}
{pstd}
{cmd:xtdpdml} fits Dynamic Panel Data Models using Maximum Likelihood. It basically works as a shell for {it:sem}, generating the necessary {it:sem} commands. It can also generate code for running these models in Mplus. It tends to work best when panels are strongly balanced, T is relatively small (e.g. less than 10), and there is no missing data. See the section on Special Topics below for suggestions on what to do if your data do not meet these criteria.
{pstd}
Panel data make it possible both to control for unobserved confounders and to include lagged, endogenous regressors. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been solved by using lagged instrumental variables together with the generalized method of moments (GMM). In Stata, commands such as xtabond and xtdpdsys have been used for these models.
{pstd}
xtdpdml addresses the same problems via maximum likelihood estimation implemented with Stata's structural equation modeling (sem) command. The ML (sem) method is substantially more efficient than the GMM method when the normality assumption is met and suffers less from finite sample biases. xtdpdml simplifies the SEM model specification process; makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models; unlike most related methods, allows for the inclusion of time-invariant variables in the model; and takes advantage of Stata's ability to use full information maximum likelihood (FIML) for dealing with missing data. xtdpdml also provides an overall goodness of fit measure by default and provides access to others via the sem postestimation command {cmd:estat gof, stats(all)}. Many other sem postestimation commands can be used as well. Since xtdpdml is a shell for sem, you should use the {cmd:sem} command if you want to replay the full results and {cmd:xtdpdml} to replay the highlights-only results.
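{pstd}
For example, here is a minimal run (an illustrative sketch that uses the wages data described in the Examples section below; see the Examples for more elaborate models):{p_end}
{phang2}{cmd:use https://www3.nd.edu/~rwilliam/statafiles/wages, clear}{p_end}
{phang2}{cmd:xtset id t}{p_end}
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union)}{p_end}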
{pstd}
{it:Data should be xtset with both the panel id and time variable specified.} The time variable should be coded t = 1, 2, 3, ..., T, and delta (the period between observations) should equal 1. Other values for t (e.g. years, or starting at 0, or skipped values of t) will likely produce error messages or incorrect results. If necessary, recode the time variable before running xtdpdml. Or, you can use the {it:tfix} option and let xtdpdml recode the time variable for you (but you can still get errors if, say, delta was not specified correctly in the source data set, e.g. data were collected every two years and delta was set to 1). The model assumes that time intervals are equally spaced.
{p 6 6 2}
Note: unless you specify {opt fiml}, panels with missing data will be deleted from any intermediate data files xtdpdml creates and from data files created by the {opt mplus} or {opt lavaan} options.
{p 6 6 2}
Note: unless you specify {opt staywide}, your original data are always restored after xtdpdml execution. If you do specify {opt staywide}, be careful if you then save the data file. You don't want to overwrite a file you want to keep.
{pstd}
{it:All variable names should start with lowercase letters.} As the Stata sem manual points out, "In the command language, variables are assumed to be observed if they are typed in lowercase and are assumed to be latent if the first letter is capitalized. Variable educ is observed, while variable Knowledge or KNOWLEDGE is latent. If the observed variables in your dataset have uppercase names, type {cmd:rename all, lower} to convert them to lowercase."
{pstd}
By default, most effects (with the exceptions of the constants and error variances) are constrained to be equal across waves, making it possible to present only a single set of parameter estimates for each variable in the model. These constraints can be relaxed via options such as {it:xfree}, {it:yfree} and {it:alphafree}.
{pstd}
The models include a latent variable ALPHA that reflects the fixed effects that are common to all time periods. By default, the coefficient of ALPHA is constrained to have a value of 1.0 at each time period. The alphafree option can be used to allow the effects of ALPHA to vary across waves. Also by default, ALPHA freely covaries with the time-varying exogenous variables. If {it:re} is specified, a random effects model is estimated where ALPHA is uncorrelated with all of the X variables.
{pstd}
There are FOUR types of independent variables that can be specified. There is considerable flexibility in specifying which lagged values of variables (if any) should be included in the model, e.g. no lags or heterogeneous lags can be specified.
{p 6 6 2}
The lag 1 value of y (e.g. L1.y) is included by default. This can be changed with the {it:ylag} option.
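{p 6 6 2}
For example (an illustrative sketch using the wages data shown in the Examples below), {it:ylag(1 2)} includes lags 1 and 2 of the dependent variable:
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union) ylag(1 2)}{p_end}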
{p 6 6 2}
Strictly exogenous variables are those that (by assumption) are uncorrelated with the error terms at all points in time. Equivalently, we assume that they are not affected by prior values of the dependent variable. These variables are specified on the left side of the comma, before the options. Time series notation can be used, e.g. {it: xtdpdml y L1.wages L2.wages} would include the first and second lagged values of wages as independent variables.
{p 6 6 2}
Predetermined variables, also known as sequentially exogenous, are variables that can be affected by prior values of the dependent variable. Time series notation can be used. These are specified with the {it:pre} option.
{p 6 6 2}
Time-invariant exogenous variables are variables whose values are constant across time, such as year of birth. You of course DO NOT use time series notation with these. The ability to use time-invariant exogenous variables in the model is one of the key advantages of the sem approach. These are specified with the {it:inv} option. These variables are assumed to be uncorrelated with ALPHA.
{marker options}{...}
{title:Options}
{dlgtab:Independent variables (other than strictly exogenous)}
{phang}
{opt inv(varlist)} Time-invariant exogenous variables, e.g. year of birth.{p_end}
{phang}
{opt predet(varlist)} Predetermined variables, also known as sequentially exogenous. Predetermined variables can be affected by prior values of the dependent variable. Time series notation can be used.{p_end}
{phang}
{opt ylag(numlist)} By default the lag 1 value of y is included as an independent variable. Different or multiple lags can be specified, e.g. ylag(1 2) would include lags 1 and 2 of y. ylag(0) will cause no lagged value of y to be included in the model.{p_end}
{dlgtab:Dataset Options}
{phang}
{opt wide} By default, data are assumed to be xtset long with both time and panel id variables specified. The data set is temporarily converted to wide format for use with sem. If data are already in wide format use the {it:wide} option. However, note that the file must have been created by a reshape wide command, using a file that is in long format and that was xtset, or else it won't have information that xtdpdml needs. Use of this option is generally discouraged.{p_end}
{phang}
{opt staywide} This will keep the data in wide format after running xtdpdml. This may be necessary if you want to use post-estimation commands like predict. If you use staywide be careful you don't accidentally save the wide .dta file and overwrite a file you want to keep!{p_end}
{phang}
{opt tfix} Time should be coded t = 1, 2, ..., T where T = number of waves. By default, units like years (e.g. 1990, 1991) will cause errors or incorrect results. There will also be errors or incorrect results if delta does not equal 1, e.g. t = 1, 3, 5. The tfix option will recode time to equal 1, 2, ..., T and set delta = 1. You can still have problems though if delta was not specified correctly in the source data set or if interval width is not consistent. It is safest if you correctly code time yourself but tfix should work in most cases.{p_end}
{phang}
{opt std} Standardizes all the variables in the model to have mean 0 and variance 1. It does this while the data set is still in long format. You probably will not want to use this option in most cases but it can sometimes help when the model is having trouble converging.{p_end}
{phang}
{opt std(varlist)} Standardizes only the selected variables to have mean 0 and variance 1. Does not work if the {opt wide} option has been specified. Do NOT use time series notation; just list the names of the variables you want standardized.{p_end}
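{pmore}
For example, to standardize just the time-varying variables in the wages models shown under Examples (an illustrative sketch; std() is used here purely for demonstration):{p_end}
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union) std(wks lwage union)}{p_end}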
{dlgtab:Model Specification and Constraints Options}
{phang}
{opt evars} sometimes helps with convergence when there are no predetermined variables in the model. It is an alternative and usually less efficient way of specifying the error terms. But sometimes it helps and may be necessary for replicating results from earlier versions of the program.{p_end}
{phang}
{opt alphafree} lets the Alpha (fixed) effects differ across time. Note that, if this option is used, Alpha will be normalized by fixing its variance at 1; otherwise the model sometimes has convergence problems.{p_end}
{phang}
{opt xfree} lets the effects of all the independent variables (except lagged y) freely differ across time.{p_end}
{phang}
{opt xfree(varlist)} lets the effects of the specified independent variables freely differ across time.{p_end}
{phang}
{opt yfree} lets all lagged y effects freely differ across time.{p_end}
{phang}
{opt yfree(numlist)} allows the specified lagged y effects to freely differ across time.{p_end}
{phang}
{opt nocsd} (alias is {opt constinv}) Cross-sectional dependence is NOT allowed, i.e. constants are constrained to be equal across waves. This is equivalent to no effect of time. This option sometimes causes convergence problems.{p_end}
{phang}
{opt errorinv} constrains error variances to be equal across waves. May cause convergence problems.{p_end}
{phang}
{opt re} Random Effects Model (Alphas uncorrelated with Xs){p_end}
{dlgtab:Reporting Options}
{phang}
{opt title(string)} Gives a title to the analysis. This title will appear in both the highlights results and (if requested) the Mplus and lavaan code. For example, {it:ti(Baseline Model)}{p_end}
{phang}
{opt details} This will show all the output generated by the sem command. Otherwise only a highlights version is presented. This can be useful if you want to make sure the model specification is correct or if you want information not contained in the highlights.{p_end}
{phang}
{opt showcmd} This will show the sem command generated by xtdpdml. This can be useful to make sure the estimated model is what you wanted.{p_end}
{phang}
{opt gof} Reports several goodness of fit measures after model estimation. It has the same effect as running the sem postestimation command {cmd:estat gof, stats(all)} after xtdpdml.{p_end}
{phang}
{opt tsoff} By default, when possible the highlights output produced by xtdpdml will use time-series notation similar to what you see with commands like xtabond, e.g. L3.xvar will represent the lag 3 value of xvar. Since the data are reshaped wide, this is not the same as the name of the variable that was actually used, e.g. it might be that L3.xvar corresponds to xvar2. tsoff will turn off the use of time series notation in the highlights printout and show the names of the variables actually used in the reshaped wide data. This may be useful if you are going to hand-modify the code generated by xtdpdml.{p_end}
INCLUDE help displayopts_list
{phang}
{opt coeflegend} Display the legend instead of the statistics. This can be useful if, say, you are trying to use post-estimation test commands to test hypotheses about effects.{p_end}
{phang}
{opt decimals(integer)} specifies the number of decimal places to display for the coefficients, standard errors, and confidence limits. It is a shorthand way of specifying {opt cformat}, e.g. {opt dec(3)} is the same as specifying {opt cformat(%9.3f)}. You will get an error if you specify both {opt dec} and {opt cformat}. The value specified must range between 0 and 8; 3 is often a good choice for making the output easier to read.{p_end}
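{pmore}
For example (an illustrative sketch using the wages data from the Examples section), the following requests a title, the extra goodness of fit measures, and three decimal places:{p_end}
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model) gof dec(3)}{p_end}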
{dlgtab:Other Options}
{phang}
{opt mplus(filenamestub, mplus options)} This will create inp (mplus commands) and data files that can be used by Mplus (has only been tested with Mplus 7.4). This is adapted (with permission) from UCLA's and Michael Mitchell's stata2mplus command but does not require that it be installed. The filenamestub must be specified; it will be used to name the Mplus .inp and .dat files. Everything else is optional. Options {opt r:eplace}, {opt mi:ssing(#)}, {opt a:nalysis}, and {opt out:put} are supported. {opt replace} will cause existing .inp and .dat files to be overwritten. {opt missing} specifies the missing value for all variables; default is -9999. {opt analysis} and {opt output} specify options to be passed to the Mplus analysis and output options. As is the case in Mplus, multiple analysis and output options should be separated by semicolons. {opt xtdpdml} cannot check your Mplus syntax so be careful.
{phang}
So, for example, if the user specified {cmd:mplus(myfile, r missing(-999999) analysis(iterations = 2000) out(mod(3.84); sampstat))}, mpl_myfile.inp and mpl_myfile.dat would be created (replacing any existing files by those names). All missing values would be set to -999999. The Mplus analysis option would set iterations equal to 2000 (default is 1,000). The output option (note how ; was used to separate the two options requested) would request that modification indices > 3.84 be printed out and that sample statistics be included in the output. Obviously you need to understand Mplus to use the analysis and output options; if you don't use them the default values will probably meet most of your needs. You can, of course, edit the .inp file on your own before running Mplus.
{phang}
Include the {opt dryrun} option if you only want the mplus code. Keep in mind that Mplus only shows the first 8 characters of variable names; also since data are reshaped wide the names of time-varying variables should be 7 characters or less (or 6 characters or less if T > 10) if you want to see the full variable name in the output. Some editing of the .inp file may be required first, e.g. variable names may need to be shortened and/or long lines may have to be split. Like Stata, the Mplus code will default to listwise deletion unless {opt fiml} is specified. Most xtdpdml model specification options are supported but the user should still check the coding, e.g. options like {opt semopts(whatever)} will not be carried over into the mplus code.{p_end}
{phang}
{opt lavaan(filenamestub, r)} This will create R (lavaan commands) and Stata dta files that can be used by R's lavaan package. The filenamestub must be specified; it will be used to name the lavaan .R and .dta files. {opt replace} will cause existing .R and .dta files by those names to be overwritten. You of course need to have R installed and know how to use it. You may want to edit the generated code if you want to change or add options. So, for example, if the user specified {cmd:lav(myfile, r)}, lav_myfile.R and lav_myfile.dta would be created (replacing any existing files by those names).
{phang}
{opt semfile(filename, r)} The generated sem commands will be output to a file called filename.do. The r option can be specified to replace an existing do file by that name. This is useful if you want to try to modify the sem commands in ways that are not easily done with xtdpdml. You may wish to also specify the {opt staywide} option so that data remain correctly formatted for use with the generated do file.{p_end}
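{pmore}
For example (an illustrative sketch, essentially the same as the semfile example near the end of the Examples section; {opt dryrun} skips estimation and {opt staywide} leaves the data in the wide format the generated do file expects):{p_end}
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union) dryrun staywide semfile(mytry, r)}{p_end}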
{phang}
{opt store(stubname)} {it: xtdpdml} generates two sets of results: the full results, generated by sem, and a highlights-only set of results which can be used with programs like {it: esttab}. The stored results have the names stubname_f and stubname_h, e.g. if you specify {it: store(model1)} the results will be stored as model1_f and model1_h. The default stubname is xtdpdml, so after running {it: xtdpdml} without the {it: store} option you should have stored results xtdpdml_f and xtdpdml_h. You shouldn't try to do any post-estimation commands with the highlights version (e.g. predict, margins) because necessary information may not be stored in the file; use the full version instead.{p_end}
{phang}
{opt dryrun} This will keep sem from actually being executed. This will catch some errors immediately and can be useful if you want to see the sem command that is generated and/or wish to specify {it:staywide} to reformat the data from long to wide and/or just want to generate mplus or lavaan code and data files. This will often be combined with the {it:showcmd}, {it:mplus}, {it:lavaan}, {it:semfile}, or {it:staywide} options.{p_end}
{phang}
{opt iterate(#)} Maximum number of iterations allowed. Default is 250. You can increase this number and/or change the maximization technique if the model is having trouble converging.{p_end}
{phang}
{opt technique(methods)} Maximization techniques used. Default is {it:technique(nr 25 bhhh 25)} unless {opt method(adf)} is specified. You can change this if the model is having trouble converging. If you use {opt method(adf)} (asymptotic distribution free) the default technique is set to {it:technique(nr 25 bfgs 10)} since adf and the bhhh technique do not seem to work together. See {help maximize} for details as well as for information on other options that can be used, e.g. {it:difficult}.{p_end}
{phang}
{opt semopts(options)} Other options allowed by sem will be included in the generated sem command. See, for example, {help sem_reporting_options}.{p_end}
{phang}
{opt fiml} Full Information Maximum Likelihood is used for missing data. This is the equivalent of specifying method(mlmv) on the sem command. Use of fiml sometimes dramatically slows down execution so be patient if you use it! Unless you specify fiml, intermediate data files created by xtdpdml will have panels with missing data deleted.{p_end}
{phang}
{opt skipcfatransform} and {opt skipconditional} Stata 14.2 changed the way start values are computed. Usually the new procedures work better, especially when fiml is used, but sometimes the old start values speed up execution and/or are better for getting models to converge. These options are ignored in Stata 14.1 or earlier.{p_end}
{phang}
{opt altstart} is a convenient way to specify both {opt skipcfatransform} and {opt skipconditional}.{p_end}
{phang}
{opt method()} and {opt vce()} specify the method used to obtain parameter estimates and the technique used to obtain the variance-covariance matrix of the estimates. See {it:{help sem_option_method}}. {opt fiml} is another way of specifying {opt method(mlmv)}. Be patient if you specify something like {opt vce(bootstrap)} as it may take a very long time to run.{p_end}
{phang}
{opt v12} xtdpdml was written and tested using Stata 13 and 14. The v12 option will also cause it to run under Stata 12.1. This has not been extensively tested so use at your own risk.{p_end}
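{pmore}
For example (an illustrative sketch using the wages data from the Examples; the iteration limit shown is arbitrary), the following requests robust standard errors and a higher iteration limit:{p_end}
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union) vce(robust) iterate(500)}{p_end}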
{marker "Special Topics"}{...}
{title:Special Topics}
{dlgtab:Interactions with Time}
{pstd}Users sometimes want constants and variable effects to differ across time. xtdpdml can do this but, because data are reshaped wide, the procedure is different than it is with other programs.
{pstd}By default, xtdpdml lets the constants differ across time periods. In other programs this would be like including i.time in the model. The {opt constinv} or {opt nocsd} options can be specified if the user wants the constants to be invariant across time. Note that using these options will sometimes cause convergence problems.
{pstd}In other situations the user might want interactions with time where the effect of a variable is free to differ across time periods. In other programs this might be accomplished by specifying something like i.time#c.ses. With xtdpdml you use the free options instead, e.g. {opt xfree(ses)} will allow the effect of ses to differ at each time period.
{dlgtab:Convergence Problems}
{pstd}xtdpdml sometimes has trouble converging to a solution. Here are some things you can try when that happens.
{pstd}xtdpdml works best when panels are strongly balanced, T is small (e.g. less than 10), and there is no missing data. If these conditions do not apply to your data, consider doing the following.
{p 6 6 2}
The {it:fiml} option will often help when some data are missing.
{p 6 6 2}
Consider restricting your data to a smaller range of time periods where most or all cases have complete data. See the example using the abdata given below. Or, you might consider using only every kth year, e.g. 1980, 1985, 1990, ..., 2015. Using fewer variables in the model may also help.
{p 6 6 2}
Consider rescaling variables, e.g. measure income in thousands of dollars rather than in dollars. This can help with numerical precision problems. The {opt std} option makes rescaling and standardizing variables easy, although it may make coefficients a little harder to interpret. If {opt std} solves a convergence problem then you may want to rescale the variables yourself in a more interpretable way.
{p 6 6 2}
Stata 14.2 changed the way start values are computed. Our experience is that models using fiml tend to run far more quickly now. However, sometimes the new start values actually make the models run more slowly or cause convergence problems. If you are running Stata 14.2 or later, you can add the option {opt altstart} to make Stata use the old starting values methods. This is equivalent to specifying both {opt skipcfatransform} and {opt skipconditional}, which can also be specified separately if you want.
{p 6 6 2}
Mplus sometimes succeeds when Stata has problems and is often much faster. Try the {opt mplus} option if you have access to the program. The {opt lavaan} option may also be worth trying. We prefer Mplus to lavaan but Mplus costs money while lavaan is free (since R and RStudio are free).
{p 6 6 2}
Finally, remember that problems with regressing Y on lagged Y may not be that severe when T and/or N is large. Methods like xtreg may meet your needs in such situations. But even then, features like fiml and time-invariant independent variables may make it worth your while to pare your dataset down so you can do at least some analyses with xtdpdml.
{pstd}There are several other options you can try if you are having problems achieving convergence. Much of this advice applies to many programs, not just xtdpdml. A combined sketch follows the list.
{p 6 6 2}
The {it:difficult} option will sometimes work miracles. There is no guarantee it will work but it is very easy to try.
{p 6 6 2}
The {it:technique} option can be specified to use different maximization techniques. See the help for {help maximize}.
{p 6 6 2}
{opt evars} sometimes helps with convergence when there are no predetermined variables in the model. It is an alternative and usually less efficient way of specifying the error terms. But sometimes it helps and may be necessary for replicating results from earlier versions of xtdpdml.
{p 6 6 2}
The {it:iterate} option can be used to increase or decrease the number of iterations xtdpdml tries before giving up. The {it:details} option will show the iteration log. You can increase or decrease the number of iterations depending on whether it appears the program is converging to a solution.
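{pstd}Putting several of these suggestions together (an illustrative sketch only, using the wages data from the Examples; which options help, if any, is model- and data-specific, and the iteration limit is arbitrary):{p_end}
{phang2}{cmd:xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml altstart difficult iterate(500)}{p_end}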
{marker examples}{...}
{title:Examples}
{pstd}Data setup. Data should be xtset first with both panel id and time variable specified. Run these commands before trying the other examples. NOTE: Some of the examples also require that {opt estout} (available from SSC) be installed to run the full example. You may also need to specify {opt set matsize} for bigger problems.{p_end}
{phang2}{cmd} use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{txt}
{pstd}Lag 1 for the y, strictly exogenous, and predetermined variables, and a time-invariant variable{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model) show{p_end}{txt}
{pstd}Same as above, writing out the equivalent sem code.{p_end}
{phang2}{cmd}preserve{p_end}
{phang2}keep wks lwage union ed id t{p_end}
{phang2}reshape wide wks lwage union, i(id) j(t){p_end}
{phang2}sem (wks2 <- wks1@b1 lwage1@b2 union1@b3 ed@b4 Alpha@1 E2@1 ) ///{p_end}
{phang2} (wks3 <- wks2@b1 lwage2@b2 union2@b3 ed@b4 Alpha@1 E3@1) ///{p_end}
{phang2} (wks4 <- wks3@b1 lwage3@b2 union3@b3 ed@b4 Alpha@1 E4@1) ///{p_end}
{phang2} (wks5 <- wks4@b1 lwage4@b2 union4@b3 ed@b4 Alpha@1 E5@1) ///{p_end}
{phang2} (wks6 <- wks5@b1 lwage5@b2 union5@b3 ed@b4 Alpha@1 E6@1) ///{p_end}
{phang2} (wks7 <- wks6@b1 lwage6@b2 union6@b3 ed@b4 Alpha@1), ///{p_end}
{phang2} var(e.wks2@0 e.wks3@0 e.wks4@0 e.wks5@0 e.wks6@0) var(Alpha) ///{p_end}
{phang2} cov(Alpha*(ed)@0) cov(Alpha*(E2 E3 E4 E5 E6)@0) /// {p_end}
{phang2} cov(_OEx*(E2 E3 E4 E5 E6)@0) cov(E2*(E3 E4 E5 E6)@0) ///{p_end}
{phang2} cov(E3*(E4 E5 E6)@0) cov(E4*(E5 E6)@0) cov(E5*(E6)@0) ///{p_end}
{phang2} cov(union3*(E2)) cov(union4*(E2 E3)) cov(union5*(E2 E3 E4)) ///{p_end}
{phang2} cov(union6*(E2 E3 E4 E5)) ///{p_end}
{phang2} iterate(250) technique(nr 25 bhhh 25) noxconditional{p_end}
{phang2}restore{p_end}{txt}
{pstd}Lags 0 and 1 of union are included as independent variables.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L(0 1).union) ti(Baseline Model + lag 0 of union){p_end}{txt}
{pstd}No lag on Xs{p_end}
{phang2}{cmd}xtdpdml wks lwage, inv(ed) pre(union){p_end}{txt}
{pstd}No lagged ys included in the model{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ylag(0){p_end}{txt}
{pstd}xfree and yfree options -- All lagged Y and X effects free to vary across time. This is how you allow for interactions with time.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model){p_end}
{phang2}est store m1{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) yfree xfree ti(Baseline Model + yfree xfree){p_end}
{phang2}est store m2{p_end}
{phang2}lrtest m1 m2, stats{p_end}{txt}
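{pstd}Random effects model -- the {opt re} option makes ALPHA uncorrelated with the Xs (an illustrative sketch; compare it with the baseline fixed effects model above).{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) re ti(Random Effects Model){p_end}{txt}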
{pstd}Postestimation commands. Many/most sem postestimation commands work with xtdpdml. For some commands it may be necessary to specify the staywide option so the data set is properly formatted. In the following examples we get several goodness of fit measures. We also replay all the results using 99% confidence levels.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union){p_end}
{phang2}estat gof, stats(all){p_end}
{phang2}sem, l(99) nocnsr{p_end}{txt}
{pstd}Missing data. The fiml (Full Information Maximum Likelihood) option can be very effective for dealing with data that are missing on a random basis. It is generally much easier to use fiml than it is to use multiple imputation. This example also shows how to use the store option and {opt esttab} (if it is installed) to present a table of results.{p_end}
{phang2}{cmd}* Results with no missing data -- provides a baseline for{p_end}
{phang2}* assessing how well fiml works.{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline with no missing data) sto(nomiss){p_end}
{phang2}* Now we randomly create MD since there is none. But normally you{p_end}
{phang2}* would not do this!{p_end}
{phang2}replace union = . if _n/10 == int(_n/10){p_end}
{phang2}* fiml not used -- 60% of cases lost, estimates are quite a bit off.{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline with missing data, no fiml) sto(nofiml){p_end}
{phang2}* fiml used -- works extremely well, at least in this case{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml ti(Baseline with missing data, using fiml) sto(fiml){p_end}
{phang2}esttab nomiss_h nofiml_h fiml_h, z scalars(chi2_ms df_ms p_ms BIC AIC) mtitles(nomiss nofiml fiml){p_end}
{txt}
{pstd}Bollen and Brand (2010) replication. In their 2010 Social Forces paper, Bollen and Brand present a series of panel models with random and fixed effects. Many, perhaps all, of their models can be easily replicated with xtdpdml (although hand tweaking of the code may be required in a few cases). Sometimes xtdpdml yields a modestly different model chi-square value than what they reported but we believe the xtdpdml value is the correct one. Here we present the fixed effects model 2 from their Table 3.{p_end}
{phang2}{cmd}* Bollen & Brand Social Forces 2010 Fixed Effects Table 3 Model 2 p. 15{p_end}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/bollenbrand, clear{p_end}
{phang2}xtdpdml lnwg hchild marr div, ylag(0) fiml tfix errorinv gof{p_end}
{txt}
{pstd}Comparisons with xtabond -- coefficients similar, xtdpdml tends to be more significant. Include time dummies in xtabond since constants are free to vary across time (by default) in xtdpdml. Alternatively you could leave the time dummies out of xtabond and use the constinv option with xtdpdml. The tfix option is necessary since year is coded in years rather than t = 1, 2, ..., T. If you have trouble getting xtdpdml to converge with your version of Stata try adding the {it:evars} or {it:altstart} option. The A/B data are very unbalanced so we restrict the analysis to a shorter time frame. xtabond and xtdpdml report N differently but the same data are analyzed by both.
{p_end}
{phang2}{cmd}webuse abdata, clear{p_end}
{phang2}keep if year >=1978 & year <= 1982{p_end}
{phang2}xtabond n l(0/1).w l(0/2).(k ys) yr1976-yr1984, lags(2){p_end}
{phang2}estimates store xtabond{p_end}
{phang2}xtdpdml n l(0/1).w l(0/2).(k ys), ylags(1 2) tfix ti(A/B data 1978 - 1982 Only){p_end}
{phang2}esttab xtabond xtdpdml_h, mtitles(xtabond xtdpdml){p_end}{txt}
{pstd}Create files for Mplus -- This will create Mplus .dat and .inp files but some editing may be necessary. Files are written to the current directory so make sure it is writing to the directory you want. The following will create mpl_m1.dat and mpl_m1.inp, replacing any existing files by those names. dryrun will keep Stata from actually estimating the model, which can be a good idea if you only want the Mplus files. The Mplus output will include the Modification Indices and the descriptive sample statistics. Be sure to use semicolons if you have multiple options for either analysis or output.{p_end}{cmd}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) dryrun ti(Baseline Model) mplus(m1, r out(mod; sampstat)){p_end}
{phang2}* View or edit the mplus .inp file if you want{p_end}
{phang2}doedit mpl_m1.inp{p_end}
{phang2}* Run mplus if you want to. Mplus must be installed!{p_end}
{phang2}* The correct command may depend on your OS and your computer setup.{p_end}
{phang2}!mplus mpl_m1.inp{p_end}
{phang2}* View or edit the mplus output file if you want{p_end}
{phang2}doedit mpl_m1.out{p_end}
{txt}
{pstd}Create files for lavaan -- This will create lavaan .R and .dta files but some editing may be necessary. Files are written to the current directory so make sure it is writing to the directory you want. The following will create lav_m1.R and lav_m1.dta, replacing any existing files by those names. dryrun will keep Stata from actually estimating the model, which can be a good idea if you only want the lavaan files. You will of course need to install R (and probably RStudio) and know how to run it, but that is pretty easy to do.{p_end}{cmd}
{phang2}use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) dryrun ti(Baseline Model) lavaan(m1, r){p_end}
{phang2}* View or edit the lavaan .R file if you want{p_end}
{phang2}doedit lav_m1.R{p_end}
{txt}
{pstd}Generate a sem do file -- You can output the generated sem commands to a do file. This may be useful if you want to modify the commands in ways not easily done with xtdpdml. In this example the file mytry.do is created and (because the r option is specified) any existing file by that name is overwritten. The staywide option keeps the data in the wide format that is required by sem.
{p_end}{cmd} {phang2}use https://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end} {phang2}xtset id t{p_end} {phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) staywide semfile(mytry, r){p_end} {txt} {marker authors}{...} {title:Authors} {p 5 5} Richard Williams, University of Notre Dame, Department of Sociology{break} Paul Allison, University of Pennsylvania, Department of Sociology{break} Enrique Moral Benito, Banco de Espana, Madrid {break} Support: Richard.A.Williams.5@ND.Edu{break} Web Page: {browse "https://www3.nd.edu/~rwilliam/dynamic/index.html"}{break} {marker acknowledgments}{...} {title:Acknowledgments} {p 5 5} Ken Bollen and Jennie Brand graciously provided us with the data from their 2010 Social Forces paper to use in our examples. UCLA and Michael Mitchell kindly allowed us to take their stata2mplus program and adapt it for our purposes. Code from Mead Over's linewrap program was modified for use with the semfile option. William Lisowski and Clyde Schechter provided comments that improved program coding. Jacob Long gave us ideas that were very useful for writing the lavaan option. Paul von Hippel offered helpful comments on the program's documentation. Kristin MacDonald and other Stata Corp staff were very helpful in modifying Stata so that sem and xtdpdml would execute much more quickly. {marker references}{...} {title:References} {p 5 5} Allison, Paul D., Richard Williams and Enrique Moral-Benito. 2017. "Maximum Likelihood for Cross-Lagged Panel Models with Fixed Effects." Socius 3: 1-17. {browse "http://journals.sagepub.com/doi/suppl/10.1177/2378023117710578"} {break} {p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2018. "Linear Dynamic Panel-Data Estimation using Maximum Likelihood and Structural Equation Modeling." The Stata Journal 18(2): 293-326. A pre-publication version is at {browse "https://www3.nd.edu/~rwilliam/dynamic/SJPaper.pdf"}{break} {p 5 5} Moral-Benito, Enrique, Paul Allison & Richard Williams (2018): Dynamic panel data modelling using maximum likelihood: an alternative to Arellano-Bond. Applied Economics, DOI: 10.1080/00036846.2018.1540854. A pre-publication version is at {browse "https://www3.nd.edu/~rwilliam/dynamic/Benito_Allison_Williams.pdf"}{break} {p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2015. "Linear Dynamic Panel-Data Estimation using Maximum Likelihood and Structural Equation Modeling". Presented July 30, 2015 at the 2015 Stata Users Conference in Columbus, Ohio. {browse "https://www3.nd.edu/~rwilliam/dynamic/xtdpdml_Stata2015.pdf"}{break} {p 5 5}Allison, Paul D. 2015. "Don't Put Lagged Dependent Variables in Mixed Models." {browse "http://statisticalhorizons.com/lagged-dependent-variables"} {break} {p 5 5}Moral-Benito, Enrique. 2013. "Likelihood-based Estimation of Dynamic Panels with Predetermined Regressors." Journal of Business and Economic Statistics 31:4, 451-472. {p 5 5}Bollen, Kenneth, and Jennie Brand. 2010. "A General Panel Model with Random and Fixed Effects: A Structural Equations Approach." Social Forces 89:1, 1-34. {marker "suggested citation"}{...} {title:Suggested citations if using {cmd:xtdpdml} in published work } {p 5 5}{cmd:xtdpdml} is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such. The suggested citations are {p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2018. "Linear Dynamic Panel-Data Estimation using Maximum Likelihood and Structural Equation Modeling." 
The Stata Journal 18(2): 293-326. A pre-publication version is at {browse "https://www3.nd.edu/~rwilliam/dynamic/SJPaper.pdf"}{break} {p 5 5} Allison, Paul D., Richard Williams and Enrique Moral-Benito. 2017. "Maximum Likelihood for Cross-Lagged Panel Models with Fixed Effects." Socius 3: 1-17. {browse "http://journals.sagepub.com/doi/suppl/10.1177/2378023117710578"} {break} {p 5 5} Moral-Benito, Enrique, Paul Allison & Richard Williams (2018): Dynamic panel data modelling using maximum likelihood: an alternative to Arellano-Bond. Applied Economics, DOI: 10.1080/00036846.2018.1540854. A pre-publication version is at {browse "https://www3.nd.edu/~rwilliam/dynamic/Benito_Allison_Williams.pdf"}{break} {p 5 5}Williams, Richard, Paul D. Allison and Enrique Moral-Benito. 2015. "Linear Dynamic Panel-Data Estimation using Maximum Likelihood and Structural Equation Modeling". Presented July 30, 2015 at the 2015 Stata Users Conference in Columbus, Ohio. {browse "https://www3.nd.edu/~rwilliam/dynamic/xtdpdml_Stata2015.pdf"}{break}