*! PPML (Panel) Structural Gravity Estimation, by Tom Zylkin
*! Department of Economics, National University of Singapore
*! Example do file, April, 2017
*!
*! Suggested citation: Larch, Wanner, Yotov, & Zylkin (2017):
*! "The Currency Union Effect: A PPML Re-assessment with High-dimensional Fixed Effects"
*! Drexel University School of Economics Working Paper 2017-07

clear all

// set this to the folder containing EXAMPLE_TRADE_FTA_DATA.dta
global this_dir "D:\Tom (local)\PPML experiment"
cd "$this_dir"

cap set matsize 800
cap set matsize 11000
cap set maxvar 32000

use EXAMPLE_TRADE_FTA_DATA, clear

// Trade between 35 countries for the years 1986 - 2004, every four years.
// Broken out by manufacturing, non-manufacturing, as well as total trade.
// Sources:
//	- Trade: UN COMTRADE
//	- FTAs: NSF-Kellogg (Baier & Bergstrand) database
//	- "gravity" variables: CEPII (Head & Mayer) gravity data

// I use a small sample (35 countries) so that verifying the result using glm
// can be done in a reasonable amount of time (here, ~5-10 minutes).

// solve for the average partial effect of an FTA on total trade
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year)

// The equivalent estimation code using glm is:
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
xi i.exp_time i.imp_time i.pair
qui glm trade _Iexp_time* _Iimp_time* _Ipair* fta if category == "TOTAL", family(poisson) diff iter(25) cluster(pair)
est save GLM_RESULT, replace
est use GLM_RESULT
// (esttab is part of the user-written -estout- package: ssc install estout)
esttab, keep(fta) se stats(ll N)

/* or, if "ppml" is installed:
xi: ppml trade i.exp_time i.imp_time i.pair fta if category == "TOTAL", diff iter(25) cluster(pair)
est tab, keep(fta) se ll
*/

// some options:

// use symmetric pair fixed effects
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) sym

// add time trends
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) trend

// multi-way clustering (1): exporter, importer, year
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) multi

// multi-way clustering (2): user-specified
cap egen pair = group(isoexp isoimp)
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) cluster(pair year)

// manufacturing trade only
ppml_panel_sg trade fta if category == "MANUF", ex(isoexp) im(isoimp) y(year)

// non-manufacturing trade only
ppml_panel_sg trade fta if category == "NONMANUF", ex(isoexp) im(isoimp) y(year)

// Test whether FTAs have had a larger effect on non-manufacturing trade than on manufacturing trade
// (requires including an "industry" code via the ind() option; fta* picks up both fta and fta_NONMANUF)
gen fta_NONMANUF = fta * (category == "NONMANUF")
ppml_panel_sg trade fta* if category != "TOTAL", ex(isoexp) im(isoimp) ind(category) y(year)

// Estimating more traditional gravity variables (using nopair), year 2000 only:
ppml_panel_sg trade ln_dist colony contig comlang_off comleg fta if category == "TOTAL" & year == 2000, ex(isoexp) im(isoimp) y(year) nopair

// The equivalent estimation code using glm is:
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
xi: glm trade i.exp_time i.imp_time ln_dist colony contig comlang_off comleg fta if category == "TOTAL" & year == 2000, diff iter(25) family(poisson) ro

// for one year only, "multiway" defaults to clustering on exporter and importer
ppml_panel_sg trade ln_dist colony contig comlang_off comleg fta if category == "TOTAL" & year == 2000, ex(isoexp) im(isoimp) y(year) nopair multi

// Notes on some common issues:

** 1. Collinearity.
* Consider the following regression:
ppml_panel_sg trade fta ln_distw if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
* Note that ln_distw is a pairwise variable that does not vary over time. Thus, it is collinear
* with the implied "pair" fixed effects. If you want to estimate the effects of time-invariant bilateral
* regressors such as ln_distw, use the -nopair- option.

** 2a. Non-existence.
* Instead of ln_distw, consider now the following variable
gen test = ln_distw * (trade > 0) + runiform() * (trade == 0)
* which is a variation of ln_distw that is no longer invariant over time within pairs.
* However, it is still collinear with the implied set of pair fixed effects over the subsample where trade > 0.
* Santos Silva & Tenreyro (2010) refer to this as a "non-existence" issue: while it is not technically a "collinearity" problem,
* it is still possible that estimates from this regression will not actually exist.
* Thus, ppml_panel_sg checks for and excludes cases like this as well:
ppml_panel_sg trade fta test if category == "TOTAL", ex(isoexp) im(isoimp) y(year)

** 2b. Dropping observations that are perfectly predicted by excluded regressors
* By default, ppml_panel_sg drops all y = 0 observations that are perfectly predicted by
* excluded regressors. (This is the same default behavior as in -ppml-.)
* Example: a dummy which is 1 for y = 0, 0 otherwise
gen test2 = (trade == 0)
ppml_panel_sg trade fta test2 if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
* To prevent these observations from being dropped, use the -keep- option
ppml_panel_sg trade fta test2 if category == "TOTAL", ex(isoexp) im(isoimp) y(year) keep

** 3. Multiple trade flows for the same pair in a given year.
* Note that there are 3 industry categories in the current data ("MANUF", "NONMANUF", and "TOTAL").
* Suppose I forgot that the data are structured this way and went ahead with the following
* (-cap noi- lets the do file continue past the expected error):
cap noi ppml_panel_sg trade fta, ex(isoexp) im(isoimp) y(year)
// (note: I have forgotten the "ind" option here and have not used an "if" statement)
* This will produce an error saying that the ID vars provided do not uniquely describe the data.
* The error will also remind you that, if this really is the specification you intended, i.e.,
* without exploiting industry-level variation, you can usually collapse the data to get the same result.
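
* One way to spot this problem directly before estimating (a sketch added here using
* built-in Stata commands, not an option of ppml_panel_sg): check whether exporter,
* importer, and year uniquely identify the rows of the data.
duplicates report isoexp isoimp year
cap noi isid isoexp isoimp year
* Adding the industry category should restore uniqueness, assuming each category
* appears at most once per pair-year (as the earlier ind(category) regression suggests):
cap noi isid isoexp isoimp year category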
* To see that collapsing gives the same result, compare
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
xi: glm trade i.exp_time i.imp_time ln_dist colony contig comlang_off comleg fta if year == 2000, diff iter(25) family(poisson) cluster(pair)
* with the earlier year-2000 -nopair- estimates, which only used the "TOTAL" category.
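
* A further check (a sketch added here, not part of the original example): collapse the
* year-2000 data by summing trade over the categories within each pair-year, then rerun
* the same glm. Because fta and the gravity variables do not vary across categories within
* a pair-year, the Poisson first-order conditions of the stacked regression above depend on
* trade only through this sum, so the slope estimates should match. This assumes every
* pair-year has the same number of category rows with non-missing trade.
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
* collapse replaces the data in memory, hence preserve/restore
preserve
keep if year == 2000
* drop missing flows first, since (sum) would otherwise turn them into zeros
drop if missing(trade)
collapse (sum) trade (mean) ln_dist colony contig comlang_off comleg fta, by(isoexp isoimp year exp_time imp_time pair)
xi: glm trade i.exp_time i.imp_time ln_dist colony contig comlang_off comleg fta, diff iter(25) family(poisson) cluster(pair)
restore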