*! PPML (Panel) Structural Gravity Estimation, by Tom Zylkin
*! Department of Economics, National University of Singapore
*! Example do file, April, 2017
*!
*! Suggested citation: Larch, Wanner, Yotov, & Zylkin (2017):
*! "The Currency Union Effect: A PPML Re-assessment with High-dimensional Fixed Effects"
*! Drexel University School of Economics Working Paper 2017-07
clear all
global this_dir "D:\Tom (local)\PPML experiment"
cd "$this_dir"
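// Raise Stata's limits for the dummy-variable glm runs below.
// 800 is the matsize ceiling in Stata/IC; 11000 applies to Stata/SE and MP.
// "cap" suppresses the error when a setting is not allowed in the running flavor.
cap set matsize 800
cap set matsize 11000
cap set maxvar 32000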
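// The commands in this file rely on user-written packages. A minimal setup sketch,
// assuming they are obtained from SSC (uncomment on first run):
// ssc install ppml_panel_sg
// ssc install estout    // provides esttab
// ssc install ppml      // only needed for the optional ppml comparison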
use EXAMPLE_TRADE_FTA_DATA, clear
// Trade between 35 countries for the years 1986 - 2004, every four years.
// Broken out by manufacturing, non-manufacturing, as well as total trade.
// Sources:
// - Trade: UN COMTRADE
// - FTAs: NSF-Kellogg (Baier & Bergstrand) database
// - "gravity" variables: CEPII (Head & Mayer) gravity data
// I use a small sample (35 countries) so that the result can be verified with glm
// in a reasonable amount of time (here, roughly 5-10 minutes).
// Solve for the average partial effect of an FTA on total trade:
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
// The equivalent estimation code using glm is:
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
xi i.exp_time i.imp_time i.pair
qui glm trade _Iexp_time* _Iimp_time* _Ipair* fta if category == "TOTAL", family(poisson) diff iter(25) cluster(pair)
est save GLM_RESULT, replace
est use GLM_RESULT
esttab, keep(fta) se stats(ll N)
/* or, if "ppml" is installed:
xi: ppml trade i.exp_time i.imp_time i.pair fta if category == "TOTAL", diff iter(25) cluster(pair)
est tab, keep(fta) se stats(ll)
*/
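// A sketch for lining up the two sets of estimates side by side. It assumes that
// ppml_panel_sg posts its results to e() like a standard estimation command and that
// GLM_RESULT was saved by the glm run above; the stored names SG and GLM are arbitrary.
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
estimates store SG
est use GLM_RESULT
estimates store GLM
estimates table SG GLM, keep(fta) b se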
// some options:
// use symmetric pair fixed effects
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) sym
// add time trends
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) trend
// multi-way clustering (1): exporter, importer, year
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) multi
// multi-way clustering (2): user-specified
cap egen pair = group(isoexp isoimp)
ppml_panel_sg trade fta if category == "TOTAL", ex(isoexp) im(isoimp) y(year) cluster(pair year)
// manufacturing trade only
ppml_panel_sg trade fta if category == "MANUF", ex(isoexp) im(isoimp) y(year)
// non-manufacturing trade only
ppml_panel_sg trade fta if category == "NONMANUF", ex(isoexp) im(isoimp) y(year)
// Test if FTAs have had a larger effect on non-manufacturing trade vs manufacturing trade
// (requires including an "industry" code)
gen fta_NONMANUF = fta * (category == "NONMANUF")
ppml_panel_sg trade fta* if category != "TOTAL", ex(isoexp) im(isoimp) ind(category) y(year)
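// In this specification the coefficient on fta_NONMANUF is itself the estimated
// difference between the non-manufacturing and manufacturing FTA effects.
// A sketch of follow-up tests, assuming ppml_panel_sg leaves e(b) and e(V) behind:
test fta_NONMANUF              // H0: the FTA effect is the same in both sectors
lincom fta + fta_NONMANUF      // implied FTA effect for non-manufacturing trade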
// Estimating more traditional gravity variables (using nopair), year 2000 only:
ppml_panel_sg trade ln_dist colony contig comlang_off comleg fta if category == "TOTAL" & year == 2000, ex(isoexp) im(isoimp) y(year) nopair
// The equivalent estimation code using glm is:
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
xi: glm trade i.exp_time i.imp_time ln_dist colony contig comlang_off comleg fta if category == "TOTAL" & year == 2000, diff iter(25) family(poisson) ro
// for one year only, "multiway" defaults to clustering on exporter and importer
ppml_panel_sg trade ln_dist colony contig comlang_off comleg fta if category == "TOTAL" & year == 2000, ex(isoexp) im(isoimp) y(year) nopair multi
// Notes on some common issues:
** 1. Collinearity.
* Consider the following regression:
ppml_panel_sg trade fta ln_distw if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
* Note that ln_distw is a pairwise variable that does not vary over time. Thus, it is collinear
* with the implied "pair" fixed effects. If you want to estimate the effects of time-invariant bilateral
* regressors such as ln_distw, use the -nopair- option.
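* A quick way to confirm that a regressor is time-invariant within pairs before
* estimating (a sketch; the variable name distw_varies is made up for this check):
cap egen pair = group(isoexp isoimp)
cap drop distw_varies
bysort pair (ln_distw): gen byte distw_varies = ln_distw[1] != ln_distw[_N]
count if distw_varies    // 0 means ln_distw is constant within every pair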
** 2a. Non-existence.
* Instead of ln_distw, consider now the following variable
gen test = ln_distw * (trade > 0) + runiform() * (trade == 0)
* which is a variation of ln_distw that will no longer be invariant over time within pairs.
* However, it will still be collinear with the implied set of pair fixed effects over the subsample where trade>0.
* Santos Silva & Tenreyro (2010) refer to this as a "non-existence" issue: while it is not technically a "collinearity" problem,
* it is still possible that estimates from this regression will not actually exist.
* Thus, ppml_panel_sg checks for and excludes regressors like this as well:
ppml_panel_sg trade fta test if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
** 2b. Dropping observations that are perfectly predicted by excluded regressors.
* By default, ppml_panel_sg drops all y=0 observations that are perfectly predicted by
* excluded regressors. (This is the same default behavior as in -ppml-.)
* Example: a dummy that equals 1 when y=0 and 0 otherwise
gen test2 = (trade == 0)
ppml_panel_sg trade fta test2 if category == "TOTAL", ex(isoexp) im(isoimp) y(year)
* To prevent these observations from being dropped, use the "keep" option
ppml_panel_sg trade fta test2 if category == "TOTAL", ex(isoexp) im(isoimp) y(year) keep
** 3. Multiple trade flows for the same pair in a given year.
* Note that there are three industry categories in the current data ("MANUF", "NONMANUF", and "TOTAL").
* Suppose I forgot that the data is structured this way and went ahead with the following:
ppml_panel_sg trade fta, ex(isoexp) im(isoimp) y(year) // (note: I have forgotten the "ind" option here and have not used an "if" statement)
* This will produce an error saying that the ID vars provided do not uniquely describe the data.
* The error will also remind you that, if this really is the specification you intended, i.e.,
* without exploiting industry-level variation, you can usually collapse the data to get the same result.
* To see this, compare
cap egen exp_time = group(isoexp year)
cap egen imp_time = group(isoimp year)
cap egen pair = group(isoexp isoimp)
xi: glm trade i.exp_time i.imp_time ln_dist colony contig comlang_off comleg fta if year == 2000, diff iter(25) family(poisson) cluster(pair)
* with the earlier year-2000 -nopair- results (the ppml_panel_sg and glm runs under
* "Estimating more traditional gravity variables"), which only used the "TOTAL" category.
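* A minimal sketch of the collapse route mentioned in the error message, applied to the
* simpler "trade fta" specification. It assumes that fta is identical across categories
* within a pair-year, so that summing trade over categories loses no regressor variation.
* preserve/restore keeps the full data in memory afterwards.
preserve
collapse (sum) trade (max) fta, by(isoexp isoimp year)
ppml_panel_sg trade fta, ex(isoexp) im(isoimp) y(year)
restore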