-------------------------------------------------------------------------------
help for ivpois
-------------------------------------------------------------------------------

IV/GMM Poisson regression

Syntax

    ivpois [varlist] [if] [in] [, exog(varlist) endog(varlist) otheroptions]

    +--------------------+
----+ Table of Contents +-----------------------------------------------

    General description of estimator
    Examples
    Description of options
    Remarks and saved results
    References
    Acknowledgements
    Citation of ivpois
    Author information

    +-------------+
----+ Description +------------------------------------------------------

Using the Mata function optimize available in Stata 10, ivpois implements a Generalized Method of Moments (GMM) estimator of Poisson regression (due to Mullahy, 1997). The estimator allows additional exogenous variables that have no direct impact on the dependent variable to be specified, and allows endogenous variables to be instrumented by those excluded instruments, hence the acronym for Instrumental Variables (IV) in its name. See ivreg2 or rd if installed, and references therein (ssc install ivreg2 and ssc install rd to install); see Baum et al. (2007) or Nichols (2007) for more on IV.

Standard errors are estimated by the asymptotic approximation outlined by Hansen (1982), which requires "large" samples. Bootstrapped standard errors may outperform these in many situations; they are obtained by prefixing the command with bootstrap:. If clustering of errors is suspected, the cluster option may be supplied to bootstrap.

Variables specified in the exog(varlist) option and not in the primary varlist are used as excluded instruments; they are assumed to be correlated with the endogenous variables in endog(varlist) and uncorrelated with the error term.

Note that Poisson regression assumes E[y|X]=exp(Xb) to get a consistent estimate of b, so it is appropriate for a wide variety of models where the dependent variable is nonnegative (zero or positive), not just where the dependent variable measures counts of events. Wherever you might be inclined to take the logarithm of a nonnegative dependent variable y and use ivregress, ivpois offers an alternative that includes in the estimation observations where y is zero.

Assuming E[y|X]=exp(Xb), one can assume either an additive error or a multiplicative error, which produce different versions of the moment conditions. The model used by ivpois assumes a multiplicative error. There is no explicit support for panel models, and cluster-robust SEs are supplied by bootstrap, using the cluster(varlist) option.

On the moment conditions, the additive form for the error posits that y=exp(Xb)+u and gives moment conditions of the form Z'(y-exp(Xb))=0, whereas the multiplicative form posits y=exp(Xb)u and gives moment conditions of the form Z'((y-exp(Xb))/exp(Xb))=Z'(y*exp(-Xb)-1)=0. Here Z includes all exogenous variables, both included and excluded instruments, and is assumed to satisfy E(Z'u)=0 in the additive case and E(Z'(u-1))=0 in the multiplicative case. Angrist (2001) shows that in a model with endogenous binary treatment and a binary instrument, the latter procedure (assuming a multiplicative error) estimates a proportional local average treatment effect (LATE) parameter in models with no covariates. The multiplicative form is also more intuitively appealing and congruent with poisson and glm, and the assumption can be rewritten y=exp(Xb)u=exp(Xb)*exp(v)=exp(Xb+v), so that ln(y)=Xb+v assuming y>0, which provides the natural link to OLS. Windmeijer (2006) has a very useful discussion and further related models.

    +----------+
----+ Examples +---------------------------------------------------------

In each example, you can cut and paste the entire block of code to the Command window, or click on commands one by one to run.

-------------------------------------------------------------------------------
There is no theoretical model to support the next set of commands; they merely illustrate syntax. You will need to install estout from SSC to run the esttab command.
-------------------------------------------------------------------------------

    est clear
    sysuse auto, clear
    poisson mpg disp wei, r
    est sto pois
    ivpois mpg wei, exog(turn) endog(disp)
    est sto endog
    ivpois mpg disp wei, exog(turn)
    est sto excl
    ivpois mpg disp wei
    est sto noexcl
    g manuf=word(make,1)
    bs, cl(manuf): ivpois mpg wei, exog(turn) endog(disp)
    est sto clustbs
    esttab *, nogaps se mti

-------------------------------------------------------------------------------
Comparison of poisson to ivpois with an exposure variable and a small sample.
-------------------------------------------------------------------------------

    est clear
    webuse dollhill3, clear
    tab agecat, gen(a)
    drop a4 a5
    poisson deaths smokes a?, exposure(pyears) r
    est sto p
    bs: poisson deaths smokes a?, exposure(pyears)
    est sto bsp
    ivpois deaths smokes a?, exposure(pyears)
    est sto gmm
    bs: ivpois deaths smokes a?, exposure(pyears)
    est sto bsgmm
    esttab *, nogaps se mti

-------------------------------------------------------------------------------
The following three examples offer a comparison of linear regression of ln(y) on X to Poisson regression of y on X, and each model has some real economic content. You will need to install ivreg2 from SSC to run the ivreg2 command.
-------------------------------------------------------------------------------

An example from Card (1995):

    use http://fmwww.bc.edu/ec-p/data/wooldridge/card, clear
    loc x "exper* smsa* south mar black reg662-reg669"
    ivreg2 lw `x' (educ=nearc4)
    ivpois wage `x', endog(educ) exog(nearc4)

An example from Mullahy (1997) where ivreg2 reports no evidence of a weak instruments problem:

    use http://fmwww.bc.edu/RePEc/bocode/i/ivp_bwt.dta, clear
    g lnbw=ln(bw)
    loc x "parity white male"
    loc z "edfwhite edmwhite incwhite cigtax88"
    ivreg2 lnbw `x' (cigspreg=`z')
    ivpois bw `x', endog(cigspreg) exog(`z')

An example from Mullahy (1997) where ivreg2 reports evidence of a weak instruments problem:

    use http://fmwww.bc.edu/RePEc/bocode/i/ivp_cig.dta, clear
    g lnc=ln(cigpacks)
    loc x "pcigs79 rest79 income age qage educ qeduc famsize white"
    loc z "ageeduc cage ceduc pcigs78 restock"
    ivreg2 lnc `x' (k210=`z')
    ivpois cigpacks `x', endog(k210) exog(`z')

-------------------------------------------------------------------------------
An alternative Generalized Linear Model (glm) approach, due to Hardin, Schmiediche, and Carroll (2003), is designed to address endogeneity due to measurement error. Type findit qvf to install. The following example, loosely based on the qvf help file, favors the GMM approach:
-------------------------------------------------------------------------------

    clear all
    set obs 1000
    gen x1 = uniform()
    gen x2 = uniform()
    gen x3 = uniform()
    gen err = invnorm(uniform())
    gen y = exp(1+2*x1+3*x2+4*x3+err)
    gen t3 = .8*x3 + .6*invnorm(uniform())
    qvf y x1 x2 x3 (x1 x2 t3), link(log) fam(poisson)
    est sto qvf
    bs: qvf y x1 x2 x3 (x1 x2 t3), link(log) fam(poisson)
    est sto bsqvf
    ivpois y x1 x2, endog(x3) exog(t3)
    est sto gmm
    bs: ivpois y x1 x2, endog(x3) exog(t3)
    est sto bsgmm
    cap ssc inst estout
    esttab *, nogaps se mti
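The multiplicative moment condition Z'(y*exp(-Xb)-1)=0 from the Description, and the two Mata lines quoted later in this help file, can be evaluated by hand. The block below is a minimal sketch and not the code that ivpois runs: the weighting matrix W = inv(Z'Z/N) and the choice of poisson coefficients as the evaluation point are assumptions made only for illustration. It mirrors the specification ivpois mpg wei, exog(turn) endog(disp) on the auto data and is meant to be run from a do-file.

```stata
sysuse auto, clear
* Evaluation point: poisson coefficients (illustrative assumption)
poisson mpg displacement weight

mata:
// Outcome, regressors (with constant), and instruments (with constant)
y = st_data(., "mpg")
X = (st_data(., ("displacement", "weight")), J(rows(y), 1, 1))
Z = (st_data(., ("turn", "weight")), J(rows(y), 1, 1))

// Row vector of coefficients, ordered as the columns of X
b = st_matrix("e(b)")

// Hypothetical weighting matrix chosen for this sketch only
W = invsym(cross(Z, Z) / rows(Z))

// Sample moments and GMM criterion, as in the quoted Mata lines
m = ((1/rows(Z)) :* Z'((y :* exp(-X*b')) :- 1))'
crit = (m*W*m')

m
crit
end
```

The criterion is generally not zero at the poisson coefficients, because poisson solves X'(y-exp(Xb))=0 rather than the instrumented multiplicative condition; a GMM estimator searches for the b that minimizes crit.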

    +-----------------+
----+ Options summary +--------------------------------------------------

exog(varlist) specifies a list of exogenous variables, possibly included in the primary varlist. Exogenous variables not included in the primary varlist are considered excluded instruments.

endog(varlist) specifies a list of endogenous variables, possibly included in the primary varlist. Endogenous variables not included in the primary varlist are added to that list.

exposure(varname_e), offset(varname_o), constraints(constraints), collinear; see [R] estimation options.

from(matrix) specifies a row matrix of initial values for optimize to use.

level(#); see [R] estimation options.

Other options can be supplied to a bootstrap prefix, including reps(n), requesting a number of repetitions other than the default of 50, and the cluster(varlist) option. In practice, the number of bootstrap replications should probably be much larger than 50, and convergence should be examined, though simulations show that for a correctly specified model, 50 are sufficient for good performance.

    +---------------------------+
----+ Remarks and saved results +----------------------------------------

The command saves the following results in e():

Scalars
    e(N)          Number of observations used in estimation

Macros
    e(cmd)        ivpois
    e(version)    Version number
    e(depvar)     Name of dependent variable

Matrices
    e(b)          Coefficient vector
    e(V)          VCE estimate

Functions
    e(sample)     Marks estimation sample
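As a brief illustration of the saved results listed above, the sketch below refits the first ivpois example from this help file and then inspects e(). It assumes ivpois is installed and otherwise uses only standard Stata post-estimation commands.

```stata
sysuse auto, clear
ivpois mpg wei, exog(turn) endog(disp)

* List everything the command left behind in e()
ereturn list

* Scalars and macros
display e(N)
display "`e(cmd)'"
display "`e(depvar)'"

* Coefficient vector and its estimated variance matrix
matrix list e(b)
matrix list e(V)

* Number of observations in the estimation sample
count if e(sample)
```

The count reported by the last command should agree with e(N).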

    +------------+
----+ References +-------------------------------------------------------

On the GMM approach, see:

Angrist, Joshua D. 2001. "Estimation of limited dependent variable models with dummy endogenous regressors: simple strategies for empirical practice." Journal of Business and Economic Statistics, 19:2-16.

Hansen, Lars P. 1982. "Large Sample Properties of Generalized Method of Moments Estimators." Econometrica, 50:1029-1054.

Mullahy, John. 1997. "Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior." The Review of Economics and Statistics, 79(4):586-593.

Windmeijer, Frank. 2006. "GMM for Panel Count Data Models." Discussion Paper No. 06/591, Department of Economics, University of Bristol.

On IV methods, see:

Baum, Christopher F., Mark E. Schaffer, and Steven Stillman. 2007. "Enhanced routines for instrumental variables/generalized method of moments estimation and testing." Stata Journal 7(4):465-506.

Nichols, Austin. 2007. "Causal inference with observational data." Stata Journal 7(4):507-541.

For the glm-style approach, see:

Hardin, James W., Henrik Schmiediche, and Raymond J. Carroll. 2003. "Instrumental variables, bootstrapping, and generalized linear models." The Stata Journal 3(4): 351-360. See also http://www.stata.com/merror/.

The example using earnings as the outcome:

Card, David E. 1995. "Using Geographic Variation in College Proximity to Estimate the Return to Schooling" in Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, edited by Louis Christofides, E. Kenneth Grant and Robert Swidinsky. University of Toronto Press. See also NBER WP 4483.

    +------------------+
----+ Acknowledgements +-------------------------------------------------

The central code was promulgated by Bill Gould at a Seminar in DC on November 2, 2007. That Mata code was written by David Drukker at an earlier date, and reads as follows:

    m=((1/rows(Z)):*Z'((y:*exp(-X*b') :- 1)))'
    crit=(m*W*m')

Type viewsource ivpois.ado to see that code in situ (four lines into the section that begins mata:, where all the Mata code appears).

Thanks to John Mullahy for sharing the data used in Mullahy (1997), and for writing that paper. Thanks to Mark Schaffer for pointing out that using the zero vector as an initial value can result in failure of optimize in some cases, and suggesting a return to using the estimated coefficients from poisson as initial values. Thanks to Henry Schneider for asking for an exposure option. Thanks to John Zedlewski for asking for a speed improvement using the d1 evaluator type of optimize and asymptotic standard errors.

    +--------------------+
----+ Citation of ivpois +-----------------------------------------------

ivpois is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such:

Nichols, Austin. 2007. ivpois: Stata module for IV/GMM Poisson regression. http://ideas.repec.org/c/boc/bocode/s456890.html

Author

    Austin Nichols
    Urban Institute
    Washington, DC, USA
    austinnichols@gmail.com

Also see

    Manual:  [U] 23 Estimation and post-estimation commands
             [R] bootstrap
             [R] poisson
             [R] regress
             [R] ivregress

    On-line: help for (if installed) ivreg2, overid, ivendog, ivhettest, ivreset, xtivreg2, xtoverid, ranktest, condivreg; qvf.