help for ivpois

IV/GMM Poisson regression


ivpois [varlist] [if] [in] [, exog(varlist) endog(varlist) other options]

    +-------------------+
----+ Table of Contents +-----------------------------------------------

General description of estimator
Examples
Description of options
Remarks and saved results
References
Acknowledgements
Citation of ivpois
Author information

    +-------------+
----+ Description +------------------------------------------------------

ivpois implements a Generalized Method of Moments (GMM) estimator of Poisson regression, due to Mullahy (1997), using the Mata function optimize() available in Stata 10. Endogenous variables can be instrumented by excluded instruments, that is, additional exogenous variables that have no direct impact on the dependent variable; hence the IV (Instrumental Variables) in the command's name. See Baum et al. (2007) or Nichols (2007) for more on IV, and see ivreg2 or rd, if installed, and references therein (type ssc install ivreg2 and ssc install rd to install them).

Standard errors are estimated by the asymptotic approximation outlined by Hansen (1982), requiring "large" samples, though bootstrapped standard errors may outperform these in many situations (the latter are obtained by prefixing the command with bootstrap:). If clustering of errors is suspected, the cluster option may be supplied to bootstrap.

Variables specified in the exog(varlist) option but not in the primary varlist are used as excluded instruments; they are assumed to be correlated with the endogenous variables in endog(varlist) and uncorrelated with the error term.

Note that Poisson regression assumes E[y|X]=exp(Xb) to get a consistent estimate of b, so it is appropriate for a wide variety of models where the dependent variable is nonnegative (zero or positive), not just where the dependent variable measures counts of events. Wherever you might be inclined to take the logarithm of a nonnegative dependent variable y and use ivregress, ivpois offers an alternative that includes in the estimation observations where y is zero.
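As a minimal sketch of the last point (written in Python rather than Stata, with made-up numbers; the lists y and xb below are hypothetical and do not come from any example in this help file), the following shows that ln(y) is undefined wherever y is zero, whereas a residual of the form y*exp(-Xb)-1 can be computed for every nonnegative y:

```python
import math

# Hypothetical nonnegative outcomes and a hypothetical linear index Xb.
y = [0.0, 1.0, 3.0, 0.0, 7.0]
xb = [-1.0, 0.0, 1.0, -0.5, 2.0]

# ln(y) exists only for y > 0, so a log-linear model loses the zeros.
log_usable = [math.log(v) for v in y if v > 0]

# The exponential-mean residual y*exp(-Xb) - 1 is defined for every y >= 0.
resid = [v * math.exp(-i) - 1.0 for v, i in zip(y, xb)]

print(len(log_usable), "of", len(y), "observations have a defined ln(y)")
print(len(resid), "of", len(y), "observations have a defined residual")
```

This only illustrates which observations can enter each model; it says nothing about which model is better specified for a given dataset.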

Assuming E[y|X]=exp(Xb), one can assume either an additive error or a multiplicative error, and the two produce different versions of the moment conditions. The model used by ivpois assumes a multiplicative error. There is no explicit support for panel models; cluster-robust standard errors can be obtained from bootstrap with its cluster(varlist) option.

On the moment conditions: the additive form of the error posits y=exp(Xb)+u and gives moment conditions of the form Z'(y-exp(Xb))=0, whereas the multiplicative form posits y=exp(Xb)u and gives moment conditions of the form Z'((y-exp(Xb))/exp(Xb))=Z'(y*exp(-Xb)-1)=0. Here Z contains all exogenous variables, both included and excluded instruments, and is assumed to satisfy E(Z'u)=0 in the additive form and E(Z'(u-1))=0 in the multiplicative form. Angrist (2001) shows that, in a model with an endogenous binary treatment, a binary instrument, and no covariates, the multiplicative-error procedure estimates a proportional local average treatment effect (LATE) parameter. The multiplicative form is also more intuitively appealing and congruent with poisson and glm, and the assumption can be rewritten y=exp(Xb)u=exp(Xb)*exp(v)=exp(Xb+v), so that ln(y)=Xb+v for y>0, which provides the natural link to OLS. Windmeijer (2006) offers a very useful discussion and further related models.
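The two sets of moment conditions can be sketched numerically. The following Python fragment is an illustration only, not the code used by ivpois: the function name sample_moments, the data, and the coefficient vector b_true are all hypothetical, and y is constructed with no error at all so that both sets of sample moments vanish at b_true.

```python
import math

def sample_moments(y, X, Z, b, multiplicative=True):
    """Return (1/n) * Z'r, where r is the residual under the chosen error form."""
    n = len(y)
    resid = []
    for yi, xi in zip(y, X):
        xb = sum(xij * bj for xij, bj in zip(xi, b))
        if multiplicative:
            resid.append(yi * math.exp(-xb) - 1.0)   # y*exp(-Xb) - 1
        else:
            resid.append(yi - math.exp(xb))          # y - exp(Xb)
    return [sum(Z[i][j] * resid[i] for i in range(n)) / n
            for j in range(len(Z[0]))]

# Hypothetical data: a constant plus one regressor, and one extra instrument.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Z = [[1.0, 0.0, 2.0], [1.0, 1.0, 1.0], [1.0, 2.0, 5.0], [1.0, 3.0, 4.0]]
b_true = [0.5, -0.2]
y = [math.exp(sum(x * c for x, c in zip(xi, b_true))) for xi in X]

mult_at_truth = sample_moments(y, X, Z, b_true, multiplicative=True)
add_at_truth = sample_moments(y, X, Z, b_true, multiplicative=False)
mult_elsewhere = sample_moments(y, X, Z, [0.0, 0.0], multiplicative=True)

print(mult_at_truth, add_at_truth, mult_elsewhere)
```

With real data the residuals are not zero, and the two forms weight observations differently: the multiplicative form divides each deviation by exp(Xb), so the two sets of conditions generally lead to different estimates.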

    +----------+
----+ Examples +---------------------------------------------------------

In each example, you can cut and paste the entire block of code to the Command window, or click on commands one by one to run.

-------------------------------------------------------------------------------
There is no theoretical model to support the next set of commands; they merely
illustrate syntax. You will need to install estout from SSC to run the esttab
command.
-------------------------------------------------------------------------------
        est clear
        sysuse auto, clear
        poisson mpg disp wei, r
        est sto pois
        ivpois mpg wei, exog(turn) endog(disp)
        est sto endog
        ivpois mpg disp wei, exog(turn)
        est sto excl
        ivpois mpg disp wei
        est sto noexcl
        g manuf=word(make,1)
        bs, cl(manuf): ivpois mpg wei, exog(turn) endog(disp)
        est sto clustbs
        esttab *, nogaps se mti

-------------------------------------------------------------------------------
Comparison of poisson to ivpois with an exposure variable and a small sample.
-------------------------------------------------------------------------------
        est clear
        webuse dollhill3, clear
        tab agecat, gen(a)
        drop a4 a5
        poisson deaths smokes a?, exposure(pyears) r
        est sto p
        bs: poisson deaths smokes a?, exposure(pyears)
        est sto bsp
        ivpois deaths smokes a?, exposure(pyears)
        est sto gmm
        bs: ivpois deaths smokes a?, exposure(pyears)
        est sto bsgmm
        esttab *, nogaps se mti

-------------------------------------------------------------------------------
The following three examples offer a comparison of linear regression of ln(y)
on X to Poisson regression of y on X, and each model has some real economic
content. You will need to install ivreg2 from SSC to run the ivreg2 command.
-------------------------------------------------------------------------------
An example from Card (1995):
        use http://fmwww.bc.edu/ec-p/data/wooldridge/card, clear
        loc x "exper* smsa* south mar black reg662-reg669"
        ivreg2 lw `x' (educ=nearc4)
        ivpois wage `x', endog(educ) exog(nearc4)

An example from Mullahy (1997) where ivreg2 reports no evidence of a weak instruments problem:
        use http://fmwww.bc.edu/RePEc/bocode/i/ivp_bwt.dta, clear
        g lnbw=ln(bw)
        loc x "parity white male"
        loc z "edfwhite edmwhite incwhite cigtax88"
        ivreg2 lnbw `x' (cigspreg=`z')
        ivpois bw `x', endog(cigspreg) exog(`z')

An example from Mullahy (1997) where ivreg2 reports evidence of a weak instruments problem:
        use http://fmwww.bc.edu/RePEc/bocode/i/ivp_cig.dta, clear
        g lnc=ln(cigpacks)
        loc x "pcigs79 rest79 income age qage educ qeduc famsize white"
        loc z "ageeduc cage ceduc pcigs78 restock"
        ivreg2 lnc `x' (k210=`z')
        ivpois cigpacks `x', endog(k210) exog(`z')

-------------------------------------------------------------------------------
An alternative Generalized Linear Model (glm) approach, due to Hardin,
Schmiediche, and Carroll (2003), is designed to address endogeneity due to
measurement error. Type findit qvf to install. The following example, loosely
based on the qvf help file, favors the GMM approach:
-------------------------------------------------------------------------------
        clear all
        set obs 1000
        gen x1 = uniform()
        gen x2 = uniform()
        gen x3 = uniform()
        gen err = invnorm(uniform())
        gen y = exp(1+2*x1+3*x2+4*x3+err)
        gen t3 = .8*x3 + .6*invnorm(uniform())
        qvf y x1 x2 x3 (x1 x2 t3), link(log) fam(poisson)
        est sto qvf
        bs: qvf y x1 x2 x3 (x1 x2 t3), link(log) fam(poisson)
        est sto bsqvf
        ivpois y x1 x2, endog(x3) exog(t3)
        est sto gmm
        bs: ivpois y x1 x2, endog(x3) exog(t3)
        est sto bsgmm
        cap ssc inst estout
        esttab *, nogaps se mti

    +-----------------+
----+ Options summary +--------------------------------------------------

exog(varlist) specifies a list of exogenous variables, possibly included in the primary varlist. Exogenous variables not included in the primary varlist are considered excluded instruments.

endog(varlist) specifies a list of endogenous variables, possibly included in the primary varlist. Endogenous variables not included in the primary varlist are added to that list.

exposure(varname_e), offset(varname_o), constraints(constraints), collinear; see [R] estimation options.

from(matrix) specifies a row matrix of initial values for optimize to use.

level(#); see [R] estimation options.

Other options can be supplied to a bootstrap prefix, including cluster(varlist) and reps(n), which requests a number of replications other than the default of 50. Simulations show that 50 replications are sufficient for good performance when the model is correctly specified, but in practice the number of replications should probably be much larger than 50, and convergence should be examined.
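To make concrete what the cluster(varlist) option asks of bootstrap, the following Python sketch draws one bootstrap sample by resampling whole clusters with replacement. It illustrates the resampling idea only and is not Stata's implementation: the function name cluster_resample and the data are hypothetical, and the sketch assumes that as many clusters are drawn as appear in the data.

```python
import random

def cluster_resample(rows, cluster_of, rng):
    """Draw whole clusters with replacement and return (drawn ids, stacked rows)."""
    clusters = {}
    for row in rows:
        clusters.setdefault(cluster_of(row), []).append(row)
    ids = sorted(clusters)
    # Assumption: draw as many clusters as there are distinct clusters.
    drawn = rng.choices(ids, k=len(ids))
    sample = []
    for cid in drawn:
        sample.extend(clusters[cid])   # every row of a drawn cluster enters together
    return drawn, sample

# Hypothetical data: (cluster id, outcome) pairs with unequal cluster sizes.
rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 6)]
sizes = {"a": 2, "b": 3, "c": 1}

rng = random.Random(12345)   # fixed seed so the draw is reproducible
drawn, sample = cluster_resample(rows, lambda r: r[0], rng)
print(drawn, len(sample))
```

Because clusters differ in size, the number of observations varies from one bootstrap sample to the next; the estimator would be rerun on each such sample and the spread of the estimates used for the standard errors.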

    +---------------------------+
----+ Remarks and saved results +----------------------------------------

The command saves the following results in e():

Scalars
    e(N)         Number of observations used in estimation

Macros
    e(cmd)       ivpois
    e(version)   Version number
    e(depvar)    Name of dependent variable

Matrices
    e(b)         Coefficient vector
    e(V)         VCE estimate

Functions
    e(sample)    Marks estimation sample


    +------------+
----+ References +-------------------------------------------------------

On the GMM approach, see:

Angrist, Joshua D. 2001. "Estimation of limited dependent variable models with dummy endogenous regressors: simple strategies for empirical practice." Journal of Business and Economic Statistics, 19:2-16.

Hansen, Lars P. 1982. "Large Sample Properties of Generalized Methods of Moments Estimators." Econometrica, 50:1029-1054.

Mullahy, John. 1997. "Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior." The Review of Economics and Statistics, 79(4):586-593.

Windmeijer, Frank. 2006. "GMM for Panel Count Data Models." Discussion Paper No. 06/591, Department of Economics, University of Bristol.

On IV methods, see:

Baum, Christopher F., Mark E. Schaffer, and Steven Stillman. 2007. "Enhanced routines for instrumental variables/generalized method of moments estimation and testing." Stata Journal 7(4):465-506.

Nichols, Austin. 2007. "Causal inference with observational data." Stata Journal 7(4):507-541.

For the glm-style approach, see:

Hardin, James W., Henrik Schmiediche, and Raymond J. Carroll. 2003. "Instrumental variables, bootstrapping, and generalized linear models." Stata Journal 3(4):351-360. See also http://www.stata.com/merror/.

The example using earnings as the outcome:

Card, David E. 1995. "Using Geographic Variation in College Proximity to Estimate the Return to Schooling" in Aspects of Labour Economics: Essays in Honour of John Vanderkamp, edited by Louis Christofides, E. Kenneth Grant and Robert Swindinsky. University of Toronto Press. See also NBER WP 4483.


    +------------------+
----+ Acknowledgements +-------------------------------------------------

The central code was promulgated by Bill Gould at a seminar in DC on November 2, 2007. That Mata code was written by David Drukker at an earlier date, and reads as follows:

        m=((1/rows(Z)):*Z'((y:*exp(-X*b') :- 1)))'
        crit=(m*W*m')

Type viewsource ivpois.ado to see that code in situ (four lines into the section that begins mata:, where all the Mata code appears).
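For readers who do not use Mata, the two statements quoted above can be read as: form the sample moment vector m as Z' times the residual y*exp(-Xb)-1, divided by the number of rows, then form the quadratic form m*W*m'. The Python sketch below mirrors that reading. It is not the ado-file's code: the name gmm_criterion, the data, and the choice of an identity weight matrix W are assumptions made for illustration.

```python
import math

def gmm_criterion(b, y, X, Z, W):
    """Quadratic form m W m', with m = (1/n) * Z'(y*exp(-Xb) - 1)."""
    n = len(y)
    # Residual under the multiplicative error form, one entry per observation.
    r = [yi * math.exp(-sum(x * c for x, c in zip(xi, b))) - 1.0
         for yi, xi in zip(y, X)]
    # Sample moment vector, one entry per column of Z.
    m = [sum(Z[i][j] * r[i] for i in range(n)) / n for j in range(len(Z[0]))]
    k = len(m)
    return sum(m[j] * W[j][l] * m[l] for j in range(k) for l in range(k))

# Hypothetical data: y is built with no error, so the criterion is zero at b_true.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Z = [[1.0, 0.5], [1.0, 1.5], [1.0, 1.0], [1.0, 3.0]]
b_true = [0.2, 0.3]
y = [math.exp(sum(x * c for x, c in zip(xi, b_true))) for xi in X]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity weight matrix, an assumption of this sketch

crit_at_truth = gmm_criterion(b_true, y, X, Z, W)
crit_elsewhere = gmm_criterion([0.0, 0.0], y, X, Z, W)
print(crit_at_truth, crit_elsewhere)
```

An optimizer such as Mata's optimize() would search over b for the value that makes this criterion as small as possible.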

Thanks to John Mullahy for sharing the data used in Mullahy (1997), and for writing that paper. Thanks to Mark Schaffer for pointing out that using the zero vector as an initial value can result in failure of optimize in some cases, and suggesting a return to using the estimated coefficients from poisson as initial values. Thanks to Henry Schneider for asking for an exposure option. Thanks to John Zedlewski for asking for a speed improvement using the d1 evaluatortype of optimize and asymptotic standard errors.

    +--------------------+
----+ Citation of ivpois +-----------------------------------------------

ivpois is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such:

Nichols, Austin. 2007. ivpois: Stata module for IV/GMM Poisson regression. http://ideas.repec.org/c/boc/bocode/s456890.html


    +--------------------+
----+ Author information +-----------------------------------------------

Austin Nichols
Urban Institute
Washington, DC, USA
austinnichols@gmail.com

Also see

Manual:  [U] 23 Estimation and post-estimation commands
         [R] bootstrap
         [R] poisson
         [R] regress
         [R] ivregress

On-line: help for (if installed) ivreg2, overid, ivendog, ivhettest, ivreset, xtivreg2, xtoverid, ranktest, condivreg; qvf.