-------------------------------------------------------------------------------
help for rd_obs
-------------------------------------------------------------------------------

Regression discontinuity (RD) estimator: obsolete version provided for backward compatibility

Syntax

        rd_obs [varlist] [if] [in] [weight] [, options]

where varlist has the form outcomevar [treatmentvar] assignmentvar

+---------+ ----+ Weights +----------------------------------------------------------

aweights, fweights, and pweights are allowed; see help weights. Under Stata versions 9.2 or before (using locpoly to construct local regression estimates) aweights and pweights will be converted to fweights automatically and the data expanded. If this would exceed system memory limits, error r(901) will be issued; in this case, the user is advised to round weights. In any case, the validity of bootstrapped standard errors will depend on the expanded data correctly representing sampling variability, which may require rounding or replacing weight variables. Under Stata versions 10 or later (using lpoly to construct local regression estimates), all weights will be treated as aweights.

+----------------+ ----+ Important Note +---------------------------------------------------

Standard errors are currently only available by bootstrapping the command like so:

        bs [, options]: rd_obs varlist [if] [in] [weight] [, options]

+----------------------------+ ----+ Table of Further Contents +---------------------------------------

        General description of estimator
        Examples
        Detailed syntax
        Description of options
        Remarks and saved results
        References
        Acknowledgements
        Citation of rd_obs
        Author information

+-------------+ ----+ Description +------------------------------------------------------

rd_obs implements a set of regression-discontinuity estimation methods that are thought to have very good internal validity, for estimating the causal effect of some explanatory variable (called the treatment variable) for a particular subpopulation, under some often plausible assumptions. In this sense, it is much like an experimental design, except that levels of the treatment variable are not assigned randomly by the researcher. Instead, there is a jump in the conditional mean of the treatment variable at a known cutoff in another variable, called the assignment variable, which is perfectly observed, and this allows us to estimate the effect of treatment as if it were randomly assigned in the neighborhood of the known cutoff.

rd_obs is an alternative to various regression techniques that purport to allow causal inference (e.g. panel methods such as xtreg), instrumental variables (IV) and other IV-type methods (see the ivreg2 help file and references therein), and matching estimators (see the psmatch2 and nnmatch help files and references therein). The rd_obs approach is closest in spirit to an IV model with one exogenous variable excluded from the regression (excluded instrument), and one endogenous regressor.

rd_obs estimates local linear or "kernel" regression models on both sides of the cutoff. Estimates are sensitive to the choice of bandwidth, so by default several estimates are constructed using different bandwidths.

Further discussion of rd_obs appears in Nichols (2007).

+----------+ ----+ Examples +---------------------------------------------------------

In the simplest case, assignment to treatment depends on a variable Z being above a cutoff Z0. Frequently, Z is defined so that Z0=0. In this case, treatment is 1 for Z>=0 and 0 for Z<0, and we estimate local linear regressions on both sides of the cutoff to obtain estimates of the outcome at Z=0. The difference between the two estimates (for the samples where Z>=0 and where Z<0) is the estimated effect of treatment.

For example, having a Democratic representative in the US Congress may be considered a treatment applied to a Congressional district, and the assignment variable Z is the vote share garnered by the Democratic candidate. At Z=50%, the probability of treatment=1 jumps from zero to one. Suppose we are interested in the effect a Democratic representative has on federal spending within a Congressional district.

rd_obs estimates local linear regressions on both sides of the cutoff like so:

        ssc inst rd, replace
        net get rd
        use votex if i==1
        rd lne d, gr mbw(100)
        rd_obs lne d, gr mbw(100) line(`"xla(-.2 "Repub" 0 .3 "Democ", noticks)"')
        rd_obs lne d, gr ddens
        bs: rd_obs lne d, x(pop-vet)
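The sharp-design calculation described above (a local linear fit on each side of the cutoff, differenced at Z=0) can be sketched outside Stata. The Python below is only an illustration under stated assumptions, not the rd_obs source: the function names, the bandwidth of 0.5, and the noise-free data are all hypothetical, and the triangle kernel is used because it is the documented default.

```python
def triangle_weight(z, z0, h):
    """Triangle kernel: weight falls linearly to zero at distance h from z0."""
    return max(0.0, 1.0 - abs(z - z0) / h)


def local_linear_at(zs, ys, z0, h):
    """Fit a kernel-weighted line to (zs, ys) and return its value at z0."""
    w = [triangle_weight(z, z0, h) for z in zs]
    sw = sum(w)
    zbar = sum(wi * z for wi, z in zip(w, zs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    szz = sum(wi * (z - zbar) ** 2 for wi, z in zip(w, zs))
    szy = sum(wi * (z - zbar) * (y - ybar) for wi, z, y in zip(w, zs, ys))
    return ybar + (szy / szz) * (z0 - zbar)


def sharp_rd_jump(zs, ys, z0=0.0, h=0.5):
    """Difference of the two one-sided local linear estimates at the cutoff."""
    above = [(z, y) for z, y in zip(zs, ys) if z >= z0]
    below = [(z, y) for z, y in zip(zs, ys) if z < z0]
    y_plus = local_linear_at([z for z, _ in above], [y for _, y in above], z0, h)
    y_minus = local_linear_at([z for z, _ in below], [y for _, y in below], z0, h)
    return y_plus - y_minus


# Hypothetical noise-free data: a linear trend plus a jump of 2 at Z = 0.
zs = [(i - 100) / 100 for i in range(201)]
ys = [1.0 + 0.5 * z + (2.0 if z >= 0 else 0.0) for z in zs]
demo_jump = sharp_rd_jump(zs, ys)
print(round(demo_jump, 6))
```

Because the illustrative outcome is exactly linear on each side of the cutoff, the sketch recovers the jump of 2 up to floating-point error; with real data the estimate would vary with the bandwidth, as noted in the description.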

In a fuzzy RD design, the conditional mean of treatment jumps at the cutoff, and that jump forms the denominator of a Local Wald Estimator. The numerator is the jump in the outcome, and both are reported along with their ratio. Note that any sharp RD design may be estimated using the fuzzy RD syntax, since the denominator in that case is just one:

        use votex if i==1
        rd_obs lne win d, gr mbw(100)
        bs: rd_obs lne win d, x(pop-vet)
        erase votex.dta
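The Local Wald Estimator described above is the jump in the outcome divided by the jump in the conditional mean of treatment. A minimal Python sketch of that ratio follows; it is an illustration only, with hypothetical function names and made-up noise-free data, and it assumes a triangle kernel with a bandwidth of 0.5.

```python
def jump_at_cutoff(zs, vs, z0, h):
    """Jump at z0 in the triangle-kernel local linear fit of vs on zs."""
    def fit_at_z0(points):
        w = [max(0.0, 1.0 - abs(z - z0) / h) for z, _ in points]
        sw = sum(w)
        zbar = sum(wi * z for wi, (z, _) in zip(w, points)) / sw
        vbar = sum(wi * v for wi, (_, v) in zip(w, points)) / sw
        szz = sum(wi * (z - zbar) ** 2 for wi, (z, _) in zip(w, points))
        szv = sum(wi * (z - zbar) * (v - vbar) for wi, (z, v) in zip(w, points))
        return vbar + (szv / szz) * (z0 - zbar)

    pairs = list(zip(zs, vs))
    above = [p for p in pairs if p[0] >= z0]
    below = [p for p in pairs if p[0] < z0]
    return fit_at_z0(above) - fit_at_z0(below)


def local_wald(zs, ys, ds, z0=0.0, h=0.5):
    """Return (outcome jump, treatment jump, ratio of the two)."""
    numer = jump_at_cutoff(zs, ys, z0, h)
    denom = jump_at_cutoff(zs, ds, z0, h)
    return numer, denom, numer / denom


# Hypothetical data: mean treatment jumps by 0.5 at Z = 0, and each unit
# of treatment raises the outcome by 3.
zs = [(i - 100) / 100 for i in range(201)]
ds = [0.2 + 0.1 * z + (0.5 if z >= 0 else 0.0) for z in zs]
ys = [1.0 + 0.3 * z + 3.0 * d for z, d in zip(zs, ds)]
numer, denom, wald = local_wald(zs, ys, ds)
print(round(numer, 6), round(denom, 6), round(wald, 6))
```

In a sharp design the treatment jump is one, so the ratio reduces to the outcome jump alone, which is why the fuzzy syntax also covers the sharp case.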

+-----------------------------+ ----+ Detailed Syntax and Options +--------------------------------------

There should be two or three variables specified after the rd_obs command; if two are specified, a sharp RD design is assumed, where the treatment variable jumps from zero to one at the cutoff. If no variables are specified after the rd_obs command, the estimates table is displayed.

        rd_obs outcomevar [treatmentvar] assignmentvar [if] [in] [weight] [, options]

+-----------------+ ----+ Options summary +--------------------------------------------------

mbw(numlist) specifies a list of multiples for bandwidths, in percentage terms. The default is "100 50 200" (i.e. half and twice the requested bandwidth) and 100 is always included in the list, regardless of whether it is specified.

z0(real) specifies the cutoff Z0 in assignmentvar.

x(varlist) requests estimates of jumps in control variables varlist.

ddens requests a computation of a discontinuity in the density of Z. This is computed in a relatively ad hoc way, and should be redone using McCrary's test described at http://www.econ.berkeley.edu/~jmccrary/DCdensity/.

s(stubname) requests that estimates be saved as new variables beginning with stubname.

graph requests that graphs for each bandwidth be produced.

noscatter suppresses the scatterplot on those graphs.

scopt(string) supplies an option list to the scatter plot.

lineopt(string) supplies an option list to the overlaid line plots.

n(real) specifies the number of points at which to calculate local linear regressions. The default is to calculate the regressions at 50 points above the cutoff, with equal steps in the grid, and to use equal steps below the cutoff, with the number of points determined by the step size.

bwidth(real) allows specification of a bandwidth for local linear regressions. The default is to choose a bandwidth that gives positive weight to at least 30 observations on each side of the discontinuity when estimating the conditional mean at the cutoff.

kernel(kerneltype) allows specification of a kernel for local linear regressions.

        kerneltype      Description
        -------------------------------------------------------------------------
        epanechnikov    Epanechnikov kernel function
        epan2           alternative Epanechnikov kernel function
        biweight        biweight kernel function
        cosine          cosine trace kernel function
        gaussian        Gaussian kernel function
        parzen          Parzen kernel function
        rectangle       rectangle kernel function
        triangle        triangle kernel function; the default
        -------------------------------------------------------------------------
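The bandwidth options above interact: bwidth() sets a base bandwidth, and mbw() scales it by percentage multiples with 100 always included. The Python sketch below illustrates one way to read those two rules. It is an assumption-laden illustration, not the rd_obs algorithm: the function names are hypothetical, and the default-bandwidth rule is interpreted here as "any bandwidth strictly larger than the distance to the 30th-nearest observation on each side", which gives those observations positive weight under a triangle kernel.

```python
def bandwidth_list(base_bw, mbw=(100, 50, 200)):
    """Map each percentage multiple to a bandwidth; 100 is always included."""
    multiples = list(mbw)
    if 100 not in multiples:
        multiples.insert(0, 100)
    return {m: base_bw * m / 100 for m in multiples}


def bandwidth_lower_bound(zs, z0=0.0, min_obs=30):
    """Distance to the min_obs-th nearest observation on the sparser side.

    Any bandwidth strictly above this bound gives positive triangle-kernel
    weight to at least min_obs observations on each side of z0.
    """
    left = sorted(z0 - z for z in zs if z < z0)
    right = sorted(z - z0 for z in zs if z >= z0)
    if len(left) < min_obs or len(right) < min_obs:
        raise ValueError("fewer than min_obs observations on one side")
    return max(left[min_obs - 1], right[min_obs - 1])


def count_positive_weight(zs, z0, h):
    """Observations strictly within h of z0, counted separately by side."""
    n_left = sum(1 for z in zs if z0 - h < z < z0)
    n_right = sum(1 for z in zs if z0 <= z < z0 + h)
    return n_left, n_right


# Hypothetical evenly spaced assignment variable on [-1, 1].
zs = [(i - 100) / 100 for i in range(201)]
bound = bandwidth_lower_bound(zs)
base_bw = bound * 1.01          # slightly above the bound
n_left, n_right = count_positive_weight(zs, 0.0, base_bw)
bws = bandwidth_list(base_bw, mbw=(50, 200))
print(bound, n_left, n_right, sorted(bws))
```

Passing mbw=(50, 200) still yields an entry for 100, mirroring the statement that the 100% multiple is estimated whether or not it is requested.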

+---------------------------+ ----+ Remarks and saved results +----------------------------------------

rd_obs does not report standard errors by default, nor does it report all saved estimates. Instead, it reports the Local Wald Estimate for each bandwidth used, and its components where applicable. To get all saved estimates, type rd_obs without arguments or type ereturn list.

To facilitate bootstrapping, rd_obs saves the following results in e():

Scalars
        e(N)            Number of observations used in estimation
        e(w)            Bandwidth in base model; other bandwidths are reported in e.g. e(w50) for the 50% multiple.

Macros
        e(cmd)          rd_obs
        e(rdversion)    Version number of rd_obs
        e(depvar)       Name of dependent variable

Matrices
        e(b)            Coefficient vector of estimated jumps in variables at different percentage bandwidth multiples

Functions
        e(sample)       Marks estimation sample
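Since standard errors come only from bootstrapping, it may help to see what the bootstrap does conceptually: resample observations with replacement, re-estimate the jump each time, and take the standard deviation of the replicates. The Python sketch below illustrates that idea for a sharp design. It does not use Stata or the e() results; the function names, the simulated data, the seeds, the bandwidth, and the 200 replications are all assumptions made for illustration.

```python
import random
import statistics


def sharp_rd_jump(data, z0=0.0, h=0.5):
    """Triangle-kernel local linear jump at z0 for a list of (z, y) pairs."""
    def fit_at_z0(points):
        w = [max(0.0, 1.0 - abs(z - z0) / h) for z, _ in points]
        sw = sum(w)
        zbar = sum(wi * z for wi, (z, _) in zip(w, points)) / sw
        ybar = sum(wi * y for wi, (_, y) in zip(w, points)) / sw
        szz = sum(wi * (z - zbar) ** 2 for wi, (z, _) in zip(w, points))
        szy = sum(wi * (z - zbar) * (y - ybar) for wi, (z, y) in zip(w, points))
        return ybar + (szy / szz) * (z0 - zbar)

    above = [p for p in data if p[0] >= z0]
    below = [p for p in data if p[0] < z0]
    return fit_at_z0(above) - fit_at_z0(below)


def bootstrap_se(data, reps=200, seed=12345):
    """Standard deviation of the jump estimate over resampled data sets."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(reps):
        # Draw n observations with replacement and re-estimate the jump.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(sharp_rd_jump(resample))
    return statistics.stdev(draws)


# Hypothetical simulated data: true jump of 2 at Z = 0 plus Gaussian noise.
sim = random.Random(1)
data = []
for _ in range(400):
    z = sim.uniform(-1.0, 1.0)
    y = 1.0 + 0.5 * z + (2.0 if z >= 0 else 0.0) + sim.gauss(0.0, 0.3)
    data.append((z, y))

estimate = sharp_rd_jump(data)
se = bootstrap_se(data)
print(round(estimate, 3), round(se, 3))
```

This resamples individual observations, which is only appropriate when observations are independent; with weighted or clustered data the resampling scheme would need to reflect that, as the note on weights cautions.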

Complete references appear in

Nichols, Austin. 2007. "Causal Inference with Observational Data." Prepublication draft available as http://pped.org/stata/ciwod.pdf

The interested reader is directed also to

Imbens, Guido and Thomas Lemieux. 2007. "Regression Discontinuity Designs: A Guide to Practice." NBER Working Paper 13039.

McCrary, Justin. 2007. "Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test." NBER Technical Working Paper 334.

Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.

I would like to thank Justin McCrary for helpful discussions. Any errors are my own.

rd_obs is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such:

Nichols, Austin. 2007. rd: Stata module for regression discontinuity estimation. http://ideas.repec.org/c/boc/bocode/s456888.html

Author

        Austin Nichols
        Urban Institute
        Washington, DC, USA
        austinnichols@gmail.com

Also see

Manual:   [U] 23 Estimation and post-estimation commands
          [R] bootstrap
          [R] lpoly in Stata 10, else locpoly (findit locpoly to install)
          [R] ivregress in Stata 10, else [R] ivreg
          [R] regress
          [XT] xtreg

On-line:  help for (if installed) rd (newer version), ivreg2, overid, ivendog, ivhettest, ivreset, xtivreg2, xtoverid, ranktest, condivreg; psmatch2, nnmatch.