{smcl}
{* *! version 1.0.1  01nov2022}{...}
{title:Title}

{phang}
{bf:cooksd2} {hline 2} Cook's distance after {help regress} or {help xtreg}


{marker syntax}{...}
{title:Syntax}

{p 8 17 2}
{cmdab:cooksd2}
{newvar}
[{cmd:,} {opt cvars:}({indepvars})
{opt parms:}({newvar})
{opt panel:}[({varname})]
{opt nocons:tant}]


{marker description}{...}
{title:Description}

{pstd}
{cmd:cooksd2} generates Cook's (1977) distance measures after {help regress} or {help xtreg}. These measures summarize the effect of deleting an
observation, or an entire subject, on the estimated regression coefficients. The procedure uses efficient updating formulas based on
Christensen et al. (1992) and Banerjee and Frees (1997). For further details, see Vincent (2022).

{marker options}{...}
{title:Options}

{phang}
{cmd:cooksd2} {newvar} generates {it:newvar} containing the Cook's distance measures, together with {it:newvar}_pr_F and {it:newvar}_pr_chi2,
which contain their percentiles of the F and chi-square distributions.

{phang}
{opt cvars:}({indepvars}) restricts Cook's distance to the influence on the J coefficients of the explanatory variables specified in
{it:indepvars}, including the constant. The default is to include all K coefficients in the model, which is appropriate when all are of equal interest.

{phang}
{opt parms:}({newvar}) adds the corresponding jackknifed regression coefficients to the dataset. These variables are named with the prefix {it:newvar}_b_. For
the panel-data estimators, the standard deviations of the errors are also added, with the names {it:newvar}_sigma_e, {it:newvar}_sigma_u, and,
for the between-effects estimator, {it:newvar}_sigma_b.

{phang}
{opt panel:}[({varname})] evaluates the influence of an entire subject. After {cmd:xtreg}, the subject is the {it:panelvar} specified in {help xtset},
and {it:varname} cannot be specified. After {cmd:regress}, any grouping of the data can be applied, and {it:varname} specifies the group to which
each observation belongs.

{phang}
{opt nocons:tant} excludes the regression constant from the Cook's distance calculation. This is helpful when the influence on
the constant is not of interest.

	
{marker remarks}{...}
{title:Remarks}

{pstd}
The Cook's distance of a data point or subject can be compared with the percentiles of an F(J,N-K) distribution to determine which confidence region
for the unknown parameters is reached by the distance between the full-sample and leave-one-out estimates. Suppose, for example, that the Cook's distance Di
of the i{it:th} data point corresponds to the 50th percentile of the F distribution. Then removing this observation moves the estimated
coefficients to the edge of a 50% confidence region for the unknown parameters based on the full-sample estimates. Because the 50th percentile of
the F(J,N-K) distribution is approximately 1 when J and N are large, some texts suggest that data points (or subjects) with Di>1 are influential,
whereas others use 4/N. In general, it is preferable to look for large relative differences rather than only checking whether the values exceed suggested cutoffs.

{pstd}
{cmd:cooksd2} computes the Cook's distance statistics and their percentiles of the F(J,N-K) distribution. The percentiles that correspond to J*Di
are also reported for the chi-square distribution with J degrees of freedom. These are appropriate when confidence regions are based on the asymptotic
distributions.
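
{pstd}
As a rough illustration of the comparison described above, the percentiles can be approximated by hand with Stata's cumulative distribution functions
{cmd:F()} and {cmd:chi2()}. This is only a sketch, not the command's internal code. It assumes the OLS model from the examples with all K coefficients
included, so that J=K={cmd:e(df_m)}+1 and N-K={cmd:e(df_r)}, and that the {cmd:regress} results are still in memory after {cmd:cooksd2}. The names
{cmd:chk_F} and {cmd:chk_chi2} are illustrative, and the stored percentiles may be scaled differently (for example, 0-100 rather than 0-1).{p_end}

{phang2}{cmd:. use http://www.stata-press.com/data/imeus/traffic, clear}{p_end}
{phang2}{cmd:. regress fatal spircons unrate yngdrv c.spircons#c.yngdrv}{p_end}
{phang2}{cmd:. cooksd2 cd_ols}{p_end}
{phang2}{cmd:. generate double chk_F = F(e(df_m)+1, e(df_r), cd_ols)}{p_end}
{phang2}{cmd:. generate double chk_chi2 = chi2(e(df_m)+1, (e(df_m)+1)*cd_ols)}{p_end}
{phang2}{cmd:. summarize cd_ols_pr_F chk_F cd_ols_pr_chi2 chk_chi2}{p_end}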


{marker TechnicalNote}{...}
{title:Technical Note}

{pstd}
When {opt cvars:}({indepvars}) is specified, all the parameters in the model are still updated when each observation (or subject) is removed, but only those
associated with the regressors in {it:indepvars} are used in the calculation of the Cook's distance statistic.

{pstd}
To obtain an efficient updating formula for the random-effects estimator when the variance parameters change (that is, when a row or subject is omitted), the transformed data for the i{it:th}
subject are based on both Ti and the average group size Tbar, rather than on Ti only. Thus, when the panels are balanced, Ti=Tbar and the jackknifed parameters
will be the same as those generated by {help jackknife}; when the panels are unbalanced, the coefficients will differ slightly.


{marker examples}{...}
{title:Examples}

{pstd}Setup{p_end}
{phang2}{cmd:. use http://www.stata-press.com/data/imeus/traffic, clear}{p_end}
{phang2}{cmd:. xtset state year}{p_end}

{title:Ordinary least-squares regression}

{phang2}{cmd:. reg fatal spircons unrate yngdrv c.spircons#c.yngdrv}{p_end}

{pstd}Influence of each row{p_end}
{phang2}{cmd:. cooksd2 cd_ols}{p_end}

{pstd}Influence of each state{p_end}
{phang2}{cmd:. cooksd2 cd_ols2, panel(state)}{p_end}

{pstd}Influence of each year{p_end}
{phang2}{cmd:. cooksd2 cd_ols3, panel(year)}{p_end}
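
{pstd}One possible way to inspect the row-level results, sketched here with the 4/N rule of thumb mentioned in the Remarks (the cutoff is a
convention rather than part of {cmd:cooksd2}, and the sketch assumes the {cmd:regress} results are still in memory){p_end}
{phang2}{cmd:. summarize cd_ols, detail}{p_end}
{phang2}{cmd:. count if cd_ols > 4/e(N) & !missing(cd_ols)}{p_end}
{phang2}{cmd:. list state year cd_ols cd_ols_pr_F if cd_ols > 4/e(N) & !missing(cd_ols)}{p_end}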

{title:Fixed-effects regression}

{phang2}{cmd:. xtreg fatal spircons unrate yngdrv c.spircons#c.yngdrv, fe}{p_end}

{pstd}Influence of each row and add jackknifed coefficients{p_end}
{phang2}{cmd:. cooksd2 cd_fe, parms(fe)}{p_end}

{pstd}Influence of each state{p_end}
{phang2}{cmd:. cooksd2 cd_fe2, panel}{p_end}
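
{pstd}The variables added by {cmd:parms(fe)} can then be examined. This sketch relies only on the naming scheme described under Options; wildcards
are used because the exact variable names are not listed here{p_end}
{phang2}{cmd:. describe fe_*}{p_end}
{phang2}{cmd:. summarize fe_b_*}{p_end}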

{title:Random-effects regression}

{phang2}{cmd:. xtreg fatal spircons unrate yngdrv c.spircons#c.yngdrv, re}{p_end}

{pstd}Influence of each row on all coefficients excluding the constant and add jackknifed coefficients{p_end}
{phang2}{cmd:. cooksd2 cd_re, parms(re) nocons}{p_end}

{pstd}Influence of each row on subsets of the coefficients including the constant{p_end}
{phang2}{cmd:. cooksd2 cd_re2, cvars(spircons unrate)}{p_end}

{title:Between-effects regression}

{phang2}{cmd:. xtreg fatal spircons unrate yngdrv c.spircons#c.yngdrv, be}{p_end}

{pstd}Influence of each row{p_end}
{phang2}{cmd:. cooksd2 cd_be}{p_end}

{pstd}Influence of each state{p_end}
{phang2}{cmd:. cooksd2 cd_be2, panel}{p_end}


{title:References}


{phang}Banerjee, M., & Frees, E. W. (1997).
Influence diagnostics for linear longitudinal models. Journal of the American Statistical Association, 92(439), 999-1005.{p_end}

{phang}Christensen, R., Pearson, L. M., & Johnson, W. (1992).
Case-deletion diagnostics for mixed models. Technometrics, 34(1), 38-45.{p_end}

{phang}Cook, R. D. (1977).
Detection of influential observation in linear regression. Technometrics, 19(1), 15-18.{p_end}

{phang}Vincent, D. (2022, September).
Cook's distance measures for panel data models. In London Stata Conference 2022 (No. 03). Stata Users Group.{p_end}


{title:Author}

{phang}This command was written by David Vincent (dvincent@dveconometrics.co.uk).
Comments and suggestions are welcome.{p_end}