Symmetric nearest neighbour smoothing

running yvar [xvar] [if exp] [in range] [weight] [, ci double {knn(#|knnvar) | span(#)} logit mean repeat(#) twice genb(bvar) generate(newvar) gense(sevar) replace ciopts(rarea_options) lineopts(line_options) nograph nopts novjitter scatter(scatter_options) plot(plot) addplot(plot) twoway_options]

Description

running smooths yvar on xvar. By default the smoothed version is a running line; a running mean is also available. A graph is given of yvar together with its smooth plotted against xvar, unless suppressed. If xvar is not provided, yvar is smoothed against the current order of observations.

Only analytic weights (aweights) are allowed; see weights.

Options

Smoothing options

ci produces a pointwise confidence interval for the smoothed values of yvar. The width is determined by the current value of the macro $S_level. Not available with twice, repeat() or logit.

double doubles the value of repeat(). If repeat() is not specified, double is equivalent to repeat(2).

knn(#|knnvar) controls the number k of nearest neighbours used on each side of the smoothed point. You may specify a constant # or a variable knnvar. The value # is stored in r(knn). The greater the value, the greater the smoothing. You are not allowed to specify both span() and knn().

logit transforms the smooth and plots the y axis on a logit scale. Values of 0 and 1 are plotted just above and outside the range of the smoothed curve.

mean specifies running-mean smoothing; the default is running-line least-squares smoothing.

repeat(#) specifies the number of times the data are to be smoothed. The default # is 1. Increasing # increases the time it takes to calculate the smooth but improves the smooth. repeat(2) corresponds to "smoothing the smooth". The value of # may not exceed 7.
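
As an illustration only, the idea of repeated smoothing can be sketched in Python. This is not the code used by running: running_mean and repeat_smooth are hypothetical names, and a plain unweighted running mean on equally spaced points stands in for whichever smoother is in use.

```python
def running_mean(y, k):
    """Unweighted running mean using up to k neighbours on each side."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - k):min(n, i + k + 1)]  # truncated at the ends
        out.append(sum(window) / len(window))
    return out


def repeat_smooth(y, k, times=1):
    """Smooth y, then smooth the result, for 'times' passes in total."""
    if not 1 <= times <= 7:  # the help text caps repeat() at 7
        raise ValueError("times must be between 1 and 7")
    smooth = list(y)
    for _ in range(times):
        smooth = running_mean(smooth, k)
    return smooth


y = [0, 0, 3, 0, 0]                       # made-up data with one spike
once = repeat_smooth(y, k=1, times=1)     # one pass
again = repeat_smooth(y, k=1, times=2)    # "smoothing the smooth"
```

In this toy series the second pass spreads the spike further into its neighbours, which is the sense in which a larger # gives a smoother curve.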

span(#) controls the span or proportion of the data to be used in the symmetric nearest neighbours. If span() is specified and n is the number of observations, the argument of knn() is defined to be (n*span - 1) / 2. span must be in the range (0,2]. (It must be less than 1 when using mean.) Span 2 corresponds to fitting a straight line. The value of # is stored in r(span). You are not allowed to specify both span() and knn().
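
The conversion from span() to knn() can be checked by hand. The Python sketch below is an illustration, not part of running, and knn_from_span is a hypothetical name. It applies the formula and the range restrictions stated above. Whether and how running rounds the result is not stated here, so the sketch returns the raw value.

```python
def knn_from_span(n, span, mean=False):
    """Return (n*span - 1)/2, the knn() value implied by span() for n observations."""
    if not 0 < span <= 2:
        raise ValueError("span must be in the range (0, 2]")
    if mean and span >= 1:
        raise ValueError("span must be less than 1 when using mean")
    return (n * span - 1) / 2  # no rounding applied: an assumption of this sketch


# n = 74 is just an illustrative number of observations.
k = knn_from_span(74, 0.75)   # (74*0.75 - 1)/2 = (55.5 - 1)/2 = 27.25
```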

twice carries out Tukey's 'twicing' procedure whereby residuals from the original fit are smoothed and added back to the fit to obtain the final smooth ("smoothing the rough" or "reroughing" in Tukey's terminology). The result is somewhat rougher than would have been obtained without the application of twicing, but may be a better fit to the data.
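
Twicing as described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the code used by running: running_mean and twice are hypothetical names, and an unweighted running mean on equally spaced points stands in for the smoother.

```python
def running_mean(y, k):
    """Unweighted running mean using up to k neighbours on each side."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - k):min(n, i + k + 1)]  # truncated at the ends
        out.append(sum(window) / len(window))
    return out


def twice(y, k):
    """Smooth y, smooth the residuals (the 'rough'), and add them back."""
    fit = running_mean(y, k)
    rough = [obs - f for obs, f in zip(y, fit)]
    smoothed_rough = running_mean(rough, k)
    return [f + r for f, r in zip(fit, smoothed_rough)]


y = [0, 0, 3, 0, 0]          # made-up data with one spike
plain = running_mean(y, 1)   # the original fit
twiced = twice(y, 1)         # the fit plus the smoothed residuals
```

In this toy series the twiced values next to the spike move closer to the data than the plain fit, at the cost of a less smooth curve.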

Saving results options

genb(bvar) creates bvar containing the local slope estimates. They constitute a local estimate of the derivative of the smoothed values of yvar with respect to xvar. Not available with mean, twice or logit.

generate(newvar) creates newvar containing the smoothed values of yvar. Note that newvar will be on a logit scale if logit is used.

gense(sevar) creates sevar containing the pointwise standard error of smoothed values of yvar. Not available with twice, repeat() or logit.

replace allows variables specified by any of the generate(), genb(), gense() options to be replaced if they already exist.

Graphics options

ciopts() are options of twoway rarea. These should be specified to control the rendering of the confidence interval.

lineopts() are options of twoway line. These should be specified to control the rendering of the smoothed lines.

nograph suppresses the graph.

nopts suppresses the scatter plot of yvar. Only the smoothed line (and if ci is specified, the pointwise CI) is plotted.

novjitter specifies no vertical jittering of 0 and 1 values. The default with logit is that they are jittered vertically only. Note that jitter() may be specified within scatter(), but that this specifies standard Stata jittering which is both vertical and horizontal.

scatter(scatter_options) are options of scatter. These should be specified to control the rendering of the original data points. The default includes ms(oh) (ms(p) with over 299 observations).

plot(plot) provides a way to add other plots to the generated graph; see help plot. (Stata 8 only)

addplot(plot) provides a way to add other plots to the generated graph; see help plot. (Stata 9 up)

twoway_options are other options of twoway.

Remarks

Subsets of 2k + 1 observations are used for calculating smoothed values for each point in the data except for end points, for which smaller uncentred subsets are used. The subsets consist of the closest k points with xvar values less than or equal to that of the given point, the point itself, and the closest k points with xvar values greater than or equal to the given point.

It should be noted that since the neighbourhoods are asymmetric in the tails, the running mean is subject to bias in the tails. Other than in the tails, using mean will produce the same result as using the default smooth whenever the xvar values are evenly spaced.
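
The neighbourhood scheme can be illustrated with a small Python sketch. It is a simplified stand-in, not the code used by running: running_smooth is a hypothetical name, weights are ignored, and tied xvar values are simply taken in sorted order. Each point is fitted from up to k neighbours on each side, with truncated windows at the ends.

```python
def running_smooth(x, y, k, mean=False):
    """Running line (or running mean) using up to k neighbours on each side."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(xs)
    fitted = [0.0] * n
    for i in range(n):
        lo, hi = max(0, i - k), min(n - 1, i + k)  # smaller, uncentred at the ends
        wx, wy = xs[lo:hi + 1], ys[lo:hi + 1]
        m = len(wx)
        xbar, ybar = sum(wx) / m, sum(wy) / m
        sxx = sum((a - xbar) ** 2 for a in wx)
        if mean or sxx == 0:
            fitted[i] = ybar  # running mean (or no spread in x)
        else:
            sxy = sum((a - xbar) * (b - ybar) for a, b in zip(wx, wy))
            fitted[i] = ybar + (sxy / sxx) * (xs[i] - xbar)  # local least-squares line
    out = [0.0] * n
    for pos, i in enumerate(order):  # return values in the original order
        out[i] = fitted[pos]
    return out


x = list(range(10))                      # evenly spaced, made-up data
y_lin = [2 * v + 1 for v in x]           # exactly linear response
y_noisy = [0, 3, 1, 4, 2, 6, 5, 8, 7, 9]

line_lin = running_smooth(x, y_lin, 2)
mean_lin = running_smooth(x, y_lin, 2, mean=True)
line_noisy = running_smooth(x, y_noisy, 2)
mean_noisy = running_smooth(x, y_noisy, 2, mean=True)
```

With these data the running line reproduces the linear response everywhere, the running mean is biased at the end points, and the two smooths of the noisy response agree at the interior points where the window is complete.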

repeat(3), for instance, first smooths yvar creating yhat1, say; next yhat1 is smoothed creating yhat2, and finally yhat2 is smoothed creating yhat3.

See Royston and Cox (2005) for a multivariable implementation of running.

Examples

. running mpg weight

. running mpg weight, span(0.75) ci

. running mpg weight, knn(5) generate(fit) gense(sfit) replace

. running mpg weight, twice

