Symmetric nearest neighbour smoothing
running yvar [xvar] [if exp] [in range] [weight] [, ci double { knn(# | knnvar) | span(#) } logit mean repeat(#) twice genb(bvar) generate(newvar) gense(sevar) replace ciopts(rarea_options) lineopts(line_options) nograph nopts [no]vjitter scatter(scatter_options) plot(plot) addplot(plot) twoway_options ]
Description
running smooths yvar on xvar. By default the smoothed version is a running line; a running mean is also available. Unless suppressed, a graph shows yvar together with its smooth, both plotted against xvar. If xvar is not provided, yvar is smoothed against the current order of observations.
Only analytic weights (aweights) are allowed; see weights.
Options
Smoothing options
ci produces a pointwise confidence interval for the smoothed values of yvar. The width is determined by the current value of the macro $S_level. Not available with twice, repeat() or logit.
double doubles the value of repeat(). If repeat() is not specified, double is equivalent to repeat(2).
knn(# | knnvar) controls the number k of nearest neighbours used on each side of the smoothed point. You may specify a constant # or a variable knnvar. The value # is stored in r(knn). The greater the value, the greater the smoothing. You are not allowed to specify both span() and knn().
logit transforms the smooth and plots the y axis on a logit scale. Values of 0 and 1 are plotted just above and outside the range of the smoothed curve.
mean specifies running-mean smoothing; the default is running-line least-squares smoothing.
repeat(#) specifies the number of times the data are to be smoothed. The default # is 1. Increasing # increases the time it takes to calculate the smooth but improves the smooth. repeat(2) corresponds to "smoothing the smooth". The value of # may not exceed 7.
span(#) controls the span or proportion of the data to be used in the symmetric nearest neighbours. If span() is specified and n is the number of observations, the argument of knn() is defined to be (n * span - 1) / 2. span must be in the range (0,2]. (It must be less than 1 when using mean.) Span 2 corresponds to fitting a straight line. The value of # is stored in r(span). You are not allowed to specify both span() and knn().
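The conversion from span() to knn() stated above can be checked with a few lines of Python. This is only an arithmetic sketch of the formula in the text: the function name span_to_knn is hypothetical, and how running rounds a fractional result is not stated here, so the raw value is returned.

```python
def span_to_knn(n_obs: int, span: float) -> float:
    """Return (n * span - 1) / 2, the knn() argument implied by span()."""
    if not 0 < span <= 2:
        raise ValueError("span must be in the range (0, 2]")
    # No rounding is applied: the help text does not say how a
    # fractional value is handled, so that step is left out.
    return (n_obs * span - 1) / 2


# Hypothetical dataset of 74 observations smoothed with span(0.75).
k_example = span_to_knn(74, 0.75)
print(k_example)

# With span 2 the result is n - 0.5, so every neighbourhood already
# covers all n observations.
k_full = span_to_knn(10, 2)
print(k_full)
```

The second call is consistent with the statement that span 2 corresponds to fitting a straight line: once k reaches n - 1, each point's neighbourhood is the whole dataset.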
twice carries out Tukey's 'twicing' procedure whereby residuals from the original fit are smoothed and added back to the fit to obtain the final smooth ("smoothing the rough" or "reroughing" in Tukey's terminology). The result is somewhat rougher than would have been obtained without the application of twicing, but may be a better fit to the data.
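The twicing step described above can be sketched in Python as follows. This is an illustration of the idea only, not the code used by running: it substitutes a simple unweighted running mean for the default running line, truncates neighbourhoods at the ends, and the function names running_mean and twice_smooth are hypothetical.

```python
def running_mean(y, k):
    """Running mean using up to k neighbours on each side (truncated at the ends)."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - k): min(n, i + k + 1)]
        out.append(sum(window) / len(window))
    return out


def twice_smooth(y, k, smoother=running_mean):
    """Twicing: smooth the residuals of the first fit and add them back."""
    fit = smoother(y, k)                          # original fit
    rough = [yi - fi for yi, fi in zip(y, fit)]   # residuals ("the rough")
    rough_fit = smoother(rough, k)                # smoothed rough
    return [fi + ri for fi, ri in zip(fit, rough_fit)]


y_spike = [0, 0, 0, 9, 0, 0, 0]
first_fit = running_mean(y_spike, 1)
twiced = twice_smooth(y_spike, 1)
print(first_fit)
print(twiced)
```

In this toy series the twiced result varies more from point to point than the first fit, which matches the remark that twicing gives a somewhat rougher smooth.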
Saving results options
genb(bvar) creates bvar containing the local slope estimates. They constitute a local estimate of the derivative of the smoothed values of yvar with respect to xvar. Not available with mean, twice or logit.
generate(newvar) creates newvar containing the smoothed values of yvar. Note that newvar will be on a logit scale if logit is used.
gense(sevar) creates sevar containing the pointwise standard error of smoothed values of yvar. Not available with twice, repeat() or logit.
replace allows variables specified by any of the generate(), genb(), gense() options to be replaced if they already exist.
Graphics options
ciopts() are options of twoway rarea. These should be specified to control the rendering of the confidence interval.
lineopts() are options of twoway line. These should be specified to control the rendering of the smoothed lines.
nograph suppresses the graph.
nopts suppresses the scatter plot of yvar. Only the smoothed line (and if ci is specified, the pointwise CI) is plotted.
novjitter specifies no vertical jittering of 0 and 1 values. The default with logit is that they are jittered vertically only. Note that jitter() may be specified within scatter(), but that this specifies standard Stata jittering which is both vertical and horizontal.
scatter(scatter_options) are options of scatter. These should be specified to control the rendering of the original data points. The default includes ms(oh) (ms(p) with over 299 observations).
plot(plot) provides a way to add other plots to the generated graph; see help plot. (Stata 8 only)
addplot(plot) provides a way to add other plots to the generated graph; see help plot. (Stata 9 up)
twoway_options are other options of twoway.
Remarks
Subsets of 2k + 1 observations are used for calculating smoothed values for each point in the data except for end points, for which smaller uncentred subsets are used. Each subset consists of the closest k points with xvar values less than or equal to that of the given point, the point itself, and the closest k points with xvar values greater than or equal to that of the given point.
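The neighbourhoods described above can be sketched in Python. The sketch assumes the data are already sorted by xvar with no ties, uses 0-based positions, and simply truncates the subset at either end of the data; the function name neighbourhood is hypothetical.

```python
def neighbourhood(i, n, k):
    """Positions used to smooth point i among n sorted observations."""
    lo = max(0, i - k)        # up to k points on the left
    hi = min(n - 1, i + k)    # up to k points on the right
    return list(range(lo, hi + 1))


n_obs, k = 10, 2
for i in (0, 1, 4, 9):
    # Interior points get 2k + 1 positions; end points get fewer,
    # and the point is no longer at the centre of its subset.
    print(i, neighbourhood(i, n_obs, k))
```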
Because the neighbourhoods are asymmetric in the tails, the running mean is subject to bias there. Other than in the tails, using mean will produce the same result as using the default smooth whenever the xvar values are evenly spaced.
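The relationship between the two smoothers can be illustrated with a small Python sketch. It assumes unweighted data sorted by xvar, neighbourhoods truncated at the ends, and an ordinary least-squares line fitted within each neighbourhood; the function names are hypothetical and this is not the code used by running.

```python
def window(i, n, k):
    """Positions in the (possibly truncated) neighbourhood of point i."""
    return range(max(0, i - k), min(n, i + k + 1))


def running_mean(y, k):
    n = len(y)
    return [sum(y[j] for j in window(i, n, k)) / len(window(i, n, k))
            for i in range(n)]


def running_line(x, y, k):
    """Fit a least-squares line in each neighbourhood and evaluate it at x[i]."""
    n = len(y)
    out = []
    for i in range(n):
        idx = window(i, n, k)
        m = len(idx)
        xbar = sum(x[j] for j in idx) / m
        ybar = sum(y[j] for j in idx) / m
        sxx = sum((x[j] - xbar) ** 2 for j in idx)
        sxy = sum((x[j] - xbar) * (y[j] - ybar) for j in idx)
        slope = sxy / sxx if sxx > 0 else 0.0
        out.append(ybar + slope * (x[i] - xbar))
    return out


x = [0, 1, 2, 3, 4, 5, 6, 7]             # evenly spaced
y = [(xi + 1) ** 2 for xi in x]          # a curved trend: 1, 4, 9, ...
mean_fit = running_mean(y, 2)
line_fit = running_line(x, y, 2)
for xi, yi, m_i, l_i in zip(x, y, mean_fit, line_fit):
    print(xi, yi, round(m_i, 3), round(l_i, 3))
```

In the interior, x[i] equals the mean of the x values in its neighbourhood, so the fitted line evaluated at x[i] equals the neighbourhood mean of y. At the ends the neighbourhood is one-sided, and in this example the running mean lies further from the data than the running line does.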
repeat(3), for instance, first smooths yvar creating yhat1, say; next yhat1 is smoothed creating yhat2, and finally yhat2 is smoothed creating yhat3.
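The repeated smoothing described above amounts to feeding each smooth back into the smoother. A minimal Python sketch, again substituting a simple running mean for the default running line and using hypothetical function names:

```python
def running_mean(y, k):
    """Running mean using up to k neighbours on each side (truncated at the ends)."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - k): min(n, i + k + 1)]
        out.append(sum(window) / len(window))
    return out


def repeat_smooth(y, k, times, smoother=running_mean):
    """Return the list of successive smooths: yhat1, yhat2, ..."""
    current = list(y)
    passes = []
    for _ in range(times):
        current = smoother(current, k)   # smooth the previous smooth
        passes.append(current)
    return passes


yhat1, yhat2, yhat3 = repeat_smooth([0, 0, 0, 9, 0, 0, 0], 1, 3)
print(yhat1)
print(yhat2)
print(yhat3)
```

Each pass spreads the single spike in this toy series over more neighbouring points and lowers its peak.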
See Royston and Cox (2005) for a multivariable implementation of running.
Examples
. running mpg weight
. running mpg weight, span(0.75) ci
. running mpg weight, knn(5) generate(fit) gense(sfit) replace
. running mpg weight, twice
Authors
Peter Sasieni Queen Mary, University of London peter.sasieni@cancer.org.uk
Patrick Royston MRC Clinical Trials Unit patrick.royston@ctu.mrc.ac.uk
Nicholas J. Cox Durham University n.j.cox@durham.ac.uk
References
Royston, P. and Cox, N.J. 2005. A multivariable scatterplot smoother. Stata Journal 5(3): 405-412. http://www.stata-journal.com/sjpdf.html?articlenum=gr0017
Sasieni, P. 1995. Symmetric nearest neighbor linear smoothers. Stata Technical Bulletin 24: 10-14 (STB Reprints Vol. 4, 97-101). http://www.stata.com/products/stb/journals/stb24.pdf
Sasieni, P. and Royston, P. 1998. Pointwise confidence intervals for running. Stata Technical Bulletin 41: 17-23 (STB Reprints Vol. 7, 156-163). http://www.stata.com/products/stb/journals/stb41.pdf
Sasieni, P., Royston, P. and Cox, N.J. 2005. Software update for running. Stata Journal 5(2): 285. http://www.stata-journal.com/sjpdf.html?articlenum=up0011
Also see
Manual: [R] lowess, [R] lpoly, [R] smooth