Restricted cubic spline smoothing
rcspline yvar xvar [if] [in] [weight] [ , mkspline_options stub(stub) regressopts(regress_options) generate(newvar) scatter(scatter_options) ci[(rarea_options)] level(#) showknots mspline_options addplot(plot) ]
fweights are allowed; see help on weight.
rcspline computes and graphs a restricted cubic spline smooth of yvar given xvar.
rcspline calls mkspline, cubic to create variables containing a restricted cubic spline of xvar. It then calls regress to regress yvar against those new variables, and thus obtains predicted (smoothed) values of yvar given xvar. Finally, it calls graph to plot data and smooth.
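The three steps just described can be sketched by hand as follows. This is only a rough approximation of what rcspline does, not its actual source: the auto dataset, the prefix _rcs and the variable name smooth_mpg are illustrative choices, and the real command handles temporary variables, weights and graph options itself.

```stata
* Step 0: example data shipped with Stata
sysuse auto, clear

* Step 1: restricted cubic spline basis of weight (5 knots is the default)
mkspline _rcs = weight, cubic nknots(5)

* Step 2: regress the response on the spline variables, then predict
regress mpg _rcs*
predict smooth_mpg

* Step 3: plot the data and the smooth
twoway (scatter mpg weight) (mspline smooth_mpg weight, bands(200))
```

With 5 knots, mkspline creates 4 variables (_rcs1 to _rcs4), the first of which is weight itself.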
R-squared (squared correlation coefficient) and RMSE (root mean square error) are provided as goodness of fit indicators. However, these can typically be 'improved' simply by smoothing less, which is often unhelpful. As the resulting predictions come closer to interpolating the data, R-squared will increase and RMSE will decrease, but scientific usefulness and the possibility of insight will usually diminish.
More generally, the main intended usage of rcspline is for informal exploratory analysis in which relationships are checked for linearity or nonlinearity and appropriate transformations or link functions are considered. More formal uses would require specification of the stub() option to save the variables created. Some consideration might need to be given to the implications of any data snooping.
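A more formal follow-up might look like the sketch below. It assumes that rcspline is already installed and that stub(rcs) leaves variables named rcs1, rcs2, ... behind, following the naming convention of mkspline; check the variable names with describe before relying on them.

```stata
* Keep the spline variables so that the model can be refitted afterwards
sysuse auto, clear
rcspline mpg weight, stub(rcs)

* Refit the same model explicitly
regress mpg rcs*

* rcs1 is weight itself; the remaining terms carry the nonlinearity,
* so a joint test of them is a rough check on linearity
testparm rcs2 rcs3 rcs4
```

Any such test is subject to the caveat about data snooping if the knots or the model were chosen after looking at the graph.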
mkspline_options are options of mkspline, cubic.
nknots() specifies the number of knots that are to be used for a restricted cubic spline. This number must be between 3 and 7 unless the knot locations are specified using knots(). The default number of knots is 5.
knots() specifies the exact location of the knots to be used for a restricted cubic spline. The values of these knots must be given in increasing order. When this option is omitted, the default knot values are based on Harrell's recommended percentiles with the additional restriction that the smallest knot may not be less than the fifth-smallest value of xvar and the largest knot may not be greater than the fifth-largest value of xvar. If both nknots() and knots() are given, they must specify the same number of knots.
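As an illustration of the two knot options, the calls below first reduce the number of knots and then place five knots by hand. The knot positions are arbitrary values within the range of weight in the auto dataset, chosen only for the example.

```stata
sysuse auto, clear

* Fewer knots give a smoother, stiffer curve
rcspline mpg weight, nknots(3)

* Explicit knot locations, in increasing order, shown on the graph
rcspline mpg weight, knots(2000 2500 3000 3500 4000) showknots
```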
stub(stub) specifies that the variables containing the spline be saved in variables with prefix stub. This option is essential if rcspline is to be followed by regress.
regressopts() contains options of regress. It is difficult to know why you would want to specify any.
generate(newvar) specifies that smoothed values be saved in a new variable newvar.
scatter() specifies options allowed by the scatter command. These should be specified to control the rendering of the data points.
mspline_options are any of the options allowed with twoway mspline. These should be specified to control the rendering of the smooth or the overall graph.
showknots specifies that the positions of the knots be shown on the graph by vertical lines.
ci[(rarea_options)] specifies that confidence intervals based on the standard error of the linear prediction be shown. ci may be specified with options of twoway rarea to tune the display of the confidence interval.
level() specifies a confidence level to use for confidence intervals. See help on level.
addplot(plot) provides a way to add other plots to the generated graph. See help on addplot_option.
. sysuse auto
. rcspline mpg weight
. rcspline mpg weight, scatter(ms(oh))
. rcspline mpg weight, generate(Smpg)
. rcspline mpg weight, ci(color(ltblue)) clw(medthick)
. rcspline mpg weight, addplot(lowess mpg weight)
r(N_knots)    number of knots (scalar)
r(knots)      knot positions (matrix)
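The saved results can be inspected straight after a call, as in this short sketch. It assumes rcspline is installed; the matrix name K is an arbitrary choice.

```stata
sysuse auto, clear
rcspline mpg weight

* List everything left in r()
return list

* Copy the knot positions before another command overwrites r()
matrix K = r(knots)
matrix list K
```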
Nicholas J. Cox, Durham University
email@example.com
A question from José Maria Pacheco de Souza on Statalist led to the addition of saved r-class results as above.
Online: lowess, lpoly, mvrs (if installed)