Title
pspline -- Penalized spline scatterplot smoother based on xtmixed
Syntax
pspline yvar xvar [varlist] [if] [in] [, options ]
options                    Description
-------------------------------------------------------------------------
Main
  degree(#)                degree of spline; default is 1
  nknots(#)                number of knots; default is min(int(U/4), 35),
                             where U is the number of distinct values of xvar
  knots(numlist)           exact location of knots
  alpha(#)                 significance level for pilot goodness-of-fit test
  force                    force penalized spline estimation
  at(varname [if] [in])    obtain the smooth at the values of varname
  generate(newvar)         store smoothed fit in newvar
  replace                  overwrite existing variable
  nopenalty                do not apply a roughness penalty; treat spline
                             coefficients as fixed effects
  discrete                 treat xvar as a factor variable
  estopts(options)         estimation options as documented in help xtmixed
  noisily                  display estimation output
  nograph                  suppress graph
  noscatter                suppress scatterplot only
  noknotpos                suppress ticks indicating knot positions

Scatterplot
  marker_options           change look of markers (color, size, etc.)
  marker_label_options     add marker labels; change look or position

Smoothed line
  lineopts(cline_options)  affect rendition of the smoothed line

Add plots
  addplot(plot)            add other plots to the generated graph

Y axis, X axis, Titles, Legend, Overall
  twoway_options           any options other than by() documented in
                             [G] twoway_options
-------------------------------------------------------------------------
Description
pspline uses xtmixed to fit a penalized spline regression of yvar on xvar, as discussed in Ruppert et al. (2003), and plots the resulting smooth. By default, the knots of the spline are positioned at equally spaced quantiles of the distinct values of xvar.
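The default knot placement can be sketched in Python as follows. This is an illustration under stated assumptions, not pspline's code: it assumes that the k-th of K knots is the k/(K+1) quantile of the distinct values of xvar, computed by linear interpolation between order statistics. pspline's exact quantile rule may differ. The helper names default_nknots and quantile_knots are hypothetical.

```python
def default_nknots(x):
    """Default number of knots from the help text: min(int(U/4), 35)."""
    u = len(set(x))  # U = number of distinct values of xvar
    return min(u // 4, 35)


def _quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)  # floor, since h >= 0
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_knots(x, nknots=None):
    """Place knots at equally spaced quantiles of the distinct values of x.

    Assumption: knot k (k = 1..K) is the k/(K+1) quantile of the distinct values.
    """
    distinct = sorted(set(x))  # ties are collapsed before computing quantiles
    if nknots is None:
        nknots = default_nknots(x)
    return [_quantile(distinct, k / (nknots + 1)) for k in range(1, nknots + 1)]


# Usage: four knots for xvar values 0..10 (the repeated 1 does not matter)
print(quantile_knots([0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4))
```

Using only distinct values means that heavily tied values of xvar do not pull several knots onto the same point.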
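pspline is an automatic smoother in that the amount of smoothing is determined from the data: the spline coefficients are treated as random effects, and their variance, which governs the roughness penalty, is estimated by (restricted) maximum likelihood.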
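Following Ruppert et al. (2003), a penalized spline of degree p with knots kappa_1, ..., kappa_K can be written as a linear mixed model. The block below sketches that representation. It assumes a truncated power basis, with (x - kappa)_+^0 read as the indicator of x > kappa when p = 0; the basis pspline actually passes to xtmixed is not stated in this help file.

```latex
y_i = \beta_0 + \beta_1 x_i + \cdots + \beta_p x_i^p
      + \sum_{k=1}^{K} u_k \, (x_i - \kappa_k)_+^p + \varepsilon_i,
\qquad u_k \sim N(0, \sigma_u^2), \quad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)

% For given variance components, the BLUP of (\beta, u) minimizes
\sum_{i=1}^{n} \Bigl\{ y_i - \beta_0 - \cdots - \beta_p x_i^p
      - \sum_{k=1}^{K} u_k \, (x_i - \kappa_k)_+^p \Bigr\}^2
      + \lambda \sum_{k=1}^{K} u_k^2,
\qquad \lambda = \sigma_\varepsilon^2 / \sigma_u^2
```

Estimating sigma_u^2 and sigma_epsilon^2 by (restricted) maximum likelihood therefore fixes the smoothing parameter lambda. A small sigma_u^2 means a large penalty and a smoother curve.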
Specify varlist to adjust for additional covariates. The scatterplot then displays partial residuals instead of the observed values of yvar.
Estimation can run into convergence problems when the data deviate only slightly from a simple parametric model (e.g., a linear model if degree(1), a quadratic model if degree(2)). To avoid this, pspline first performs a pilot goodness-of-fit (GOF) test of the parametric model. The GOF test is implemented as a Wald test of the spline terms in a non-penalized model (see the nopenalty option). A low p-value indicates strong evidence against the parametric model. pspline fits the penalized spline model only if the p-value is smaller than 0.3 (or the value set by alpha()); otherwise, it fits the parametric model. Specify force to skip the test and fit the penalized spline model regardless.
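The decision rule of the pilot test can be sketched in Python. This is an illustration, not pspline's implementation. It takes the Wald statistic and its degrees of freedom as given (the quantities pspline reports as r(gof_chi2) and r(gof_df)), computes the chi-squared p-value, and then applies the documented rule. The function names chi2_sf and choose_model are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from df = 1 or df = 2, where closed forms exist.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def choose_model(gof_chi2, gof_df, alpha=0.3, force=False):
    """Documented rule: penalized spline only if p < alpha, unless force is set."""
    gof_p = chi2_sf(gof_chi2, gof_df)
    if force or gof_p < alpha:
        return "penalized", gof_p
    return "parametric", gof_p


# Usage: a large statistic rejects the parametric model, a small one keeps it
print(choose_model(10.0, 2))
print(choose_model(1.0, 2))
```

The default alpha of 0.3 is deliberately lenient. Even moderate evidence of curvature is enough to switch to the penalized spline.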
Options
    +------+
----+ Main +-------------------------------------------------------------
degree(#) specifies the degree of the spline to be used in the smoothing. The default is degree(1) (linear splines), resulting in a piecewise linear smooth that is continuous but has kinks at the knots. Use degree(2) (quadratic splines) for a smooth curve, i.e., a smooth with a continuous first derivative. degree(0) results in a step function.
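The effect of degree() can be illustrated by building one row of a spline design matrix in Python. The sketch assumes the truncated power basis used by Ruppert et al. (2003): the polynomial terms 1, x, ..., x^p, plus one term (x - kappa_k)_+^p per knot, where (x - kappa)_+^0 is taken as the indicator of x > kappa. pspline may use a different but equivalent basis internally. truncated_power_row is a hypothetical name.

```python
def truncated_power_row(x, knots, degree):
    """One row of a truncated power spline basis of the given degree (assumed basis)."""
    # Unpenalized polynomial part: 1, x, ..., x^degree
    poly = [float(x) ** j for j in range(degree + 1)]
    if degree == 0:
        # (x - kappa)_+^0 taken as a step: 1 to the right of the knot, else 0
        spline = [1.0 if x > k else 0.0 for k in knots]
    else:
        spline = [max(x - k, 0.0) ** degree for k in knots]
    return poly + spline


# Usage: the same x and knots under degree 0, 1 and 2
for p in (0, 1, 2):
    print(p, truncated_power_row(3.0, [1.0, 2.0, 5.0], p))
```

Each knot term adds a jump at its knot when the degree is 0, a kink when it is 1, and a change in curvature when it is 2. With degree 2, the fit and its first derivative therefore stay continuous at every knot. With an empty knot list, the row reduces to the polynomial terms.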
nknots(#) specifies the number of knots of the spline. The default is min(int(U/4), 35) where U is the number of distinct values of xvar. nknots(0) is allowed and causes a parametric model without splines to be fitted. This is equivalent to fitting a polynomial model using regress (i.e. a linear model if degree=1, a quadratic model if degree=2, etc.).
knots(numlist) specifies the exact locations of knots of the spline. The default is to position the knots at equally spaced quantiles of the distinct values of xvar. nknots() is not allowed if knots() is specified.
alpha(#) sets the significance level for the pilot goodness-of-fit test (see description above). The default is alpha(0.3).
force skips the pilot goodness-of-fit test and enforces estimation of the penalized spline model.
at(varname [if] [in]) obtains the smoothed fit at the values of varname. The default is to obtain the fit at the values of xvar. The fit at the values of varname is computed by linear interpolation (or extrapolation) from the fit at the values of xvar.
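The documented at() behavior (linear interpolation or extrapolation from the fit at the observed values of xvar) can be sketched in Python. It assumes that tied values of xvar share one fitted value and that points outside the observed range are extrapolated along the nearest end segment. interp_fit is a hypothetical helper, not part of pspline.

```python
from bisect import bisect_left


def interp_fit(xs, fs, at):
    """Linearly interpolate (or extrapolate) fitted values fs observed at xs to `at`."""
    pts = sorted(dict(zip(xs, fs)).items())  # one fitted value per distinct x
    ux = [p[0] for p in pts]
    uf = [p[1] for p in pts]
    if len(ux) == 1:
        return uf[0]  # a single distinct x: the fit is constant
    # Segment [ux[i-1], ux[i]]; clamping i uses the end segments for extrapolation
    i = min(max(bisect_left(ux, at), 1), len(ux) - 1)
    x0, x1 = ux[i - 1], ux[i]
    f0, f1 = uf[i - 1], uf[i]
    return f0 + (at - x0) * (f1 - f0) / (x1 - x0)


# Usage: inside the range, and to the right of the last observed value
print(interp_fit([1, 2, 3], [1.0, 4.0, 9.0], 1.5))
print(interp_fit([1, 2, 3], [1.0, 4.0, 9.0], 4))
```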
generate(newvar) stores the smoothed values in newvar.
replace permits pspline to overwrite an existing variable.
nopenalty fits a non-penalized spline smooth. This is accomplished by treating the spline coefficients as fixed instead of random in xtmixed and is equivalent to fitting a spline model using regress.
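The relation between the penalized fit and nopenalty can be illustrated as penalized least squares with a fixed penalty lambda on the spline coefficients only. With lambda = 0 this is the ordinary least-squares spline fit (the nopenalty case, equivalent to regress). A very large lambda shrinks the spline coefficients toward zero and leaves the polynomial fit. This is a simplified Python sketch: pspline does not fix lambda but derives it from the variance components estimated by xtmixed. The names solve and penalized_ls are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A, b not modified)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def penalized_ls(C, y, n_fixed, lam):
    """Minimize ||y - C theta||^2 + lam * sum of theta[j]^2 for j >= n_fixed.

    The first n_fixed columns (polynomial terms) are unpenalized; the remaining
    columns (spline terms) get a ridge penalty. lam = 0 is plain least squares.
    """
    p = len(C[0])
    # Normal equations: (C'C + lam * D) theta = C'y, D = diag(0, ..., 0, 1, ..., 1)
    A = [[sum(row[i] * row[j] for row in C) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(C, y)) for i in range(p)]
    for j in range(n_fixed, p):
        A[j][j] += lam
    return solve(A, b)


# Usage: linear spline with knots at 3 and 6, fitted to y = x^2 on x = 0..9
xs = range(10)
C = [[1.0, x, max(x - 3.0, 0.0), max(x - 6.0, 0.0)] for x in xs]
y = [float(x * x) for x in xs]
print(penalized_ls(C, y, 2, 0.0))  # unpenalized, as with nopenalty
print(penalized_ls(C, y, 2, 1e8))  # heavy penalty: spline terms near zero
```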
discrete causes xvar to be treated as a factor variable and fits a model containing a random effect for the levels of xvar instead of a spline. nknots(), knots(), and at() are not allowed if discrete is specified.
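With discrete, the random effect for each level of xvar shrinks that level's mean toward the overall mean. The Python sketch below shows the standard BLUP for a one-way random-effects model under simplifying assumptions: the overall mean mu and the variances s2u (between levels) and s2e (residual) are taken as known, whereas xtmixed estimates them from the data. discrete_blup is a hypothetical name.

```python
from collections import defaultdict


def discrete_blup(levels, y, mu, s2u, s2e):
    """Predicted value mu + u_j for each level j of a one-way random-effects model."""
    groups = defaultdict(list)
    for lev, yi in zip(levels, y):
        groups[lev].append(yi)
    fitted = {}
    for lev, ys in groups.items():
        n = len(ys)
        mean = sum(ys) / n
        # Shrinkage factor: near 1 for well-populated levels, smaller for sparse ones
        shrink = s2u / (s2u + s2e / n)
        fitted[lev] = mu + shrink * (mean - mu)
    return fitted


# Usage: level "b" has one observation and is shrunk more strongly toward mu
print(discrete_blup(["a", "a", "b"], [1.0, 3.0, 10.0], mu=5.0, s2u=1.0, s2e=1.0))
```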
estopts(options) specifies options to be passed through to xtmixed.
noisily causes output from xtmixed to be displayed.
nograph suppresses drawing the graph of the estimated smooth.
noscatter suppresses graphing a scatterplot of the observed data or partial residuals.
noknotpos suppresses the ticks indicating the knot positions.
    +-------------+
----+ Scatterplot +------------------------------------------------------
marker_options affect the rendition of markers drawn at the plotted points, including their shape, size, color, and outline; see [G] marker_options.
marker_label_options specify if and how the markers are to be labeled; see [G] marker_label_options.
    +---------------+
----+ Smoothed line +----------------------------------------------------
lineopts(cline_options) affects the rendition of the smoothed line; see [G] cline_options.
    +-----------+
----+ Add plots +--------------------------------------------------------
addplot(plot) provides a way to add other plots to the generated graph; see [G] addplot_option.
    +-----------------------------------------+
----+ Y axis, X axis, Titles, Legend, Overall +--------------------------
twoway_options are any of the options documented in [G] twoway_options, excluding by(). These include options for titling the graph (see [G] title_options) and for saving the graph to disk (see [G] saving_option).
Examples
Example using the auto data:
        . sysuse auto
        . pspline price mpg                             // piecewise linear
        . pspline price mpg, degree(0)                  // step function
        . pspline price mpg, degree(2)                  // continuous first derivative
        . pspline price mpg weight foreign, degree(2)   // covariate adjustment
Graph on titlepage of Ruppert et al. (2003):
        . use http://fmwww.bc.edu/repec/bocode/l/lidar.dta
        . pspline logratio range
The motorcycle data:
        . webuse motorcycle
        . pspline accel time, d(2)
Saved results
pspline returns the results from xtmixed in e() and saves the following in r():
Scalars
  r(degree)       degree of spline
  r(nknots)       number of knots
  r(alpha)        significance level for pilot GOF test
  r(gof_chi2)     chi-squared of pilot GOF test
  r(gof_df)       degrees of freedom of pilot GOF test
  r(gof_p)        p-value of pilot GOF test

Macros
  r(model)        penalized, parametric, or non-penalized
  r(discrete)     discrete or empty

Matrix
  r(knots)        knot positions
References
Ruppert, D., M. P. Wand, and R. J. Carroll. 2003. Semiparametric Regression. Cambridge: Cambridge University Press.
Authors
Ben Jann, ETH Zurich, jannb@ethz.ch
Roberto G. Gutierrez, StataCorp., rgutierrez@stata.com
Thanks for citing this software as follows:
Jann, B., and R. Gutierrez. 2008. pspline: Stata module providing a penalized spline scatterplot smoother based on linear mixed model technology. Available from http://ideas.repec.org/c/boc/bocode/s456972.html.
Also see