Lowess smoothing with multiple predictors
mlowess yvar xvarlist [if] [in] [, combine(combine_options) cycles(#) draw(numlist) generate(stub) nograph log lowess(lowess_options) omit(numlist) predict(newvar) nopts replace scatter(scatter_options) line_options]
Description
mlowess computes lowess smooths of yvar on all predictors in xvarlist simultaneously; that is, each smooth is adjusted for the others. Fitted values may be saved in new variables with names beginning with stub, as specified in the generate() option.
By default, for each xvar in xvarlist, adjusted values of yvar and the lowess smooth for xvar are plotted against xvar. See Remarks for more details.
If you have just one predictor, use lowess directly.
Options
combine(combine_options) specifies any of the options allowed by the graph combine command. Useful examples are combine(ycommon) and combine(saving(graphname)).
cycles(#) sets the number of cycles. The default is cycles(3).
draw(numlist) specifies that smooths for a subset of the variables in xvarlist be plotted. The elements of numlist are indexes determined by the order of the variables in xvarlist. For example, mlowess y x1 x2 x3, draw(2 3) would plot smooths only for x2 and x3. By default, results for all variables in xvarlist are plotted. draw() takes precedence over omit() in the sense that results for variables included (by index) in numlist are plotted, even if they are excluded by omit(). See also omit().
generate(stub) specifies that fitted values for each member of xvarlist be saved in new variables with names beginning with stub.
nograph suppresses the graph.
log displays the squared correlation coefficient between the overall fitted values and yvar at each cycle for monitoring convergence. This option is provided mainly for pedagogic interest.
lowess(lowess_options) controls the operation of lowess in generating smooths. The key options are
mean specifies running-mean smoothing; the default is running-line least-squares smoothing.
noweight prevents the use of Cleveland's tricube weighting function; the default is to use the weighting function.
bwidth(#) specifies the bandwidth. Centred subsets of bwidth() * n observations are used for calculating smoothed values for each point in the data except for end points, where smaller, uncentred subsets are used. The greater the bwidth(), the greater the smoothing. The default is 0.8.
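To make these choices concrete, here is a minimal Python sketch of a single smooth S[y|x]. It is not Stata's own lowess code. The function names tricube and lowess_smooth are hypothetical. Two details are assumptions: the rounding of the window half-width, and the 1.0001 scaling of the largest distance within each subset. Results will therefore not match Stata digit for digit. Here mean=True plays the role of mean, and weight=False plays the role of noweight.

```python
def tricube(u):
    """Cleveland's tricube weight for a scaled distance 0 <= u."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_smooth(x, y, bwidth=0.8, mean=False, weight=True):
    """Return smoothed values of y at each x, in the original order of the data."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    # Half-width of a centred subset of about bwidth * n points (rounding is assumed).
    half = max(0, int((bwidth * n - 1) / 2))
    smooth = [0.0] * n
    for pos in range(n):
        # Clipping at the ends gives smaller, uncentred subsets there.
        lo, hi = max(0, pos - half), min(n - 1, pos + half)
        wx, wy = xs[lo:hi + 1], ys[lo:hi + 1]
        x0 = xs[pos]
        # Assumed scaling so that the farthest point still gets a small positive weight.
        delta = 1.0001 * max(abs(v - x0) for v in wx)
        if weight and delta > 0:
            w = [tricube(abs(v - x0) / delta) for v in wx]
        else:
            w = [1.0] * len(wx)
        sw = sum(w)
        xbar = sum(wi * v for wi, v in zip(w, wx)) / sw
        ybar = sum(wi * v for wi, v in zip(w, wy)) / sw
        sxx = sum(wi * (v - xbar) ** 2 for wi, v in zip(w, wx))
        if mean or sxx == 0:
            # Running (weighted) mean; also the fallback when no line can be fitted.
            fit = ybar
        else:
            # Running-line weighted least squares, evaluated at x0.
            sxy = sum(wi * (u - xbar) * (v - ybar) for wi, u, v in zip(w, wx, wy))
            fit = ybar + (sxy / sxx) * (x0 - xbar)
        smooth[order[pos]] = fit
    return smooth
```

A running-line smooth of points that lie exactly on a straight line reproduces that line. This gives a quick sanity check. Larger bwidth values average over more points and so smooth more.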
Note that each choice applies to all predictors. There is no provision for treating predictors differently.
omit(numlist) specifies that smooths for a subset of the variables in xvarlist not be plotted. The elements of numlist are indexes determined by the order of the variables in xvarlist. For example, mlowess y x1 x2 x3, omit(3) would plot smooths only for x1 and x2. By default, no variables are omitted. draw() takes precedence over omit(). See also draw().
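The interplay of draw() and omit() can be written as a small index rule. The Python function plotted_indexes below is hypothetical and rests on two assumptions. First, when draw() is given, exactly its indexes are plotted. Second, indexes outside 1,...,p raise an error. The help text specifies neither behaviour.

```python
def plotted_indexes(p, draw=None, omit=None):
    """Return the 1-based indexes of predictors whose smooths are plotted."""
    # Hypothetical validation: reject indexes outside 1..p.
    for i in list(draw or []) + list(omit or []):
        if not 1 <= i <= p:
            raise ValueError(f"index {i} is outside 1..{p}")
    if draw:
        # Assumed reading of precedence: draw() alone fixes the set, so omit() is ignored.
        return sorted(set(draw))
    omitted = set(omit or [])
    return [i for i in range(1, p + 1) if i not in omitted]
```

For mlowess y x1 x2 x3, this gives [2, 3] for draw(2 3) and [1, 2] for omit(3).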
predict(newvar) specifies that the predicted values be saved in new variable newvar.
nopts suppresses the points in the plots. Only the lines representing the smooths are drawn.
replace allows variables specified by any of the generate() and predict() options to be replaced if they already exist.
scatter(scatter_options) specifies any of the options allowed by the scatter command. These should be specified to control the rendering of the data points. The default includes msymbol(oh), or msymbol(p) with over 299 observations.
line_options are any of the options allowed with line. These should be specified to control the rendering of the smoothed lines or the overall graph.
Remarks
The approach of mlowess is based on methodology for generalised additive models (Hastie and Tibshirani 1990). mlowess is primarily intended for exploratory graphics, rather than model fitting with inferential apparatus.
An R-square (squared correlation coefficient) is provided as a goodness-of-fit indicator. However, this R-square can typically be increased just by smoothing less, which is often unhelpful. As the predictions come closer to interpolating the data, R-square approaches 1, but scientific usefulness and the possibility of insight usually diminish.
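The point about R-square can be checked numerically. This Python sketch is an illustration, not mlowess code. It uses an unweighted running mean, similar in spirit to lowess(mean noweight), through the hypothetical helpers running_mean and r_square, applied to made-up zigzag data. Once bwidth is small enough that each subset holds a single point, the smooth simply reproduces y and R-square reaches 1.

```python
def running_mean(x, y, bwidth):
    """Unweighted running mean over centred subsets of about bwidth * n points."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ys = [y[i] for i in order]
    half = max(0, int((bwidth * n - 1) / 2))  # assumed rounding of the subset size
    fit = [0.0] * n
    for pos in range(n):
        window = ys[max(0, pos - half):min(n, pos + half + 1)]
        fit[order[pos]] = sum(window) / len(window)
    return fit


def r_square(fit, y):
    """Squared Pearson correlation between fitted values and y."""
    n = len(y)
    mf, my = sum(fit) / n, sum(y) / n
    sfy = sum((f - mf) * (v - my) for f, v in zip(fit, y))
    sff = sum((f - mf) ** 2 for f in fit)
    syy = sum((v - my) ** 2 for v in y)
    return sfy * sfy / (sff * syy)


# Illustrative data: a linear trend with an alternating +/-3 zigzag.
x = [float(i) for i in range(20)]
y = [v + (3.0 if i % 2 else -3.0) for i, v in enumerate(x)]
for bw in (0.8, 0.3, 0.1, 0.01):
    print(bw, round(r_square(running_mean(x, y, bw), y), 3))
```

The zigzag is noise that a useful smooth should ignore. Chasing it raises R-square but reveals nothing new.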
Suppose that there are p >= 1 predictors. mlowess estimates the smooths f_1,...,f_p by using a backfitting algorithm and a lowess smoother S[y|x_j] for each predictor, as follows:
1. Initialize: alpha = mean(yvar), f_1,...,f_p estimated by multiple linear regression.
2. Cycle: j = 1,...,p, 1,...,p, ...
f_j = S[y - alpha - sum_{i != j} f_i|x_j]
3. Continue for cycles() rounds.
No convergence criterion is applied. In practice, three cycles are usually more than sufficient to get results adequate for exploratory work.
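The backfitting steps above can be sketched in Python. This is a minimal illustration, not the mlowess implementation. To stay short, it substitutes an unweighted running-line smoother for lowess as the smoother S. The helpers solve, running_line, r_square and backfit are all hypothetical. The sketch works as follows:

* Initial smooths come from multiple linear regression on centred predictors.
* After each update, each smooth is recentred to mean zero, which is standard backfitting practice.
* R-square is recorded after each cycle, mirroring what the log option displays.

```python
def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (aug[r][m] - sum(aug[r][k] * z[k] for k in range(r + 1, m))) / aug[r][r]
    return z


def running_line(x, y, bwidth=0.8):
    """Unweighted running-line smooth over clipped centred windows (a stand-in for lowess)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    half = max(1, int((bwidth * n - 1) / 2))  # assumed rounding of the window size
    out = [0.0] * n
    for pos in range(n):
        idx = order[max(0, pos - half):min(n, pos + half + 1)]
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        sxx = sum((x[i] - mx) ** 2 for i in idx)
        sxy = sum((x[i] - mx) * (y[i] - my) for i in idx)
        slope = sxy / sxx if sxx > 0 else 0.0
        out[order[pos]] = my + slope * (x[order[pos]] - mx)
    return out


def r_square(fit, y):
    """Squared correlation between fitted values and y."""
    n = len(y)
    mf, my = sum(fit) / n, sum(y) / n
    sfy = sum((f - mf) * (v - my) for f, v in zip(fit, y))
    sff = sum((f - mf) ** 2 for f in fit)
    syy = sum((v - my) ** 2 for v in y)
    return sfy * sfy / (sff * syy)


def backfit(y, xs, smoother=running_line, cycles=3):
    """Return alpha, mean-zero smooths f_1..f_p, and R-square after each cycle."""
    n, p = len(y), len(xs)
    alpha = sum(y) / n
    # 1. Initialize: f_j = b_j * (x_j - mean(x_j)) from multiple linear regression.
    means = [sum(col) / n for col in xs]
    xc = [[v - m for v in col] for col, m in zip(xs, means)]
    yc = [v - alpha for v in y]
    xtx = [[sum(a * b for a, b in zip(xc[j], xc[k])) for k in range(p)] for j in range(p)]
    xty = [sum(a * b for a, b in zip(xc[j], yc)) for j in range(p)]
    beta = solve(xtx, xty)
    f = [[beta[j] * v for v in xc[j]] for j in range(p)]
    r2_log = []
    # 2-3. Cycle j = 1..p, 1..p, ... for a fixed number of cycles (no convergence test).
    for _ in range(cycles):
        for j in range(p):
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            s = smoother(xs[j], partial)
            m = sum(s) / n
            f[j] = [v - m for v in s]  # recentre so each smooth has mean zero
        fitted = [alpha + sum(f[k][i] for k in range(p)) for i in range(n)]
        r2_log.append(r_square(fitted, y))
    return alpha, f, r2_log


x1 = [float(i) for i in range(12)]
x2 = [float((5 * i) % 12) for i in range(12)]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
alpha, f, r2_log = backfit(y, [x1, x2])
```

With an exactly linear additive y, the regression start is already exact. A running-line smoother then leaves it unchanged, so the fit stays exact. With curved relationships, the smooths move away from straight lines over the cycles.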
The smooths are adjusted so that the mean of each equals the mean of yvar.
The points in the plots produced by mlowess depict y - sum_{i != j} f_i plotted against x_j, i.e., the partial residuals for x_j plus alpha.
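The relationship between the plotted points and the displayed smooths can be sketched as follows. The sketch assumes the smooths f_j are held internally with mean zero, as in standard backfitting. Adding alpha = mean(yvar) to f_j then gives a smooth whose mean equals the mean of yvar. The points y - sum_{i != j} f_i scatter around that smooth, and the vertical gaps are the overall residuals. The function plot_data and the toy numbers are hypothetical.

```python
def plot_data(y, f, j, alpha):
    """Return (points, smooth) for predictor j (0-based), given mean-zero smooths f."""
    n, p = len(y), len(f)
    # Points: y minus the other smooths, i.e. partial residuals for x_j plus alpha.
    points = [y[i] - sum(f[k][i] for k in range(p) if k != j) for i in range(n)]
    # Displayed smooth: shifted by alpha so its mean equals the mean of y.
    smooth = [alpha + v for v in f[j]]
    return points, smooth


y = [1.0, 2.0, 3.0, 4.0]
f = [[-1.0, 0.0, 0.0, 1.0], [0.5, -0.5, 0.5, -0.5]]
points, smooth = plot_data(y, f, 0, sum(y) / len(y))
```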
Examples
. mlowess mpg weight displ length
. mlowess mpg weight displ length, lowess(mean)
. mlowess mpg weight displ length, generate(S) nograph
. mlowess mpg weight displ length, omit(2) combine(saving(graph1))
For comparison, bivariate smooths may be produced like this:
. foreach v in weight displ length {
.     lowess mpg `v', saving(lwss_`v')
. }
. graph combine "lwss_weight" "lwss_displ" "lwss_length"
Author
Nicholas J. Cox
Durham University
n.j.cox@durham.ac.uk
Acknowledgements
The main features of the implementation here depend on the work of Patrick Royston, as reported by Royston and Cox (2005).
References
Hastie, T. and Tibshirani, R. 1990. Generalized additive models. London: Chapman and Hall.
Royston, P. and Cox, N.J. 2005. A multivariable scatterplot smoother. Stata Journal 5(3): 405-412.
Also see
Online: lowess