help for ^sparl^

Scatter plot and regression line --------------------------------

^sparl^ yvar xvar [weight] [^if^ exp] [^in^ range] [ ^, logy logx pow^er ^q^uad ^pgen(^prevar^) pv^alue ^corr^ ^yn^ame^(^string^) xn^ame^(^string^) afmt(^format^) bfmt(^format^)^ ^cfmt(^format^) pfmt(^format^) rfmt(^format^) ln ci level(^#^) rvl^ ^means r^ound^(^#^)^ graph_options ]

Description -----------

^sparl^ produces a scatter plot and regression line for yvar predicted from xvar. The data are yvar and xvar and the regression equation is by default ypred = a + b xvar.

The options ^logx^, ^logy^, ^power^ and ^quad^ allow the use of logarithmic transforms and the fitting of quadratics.

The scatter plot is basically ^graph^ yvar ypred ypred xvar. The extra ypred is redundant for many purposes, but makes it easier to get a scatter plot that emphasises the split into linear prediction and vertical residual, for example by specifying the options ^c(||l) sy(iii)^. ^rvl^ is a quick synonym for these particular choices.

Internally, ^sparl^ uses ^regress^, so it may be followed immediately by those commands that may follow ^regress^. ^regress^ itself gives a replay of the detailed regression results.

Options -------

- options for logarithmic transforms and fitting quadratics ---------------------------------------------------------

^logy^ means that the y variable will be logged before regression, by itself implying that the model equation is

log y = a + b x.

^logx^ means that the x variable will be logged before regression, by itself implying that the model equation is

y = a + b log x.

^power^ and ^logy logx^ are equivalent, so implying that the model equation is

log y = a + b log x.

^quad^ means that a quadratic in the x variable is fitted, by itself implying that the model equation is

2 y = a + bx + cx .

^quad^ may be combined with ^logy^ or ^logx^ or both.

Logarithms are natural logarithms, to base e = 2.71828 to 5 d.p.

If either ^logy^ or ^logx^ is used, then the ^ylog^ and ^xlog^ options of ^graph^ may be used to linearise the regression line. This has no effect on numerical results which refer to transformed values.

- options for predicted values ----------------------------

^pgen(^prevar^)^ places predicted (fitted) values in a new variable prevar. This variable is produced by ^predict^, which respects any restrictions imposed by ^if^ and ^in^. If ^logy^ has been used, the predictions are exponentiated so that they are on the original scale of measurement.

- options for P-value -------------------

^pvalue^ specifies that the model P-value is printed in the ^t2title^. This is the probability under the null hypothesis of getting an F statistic greater than that observed, given model and residual degrees of freedom.

- options for correlation -----------------------

^corr^ specifies that the correlation (before any transformation) is printed in the ^t2title^.

- options controlling equations on the graph ------------------------------------------ ^yname( )^ and ^xname( )^ control the names used for yvar and xvar in the ^t1title^. They default to the variable names. Long names can lead to problems with the ^t1title^, especially if any of ^logy^, ^logx^ or ^quad^ is specified.

^afmt(^format^)^, ^bfmt(^format^)^, ^cfmt(^format^)^, ^pfmt(^format^)^ and ^rfmt(^format^)^ control the formats with which numeric results are presented in the ^t1title^ and ^t2title^.

^afmt^ controls the format of a and RMSE, which have the units of y.

^bfmt^ controls the format of b, which has the units of y divided by the units of x.

^cfmt^ controls the format of c, which has the units of y divided by the square of the units of x.

^pfmt^ controls the format of the model P-value, presented if ^pvalue^ is specified.

^rfmt^ controls the format of the Pearson correlation r and of its square, the coefficient of determination.

The default value of all is ^%4.3f^. For very small or very large numbers, consider using an e format, such as ^%10.3e^.

^ln^ means that equations including logarithms are written using the abbreviation ^ln^, rather than ^log^.

- other options controlling the graph -----------------------------------

^ci^ specifies that confidence intervals are to be added. These are confidence intervals for the mean based on the standard error of prediction. The confidence level is ^$S_level^, which may be overridden by use of the ^level^ option. If ^logy^ has been used, the limits are exponentiated so that they are on the original scale of measurement.

^level(^#^)^ specifies the confidence level, in percent, for confidence intervals; see help @level@.

^rvl^ specifies that residuals are to be shown as vertical lines. More precisely, it is a synonym for ^c(||sss) sy(iiiii)^. Simultaneous calls to ^connect^ and ^symbol^ are not treated as errors but are ignored.

^means^ specifies that the mean of yvar is calculated for each rounded value of xvar. This mean is then plotted for each value of xvar. This option does not affect the regression, merely the graphical display.

^round(^#^)^ means that xvar is to be rounded to the nearest # before calculating a group mean for values that round to the same value. The default is 1. ^round(^#^)^ without ^means^ is not an error, but is ignored.

graph_options are options allowed with ^graph, twoway^. The default values include

^xla yla c(..l) sy(Oii) sort gap(6)^

^t1title^ gives the regression equation

^t2title^ gives (if ^corr^ option specified) the correlation (before any transformation)

the coefficient of determination and the root mean square error (both after any transformation)

and the number of observations

Examples --------

. ^sparl length width^ . ^sparl length width, rvl^ . ^sparl length width, power^ . ^sparl length width, sy([name]ii)^ . ^sparl length width, yn(Length (m)) xn(Width (m))^ . ^sparl length width, yn(Length) xn("Width (units m)")^

Author ------

Nicholas J. Cox, University of Durham, U.K. n.j.cox@@durham.ac.uk

Also see --------

Manual: ^[U] 19.5.1 Numeric formats^ ^[R] regress^ On-line: help for @graph@, @regress@, @predict@, @estimates@, @format@