```
.-
help for ^sparl^
.-

Scatter plot and regression line
--------------------------------

^sparl^ yvar xvar [weight] [^if^ exp] [^in^ range]
[ ^, logy logx pow^er ^q^uad ^pgen(^prevar^) pv^alue ^corr^
^yn^ame^(^string^) xn^ame^(^string^) afmt(^format^) bfmt(^format^)^
^cfmt(^format^) pfmt(^format^) rfmt(^format^) ln ci level(^#^) rvl^
^means r^ound^(^#^)^ graph_options ]

Description
-----------

^sparl^ produces a scatter plot and regression line for yvar predicted
from xvar. The data are yvar and xvar and the regression equation is by
default ypred = a + b xvar.

The options ^logx^, ^logy^, ^power^ and ^quad^ allow the use of
logarithmic transforms and the fitting of quadratics.

The scatter plot is basically ^graph^ yvar ypred ypred xvar. The extra
ypred is redundant for many purposes, but makes it easier to get a
scatter plot that emphasises the split into linear prediction and
vertical residual, for example by specifying the options ^c(||l) sy(iii)^.
^rvl^ is a quick synonym for these particular choices.

Internally, ^sparl^ uses ^regress^, so it may be followed immediately by
those commands that may follow ^regress^. ^regress^ itself gives a replay
of the detailed regression results.
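
For example, a minimal sketch of such a sequence, assuming variables named
length and width as in the Examples below (res is a hypothetical name for
a new variable):

. ^sparl length width^
. ^regress^
. ^predict res, resid^

Here ^regress^ replays the results and ^predict^ stores the residuals.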

Options
-------

- options for logarithmic transforms and fitting quadratics
---------------------------------------------------------

^logy^ means that the y variable will be logged before regression, by
itself implying that the model equation is

log y = a + b x.

^logx^ means that the x variable will be logged before regression, by
itself implying that the model equation is

y = a + b log x.

^power^ is equivalent to ^logy logx^, implying that the model
equation is

log y = a + b log x.

^quad^ means that a quadratic in the x variable is fitted, by itself
implying that the model equation is

               2
y = a + bx + cx .

^quad^ may be combined with ^logy^ or ^logx^ or both.

Logarithms are natural logarithms, to base e = 2.71828 to 5 d.p.

If either ^logy^ or ^logx^ is used, then the ^ylog^ and ^xlog^ options
of ^graph^ may be used to linearise the regression line. This has no
effect on the numerical results, which refer to transformed values.
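
As an illustrative sketch (the variable names are assumptions), a power
function may be fitted and shown as a straight line on logarithmic axes by

. ^sparl length width, power ylog xlog^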

- options for predicted values
----------------------------

^pgen(^prevar^)^ places predicted (fitted) values in a new variable
prevar. This variable is produced by ^predict^, which respects any
restrictions imposed by ^if^ and ^in^. If ^logy^ has been used, the
predictions are exponentiated so that they are on the original scale
of measurement.
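
A minimal sketch, assuming variables length and width and a hypothetical
new variable name lhat:

. ^sparl length width, logy pgen(lhat)^
. ^list length width lhat^

Here lhat holds predictions of length itself, not of log length.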

- options for P-value
-------------------

^pvalue^ specifies that the model P-value is printed in the ^t2title^.
This is the probability under the null hypothesis of getting an F
statistic greater than that observed, given model and residual
degrees of freedom.

- options for correlation
-----------------------

^corr^ specifies that the correlation (before any transformation) is
printed in the ^t2title^.

- options controlling equations on the graph
------------------------------------------

^yname( )^ and ^xname( )^ control the names used for yvar and xvar in the
^t1title^. They default to the variable names. Long names can lead
to problems with the ^t1title^, especially if any of ^logy^, ^logx^,
^power^ or ^quad^ is specified.

^afmt(^format^)^, ^bfmt(^format^)^, ^cfmt(^format^)^, ^pfmt(^format^)^
and ^rfmt(^format^)^ control the formats with which numeric results
are presented in the ^t1title^ and ^t2title^.

^afmt^ controls the format of a and RMSE, which have the units of y.

^bfmt^ controls the format of b, which has the units of y divided by
the units of x.

^cfmt^ controls the format of c, which has the units of y divided by
the square of the units of x.

^pfmt^ controls the format of the model P-value, presented if
^pvalue^ is specified.

^rfmt^ controls the format of the Pearson correlation r and of its
square, the coefficient of determination.

The default for all five formats is ^%4.3f^. For very small or very large
numbers, consider using an e format, such as ^%10.3e^.
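
For example (a sketch; the variable names are assumptions), to show a very
small P-value in e format and the correlation to two decimal places:

. ^sparl length width, pvalue corr pfmt(%10.3e) rfmt(%4.2f)^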

^ln^ means that equations including logarithms are written using the
abbreviation ^ln^, rather than ^log^.

- other options controlling the graph
-----------------------------------

^ci^ specifies that confidence intervals are to be added. These are
confidence intervals for the mean based on the standard error of
prediction. The confidence level is ^$S_level^, which may be overridden
by use of the ^level^ option. If ^logy^ has been used, the limits are
exponentiated so that they are on the original scale of measurement.

^level(^#^)^ specifies the confidence level, in percent, for confidence
intervals; see help @level@.
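
For example (the variable names are assumptions), 90% confidence intervals
for the mean may be requested by

. ^sparl length width, ci level(90)^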

^rvl^ specifies that residuals are to be shown as vertical lines. More
precisely, it is a synonym for ^c(||sss) sy(iiiii)^. Simultaneous calls
to ^connect^ and ^symbol^ are not treated as errors but are ignored.

^means^ specifies that the mean of yvar is calculated for each rounded
value of xvar. This mean is then plotted for each value of xvar.
This option does not affect the regression, merely the graphical
display.

^round(^#^)^ means that xvar is to be rounded to the nearest # before
calculating a group mean for values that round to the same value.
The default is 1. ^round(^#^)^ without ^means^ is not an error, but
is ignored.
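
For example, a sketch assuming that width is recorded finely enough for
grouping in steps of 5 units to be sensible:

. ^sparl length width, means round(5)^

Means of length are then plotted for widths rounded to the nearest 5,
while the regression still uses the unrounded data.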

graph_options are options allowed with ^graph, twoway^. The default
values include

^xla yla c(..l) sy(Oii) sort gap(6)^

^t1title^ gives   the regression equation

^t2title^ gives   (if ^corr^ option specified) the correlation
                (before any transformation)

                the coefficient of determination and
                the root mean square error (both after any
                transformation)

                and the number of observations

Examples
--------

. ^sparl length width^
. ^sparl length width, rvl^
. ^sparl length width, power^
. ^sparl length width, sy([name]ii)^
. ^sparl length width, yn(Length (m)) xn(Width (m))^
. ^sparl length width, yn(Length) xn("Width    (units m)")^

Author
------

Nicholas J. Cox, University of Durham, U.K.
n.j.cox@@durham.ac.uk

Also see
--------

Manual: ^[U] 19.5.1 Numeric formats^
^[R] regress^
On-line: help for @graph@, @regress@, @predict@, @estimates@, @format@

```