Graphing confidence ellipses [An update of ellip for Stata 7]
ellip7 yvar [xvar] [if exp] [in range] [, {means|coefs [pool(#)]} constant(string [#]) level(#) generate(ynewvar xnewvar) add(yoldvar xoldvar) nograph replace evr(#) npoints(#) yformat[(%fmt)] xformat[(%fmt)] graph_options ]
Description
ellip7 graphs confidence ellipses for approximately normally distributed data; it is an update of ellip for Stata 7. A confidence ellipse is the boundary of an elliptical joint 100(1-alpha)% confidence region for two parameters. In ellip7, the centering variables yvar and xvar are either two data variables or the first two independent variables of an immediately preceding regress. If coefs is specified without xvar, then _cons from regress is used for xvar. The boundary constant determines the size of the confidence ellipse.
Options
{means|coefs} specifies how to center the confidence ellipse. The default, means, centers the ellipse on the two variable means, whereas coefs centers it on the first two regression coefficients from an immediately preceding regress. If you restricted regress to a portion of the data using if or in, you will generally want to use the same conditions with coefs.
pool(#) displays a confidence ellipse labeled bp using all the data, a confidence ellipse labeled b using a theoretically unproblematic subset, and # lines connecting #+1 dots of fractionally pooled regression coefficients at 1/# intervals. pool() must be used with if or in, and with coefs, generate(), and add(). pool() is incompatible with by().
constant(string [#]) specifies the boundary constant as a statname and an optional #. The overall default, which is also the means default, is the standard deviation ellipse, constant(sd 2) or, squared, constant(sq 4). The standard deviation ellipse is also known as the covariance, concentration, data, error, or inertia ellipse. It is the ellipse that is most representative of the data points without any a priori statistical assumptions concerning their origin. With the statname sq, the confidence level in percent is (1 - exp(-#/2)) * 100. The default corresponds to 95% INDIVIDUAL confidence intervals, or 86% JOINT confidence intervals. sd and sq cannot be used with level(). I have NOT implemented the standard deviation curve for geographical data; see Gong (2002). The coefs default is constant(f 2). The default # is 4 for sq; otherwise it is 2. Available statistics are:
    statname   definition
    ------------------------------------------------------------------------
    sd         standard deviation = #^2             cannot be used with level()
    sq         squared standard deviation = sd^2 = #  cannot be used with level()
    tsq        Hotelling one-sample T-squared
    hotel      same as tsq = #(n-1)/(n-#) * F
    tsqn       sample-adjusted tsq = tsq / n
    hoteln     same as tsqn
    ptsqn      Hotelling T-squared prediction or tolerance ellipse = tsqn * (n+1) / n
    photeln    same as ptsqn
    chisq      Chi-squared
    chisqn     sample-adjusted Chi-squared = chisq / n
    f          F = 2F * (#,n-#)
    fadj       F-adjusted = 2F * (2,n-#)
    ------------------------------------------------------------------------
    Defaults f2 and fadj2 are equivalent
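As a quick check of the sq formula given under constant(), the confidence level implied by a boundary constant can be computed directly. The following is only an illustrative sketch in Python, not part of ellip7; the function names sq_confidence_level and sd_to_sq are hypothetical and introduced here for illustration.

```python
import math


def sd_to_sq(sd: float) -> float:
    """Convert an sd boundary constant to the equivalent sq constant (sq = sd^2)."""
    return sd ** 2


def sq_confidence_level(sq: float) -> float:
    """Joint confidence level in percent for constant(sq #): (1 - exp(-#/2)) * 100."""
    return (1.0 - math.exp(-sq / 2.0)) * 100.0


if __name__ == "__main__":
    # Hypothetical usage: tabulate the joint level for a few sd constants.
    for sd in (1, 2, 3):
        sq = sd_to_sq(sd)
        level = sq_confidence_level(sq)
        print(f"constant(sd {sd}) = constant(sq {sq}): {level:.1f}% joint")
```

For the default, constant(sd 2) or constant(sq 4), this reproduces the joint level of about 86% quoted above.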
level(#) specifies the confidence level, in percent, for calculation of the confidence ellipse; the default # is 95. level() cannot be used with constant(sd) or constant(sq).
generate(ynewvar xnewvar) generates two new variables, ynewvar and xnewvar, which define the confidence ellipse. If the current dataset contains fewer observations than the number of points in npoints(), the dataset is lengthened accordingly with missing values, even if ynewvar and xnewvar are temporary variables, and a warning message is displayed. generate() cannot be used with by().
add(yoldvar xoldvar) adds an old confidence ellipse to the new confidence ellipse. The result is two overlaid confidence ellipses in the same graph. add() may be used with generate() but does not require it.
nograph suppresses the display of the graph.
replace replaces any existing variables in generate().
evr(#) specifies the error variance ratio, where # is a floating point number between 0 and 10^36. The default is 1. evr(0) corresponds to regression of x on y, evr(1) to orthogonal regression, and a larger number, say evr(999), corresponds to regression of y on x. See McCartin (n.d.).
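The three cases named under evr() match the limiting behavior of the textbook errors-in-variables (Deming) slope, where the ratio is the error variance of y over the error variance of x. The sketch below, in Python, only illustrates that relationship; whether ellip7 computes its line this way is an assumption, and the function name deming_slope and the sample data are hypothetical.

```python
import math


def deming_slope(x, y, evr):
    """Textbook Deming slope for error variance ratio evr = var(err_y) / var(err_x).

    Assumption: this is the standard formula, not code taken from ellip7.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    diff = syy - evr * sxx
    return (diff + math.sqrt(diff ** 2 + 4.0 * evr * sxy ** 2)) / (2.0 * sxy)


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.2, 1.9, 3.2, 3.8, 5.1]
    for evr in (0, 1, 999):
        print(f"evr({evr}): slope = {deming_slope(x, y, evr):.4f}")
```

With evr 0 the slope equals s_yy/s_xy (regression of x on y, expressed as a line in y), with evr 1 it is the orthogonal regression slope, and as evr grows it approaches s_xy/s_xx (regression of y on x).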
npoints(#) specifies the number of points to be calculated for the confidence ellipse. The default is 400. You seldom have to use this option. Users with Small Stata may want to lower the number; if the output looks jagged, try increasing it.
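To make the roles of the center, the boundary constant, and npoints() concrete, the sketch below traces one ellipse in Python. It is a minimal illustration under stated assumptions, not the ellip7 source: the ellipse is taken to be the set of points whose quadratic form in the inverse 2x2 covariance matrix equals the boundary constant c, and the names ellipse_points and quadratic_form are hypothetical. Arguments are given in x, y order, unlike the yvar xvar order of ellip7.

```python
import math


def ellipse_points(mean_x, mean_y, var_x, var_y, cov_xy, c, npoints=400):
    """Return npoints (x, y) pairs on the ellipse (p - m)' S^-1 (p - m) = c.

    S is the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    Assumption: a Cholesky-based parameterization chosen for illustration.
    """
    # Cholesky factor L of S, so that S = L L'.
    l11 = math.sqrt(var_x)
    l21 = cov_xy / l11
    l22 = math.sqrt(var_y - l21 ** 2)
    radius = math.sqrt(c)
    points = []
    for i in range(npoints):
        angle = 2.0 * math.pi * i / npoints
        u = radius * math.cos(angle)
        v = radius * math.sin(angle)
        # Map the circle of radius sqrt(c) through L and shift to the center.
        points.append((mean_x + l11 * u, mean_y + l21 * u + l22 * v))
    return points


def quadratic_form(x, y, mean_x, mean_y, var_x, var_y, cov_xy):
    """Evaluate (p - m)' S^-1 (p - m) for the point p = (x, y)."""
    det = var_x * var_y - cov_xy ** 2
    dx, dy = x - mean_x, y - mean_y
    return (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det


if __name__ == "__main__":
    # Hypothetical moments; c = 4 corresponds to constant(sq 4).
    pts = ellipse_points(0.0, 0.0, 4.0, 1.0, 1.0, c=4.0, npoints=8)
    for px, py in pts:
        print(f"{px:8.4f} {py:8.4f}")
```

A smaller npoints gives a coarser polygon, which is why a jagged outline suggests raising the number.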
yformat[(%fmt)] specifies the display format of the y-axis. The default is to use a %9.0g format.
yformat specifies that the y-axis uses the yvar's display format.
yformat(%fmt) specifies the format to be used for the y-axis (see help format).
xformat[(%fmt)] specifies the display format of the x-axis; see the yformat[(%fmt)] option above.
graph_options are any options allowed with graph, twoway, including by(varname). by() is incompatible with pool(). by() with many groups may exceed the "width" of the dataset because of the stack included in ellip7. Defaults are: c(l) s(.) t1(" ") t2(" ") l1(yvar) b2(xvar) [or c(ll) s(..) if add() is specified, etc.], and l1(Estimated yvar) and b2(Estimated xvar) if coefs is specified.
Remarks
The latest version for Stata 7 is version 1.3.1 of ellip7. The last version for Stata 6 was version 1.2.0 of ellip6. To use the pool(#) option, you must have gphdt.ado and gphsave.ado installed.
ellip7 is a graphics command, but generate() may lengthen the dataset. Only one statistic may be requested in constant(); a simple but limited workaround is to use add().
The by(varname) graph option bug has been fixed in ellip7 but not in ellip6. In ellip7, the graph option now displays an ellipse for each value of varname in by(), as expected. ellip7 also introduces the nograph option and the sq argument to constant().
Stata 8 became available in January 2003. Stata 8 has a new graphics programming language and many new graphics features. For example, Stata 8 has a new built-in method for overlaying graphs with a ||-separator and a ()-binding notation. To create overlaid confidence ellipses with Stata 7 or Stata 6, I recommend Nick Cox's muxyplot.ado. That is, generate the ellipse variables with the gen() option, and then use muxyplot yvarlist xvarlist. The complementary command muxyplot, with its help file, can be downloaded separately from SSC.
Version 1.3.0 from 20030116 had bugs which have been fixed in 1.3.1. That is, by() with coefs would not report results and, more importantly, would incorrectly report the default (means) by-results; this bug does not apply to ellip6 and ellip5, because they are not byable. pool() would only use two independent variables as part of its calculations even if the immediately preceding regress command used more independent variables; the bug still affects ellip6 (the original version of ellip/ellip5 never had the bug, because it was used after fit rather than after regress).
The author is currently developing the program in Stata 8. Please contact the author if you want to contribute in any way.
Examples
. ellip y x
    (graph sd ellipse)

. ellip y x, g(sdy sdx)
    (graph and generate sd ellipse)

. ellip y x, c(hoteln) a(sdy sdx)
    (overlaid graph of 95% Hotelling confidence ellipse and previous sd ellipse)

. reg dv iv
. ellip iv, coefs c(chisq)
    (graphs a 95% Chi-square confidence ellipse around the regression coefficient for iv and around _cons in the preceding regression)
Author
Anders Alexandersson <aalex@its.msstate.edu>
ITS, Mississippi State University
Mississippi State, MS 39762 USA
References
Batschelet, E. 1981. Circular Statistics in Biology. London and New York: Academic Press.
Gong, J. 2002. Clarifying the Standard Deviational Ellipse. Geographical Analysis 34(2): 155-167.
Johnson, R., and D. Wichern. 2002. Applied Multivariate Statistical Analysis. 5th ed. Upper Saddle River, NJ: Prentice Hall.
McCartin, B. n.d. A Geometric Characterization of Linear Regression. Statistics: A Journal of Theoretical and Applied Statistics.
Also see
Manual: [R] graph, [R] gph
STB: STB-46 gr32, STB-34 gr20
On-line: help for gphsave, gphdt, muxyplot (if installed)