{smcl}
{* 10june2004}{...}
{hline}
help for {hi:circular} statistics commands
{hline}
{title:Introduction}
{p 4 4 2}
Circular data are a large class of directional data, which are of interest to
scientists in many fields, including biologists (movements of migrating
animals), meteorologists (winds), geologists (directions of joints and faults)
and geomorphologists (landforms, oriented stones). Such examples are all
recordable as compass bearings relative to North. Other examples include
phenomena that are periodic in time, including those dependent on time of day
(of interest to biomedical statisticians: hospital visits, times of birth,
etc.) or time of year (of interest to applied economists: unemployment or sales
variations).
{p 4 4 2}
The elementary but also fundamental property of circular data is that the
beginning and end of the scale coincide: for example, 0 degrees = 360 degrees.
An immediate implication is that the arithmetic mean is likely to be a poor
summary: the mean of 1{c 176} and 359{c 176} cannot sensibly be 180{c 176}.
The solution is to use the vector mean direction as the circular mean. More
generally, the different outcome space means that many standard methods
designed for variables measured on the line are of little or no use with
circular variables.
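{p 4 4 2}
As a minimal sketch of the vector mean calculation (not the code of any
command documented here, and assuming a version of Stata that has the built-in
two-argument function {cmd:atan2()}), the two directions 1{c 176} and
359{c 176} can be treated as unit vectors, their sines and cosines summed, and
the direction of the resultant recovered. The scalar names {cmd:sumsin} and
{cmd:sumcos} are arbitrary.
{p 8 12 2}{cmd:* sums of sines and cosines of the two directions, converted to radians}
{p 8 12 2}{cmd:. scalar sumsin = sin(1 * _pi / 180) + sin(359 * _pi / 180)}
{p 8 12 2}{cmd:. scalar sumcos = cos(1 * _pi / 180) + cos(359 * _pi / 180)}
{p 8 12 2}{cmd:* direction of the resultant, converted back to degrees}
{p 8 12 2}{cmd:. display atan2(sumsin, sumcos) * 180 / _pi}
{p 4 4 2}
The result should be 0{c 176} to within rounding error, whereas the arithmetic
mean of the same two values is 180{c 176}.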
{p 4 4 2}
The programs written rest, so far, on the assumption that data are recorded in
degrees. Users working with other scales (e.g. time of day on a 24 hour clock,
day or month of year) could write their own trivial preprocessor and fix
cosmetic details such as graph axis labels. In due course I may implement,
possibly through characteristics modified by some {cmd:circset} command, user
setting of different scales. Stata's trigonometric functions expect angles in
radians (pi radians = 180{c 176}), but I have never seen radians used for
reporting data. In Stata, the factors {cmd:_pi / 180} and {cmd:180 / _pi} are
thus useful for conversion between degrees and radians.
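{p 4 4 2}
For illustration only, assuming a hypothetical variable {cmd:wind} holding
directions in degrees and a hypothetical variable {cmd:hour} holding time of
day on a 24 hour clock, such conversions and a trivial preprocessor might be
sketched as
{p 8 12 2}{cmd:* degrees to radians, and back again}
{p 8 12 2}{cmd:. generate double windrad = wind * _pi / 180}
{p 8 12 2}{cmd:. generate double winddeg = windrad * 180 / _pi}
{p 8 12 2}{cmd:* time of day rescaled so that 24 hours = 360 degrees}
{p 8 12 2}{cmd:. generate double hourdeg = hour * 360 / 24}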
{p 4 4 2}
In addition, the compass or clock convention of measurement clockwise from a
vertical axis (e.g. North) is used throughout for circular graphs, not the
mathematical convention of measuring angles counterclockwise (anticlockwise)
from a horizontal axis.
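{p 4 4 2}
Should a mathematical angle ever be needed, the two conventions are related by
a reflection and a quarter turn. A minimal sketch, assuming a hypothetical
variable {cmd:bearing} in degrees clockwise from North:
{p 8 12 2}{cmd:* counterclockwise angle from the horizontal axis, in [0, 360)}
{p 8 12 2}{cmd:. generate double mathangle = mod(90 - bearing, 360)}
{p 4 4 2}
North (0{c 176}) thus maps to 90{c 176} and East (90{c 176}) to 0{c 176}.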
{p 4 4 2}
The degree symbol may be invoked (e.g. in text for graphs) as
{cmd:"{c -(}c 176{c )-}"}. If that fails, try {cmd:"`=char(176)'"}. To
see such symbols in various Stata windows, you may need to change the font.
{title:Utilities}
{p 4 4 2}
{help circcentre} rotates a set of directions to a new centre: the result is
between -180{c 176} and 180{c 176}.
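{p 4 4 2}
The idea can be sketched by hand; this is not the code of {help circcentre},
and the variable name {cmd:wind} and the new centre of 90{c 176} are
assumptions made for illustration.
{p 8 12 2}{cmd:* rotate so that 90 becomes 0; results lie in [-180, 180)}
{p 8 12 2}{cmd:. generate double centred = mod(wind - 90 + 180, 360) - 180}
{p 4 4 2}
How {help circcentre} itself treats values exactly 180{c 176} from the new
centre is not covered by this sketch.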
{p 4 4 2}
{help circdiff} measures the difference between circular variables or
constants as the shorter arc around the circle.
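{p 4 4 2}
A minimal sketch of the shorter arc, worked by hand for the constants
350{c 176} and 10{c 176} and then for two hypothetical variables {cmd:a} and
{cmd:b}; {help circdiff} itself may differ in details such as the sign of the
result.
{p 8 12 2}{cmd:* the two arcs are 340 and 20; the shorter is 20}
{p 8 12 2}{cmd:. display min(mod(350 - 10, 360), mod(10 - 350, 360))}
{p 8 12 2}{cmd:* same calculation for variables in degrees}
{p 8 12 2}{cmd:. generate double arc = min(mod(a - b, 360), mod(b - a, 360))}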
{p 4 4 2}
{help fourier} generates pairs of sine and cosine variables sin {it:j}
{it:theta}, cos {it:j} {it:theta} for {it:j} = 1, ..., {it:k}.
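{p 4 4 2}
What such pairs contain can be sketched by hand for {it:k} = 2, assuming a
hypothetical variable {cmd:theta} in degrees. The new variable names are
arbitrary and are not necessarily those used by {help fourier}.
{p 8 12 2}{cmd:* first pair: sin theta, cos theta}
{p 8 12 2}{cmd:. generate double sin1 = sin(theta * _pi / 180)}
{p 8 12 2}{cmd:. generate double cos1 = cos(theta * _pi / 180)}
{p 8 12 2}{cmd:* second pair: sin 2 theta, cos 2 theta}
{p 8 12 2}{cmd:. generate double sin2 = sin(2 * theta * _pi / 180)}
{p 8 12 2}{cmd:. generate double cos2 = cos(2 * theta * _pi / 180)}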
{p 4 4 2}
{help egencirc:atan2()} is an arctangent {help egen} function
giving results on the whole circle.
{title:Summary statistics and significance tests}
{p 4 4 2}
{help circsummarize} is a basic workhorse that calculates vector mean and
strength and the circular range and offers, as options, approximate confidence
intervals for the vector mean and Rayleigh and Kuiper tests of uniform
distribution on the circle. (The abbreviation {help circsu} is allowed.)
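{p 4 4 2}
The two basic summaries can be sketched by hand, which may help in checking
results. This is not the code of {help circsummarize}; it assumes a
hypothetical variable {cmd:theta} in degrees and a version of Stata with the
built-in function {cmd:atan2()}.
{p 8 12 2}{cmd:* sine and cosine components of each direction}
{p 8 12 2}{cmd:. generate double sintheta = sin(theta * _pi / 180)}
{p 8 12 2}{cmd:. generate double costheta = cos(theta * _pi / 180)}
{p 8 12 2}{cmd:* means of the components}
{p 8 12 2}{cmd:. summarize sintheta, meanonly}
{p 8 12 2}{cmd:. scalar sbar = r(mean)}
{p 8 12 2}{cmd:. summarize costheta, meanonly}
{p 8 12 2}{cmd:. scalar cbar = r(mean)}
{p 8 12 2}{cmd:* vector mean in [0, 360) and vector strength in [0, 1]}
{p 8 12 2}{cmd:. display "vector mean = " mod(atan2(sbar, cbar) * 180 / _pi, 360)}
{p 8 12 2}{cmd:. display "vector strength = " sqrt(sbar^2 + cbar^2)}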
{p 4 4 2}
{help circrao} carries out a uniformity test suggested by J.S. Rao. One merit
of this test is that it works well for data which are not unimodal.
{p 4 4 2}
If circular data arrive coarsely grouped (e.g. 4 or 8 points of the compass),
then a chi-square test as applied by {help chitest} or {help chitesti}
is a possible alternative test of uniformity. If data are measured more
precisely, then it is arguable that the chi-square test is a poor choice
compared with the alternatives: not only does it require arbitrary
decisions on bin width and origin, it takes no account of the circular
nature of the data.
{p 4 4 2}
{help circmedian} calculates the circular median and mean deviation from the
median.
{p 4 4 2}
{help circovmean} and {help circovstr} show the effects of omitting
individual values on the vector mean and the vector strength.
{p 4 4 2}
{help circtwosample} and {help circwwmardia} offer nonparametric tests for
comparing two or more subsets of directions. {help circtwosample} offers two
test statistics based on empirical distribution functions to test whether two
distributions are identical, namely Watson's U{c 178} and Kuiper's k*.
{help circwwmardia} carries out a homogeneity test due to Wheeler and Watson and to
Mardia given subdivision into two or more groups.
{title:Univariate and bivariate graphics}
{p 4 4 2}
{help circrplot} loosely resembles {help spikeplot}; {help circdplot} loosely
resembles {help dotplot}. {help circvplot} shows the ordered directions added
end to end with the vector mean as resultant. Many users like such
intrinsically circular representations, but note that it may be necessary to
use {help graph_display:graph display}, typically with equal {cmd:xsize()}
and {cmd:ysize()}, to fix the aspect ratio.
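{p 4 4 2}
For example, after drawing one of these graphs, the current graph might be
redisplayed as a square. The value 4 is an arbitrary choice for illustration.
{p 8 12 2}{cmd:* equal sizes so that the circle is not stretched}
{p 8 12 2}{cmd:. graph display, xsize(4) ysize(4)}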
{p 4 4 2}
Another approach is to wrap around the scale, showing up to two full cycles on
a linear graph. {help circhistogram} is a wrapper for {help histogram}, adding
a pad of values (default 180{c 176}) to both extremes. (The abbreviation
{help circhist} is allowed.) {help circscatter} is a wrapper for {help scatter}
that adds a pad to both extremes on either or both of {it:x} and {it:y} axes.
(The abbreviation {help circsc} is allowed.)
{p 4 4 2}
{help circqqplot} is a quantile-quantile plot for two circular
variables. It is a wrapper for {help qqplot}. Data are rotated so that
each variable is centred on a specified value.
{p 4 4 2}
Note that a quantile plot of directions can be useful: {help quantile} {c -} or
alternatively, {help qplot} (see {it:Stata Journal} 4(1): 97, 2004) {c -} is already
available for this purpose.
{title:Smoothing, relationships and modelling}
{p 4 4 2}
{help circkdensity} drives a nonparametric density estimation
routine with a biweight kernel. Despite the name, it does not call
{help kdensity}.
{p 4 4 2}
For exploratory smoothing, {help circylowess} is for circular response and
non-circular covariate and {help circxlowess} is for non-circular response and
circular covariate. Both are wrappers for {help lowess}. With
{help circylowess}, the recipe is to smooth sine and cosine components and to
recombine using arctangent: smooth of {it:theta} = arctan(smooth of sin
{it:theta}, smooth of cos {it:theta}). With {help circxlowess}, the recipe is
to smooth around the circle by temporarily adding sufficiently large pads at
each end.
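{p 4 4 2}
The recipe for a circular response can be sketched by hand. This is an
illustration of the idea rather than the code of {help circylowess}; it
assumes a hypothetical circular response {cmd:theta} in degrees, a
hypothetical non-circular covariate {cmd:x}, default {help lowess} settings
and a version of Stata with the built-in function {cmd:atan2()}.
{p 8 12 2}{cmd:* sine and cosine components of the response}
{p 8 12 2}{cmd:. generate double sintheta = sin(theta * _pi / 180)}
{p 8 12 2}{cmd:. generate double costheta = cos(theta * _pi / 180)}
{p 8 12 2}{cmd:* smooth each component separately}
{p 8 12 2}{cmd:. lowess sintheta x, generate(sinsmooth) nograph}
{p 8 12 2}{cmd:. lowess costheta x, generate(cossmooth) nograph}
{p 8 12 2}{cmd:* recombine with the two-argument arctangent; result in [0, 360)}
{p 8 12 2}{cmd:. generate double smooth = mod(atan2(sinsmooth, cossmooth) * 180 / _pi, 360)}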
{p 4 4 2}
{help circlccorr} and {help circcorr} implement correlation methods for
cases where one or both variables are circular.
{p 4 4 2}
Note that regression of a non-circular response on various terms of a Fourier
series requires nothing extra in Stata beyond {help regress} and other basic
modelling commands (although {help fourier} can help in producing covariates).
Such regression is often extremely useful, and can be extended to include
non-circular covariates.
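{p 4 4 2}
A minimal sketch with a single pair of Fourier terms, assuming a hypothetical
non-circular response {cmd:y}, a hypothetical circular covariate {cmd:theta}
in degrees and a hypothetical non-circular covariate {cmd:z}:
{p 8 12 2}{cmd:* first-order sine and cosine terms}
{p 8 12 2}{cmd:. generate double sinterm = sin(theta * _pi / 180)}
{p 8 12 2}{cmd:. generate double costerm = cos(theta * _pi / 180)}
{p 8 12 2}{cmd:* regression on the Fourier terms and the extra covariate}
{p 8 12 2}{cmd:. regress y sinterm costerm z}
{p 4 4 2}
Higher-order terms (sin 2 {it:theta}, cos 2 {it:theta}, and so on) can be
added in the same way if one pair proves insufficient.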
{title:von Mises distributions}
{p 4 4 2}
{help circvm} fits a von Mises distribution, the most important unimodal
reference distribution on the circle, using an approximate maximum likelihood
method. (Doing it properly with {help ml} is on the agenda.)
{p 4 4 2}
{help circqvm} shows a quantile-quantile plot for data versus a
fitted von Mises distribution. Data are rotated so that the mean is at the
centre of the plot.
{p 4 4 2}
{help circpvm} shows a probability plot (P-P plot) for data versus a
fitted von Mises distribution.
{p 4 4 2}
{help circdpvm} shows a density probability plot for data versus a
fitted von Mises distribution.
{p 4 4 2}
{help egen} functions {help egencirc:invvm()},
{help egencirc:rndvm()}, {help egencirc:vm()} and
{help egencirc:vmden()} and a calculator function {help i0kappa} are supporting
utilities, occasionally used directly.
{title:Author}
{p 4 4 2}Nicholas J. Cox, University of Durham, U.K.{break}
n.j.cox@durham.ac.uk
{title:Acknowledgements}
{p 4 4 2}Ian S. Evans has kindly provided me with information and requests on
circular statistics over more than thirty years.