Introduction
Circular data are a large class of directional data, which are of interest to scientists in many fields, including biologists (movements of migrating animals), meteorologists (winds), geologists (directions of joints and faults) and geomorphologists (landforms, oriented stones). Such examples are all recordable as compass bearings relative to North. Other examples include phenomena that are periodic in time, including those dependent on time of day (of interest to biomedical statisticians: hospital visits, times of birth, etc.) or time of year (of interest to applied economists: unemployment or sales variations).
The elementary but also fundamental property of circular data is that the beginning and end of the scale coincide: for example, 0° = 360°. An immediate implication is that the arithmetic mean is likely to be a poor summary: the mean of 1° and 359° cannot sensibly be 180°. The solution is to use the vector mean direction as the circular mean. More generally, the different outcome space means that many standard methods designed for variables measured on the line are of little or no use with circular variables.
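The contrast between the two means can be checked directly. What follows is a minimal sketch in Python rather than Stata, purely to show the arithmetic; the function name vector_mean_direction is hypothetical and is not part of this package.

```python
import math

def vector_mean_direction(degrees):
    """Vector mean direction of angles given in degrees, reduced modulo 360."""
    # Sum the unit vectors, then take the direction of the resultant.
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360.0

bearings = [1.0, 359.0]
arithmetic_mean = sum(bearings) / len(bearings)   # points the wrong way round the circle
circular_mean = vector_mean_direction(bearings)   # lies at North, up to rounding error
```

Because of floating-point rounding the circular result may be reported as a value very close to either 0 or 360, which are the same direction.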
The programs written rest, so far, on the assumption that data are recorded in degrees. Users working with other scales (e.g. time of day on a 24 hour clock, day or month of year) could write their own trivial preprocessor and fix cosmetic details such as graph axis labels. In due course I may implement user setting of different scales, possibly through characteristics modified by some circset command. Stata's trigonometric functions expect angles in radians (pi radians = 180°), but I have never seen radians used for reporting data. In Stata, the factors _pi / 180 and 180 / _pi are thus useful for conversion between degrees and radians.
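As an illustration of how trivial such a preprocessor can be, here is a sketch in Python (not Stata) that maps time of day onto degrees and converts between degrees and radians. All three function names are hypothetical.

```python
import math

def hours_to_degrees(hours):
    """Map time of day on a 24 hour clock onto degrees, 0 <= result < 360."""
    return (hours % 24.0) * 360.0 / 24.0

def degrees_to_radians(deg):
    """Multiply by pi / 180, the analogue of Stata's _pi / 180."""
    return deg * math.pi / 180.0

def radians_to_degrees(rad):
    """Multiply by 180 / pi, the analogue of Stata's 180 / _pi."""
    return rad * 180.0 / math.pi
```

Under this mapping 06:00 becomes 90° and 18:00 becomes 270°; day or month of year could be handled the same way with a different divisor.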
In addition, the compass or clock convention of measurement clockwise from a vertical axis (e.g. North) is used throughout for circular graphs, not the mathematical convention of measuring angles counterclockwise (anticlockwise) from a horizontal axis.
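The two conventions differ only by a reflection and a quarter turn. The sketch below, in Python and with hypothetical function names, shows the conversion and the plotting coordinates implied by the compass convention.

```python
import math

def compass_to_math(bearing_deg):
    """Convert a compass bearing (clockwise from North) to the mathematical
    convention (counterclockwise from the positive horizontal axis)."""
    return (90.0 - bearing_deg) % 360.0

def bearing_to_xy(bearing_deg):
    """Unit vector for a compass bearing: North is up, East is to the right."""
    b = math.radians(bearing_deg)
    return math.sin(b), math.cos(b)
```

So a bearing of 90° (East) corresponds to a mathematical angle of 0°, and plots at the point (1, 0).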
The degree symbol may be invoked (e.g. in text for graphs) as "{c 176}". If that fails, try "`=char(176)'". To see such symbols in various Stata windows, you may need to change the font.
Utilities
circcentre rotates a set of directions to a new centre: the result is between -180° and 180°.
circdiff measures the difference between circular variables or constants as the shorter arc around the circle.
fourier generates pairs of sine and cosine variables sin j theta, cos j theta for j = 1, ..., k.
atan2() is an arctangent egen function giving results on the whole circle.
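The calculations behind these utilities are short. The following Python sketch shows one way to do each; it is not the code of the Stata commands, the function names are hypothetical, and the handling of boundary cases (for example, whether exactly -180° or 180° is returned, or whether a difference carries a sign) is an assumption of the sketch.

```python
import math

def centre(deg, centre_deg=0.0):
    """Rotate so that centre_deg maps to 0; result assumed to lie in [-180, 180)."""
    return (deg - centre_deg + 180.0) % 360.0 - 180.0

def arc_difference(a_deg, b_deg):
    """Length of the shorter arc between two directions, between 0 and 180."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def fourier_terms(theta_deg, k):
    """Pairs (sin j theta, cos j theta) for j = 1, ..., k."""
    t = math.radians(theta_deg)
    return [(math.sin(j * t), math.cos(j * t)) for j in range(1, k + 1)]

def whole_circle_arctangent(sine, cosine):
    """Two-argument arctangent in degrees, reduced modulo 360."""
    return math.degrees(math.atan2(sine, cosine)) % 360.0
```

For example, the shorter arc between 350° and 10° is 20°, not 340°, and centring 10° on 350° gives 20°.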
Summary statistics and significance tests
circsummarize is a basic workhorse that calculates vector mean and strength and the circular range and offers, as options, approximate confidence intervals for the vector mean and Rayleigh and Kuiper tests of uniform distribution on the circle. (The abbreviation circsu is allowed.)
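To make the quantities concrete, here is a Python sketch of the vector mean, the vector strength (mean resultant length) and the simplest large-sample approximation to the Rayleigh test P-value, exp(-Z) with Z = n times the squared vector strength. The function names are hypothetical, and circsummarize may well use a more refined approximation than this first-order one.

```python
import math

def vector_summary(degrees):
    """Return (vector mean in degrees modulo 360, vector strength between 0 and 1)."""
    n = len(degrees)
    s = sum(math.sin(math.radians(d)) for d in degrees) / n
    c = sum(math.cos(math.radians(d)) for d in degrees) / n
    mean = math.degrees(math.atan2(s, c)) % 360.0
    strength = math.hypot(s, c)
    return mean, strength

def rayleigh_p_approx(degrees):
    """First-order approximation exp(-Z), Z = n * strength^2 (assumption: large n)."""
    n = len(degrees)
    _, strength = vector_summary(degrees)
    return math.exp(-n * strength ** 2)
```

Directions spread evenly round the circle, such as 0°, 90°, 180° and 270°, give a strength near 0 and an approximate P-value near 1; tightly clustered directions give a strength near 1 and a small P-value.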
circrao carries out a uniformity test suggested by J.S. Rao. One merit of this test is that it works well for data which are not unimodal.
If circular data arrive coarsely grouped (e.g. 4 or 8 points of the compass), then a chi-square test as applied by chitest or chitesti is a possible alternative test of uniformity. If data are measured more precisely, then it is arguable that the chi-square test is a poor choice compared with the alternatives: not only does it require arbitrary decisions on bin width and origin, it takes no account of the circular nature of the data.
circmedian calculates the circular median and mean deviation from the median.
circovmean and circovstr show the effects of omitting individual values on the vector mean and the vector strength.
circtwosample and circwwmardia offer nonparametric tests for comparing two or more subsets of directions. circtwosample offers two test statistics based on empirical distribution functions to test whether two distributions are identical, namely Watson's U² and Kuiper's k*. circwwmardia carries out a homogeneity test due to Wheeler and Watson and to Mardia given subdivision into two or more groups.
Univariate and bivariate graphics
circrplot loosely resembles spikeplot; circdplot loosely resembles dotplot. circvplot shows the ordered directions added end to end with the vector mean as resultant. Many users like such intrinsically circular representations, but note that it may be necessary to use graph display, typically with equal xsize() and ysize(), to fix the aspect ratio.
Another approach is to wrap around the scale, showing up to two full cycles on a linear graph. circhistogram is a wrapper for histogram, adding a pad of values (default 180°) to both extremes. (The abbreviation circhist is allowed.) circscatter is a wrapper for scatter that adds a pad to both extremes on either or both of x and y axes. (The abbreviation circsc is allowed.)
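The padding idea can be sketched as follows, in Python rather than Stata. This is only an illustration of the principle, not the code of circhistogram or circscatter; the function name pad_circular is hypothetical, and the sketch assumes a pad between 0 and 360 degrees.

```python
def pad_circular(degrees, pad=180.0):
    """Return the values on [0, 360) plus wrapped copies, so that the
    result covers the range from -pad to 360 + pad."""
    out = []
    for d in degrees:
        d = d % 360.0
        out.append(d)
        if d > 360.0 - pad:      # copy high values below 0
            out.append(d - 360.0)
        if d < pad:              # copy low values above 360
            out.append(d + 360.0)
    return sorted(out)
```

With the default pad, directions of 10° and 350° also appear as 370° and -10°, so a cluster straddling North is shown unbroken on the linear graph.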
circqqplot is a quantile-quantile plot for two circular variables. It is a wrapper for qqplot. Data are rotated so that each variable is centred on a specified value.
Note that a quantile plot of directions can be useful: quantile - or alternatively, qplot (see Stata Journal 4(1): 97, 2004) - is already available for this purpose.
Smoothing, relationships and modelling
circkdensity drives a nonparametric density estimation routine with biweight kernel. Despite the name, it does not call kdensity.
For exploratory smoothing, circylowess is for circular response and non-circular covariate and circxlowess is for non-circular response and circular covariate. Both are wrappers for lowess. With circylowess, the recipe is to smooth sine and cosine components and to recombine using arctangent: smooth of theta = arctan(smooth of sin theta, smooth of cos theta). With circxlowess, the recipe is to smooth around the circle by temporarily adding sufficiently large pads at each end.
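The recipe for a circular response can be sketched as below. To keep the Python self-contained, a simple moving average stands in for lowess, so this shows the sine and cosine recombination only, not the smoother that circylowess calls. The function names are hypothetical, and the responses are assumed to be already sorted by the covariate.

```python
import math

def moving_average(values, half_window):
    """Stand-in smoother: mean over a window truncated at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def smooth_circular_response(theta_deg, half_window=1):
    """Smooth sine and cosine separately, then recombine with the arctangent."""
    sines = [math.sin(math.radians(t)) for t in theta_deg]
    cosines = [math.cos(math.radians(t)) for t in theta_deg]
    s = moving_average(sines, half_window)
    c = moving_average(cosines, half_window)
    return [math.degrees(math.atan2(si, ci)) % 360.0 for si, ci in zip(s, c)]
```

Applied to responses of 350°, 355°, 0°, 5° and 10°, the smooth stays close to North throughout, whereas averaging the raw numbers 355, 0 and 5 would give 120.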
circlccorr and circcorr implement correlation methods for cases where one or both variables are circular.
Note that regression of a non-circular response on various terms of a Fourier series requires nothing extra in Stata beyond regress and other basic modelling commands (although fourier can help in producing covariates). It is often extremely useful, and can be extended to include non-circular covariates.
von Mises distributions
circvm fits a von Mises distribution, the most important unimodal reference distribution on the circle, using an approximate maximum likelihood method. (Doing it properly with ml is on the agenda.)
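For orientation: the maximum likelihood estimate of the mean direction is the vector mean, and that of the concentration kappa solves I1(kappa) / I0(kappa) = vector strength, which has no closed form. The Python sketch below uses a piecewise approximation that is widely quoted in the circular statistics literature. Whether circvm uses this particular approximation is not stated here, so treat it as an assumption; the function name is hypothetical.

```python
def kappa_from_strength(rbar):
    """Approximate concentration kappa from the vector strength rbar,
    assuming 0 <= rbar < 1 (the last branch diverges as rbar approaches 1)."""
    if rbar < 0.53:
        return 2.0 * rbar + rbar ** 3 + 5.0 * rbar ** 5 / 6.0
    if rbar < 0.85:
        return -0.4 + 1.39 * rbar + 0.43 / (1.0 - rbar)
    return 1.0 / (rbar ** 3 - 4.0 * rbar ** 2 + 3.0 * rbar)
```

A vector strength of 0 gives kappa = 0, the uniform distribution, and kappa grows without bound as the strength approaches 1.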
circqvm shows a quantile-quantile plot for data versus a fitted von Mises distribution. Data are rotated so that the mean is at the centre of the plot.
circpvm shows a probability plot (P-P plot) for data versus a fitted von Mises distribution.
circdpvm shows a density probability plot for data versus a fitted von Mises distribution.
egen functions invvm(), rndvm(), vm() and vmden() and a calculator function i0kappa are supporting utilities, occasionally used directly.
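As a guide to what such utilities compute, the von Mises density with mean direction mu and concentration kappa is exp(kappa cos(theta - mu)) / (2 pi I0(kappa)), where I0 is the modified Bessel function of the first kind and order zero. The Python sketch below evaluates I0 by its power series; it is not the code of i0kappa or vmden(), the function names are hypothetical, and the fixed number of series terms is an assumption suited to moderate kappa only.

```python
import math

def bessel_i0(kappa, terms=50):
    """Power series for I0: sum over k of (kappa / 2)^(2k) / (k!)^2."""
    total = 0.0
    term = 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def vm_density(theta_deg, mu_deg, kappa):
    """von Mises density per radian at theta, with angles given in degrees."""
    diff = math.radians(theta_deg - mu_deg)
    return math.exp(kappa * math.cos(diff)) / (2.0 * math.pi * bessel_i0(kappa))
```

With kappa = 0 the density is constant at 1 / (2 pi), the uniform distribution on the circle. To express the density per degree rather than per radian, multiply by pi / 180.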
Author
Nicholas J. Cox, University of Durham, U.K. n.j.cox@durham.ac.uk
Acknowledgements
Ian S. Evans has kindly provided me with information and requests on circular statistics over more than thirty years.