```-------------------------------------------------------------------------------
help diptest
-------------------------------------------------------------------------------

Dip statistic to test for unimodality

Syntax

diptest varname [if] [in] [ , by(byvarlist) reps(#) list_options ]

Description

diptest calculates and displays the dip statistic to test for
unimodality. This statistic is the maximum difference between the
empirical distribution function and the unimodal distribution function
that minimises that maximum difference. The dip thus measures departure
of a sample from unimodality and was proposed by Hartigan and Hartigan
(1985) as a test statistic for unimodality. Hartigan (1985) published
Fortran code. Mächler (2003) published corrected C code as part of an R
package diptest.

The reference distribution for assessing the dip statistic is the
uniform, as a worst-case unimodal distribution. P-values are calculated
by comparing the dip statistic obtained with those for repeated samples
of the same size from a uniform distribution. If the true distribution is
not uniform, other methods may be more appropriate or more powerful.  For
further discussion, see Hartigan and Hartigan (1985) and Cheng and Hall
(1998).

Remarks

For reproducibility of P-values, set seed beforehand.

Hartigan and Hartigan (1985, p.80), and also Hartigan (1985, p.321), give
a table of percent points (1, 5, 10, 50, 90, 95, 99, 99.5, 99.9%) for the
null distribution of the dip statistic for sample sizes 4(1)10 15 20 30
50 100 200 based on 9999 simulations. The R package diptest includes a
larger table based on 1000001 simulations.

For sample sizes 1 to 3 or samples of identical values, the dip is
returned as 0.

Note that this procedure is independent of any density estimation
procedure.

As a side-effect of the calculation, diptest returns low and high ends of
the modal interval for the best-fitting unimodal distribution
corresponding to the data. The mean of values in that interval is also
reported, without warranty of any merits as a summary.

diptest uses Mata for its innermost calculations.  Thus Stata 9 or later
is required.

Options

by() specifies that calculations are to be carried out separately for the
distinct groups defined by byvarlist. The variable(s) in byvarlist
may be numeric or string.

reps() specifies the number of repeated samples of the same size to be
drawn from a uniform distribution. The default is 10000. Note that
reps(0) suppresses P-value calculation.

list_options are options of list other than noobs and subvarname. They
may be specified to tune the display of results.

Examples

. diptest quality
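
The following lines are sketches only, showing how the options documented
above might be combined. They assume that the data in memory contain the
variable quality and a grouping variable named group; group is a
hypothetical name used for illustration, and the seed value is arbitrary.

(fewer repetitions, with the seed set first so P-values are reproducible)
. set seed 2803
. diptest quality, reps(1000)

(separate calculations for each group, then inspection of saved results)
. diptest quality, by(group)
. return list

(dip statistic only, with P-value calculation suppressed)
. diptest quality, reps(0)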

Saved results

r(n_1)       number of observations for first group
r(dip_1)     dip for first group
r(low_1)     low end of modal interval for first group
r(high_1)    high end of modal interval for first group
r(mean_1)    mean of modal interval for first group
r(P_1)       P-value for first group

etc. (suffixes _2 etc. indicating second group etc.)

Author

Nicholas J. Cox, Durham University, UK
n.j.cox@durham.ac.uk

Acknowledgments

Mata code is based on Fortran code in Hartigan (1985) and C code in
Mächler (2003).

References

Cheng, M-Y. and P. Hall. 1998. Calibrating the excess mass and dip tests
of modality. Journal, Royal Statistical Society, Series B 60:
579-589.

Hartigan, J.A. and P.M. Hartigan. 1985. The dip test of unimodality.
Annals of Statistics 13: 70-84.

Hartigan, P.M. 1985. Algorithm AS 217: Computation of the dip statistic
to test for unimodality. Applied Statistics 34: 320-325.

Mächler, M. 2003. diptest 0.25-1.  http://www.r-project.org/

Also see

help for kdensity, modes (if installed), hsmode (if installed)

```