-------------------------------------------------------------------------------
help diptest
-------------------------------------------------------------------------------

Dip statistic to test for unimodality

Syntax

diptest varname [if] [in] [ , by(byvarlist) reps(#) list_options ]

Description

diptest calculates and displays the dip statistic to test for unimodality. This statistic is the maximum difference between the empirical distribution function and the unimodal distribution function that minimises that maximum difference. The dip thus measures departure of a sample from unimodality and was proposed by Hartigan and Hartigan (1985) as a test statistic for unimodality. Hartigan (1985) published Fortran code. Mächler (2003) published corrected C code as part of an R package diptest.

The reference distribution for assessing the dip statistic is the uniform, as a worst-case unimodal distribution. P-values are calculated by comparing the observed dip statistic with those for repeated samples of the same size from a uniform distribution. If the true distribution is not uniform, other methods may be more appropriate or more powerful. For further discussion, see Hartigan and Hartigan (1985) and Cheng and Hall (1998).

Remarks

For reproducibility of P-values, set seed beforehand.
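
A minimal sketch of that advice follows. The seed value 2803 is arbitrary, and quality is simply the variable used under Examples; any fixed seed should make the simulated P-values repeat from run to run.

    . set seed 2803
    . diptest quality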

Hartigan and Hartigan (1985, p.80), and also Hartigan (1985, p.321), give a table of percent points (1, 5, 10, 50, 90, 95, 99, 99.5, 99.9%) for the null distribution of the dip statistic for sample sizes 4(1)10 15 20 30 50 100 200 based on 9999 simulations. The R package diptest includes a larger table based on 1000001 simulations.

For sample sizes 1 to 3 or samples of identical values, the dip is returned as 0.

Note that this procedure is independent of any density estimation procedure.

As a side-effect of the calculation, diptest returns low and high ends of the modal interval for the best-fitting unimodal distribution corresponding to the data. The mean of values in that interval is also reported, without warranty of any merits as a summary.

diptest uses Mata for its innermost calculations. Thus Stata 9 or later is required.

Options

by() specifies that calculations are to be carried out separately for the distinct groups defined by byvarlist. The variable(s) in byvarlist may be numeric or string.

reps() specifies the number of samples of the same size to be drawn from a uniform distribution when calculating P-values. The default is 10000. Note that reps(0) suppresses P-value calculation.

list_options are options of list other than noobs and subvarname. They may be specified to tune the display of results.

Examples

. diptest quality
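
The following further calls sketch how the options might be combined. The grouping variable site is hypothetical and stands for any numeric or string variable in the data; separator(0) is an option of list, passed through here on the assumption that it is accepted like other list_options.

    . diptest quality, reps(0)
    . diptest quality, reps(1000)
    . diptest quality, by(site)
    . diptest quality, by(site) reps(1000) separator(0)

The first call reports the dip without P-values. The second trades some precision in the P-value for speed by using fewer repetitions than the default.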

Saved results

    r(n_1)       number of observations for first group
    r(dip_1)     dip for first group
    r(low_1)     low end of modal interval for first group
    r(high_1)    high end of modal interval for first group
    r(mean_1)    mean of modal interval for first group
    r(P_1)       P-value for first group

etc. (suffixes _2 etc. indicating second group etc.)
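
As a sketch of how these results might be inspected after a call, assuming the variable quality from Examples and a call without by(), so that only the _1 results are of interest:

    . diptest quality
    . return list
    . display "dip = " r(dip_1) "   P-value = " r(P_1)

return list shows everything left behind in r(). With reps(0) no P-value is calculated, so r(P_1) should not be relied on after such a call.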

Author

Nicholas J. Cox, Durham University, UK
n.j.cox@durham.ac.uk

Acknowledgments

Mata code is based on Fortran code in Hartigan (1985) and C code in Mächler (2003).

References

Cheng, M.-Y. and P. Hall. 1998. Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society, Series B 60: 579-589.

Hartigan, J.A. and P.M. Hartigan. 1985. The dip test of unimodality. Annals of Statistics 13: 70-84.

Hartigan, P.M. 1985. Algorithm AS 217: Computation of the dip statistic to test for unimodality. Applied Statistics 34: 320-325.

Mächler, M. 2003. diptest 0.25-1. http://www.r-project.org/

Also see

help for kdensity, modes (if installed), hsmode (if installed)