-------------------------------------------------------------------------------
help diptest
-------------------------------------------------------------------------------

Dip statistic to test for unimodality

Syntax

diptest varname [if] [in] [, by(byvarlist) reps(#) list_options]

Description

diptest calculates and displays the dip statistic to test for unimodality. This statistic is the maximum difference between the empirical distribution function and the unimodal distribution function that minimises that maximum difference. The dip thus measures departure of a sample from unimodality and was proposed by Hartigan and Hartigan (1985) as a test statistic for unimodality. Hartigan (1985) published Fortran code. Mächler (2003) published corrected C code as part of an R package diptest.

The reference distribution for calculating the dip statistic is the uniform, as a worst case unimodal distribution. P-values are calculated by comparing the dip statistic obtained with those for repeated samples of the same size from a uniform distribution. If the true distribution is not uniform, other methods may be more appropriate or more powerful. For further discussion, see Hartigan and Hartigan (1985) and Cheng and Hall (1998).

Remarks

For reproducibility of P-values, set seed beforehand.
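The point about reproducibility can be sketched as follows. This is an illustration only: it assumes that diptest is already installed, and it uses the auto data shipped with Stata and an arbitrary seed value, neither of which comes from this help file.

```stata
* Sketch under stated assumptions: auto data and seed 2803 are arbitrary choices.
sysuse auto, clear

* Fix the random-number seed so that the simulated P-value can be reproduced.
set seed 2803
diptest mpg

* Resetting the same seed before repeating the command should give
* the same P-value, because the same uniform samples are drawn.
set seed 2803
diptest mpg
```

Without the second set seed, the repeated call would draw different uniform samples and the P-value would usually differ slightly.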

Hartigan and Hartigan (1985, p.80), and also Hartigan (1985, p.321), give a table of percent points (1, 5, 10, 50, 90, 95, 99, 99.5, 99.9%) for the null distribution of the dip statistic for sample sizes 4(1)10 15 20 30 50 100 200 based on 9999 simulations. The R package diptest includes a larger table based on 1000001 simulations.

For sample sizes 1 to 3 or samples of identical values, the dip is returned as 0.

Note that this procedure is independent of any density estimation procedure.

As a side-effect of the calculation, diptest returns low and high ends of the modal interval for the best-fitting unimodal distribution corresponding to the data. The mean of values in that interval is also reported, without warranty of any merits as a summary.

diptest uses Mata for its innermost calculations, so Stata 9 or later is required.

Options

by() specifies that calculations are to be carried out separately for the distinct groups defined by byvarlist. The variable(s) in byvarlist may be numeric or string.

reps() specifies the number of repeated samples, each of the same size as the data, to be drawn from a uniform distribution when calculating P-values. The default is 10000. Note that reps(0) suppresses P-value calculation.

list_options are options of list other than noobs and subvarname. They may be specified to tune the display of results.

Examples

. diptest quality
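Some further sketches of the options may help. They are written as do-file lines and assume that diptest is installed; the auto data, the variables mpg and foreign, and the seed are illustrative choices, not part of the original examples. sep(0) is used as one example of a list option.

```stata
* Sketch under stated assumptions: auto data, mpg and foreign are illustrative.
sysuse auto, clear
set seed 2803

* Dip statistic and P-value separately for each category of foreign.
diptest mpg, by(foreign)

* Fewer repetitions than the default of 10000: quicker, but the
* simulated P-values are less precise.
diptest mpg, by(foreign) reps(1000)

* Dip statistic only; reps(0) suppresses P-value calculation.
diptest mpg, reps(0)

* Pass an option through to list: sep(0) suppresses separator lines.
diptest mpg, by(foreign) reps(0) sep(0)
```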

Saved results

    r(n_1)       number of observations for first group
    r(dip_1)     dip for first group
    r(low_1)     low end of modal interval for first group
    r(high_1)    high end of modal interval for first group
    r(mean_1)    mean of modal interval for first group
    r(P_1)       P-value for first group

and so on (suffix _2 indicating the second group, etc.)
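The saved results can be picked up after a call, as in the sketch below. It assumes that diptest is installed and uses the auto data with by(foreign) purely for illustration, so that results with suffixes _1 and _2 are both available.

```stata
* Sketch under stated assumptions: auto data and by(foreign) are illustrative.
sysuse auto, clear
set seed 2803
diptest mpg, by(foreign)

* Show everything left behind in r().
return list

* Use individual results for the first and second groups.
display "group 1: dip = " r(dip_1) ", P-value = " r(P_1)
display "group 2: dip = " r(dip_2) ", P-value = " r(P_2)
display "group 1 modal interval: " r(low_1) " to " r(high_1)
```

As with other r-class results, these are overwritten by the next r-class command, so copy anything needed later into scalars or locals straight away.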

Author

Nicholas J. Cox, Durham University, UK
n.j.cox@durham.ac.uk

Acknowledgments

Mata code is based on Fortran code in Hartigan (1985) and C code in Mächler (2003).

References

Cheng, M-Y. and P. Hall. 1998. Calibrating the excess mass and dip tests of modality. Journal, Royal Statistical Society, Series B 60: 579-589.

Hartigan, J.A. and P.M. Hartigan. 1985. The dip test of unimodality. Annals of Statistics 13: 70-84.

Hartigan, P.M. 1985. Algorithm AS 217: Computation of the dip statistic to test for unimodality. Applied Statistics 34: 320-325.

Mächler, M. 2003. diptest 0.25-1. http://www.r-project.org/

Also see

help for kdensity, modes (if installed), hsmode (if installed)