{smcl}
{* 2 Feb 2009}{...}
{hline}
{cmd:help diptest}
{hline}

{title:Dip statistic to test for unimodality}

{title:Syntax}

{p 8 18 2}
{cmd:diptest}
{it:varname}
{ifin}
[
{cmd:,}
{cmd:by(}{it:byvarlist}{cmd:)}
{cmdab:r:eps(}{it:#}{cmd:)}
{it:list_options}
]

{title:Description}

{p 4 4 2}
{cmd:diptest} calculates and displays the dip statistic to test for
unimodality. This statistic is the maximum difference between the
empirical distribution function and the unimodal distribution function
that minimises that maximum difference. The dip thus measures departure
of a sample from unimodality and was proposed by Hartigan and Hartigan
(1985) as a test statistic for unimodality. Hartigan (1985) published
Fortran code. M{c a:}chler (2003) published corrected C code as part of
an R package diptest.

{p 4 4 2}
The reference distribution for calculating the dip statistic is the
uniform, as a worst case unimodal distribution. P-values are calculated
by comparing the dip statistic obtained with those for repeated samples
of the same size from a uniform distribution. If the true distribution
is not uniform, other methods may be more appropriate or more powerful.
For further discussion, see Hartigan and Hartigan (1985) and Cheng and
Hall (1998).

{title:Remarks}

{p 4 4 2}
For reproducibility of P-values, {help generate:set seed} beforehand.

{p 4 4 2}
Hartigan and Hartigan (1985, p.80), and also Hartigan (1985, p.321),
give a table of percent points (1, 5, 10, 50, 90, 95, 99, 99.5, 99.9%)
for the null distribution of the dip statistic for sample sizes
4(1)10 15 20 30 50 100 200 based on 9999 simulations. The R package
diptest includes a larger table based on 1000001 simulations.

{p 4 4 2}
For sample sizes 1 to 3 or samples of identical values, the dip is
returned as 0.

{p 4 4 2}
Note that this procedure is independent of any density estimation
procedure.

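{p 4 4 2}
To illustrate the remark on reproducibility above, here is a minimal
sketch of a call preceded by {cmd:set seed}. It assumes a numeric
variable named {cmd:quality}, as in the example in this help; the seed
value is arbitrary and any fixed value would serve.

{p 4 8 2}{cmd:. set seed 12345}{break}
{cmd:. diptest quality}

{p 4 4 2}
Issuing both commands again on the same data should then give the same
P-value.
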
{p 4 4 2}
As a side-effect of the calculation, {cmd:diptest} returns low and high
ends of the modal interval for the best-fitting unimodal distribution
corresponding to the data. The mean of values in that interval is also
reported, without warranty of any merits as a summary.

{p 4 4 2}
{cmd:diptest} uses Mata for its innermost calculations. Thus Stata 9 or
later is required.

{title:Options}

{p 4 8 2}
{cmd:by()} specifies that calculations are to be carried out separately
for the distinct groups defined by {it:byvarlist}. The variable(s) in
{it:byvarlist} may be numeric or string.

{p 4 8 2}
{cmd:reps()} specifies the number of samples, each of the same size as
the data, to be drawn from a uniform distribution. The default is 10000.
Note that {cmd:reps(0)} suppresses P-value calculation.

{p 4 8 2}
{it:list_options} are options of {help list} other than {cmd:noobs} and
{cmd:subvarname}. They may be specified to tune the display of results.

{title:Examples}

{p 4 8 2}{cmd:. diptest quality}

{title:Saved results}

    r(n_1)      number of observations for first group
    r(dip_1)    dip for first group
    r(low_1)    low end of modal interval for first group
    r(high_1)   high end of modal interval for first group
    r(mean_1)   mean of modal interval for first group
    r(P_1)      P-value for first group
    etc. (suffixes _2 etc. indicating second group etc.)

{title:Author}

{p 4 4 2}Nicholas J. Cox, Durham University, UK{break}
n.j.cox@durham.ac.uk

{title:Acknowledgments}

{p 4 8 2}Mata code is based on Fortran code in Hartigan (1985) and
C code in M{c a:}chler (2003).

{title:References}

{p 4 8 2}
Cheng, M-Y. and P. Hall. 1998.
Calibrating the excess mass and dip tests of modality.
{it:Journal, Royal Statistical Society, Series B} 60: 579{c -}589.

{p 4 8 2}
Hartigan, J.A. and P.M. Hartigan. 1985.
The dip test of unimodality.
{it:Annals of Statistics} 13: 70{c -}84.

{p 4 8 2}
Hartigan, P.M. 1985.
Algorithm AS 217: Computation of the dip statistic to test for unimodality.
{it:Applied Statistics} 34: 320{c -}325.

{p 4 8 2}
M{c a:}chler, M. 2003.
diptest 0.25-1.
{browse "http://www.r-project.org/":http://www.r-project.org/}

{title:Also see}

{p 4 13 2}help for {help kdensity}, {help modes} (if installed), {help hsmode} (if installed)