-------------------------------------------------------------------------------
help for hdquantile
-------------------------------------------------------------------------------

Harrell-Davis estimator of quantiles

hdquantile varlist [if exp] [in range] , generate(newvarlist) [ a(#) by(byvarlist) ]

hdquantile varlist [if exp] [in range] , p(numlist) [ matname(matrix_name) matrix_list_options ]

hdquantile varname [if exp] [in range] , p(numlist) [ by(byvarlist) matname(matrix_name) matrix_list_options ]

Description

hdquantile estimates quantiles using the method of Harrell and Davis (1982). There are two main syntaxes, depending on which of the generate() and p() options is specified.

If the option generate() is specified, as many quantiles are estimated as there are observations with non-missing values for all the variables specified. Given n order statistics y_(i) such that y_(1) <= ... <= y_(n), quantiles are calculated at the plotting positions (i - a)/(n - 2a + 1), where a may be tuned using the a() option. By default a = 0.5. The by() option is permissible with this syntax.
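As an illustration of the plotting-position formula only (not of hdquantile's internal code), the following sketch computes (i - a)/(n - 2a + 1) for a toy dataset of n = 5 observations and three values of a. The variable names pp_half, pp_zero and pp_third are made up for this example.

```stata
* Sketch: plotting positions (i - a)/(n - 2a + 1) for n = 5.
* Variable names are hypothetical; _n is the rank i, _N is n.
clear
set obs 5
generate double pp_half  = (_n - 0.5) / (_N - 2*0.5 + 1)   // a = 0.5
generate double pp_zero  = (_n - 0)   / (_N - 2*0   + 1)   // a = 0
generate double pp_third = (_n - 1/3) / (_N - 2/3   + 1)   // a = 1/3
list
```

With a = 0.5 the positions reduce to (i - 0.5)/5, that is 0.1, 0.3, 0.5, 0.7 and 0.9.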

If the option p() is specified, selected quantiles for the percent points in p() are estimated and displayed (and optionally saved) as a matrix. This matrix may be either for one or more variables or for one variable grouped according to the by() option.

Remarks

The quantile for cumulative proportion p is estimated as a weighted mean of all order statistics y_(i) with weights

ibeta((n + 1)p, (n + 1)(1 - p), i/n) - ibeta((n + 1)p, (n + 1)(1 - p), (i - 1)/n)

See ibeta().
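The weighted mean can be reproduced by hand for a single p. What follows is a minimal sketch under stated assumptions, not the code used inside hdquantile: the five data values, the variable names y, w and wy, and the choice p = 0.5 are all invented for illustration. Only the built-in function ibeta() and standard commands are used.

```stata
* Sketch: Harrell-Davis weights and estimate for p = 0.5 on toy data.
* Data values and variable names are hypothetical.
clear
input y
2
4
5
7
11
end
sort y                          // order statistics y_(1) <= ... <= y_(n)
local n = _N
local p = 0.5
* weight for y_(i): difference of ibeta() at i/n and (i - 1)/n
generate double w = ibeta((`n' + 1)*`p', (`n' + 1)*(1 - `p'), _n/`n') ///
    - ibeta((`n' + 1)*`p', (`n' + 1)*(1 - `p'), (_n - 1)/`n')
generate double wy = w * y
summarize w, meanonly
display "sum of weights = " r(sum)
summarize wy, meanonly
display "estimated quantile = " r(sum)
```

The weights sum to 1 because ibeta() runs from 0 at 0 to 1 at 1, so the estimate is a proper weighted mean of the order statistics.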

Options

Either generate() or p() must be specified.

generate() specifies the names of as many new variables as there are variables in varlist to hold estimates of quantiles.

a() specifies a in the formula for plotting position. The default is a = 0.5, giving (i - 0.5) / n. Other choices include a = 0, giving i / (n + 1), and a = 1/3, giving (i - 1/3) / (n + 1/3). This is relevant only with generate().

p() specifies one or more integers between 1 and 99 indicating percent points (plotting positions) for which quantiles should be estimated. Thus p(25(25)75) specifies estimation for the 25, 50 and 75 percent points, or for plotting positions 0.25, 0.50, 0.75.

matname() specifies the name of a matrix in which to save the results of calculations. This is relevant only with p().

matrix_list_options are options of matrix list tuning the display of the matrix of quantiles. This is relevant only with p().

by() specifies one or more variables defining distinct groups for which quantiles should be estimated. Under by() the group size n and the ranking from 1 to n are determined within each group.
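To make the within-group ranking concrete, here is a small sketch that computes plotting positions separately by group, assuming a = 0.5. It illustrates the idea only and is not taken from hdquantile's code; the toy data and the variable name pp are hypothetical.

```stata
* Sketch: group size n and rank i determined within each group.
* Toy data and the variable name pp are hypothetical.
clear
input grade length
1 10
1 12
1 15
2 20
2 22
end
local a = 0.5
* within each grade, sorted by length: _n is the rank, _N the group size
bysort grade (length): generate double pp = (_n - `a') / (_N - 2*`a' + 1)
list grade length pp, sepby(grade)
```

Here the group with three observations gets positions 1/6, 1/2 and 5/6, while the group with two observations gets 0.25 and 0.75.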

Examples

. hdquantile length width height, gen(Qlength Qwidth Qheight)

. hdquantile length, by(grade) gen(Qlength)

. hdquantile length, p(10 25 50 75 90)

. hdquantile length, p(10 25 50 75 90) m(Qmatrix)

Author

Nicholas J. Cox, University of Durham, U.K. n.j.cox@durham.ac.uk

References

Harrell, F.E. and C.E. Davis. 1982. A new distribution-free quantile estimator. Biometrika 69: 635-640.

Sheather, S.J. and J.S. Marron. 1990. Kernel quantile estimators. Journal, American Statistical Association 85: 410-416.

Dielman, T.E., C. Lowry and R. Pfaffenberger. 1994. A comparison of quantile estimators. Communications in Statistics - Simulation and Computation 23: 355-371.

Hutson, A.D. and M.D. Ernst. 2000. The exact bootstrap mean and variance of an L-estimator. Journal, Royal Statistical Society B 62: 89-94.

Ernst, M.D. and A.D. Hutson. 2003. Utilizing a quantile function approach to obtain exact bootstrap solutions. Statistical Science 18: 231-240.

Also see

Online: help for qplot (if installed)