-------------------------------------------------------------------------------
help iquantile
-------------------------------------------------------------------------------

Interpolated quantiles

Syntax

    iquantile varlist [if] [in] [weight] [, by(byvarlist) format(format) p(numlist) list_options]

fweights and aweights are allowed.

Description

iquantile calculates and displays quantiles estimated by linear interpolation in the mid-distribution function. The user may specify one or more numeric variables, one or more grouping variables and one or more quantiles.

Remarks

By quantiles here are meant those summaries defined by the fact that some percent of a batch of values is lower. Thus the median (50%) and the quartiles (25% and 75%) are examples. Most commands in Stata that calculate such summaries select particular sample values or at most average two sample values. That is often sufficient for the purpose intended.

iquantile offers an alternative, which is perhaps most useful when the number of distinct values is small. For example, although the variable in question may be measured coarsely, say on an integer scale, and many ties may be observed, it may be hoped or imagined that a property on a continuous scale lies beneath. Note that iquantile performs no white magic, just elementary linear interpolation.

The cumulative probability is here defined as

    SUM counts of values below + (1/2) count of this value
    ------------------------------------------------------.
                   SUM counts of all values

With terminology from Tukey (1977, 496-497), this could be called a `split fraction below'. It is also a `ridit' as defined by Bross (1958): see also Fleiss et al. (2003, 198-205) or Flora (1988). Yet again, it is also the mid-distribution function of Parzen (1993, 3295) and the grade function of Haberman (1996, 240-241). Parzen's term appears best for the purposes of this command.

The numerator is a `split count'. Using this numerator, rather than

    SUM counts of values below

or

    SUM counts of values below + count of this value,

treats distributions symmetrically. For applications to plotting ordinal categorical data, see Cox (2004).
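
The three candidate definitions above can be compared directly in Mata. What follows is only a sketch, not code taken from iquantile: the frequencies are the ones used in the worked example further on, and the names cum_below, cum_mid, cum_upto, idx, frev and mid_rev are made up for illustration. It also checks the symmetry claim: if the ordering of the values is reversed, the split fraction becomes one minus its former value.

```stata
mata:
f = (2, 9, 8, 8)                          // illustrative frequencies of 4 ordered values
n = sum(f)
cum_below = (runningsum(f) :- f) / n      // counts of values below only
cum_mid   = (runningsum(f) :- f/2) / n    // split count: below + half of this value
cum_upto  = runningsum(f) / n             // below + all of this value
cum_below \ cum_mid \ cum_upto            // one row per definition

// Symmetry check: reverse the ordering, recompute, and reverse back
idx     = cols(f)..1
frev    = f[idx]
mid_rev = (runningsum(frev) :- frev/2) / n
cum_mid + mid_rev[idx]                    // every entry is 1
end
```

The same check applied to cum_below or cum_upto would not give a row of ones, which is the sense in which only the split count treats the two tails alike.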

The technique used in iquantile is illustrated by a worked example using Mata calculator-style. We first enter the data as values and frequencies:

    : y = 2, 3, 4, 5

    : f = 2, 9, 8, 8

Then we can work out the cumulative frequencies:

    : runningsum(f)
            1    2    3    4
        +---------------------+
      1 |   2   11   19   27  |
        +---------------------+

Subtract half the frequencies and get the cumulative proportions, symmetrically considered, i.e. the mid-distribution function:

    : runningsum(f) :- f/2
             1     2     3     4
        +-------------------------+
      1 |    1   6.5    15    23  |
        +-------------------------+

    : (runningsum(f) :- f/2) / 27
                     1             2             3             4
        +---------------------------------------------------------+
      1 |   .037037037   .2407407407   .5555555556   .8518518519  |
        +---------------------------------------------------------+

    : cup = (runningsum(f) :- f/2) / 27

To get the median, we need to interpolate between the 2nd and 3rd values of y.

    : y[2] + (y[3] - y[2]) * (0.5 - cup[2]) / (cup[3] - cup[2])
      3.823529412
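
The calculator-style steps above can be wrapped in a small Mata function that finds the interpolation interval itself. This is a minimal sketch under stated assumptions, not the code inside iquantile: the name iq_sketch is hypothetical, y is assumed to hold the distinct values in increasing order with positive frequencies f, p is a proportion between 0 and 1, and reusing the first or last interval when p lies beyond the observed mid-distribution function is an assumption about how extrapolation might be done.

```stata
mata:
// Hypothetical helper (not part of iquantile): interpolated p-quantile
// from distinct sorted values y with positive frequencies f
real scalar iq_sketch(real rowvector y, real rowvector f, real scalar p)
{
    real rowvector cup
    real scalar    j, k

    k = cols(y)
    if (k == 1) return(y[1])                // single distinct value
    cup = (runningsum(f) :- f/2) / sum(f)   // mid-distribution function
    // choose j so that p is interpolated between cup[j] and cup[j+1];
    // the first or last interval is reused if p lies outside (assumption)
    j = 1
    while (j < k - 1 & cup[j+1] < p) j++
    return(y[j] + (y[j+1] - y[j]) * (p - cup[j]) / (cup[j+1] - cup[j]))
}

y = (2, 3, 4, 5)
f = (2, 9, 8, 8)
iq_sketch(y, f, 0.5)                          // the median, 3.823529412 as above
iq_sketch(y, f, 0.25), iq_sketch(y, f, 0.75)  // quartiles by the same rule
end
```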

iquantile uses list to show results.
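
The worked example can also be passed through the command itself, using the frequencies as frequency weights. The following do-file sketch assumes that iquantile is installed; the variable names y and f simply mirror the Mata example, and the format is an arbitrary choice. If the worked example describes what the command does, the listed median should be about 3.8235.

```stata
clear
input y f
2 2
3 9
4 8
5 8
end
iquantile y [fweight=f], format(%9.4f)
```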

iquantile issues a warning if any quantile was calculated by extrapolation, i.e. it lies in one or other tail of the distribution beyond the observed mid-distribution function. Such results should be treated with extreme caution.

If the data consist of a single distinct value, then exactly that value is always returned as a quantile.
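
To see which quantiles would fall into that category for a given set of data, the requested proportions can be compared with the smallest and largest values of the mid-distribution function. The sketch below reuses the frequencies of the worked example; the names p and epolate are chosen for illustration, and the test is an assumption based on the description above, not the command's own code.

```stata
mata:
f   = (2, 9, 8, 8)
cup = (runningsum(f) :- f/2) / sum(f)     // .037..., .2407..., .5555..., .8518...
p   = (1..99) / 100                       // every percent allowed by p()
epolate = (p :< cup[1]) :| (p :> cup[cols(cup)])
sum(epolate)                              // 17 of the 99 percents lie outside
select((1..99), epolate)                  // namely 1 to 3 and 86 to 99
end
```

With only four distinct values, then, roughly one sixth of the possible percent points could be reached only by extrapolation.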

iquantile uses Mata for its innermost calculations. Thus Stata 9 or later is required.

Options

by() specifies that calculations are to be carried out separately for the distinct groups defined by byvarlist. The variable(s) in byvarlist may be numeric or string.

format() specifies a numeric format to be used to display the quantiles. This option has no lasting effect.

p() specifies a numlist of integers between 1 and 99 to indicate the p% quantiles. If p() is not specified, it defaults to 50, i.e. the 50% point or median is calculated. p(25(25)75) specifies the median and quartiles.

list_options are options of list other than noobs and subvarname. They may be specified to tune the display of quantiles.

Examples

    . iquantile mpg
    . iquantile mpg, p(25 50 70)
    . iquantile mpg, p(25 50 70) format(%2.1f)
    . iquantile mpg, p(25 50 70) format(%2.1f) by(rep78)
    . iquantile mpg weight price
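
The variable names in these examples match the auto data shipped with Stata. As a further sketch, assuming that dataset and an installed iquantile, the following do-file lines set interpolated quantiles beside the results of official commands for rep78, a variable with only a few distinct values. The choice of rep78 and of the display format is purely illustrative.

```stata
sysuse auto, clear                  // auto data shipped with Stata
tabulate rep78                      // few distinct values, many ties
summarize rep78, detail             // percentiles taken from sample values
centile rep78, centile(25 50 75)
iquantile rep78, p(25 50 75) format(%4.2f)
```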

Saved results

Saved results are best explained by example. After iquantile mpg, two results are saved, r(mpg_50_1) and r(mpg_50_1_epolate). The elements of the name for both are first, the variable name (if necessary, abbreviated to 16 characters); second, the percent defining the quantile; third, the number of the group in question in the observations processed (here, the first of one). The extra flag epolate indicates whether extrapolation was needed (1 for true, 0 for false).
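
A short do-file sketch of how these results might be inspected follows, assuming the auto data and an installed iquantile. The names in the second half are inferred from the naming scheme just described and have not been checked against the command's output; return list shows what is actually saved.

```stata
sysuse auto, clear
iquantile mpg
return list                         // expect r(mpg_50_1) and r(mpg_50_1_epolate)
display r(mpg_50_1)
display r(mpg_50_1_epolate)         // 1 if extrapolated, 0 otherwise

// With groups and several quantiles the same pattern would give names
// such as r(mpg_25_2): the 25% point of mpg in the second group (assumed)
iquantile mpg, p(25 75) by(foreign)
return list
```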

Author

Nicholas J. Cox, Durham University, UK
n.j.cox@durham.ac.uk

Acknowledgments

This command grew out of a thread on Statalist started by Taggert J. Brooks. See http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0901/date/article-689.html

References

Bross, I. D. J. 1958. How to use ridit analysis. Biometrics 14: 38-58.

Cox, N. J. 2004. Speaking Stata: Graphing categorical and compositional data. Stata Journal 4(2): 190-215. See Section 5. http://www.stata-journal.com/sjpdf.html?articlenum=gr0004

Fleiss, J. L., B. Levin, and M. C. Paik. 2003. Statistical Methods for Rates and Proportions. Hoboken, NJ: Wiley.

Flora, J. D. 1988. Ridit analysis. In Encyclopedia of Statistical Sciences, ed. S. Kotz and N. L. Johnson, (8) 136-139. New York: Wiley.

Haberman, S. J. 1996. Advanced Statistics Volume I: Description of Populations. New York: Springer.

Parzen, E. 1993. Change PP plot and continuous sample quantile function. Communications in Statistics-Theory and Methods 22: 3287-3304.

Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Also see

help for summarize, centile, pctile, tabstat, hdquantile (if installed)