```-------------------------------------------------------------------------------
help iquantile
-------------------------------------------------------------------------------

Interpolated quantiles

Syntax

iquantile varlist [if] [in] [weight] [ , by(byvarlist) format(format)
p(numlist) list_options ]

fweights and aweights are allowed.

Description

iquantile calculates and displays quantiles estimated by linear
interpolation in the mid-distribution function. The user may specify one
or more numeric variables, one or more grouping variables and one or more
quantiles.

Remarks

By quantiles here are meant those summaries defined by the fact that some
percent of a batch of values is lower.  Thus the median (50%) and the
quartiles (25% and 75%) are examples. Most commands in Stata that
calculate such summaries select particular sample values or at most
average two sample values. That is often sufficient for the purpose
intended. iquantile offers an alternative, which is perhaps most useful
when the number of distinct values is small. For example, although the
variable in question may be measured coarsely, say on an integer scale,
and many ties may be observed, it may be hoped or imagined that a
property on a continuous scale lies beneath. Note that iquantile performs
no white magic, just elementary linear interpolation.

The cumulative probability is here defined as

    SUM counts of values below + (1/2) count of this value
    ------------------------------------------------------.
                   SUM counts of all values

With terminology from Tukey (1977, 496-497), this could be called a
`split fraction below'. It is also a `ridit' as defined by Bross (1958):
see also Fleiss et al. (2003, 198-205) or Flora (1988).  Yet again, it is
also the mid-distribution function of Parzen (1993, 3295) and the grade
function of Haberman (1996, 240-241). Parzen's term appears best for the
purposes of this command. The numerator is a `split count'. Using this
numerator, rather than

SUM counts of values below

or

SUM counts of values below + count of this value,

treats distributions symmetrically. For applications to plotting ordinal
categorical data, see Cox (2004).

The technique used in iquantile is illustrated by a worked example using
Mata in calculator style. We first enter the data as values and
frequencies:

: y = (2, 3, 4, 5)

: f = (2, 9, 8, 8)

Then we can work out the cumulative frequencies:

: runningsum(f)
        1    2    3    4
    +---------------------+
  1 |   2   11   19   27  |
    +---------------------+

Subtract half the frequencies to get the split counts, then divide by the
total frequency (27) to get the cumulative proportions, symmetrically
considered, i.e. the mid-distribution function:

: runningsum(f) :- f/2
         1     2     3     4
    +-------------------------+
  1 |    1   6.5    15    23  |
    +-------------------------+

: (runningsum(f) :- f/2) / 27
                 1             2             3             4
    +---------------------------------------------------------+
  1 |   .037037037   .2407407407   .5555555556   .8518518519  |
    +---------------------------------------------------------+

: cup = (runningsum(f) :- f/2) / 27

To get the median, we need to interpolate between the 2nd and 3rd values
of y.

: y[2] + (y[3] - y[2]) * (0.5 - cup[2]) / (cup[3] - cup[2])
  3.823529412
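
The same interpolation can be repeated for other quantiles. As a sketch
only (worked by hand in Mata, not output from iquantile), the quartiles
for these data follow from the same formula. The 25% point lies between
the 2nd and 3rd values of y, and the 75% point between the 3rd and 4th:

: y = (2, 3, 4, 5)

: f = (2, 9, 8, 8)

: cup = (runningsum(f) :- f/2) / sum(f)

: y[2] + (y[3] - y[2]) * (0.25 - cup[2]) / (cup[3] - cup[2])
  3.029411765

: y[3] + (y[4] - y[3]) * (0.75 - cup[3]) / (cup[4] - cup[3])
  4.65625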

iquantile uses list to show results.

iquantile issues a warning if any quantile was calculated by
extrapolation, i.e. it lies in one or other tail of the distribution
beyond the observed mid-distribution function. Such results should be
treated with extreme caution.
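
As a rough illustration with the data of the worked example, a requested
cumulative probability needs extrapolation when it falls outside the
range of the observed mid-distribution function. The check below is a
sketch of that condition only; it is an assumption about the logic, not
the code that iquantile itself runs:

: f = (2, 9, 8, 8)

: cup = (runningsum(f) :- f/2) / sum(f)

: (0.02 < min(cup)) | (0.02 > max(cup))
  1

: (0.50 < min(cup)) | (0.50 > max(cup))
  0

Here the 2% point lies below the smallest value of cup (about .037), so
it could be obtained only by extrapolation, whereas the median could be
interpolated.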

If the data consist of a single distinct value, then exactly that value
is always returned as a quantile.

iquantile uses Mata for its innermost calculations.  Thus Stata 9 or
later is required.

Options

by() specifies that calculations are to be carried out separately for the
distinct groups defined by byvarlist. The variable(s) in byvarlist
may be numeric or string.

format() specifies a numeric format to be used to display the quantiles.
This option has no lasting effect.

p() specifies a numlist of integers between 1 and 99 to indicate the p%
quantiles. If p() is not specified, it defaults to 50, i.e. the 50%
point or median is calculated.  p(25(25)75) specifies the median and
quartiles.

list_options are options of list other than noobs and subvarname. They
may be specified to tune the display of quantiles.

Examples

. iquantile mpg
. iquantile mpg, p(25 50 70)
. iquantile mpg, p(25 50 70) format(%2.1f)
. iquantile mpg, p(25 50 70) format(%2.1f) by(rep78)
. iquantile mpg weight price

Saved results

Saved results are best explained by example. After iquantile mpg, two
results are saved, r(mpg_50_1) and r(mpg_50_1_epolate).  The elements of
both names are, first, the variable name (if necessary, abbreviated to 16
characters); second, the percent defining the quantile; and third, the
number of the group in question in the observations processed (here, the
first of one). The extra flag epolate indicates whether extrapolation was
needed (1 for true, 0 for false).
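
As a sketch, the saved results could be inspected as follows, assuming
the auto dataset shipped with Stata and the result names given above:

. sysuse auto
. iquantile mpg
. return list
. display r(mpg_50_1)
. display r(mpg_50_1_epolate)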

Author

Nicholas J. Cox, Durham University, UK
n.j.cox@durham.ac.uk

Acknowledgments

This command grew out of a thread on Statalist started by Taggert J.
Brooks. See
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0901/date/article-689.html

References

Bross, I. D. J. 1958. How to use ridit analysis. Biometrics 14: 38-58.

Cox, N. J. 2004. Speaking Stata: Graphing categorical and compositional
data. Stata Journal 4(2): 190-215.  See Section 5.
http://www.stata-journal.com/sjpdf.html?articlenum=gr0004

Fleiss, J. L., B. Levin, and M. C. Paik. 2003.  Statistical Methods for
Rates and Proportions.  Hoboken, NJ: Wiley.

Flora, J. D. 1988. Ridit analysis. In Encyclopedia of Statistical
Sciences, ed. S. Kotz and N. L. Johnson, vol. 8, 136-139.  New York:
Wiley.

Haberman, S. J. 1996.  Advanced Statistics Volume I: Description of
Populations.  New York: Springer.

Parzen, E. 1993. Change PP plot and continuous sample quantile function.
Communications in Statistics - Theory and Methods 22: 3287-3304.

Tukey, J. W. 1977. Exploratory Data Analysis.  Reading, MA: