Title
mm_quantile() -- Empirical quantile function
Syntax
real matrix mm_quantile(X [, w, P, altdef])
real rowvector mm_median(X [, w])
real rowvector mm_iqrange(X [, w, altdef])
where
X: real matrix containing data (rows are observations, columns variables)
w: real colvector containing weights
P: real matrix containing probabilities (default is P = (0, .25, .5, .75, 1)')
altdef: real scalar; altdef!=0 causes an interpolation formula to be used
Description
mm_quantile() applies to P the inverse empirical cumulative distribution function of X (the so-called quantile function). That is, mm_quantile() returns the quantiles of X corresponding to the probabilities provided in P. For example,
mm_quantile(x, 1, 0.25)
returns the first quartile (i.e. the 0.25-quantile) of x.
Note that mm_quantile() works column by column. If P has one column and X has several columns, then the quantiles corresponding to P are computed for each column of X. If X has one column and P has several columns, then the quantile function of X is applied to each column of P. If X and P both have several columns, then the number of columns is required to be the same and quantiles are computed column by column.
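The column-by-column rule can be illustrated with a minimal sketch. The matrices X, x, P, and P2 are made-up data, and the sketch assumes that moremata is installed in the version documented here. The values in the comments were worked out by hand from the default quantile definition described in the remarks.

```mata
mata:
// hypothetical data: 4 observations on 2 variables
X = (1, 10 \ 2, 20 \ 3, 30 \ 4, 40)
P = (0.25 \ 0.5 \ 0.75)

// P has one column, X has two: quantiles are computed for each column of X
Q1 = mm_quantile(X, 1, P)
Q1      // 3 x 2; by hand: (1.5, 15 \ 2.5, 25 \ 3.5, 35)

// X has one column, P has two: the quantile function of x is applied
// to each column of P
x  = X[,1]
P2 = (0.25, 0.5 \ 0.75, 1)
Q2 = mm_quantile(x, 1, P2)
Q2      // 2 x 2; by hand: (1.5, 2.5 \ 3.5, 4)
end
```

In both calls the result has as many rows as P and as many columns as the wider of the two arguments.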
mm_median() and mm_iqrange() are wrappers for mm_quantile() and return the median (the 0.5-quantile) and the inter-quartile range (IQR = 0.75-quantile - 0.25-quantile).
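The relation between the wrappers and mm_quantile() can be checked with a short sketch. The data vector x is hypothetical, and the sketch assumes that moremata is installed.

```mata
mata:
x = (2 \ 4 \ 6 \ 8 \ 10)                    // hypothetical data
q = mm_quantile(x, 1, (0.25 \ 0.5 \ 0.75))  // quartiles of x

med = mm_median(x, 1)
iqr = mm_iqrange(x, 1)

med, q[2]            // median next to the 0.5-quantile
iqr, q[3] - q[1]     // IQR next to the difference of the quartiles
end
```

Each displayed row should contain the same value twice.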
w specifies weights associated with the observations (rows) in X. Omit w, or specify w as 1 to obtain unweighted results. Note that the arguments in the above functions must not contain missing values.
Specifying altdef!=0 in mm_quantile() or mm_iqrange() causes an interpolation formula to be used for calculating the quantiles (see remarks below).
Remarks
Example:
    : x = invnormal(uniform(10000,1))

    : mm_quantile(x, 1, (0.25 \ 0.5 \ 0.75))
                       1
        +----------------+
      1 |  -.6673752219  |
      2 |  -.0021958246  |
      3 |   .6880046299  |
        +----------------+

    : mm_median(x, 1), mm_iqrange(x, 1)
                      1              2
        +-------------------------------+
      1 |  -.0021958246    1.355379852  |
        +-------------------------------+
The default for mm_quantile() is to apply a discontinuous quantile function using averages where the empirical distribution function is flat (this corresponds to Definition 2 in Hyndman and Fan 1996). The same method is used by summarize and is the default method in _pctile. However, if altdef!=0 is specified, a piecewise linear continuous function according to Definition 6 in Hyndman and Fan (1996) is applied. This is also used by centile and by _pctile with the altdef option.
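The difference between the two definitions shows up already in a very small sample. The following sketch uses a hypothetical data vector x and assumes that moremata is installed in the version documented here. The hand calculations in the comments follow the usual reading of Definitions 2 and 6.

```mata
mata:
x = (1 \ 2 \ 3 \ 4)             // hypothetical data, n = 4
P = (0.25 \ 0.5 \ 0.75)

d2 = mm_quantile(x, 1, P)       // default: averages where the ECDF is flat
d6 = mm_quantile(x, 1, P, 1)    // altdef!=0: piecewise linear interpolation

// By hand for p = 0.25:
//   default: n*p = 1 is an integer, so the result is (x[1] + x[2])/2 = 1.5
//   altdef:  (n+1)*p = 1.25, so the result is x[1] + 0.25*(x[2] - x[1]) = 1.25
d2, d6      // by hand: (1.5, 1.25 \ 2.5, 2.5 \ 3.5, 3.75)
end
```

The two methods agree at the median here and differ at the outer quartiles, where the interpolation formula pulls the result toward the tails.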
Unlike centile and _pctile, mm_quantile() allows using weights with the interpolation method (altdef!=0). This functionality, however, is only intended to be used with (integer) frequency weights.
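One way to read the frequency-weight restriction: an integer weight k should act like k copies of the observation. The sketch below compares a weighted call with an unweighted call on expanded data. All vectors are hypothetical, moremata is assumed to be installed, and the agreement of the two results is an expectation drawn from the description above, not a documented guarantee.

```mata
mata:
x  = (1 \ 2 \ 3)                // hypothetical distinct values
fw = (2 \ 1 \ 1)                // hypothetical integer frequency weights
xx = (1 \ 1 \ 2 \ 3)            // same data, each value repeated fw times
P  = (0.25 \ 0.5 \ 0.75)

qw = mm_quantile(x, fw, P, 1)   // weighted, interpolation method
qx = mm_quantile(xx, 1, P, 1)   // unweighted, expanded data

qw, qx                          // the two columns are expected to agree
end
```

With non-integer weights no such expanded data set exists, which is one reason the interpolated results may then be hard to interpret.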
Conformability
    mm_quantile(X, w, P, altdef):
             X:  n x 1 or n x k
             w:  n x 1 or 1 x 1
             P:  r x 1 or r x k
        altdef:  1 x 1
        result:  r x 1 or r x k

    mm_median(X, w):
             X:  n x k
             w:  n x 1 or 1 x 1
        result:  1 x k

    mm_iqrange(X, w, altdef):
             X:  n x k
             w:  n x 1 or 1 x 1
        altdef:  1 x 1
        result:  1 x k
Diagnostics
mm_quantile(): p < 0 is treated as p = 0, p > 1 as p = 1.
Weights should not be negative.
Results may be misleading if altdef!=0 is used with non-integer weights.
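The treatment of out-of-range probabilities can be checked with a short sketch. The data vector x is hypothetical, and moremata is assumed to be installed.

```mata
mata:
x = (1 \ 2 \ 3 \ 4)                       // hypothetical data

qlow  = mm_quantile(x, 1, (-0.5 \ 0))     // p < 0 next to p = 0
qhigh = mm_quantile(x, 1, (1 \ 1.5))      // p = 1 next to p > 1

qlow'       // both entries should be equal
qhigh'      // both entries should be equal
end
```

Out-of-range probabilities are therefore clipped silently and do not raise an error, so P should be validated beforehand if values outside [0, 1] indicate a mistake.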
Source code
mm_quantile.mata, mm_median.mata, mm_iqrange.mata
References
Hyndman, R. J., Fan, Y. (1996). Sample Quantiles in Statistical Packages. The American Statistician 50:361-365.
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see
Online: help for mm_ecdf(), summarize, pctile, centile, invcdf (if installed), moremata