Title
mm_ecdf() -- Cumulative distribution function
Syntax
real matrix mm_ecdf(X [, w, mid])
real matrix mm_relrank(X, w, Q [, mid])
real matrix mm_ranks(X [, w, ties, mid, norm])
where
X: real matrix containing data (rows are observations, columns variables)
w: real colvector containing weights
mid: real scalar indicating that midpoints be used
Q: real matrix containing evaluation points
ties: real scalar determining the treatment of ties
      ties==0: randomly split ties (default)
      ties==1: use highest rank in case of ties
      ties==2: use mean rank in case of ties
      ties==3: use lowest rank in case of ties
      ties==4: order ties by w
norm: real scalar indicating that the ranks be normalized
Description
mm_ecdf() returns the empirical cumulative distribution function (e.c.d.f.) of each column of X. Observations with equal values receive the same cumulative value.
mm_relrank() evaluates the e.c.d.f. of X at the values provided by Q. That is, mm_relrank() returns the relative ranks of Q in the distribution of X. mm_relrank() works column by column. If Q has one column and X has several, the relative ranks of Q are computed in each column of X. If X has one column and Q has several, the e.c.d.f. of X is evaluated at each column of Q. If both X and Q have several columns, they must have the same number of columns, and relative ranks are computed column by column.
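The column-by-column behavior can be checked with a small sketch. The data below are made up for illustration, and the code assumes that moremata is installed. Only the dimensions of the result are inspected, since those follow from the Conformability section.

```mata
mata:
// Illustrative data: X has two columns, q has one column.
X = (1, 10 \ 2, 20 \ 3, 30 \ 4, 40)    // 4 x 2
q = (2.5 \ 25)                          // 2 x 1

// q is evaluated against each column of X in turn.
R = mm_relrank(X, 1, q)

// Number of rows and columns of the result (r x k, here 2 x 2).
rows(R), cols(R)
end
```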
Note that
mm_relrank(x, w, x) = mm_ecdf(x, w)
if x is a column vector. Naturally, mm_ecdf() is faster.
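A minimal sketch of how this identity might be verified, assuming moremata is installed. The vectors x and w are made-up illustration data.

```mata
mata:
// Illustrative data and weights (not from the original help file).
x = (3, 1, 2, 2, 5)'
w = (1, 2, 1, 1, 3)'

// Evaluate the e.c.d.f. of x at its own values, once with each function.
F1 = mm_ecdf(x, w)
F2 = mm_relrank(x, w, x)

// mreldif() returns the largest relative difference between F1 and F2;
// it should be zero (up to rounding) if the identity holds.
mreldif(F1, F2)
end
```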
mm_ranks() returns for each column of X the ranks of the values in X, where the smallest values are ranked highest (i.e. rank 1 is returned for the smallest value, rank 2 for the second smallest, etc.). Seen differently, mm_ranks() returns the absolute cumulative frequency distribution of each column of X or, if norm!=0 is specified, the relative cumulative distribution.
w specifies weights associated with the observations (rows) in X. Omit w, or specify w as 1, to obtain unweighted results. Weights other than 1 rarely make sense in mm_ranks() if the result is to be interpreted as ranks. They are useful, however, for computing the absolute cumulative frequency distribution from weighted data.
mid!=0 specifies that midpoints be used in the e.c.d.f. That is, at each step in the distribution, the value of the midpoint of the step is returned. In mm_relrank(), mid!=0 only affects the results for Q-values that have an exact match in X.
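To see the effect of mid, the default and the midpoint versions can be listed side by side. This is a sketch with made-up data that assumes moremata is installed; no output is shown here.

```mata
mata:
// Illustrative data with one tie.
x = (1, 2, 2, 3)'

// Columns: data, default e.c.d.f. (mid omitted), midpoint e.c.d.f. (mid = 1).
// w is given as 1, so the results are unweighted.
x, mm_ecdf(x, 1), mm_ecdf(x, 1, 1)
end
```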
Remarks
The formula for the empirical cumulative distribution function implemented in mm_ecdf() and mm_relrank() is:
           { 0        if x < x(1)
    F(x) = { W(i)/W   if x(i) <= x < x(i+1), i=1,...,n-1
           { 1        if x >= x(n)
where x(1), x(2), ..., x(n) are the ordered observations, W(i) is the running sum of weights, and W is the overall sum of weights.
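The formula can be written out directly in Mata. The following is a sketch only: my_ecdf_at() is a hypothetical helper, not part of moremata and not its actual implementation. It assumes that x and w are column vectors of equal length without missing values.

```mata
mata:
// my_ecdf_at() is a hypothetical helper written for illustration.
// x: n x 1 data, w: n x 1 weights, q: scalar evaluation point.
real scalar my_ecdf_at(real colvector x, real colvector w, real scalar q)
{
    real colvector p, xs, W
    real scalar    i, n, F

    p  = order(x, 1)         // permutation that sorts x in ascending order
    xs = x[p]                // x(1) <= x(2) <= ... <= x(n)
    W  = runningsum(w[p])    // W(i): running sum of weights in sorted order
    n  = rows(xs)

    F = 0                    // remains 0 if q < x(1)
    for (i=1; i<=n; i++) {
        if (xs[i] <= q) F = W[i] / W[n]    // keep the last i with x(i) <= q
        else break
    }
    return(F)
}

x = (3, 1, 2, 2, 5)'
w = (1, 2, 1, 1, 3)'
my_ecdf_at(x, w, 2)    // (2+1+1)/8 = .5 under the formula above
end
```

The loop keeps the last index i for which x(i) <= q, so tied observations all receive the cumulative weight of the last tie, as stated in the Description.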
The default for mm_ranks() is to return ranks in random order where x is tied. Alternatively, specify ties==1 to assign the highest occurring rank to tied observations, ties==2 to assign mean ranks, or ties==3 to assign the lowest rank. Example:
    : x = (1,2,2,3)'
    : x, mm_ranks(x,1,0), mm_ranks(x,1,1), mm_ranks(x,1,2),
    > mm_ranks(x,1,3)
             1     2     3     4     5
        +-------------------------------+
      1 |    1     1     1     1     1  |
      2 |    2     2     2   2.5     3  |
      3 |    2     3     2   2.5     3  |
      4 |    3     4     4     4     4  |
        +-------------------------------+
Furthermore, ties==4 ranks tied observations in order of w (observations with smallest weights are ranked highest). Where w is constant, ties==4 is equivalent to ties==0.
Note that mm_ecdf() is closely related to mm_ranks(). In fact:
mm_ecdf(x, w) = mm_ranks(x, w, 3, 0, 1)
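This relation can be checked numerically along the following lines. The sketch assumes moremata is installed and uses made-up data.

```mata
mata:
// Illustrative data and weights.
x = (3, 1, 2, 2, 5)'
w = (1, 2, 1, 1, 3)'

// ties = 3 (lowest rank for ties), mid = 0, norm = 1 (normalized ranks).
// mreldif() should return zero (up to rounding) if the relation holds.
mreldif(mm_ecdf(x, w), mm_ranks(x, w, 3, 0, 1))
end
```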
Conformability
mm_ecdf(X, w, mid):
         X:  n x k
         w:  n x 1 or 1 x 1
       mid:  1 x 1
    result:  n x k

mm_relrank(X, w, Q, mid):
         X:  n x 1 or n x k
         w:  n x 1 or 1 x 1
         Q:  r x 1 or r x k
       mid:  1 x 1
    result:  r x 1 or r x k

mm_ranks(X, w, ties, mid, norm):
         X:  n x k
         w:  n x 1 or 1 x 1
      ties:  1 x 1
       mid:  1 x 1
      norm:  1 x 1
    result:  n x k
Diagnostics
The functions return missing if w contains missing values. Missing values in X or Q are ranked lowest.
Source code
mm_ecdf.mata, mm_relrank.mata, mm_ranks.mata
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see
Online: help for cumul, relrank (if installed), moremata