help mata mm_ecdf()
-------------------------------------------------------------------------------

Title

    mm_ecdf() -- Cumulative distribution function

Syntax

        real matrix mm_ecdf(X [, w, mid])

        real matrix mm_relrank(X, w, Q [, mid])

        real matrix mm_ranks(X [, w, ties, mid, norm])


    where

              X:  real matrix containing data (rows are observations, columns
                  variables)

              w:  real colvector containing weights

            mid:  real scalar indicating that midpoints be used

              Q:  real matrix containing evaluation points

           ties:  real scalar determining the treatment of ties; ties==0:
                  randomly split ties (default); ties==1: use the highest
                  rank (smallest rank number) in case of ties; ties==2: use
                  the mean rank in case of ties; ties==3: use the lowest rank
                  (largest rank number) in case of ties; ties==4: order ties
                  by w

           norm:  real scalar indicating that the ranks be normalized


Description

    mm_ecdf() returns the empirical cumulative distribution function
    (e.c.d.f.) of each column of X, evaluated at each observation.
    Observations with equal values receive the same cumulative value.
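    A minimal Python sketch of the e.c.d.f. described above shows how ties
    are handled. It is not the moremata implementation. The function ecdf()
    is hypothetical and treats one column of X as a Python list, with w
    given as a list of weights or as a single constant weight.

```python
def ecdf(x, w=1.0):
    """Weighted e.c.d.f. of the list x, evaluated at each observation.

    Tied values all receive W(i)/W, where i is the position of the last
    member of the tie group among the sorted observations.
    """
    n = len(x)
    if isinstance(w, (int, float)):
        w = [float(w)] * n  # a constant weight gives unweighted results
    total = float(sum(w))

    # Sum the weights of each distinct value
    weight_at = {}
    for xi, wi in zip(x, w):
        weight_at[xi] = weight_at.get(xi, 0.0) + wi

    # Running sum of weights over the distinct values in ascending order
    cum = {}
    running = 0.0
    for v in sorted(weight_at):
        running += weight_at[v]
        cum[v] = running / total

    return [cum[xi] for xi in x]


print(ecdf([1, 2, 2, 3]))             # [0.25, 0.75, 0.75, 1.0]
print(ecdf([3, 1, 2], w=[1, 2, 1]))   # [1.0, 0.5, 0.75]
```

    Both tied observations in the first call receive 0.75, which is the
    height of the step at x = 2.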

    mm_relrank() evaluates the e.c.d.f. of X at the values provided by Q.
    That is, mm_relrank() returns the relative ranks of Q in the distribution
    of X. mm_relrank() works column by column: if Q has one column and X has
    several columns, the relative ranks of Q are computed in each column of
    X. If X has one column and Q has several columns, the e.c.d.f. of X is
    evaluated in each column of Q. If X and Q both have several columns,
    they must have the same number of columns, and relative ranks are
    computed column by column.
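    The column-matching rule can be sketched in Python as follows. This is
    an illustration under stated assumptions, not the moremata code. X and
    Q are represented as lists of columns and w as a list of weights. The
    helper names relrank_col() and relrank() are hypothetical.

```python
from bisect import bisect_right


def relrank_col(x, w, q):
    """Relative ranks of the points q in the weighted distribution of x."""
    pairs = sorted(zip(x, w))          # sort observations by value
    xs = [v for v, _ in pairs]
    cum = []                           # running sum of weights, W(i)
    running = 0.0
    for _, wi in pairs:
        running += wi
        cum.append(running)
    out = []
    for qi in q:
        i = bisect_right(xs, qi)       # number of observations <= qi
        out.append(0.0 if i == 0 else cum[i - 1] / running)
    return out


def relrank(X, w, Q):
    """Pair columns of X and Q following the column rules of mm_relrank()."""
    if len(X) == len(Q):
        pairs = zip(X, Q)                   # column j of X with column j of Q
    elif len(Q) == 1:
        pairs = ((xc, Q[0]) for xc in X)    # same Q in each column of X
    elif len(X) == 1:
        pairs = ((X[0], qc) for qc in Q)    # same X for each column of Q
    else:
        raise ValueError("X and Q must have the same number of columns")
    return [relrank_col(xc, w, qc) for xc, qc in pairs]


X = [[1, 2, 3, 4], [10, 20, 30, 40]]
print(relrank(X, [1, 1, 1, 1], [[2.5]]))   # [[0.5], [0.0]]
```

    Passing the same single column as X and Q, as in
    relrank([x], w, [x]), returns the e.c.d.f. of x at its own observations.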

    Note that

        mm_relrank(x, w, x) = mm_ecdf(x, w)

    if x is a column vector. Naturally, mm_ecdf() is faster.

    mm_ranks() returns for each column of X the ranks of the values in X,
    where the smallest values are ranked highest (i.e., rank 1 is returned
    for the smallest value, rank 2 for the second smallest, etc.). Seen
    differently, mm_ranks() returns the absolute cumulative frequency
    distribution of each column of X or, if norm!=0 is specified, the
    relative cumulative frequency distribution (the ranks divided by the
    number of observations or, if weights are specified, by the sum of
    weights).

    w specifies weights associated with the observations (rows) in X. Omit w,
    or specify w as 1, to obtain unweighted results. Using w!=1 in mm_ranks()
    makes little sense if the result is to be interpreted as ranks. It is
    useful, however, for computing the absolute cumulative frequency
    distribution from weighted data.

    mid!=0 specifies that midpoints be used in the e.c.d.f. That is, at each
    step of the distribution function, the value at the midpoint of the step
    (halfway between the value just below the step and the value at the
    step) is returned. In mm_relrank(), mid!=0 affects only the results for
    Q-values that have an exact match in X.
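    The effect of mid!=0 can be illustrated with the following Python
    sketch. It assumes that the midpoint of a step lies halfway between the
    e.c.d.f. just below the step and at the step. The function ecdf_at() is
    hypothetical and is not part of moremata.

```python
def ecdf_at(x, w, q, mid=False):
    """Weighted e.c.d.f. of x evaluated at the points q.

    With mid=True, a point that coincides with an observed value of x
    (i.e., sits on a step) receives the midpoint of that step; all other
    points are unaffected.
    """
    total = float(sum(w))
    weight_at = {}                      # total weight at each distinct value
    for xi, wi in zip(x, w):
        weight_at[xi] = weight_at.get(xi, 0.0) + wi
    out = []
    for qi in q:
        # Weight of all observations <= qi, i.e., the top of the step
        value = sum(wt for v, wt in weight_at.items() if v <= qi)
        if mid and qi in weight_at:
            value -= weight_at[qi] / 2  # move down to the middle of the step
        out.append(value / total)
    return out


x, w = [1, 2, 2, 3], [1, 1, 1, 1]
print(ecdf_at(x, w, x, mid=True))          # [0.125, 0.5, 0.5, 0.875]
print(ecdf_at(x, w, [2, 2.5], mid=True))   # [0.5, 0.75]
```

    The point 2.5 has no exact match in x, so its value is the same with or
    without mid.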


Remarks

    The formula for the empirical cumulative distribution function
    implemented in mm_ecdf() and mm_relrank() is:

               { 0        if x < x(1)
        F(x) = { W(i)/W   if x(i) <= x < x(i+1), i=1,...,n-1
               { 1        if x >= x(n)

    where x(1), x(2), ..., x(n) are the ordered observations, w(i) is the
    weight of x(i), W(i) = w(1) + w(2) + ... + w(i) is the running sum of
    weights, and W = W(n) is the overall sum of weights.

    The default for mm_ranks() is to return ranks in random order where x is
    tied. Alternatively, specify ties==1 to assign the highest occurring rank
    (the smallest rank number in the tie group) to tied observations, ties==2
    to assign mean ranks, or ties==3 to assign the lowest rank (the largest
    rank number in the tie group). Example:

        : x = (1,2,2,3)'
        
        : x, mm_ranks(x,1,0), mm_ranks(x,1,1), mm_ranks(x,1,2),
        >    mm_ranks(x,1,3)
                 1     2     3     4     5
            +-------------------------------+
          1 |    1     1     1     1     1  |
          2 |    2     2     2   2.5     3  |
          3 |    2     3     2   2.5     3  |
          4 |    3     4     4     4     4  |
            +-------------------------------+
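    The deterministic tie rules in the example (ties==1, 2, and 3) can be
    reproduced with the Python sketch below. It is an unweighted
    illustration only and does not cover random splitting (ties==0). The
    function ranks() is hypothetical and is not the moremata implementation.

```python
def ranks(x, ties, norm=False):
    """Unweighted ranks of the list x; rank 1 goes to the smallest value.

    ties=1: highest rank of the tie group (smallest rank number)
    ties=2: mean rank of the tie group
    ties=3: lowest rank of the tie group (largest rank number)
    """
    if ties not in (1, 2, 3):
        raise ValueError("this sketch only covers ties=1, 2, and 3")
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # positions in sorted order
    r = [0] * n
    start = 0
    while start < n:
        # Extend end to the last member of the current tie group
        end = start
        while end + 1 < n and x[order[end + 1]] == x[order[start]]:
            end += 1
        first, last = start + 1, end + 1           # 1-based ranks in the group
        if ties == 1:
            value = first
        elif ties == 2:
            value = (first + last) / 2
        else:
            value = last
        for k in range(start, end + 1):
            r[order[k]] = value
        start = end + 1
    if norm:
        r = [v / n for v in r]                     # relative cumulative frequency
    return r


x = [1, 2, 2, 3]
for t in (1, 2, 3):
    print(t, ranks(x, t))
# 1 [1, 2, 2, 4]
# 2 [1.0, 2.5, 2.5, 4.0]
# 3 [1, 3, 3, 4]
```

    With ties=3 and norm=True, the sketch returns 0.25, 0.75, 0.75, 1 for
    this x. This equals the unweighted e.c.d.f., in line with the relation
    between mm_ecdf() and mm_ranks() stated in these remarks.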

    Furthermore, ties==4 ranks tied observations in order of w (observations
    with smallest weights are ranked highest). Where w is constant, ties==4
    is equivalent to ties==0.
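    The two tie-splitting rules (ties==0 and ties==4) can be sketched in
    Python as follows. The sketch rests on two assumptions. First, with
    weights, the rank of an observation is the running sum of weights in
    sorted order. Second, ties==4 falls back to random order among tied
    observations that have equal weights. The function split_ranks() is
    hypothetical.

```python
import random


def split_ranks(x, w=None, ties=0, seed=None):
    """Ranks of x with tie groups split (ties=0: randomly, ties=4: by w).

    Each observation's rank is the running sum of weights up to and
    including it in the sorted order (plain ranks if all weights are 1).
    """
    if ties not in (0, 4):
        raise ValueError("this sketch only covers ties=0 and ties=4")
    n = len(x)
    if w is None:
        w = [1] * n
    rng = random.Random(seed)

    def key(i):
        by_weight = w[i] if ties == 4 else 0    # ties=4: smaller weight first
        return (x[i], by_weight, rng.random())  # remaining ties: random order

    order = sorted(range(n), key=key)
    r = [0] * n
    running = 0
    for i in order:
        running += w[i]
        r[i] = running
    return r


print(split_ranks([1, 2, 2, 3], [1, 3, 1, 1], ties=4))   # [1, 5, 2, 6]
```

    With constant weights, the weight component of the sort key is the same
    for every observation. In that case ties==4 reduces to the random split
    of ties==0, as described in the text.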

    Note that mm_ecdf() is closely related to mm_ranks(). In fact:

        mm_ecdf(x, w) = mm_ranks(x, w, 3, 0, 1)


Conformability

    mm_ecdf(X, w, mid):
             X:  n x k
             w:  n x 1 or 1 x 1
           mid:  1 x 1
        result:  n x k

    mm_relrank(X, w, Q, mid):
             X:  n x 1 or n x k
             w:  n x 1 or 1 x 1
             Q:  r x 1 or r x k
           mid:  1 x 1
        result:  r x 1 or r x k

    mm_ranks(X, w, ties, mid, norm):
             X:  n x k
             w:  n x 1 or 1 x 1
          ties:  1 x 1
           mid:  1 x 1
          norm:  1 x 1
        result:  n x k


Diagnostics

    The functions return missing if w contains missing values. Missing values
    in X or Q are ranked lowest; that is, they are treated as larger than any
    nonmissing value (as in Stata's sort order).


Source code

    mm_ecdf.mata, mm_relrank.mata, mm_ranks.mata


Author

    Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


Also see

    Online:  help for cumul, relrank (if installed), moremata