help mata mm_ecdf()
-------------------------------------------------------------------------------

Title

mm_ecdf() -- Cumulative distribution function

Syntax

    real matrix mm_ecdf(X [, w, mid])

    real matrix mm_relrank(X, w, Q [, mid])

    real matrix mm_ranks(X [, w, ties, mid, norm])

where

       X:  real matrix containing data (rows are observations, columns variables)

       w:  real colvector containing weights

     mid:  real scalar indicating that midpoints be used

       Q:  real matrix containing evaluation points

    ties:  real scalar determining the treatment of ties;
             ties==0: randomly split ties (default);
             ties==1: use highest rank in case of ties;
             ties==2: use mean rank in case of ties;
             ties==3: use lowest rank in case of ties;
             ties==4: order ties by w

    norm:  real scalar indicating that the ranks be normalized

Description

mm_ecdf() returns the empirical cumulative distribution function (e.c.d.f.) of each column of X. Observations with equal values receive the same cumulative value.

mm_relrank() evaluates the e.c.d.f. of X at the values provided by Q. That is, mm_relrank() returns the relative ranks of Q in the distribution of X. Note that mm_relrank() works column by column. If Q has one column and X has several columns, then the relative ranks of Q are computed in each column of X. If X has one column and Q has several columns, then the e.c.d.f. of X is evaluated in each column of Q. If X and Q both have several columns, then the number of columns is required to be the same and relative ranks are computed column by column.

Note that mm_relrank(x,w,x) = mm_ecdf(x,w) if x is a column vector. Naturally, mm_ecdf() is faster.
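The relation between the two functions can be tried out interactively. The following is a minimal sketch, assuming moremata is installed; the data and evaluation points are made up for illustration. Under the formula given in the Remarks section, the last call would be expected to return 0, .25, .75, and 1.

```mata
mata:
x = (1, 2, 2, 3)'                     // illustrative data, one column
mm_ecdf(x)                            // e.c.d.f. at the observed values
mm_relrank(x, 1, x)                   // same values, computed via mm_relrank()
mm_relrank(x, 1, (0, 1.5, 2, 5)')     // e.c.d.f. at arbitrary evaluation points
end
```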

mm_ranks() returns for each column of X the ranks of the values in X, where the smallest values are ranked highest (i.e., rank 1 is returned for the smallest value, rank 2 for the second smallest, etc.). Seen differently, mm_ranks() returns the absolute cumulative frequency distribution of each column of X or, if norm!=0 is specified, the relative cumulative distribution.

w specifies weights associated with the observations (rows) in X. Omit w, or specify w as 1, to obtain unweighted results. Using w!=1 in mm_ranks() makes little sense if the result is to be interpreted as ranks. It is useful, however, to compute the absolute frequency distribution function from weighted data.

mid!=0 specifies that midpoints be used in the e.c.d.f. That is, at each step in the distribution, the value of the midpoint of the step is returned. mid!=0 in mm_relrank() only affects the results for Q-values that have an exact match in X.

Remarks

The formula for the empirical cumulative distribution function implemented in mm_ecdf() and mm_relrank() is:

               { 0        if x < x(1)
        F(x) = { W(i)/W   if x(i) <= x < x(i+1), i=1,...,n-1
               { 1        if x >= x(n)

where x(1), x(2), ..., x(n) are the ordered observations, W(i) is the running sum of weights, and W is the overall sum of weights.
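The formula above can be illustrated with a short Mata function. This is only a sketch for a single column vector without missing values. ecdf_sketch() is a hypothetical name and is not part of moremata, and the code is not the implementation used by mm_ecdf().

```mata
mata:
// Hypothetical helper (not part of moremata): e.c.d.f. of column vector x
// with weights w (n x 1, or 1 x 1 for constant weights); assumes no missings.
real colvector ecdf_sketch(real colvector x, real colvector w)
{
    real scalar    i, n
    real colvector p, ww, cw, F

    n = rows(x)
    if (n == 0) return(J(0, 1, .))
    ww = (rows(w) == 1 ? J(n, 1, w) : w)   // expand a scalar weight
    p  = order(x, 1)                       // permutation sorting x ascending
    cw = runningsum(ww[p])                 // W(i): running sum of weights
    cw = cw / cw[n]                        // W(i)/W
    // tied values share the cumulative value of the last tied observation
    for (i = n - 1; i >= 1; i--) {
        if (x[p[i]] == x[p[i+1]]) cw[i] = cw[i+1]
    }
    F    = J(n, 1, .)
    F[p] = cw                              // restore the original order
    return(F)
}

ecdf_sketch((1, 2, 2, 3)', 1)
end
```

For x = (1,2,2,3)' with unit weights, the sketch returns (.25, .75, .75, 1)', so that both tied observations receive the cumulative value of the upper end of their step.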

The default for mm_ranks() is to return ranks in random order where x is tied. Alternatively, specify ties==1 to assign the highest occurring rank to tied observations, ties==2 to assign mean ranks, or ties==3 to assign the lowest rank. Example:

        : x = (1,2,2,3)'
        : x, mm_ranks(x,1,0), mm_ranks(x,1,1), mm_ranks(x,1,2),
        > mm_ranks(x,1,3)
                  1     2     3     4     5
            +-------------------------------+
          1 |     1     1     1     1     1 |
          2 |     2     2     2   2.5     3 |
          3 |     2     3     2   2.5     3 |
          4 |     3     4     4     4     4 |
            +-------------------------------+

Furthermore, ties==4 ranks tied observations in order of w (observations with smallest weights are ranked highest). Where w is constant, ties==4 is equivalent to ties==0.

Note that mm_ecdf() is closely related to mm_ranks(). In fact:

        mm_ecdf(x, w) = mm_ranks(x, w, 3, 0, 1)
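The stated equivalence can be checked numerically. The following is a minimal sketch, assuming moremata is installed; the data and weights are made up for illustration. mreldif() is Mata's built-in function for the maximum relative difference between two matrices, and the result should be zero or negligibly small if the equivalence holds.

```mata
mata:
x = (1, 2, 2, 3)'                     // illustrative data
w = (2, 1, 1, 4)'                     // illustrative weights
// largest relative difference between the two results
mreldif(mm_ecdf(x, w), mm_ranks(x, w, 3, 0, 1))
end
```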

Conformability

    mm_ecdf(X, w, mid):
             X:  n x k
             w:  n x 1 or 1 x 1
           mid:  1 x 1
        result:  n x k

    mm_relrank(X, w, Q, mid):
             X:  n x 1 or n x k
             w:  n x 1 or 1 x 1
             Q:  r x 1 or r x k
           mid:  1 x 1
        result:  r x 1 or r x k

    mm_ranks(X, w, ties, mid, norm):
             X:  n x k
             w:  n x 1 or 1 x 1
          ties:  1 x 1
           mid:  1 x 1
          norm:  1 x 1
        result:  n x k

Diagnostics

The functions return missing if w contains missing values. Missing values in X or Q are ranked lowest.

Source code

mm_ecdf.mata, mm_relrank.mata, mm_ranks.mata

Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also see

Online: help for cumul, relrank (if installed), moremata