```
-------------------------------------------------------------------------------
help for relrank
-------------------------------------------------------------------------------

Generate (quasi-)relative data (grade transformation)

relrank varname [if exp] [in range] [weight], generate(newvar)
        reference(refvar [if exp] [in range]) [ cdf(cdfvar) ]

by ... : may be used with relrank; see help by.

fweights and aweights are allowed; see help weights.

Description

relrank generates the so-called relative data of varname compared to
refvar. That is, relrank generates a variable reflecting the relative
ranks of the values of varname in the distribution of refvar.
Technically, relrank first computes the empirical cumulative distribution
function of refvar (see help cumul) and then applies this reference CDF
to the values of varname. Because the reference distribution function
used by relrank is just an estimate of the "true" distribution, one might
prefer to speak of quasi-relative data.

The distribution of the relative data produced by relrank is called the
relative distribution and, naturally, also has a CDF and a PDF
(probability density function). In fact, the PDF of the
relative data -- the relative PDF -- can be interpreted as a density
ratio: it equals the ratio of the PDF of the untransformed data to the
PDF of the reference data, evaluated at the corresponding quantile of
the reference distribution. For an introduction to the
concept of relative distributions see, e.g., Handcock and Morris (1998,
1999). Also see the ppplot package by Nicholas J. Cox (available from the
SSC Archive), which may be used to plot the relative distribution.

The transformation of varname to relative data -- also called the grade
transformation -- is used, for example, in the analysis of income or wage
differentials (see, e.g., Juhn, Murphy and Pierce 1991). Another useful
tool for such analyses is provided by the invcdf package (also available
from SSC), which may be used to apply the inverse cumulative distribution
function (the so-called quantile function) to a variable containing
percentile ranks (invcdf is closely related to pctile). This is useful,
for example, to compute the hypothetical wages women would have earned
had their relative positions in the male wage distribution remained
constant over time.

Options

generate(newvar) is not optional. It specifies the name of the new
variable to be created.

reference(refvar [if exp] [in range]) is not optional. It specifies the
variable representing the reference distribution. Use if and in
within reference() to restrict the sample for refvar (the if and in
restrictions outside reference() do not apply to refvar). Note that
refvar and varname may refer to the same variable and that the
indicated samples for refvar and varname may overlap.

cdf(cdfvar) may be used to specify a variable representing the empirical
cumulative distribution function (e.c.d.f.) of refvar. In this case,
relrank skips the computation of the e.c.d.f. and uses cdfvar
instead. Note that cdfvar should lie in [0,1] and must be defined for
all values of refvar in the specified sample.

Examples

Compute the relative positions of female wages in the distribution of
male wages:

. relrank wage if female==1, ref(wage if female==0) g(rank)
. summarize rank

Hint: The code

. relrank x1, reference(x2) generate(rank)
. cumul rank, generate(cum) equal
. line cum rank, sort connect(J) xscale(range(0 1))

will essentially produce the same plot as

. ppplot line x1 x2, connect(J)

Hint: The command

. relrank x1, reference(x1) generate(rank)

computes the empirical cumulative distribution function of x1, that is,
it produces the same result as

. cumul x1, generate(rank) equal

Methods and Formulas

The relative ranks of the values of x in the distribution of y are
determined as follows:

         { 0        if x < y(1)
F_y(x) = { W(i)/W   if y(i) <= x < y(i+1), i=1,...,N-1
         { 1        if x >= y(N)

where y(1), y(2), ..., y(N) are the ordered values of the reference
distribution, W(i) is the running sum of the weights of y up to and
including y(i), and W is the total sum of weights (if no weights are
specified, all weights are 1).

References

Handcock, Mark S., Martina Morris (1998). Relative Distribution Methods.
Sociological Methodology 28: 53-97.
Handcock, Mark S., Martina Morris (1999). Relative Distribution Methods
in the Social Sciences. New York: Springer.
Juhn, Chinhui, Kevin M. Murphy, Brooks Pierce (1991). Accounting for the
Slowdown in Black-White Wage Convergence. Pp. 107-143 in: Workers and
Their Wages, ed. by Marvin Kosters, Washington, DC: AEI Press.

Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also see

```
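
The rule in Methods and Formulas can be illustrated outside Stata with a minimal Python sketch. It is not relrank's actual implementation: the function name `relative_ranks` and the sample wage values are hypothetical, and the sketch assumes a non-empty reference sample with no missing values. It orders the reference values, forms the running sum of weights W(i), and assigns each x the value W(i)/W for the largest y(i) not exceeding x.

```python
from bisect import bisect_right
from itertools import accumulate


def relative_ranks(x, y, weights=None):
    """Apply the (weighted) empirical CDF of y to each value in x.

    Hypothetical helper that mirrors the piecewise rule in the help text;
    assumes y is non-empty and contains no missing values.
    """
    if weights is None:
        weights = [1.0] * len(y)              # unweighted case: all weights are 1
    pairs = sorted(zip(y, weights))           # order the reference values
    y_sorted = [value for value, _ in pairs]
    running = list(accumulate(w for _, w in pairs))  # W(1), ..., W(N)
    total = running[-1]                       # W, the total sum of weights
    ranks = []
    for value in x:
        i = bisect_right(y_sorted, value)     # number of reference values <= value
        ranks.append(0.0 if i == 0 else running[i - 1] / total)
    return ranks


# Made-up data: women's wages ranked in the men's wage distribution.
wages_women = [8.0, 12.0, 15.0, 30.0]
wages_men = [10.0, 12.0, 14.0, 20.0]
print(relative_ranks(wages_women, wages_men))  # [0.0, 0.5, 0.75, 1.0]
```

Passing the same list as both arguments returns the empirical CDF of that list evaluated at its own values, which corresponds to the second hint in the Examples section.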