-------------------------------------------------------------------------------
help for relrank
-------------------------------------------------------------------------------

Generate (quasi-) relative data (grade transformation)

        relrank varname [if exp] [in range] [weight] , generate(newvar)
               reference(refvar [if exp] [in range]) [ cdf(cdfvar) ]


    by ... : may be used with relrank; see help by.

    fweights and aweights are allowed; see help weights.


Description

    relrank generates the so-called relative data of varname compared to
    refvar. That is, relrank generates a variable reflecting the relative
    ranks of the values of varname in the distribution of refvar.
    Technically, relrank first computes the empirical cumulative distribution
    function of refvar (see help cumul) and then applies this reference CDF
    to the values of varname. Because the reference distribution function
    used by relrank is just an estimate of the "true" distribution, one might
    prefer to speak of quasi-relative data.

    The distribution of the relative data produced by relrank is called the
    relative distribution and, naturally, also has a CDF and a PDF
    (probability density function). As a matter of fact, the PDF of the
    relative data -- the relative PDF -- can be interpreted as a density
    ratio: at relative rank r, it equals the ratio between the PDF of the
    untransformed data and the PDF of the reference data, both evaluated at
    the r-quantile of the reference distribution. For an introduction to the
    concept of relative distributions see, e.g., Handcock and Morris (1998,
    1999). Also see the ppplot package by Nicholas J. Cox (available from the
    SSC Archive), which may be used to plot the relative distribution.

    The transformation of varname to relative data -- also called the grade
    transformation -- is used, for example, in the analysis of income or wage
    differentials (see, e.g., Juhn, Murphy and Pierce 1991). Another useful
    tool for such analyses is provided by the invcdf package (also available
    from SSC), which may be used to apply the inverse cumulative distribution
    function (the so-called quantile function) to a variable containing
    percentile ranks (invcdf is closely related to pctile). This is useful,
    for example, to compute hypothetical wages for women had their relative
    positions in the male wage distribution remained constant over time.


Options

    generate(newvar) is not optional. It specifies the name of the new
        variable to be created.

    reference(refvar [if exp] [in range]) is not optional. It specifies the
        variable representing the reference distribution. Use if and in
        within reference() to restrict the sample for refvar (the if and in
        restrictions outside reference() do not apply to refvar). Note that
        refvar and varname may refer to the same variable and that the
        indicated samples for refvar and varname may overlap.

    cdf(cdfvar) may be used to specify a variable representing the empirical
        cumulative distribution function (e.c.d.f.) of refvar. In this case,
        relrank skips the computation of the e.c.d.f. and uses cdfvar
        instead. Note that cdfvar should lie in [0,1] and must be defined for
        all values of refvar in the specified sample.
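
    For instance, when the reference e.c.d.f. is needed for other purposes
    anyway, it can be computed once with cumul and passed on through cdf().
    The following is a minimal sketch, assuming a dataset with the variables
    wage and female as in the examples below; mcdf and rank2 are
    hypothetical names chosen for illustration:

        . cumul wage if female==0, generate(mcdf) equal
        . relrank wage if female==1, ref(wage if female==0) cdf(mcdf) g(rank2)

    The equal option of cumul assigns the same cumulative value to tied
    observations. Since mcdf is defined for every observation in the
    reference sample, rank2 should match the result obtained without cdf().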


Examples

    Compute the relative positions of female wages in the distribution of
    male wages:

        . relrank wage if female==1, ref(wage if female==0) g(rank)
        . summarize rank
        

    Hint: The code

        . relrank x1, reference(x2) generate(rank)
        . cumul rank, generate(cum) equal
        . line cum rank, sort connect(J) xscale(range(0 1))
        

    produces essentially the same plot as

        . ppplot line x1 x2, connect(J)
        

    Hint: The command

        . relrank x1, reference(x1) generate(rank)
        

    computes the empirical cumulative distribution function of x1, that is,
    it produces the same result as

        . cumul x1, generate(rank) equal
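
    Hint: Because the samples for varname and refvar may overlap, the
    reference group itself can be ranked along with everyone else. As a
    sketch, again assuming the variables wage and female (rank_all is a
    hypothetical name), the command

        . relrank wage, reference(wage if female==0) generate(rank_all)

    places every wage, male or female, in the distribution of male wages.
    For the male observations, rank_all is simply the e.c.d.f. of male
    wages.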
        

Methods and Formulas

    The relative ranks of the values of x in the distribution of y are
    determined as follows:

                 { 0        if x < y(1)
        F_y(x) = { W(i)/W   if y(i) <= x < y(i+1), i=1,...,N-1
                 { 1        if x >= y(N)

    where y(1), y(2), ..., y(N) are the ordered values of the reference
    distribution, W(i) is the running sum of weights of y, and W is the total
    sum of weights (if not specified, all weights are 1).
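
    As a small worked illustration (the numbers are made up for this
    example), let the reference values be y = 1, 2, 2, 5 with unit weights,
    so that W = 4 and W(1), ..., W(4) = 1, 2, 3, 4. Then

        x = 0.5:  x < y(1)             F_y(x) = 0
        x = 1:    y(1) <= x < y(2)     F_y(x) = W(1)/W = 0.25
        x = 2:    y(3) <= x < y(4)     F_y(x) = W(3)/W = 0.75
        x = 3:    y(3) <= x < y(4)     F_y(x) = W(3)/W = 0.75
        x = 5:    x >= y(4)            F_y(x) = 1

    Note that for the tied values y(2) = y(3) = 2 the interval belonging to
    i = 2 is empty, so x = 2 receives the running sum at the last tie.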


References

    Handcock, Mark S., Martina Morris (1998). Relative Distribution Methods.
        Sociological Methodology 28: 53-97.
    Handcock, Mark S., Martina Morris (1999). Relative Distribution Methods
        in the Social Sciences. New York: Springer.
    Juhn, Chinhui, Kevin M. Murphy, Brooks Pierce (1991). Accounting for the
        Slowdown in Black-White Wage Convergence. Pp. 107-143 in: Workers and
        Their Wages, ed. by Marvin Kosters, Washington, DC: AEI Press.


Author

    Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


Also see