{smcl}
{* 17jan2005}{...}
{hline}
help for {hi:relrank}
{hline}

{title:Generate (quasi-) relative data (grade transformation)}

{p 8 15 2}
{cmd:relrank} {it:varname} [{cmd:if} {it:exp}]
 [{cmd:in} {it:range}] [{it:weight}] {cmd:,}
  {cmdab:g:enerate:(}{it:newvar}{cmd:)}
  {bind:{cmdab:r:eference:(}{it:refvar} [{cmd:if} {it:exp}] [{cmd:in} {it:range}]{cmd:)}}
  [ {cmd:cdf(}{it:cdfvar}{cmd:)} ]


{p 4 4 2}
{cmd:by} {it:...} : may be used with {cmd:relrank}; see help {help by}.

{p 4 4 2}
{cmd:fweight}s and {cmd:aweight}s are allowed; see help {help weights}.


{title:Description}

{p 4 4 2} {cmd:relrank} generates the so-called {it:relative data} of
{it:varname} compared to {it:refvar}. That is, {cmd:relrank} generates a
variable reflecting the relative ranks of the values of {it:varname} in the
distribution of {it:refvar}. Technically, {cmd:relrank} first computes the
empirical cumulative distribution function of {it:refvar} (see help
{help cumul}) and then applies this reference CDF to the values of
{it:varname}. Because the reference distribution function used
by {cmd:relrank} is just an estimate of the "true" distribution,
one might prefer to speak of {it:quasi-relative} data.

{p 4 4 2} The distribution of the relative data produced by {cmd:relrank}
is called the {it:relative distribution} and, naturally, also has a CDF and
a PDF (probability density function). As a matter of fact, the PDF of the
relative data -- the relative PDF -- can be interpreted as a density ratio:
it is equal to the ratio between the PDF of the untransformed data and the
PDF of the reference data. For an introduction to the concept of relative
distributions see, e.g., Handcock and Morris (1998, 1999). Also see the
{cmd:ppplot} package by Nicholas J. Cox (available from the SSC Archive),
which may be used to plot the relative distribution.

{p 4 4 2} The transformation of {it:varname} to relative data -- also
called the {it:grade transformation} -- is used, for example, in the
analysis of income or wage differentials (see, e.g., Juhn, Murphy and
Pierce 1991). Another useful tool for such analyses is provided by the
{cmd:invcdf} package (also available from SSC), which may be used to apply the
inverse cumulative distribution function (the so-called quantile function)
to a variable containing percentile ranks ({cmd:invcdf} is
closely related to {help pctile}). This is useful, for example, to compute
hypothetical wages for women if their relative positions in the male
wage distribution had remained constant over time.


{title:Options}

{p 4 8 2} {cmd:generate(}{it:newvar}{cmd:)} is not optional. It specifies
the name of the new variable to be created.

{p 4 8 2}
{bind:{cmd:reference(}{it:refvar} [{cmd:if} {it:exp}] [{cmd:in} {it:range}]{cmd:)}}
is not optional. It specifies the variable representing
the reference distribution. Use {cmd:if} and {cmd:in} within
{cmd:reference()} to restrict the sample for {it:refvar} (the {cmd:if} and
{cmd:in} restrictions outside {cmd:reference()} do not apply to
{it:refvar}). Note that {it:refvar} and {it:varname} may refer to the same
variable and that the indicated samples for {it:refvar} and {it:varname}
may overlap.

{p 4 8 2} {cmd:cdf(}{it:cdfvar}{cmd:)} may be used to specify a variable
representing the empirical cumulative distribution function (e.c.d.f.)
of {it:refvar}. In this case, {cmd:relrank} skips the computation of the
e.c.d.f. and uses {it:cdfvar} instead. Note that {it:cdfvar} should lie in
[0,1] and must be defined for all values of {it:refvar} in the specified sample.
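
{p 8 8 2}As a minimal sketch of how {cmd:cdf()} might be used (the variable
names {cmd:wage}, {cmd:female}, and {cmd:malecdf} are assumptions chosen for
illustration), the e.c.d.f. of the reference sample can be computed beforehand
with {help cumul} and then passed to {cmd:relrank}:

        {inp}. cumul wage if female==0, generate(malecdf) equal
        . relrank wage if female==1, reference(wage if female==0) cdf(malecdf) generate(rank)
        {txt}
{p 8 8 2}Here {cmd:malecdf} is defined for every observation in the reference
sample, as {cmd:cdf()} requires.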


{title:Examples}

{p 4 4 2}Compute the relative positions of female wages in the distribution
of male wages:

        {inp}. relrank wage if female==1, ref(wage if female==0) g(rank)
        . summarize rank
        {txt}

{p 4 4 2}Hint: The code

        {inp}. relrank x1, reference(x2) generate(rank)
        . cumul rank, generate(cum) equal
        . line cum rank, sort connect(J) xscale(range(0 1))
        {txt}

{p 4 4 2}will produce essentially the same plot as

        {inp}. ppplot line x1 x2, connect(J)
        {txt}

{p 4 4 2}Hint: The command

        {inp}. relrank x1, reference(x1) generate(rank)
        {txt}

{p 4 4 2}computes the empirical cumulative distribution function of x1, that is,
it produces the same result as

        {inp}. cumul x1, generate(rank) equal
        {txt}

{title:Methods and Formulas}

{p 4 4 2} The relative ranks of the values of
x in the distribution of y are determined as follows:

                 { 0        if x < y(1)
        F_y(x) = { W(i)/W   if y(i) <= x < y(i+1), i=1,...,N-1
                 { 1        if x >= y(N)

{p 4 4 2} where y(1), y(2), ..., y(N) are the ordered values of the
reference distribution, W(i) is the running sum of weights of y, and W is the
total sum of weights (if not specified, all weights are 1).
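
{p 4 4 2} As a worked illustration (the numbers are hypothetical and not taken
from any dataset), suppose the ordered reference values are y(1)=1, y(2)=2,
and y(3)=4 with unit weights, so that W(1)=1, W(2)=2, W(3)=3, and W=3. The
formula then gives

        F_y(0.5) = 0      (0.5 is below the smallest reference value)
        F_y(1)   = 1/3    (y(1) <= 1 < y(2))
        F_y(3)   = 2/3    (y(2) <= 3 < y(3))
        F_y(5)   = 1      (5 >= y(3))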


{title:References}

{p 4 8 2} Handcock, Mark S., Martina Morris (1998). Relative Distribution Methods.
Sociological Methodology 28: 53-97.{p_end}
{p 4 8 2} Handcock, Mark S., Martina Morris (1999). Relative Distribution Methods
in the Social Sciences. New York: Springer.{p_end}
{p 4 8 2} Juhn, Chinhui, Kevin M. Murphy, Brooks Pierce (1991). Accounting for the
Slowdown in Black-White Wage Convergence. Pp. 107-143 in: Workers and Their
Wages, ed. by Marvin Kosters, Washington, DC: AEI Press.{p_end}


{title:Author}

{p 4 4 2}
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


{title:Also see}

{p 4 13 2}
Online:  help for {help cumul}, {help ppplot} (if installed),  {help invcdf}
(if installed)