-------------------------------------------------------------------------------
help for relrank
-------------------------------------------------------------------------------

Generate (quasi-) relative data (grade transformation)

relrank varname [if exp] [in range] [weight] , generate(newvar) reference(refvar [if exp] [in range]) [ cdf(cdfvar) ]

by ... : may be used with relrank; see help by.

fweights and aweights are allowed; see help weights.

Description

relrank generates the so-called relative data of varname compared to refvar. That is, relrank generates a variable reflecting the relative ranks of the values of varname in the distribution of refvar. Technically, relrank first computes the empirical cumulative distribution function of refvar (see help cumul) and then applies this reference CDF to the values of varname. Because the reference distribution function used by relrank is only an estimate of the "true" distribution, one might prefer to speak of quasi-relative data.

The distribution of the relative data produced by relrank is called the relative distribution and, naturally, also has a CDF and a PDF (probability density function). As a matter of fact, the PDF of the relative data -- the relative PDF -- can be interpreted as a density ratio: it is equal to the ratio between the PDF of the untransformed data and the PDF of the reference data. For an introduction to the concept of relative distributions see, e.g., Handcock and Morris (1998, 1999). Also see the ppplot package by Nicholas J. Cox (available from the SSC Archive), which may be used to plot the relative distribution.
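
The density-ratio interpretation can be written out as follows. This is a sketch assuming continuous distributions; the symbols F0, f0, f, Q0 and g are notation introduced here for illustration and are not used by relrank itself. Let F0 and f0 denote the CDF and PDF of the reference data, f the PDF of the untransformed data, and Q0 the quantile function (inverse CDF) of the reference data. For the relative data r = F0(x), the relative PDF is

    g(r) = f(Q0(r)) / f0(Q0(r)),    0 <= r <= 1

If the two distributions are identical, g(r) = 1 for all r, that is, the relative data are uniformly distributed on [0,1]. Values of g(r) above 1 indicate that the untransformed data are overrepresented at the r-th quantile of the reference distribution; values below 1 indicate underrepresentation.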

The transformation of varname to relative data -- also called the grade transformation -- is used, for example, in the analysis of income or wage differentials (see, e.g., Juhn, Murphy and Pierce 1991). Another useful tool for such analyses is provided by the invcdf package (also available from SSC), which may be used to apply the inverse cumulative distribution function (the so-called quantile function) to a variable containing percentile ranks (invcdf is closely related to pctile). This is useful, for example, to compute hypothetical wages for women under the assumption that their relative positions in the male wage distribution had remained constant over time.

Options

generate(newvar) is not optional. It specifies the name of the new variable to be created.

reference(refvar [if exp] [in range]) is not optional. It specifies the variable representing the reference distribution. Use if and in within reference() to restrict the sample for refvar (the if and in restrictions outside reference() do not apply to refvar). Note that refvar and varname may refer to the same variable and that the indicated samples for refvar and varname may overlap.

cdf(cdfvar) may be used to specify a variable representing the empirical cumulative distribution function (e.c.d.f.) of refvar. In this case, relrank skips the computation of the e.c.d.f. and uses cdfvar instead. Note that cdfvar should lie in [0,1] and must be defined for all values of refvar in the specified sample.
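
A sketch of how cdf() might be used, assuming a dataset with the variables wage and female as in the Examples section; Fm and rank2 are names made up for illustration. The reference e.c.d.f. is first computed with cumul for the reference sample (the equal option assigns tied values the same cumulative value) and is then passed to relrank:

. cumul wage if female==0, generate(Fm) equal
. relrank wage if female==1, reference(wage if female==0) cdf(Fm) generate(rank2)

This should yield the same ranks as the corresponding call without cdf(). Note that Fm is defined only for the observations selected by if female==0, which matches the sample specified in reference().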

Examples

Compute the relative positions of female wages in the distribution of male wages:

. relrank wage if female==1, ref(wage if female==0) g(rank)
. summarize rank
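
Two further sketches along the same lines, offered as illustrations rather than as part of the original examples. First, a histogram of the relative data gives a rough picture of the relative PDF (histogram uses the density scale by default, so if female and male wages were identically distributed, all bars would have a height of about 1):

. histogram rank, width(0.1) start(0)

Second, relative ranks might be computed separately by period using by. Here year is a hypothetical variable and rank_yr a made-up name; the sketch presumes that by restricts both the varname sample and the reference() sample to the current group, which is worth verifying on your data:

. bysort year: relrank wage if female==1, ref(wage if female==0) g(rank_yr)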

Hint: The code

. relrank x1, reference(x2) generate(rank)
. cumul rank, generate(cum) equal
. line cum rank, sort connect(J) xscale(range(0 1))

will essentially produce the same plot as

. ppplot line x1 x2, connect(J)

Hint: The command

. relrank x1, reference(x1) generate(rank)

computes the empirical cumulative distribution function of x1, that is, it produces the same result as

. cumul x1, generate(rank) equal

Methods and Formulas

The relative ranks of the values of x in the distribution of y are determined as follows:

             { 0        if x < y(1)
    F_y(x) = { W(i)/W   if y(i) <= x < y(i+1), i=1,...,N-1
             { 1        if x >= y(N)

where y(1), y(2), ..., y(N) are the ordered values of the reference distribution, W(i) is the running sum of weights of y, and W is the total sum of weights (if not specified, all weights are 1).
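
As a worked illustration (the numbers are chosen here purely for illustration), let the reference values be y = 1, 2, 3, 4 with unit weights, so that W(i) = i and W = 4. Then:

    x = 0.5    F_y(x) = 0      since x < y(1)
    x = 1      F_y(x) = 1/4    since y(1) <= x < y(2)
    x = 2.5    F_y(x) = 2/4    since y(2) <= x < y(3)
    x = 4      F_y(x) = 1      since x >= y(N)

With weights 1, 2, 1 on the reference values y = 1, 2, 3, the running sums are W(1) = 1, W(2) = 3, W(3) = 4 and W = 4, so that x = 2.5 is assigned F_y(x) = W(2)/W = 3/4.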

References

Handcock, Mark S., Martina Morris (1998). Relative Distribution Methods. Sociological Methodology 28: 53-97.
Handcock, Mark S., Martina Morris (1999). Relative Distribution Methods in the Social Sciences. New York: Springer.
Juhn, Chinhui, Kevin M. Murphy, Brooks Pierce (1991). Accounting for the Slowdown in Black-White Wage Convergence. Pp. 107-143 in: Workers and Their Wages, ed. by Marvin Kosters, Washington, DC: AEI Press.

Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also see