```-------------------------------------------------------------------------------
help for censlope
(SJ6-4: snp15_7; SJ6-3: snp15_6; SJ5-3: snp15_5; SJ3-3: snp15_4;
STB-61: snp15_3; STB-58: snp15_2; STB-57: snp15)
-------------------------------------------------------------------------------

Robust confidence intervals for median and other percentile slopes

censlope yvarname xvarname [weight] [if] [in], [centile(numlist)
iteration_options]

where yvarname and xvarname are variable names.

fweights, iweights, and pweights are allowed; see weight.  They are
interpreted as for somersd.

bootstrap, by, jackknife, and statsby are allowed; see prefix.

Description

censlope calculates confidence intervals for generalized Theil-Sen median
slopes and other percentile slopes of a Y variable specified by yvarname
with respect to an X variable specified by xvarname.  These confidence
intervals are robust to the possibility that the population distributions
of the Y variable, conditional on different values of the X variable, are
different in ways other than location.  This might happen if, for
example, the conditional distributions had different variances.  For
positive-valued Y variables, censlope can be used to calculate confidence
intervals for median per-unit ratios or other percentile per-unit ratios
associated with a unit increment in the X variable.  If the X variable is
binary with values 0 and 1, then the generalized Theil-Sen percentile
slopes are the generalized Hodges-Lehmann percentile differences between
the group of observations whose X value is 1 and the group of
observations whose X value is 0.  censlope is part of the somersd package
and requires the somersd program to work.  It executes the somersd
command,

somersd xvarname yvarname [weight] [if] [in] [, somersd_options]

and then estimates the percentile slopes.  The estimates and confidence
limits for the percentile slopes are evaluated using an iterative
numerical method, which the user may change from the default by using the
iteration_options.

Options

centile(numlist) specifies a list of percentile slopes to be reported and
defaults to centile(50) (median only) if not specified.  Specifying
centile(25 50 75) will produce the 25th, 50th, and 75th percentile
differences.

eform specifies that exponentiated percentile slopes be given.  This
option is used if yvarname specifies the log of a positive-valued
variable.  In this case, confidence intervals are calculated for
percentile ratios or per-unit ratios between values of the original
positive variable, instead of for percentile differences or per-unit
differences.

ystargenerate(newvarlist) specifies a list of variables to be generated,
corresponding to the percentile slopes, containing the differences
Y*(beta)=Y-X*beta, where beta is the percentile slope.  The variable
names in the newvarlist are matched to the list of percentiles
specified by the centile() option, sorted in ascending order of
percent.  If the two lists have different lengths, then censlope
generates a number nmin of new variables equal to the minimum length
of the two lists, matching the first nmin percentiles with the first
nmin new variable names.  Usually, there is only one percentile slope
(the median slope), and one new ystargenerate() variable, whose
median can be used as the intercept when drawing a straight line
through the data points on a scatterplot.

estaddr specifies that the results saved in r() also be saved in e() (see
Saved results below).  This option makes it easier to use censlope
with parmby, to create an output dataset (or results set) with one
observation per by-group and data on confidence intervals for Somers'
D and median slopes.  parmby is part of the package parmest,

somersd_options are any of the options available with somersd.

iteration_options are any of the options described in censlope_iteration.

Remarks

censlope is part of the somersd package and uses the program somersd,
which calculates confidence intervals for Somers' D and Kendall's tau-a.
Given two random variables Y and X, a 100qth percentile slope of Y with
respect to X is defined as a value of beta satisfying the equation

theta( Y-beta*X , X ) = 1 - 2q

where theta(U,V) represents either D(U|V) (Somers' D) or tau_a(U,V)
(Kendall's tau-a) between the variables U and V.  (For definitions of
Somers' D and Kendall's tau-a, see somersd.) If q=0.5, then the value of
beta is a Theil-Sen median slope.  If in addition X is a binary variable,
with possible values 0 and 1, then the value of beta is a Hodges-Lehmann
median difference between Y values in the subpopulation in which X==1 and
Y values in the subpopulation in which X==0.  An alternative program for
calculating Hodges-Lehmann median (and other percentile) differences is
cendif, which is also distributed as part of the somersd package.

For extreme percentiles and/or very small sample numbers, censlope
sometimes calculates infinite positive upper confidence limits or
infinite negative lower confidence limits.  These are represented by
+/-c(maxdouble), where c(maxdouble) is the c-class value specifying the
largest positive number that can be stored in a double.

censlope can use all the options used by somersd, to use any of the
extended versions of Somers' D or Kendall's tau-a in the definition of
percentile slopes, differences, and ratios.  In particular, we may use
the wstrata() option of somersd to estimate within-stratum median
differences and slopes, based on comparisons between observations between
the same stratum.  This method allows us to estimate median differences
in an outcome variable, associated with an exposure, within strata
defined by grouping a confounder, or by grouping a propensity score for
the exposure based on multiple confounders.  Therefore, rank parameters
(such as median differences) can be adjusted for confounders, just as
regression parameters can be adjusted for confounders.  However,
regression methods are required to define propensity scores.

The program cendif is also part of the somersd package and calculates
confidence intervals for a restricted subset of the parameters estimated
by censlope, assuming a binary X variable and a restricted range of
somersd_options.  cendif does not use an iterative method but instead
calculates all possible differences between Y values in the two groups
defined by the binary X variable.  In large samples, this method is more
time-consuming than the iterative method used by censlope.  However, in
small samples (such as the auto data), cendif can be much faster than
censlope.

Full documentation of the somersd package (including Methods and
Formulas) is provided in the files somersd.pdf, censlope.pdf, and
cendif.pdf, which are distributed with the somersd package as ancillary
files (see net).  They can be viewed using the Adobe Acrobat Reader,

For a comprehensive review of Kendall's tau-a, Somers' D, and median
differences, see Newson (2002).  The definitive reference for the
statistical and computational methods of censlope is Newson (2006).

Examples

. censlope weight length

. censlope weight length, transf(z)

. censlope weight length, transf(z) centile(25(25)75)

. censlope weight foreign

. censlope weight foreign, transf(z)

. censlope weight foreign, transf(z) centile(0(25)100)

The following example estimates percentile weight ratios between non-U.S.
and U.S. cars:

. gene logweight=log(weight)
. censlope logweight foreign, transf(z) centile(0(25)100) eform

The following example uses the wstrata option of somersd to calculate
median differences in fuel efficiency between non-U.S. and U.S. cars in
the same weight quintile.  We find that non-U.S. cars typically travel 2
to 7 more miles per gallon than U.S. cars, but 0 to 4 fewer miles per
gallon than U.S. cars in the same weight quintile:

. xtile weightgp=weight, nquantiles(5)
. tab weightgp foreign
. censlope mpg foreign, transf(z)
. censlope mpg foreign, transf(z) wstrata(weightgp)

The following example creates a scatterplot of car weight in U.S. pounds
against car length in U.S. inches with a straight line through the data
points, whose slope is the median slope and whose intercept is the median
of the variable resi generated by the ystargenerate() option:

. censlope weight length, transf(z) tdist ystargenerate(resi)
. egen intercept=median(resi)
. gene what=weight-resi+intercept
. lab var what "Predicted weight"
. scatter weight length || line what length, sort

The following example uses the estaddr option together with parmby (part
of the parmest package) to produce an output dataset (or results set) in
the memory, with one observation per by-group, and data on confidence
intervals for Somers' D and median slopes.  This dataset is then input to
the eclplot command to produce a confidence interval plot of Somers' D
parameters and a confidence interval plot of median slopes.  The packages

. parmby "censlope weight length, tdist estaddr", by(foreign)
norestore ecol(cimat) rename(ec_1_1 percent ec_1_2 pctlslope
ec_1_3 minimum ec_1_4 maximum)
. list
. eclplot estimate min95 max95 foreign, hori ylabel(0 1)
xtitle("Somers' D (95% CI)")
. eclplot pctlslope minimum maximum foreign, hori ylabel(0 1)
xtitle("Percentile slope (95% CI)")

Saved results

censlope saves the following results in r():

Scalars
r(level)        confidence level
r(fromabs)      value of the fromabs() option
r(tolerance)    value of the tolerance() option

Macros
r(yvar)         name of the Y variable
r(xvar)         name of the X variable
r(eform)        eform if specified
r(centiles)     list of percentages for the percentiles
r(technique)    list of techniques from the technique() option
r(tech_steps)   list of step numbers for the techniques

Matrices
r(cimat)        confidence intervals for percentile differences or ratios
r(rcmat)        return codes for entries of r(cimat)
r(bracketmat)   bracket matrix
r(techstepmat)  column vector of step numbers for the techniques

The matrix r(cimat) has one row per percentile, as well as columns
containing the percentages, percentile estimates, lower and upper
confidence limits (labeled Percent, Pctl_Slope, Minimum, and Maximum if
eform is not specified, or Percent, Pctl_Ratio, Minimum, and Maximum if
eform is specified).  The matrix r(rcmat) has the same numbers of rows
and columns as r(cimat) with the same labels, and the first column
contains the percentages, but the other entries contain return codes for
the estimation of the corresponding entries of r(cimat).  These return
codes are equal to 0 if the beta-value was estimated successfully, 1 if
the corresponding zetastar-value could not be calculated, 2 if the
corresponding zetastar-value could not be bracketed, 3 if the
beta-brackets failed to converge, and 4 if the beta-value could not be
calculated from the converged beta-brackets.  The matrix r(bracketmat) is
the final version of the bracket matrix described in help for the
fromabs() and brackets() options of censlope and has one row per
beta-bracket, as well as two columns, labeled Beta and Zetastar,
containing the beta-brackets and the corresponding zetastar-values.  The
matrix r(techstepmat) is a column vector with one row for each of the
techniques listed in the technique() option, with a row label equal to
the name of the technique and a value equal to the number of steps for
that technique.  The fromabs(), brackets(), tolerance(), and technique()
options are described in iteration_options.

censlope also saves in e() a full set of estimation results for the
somersd command.  If estaddr is specified, this set of estimation results
is expanded by adding a set of e() results with the same names and
contents as the r() results.  This option allows the user to pass a
censlope command to parmby, producing an output dataset (or results set)
with one observation per by-group and data on confidence intervals for
Somers' D and for the median slope.

Author

Roger Newson, Imperial College London, UK.
Email: r.newson@imperial.ac.uk

Reference

Newson, R. 2002. Parameters behind "nonparametric" statistics: Kendall's
tau, Somers' D, and median differences. Stata Journal 2: 45-64.

Newson, R. 2006. Confidence intervals for rank statistics:  Percentile
slopes, differences, and ratios. Stata Journal 6: 497-520.

Also see

Manual: [R] spearman, [R] ranksum, [R] signrank, [R] centile

STB:  STB-52: sg123, STB-55: snp15, STB-57: snp15.1, STB-58: snp15.2,
STB-58: snp16; STB-61: snp15.3; STB-61: snp16.1

Online: ktau, ranksum, signrank
cid, npshift, somersd, cendif, parmest, eclplot (if installed)
```