-------------------------------------------------------------------------------
help for censlope    (SJ6-4: snp15_7; SJ6-3: snp15_6; SJ5-3: snp15_5; SJ3-3: snp15_4; STB-61: snp15_3; STB-58: snp15_2; STB-57: snp15)
-------------------------------------------------------------------------------

Robust confidence intervals for median and other percentile slopes

censlope yvarname xvarname [weight] [if] [in] [, centile(numlist) eform ystargenerate(newvarlist) estaddr somersd_options iteration_options]

where yvarname and xvarname are variable names.

fweights, iweights, and pweights are allowed; see weight. They are interpreted as for somersd.

bootstrap, by, jackknife, and statsby are allowed; see prefix.

Description

censlope calculates confidence intervals for generalized Theil-Sen median slopes and other percentile slopes of a Y variable specified by yvarname with respect to an X variable specified by xvarname. These confidence intervals are robust to the possibility that the population distributions of the Y variable, conditional on different values of the X variable, differ in ways other than location. This might happen if, for example, the conditional distributions had different variances. For positive-valued Y variables, censlope can be used (with the log of Y and the eform option) to calculate confidence intervals for median per-unit ratios or other percentile per-unit ratios associated with a unit increment in the X variable. If the X variable is binary with values 0 and 1, then the generalized Theil-Sen percentile slopes are the generalized Hodges-Lehmann percentile differences between the group of observations whose X value is 1 and the group of observations whose X value is 0.

censlope is part of the somersd package and requires the somersd program to work. It executes the somersd command

    somersd xvarname yvarname [weight] [if] [in] [, somersd_options]

and then estimates the percentile slopes. The estimates and confidence limits for the percentile slopes are evaluated using an iterative numerical method, which the user may change from the default by using the iteration_options.
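The quantities described above can be illustrated informally. The following Python sketch does not reproduce censlope's own algorithm, which works through somersd and an iterative search. It assumes unweighted data and takes the median slope to be the ordinary median of all pairwise slopes between observations with different X values. With a binary X coded 0/1, every such pairwise slope is a difference between a Y value with X==1 and a Y value with X==0, so the same function returns the Hodges-Lehmann median difference. The function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def pairwise_slopes(y, x):
    """Slopes (y_j - y_i) / (x_j - x_i) for all pairs i < j with x_i != x_j."""
    return [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]
    ]


def theil_sen_median_slope(y, x):
    """Unweighted Theil-Sen median slope: the median of the pairwise slopes."""
    return median(pairwise_slopes(y, x))


def hodges_lehmann_median_difference(y1, y0):
    """Median of all differences between a value in group 1 and a value in group 0."""
    return median([a - b for a in y1 for b in y0])


# Continuous X: the six pairwise slopes are 2, 1.5, 7/3, 1, 2.5 and 4.
print(theil_sen_median_slope([1, 3, 4, 8], [1, 2, 3, 4]))

# Binary X: the Theil-Sen median slope equals the Hodges-Lehmann median difference.
y = [3.0, 5.0, 4.0, 9.0, 7.0]
x = [0, 0, 0, 1, 1]
print(theil_sen_median_slope(y, x),
      hodges_lehmann_median_difference([9.0, 7.0], [3.0, 5.0, 4.0]))
```

Ties, weights, and the choice between Somers' D and Kendall's tau-a are all handled by somersd in censlope and are ignored in this sketch.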

Options

centile(numlist) specifies a list of percentile slopes to be reported and defaults to centile(50) (the median only) if not specified. Specifying centile(25 50 75) will produce the 25th, 50th, and 75th percentile slopes.

eform specifies that exponentiated percentile slopes be given. This option is used if yvarname specifies the log of a positive-valued variable. In this case, confidence intervals are calculated for percentile ratios or per-unit ratios between values of the original positive variable, instead of for percentile differences or per-unit differences.
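A small Python sketch shows why eform is applied to the log of a positive variable. It assumes unweighted data and a binary X. Because exp() is strictly increasing, exponentiating the median of the pairwise differences in logs gives the median of the pairwise ratios whenever the number of pairs is odd. With an even number of pairs, the two conventional medians can differ: averaging the middle pair of log-differences corresponds to the geometric, not the arithmetic, mean of the middle pair of ratios. The function names are hypothetical.

```python
import math
from statistics import median


def median_ratio_via_logs(y1, y0):
    """Exponentiated median of log(a) - log(b) over all pairs (a in group 1, b in group 0)."""
    return math.exp(median([math.log(a) - math.log(b) for a in y1 for b in y0]))


def median_ratio_direct(y1, y0):
    """Median of a / b over the same pairs."""
    return median([a / b for a in y1 for b in y0])


y0 = [1.0, 2.0, 4.0]   # positive Y values with X == 0
y1 = [3.0, 6.0, 12.0]  # positive Y values with X == 1 (3 x 3 = 9 pairs, an odd number)
print(median_ratio_via_logs(y1, y0), median_ratio_direct(y1, y0))
```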

ystargenerate(newvarlist) specifies a list of variables to be generated, corresponding to the percentile slopes, containing the differences Y*(beta) = Y - X*beta, where beta is the percentile slope. The variable names in the newvarlist are matched to the list of percentiles specified by the centile() option, sorted in ascending order of percent. If the two lists have different lengths, then censlope generates nmin new variables, where nmin is the smaller of the two lengths, and matches the first nmin percentiles with the first nmin new variable names. Usually, there is only one percentile slope (the median slope) and one new ystargenerate() variable, whose median can be used as the intercept when drawing a straight line through the data points on a scatterplot.
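The following Python sketch shows what ystargenerate() stores and how its median is used to draw a line. It assumes that the slope beta has already been estimated (here it is simply supplied) and follows the scatterplot example under Examples: compute Y*(beta) = Y - X*beta, take its median as the intercept, and form the fitted values intercept + beta*X. The function name median_line is hypothetical.

```python
from statistics import median


def median_line(y, x, beta):
    """Return Y*(beta) = Y - X*beta, its median (the intercept), and the fitted line."""
    ystar = [yi - xi * beta for yi, xi in zip(y, x)]
    intercept = median(ystar)
    fitted = [intercept + beta * xi for xi in x]
    return ystar, intercept, fitted


ystar, intercept, fitted = median_line([1, 3, 4, 8], [1, 2, 3, 4], beta=2)
print(ystar, intercept, fitted)
```

In the Stata example, what = weight - resi + intercept simplifies to intercept + beta*length, which is the same fitted line.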

estaddr specifies that the results saved in r() also be saved in e() (see Saved results below). This option makes it easier to use censlope with parmby, to create an output dataset (or results set) with one observation per by-group and data on confidence intervals for Somers' D and median slopes. parmby is part of the package parmest, downloadable from SSC.

somersd_options are any of the options available with somersd.

iteration_options are any of the options described in censlope_iteration.

Remarks

censlope is part of the somersd package and uses the program somersd, which calculates confidence intervals for Somers' D and Kendall's tau-a. Given two random variables Y and X, a 100qth percentile slope of Y with respect to X is defined as a value of beta satisfying the equation

    theta(Y - beta*X, X) = 1 - 2q

where theta(U,V) represents either D(U|V) (Somers' D) or tau_a(U,V) (Kendall's tau-a) between the variables U and V. (For definitions of Somers' D and Kendall's tau-a, see somersd.) If q = 0.5, then the value of beta is a Theil-Sen median slope. If, in addition, X is a binary variable with possible values 0 and 1, then the value of beta is a Hodges-Lehmann median difference between Y values in the subpopulation in which X==1 and Y values in the subpopulation in which X==0. An alternative program for calculating Hodges-Lehmann median (and other percentile) differences is cendif, which is also distributed as part of the somersd package.

For extreme percentiles and/or very small sample sizes, censlope sometimes calculates infinite positive upper confidence limits or infinite negative lower confidence limits. These are represented by +/-c(maxdouble), where c(maxdouble) is the c-class value specifying the largest positive number that can be stored in a double.
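The defining equation theta(Y - beta*X, X) = 1 - 2q can be solved directly in a small example. The Python sketch below assumes unweighted data, takes theta to be Somers' D(U|V), and finds the solution by plain bisection. This is not censlope's iterative method, which is described in censlope_iteration. D(Y - beta*X | X) is a non-increasing step function of beta, so the equation may hold on a whole interval of beta values or be crossed at a jump. The sketch returns the midpoint of the left and right ends of the crossing, a convention assumed here rather than taken from this help file. It is valid only for 0 < q < 1, and all names are hypothetical.

```python
from itertools import combinations


def sign(z):
    return (z > 0) - (z < 0)


def somers_d(u, v):
    """Unweighted Somers' D(U|V): mean of sign(u_i-u_j)*sign(v_i-v_j) over pairs with v_i != v_j."""
    num = den = 0
    for i, j in combinations(range(len(u)), 2):
        sv = sign(v[i] - v[j])
        if sv != 0:
            num += sign(u[i] - u[j]) * sv
            den += 1
    return num / den


def _boundary(pred, lo, hi, tol):
    """Bisection for the point where pred switches from True (at lo) to False (at hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def percentile_slope(y, x, q, tol=1e-10):
    """Solve D(Y - beta*X | X) = 1 - 2q for beta (0 < q < 1) by bisection."""
    target = 1 - 2 * q
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[i] != x[j]]
    lo, hi = min(slopes) - 1.0, max(slopes) + 1.0  # D = 1 at lo and D = -1 at hi

    def d(beta):
        return somers_d([yi - beta * xi for yi, xi in zip(y, x)], x)

    left = _boundary(lambda b: d(b) > target, lo, hi, tol)    # left end of the crossing
    right = _boundary(lambda b: d(b) >= target, lo, hi, tol)  # right end of the crossing
    return (left + right) / 2


y, x = [1, 3, 4, 8], [1, 2, 3, 4]
print(percentile_slope(y, x, 0.5), percentile_slope(y, x, 0.25))
```

For these data the pairwise slopes are 1, 1.5, 2, 7/3, 2.5 and 4. The median slope comes out as (2 + 7/3)/2 and the 25th percentile slope as 1.5, which illustrates how q selects a percentile of the pairwise slopes.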

censlope can use all the options used by somersd, so any of the extended versions of Somers' D or Kendall's tau-a can be used in the definition of percentile slopes, differences, and ratios. In particular, we may use the wstrata() option of somersd to estimate within-stratum median differences and slopes, based only on comparisons between observations in the same stratum. This method allows us to estimate median differences in an outcome variable, associated with an exposure, within strata defined by grouping a confounder, or by grouping a propensity score for the exposure based on multiple confounders. Therefore, rank parameters (such as median differences) can be adjusted for confounders, just as regression parameters can be adjusted for confounders. However, regression methods are required to define propensity scores.

The program cendif is also part of the somersd package and calculates confidence intervals for a restricted subset of the parameters estimated by censlope, assuming a binary X variable and a restricted range of somersd_options. cendif does not use an iterative method but instead calculates all possible differences between Y values in the two groups defined by the binary X variable. In large samples, this method is more time-consuming than the iterative method used by censlope. However, in small samples (such as the auto data), cendif can be much faster than censlope.

Full documentation of the somersd package (including Methods and Formulas) is provided in the files somersd.pdf, censlope.pdf, and cendif.pdf, which are distributed with the somersd package as ancillary files (see net). They can be viewed using the Adobe Acrobat Reader, which can be downloaded from http://www.adobe.com/products/acrobat/readermain.html
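A small Python sketch can illustrate two ideas described above: enumerating every difference between a Y value in the X==1 group and a Y value in the X==0 group (the direct approach described for cendif), and restricting those comparisons to observations in the same stratum (the idea behind the wstrata() option of somersd). The sketch assumes unweighted data and reports only the ordinary median of the enumerated differences, without confidence limits. It uses made-up toy numbers, not the auto data, in which the X==1 group is concentrated in the stratum with higher Y values. The function name is hypothetical.

```python
from statistics import median


def median_difference(y, x, strata=None):
    """Median of all differences y_i - y_j with x_i == 1 and x_j == 0.

    If strata is given, only pairs in the same stratum are compared
    (the idea behind within-stratum comparisons with wstrata()).
    """
    if strata is None:
        strata = [0] * len(y)  # a single stratum: every between-group pair is compared
    diffs = [
        y[i] - y[j]
        for i in range(len(y)) if x[i] == 1
        for j in range(len(y)) if x[j] == 0 and strata[i] == strata[j]
    ]
    return median(diffs)


# Toy data: stratum 1 has high Y values and mostly X == 1; stratum 2 the opposite.
y = [30, 31, 32, 33, 20, 21, 22, 23]
x = [1, 1, 1, 0, 1, 0, 0, 0]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
print(median_difference(y, x))           # all 16 between-group differences
print(median_difference(y, x, stratum))  # only the 6 within-stratum differences
```

In this toy example, the pooled median difference is positive (7.5), but the within-stratum median difference is negative (-2.0).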

For a comprehensive review of Kendall's tau-a, Somers' D, and median differences, see Newson (2002). The definitive reference for the statistical and computational methods of censlope is Newson (2006).

Examples

. censlope weight length

. censlope weight length, transf(z)

. censlope weight length, transf(z) centile(25(25)75)

. censlope weight foreign

. censlope weight foreign, transf(z)

. censlope weight foreign, transf(z) centile(0(25)100)

The following example estimates percentile weight ratios between non-U.S. and U.S. cars:

. gene logweight=log(weight)
. censlope logweight foreign, transf(z) centile(0(25)100) eform

The following example uses the wstrata() option of somersd to calculate median differences in fuel efficiency between non-U.S. and U.S. cars in the same weight quintile. We find that non-U.S. cars typically travel 2 to 7 more miles per gallon than U.S. cars, but 0 to 4 fewer miles per gallon than U.S. cars in the same weight quintile:

. xtile weightgp=weight, nquantiles(5)
. tab weightgp foreign
. censlope mpg foreign, transf(z)
. censlope mpg foreign, transf(z) wstrata(weightgp)

The following example creates a scatterplot of car weight in U.S. pounds against car length in U.S. inches with a straight line through the data points, whose slope is the median slope and whose intercept is the median of the variable resi generated by the ystargenerate() option:

. censlope weight length, transf(z) tdist ystargenerate(resi)
. egen intercept=median(resi)
. gene what=weight-resi+intercept
. lab var what "Predicted weight"
. scatter weight length || line what length, sort

The following example uses the estaddr option together with parmby (part of the parmest package) to produce an output dataset (or results set) in memory, with one observation per by-group and data on confidence intervals for Somers' D and median slopes. This dataset is then input to the eclplot command to produce a confidence interval plot of Somers' D parameters and a confidence interval plot of median slopes. The packages parmest and eclplot can be downloaded from SSC.

. parmby "censlope weight length, tdist estaddr", by(foreign) norestore ecol(cimat) rename(ec_1_1 percent ec_1_2 pctlslope ec_1_3 minimum ec_1_4 maximum)
. list
. eclplot estimate min95 max95 foreign, hori ylabel(0 1) xtitle("Somers' D (95% CI)")
. eclplot pctlslope minimum maximum foreign, hori ylabel(0 1) xtitle("Percentile slope (95% CI)")

Saved results

censlope saves the following results in r():

Scalars
    r(level)          confidence level
    r(fromabs)        value of the fromabs() option
    r(tolerance)      value of the tolerance() option

Macros
    r(yvar)           name of the Y variable
    r(xvar)           name of the X variable
    r(eform)          eform if specified
    r(centiles)       list of percentages for the percentiles
    r(technique)      list of techniques from the technique() option
    r(tech_steps)     list of step numbers for the techniques

Matrices
    r(cimat)          confidence intervals for percentile differences or ratios
    r(rcmat)          return codes for entries of r(cimat)
    r(bracketmat)     bracket matrix
    r(techstepmat)    column vector of step numbers for the techniques

The matrix r(cimat) has one row per percentile and four columns, containing the percentages, the percentile estimates, and the lower and upper confidence limits. These columns are labeled Percent, Pctl_Slope, Minimum, and Maximum if eform is not specified, or Percent, Pctl_Ratio, Minimum, and Maximum if eform is specified. The matrix r(rcmat) has the same numbers of rows and columns as r(cimat), with the same labels. Its first column contains the percentages, but its other entries contain return codes for the estimation of the corresponding entries of r(cimat). These return codes are 0 if the beta-value was estimated successfully, 1 if the corresponding zetastar-value could not be calculated, 2 if the corresponding zetastar-value could not be bracketed, 3 if the beta-brackets failed to converge, and 4 if the beta-value could not be calculated from the converged beta-brackets. The matrix r(bracketmat) is the final version of the bracket matrix described in the help for the fromabs() and brackets() options of censlope. It has one row per beta-bracket and two columns, labeled Beta and Zetastar, containing the beta-brackets and the corresponding zetastar-values. The matrix r(techstepmat) is a column vector with one row for each of the techniques listed in the technique() option, with a row label equal to the name of the technique and a value equal to the number of steps for that technique. The fromabs(), brackets(), tolerance(), and technique() options are described in censlope_iteration.
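To make the layout of r(rcmat) concrete, here is a hypothetical Python helper that decodes the return codes listed above. It assumes that the matrix has already been copied into Python as a list of rows [percent, rc_estimate, rc_lower, rc_upper]; how to export it is not covered here. The column names follow the labels used when eform is not specified. The helper name and the sample rows are made up for illustration.

```python
# Meanings of the return codes stored in r(rcmat), as listed in this help file.
RC_MEANING = {
    0: "beta-value estimated successfully",
    1: "zetastar-value could not be calculated",
    2: "zetastar-value could not be bracketed",
    3: "beta-brackets failed to converge",
    4: "beta-value could not be calculated from the converged beta-brackets",
}

# Labels of the columns after Percent (without eform; with eform, Pctl_Ratio replaces Pctl_Slope).
COLUMNS = ("Pctl_Slope", "Minimum", "Maximum")


def failed_entries(rcmat):
    """Return (percent, column, meaning) for every nonzero return code in rcmat."""
    problems = []
    for percent, *codes in rcmat:
        for column, rc in zip(COLUMNS, codes):
            if rc != 0:
                problems.append((percent, column, RC_MEANING.get(int(rc), "unknown code")))
    return problems


# Hypothetical rows: the upper limit for the 100th percentile could not be bracketed.
example = [[50, 0, 0, 0], [100, 0, 0, 2]]
print(failed_entries(example))
```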

censlope also saves in e() a full set of estimation results for the somersd command. If estaddr is specified, this set of estimation results is expanded by adding a set of e() results with the same names and contents as the r() results. This option allows the user to pass a censlope command to parmby, producing an output dataset (or results set) with one observation per by-group and data on confidence intervals for Somers' D and for the median slope.

Author

Roger Newson, Imperial College London, UK.
Email: r.newson@imperial.ac.uk

References

Newson, R. 2002. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D, and median differences. Stata Journal 2: 45-64.

Newson, R. 2006. Confidence intervals for rank statistics: Percentile slopes, differences, and ratios. Stata Journal 6: 497-520.

Also see

Manual: [R] spearman, [R] ranksum, [R] signrank, [R] centile

STB: STB-52: sg123; STB-55: snp15; STB-57: snp15.1; STB-58: snp15.2; STB-58: snp16; STB-61: snp15.3; STB-61: snp16.1

Online: ktau, ranksum, signrank, cid, npshift, somersd, cendif, parmest, eclplot (if installed)