{smcl} {hline} help for {hi:censlope}{right:(SJ6-4: snp15_7; SJ6-3: snp15_6; SJ5-3: snp15_5; SJ3-3: snp15_4;} {right:STB-61: snp15_3; STB-58: snp15_2; STB-57: snp15)} {hline} {title:Robust confidence intervals for median and other percentile slopes} {p 8 21 2} {cmd:censlope} {it:yvarname} {it:xvarname} {weight} {ifin}{cmd:,} [{cmdab:ce:ntile}{cmd:(}{it:numlist}{cmd:)} {cmdab:ef:orm} {cmdab:ys:targenerate}{cmd:(}{help newvarlist:{it:newvarlist}}{cmd:)} {cmdab:esta:ddr} {help somersd:{it:somersd_options}} {help censlope_iteration:{it:iteration_options}}] {pstd} where {it:yvarname} and {it:xvarname} are variable names. {pstd} {cmd:fweight}s, {cmd:iweight}s, and {cmd:pweight}s are allowed; see {help weight}. They are interpreted as for {helpb somersd}. {pstd} {opt bootstrap}, {opt by}, {opt jackknife}, and {opt statsby} are allowed; see {help prefix}.{p_end} {title:Description} {pstd} {cmd:censlope} calculates confidence intervals for generalized Theil-Sen median slopes and other percentile slopes of a Y variable specified by {it:yvarname} with respect to an X variable specified by {it:xvarname}. These confidence intervals are robust to the possibility that the population distributions of the Y variable, conditional on different values of the X variable, are different in ways other than location. This might happen if, for example, the conditional distributions had different variances. For positive-valued Y variables, {cmd:censlope} can be used to calculate confidence intervals for median per-unit ratios or other percentile per-unit ratios associated with a unit increment in the X variable. If the X variable is binary with values 0 and 1, then the generalized Theil-Sen percentile slopes are the generalized Hodges-Lehmann percentile differences between the group of observations whose X value is 1 and the group of observations whose X value is 0. {cmd:censlope} is part of the {helpb somersd} package and requires the {helpb somersd} program to work. It executes the {helpb somersd} command, {p 8 21 2} {cmd:somersd} {it:xvarname} {it:yvarname} {weight} {ifin} [{cmd:,} {help somersd:{it:somersd_options}}] {pstd} and then estimates the percentile slopes. The estimates and confidence limits for the percentile slopes are evaluated using an {help censlope_iteration:iterative numerical method}, which the user may change from the default by using the {help censlope_iteration:{it:iteration_options}}. {title:Options} {p 4 8 2} {cmd:centile(}{it:numlist}{cmd:)} specifies a list of percentile slopes to be reported and defaults to {cmd:centile(50)} (median only) if not specified. Specifying {cmd:centile(25 50 75)} will produce the 25th, 50th, and 75th percentile differences. {p 4 8 2} {cmd:eform} specifies that exponentiated percentile slopes be given. This option is used if {it:yvarname} specifies the log of a positive-valued variable. In this case, confidence intervals are calculated for percentile ratios or per-unit ratios between values of the original positive variable, instead of for percentile differences or per-unit differences. {p 4 8 2} {cmd:ystargenerate(}{help newvarlist:{it:newvarlist}}{cmd:)} specifies a list of variables to be generated, corresponding to the percentile slopes, containing the differences {hi:Y*(beta)=Y-X*beta}, where {hi:beta} is the percentile slope. The variable names in the {help newvarlist:{it:newvarlist}} are matched to the list of percentiles specified by the {cmd:centile()} option, sorted in ascending order of percent. If the two lists have different lengths, then {cmd:censlope} generates a number {it:nmin} of new variables equal to the minimum length of the two lists, matching the first {it:nmin} percentiles with the first {it:nmin} new variable names. Usually, there is only one percentile slope (the median slope), and one new {cmd:ystargenerate()} variable, whose median can be used as the intercept when drawing a straight line through the data points on a scatterplot. {p 4 8 2} {cmd:estaddr} specifies that the results saved in {cmd:r()} also be saved in {cmd:e()} (see {hi:Saved results} below). This option makes it easier to use {cmd:censlope} with {helpb parmby}, to create an output dataset (or results set) with one observation per by-group and data on confidence intervals for Somers' {it:D} and median slopes. {helpb parmby} is part of the package {helpb parmest}, downloadable from {help ssc:SSC}. {phang} {it:somersd_options} are any of the options available with {helpb somersd}. {phang} {it:iteration_options} are any of the options described in {help censlope_iteration}. {title:Remarks} {pstd} {cmd:censlope} is part of the {helpb somersd} package and uses the program {helpb somersd}, which calculates confidence intervals for Somers' {it:D} and Kendall's tau-a. Given two random variables Y and X, a 100{hi:q}th percentile slope of Y with respect to X is defined as a value of {hi:beta} satisfying the equation {phang} {hi:theta( Y-beta*X , X ) = 1 - 2q} {pstd} where {hi:theta(U,V)} represents either {hi:D(U|V)} (Somers' {it:D}) or {hi:tau_a(U,V)} (Kendall's tau-a) between the variables U and V. (For definitions of Somers' {it:D} and Kendall's tau-a, see {helpb somersd}.) If {hi:q}=0.5, then the value of {hi:beta} is a Theil-Sen median slope. If in addition X is a binary variable, with possible values 0 and 1, then the value of {hi:beta} is a Hodges-Lehmann median difference between Y values in the subpopulation in which X==1 and Y values in the subpopulation in which X==0. An alternative program for calculating Hodges-Lehmann median (and other percentile) differences is {helpb cendif}, which is also distributed as part of the {helpb somersd} package. {pstd} For extreme percentiles and/or very small sample numbers, {cmd:censlope} sometimes calculates infinite positive upper confidence limits or infinite negative lower confidence limits. These are represented by {hi:+/-}{cmd:c(maxdouble)}, where {cmd:c(maxdouble)} is the {help creturn:c-class value} specifying the largest positive number that can be stored in a {help data_types:double}. {pstd} {cmd:censlope} can use all the options used by {helpb somersd}, to use any of the extended versions of Somers' {it:D} or Kendall's tau-a in the definition of percentile slopes, differences, and ratios. In particular, we may use the {cmd:wstrata()} option of {helpb somersd} to estimate within-stratum median differences and slopes, based on comparisons between observations between the same stratum. This method allows us to estimate median differences in an outcome variable, associated with an exposure, within strata defined by grouping a confounder, or by grouping a propensity score for the exposure based on multiple confounders. Therefore, rank parameters (such as median differences) can be adjusted for confounders, just as regression parameters can be adjusted for confounders. However, regression methods are required to define propensity scores. {pstd} The program {helpb cendif} is also part of the {helpb somersd} package and calculates confidence intervals for a restricted subset of the parameters estimated by {cmd:censlope}, assuming a binary X variable and a restricted range of {help somersd:{it:somersd_options}}. {helpb cendif} does not use an iterative method but instead calculates all possible differences between Y values in the two groups defined by the binary X variable. In large samples, this method is more time-consuming than the iterative method used by {cmd:censlope}. However, in small samples (such as the {helpb dta_examples:auto} data), {helpb cendif} can be much faster than {cmd:censlope}. {pstd} Full documentation of the {helpb somersd} package (including Methods and Formulas) is provided in the files {hi:somersd.pdf}, {hi:censlope.pdf}, and {hi:cendif.pdf}, which are distributed with the {helpb somersd} package as ancillary files (see {helpb net}). They can be viewed using the Adobe Acrobat Reader, which can be downloaded from {browse "http://www.adobe.com/products/acrobat/readermain.html":http://www.adobe.com/products/acrobat/readermain.html} {pstd} For a comprehensive review of Kendall's tau-a, Somers' {it:D}, and median differences, see Newson (2002). The definitive reference for the statistical and computational methods of {cmd:censlope} is Newson (2006). {title:Examples} {p 8 12 2}{cmd:. censlope weight length}{p_end} {p 8 12 2}{cmd:. censlope weight length, transf(z)}{p_end} {p 8 12 2}{cmd:. censlope weight length, transf(z) centile(25(25)75)}{p_end} {p 8 12 2}{cmd:. censlope weight foreign}{p_end} {p 8 12 2}{cmd:. censlope weight foreign, transf(z)}{p_end} {p 8 12 2}{cmd:. censlope weight foreign, transf(z) centile(0(25)100)}{p_end} {pstd} The following example estimates percentile weight ratios between non-U.S. and U.S. cars: {p 8 12 2}{cmd:. gene logweight=log(weight)}{p_end} {p 8 12 2}{cmd:. censlope logweight foreign, transf(z) centile(0(25)100) eform}{p_end} {pstd} The following example uses the {cmd:wstrata} option of {helpb somersd} to calculate median differences in fuel efficiency between non-U.S. and U.S. cars in the same weight quintile. We find that non-U.S. cars typically travel 2 to 7 more miles per gallon than U.S. cars, but 0 to 4 fewer miles per gallon than U.S. cars in the same weight quintile: {p 8 12 2}{cmd:. xtile weightgp=weight, nquantiles(5)}{p_end} {p 8 12 2}{cmd:. tab weightgp foreign}{p_end} {p 8 12 2}{cmd:. censlope mpg foreign, transf(z)}{p_end} {p 8 12 2}{cmd:. censlope mpg foreign, transf(z) wstrata(weightgp)}{p_end} {pstd} The following example creates a scatterplot of car weight in U.S. pounds against car length in U.S. inches with a straight line through the data points, whose slope is the median slope and whose intercept is the median of the variable {cmd:resi} generated by the {cmd:ystargenerate()} option: {p 8 12 2}{cmd:. censlope weight length, transf(z) tdist ystargenerate(resi)}{p_end} {p 8 12 2}{cmd:. egen intercept=median(resi)}{p_end} {p 8 12 2}{cmd:. gene what=weight-resi+intercept}{p_end} {p 8 12 2}{cmd:. lab var what "Predicted weight"}{p_end} {p 8 12 2}{cmd:. scatter weight length || line what length, sort}{p_end} {pstd} The following example uses the {cmd:estaddr} option together with {helpb parmby} (part of the {helpb parmest} package) to produce an output dataset (or resultsset) in the memory, with one observation per by-group, and data on confidence intervals for Somers' {it:D} and median slopes. This dataset is then input to the {helpb eclplot} command to produce a confidence interval plot of Somers' {it:D} parameters and a confidence interval plot of median slopes. The packages {helpb parmest} and {helpb eclplot} can be downloaded from {help ssc:SSC}. {p 8 12 2}{cmd:. parmby "censlope weight length, tdist estaddr", by(foreign) norestore ecol(cimat) rename(ec_1_1 percent ec_1_2 pctlslope ec_1_3 minimum ec_1_4 maximum)}{p_end} {p 8 12 2}{cmd:. list}{p_end} {p 8 12 2}{cmd:. eclplot estimate min95 max95 foreign, hori ylabel(0 1) xtitle("Somers' D (95% CI)")}{p_end} {p 8 12 2}{cmd:. eclplot pctlslope minimum maximum foreign, hori ylabel(0 1) xtitle("Percentile slope (95% CI)")}{p_end} {pstd} The following example illustrates the use of the {helpb bootstrap} prefix command to generate bootstrap confidence limits for the median slope, as recommended by Wilcox (1998). Note the use of the {cmd:nolimits} option, described in the help for {help censlope_iteration:{it:iteration_options}}. This approximately halves the computation time used by the bootstrap, because no confidence limits are calculated for the individual bootstrap subsamples. {p 8 12 2}{cmd:.set seed 987654321}{p_end} {p 8 12 2}{cmd:.bootstrap medslope=el(r(cimat),1,2), reps(399): censlope weight length, nolimits}{p_end} {p 8 12 2}{cmd:.estat bootstrap, all}{p_end} {title:Saved results} {pstd} {cmd:censlope} saves the following results in {cmd:r()}: {p2colset 5 21 25 2}{...} {p2col:Scalars}{p_end} {p2col:{cmd:r(level)}}confidence level{p_end} {p2col:{cmd:r(fromabs)}}value of the {cmd:fromabs()} option{p_end} {p2col:{cmd:r(tolerance)}}value of the {cmd:tolerance()} option{p_end} {p2col:Macros}{p_end} {p2col:{cmd:r(yvar)}}name of the Y variable{p_end} {p2col:{cmd:r(xvar)}}name of the X variable{p_end} {p2col:{cmd:r(eform)}}{cmd:eform} if specified{p_end} {p2col:{cmd:r(centiles)}}list of percentages for the percentiles{p_end} {p2col:{cmd:r(technique)}}list of techniques from the {cmd:technique()} option{p_end} {p2col:{cmd:r(tech_steps)}}list of step numbers for the techniques{p_end} {p2col:Matrices}{p_end} {p2col:{cmd:r(cimat)}}confidence intervals for percentile differences or ratios{p_end} {p2col:{cmd:r(rcmat)}}return codes for entries of {cmd:r(cimat)}{p_end} {p2col:{cmd:r(bracketmat)}}bracket matrix{p_end} {p2col:{cmd:r(techstepmat)}}column vector of step numbers for the techniques{p_end} {p2colreset}{...} {pstd} The matrix {cmd:r(cimat)} has one row per percentile, as well as columns containing the percentages, percentile estimates, lower and upper confidence limits (labeled {hi:Percent}, {hi:Pctl_Slope}, {hi:Minimum}, and {hi:Maximum} if {cmd:eform} is not specified, or {hi:Percent}, {hi:Pctl_Ratio}, {hi:Minimum}, and {hi:Maximum} if {cmd:eform} is specified). The matrix {cmd:r(rcmat)} has the same numbers of rows and columns as {cmd:r(cimat)} with the same labels, and the first column contains the percentages, but the other entries contain return codes for the estimation of the corresponding entries of {cmd:r(cimat)}. These return codes are equal to 0 if the beta-value was estimated successfully (or not requested by the user), 1 if the corresponding zetastar-value could not be calculated, 2 if the corresponding zetastar-value could not be bracketed, 3 if the beta-brackets failed to converge, and 4 if the beta-value could not be calculated from the converged beta-brackets. The matrix {cmd:r(bracketmat)} is the final version of the bracket matrix described in help for the {help censlope_iteration:fromabs() and brackets() options} of {cmd:censlope} and has one row per beta-bracket, as well as two columns, labeled {hi:Beta} and {hi:Zetastar}, containing the beta-brackets and the corresponding zetastar-values. The matrix {cmd:r(techstepmat)} is a column vector with one row for each of the techniques listed in the {help censlope_iteration:technique() option}, with a row label equal to the name of the technique and a value equal to the number of steps for that technique. The {cmd:fromabs()}, {cmd:brackets()}, {cmd:tolerance()}, and {cmd:technique()} options are described in {help censlope_iteration:{it:iteration_options}}. {pstd} {cmd:censlope} also saves in {cmd:e()} a full set of {help ereturn:estimation results} for the {helpb somersd} command. If {cmd:estaddr} is specified, this set of estimation results is expanded by adding a set of {cmd:e()} results with the same names and contents as the {cmd:r()} results. This option allows the user to pass a {cmd:censlope} command to {helpb parmest:parmby}, producing an output dataset (or results set) with one observation per by-group and data on confidence intervals for Somers' D and for the median slope. {title:Author} {pstd} Roger Newson, Imperial College London, UK.{break} Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk} {title:References} {phang} Newson, R. 2002. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' {it:D} and median differences. {it:Stata Journal} 2: 45-64. Download from {browse "http://www.stata-journal.com/article.html?article=st0007":the {it:Stata Journal} website}. {phang} Newson, R. 2006. Confidence intervals for rank statistics: Percentile slopes, differences, and ratios. {it:Stata Journal} 6: 497-520. Download from {browse "http://www.stata-journal.com/article.html?article=snp15_7":the {it:Stata Journal} website}. {phang} Wilcox, R. R. 1998. A note on the Theil-Sen regression estimator when the regressor is random and the error term is heteroscedastic. {it:Biometrical Journal} 40: 261-268. {title:Also see} {psee} Manual: {hi:[R] spearman}, {hi:[R] ranksum}, {hi:[R] signrank}, {hi:[R] centile} {p_end} {psee} STB: STB-52: sg123, STB-55: snp15, STB-57: snp15.1, STB-58: snp15.2, STB-58: snp16; STB-61: snp15.3; STB-61: snp16.1{p_end} {psee} Online: {helpb ktau}, {helpb ranksum}, {helpb signrank}{break} {helpb cid}, {helpb npshift}, {helpb somersd}, {helpb cendif}, {helpb parmest}, {helpb eclplot} (if installed) {p_end}