{smcl} {...} {hline} help for {hi:somersd}{right:(SJ6-3: snp15_6; SJ5-3: snp15_5; SJ3-3: snp15_4;} {right:STB-61: snp15_3; STB-58: snp15_2; STB-57: snp15)} {hline} {title:Somers' {it:D} or Kendall's tau-a with confidence intervals} {p 8 21 2} {cmd:somersd} {varlist} {weight} {ifin} [{cmd:,} {cmdab:ta:ua} {cmdab:tr:ansf}{cmd:(}{it:transformation_name}{cmd:)} {cmdab:td:ist} {cmdab:ce:nind}{cmd:(}{it:cenind_list}{cmd:)} {cmdab:cl:uster}{cmd:(}{it:varname}{cmd:)} {cmdab:cfw:eight}{cmd:(}{it:expression}{cmd:)} {cmdab:fu:ntype}{cmd:(}{it:functional_type}{cmd:)} {cmdab:ws:trata}{cmd:(}{it:varlist}{cmd:)} {cmdab:bs:trata}{cmd:(}{it:varlist} | {cmd:_n}{cmd:)} {cmdab::no}{cmdab:tre:e} {cmdab:l:evel}{cmd:(}{it:#}{cmd:)} {cmdab:ci:matrix}{cmd:(}{it:new_matrix}{cmd:)} ] {pstd} where {it:transformation_name} is one of {p 8 21 2} {cmd:iden} | {cmd:z} | {cmd:asin} | {cmd:rho} | {cmd:zrho} | {cmd:c} {pstd} and {it:functional_type} is one of {p 8 21 2} {cmdab:w:cluster} | {cmdab:b:cluster} | {cmdab:v:onmises} {pstd} and {it:cenind_list} is a list of variable names and/or zeros. {pstd} {cmd:fweight}s, {cmd:iweight}s, and {cmd:pweight}s are allowed; see {help weight}. {pstd} {opt bootstrap}, {opt by}, {opt jackknife}, {opt statsby}, {opt mi estimate}, {opt svy jackknife}, {opt svy bootstrap}, {opt svy brr} and {opt svy sdr} are allowed; see {helpb prefix}.{p_end} {title:Description} {pstd} {cmd:somersd} computes confidence intervals for a wide range of rank statistics. It includes 3 component modules, each with a .pdf manual, which is distributed with the {cmd:somersd} package as an ancillary file. The modules are as follows: {p2colset 4 28 32 2}{...} {p2col:Module{space 4}File}Calculates confidence intervals for{p_end} {p2line} {p2col:{cmd:somersd}{space 3}{hi:somersd.pdf}}Kendall's tau-a and Somers' D{p_end} {p2col:{helpb censlope}{space 2}{hi:censlope.pdf}}Theil-Sen median (and other percentile) slopes{p_end} {p2col:{helpb cendif}{space 4}{hi:cendif.pdf}}Hodges-Lehmann median (and other percentile) differences{p_end} {p2colreset}{...} {pstd} The modules {helpb censlope} and {helpb cendif} require the module {cmd:somersd} in order to work and use a lot of the same options. {pstd} The module {cmd:somersd} calculates values of Somers' {it:D} or Kendall's tau-a for the first variable of {it:varlist} as a predictor of each of the other variables in {it:varlist}, with estimates and jackknife variances and confidence intervals output as if for the parameters of a maximum likelihood fit. It is possible to use {helpb lincom} to output confidence limits for differences between the population Somers' {it:D} or tau-a values. {title:Options for use with somersd} {p 4 8 2} {cmd:taua} causes {cmd:somersd} to calculate Kendall's tau-a. If {cmd:taua} is absent, then Somers' {it:D} is calculated. {p 4 8 2} {cmd:transf(}{it:transformation_name}{cmd:)} specifies that the estimates are to be transformed, defining a confidence level for the transformed population value. {cmd:iden} (identity or untransformed) is the default. {cmd:z} specifies Fisher's z (the hyperbolic arctangent), {cmd:asin} specifies Daniels' arcsine, {cmd:rho} specifies Greiner's rho (Pearson correlation estimated using Greiner's relation), {cmd:zrho} specifies the {cmd:z}-transform of Greiner's rho, and {cmd:c} specifies Harrell's c. If the first variable of the {it:varlist} is a binary indicator of a disease and the other variables are quantitative predictors for that disease, then Harrell's c is the area under the receiver operating characteristic (ROC) curve. {cmd:somersd} recognizes the transformation names {cmd:arctanh} and {cmd:atanh} as synonyms for {cmd:z}, {cmd:arcsin} and {cmd:arsin} as synonyms for {cmd:asin}, {cmd:sinph} as a synonym for {cmd:rho}, {cmd:zsinph} as a synonym for {cmd:zrho}, and {cmd:roc} and {cmd:auroc} as synonyms for {cmd:c}. It also recognizes unambiguous abbreviations for transformation names, such as {cmd:id} for {cmd:iden} or {cmd:aur} for {cmd:auroc}. The transformations are calculated using a {help somersd_mata:Mata function}. {p 4 8 2} {cmd:tdist} specifies that the estimates are assumed to have a t distribution with {hi:N-1} degrees of freedom, where {hi:N} is the number of clusters if {cmd:cluster()} is specified, or the number of observations if {cmd:cluster()} is not specified. If {cmd:tdist} is not specified, then the standardized Somers' {it:D} estimates are assumed to be sampled from a standard Normal distribution. Simulation study data suggest that the {cmd:tdist} option should be recommended. {p 4 8 2} {cmd:cenind(}{it:cenind_list}{cmd:)} specifies a list of left- or right-censorship indicators, corresponding to the variables mentioned in the {it:varlist}. Each censorship indicator is either a variable name or a zero. If the censorship indicator corresponding to a variable is the name of a second variable, then this second variable is used to indicate the censorship status of the first variable, which is assumed to be left-censored (at or below its stated value) in observations in which the second variable is negative, right-censored (at or above its stated value) in observations in which the second variable is positive, and uncensored (equal to its stated value) in observations in which the second variable is zero. If the censorship indicator corresponding to a variable is a zero, then the variable is assumed to be uncensored. If {cmd:cenind()} is unspecified, then all variables in the {cmd:varlist} are assumed to be uncensored. If the list of censorship indicators specified by {cmd:cenind()} is shorter than the list of variables specified in the {it:varlist}, then the list of censorship indicators is completed with the required number of zeros on the right. {p 4 8 2} {cmd:cluster(}{it:varname}{cmd:)} specifies the variable which defines sampling clusters. If {cmd:cluster()} is specified, then the variances and confidence limits are calculated assuming that the data represent a sample of clusters from a population of clusters, rather than a sample of observations from a population of observations. {p 4 8 2} {cmd:cfweight(}{it:expression}{cmd:)} specifies an expression giving the cluster frequency weights. These cluster frequency weights must have the same value for all observations in a cluster. If {cmd:cfweight()} and {cmd:cluster()} are both specified, then each cluster in the dataset is assumed to represent a number of identical clusters equal to the cluster frequency weight for that cluster. If {cmd:cfweight()} is specified and {cmd:cluster()} is unspecified, then each observation in the dataset is treated as a cluster, and assumed to represent a number of identical one-observation clusters equal to the cluster frequency weight. For more details on the interpretation of weights, see {hi:Interpretation of weights} below. {p 4 8 2} {cmd:funtype(}{it:functional_type}{cmd:)} specifies whether the Somers' {it:D} or Kendall's tau-a functionals estimated are between-cluster, within-cluster or Von Mises functionals. These three functional types are specified by the options {cmd:funtype(bcluster)}, {cmd:funtype(wcluster)} or {cmd:funtype(vonmises)}, respectively. If {cmd:funtype()} is not specified, then {cmd:funtype(bcluster)} is assumed, and between-cluster functionals are estimated. The within-cluster Somers' {it:D} is a generalization of the confidence interval corresponding to the {help signrank:sign test}. The Gini coefficient is a special case of the clustered Von Mises Somers' {it:D}. For further details, see the manual {hi:somersd.pdf}, distributed with {cmd:somersd} as an ancillary file. {p 4 8 2} {cmd:wstrata(}{it:varlist}{cmd:)} specifies a list of variables whose value combinations are the W strata. If {cmd:wstrata()} is specified, then {cmd:somersd} estimates stratified Somers' {it:D} or Kendall's tau-a parameters, applying only to pairs of observations within the same W stratum. These parameters can be used to measure associations within strata, such as associations between an outcome and an exposure within groups defined by values of a confounder, or by values of a propensity score based on multiple confounders. {p 4 8 2} {cmd:bstrata(}{it:varlist} | {cmd:_n}{cmd:)} specifies the B strata. If {cmd:bstrata()} is specified, then {cmd:somersd} estimates Somers' {it:D} or Kendall's tau-a parameters specific to pairs of observations from different B strata. These B strata are either combinations of values of a list of variables (if {it:varlist} is specified) or the individual observations (if {cmd:_n} is specified). B strata will not often be required. However, if we are estimating the within-cluster Kendall's tau-a (using the options {cmd:taua funtype(wcluster)}), then the additional option {cmd:bstrata(_n)} will ensure that the within-cluster Kendall's tau-a can take the whole range of values from -1 (in the case of complete discordance within clusters) to +1 (in the case of complete concordance within clusters). {p 4 8 2} {cmd:notree} specifies that {cmd:somersd} does not use the default {help somersd_mata:search tree algorithm} based on Newson (2006a), but instead uses a trivial algorithm, which compares every pair of observations and requires much more time with large datasets. This option is rarely used except to compare performance. Both algorithms are implemented in {help mata:Mata}, using a set of {help somersd_mata:Mata functions}, whose source code is distributed with the {cmd:somersd} package. {p 4 8 2} {cmd:level(}{it:#}{cmd:)} specifies the confidence level, as a percentage, for confidence intervals of the estimates; see {helpb level}. {p 4 8 2} {cmd:cimatrix(}{it:new_matrix}{cmd:)} specifies an output matrix to be created, containing estimates and confidence limits for the untransformed Somers' {it:D}, Kendall's tau-a or Greiner's rho parameters. If {cmd:transf()} is specified, then the confidence limits will be asymmetric and based on symmetric confidence limits for the transformed parameters. This option (like {cmd:level()} may be used in replay mode as well as in nonreplay mode. {title:Remarks} {pstd} For uncensored variables X and Y, Kendall's tau-a is defined as {phang2} {hi:tau_a(X,Y) = E[sign(X1-X2)*sign(Y1-Y2)]} {pstd} where (X1,Y1) and (X2,Y2) are sampled from the bivariate distribution of X and Y. In the case of censored variables X and Y, with censorship indicators R and S, respectively, which are negative for left-censorship, positive for right-censorship, and zero for noncensorship, we define Kendall's tau-a as {phang2} {hi:tau_a(X,Y) = E[csign(X1,R1,X2,R2)*csign(Y1,S1,Y2,S2)]} {pstd} where the function {phang2} {hi:csign(U,P,V,Q)} {pstd} is defined as 1 if U>V and P>=0>=Q, -1 if U