{smcl}
{...}
{hline}
help for {hi:somersd}{right:(SJ6-3: snp15_6; SJ5-3: snp15_5; SJ3-3: snp15_4;}
{right:STB-61: snp15_3; STB-58: snp15_2; STB-57: snp15)}
{hline}
{title:Somers' {it:D} or Kendall's tau-a with confidence intervals}
{p 8 21 2}
{cmd:somersd} {varlist} {weight} {ifin}
[{cmd:,} {cmdab:ta:ua} {cmdab:tr:ansf}{cmd:(}{it:transformation_name}{cmd:)} {cmdab:td:ist}
{cmdab:ce:nind}{cmd:(}{it:cenind_list}{cmd:)}
{cmdab:cl:uster}{cmd:(}{it:varname}{cmd:)}
{cmdab:cfw:eight}{cmd:(}{it:expression}{cmd:)}
{cmdab:fu:ntype}{cmd:(}{it:functional_type}{cmd:)}
{cmdab:ws:trata}{cmd:(}{it:varlist}{cmd:)}
{cmdab:bs:trata}{cmd:(}{it:varlist} | {cmd:_n}{cmd:)}
{cmdab::no}{cmdab:tre:e}
{cmdab:l:evel}{cmd:(}{it:#}{cmd:)}
{cmdab:ci:matrix}{cmd:(}{it:new_matrix}{cmd:)} ]
{pstd}
where {it:transformation_name} is one of
{p 8 21 2}
{cmd:iden} | {cmd:z} | {cmd:asin} | {cmd:rho} | {cmd:zrho} | {cmd:c}
{pstd}
and {it:functional_type} is one of
{p 8 21 2}
{cmdab:w:cluster} | {cmdab:b:cluster} | {cmdab:v:onmises}
{pstd}
and {it:cenind_list} is a list of variable names and/or zeros.
{pstd}
{cmd:fweight}s, {cmd:iweight}s, and {cmd:pweight}s are allowed; see
{help weight}.
{pstd}
{opt bootstrap}, {opt by}, {opt jackknife}, {opt statsby}, {opt mi estimate},
{opt svy jackknife}, {opt svy bootstrap}, {opt svy brr} and {opt svy sdr}
are allowed; see {helpb prefix}.{p_end}
{title:Description}
{pstd}
{cmd:somersd} computes confidence intervals for a wide range of rank
statistics. It includes three component modules, each with its own .pdf manual
distributed with the {cmd:somersd} package as an ancillary file. The
modules are as follows:
{p2colset 4 28 32 2}{...}
{p2col:Module{space 4}File}Calculates confidence intervals for{p_end}
{p2line}
{p2col:{cmd:somersd}{space 3}{hi:somersd.pdf}}Kendall's tau-a and Somers'
D{p_end}
{p2col:{helpb censlope}{space 2}{hi:censlope.pdf}}Theil-Sen median (and other percentile) slopes{p_end}
{p2col:{helpb cendif}{space 4}{hi:cendif.pdf}}Hodges-Lehmann median (and other percentile) differences{p_end}
{p2colreset}{...}
{pstd}
The modules {helpb censlope} and {helpb cendif} require the module
{cmd:somersd} in order to work, and share many of its options.
{pstd}
The module {cmd:somersd} calculates values of Somers' {it:D} or Kendall's tau-a
for the first variable of {it:varlist} as a predictor of each of the other
variables in {it:varlist}, with estimates and jackknife variances and
confidence intervals output as if for the parameters of a maximum likelihood
fit. It is possible to use {helpb lincom} to output confidence limits for
differences between the population Somers' {it:D} or tau-a values.
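{pstd}
As a minimal sketch of this workflow, assuming the {cmd:auto} example dataset
shipped with Stata (the choice of variables is purely illustrative), the
commands below estimate Somers' {it:D} with {cmd:foreign} as the predictor of
{cmd:mpg} and of {cmd:weight}, and then request confidence limits for the
difference between the two parameters. The {helpb lincom} call assumes that
the estimates are named after the corresponding variables in
{it:varlist}.{p_end}
{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. somersd foreign mpg weight}{p_end}
{phang2}{cmd:. lincom mpg - weight}{p_end}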
{title:Options for use with somersd}
{p 4 8 2}
{cmd:taua} causes {cmd:somersd} to calculate Kendall's tau-a. If {cmd:taua} is
absent, then Somers' {it:D} is calculated.
{p 4 8 2}
{cmd:transf(}{it:transformation_name}{cmd:)} specifies that the estimates are to be transformed,
so that confidence intervals are defined for the transformed population value. {cmd:iden}
(identity or untransformed) is the default. {cmd:z} specifies Fisher's z (the
hyperbolic arctangent), {cmd:asin} specifies Daniels' arcsine, {cmd:rho}
specifies Greiner's rho (Pearson correlation estimated using Greiner's
relation), {cmd:zrho} specifies the {cmd:z}-transform of Greiner's rho, and
{cmd:c} specifies Harrell's c. If the first variable of the {it:varlist}
is a binary indicator of a disease and the other variables are quantitative
predictors for that disease, then Harrell's c is the area under the
receiver operating characteristic (ROC) curve. {cmd:somersd} recognizes the
transformation names {cmd:arctanh} and {cmd:atanh} as synonyms for {cmd:z},
{cmd:arcsin} and {cmd:arsin} as synonyms for {cmd:asin}, {cmd:sinph} as a
synonym for {cmd:rho}, {cmd:zsinph} as a synonym for {cmd:zrho}, and {cmd:roc}
and {cmd:auroc} as synonyms for {cmd:c}. It also recognizes unambiguous
abbreviations for transformation names, such as {cmd:id} for {cmd:iden} or
{cmd:aur} for {cmd:auroc}. The transformations are calculated using a
{help somersd_mata:Mata function}.
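{p 8 8 2}
For illustration only, and assuming the {cmd:auto} example dataset shipped
with Stata, the two commands below should be equivalent, because {cmd:auroc}
is listed above as a synonym for {cmd:c}. As {cmd:foreign} is binary, the
Harrell's c reported for {cmd:mpg} and for {cmd:weight} can be read as an area
under the ROC curve.{p_end}
{p 12 16 2}{cmd:. sysuse auto, clear}{p_end}
{p 12 16 2}{cmd:. somersd foreign mpg weight, transf(c)}{p_end}
{p 12 16 2}{cmd:. somersd foreign mpg weight, transf(auroc)}{p_end}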
{p 4 8 2}
{cmd:tdist} specifies that the estimates are assumed to have a
t distribution with {hi:N-1} degrees of freedom, where {hi:N} is the number of
clusters if {cmd:cluster()} is specified, or the number of observations
if {cmd:cluster()} is not specified.
If {cmd:tdist} is not specified,
then the standardized Somers' {it:D} estimates are assumed to be sampled from a standard Normal distribution.
Results from simulation studies suggest that the {cmd:tdist} option is advisable.
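{p 8 8 2}
A hedged example, assuming the {cmd:auto} example dataset shipped with Stata:
the command below combines {cmd:tdist} with the {cmd:z} transformation. No
{cmd:cluster()} option is given, so {hi:N} is the number of observations
used.{p_end}
{p 12 16 2}{cmd:. sysuse auto, clear}{p_end}
{p 12 16 2}{cmd:. somersd foreign mpg weight, transf(z) tdist}{p_end}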
{p 4 8 2}
{cmd:cenind(}{it:cenind_list}{cmd:)} specifies a list of left- or
right-censorship indicators, corresponding to the variables mentioned in the
{it:varlist}. Each censorship indicator is either a variable name or a zero.
If the censorship indicator corresponding to a variable is the name of a
second variable, then this second variable is used to indicate the censorship
status of the first variable, which is assumed to be left-censored (at or
below its stated value) in observations in which the second variable is
negative, right-censored (at or above its stated value) in observations in
which the second variable is positive, and uncensored (equal to its stated
value) in observations in which the second variable is zero. If the censorship
indicator corresponding to a variable is a zero, then the variable is assumed
to be uncensored. If {cmd:cenind()} is unspecified, then all variables in the
{it:varlist} are assumed to be uncensored. If the list of censorship
indicators specified by {cmd:cenind()} is shorter than the list of variables
specified in the {it:varlist}, then the list of censorship indicators is
completed with the required number of zeros on the right.
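{p 8 8 2}
The sign convention can be sketched as follows, assuming the {cmd:cancer}
example dataset shipped with Stata, in which {cmd:died} is 1 for patients who
died and 0 for patients whose survival time {cmd:studytime} is right-censored.
The variable name {cmd:censind} is hypothetical and introduced only for this
illustration. It is positive for right-censored times and zero for uncensored
times. The predictor {cmd:drug} is uncensored, so its entry in
{cmd:cenind()} is a zero.{p_end}
{p 12 16 2}{cmd:. sysuse cancer, clear}{p_end}
{p 12 16 2}{cmd:. generate byte censind = 1 - died}{p_end}
{p 12 16 2}{cmd:. somersd drug studytime, cenind(0 censind) tdist}{p_end}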
{p 4 8 2}
{cmd:cluster(}{it:varname}{cmd:)} specifies the variable which defines
sampling clusters. If {cmd:cluster()} is specified, then the variances and
confidence limits are calculated assuming that the data represent a sample of
clusters from a population of clusters, rather than a sample of observations
from a population of observations.
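{p 8 8 2}
As a sketch under stated assumptions, suppose the dataset in memory contains
the hypothetical variables {cmd:exposure}, {cmd:outcome} and a cluster
identifier {cmd:familyid} (none of these names comes from the {cmd:somersd}
package). The command below would then treat families, rather than
individuals, as the sampling units.{p_end}
{p 12 16 2}{cmd:. somersd exposure outcome, cluster(familyid) tdist}{p_end}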
{p 4 8 2}
{cmd:cfweight(}{it:expression}{cmd:)} specifies an expression giving the
cluster frequency weights. These cluster frequency weights must have the same
value for all observations in a cluster. If {cmd:cfweight()} and
{cmd:cluster()} are both specified, then each cluster in the dataset is
assumed to represent a number of identical clusters equal to the cluster
frequency weight for that cluster. If {cmd:cfweight()} is specified and
{cmd:cluster()} is unspecified, then each observation in the dataset is
treated as a cluster, and assumed to represent a number of identical
one-observation clusters equal to the cluster frequency weight. For more
details on the interpretation of weights, see {hi:Interpretation of weights}
below.
{p 4 8 2}
{cmd:funtype(}{it:functional_type}{cmd:)} specifies whether the Somers' {it:D}
or Kendall's tau-a functionals estimated are between-cluster, within-cluster
or Von Mises functionals. These three functional types are specified by the
options {cmd:funtype(bcluster)}, {cmd:funtype(wcluster)} and
{cmd:funtype(vonmises)}, respectively. If {cmd:funtype()} is not specified,
then {cmd:funtype(bcluster)} is assumed, and between-cluster functionals are
estimated. The within-cluster Somers' {it:D} is a generalization of the
confidence interval corresponding to the {help signrank:sign test}. The Gini
coefficient is a special case of the clustered Von Mises Somers' {it:D}. For
further details, see the manual {hi:somersd.pdf}, distributed with
{cmd:somersd} as an ancillary file.
{p 4 8 2}
{cmd:wstrata(}{it:varlist}{cmd:)} specifies a list of variables whose value
combinations are the W strata. If {cmd:wstrata()} is specified, then
{cmd:somersd} estimates stratified Somers' {it:D} or Kendall's tau-a
parameters, applying only to pairs of observations within the same W
stratum. These parameters can be used to measure associations within strata,
such as associations between an outcome and an exposure within groups defined
by values of a confounder, or by values of a propensity score based on
multiple confounders.
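{p 8 8 2}
For example, assuming a dataset with the hypothetical variables
{cmd:exposure}, {cmd:outcome} and {cmd:propgroup} (a grouping of a propensity
score; all three names are illustrative only), the following command is a
sketch of how a within-stratum Somers' {it:D} might be requested.{p_end}
{p 12 16 2}{cmd:. somersd exposure outcome, wstrata(propgroup) tdist}{p_end}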
{p 4 8 2}
{cmd:bstrata(}{it:varlist} | {cmd:_n}{cmd:)} specifies the B strata. If
{cmd:bstrata()} is specified, then {cmd:somersd} estimates Somers' {it:D} or
Kendall's tau-a parameters specific to pairs of observations from different
B strata. These B strata are either combinations of values of a
list of variables (if {it:varlist} is specified) or the individual
observations (if {cmd:_n} is specified). B strata will not often be
required. However, if we are estimating the within-cluster Kendall's tau-a
(using the options {cmd:taua funtype(wcluster)}), then the additional option
{cmd:bstrata(_n)} will ensure that the within-cluster Kendall's tau-a can take
the whole range of values from -1 (in the case of complete discordance within
clusters) to +1 (in the case of complete concordance within clusters).
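{p 8 8 2}
The combination of options described above might be used as in the following
sketch, which assumes a dataset with the hypothetical variables
{cmd:exposure}, {cmd:outcome} and a cluster identifier {cmd:familyid} (these
names are illustrative only, and {cmd:cluster()} is assumed to be needed to
define the clusters).{p_end}
{p 12 16 2}{cmd:. somersd exposure outcome, taua cluster(familyid) funtype(wcluster) bstrata(_n)}{p_end}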
{p 4 8 2}
{cmd:notree} specifies that {cmd:somersd} does not use the default
{help somersd_mata:search tree algorithm} based on Newson (2006a), but instead
uses a trivial algorithm, which compares every pair of observations and
requires much more time with large datasets. This option is rarely used except
to compare performance. Both algorithms are implemented in {help mata:Mata},
using a set of {help somersd_mata:Mata functions}, whose source code is
distributed with the {cmd:somersd} package.
{p 4 8 2}
{cmd:level(}{it:#}{cmd:)} specifies the confidence level, as a percentage, for
confidence intervals of the estimates; see {helpb level}.
{p 4 8 2}
{cmd:cimatrix(}{it:new_matrix}{cmd:)} specifies an output matrix to be
created, containing estimates and confidence limits for the untransformed
Somers' {it:D}, Kendall's tau-a or Greiner's rho parameters. If {cmd:transf()}
is specified, then the confidence limits will be asymmetric and based on
symmetric confidence limits for the transformed parameters. This option (like
{cmd:level()}) may be used in replay mode as well as in nonreplay mode.
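{p 8 8 2}
A minimal sketch, assuming the {cmd:auto} example dataset shipped with Stata:
the first {cmd:somersd} command stores the untransformed estimates and
confidence limits in a matrix, and the last command illustrates replay mode
with a different confidence level. The matrix names {cmd:cimat} and
{cmd:cimat99} are arbitrary names chosen for this illustration.{p_end}
{p 12 16 2}{cmd:. sysuse auto, clear}{p_end}
{p 12 16 2}{cmd:. somersd foreign mpg weight, transf(z) tdist cimatrix(cimat)}{p_end}
{p 12 16 2}{cmd:. matrix list cimat}{p_end}
{p 12 16 2}{cmd:. somersd, level(99) cimatrix(cimat99)}{p_end}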
{title:Remarks}
{pstd}
For uncensored variables X and Y, Kendall's tau-a is defined as
{phang2}
{hi:tau_a(X,Y) = E[sign(X1-X2)*sign(Y1-Y2)]}
{pstd}
where (X1,Y1) and (X2,Y2) are sampled independently from the bivariate
distribution of X and Y. In the case of censored variables X
and Y, with censorship indicators R and S, respectively, which
are negative for left-censorship, positive for right-censorship, and zero for
noncensorship, we define Kendall's tau-a as
{phang2}
{hi:tau_a(X,Y) = E[csign(X1,R1,X2,R2)*csign(Y1,S1,Y2,S2)]}
{pstd}
where the function
{phang2}
{hi:csign(U,P,V,Q)}
{pstd}
is defined as 1 if U>V and P>=0>=Q, -1 if U