Compute standard errors using the inverse confidence interval method
invcise lb_varname ub_varname [ dof_varname ] [if] [in] , stderr(newvarname) [ eformestimate(varname) level(#) replace float fast ]
where lb_varname, ub_varname and dof_varname are the names of existing variables, containing lower confidence bounds, upper confidence bounds, and degrees of freedom, respectively.
Description
invcise is intended for use in an output dataset (or resultsset), with one observation for each of a set of estimated parameters, and variables containing their confidence limits, and (optionally) containing the degrees of freedom used to calculate these confidence limits. Such datasets may be produced using the official Stata statsby prefix, or by the parmest package, downloadable from SSC. invcise uses the confidence limits to compute a new variable, containing standard errors for the parameters, using the inverse confidence interval method. These standard errors, together with parameter estimates in another variable in the dataset, may be used to calculate standard errors and confidence intervals for linear combinations of these parameters, using the metaparm module of the parmest package, assuming that the parameters are independently estimated. The inverse confidence interval method is frequently used with rank statistics, such as medians, median differences, and median slopes, to compute confidence intervals for linear combinations of these rank statistics, particularly differences between differences ("interactions") or weighted means of several differences ("meta-analysis summaries").
Options
stderr(newvarname) is required. It specifies the name of a new variable to be created, containing standard errors computed from the input confidence limit variables using the inverse confidence interval method.
eformestimate(varname) specifies the name of a variable, assumed to be an exponentiated estimate corresponding to the input confidence limits, and implying that the standard error must be calculated from the log ratio of the confidence limits, multiplied by the eformestimate() variable, and then scaled inversely by twice the critical t-value or z-value corresponding to the confidence level specified by level(). If eformestimate() is not specified, then the standard error is calculated from the difference between the confidence limits, scaled inversely by twice the critical t-value or z-value corresponding to the confidence level specified by level(). The eformestimate() option is useful if the standard errors are used with the eformestimate() variable for input to the metaparm or parmcip modules of the parmest package, using the eform option of these modules to produce exponentiated confidence intervals. Such exponentiated confidence intervals may be used to estimate parameters which are ratios, ratios of ratios, or geometric mean ratios.
level(#) specifies the confidence level assumed for the input confidence limits, expressed as a percentage. If level() is not specified, then invcise first attempts to extract the confidence level from the variable characteristic lb_varname[level]. If this attempt fails, it attempts to extract the confidence level from ub_varname[level]. If this attempt also fails, it uses the c-class value c(level), which contains the default confidence level currently in force in Stata (usually 95, specifying 95% confidence limits). The modules of the parmest package create the characteristic varname[level] for each confidence limit variable varname that they produce, setting it equal to the confidence level used to calculate that variable.
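The fallback order for the confidence level can be sketched in Python as follows. This is an illustration under stated assumptions, not invcise's Stata code: the dictionaries lb_chars and ub_chars are hypothetical stand-ins for the characteristics of the lower and upper bound variables, c_level stands in for c(level), a "failed" attempt is assumed to mean a missing or non-numeric characteristic, and the returned source labels follow the values listed for r(levelsource) under Saved results.

```python
def resolve_level(level_option=None, lb_chars=None, ub_chars=None, c_level=95.0):
    """Return (level, levelsource) using the documented fallback order.

    lb_chars and ub_chars are hypothetical dicts standing in for the Stata
    characteristics of the lower and upper bound variables (Stata stores
    characteristics as strings, hence float()); c_level stands in for c(level).
    """
    if level_option is not None:
        return float(level_option), "level()"
    candidates = ((lb_chars, "lb_varname[level]"), (ub_chars, "ub_varname[level]"))
    for chars, source in candidates:
        if chars:
            try:
                return float(chars["level"]), source
            except (KeyError, TypeError, ValueError):
                # Assumed failure: characteristic absent or not numeric, try the next source.
                continue
    return float(c_level), "c(level)"


# Usage: no level() option, no characteristic on the lower bound,
# characteristic level = 90 on the upper bound.
print(resolve_level(lb_chars={}, ub_chars={"level": "90"}))  # prints (90.0, 'ub_varname[level]')
```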
replace specifies that any non-input variable with the same name as the new variable specified by the stderr() option will be discarded before the new standard error variable is created.
float specifies that float is the highest-precision numeric type to be allowed for the stderr() variable. If float is not specified, then the stderr() variable is created as a double variable. Whether or not float is specified, the stderr() variable is compressed to the lowest precision possible without loss of information.
fast is an option for programmers. It specifies that invcise will take no action to restore the existing dataset in memory in the event of failure, or if the user presses Break. If fast is not specified, then invcise will take this action, which uses an amount of time depending on the size of the dataset in memory.
Methods and formulas
invcise computes standard errors using the inverse confidence interval method, which is an inversion of the method commonly used to compute confidence limits from estimates and standard errors.
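The inversion can be written out in one step. If an estimate with standard error SE has the usual confidence limits built from a critical value c (either z(alpha) or t(df,alpha)), then subtracting the lower limit from the upper limit eliminates the estimate and leaves the standard error:

```latex
\begin{aligned}
\mathrm{lb} &= \hat{\theta} - c\,\mathrm{SE}, \qquad \mathrm{ub} = \hat{\theta} + c\,\mathrm{SE},\\
\mathrm{ub} - \mathrm{lb} &= 2c\,\mathrm{SE}
\quad\Longrightarrow\quad
\mathrm{SE} = \frac{0.5\,(\mathrm{ub} - \mathrm{lb})}{c}.
\end{aligned}
```

Only the width ub - lb enters the result, so the estimate itself is not needed to compute the standard error.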
The default formula (if eformestimate() is not specified) used to derive a standard error SE by inverting a 100*(1-alpha)% confidence interval with lower bound lb and upper bound ub is
SE = 0.5*(ub - lb)/z(alpha)
(where z(alpha) is the result of invnormal(1-alpha/2)) if no degrees of freedom variable is specified, and is
SE = 0.5*(ub - lb)/t(df,alpha)
(where t(df,alpha) is the result of invttail(df,alpha/2) and df is the degrees of freedom) if a degrees of freedom variable is specified.
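A minimal Python sketch of these two default formulas follows. It is not invcise's code: the function invcise_se is hypothetical, NormalDist comes from the Python standard library, and because the standard library has no Student t quantile, ttail and invttail are small hand-written stand-ins that mirror the Stata functions of the same names (upper-tail probability and its inverse), computed from the regularized incomplete beta function.

```python
import math
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=3e-15):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(ln_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def ttail(df, t):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom."""
    p = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return p if t >= 0 else 1.0 - p


def invttail(df, p):
    """Return t with ttail(df, t) == p, for 0 < p < 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while ttail(df, hi) > p:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ttail(df, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def invcise_se(lb, ub, level=95.0, dof=None):
    """Inverse confidence interval SE: 0.5*(ub - lb) divided by the critical value."""
    alpha = 1.0 - level / 100.0
    if dof is None:
        crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z(alpha)
    else:
        crit = invttail(dof, alpha / 2.0)               # t(df, alpha)
    return 0.5 * (ub - lb) / crit
```

For example, invcise_se(lb, ub) applies the normal formula to a 95% interval, and invcise_se(lb, ub, dof=10) applies the t formula with 10 degrees of freedom.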
If the eformestimate() option is specified, then the formula used is
SE = 0.5*eformestimate*(log(ub) - log(lb))/z(alpha)
(where eformestimate is the variable specified by eformestimate()) if no degrees of freedom variable is specified, and is
SE = 0.5*eformestimate*(log(ub) - log(lb))/t(df,alpha)
if a degrees of freedom variable is specified.
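The eformestimate() formula applies the same inversion on the log scale, giving the standard error of log(eformestimate), and then multiplies by eformestimate to return to the exponentiated scale (the delta method). The sketch below shows this for the normal case only; the function name invcise_se_eform is hypothetical, and in the degrees-of-freedom case t(df,alpha) would replace z(alpha). The usage builds limits from a hypothetical ratio and log-scale standard error and checks that stderr/estimate recovers that log-scale standard error.

```python
import math
from statistics import NormalDist


def invcise_se_eform(lb, ub, estimate, level=95.0):
    """Inverse-CI SE for an exponentiated estimate (normal critical value only).

    The log-scale width gives the SE of log(estimate); multiplying by the
    estimate converts it to the exponentiated scale (delta method).
    """
    if lb <= 0 or ub <= 0:
        raise ValueError("confidence limits must be positive to take logs")
    alpha = 1.0 - level / 100.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 0.5 * estimate * (math.log(ub) - math.log(lb)) / z


# Hypothetical ratio estimate 2.0 whose 95% limits are symmetric on the log scale.
z95 = NormalDist().inv_cdf(0.975)
est, se_log = 2.0, 0.25
lb, ub = est * math.exp(-z95 * se_log), est * math.exp(z95 * se_log)
se = invcise_se_eform(lb, ub, est)
print(se, se / est)  # se / est recovers se_log (0.25), up to rounding
```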
These formulas are typically used with confidence intervals for rank statistics, such as percentiles and percentile differences. Lehmann (1963) discussed a standard error formula of this kind for Hodges-Lehmann median differences. McKean and Schrader (1984) discussed a standard error formula of this kind for medians, which was slightly modified by Price and Bonett (2001).
Usually, standard error formulas are a means to the end of calculating confidence intervals. The reason for inverting the usual practice is to calculate confidence intervals for linear combinations of independently estimated parameters, such as medians or median differences from independent subsamples from distinct subpopulations. These linear combinations are typically either weighted averages, or differences, or weighted averages of differences (as in a meta-analysis), or differences between differences (known as interactions, and viewed as important by some scientists). Bonett and Price (2002) discuss the general case of linear combinations of medians, and Price and Bonett (2002) discuss the special case of differences (and ratios) between two medians. Given a list of independently-estimated parameters theta_1, ..., theta_N, with corresponding standard errors se_1, ..., se_N, and corresponding coefficients a_1, ..., a_N, we wish to estimate the linear combination
Theta = Sum ( a_j * theta_j )
and its standard error
SE = sqrt( Sum (a_j * se_j)^2 )
and we can easily do this using the metaparm module of the parmest package, once the standard errors have been calculated using invcise. We usually expect the Central Limit Theorem to work better for the linear combination than for its component parameters. The component parameters themselves may be better estimated using their original confidence intervals, which invcise inverted to give their standard errors.
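The linear-combination formulas above can be sketched directly. This is a hypothetical illustration of the arithmetic, not metaparm itself: linear_combination is an invented name, it assumes independent estimates and a normal critical value (no degrees of freedom), and the medians and standard errors in the usage are made-up numbers, not results from any dataset.

```python
import math
from statistics import NormalDist


def linear_combination(coefs, estimates, stderrs, level=95.0):
    """Estimate, SE, confidence limits and two-sided P-value for Sum(a_j * theta_j),
    assuming independent estimates and an approximately normal combination."""
    if not (len(coefs) == len(estimates) == len(stderrs)):
        raise ValueError("coefs, estimates and stderrs must have equal length")
    theta = sum(a * t for a, t in zip(coefs, estimates))
    se = math.sqrt(sum((a * s) ** 2 for a, s in zip(coefs, stderrs)))
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(theta) / se))
    return theta, se, (theta - z * se, theta + z * se), p


# Hypothetical medians and SEs for four independent groups, ordered
# (odd foreign, odd US, even foreign, even US).
medians = [25.0, 19.0, 24.0, 20.0]
ses = [1.5, 1.0, 2.0, 1.2]
interaction = [1, -1, -1, 1]  # difference between the two foreign-US differences
theta, se, ci, p = linear_combination(interaction, medians, ses)
```

Here theta is 25 - 19 - 24 + 20 = 2 and se is sqrt(1.5^2 + 1.0^2 + 2.0^2 + 1.2^2).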
Examples
The following sequence of commands reads in the auto data and adds a variable odd, indicating whether a car model is odd-numbered or even-numbered. This dataset is used in the examples, which compare differences in mileage between non-US cars and US cars within the odd-numbered and even-numbered groups.
. sysuse auto, clear
. gene byte odd=mod(_n,2)
. lab def odd 0 "Even" 1 "Odd"
. lab val odd odd
. lab var odd "Odd numbered model"
. describe
. tab foreign odd, m
The following example starts by using centile, with the statsby prefix, to replace the dataset in memory with a new dataset, with one observation for each of 4 groups, defined by combinations of values of the variables odd and foreign, and with variables containing the group sizes in N and the estimates and lower and upper confidence bounds for the group medians in median, medmin and medmax. We then use invcise to compute a standard error for each median, and use metaparm to replace the new dataset with a third dataset, with one observation per value of odd and data on confidence intervals and P-values for the differences between median mileage in non-US and US cars within each group. The second metaparm command lists a confidence interval for the difference (or interaction) between the foreign-US difference in odd-numbered models and the foreign-US difference in even-numbered models. The third metaparm command lists a confidence interval for the weighted mean foreign-US difference, averaging the differences in odd-numbered and even-numbered cars.
. preserve
. statsby N=r(N) median=r(c_1) medmin=r(lb_1) medmax=r(ub_1), by(odd foreign) noisily clear: centile mpg
. list odd foreign N median medmin medmax
. invcise medmin medmax, stderr(icse)
. metaparm [iweight=(foreign==1)-(foreign==0)], by(odd) norestore sumvar(N) estimate(median) stderr(icse)
. list odd N median min95 max95 p
. metaparm [iweight=(odd==1)-(odd==0)], sumvar(N) estimate(median) stderr(icse) list(,)
. metaparm [aweight=N], sumvar(N) estimate(median) stderr(icse) list(,)
. restore
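The weight expressions in these metaparm commands supply the coefficients a_j. A hedged reading, sketched below: an iweight such as (foreign==1)-(foreign==0) is used unchanged, giving +1 for foreign and -1 for US cars, while aweight=N is assumed to be rescaled to sum to 1, which is what makes the result a weighted mean. The function combine and all numbers are hypothetical illustrations, not metaparm's code or results from the auto data.

```python
import math


def combine(weights, estimates, stderrs, aweight=False):
    """Hypothetical sketch: weighted linear combination of independent estimates.

    With aweight=False the weights are used unchanged as coefficients a_j
    (the iweight case); with aweight=True they are first rescaled to sum
    to 1, so the result is a weighted mean (assumed aweight behaviour).
    """
    if aweight:
        total = float(sum(weights))
        coefs = [w / total for w in weights]
    else:
        coefs = [float(w) for w in weights]
    est = sum(a * t for a, t in zip(coefs, estimates))
    se = math.sqrt(sum((a * s) ** 2 for a, s in zip(coefs, stderrs)))
    return est, se


# Within one value of odd: US (foreign=0) then foreign (foreign=1) medians.
foreign = [0, 1]
iw = [(f == 1) - (f == 0) for f in foreign]            # -> [-1, 1]
diff, diff_se = combine(iw, [19.0, 25.0], [1.0, 1.5])  # foreign minus US

# Weighted mean of two group differences, weighted by hypothetical sizes N.
mean_diff, mean_se = combine([37, 37], [6.0, 4.0], [1.8, 2.3], aweight=True)
```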
The following example compares Hodges-Lehmann median foreign-US differences, which are not necessarily the same parameters as foreign-US differences between medians. We start by using the censlope module of the somersd package, together with the parmby module of the parmest package, to replace the dataset in memory with a new dataset, with one observation per value of odd, and data on confidence intervals and P-values for foreign-US median differences. We then use invcise to compute standard errors inversely from the confidence limits. The first metaparm command lists a confidence interval and a P-value for the odd-even difference (or interaction) between foreign-US median differences. The second metaparm command lists a confidence interval for the weighted mean of the two foreign-US median differences, summarizing the foreign-US differences in the two groups. The confidence intervals are slightly narrower than the corresponding confidence intervals in the previous example, although they are for different parameters.
. preserve
. parmby "censlope mpg foreign, tdist estaddr", by(odd) escal(N) norestore ecol(cimat) rename(es_1 N ec_1_1 percent ec_1_2 meddif ec_1_3 mdmin ec_1_4 mdmax)
. describe
. list odd N dof meddif mdmin mdmax
. invcise mdmin mdmax dof, stderr(icse)
. metaparm [iweight=(odd==1)-(odd==0)], sumvar(N) estimate(meddif) stderr(icse) dof(dof) list(,)
. metaparm [aweight=N], sumvar(N) estimate(meddif) stderr(icse) dof(dof) list(,)
. restore
The following example is similar to the previous example, but compares Hodges-Lehmann median foreign/US ratios instead of Hodges-Lehmann median foreign/US differences. We start by creating the variable logmpg as the log of mpg, and estimate the Hodges-Lehmann median ratios by exponentiating the Hodges-Lehmann median differences for logmpg. We then use invcise, with the eformestimate() option, to calculate inverse confidence interval standard errors for the median ratios. These are then input into metaparm as before, except that, this time, we use the eform option of metaparm, to estimate the odd/even ratios between foreign/US ratios, and to estimate the weighted geometric mean foreign/US ratio.
. preserve
. gene logmpg=log(mpg)
. parmby "censlope logmpg foreign, tdist estaddr eform", eform by(odd) escal(N) norestore ecol(cimat) rename(es_1 N ec_1_1 percent ec_1_2 medrat ec_1_3 mrmin ec_1_4 mrmax)
. describe
. list odd N dof medrat mrmin mrmax
. invcise mrmin mrmax dof, stderr(icse) eformestimate(medrat)
. metaparm [iweight=(odd==1)-(odd==0)], sumvar(N) estimate(medrat) stderr(icse) dof(dof) eform list(,)
. metaparm [aweight=N], sumvar(N) estimate(medrat) stderr(icse) dof(dof) eform list(,)
. restore
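Why the eformestimate() standard errors suit an eform combination can be sketched on the log scale. Assuming that an eform combination works with log(estimate) and treats stderr/estimate as the standard error of log(estimate), the ratio of ratios and the weighted geometric mean become ordinary linear combinations of logs, exponentiated at the end. The function combine_eform and the ratios below are hypothetical, and the sketch uses a normal critical value rather than the dof(dof) option used in the commands above.

```python
import math
from statistics import NormalDist


def combine_eform(coefs, ratios, stderrs, level=95.0):
    """Combine exponentiated estimates on the log scale (hypothetical sketch).

    stderrs are SEs of the ratios themselves, as produced by the
    eformestimate() formula, so stderr / ratio is the SE of log(ratio).
    Returns the exponentiated combination and its confidence limits.
    """
    logs = [math.log(r) for r in ratios]
    log_ses = [s / r for s, r in zip(stderrs, ratios)]
    est = sum(a * x for a, x in zip(coefs, logs))
    se = math.sqrt(sum((a * s) ** 2 for a, s in zip(coefs, log_ses)))
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)


# Hypothetical foreign/US ratios for odd and even models, with their SEs.
ratios, ses = [1.3, 1.1], [0.13, 0.11]
r, lo, hi = combine_eform([1, -1], ratios, ses)        # odd/even ratio of ratios
g, g_lo, g_hi = combine_eform([0.5, 0.5], ratios, ses)  # equally weighted geometric mean
```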
The parmest and somersd packages can both be downloaded from SSC.
Saved results
invcise saves the following in r():
Scalars
    r(level)            confidence level
Macros
    r(lb)               name of lower confidence bound variable
    r(ub)               name of upper confidence bound variable
    r(dof)              name of degrees of freedom variable
    r(eformestimate)    name of eformestimate() variable
    r(levelsource)      source of confidence level
The returned result r(levelsource) may be level(), lb_varname[level], ub_varname[level], or c(level), indicating that the confidence level was derived from the level() option, from the level characteristic of the lower bound variable, from the level characteristic of the upper bound variable, or from the c-class value c(level), respectively.
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
References
Bonett, D. G. and Price, R. M. 2002. Statistical inference for a linear function of medians: Confidence intervals, hypothesis testing, and sample size requirements. Psychological Methods 7(3): 370-383.
Lehmann, E. L. 1963. Nonparametric confidence intervals for a shift parameter. Annals of Mathematical Statistics 34(4): 1507-1512.
McKean, J. W. and Schrader, R. M. 1984. A comparison of methods for studentizing the sample median. Communications in Statistics - Simulation and Computation 13(6): 751-773.
Price, R. M. and Bonett, D. G. 2001. Estimating the variance of the sample median. Journal of Statistical Computation and Simulation 68(3): 295-305.
Price, R. M. and Bonett, D. G. 2002. Distribution-free confidence intervals for difference and ratio of medians. Journal of Statistical Computation and Simulation 72(2): 119-124.
Also see
Manual: [R] centile, [D] statsby
On-line: help for centile, statsby; help for parmest, parmby, parmcip, metaparm, somersd, censlope, cendif if installed