One- and two-way tables for survey data, with balanced repeated replication (BRR) based standard errors
svrtab varname1 [varname2] [if exp] [in range] [ ,
[tabulate options]
tab(varname) missing
[display items]
cell count row column obs se ci deff deft cucol
[display formats]
{ format(%fmt) | fcell(%fmt) fcount(%fmt) frow(%fmt) fcolumn(%fmt) fobs(%fmt) fse(%fmt) fci(%fmt) fdeff(%fmt) fdeft(%fmt) }
[display options]
[ proportion | percent ] nolabel nomarginals format(%fmt) vertical level(#)
[statistic options]
pearson lr null wald llwald noadjust ]
This command is for use with replication weights. You must set your data for replication-based survey estimation with svrset or survwgt before using this command.
svrtab typed without arguments redisplays previous results. Any of the "display items", "display options", or "statistic options" can be specified when redisplaying with the following exception: wald must be specified at run time.
Description
svrtab produces one- and two-way tabulations with complex survey data. Tests for independence are available for two-way tables. Standard errors are calculated using a series of user-supplied replication weights, by balanced repeated replication (BRR) or survey jackknife (JK1, JK2, or JKn). This is an alternative to the Taylor series linearization methods used by Stata's svy commands. See survwgt for details on the creation of weights and estimation of variances with replication.
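As a rough illustration of the replication idea described above, the following Python sketch computes a variance from a full-sample estimate and its replicate re-estimates. It is not svrtab's code: the function name, the example numbers, and the textbook BRR and JK1 scaling factors are assumptions made for this sketch. See survwgt for the formulas actually used.

```python
# Illustrative sketch only; not svrtab's actual implementation.

def replication_variance(full_estimate, replicate_estimates, method="brr"):
    """Variance of an estimate from its replicate-weight re-estimates.

    Assumed textbook scaling: BRR averages the squared deviations over the
    R replicates; JK1 multiplies their sum by (R - 1) / R.
    """
    r = len(replicate_estimates)
    squared_dev = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    if method == "brr":
        return squared_dev / r
    if method == "jk1":
        return squared_dev * (r - 1) / r
    raise ValueError("this sketch covers only 'brr' and 'jk1'")


# Hypothetical cell proportion, estimated once with the main weight and
# once with each of four replicate weights.
full = 0.30
replicates = [0.28, 0.32, 0.31, 0.29]
variance = replication_variance(full, replicates, method="brr")
standard_error = variance ** 0.5
```

The point estimate itself comes from the main weight alone; the replicate weights only enter the variance, which is why point estimates match those from svytab while standard errors differ.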
Except for the different method of variance calculation, svrtab has the same syntax as svytab. Point estimates are the same as those from svytab; standard errors and tests of independence are different.
Options regarding Tabulation
These options operate exactly as they do in svytab, and this help text is taken from that help file:
tab(varname) specifies that counts should instead be cell totals of this variable and proportions (or percentages) should be relative to (i.e., weighted by) this variable. For example, if this variable denotes income, then the cell "counts" are instead totals of income for each cell, and the cell proportions are proportions of income for each cell.
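The effect of tab() on the point estimates can be sketched in Python as follows. The records, category names, and the helper cell_proportions are hypothetical and only illustrate the arithmetic described above; they are not part of svrtab.

```python
from collections import defaultdict

# Hypothetical records: (row category, column category, sampling weight, income)
records = [
    ("young", "female", 2.0, 100.0),
    ("young", "male", 1.0, 200.0),
    ("old", "female", 1.0, 300.0),
    ("old", "male", 2.0, 200.0),
]


def cell_proportions(rows, use_tab_variable):
    """Return each cell's share of the weighted total."""
    totals = defaultdict(float)
    for row_cat, col_cat, weight, income in rows:
        # Without tab(): each observation contributes its weight.
        # With tab(income): it contributes weight * income instead.
        contribution = weight * income if use_tab_variable else weight
        totals[(row_cat, col_cat)] += contribution
    grand_total = sum(totals.values())
    return {cell: total / grand_total for cell, total in totals.items()}


population_shares = cell_proportions(records, use_tab_variable=False)
income_shares = cell_proportions(records, use_tab_variable=True)
```

In this made-up data, young women are one third of the weighted population but hold a smaller share of total income, which is the distinction tab() is meant to expose.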
missing specifies that missing values of varname1 and varname2 are to be treated as another row or column category, rather than be omitted from the analysis (the default).
Options to Choose Items for Display
These options operate exactly as they do in svytab, and this help text is taken from that help file:
cell requests that cell proportions (or percentages) be displayed. This is the default if none of count, row, or column are specified.
count requests that weighted cell counts be displayed.
row or column requests that row or column proportions (or percentages) be displayed.
cucol requests that cumulative column percentages be displayed. This option is only valid for tables with one column.
obs requests that the number of observations for each cell be displayed.
se requests that the standard errors of either cell proportions (the default), weighted counts, or row or column proportions be displayed. When se (or ci, deff, or deft) is specified, only one of cell, count, row, or column can be selected. The standard error computed is the standard error of the one selected.
ci requests confidence intervals for either cell proportions, weighted counts, or row or column proportions. The confidence intervals are constructed using a logit transform so that their endpoints always lie between 0 and 1.
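A minimal Python sketch of a logit-transform interval for a proportion is given below. It assumes the usual delta-method standard error on the logit scale and, by default, a normal critical value; both are assumptions of this sketch, and svrtab may use a t quantile based on the design degrees of freedom. The function name logit_ci and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def inverse_logit(x):
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_ci(p, se, level=95.0, critical_value=None):
    """Confidence interval for a proportion, built on the logit scale.

    Assumes 0 < p < 1. By the delta method, se(logit(p)) = se / (p * (1 - p)).
    critical_value defaults to a normal quantile (an assumption here);
    pass a t quantile to reflect design degrees of freedom instead.
    """
    if critical_value is None:
        critical_value = NormalDist().inv_cdf(0.5 + level / 200.0)
    logit = math.log(p / (1.0 - p))
    half_width = critical_value * se / (p * (1.0 - p))
    return inverse_logit(logit - half_width), inverse_logit(logit + half_width)


# A small proportion with a relatively large standard error: a symmetric
# interval p +/- z * se would dip below zero, the logit interval cannot.
lower, upper = logit_ci(p=0.02, se=0.015)
```

The resulting interval is asymmetric around the point estimate, which is expected for proportions near 0 or 1.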
deff (deft) requests that the design-effect measure deff (deft) be displayed for either cell proportions, counts, or row or column proportions. See [R] svymean for details. The mean generalized deff is also displayed when deff or deft is requested.
Options for Display Formats
format(%fmt) specifies an overall format for the items in the table. The default is %6.0g. See [U] 15.5 Formats: controlling how data are displayed.
Alternatively, display formats can be specified separately for the items included in the table with the other formatting options, fcell(), fcount(), frow(), fcolumn(), fobs(), fse(), fci(), fdeff(), fdeft().
If only one of fcell(), fcolumn(), and frow() is specified, that format is used for all three, if they are being displayed. Similarly, if only one of fdeff() and fdeft() is specified, that format will be used for displaying both deff and deft.
Options regarding Display
proportion or percent requests that proportions (the default) or percentages be displayed.
nolabel requests that variable labels and value labels be ignored.
nomarginals requests that row and column marginals not be displayed.
vertical requests that the endpoints of the confidence intervals be stacked vertically on display.
level(#) specifies the confidence level (i.e., nominal coverage rate), in percent, for confidence intervals; see help level.
cellwidth(#), csepwidth(#), and stubwidth(#) specify widths of table elements in the output; see help tabdisp.
Options regarding Statistics
pearson requests that the Pearson chi-squared statistic be computed. By default, this is the test of independence that is displayed. The Pearson chi-squared statistic is corrected for the survey design using the second-order correction of Rao and Scott (1984) and converted into an F-statistic.
One term in the correction formula can be calculated using either observed cell proportions or proportions under the null hypothesis (i.e., the product of the marginals). By default, observed cell proportions are used. If the null option is selected, then a statistic corrected using proportions under the null is displayed as well. In most cases, it makes little difference which is used, but simulations seem to indicate that for sparse tables, using observed cell proportions is superior.
lr requests that the likelihood-ratio test statistic for proportions be computed. Note that this statistic is not defined when there are one or more zero cells in the table. The statistic is corrected for the survey design using exactly the same correction procedure that is used with the pearson statistic. Again, either observed cell proportions or proportions under the null can be used in the correction formula. By default, the former is used; specifying the null option gives both the former and the latter. Neither variant of this statistic is recommended for sparse tables. For nonsparse tables, the lr statistics are very similar to the corresponding pearson statistics.
null modifies the pearson and lr options only. If it is specified, two corrected statistics are displayed. The statistic labeled "D-B (null)" ("D-B" stands for design-based) uses proportions under the null hypothesis (i.e., the product of the marginals) in the Rao and Scott (1984) correction. The statistic labeled merely "Design-based" uses observed cell proportions. If null is not specified, only the correction that uses observed proportions is displayed.
wald requests a Wald test of whether observed weighted counts equal the product of the marginals. By default, an adjusted F-statistic is produced; an unadjusted statistic can be produced by specifying noadjust. The unadjusted F-statistic can yield extremely anti-conservative p-values (i.e., p-values that are too small) when the degrees of freedom of the variance estimates (the number of PSUs minus the number of strata) are small relative to the (R-1)*(C-1) degrees of freedom of the table (where R is the number of rows and C is the number of columns). Hence, the statistic produced by wald and noadjust should not be used for inference except when it is essentially identical to the adjusted statistic; it is only made available to duplicate the results of other software.
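To show why the unadjusted statistic can be anti-conservative, the Python sketch below converts a Wald chi-squared value into both F-statistics. It assumes the adjustment documented for Stata's survey Wald tests, F_adj = W * (d - k + 1) / (d * k) with k = (R-1)*(C-1) and d the variance degrees of freedom, and that this carries over to svrtab. The function name and the input values are hypothetical.

```python
def wald_f_statistics(wald_chi2, n_rows, n_cols, design_df):
    """Convert a Wald chi-squared statistic to unadjusted and adjusted F.

    Assumed formulas (not taken from svrtab's source):
      unadjusted F = W / k,                 with (k, d) degrees of freedom
      adjusted F   = W * (d - k + 1)/(d*k), with (k, d - k + 1) degrees of freedom
    where k = (R - 1) * (C - 1) and d = design_df.
    """
    table_df = (n_rows - 1) * (n_cols - 1)
    adjusted_denominator_df = design_df - table_df + 1
    return {
        "table_df": table_df,
        "unadjusted_f": wald_chi2 / table_df,
        "unadjusted_denominator_df": design_df,
        "adjusted_f": wald_chi2 * adjusted_denominator_df / (design_df * table_df),
        "adjusted_denominator_df": adjusted_denominator_df,
    }


# Hypothetical 4 x 4 table (9 table degrees of freedom) with only 12
# degrees of freedom for the variance estimates.
result = wald_f_statistics(wald_chi2=30.0, n_rows=4, n_cols=4, design_df=12)
```

With these inputs the adjusted F is one third of the unadjusted F and has fewer denominator degrees of freedom, so its p-value is considerably larger. When design_df is large relative to the table degrees of freedom, the two statistics nearly coincide.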
llwald requests a Wald test of the log-linear model of independence. Note that the statistic is not defined when there are one or more zero cells in the table. The adjusted statistic (the default) can produce anti-conservative p-values, especially for sparse tables, when the degrees of freedom of the variance estimates are small relative to the degrees of freedom of the table. Specifying noadjust yields a statistic with more severe problems. Neither the adjusted nor the unadjusted statistic is recommended for inference; the statistics are only made available for pedagogical purposes and to duplicate the results of other software.
noadjust modifies the wald and llwald options only. It requests that an unadjusted F-statistic be displayed in addition to the adjusted statistic.
Examples
. svrtab agegrp gender
. svrtab, se ci deff [redisplay std. err., etc.]
. svrtab, count column obs [redisplay counts, etc.]
. svrtab agegrp gender, count se [compute std. err. of counts]
. svrtab, count ci [redisplay CI of counts]
. svrtab agegrp gender, wald [compute Wald test]
. svrtab, pearson lr [redisplay pearson and lr tests]
. svrtab agegrp gender, count se fse(%4.2f) fcount(%4.0fc) [specify display formats]
. svrtab agegrp gender, tab(income) [gives income proportions by agegrp and gender]
. svrtab agegrp, count col [one-dimensional tabulation]
Saved Results
svrtab generates the same saved results as svytab.
Note that e(cmd) is set to "svytab" in order to allow post-tabulation tests to function correctly.
Methods and formulae
See survwgt.
Acknowledgements
svrtab consists largely of the ado file code from official Stata's svytab command, version 1.1.6, modified to calculate (co)variances differently. I would like to thank Bobby Gutierrez at StataCorp for advice on implementation of BRR.
Author
Nick Winter Cornell University nw53@cornell.edu