Correlation tables for survey data
corr_svy varlist [weight] [if exp] [in range] [, strata(varname) psu(varname) fpc(varname) subpop(varname) pw obs sig print(#) star(#) ]
pweights are allowed; see help weights.
Warning: Use of if or in restrictions will not produce correct variance estimates for subpopulations in many cases. To compute estimates for subpopulations, use the subpop() option.
Description
corr_svy displays the correlation matrix for varlist. Significance levels can optionally be calculated; they are based on survey variance estimates.
It allows any or all of the following: probability sampling weights, stratification, and clustering. The subpop() option will give estimates for a single subpopulation. For a general discussion of various aspects of survey designs, including multistage designs, see [U] 30 Overview of survey estimation.
To describe strata and PSUs of your data and to handle the error message "stratum with only one PSU detected", see help svydes.
Options
strata(), psu(), and fpc() are described in svyset; see help svyset.
subpop(varname) specifies that estimates be computed for the single subpopulation defined by the observations for which varname~=0. Typically, varname=1 defines the subpopulation and varname=0 indicates observations not belonging to the subpopulation. For observations whose subpopulation status is uncertain, varname should be set to missing.
obs requests that the number of observations for each correlation be displayed. This only makes sense in conjunction with the pw option, but can be specified regardless.
pw specifies that pairwise correlations be calculated and displayed.
sig requests that the significance level of the coefficients be displayed.
star(#) specifies the significance level of coefficients to be starred. star(5) would star all coefficients significant at the 5% level or better.
print(#) specifies the significance level of correlation coefficients to be printed. Coefficients with larger significance levels are left blank. print(10) would list only coefficients significant at the 10% level or better.
Example
. svyset pweight leadwt
. svyset strata stratid
. svyset psu psuid
. corr_svy loglead age female region2-region4, obs sig
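The remaining options combine in the same way. The following is a sketch only, assuming the svyset settings above are in effect and that age is recorded in years; the indicator adult and the cutoff of 18 are hypothetical and not part of the original example.

```stata
* Hypothetical subpopulation indicator: 1 = adult, 0 = not adult,
* missing where age is missing (status uncertain), as subpop() expects.
generate byte adult = (age >= 18) if age ~= .

* Correlations for the adult subpopulation only, with significance levels.
corr_svy loglead age female, subpop(adult) sig

* Pairwise correlations with the number of observations for each pair;
* print only coefficients significant at the 10% level, star those at 5%.
corr_svy loglead age female region2-region4, pw obs print(10) star(5)
```

The if qualifier on generate matters because Stata treats missing values as larger than any number, so age >= 18 alone would code observations with missing age as adults.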
Saved Results
corr_svy saves in r() the following, about the final correlation calculated:
r(N)    The number of observations
r(p)    The p-level
r(rho)  The estimated rho
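As a sketch of how these results might be used, assuming the variables from the example above: because r() describes only the final correlation calculated, restricting varlist to two variables is the simplest way to be reasonably sure which pair the saved values refer to.

```stata
* Correlate a single pair so the saved results refer to that pair.
corr_svy loglead age, sig

* List everything left in r(), then display the individual results.
return list
display "rho = " r(rho) "   p = " r(p) "   N = " r(N)
```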
Methods and formulae
Calculations are based on the methods explained by Bill Sribney in a post to the Statalist, and reproduced in this Stata FAQ: http://www.stata.com/support/faqs/stat/survey.html.
Point estimates are calculated by correlate, with aweights.
Under simple random sampling, the p-value for the slope in a linear regression of Y on X is identical to the p-value from the regression of X on Y, and both equal the p-value for Pearson's correlation coefficient under the assumption that the population is normal. With survey variance estimates this symmetry no longer holds: the p-value for the slope of the regression of Y on X is NOT in general the same as the p-value for the regression of X on Y. corr_svy therefore obtains the p-values from both regressions and displays the more conservative (i.e., larger) of the two.
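The steps above can be approximated by hand. This is a sketch only, not corr_svy's actual code: it assumes the svyset settings and variables from the example, and uses svyreg, the survey regression command of the Stata releases that share the svyset syntax shown there (later releases use the svy: regress prefix form instead).

```stata
* Point estimate: weighted correlation, using the sampling weight as an aweight.
correlate loglead age [aweight=leadwt]

* Survey regressions in both directions; note the p-value on each slope.
svyreg loglead age
svyreg age loglead

* corr_svy reports the larger of the two slope p-values.
```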
Author
Nick Winter
Cornell University
nw53@cornell.edu