Correlation analysis
corrtab [varlist] [weight fweight aweight] [if exp] [in range] [ , obs sig bonferroni sidak cwdeletion vsort(varname numeric) vars(#) above(#) print(#) sort spearman tlabel clabel alllabel format ]
Description
corrtab displays Pearson or Spearman rank correlations for varlist. By default, each correlation coefficient is calculated independently from the observations available for that pair of variables, so the display contains pairwise coefficients. Optionally, casewise deletion can be requested. The number of observations and the significance level from a test of independence can also be reported. String variables are automatically omitted from the analysis. Overlapping varlist specifications (for example, wildcards that match the same variable more than once) may be used to capture all intended variables; duplicate variables in varlist are removed before processing.
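The difference between pairwise and casewise deletion can be seen with Stata's built-in commands. The following is a minimal sketch, assuming the auto dataset shipped with Stata, in which rep78 has missing values; the choice of variables is illustrative only. The two corrtab calls at the end use only options documented in this help file and assume corrtab is installed.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* pwcorr uses every observation available for each pair of variables,
* so the n reported by obs can differ from cell to cell
pwcorr price weight mpg rep78, obs

* correlate drops any observation with a missing value in any
* listed variable (casewise deletion)
correlate price weight mpg rep78

* Corresponding corrtab calls (assumes corrtab is installed):
* pairwise by default, casewise with cwdeletion
corrtab price weight mpg rep78, obs
corrtab price weight mpg rep78, obs cwdeletion
```

Comparing the observation counts from the two approaches shows how many observations casewise deletion discards.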
Remarks
corrtab provides a rapid display of correlations formatted for easy reading and for copying to reports and manuscripts. corrtab is meant for use when the number of column variables is 8 or fewer, although it could display many more column variables depending on font and linesize. The user should experiment. The number of column variables that will be displayed properly depends also on the length of the labels in column 1.
corrtab can optionally use two labeling systems to provide a clear display suitable for the screen and for word processors (see User-defined labels below).
Options
spearman specifies Spearman correlations. The default is to calculate Pearson correlations.
obs adds a line to each row of the display reporting the number of observations used in calculating the correlation coefficient.
sig adds a line to each row of the display reporting the significance level of each correlation coefficient.
print(#) specifies the significance level for printing of correlation coefficients. Coefficients with significance levels larger than # are left blank. print(10) or print(.1) would list only coefficients significant at the 10% level or better.
bonferroni makes the Bonferroni adjustment to calculated significance levels. This affects printed significance levels and the print() option. corrtab, print(.05) bonferroni prints coefficients with Bonferroni-adjusted significance levels of .05 or less.
sidak makes the Sidak adjustment to calculated significance levels. This affects printed significance levels and the print() option. corrtab, print(.05) sidak prints coefficients with Sidak-adjusted significance levels of .05 or less.
vars(#) specifies that the first # variables in varlist are to be correlated with all of the variables in varlist. This produces # columns of correlations. There is no limit on the number of variables specified, but the display becomes difficult to read when the columns exceed the width of the screen. If vars() is not specified, all variables are displayed.
sort requests that the variables in varlist be reported in sorted order. If vars() is specified, the first # variables are not sorted.
above(#) specifies the minimum absolute value of correlation coefficients to be printed. Coefficients with smaller absolute values are left blank. above(.5) would list only coefficients of 0.5 or greater or -0.5 or less.
cwdeletion removes observations with missing values in the varlist from the calculations (casewise deletion).
vsort() sorts the correlation coefficients in descending order according to a selected variable in the column list. This option works only when neither obs nor sig is specified; that is, it works for the simple display of coefficients only.
tlabel makes use of the tlabel system (if used) to provide detailed labels for column 1 (see below).
clabel places labels in the column names using char varname[varname], according to list's subvarname option (see below).
alllabel places labels in columns and rows using char varname[varname], according to list's subvarname option (see below).
format sets the display format for the coefficients. The default is %9.3f. Increase both the width and the number of decimal places (%w.df) to handle a large number of observations and/or more decimal places.
User-defined labels
By default, corrtab uses variable names for column and row labels. However, variable names are not always appropriate or appropriately formatted.
Labels for a correlation display raise several problems. The primary problem is that column labels must be short enough that they do not waste display space. Row labels can be longer and provide more information. User-defined labels make it possible to produce word-processor-ready tables as well as correlation tables that are easy to read and work with.
There are two systems available. The first (tlabel) was first used in the program fsum (see fsum if installed). tlabel user-defined labels are variable characteristics of the form char varname[tlabel] description. See help for char. Characteristics (labels) are saved with the data set. They can be entered from the keyboard with the char command. Since such labels will probably be used repeatedly, they can be entered in a do file or program and called when needed. An example of do file commands is shown directly below:
    . char haq_disa[tlabel] "HAQ (0-3)"
    . char sex[tlabel] "Sex (% male)"
    . char age[tlabel] "Age (years)"
    . char ethorig[tlabel] "Ethnic origin (code)"
As an aid, the programs tlabel and tlablist are provided.
The second system uses clabel. In Stata 8, the list command gained an option that uses char varname[varname] to label columns. corrtab makes use of this option as well. Examples of labels shortened for the clabel system are:
    . char haq_disa[varname] HAQ
    . char sex[varname] Sex
    . char age[varname] Age
    . char ethorig[varname] Ethnicity
The dual labeling system is optional. Its main value is when the same variables and labels are used repeatedly. In that case it saves time and improves screen and word-processor formatting and readability.
Examples
    . corrtab
    . corrtab price weight mpg displ
    . corrtab price weight mpg displ, sig var(2) sort
    . corrtab price weight mpg displ, sig obs var(2) above(0.5) sp cwd sort
    . corrtab price weight mpg displ, sig obs vsort(price)
    . corrtab mpg re* p* *igh*,sig bon tlabel clabel
    . corrtab price weight mpg displ,all
    . corrtab price weight mpg displ, t c
    . corrtab haq pain glb fatigue age totin,v(3) t c vsort(haq)

Pearson correlations

+---------------------------------------------------------+
| Variable                          HAQ     Pain   Global |
|---------------------------------------------------------|
| HAQ (0-3)                       1.000    0.598    0.588 |
| Pain (0-10)                     0.598    1.000    0.665 |
| Global severity (0-10)          0.588    0.665    1.000 |
| Fatigue (0-10)                  0.527    0.608    0.604 |
| Total Income (US dollars)      -0.337   -0.223   -0.249 |
| Age (years)                     0.131   -0.036    0.024 |
+---------------------------------------------------------+

Acknowledgements
corrtab is a Stata 8 program that is an upgrade from the Stata 5 version of pwcorrs.
Nick Cox made very helpful suggestions.
Author
Fred Wolfe, National Data Bank for Rheumatic Diseases, Wichita, KS
fwolfe@arthritis-research.org
Also see
On-line: help for pwcorr, corr,