Title
summout -- Comparative table of summary statistics
Syntax
summout [varlist] [if] [in] [using filename], by(groupvar) [more options]
by prefix is allowed; see [D] by.
Description
summout creates a table of summary statistics for the variables listed in varlist across the categories of the categorical variable groupvar. The output table can be copied and pasted into any worksheet using the "Copy Table" option of the right-click menu. It can also be exported in ASCII format to the file given in filename, which is useful for large tables or as part of .do files.
For every variable in varlist, summout first runs a normality test across the categories of groupvar. If the test does not reject normality, the mean and standard deviation are calculated and a hypothesis test is performed using oneway. If the test indicates a non-normal distribution, the median, interquartile range and a Kruskal-Wallis test are shown instead.
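The decision described above can be approximated by hand with official Stata commands. The following is a minimal sketch, not summout's actual source code: the 5% threshold and the rule "non-normal if any group rejects normality" are assumptions made for illustration only.

```stata
* Illustrative sketch only; summout's internal rules may differ.
sysuse auto, clear

* Shapiro-Wilk test of price within each category of foreign (0 and 1)
local normal = 1
forvalues g = 0/1 {
    quietly swilk price if foreign == `g'
    * Assumed rule: flag as non-normal if any group rejects at the 5% level
    if r(p) < 0.05 local normal = 0
}

if `normal' {
    * Normal case: mean, standard deviation and one-way ANOVA
    tabstat price, by(foreign) statistics(mean sd)
    oneway price foreign
}
else {
    * Non-normal case: median, interquartile range and Kruskal-Wallis test
    tabstat price, by(foreign) statistics(median iqr)
    kwallis price, by(foreign)
}
```

summout collects the equivalent results for every variable in varlist into a single table, so this sketch is only meant to show which statistics and tests correspond to each branch.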
Depending on the options specified, the output table can have one or more of the following columns:
- Variable name or variable label
- nonparam: 0 if the tested variable is normally distributed, 1 otherwise
- est: the variable's estimate measure (mean or median) for the specified category or the total
- dis: the variable's dispersion measure (sd, se or iqr) for the specified category or the total
- lb: lower bound of the variable's confidence interval for the specified category or the total
- ub: upper bound of the variable's confidence interval for the specified category or the total
- p value: the p-value of the corresponding hypothesis test
summout requires the user-written command mat2txt; make sure it is installed before running summout.
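One way to check for mat2txt and install it if it is missing is sketched below. This assumes an internet connection and that mat2txt is obtained from the SSC archive.

```stata
* Check whether mat2txt is on the ado-path; install from SSC if it is not
capture which mat2txt
if _rc != 0 {
    ssc install mat2txt
}
```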
Options
    +------+
----+ Main +-------------------------------------------------------------
by(groupvar) is required. It specifies the variable containing the categories to compare. groupvar must have at least two values; otherwise an error message is shown. The same error is displayed if the sample selected with if or in leaves groupvar with only one category.
ignore causes summout to treat all variables as normally distributed (use this at your own risk).
se displays the standard error of the mean instead of the standard deviation (for normally distributed variables only).
ci calculates and displays confidence intervals instead of a dispersion measure. Confidence intervals are calculated using ci for normally distributed variables and centile otherwise. Option ci overrides se.
level(#) sets confidence level; default is level(95).
dp(#) specifies number of decimal places for estimate and dispersion measures; default is dp(2).
nt() specifies the normality test to be used: sk for the skewness and kurtosis test, ks for the Kolmogorov-Smirnov test and sw for the Shapiro-Wilk test. Default is nt(sw).
nolabel omits variable labels and value labels from the output table.
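To illustrate what options ci and level() report, the two official commands named under ci can be run by hand for one category. This is only a sketch: the choice of variable, category and level is arbitrary, and the `ci means` syntax assumes Stata 14 or later (earlier versions use `ci price ...`).

```stata
sysuse auto, clear

* Confidence interval for the mean (the normally distributed case)
ci means price if foreign == 0, level(99)

* Confidence interval for the median (the non-normal case)
centile price if foreign == 0, centile(50) level(99)
```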
Examples
    sysuse auto
    summout price mpg weight, by(foreign)
    summout price mpg weight, by(foreign) nt(sk)
    summout price mpg weight, by(foreign) nt(ks)
    summout price mpg weight, by(foreign) ignore
    summout price mpg weight, by(foreign) nolabel
    summout price mpg weight, by(foreign) se
    summout price mpg weight, by(foreign) dp(3)
    summout price mpg weight, by(foreign) ci level(99)
    summout price mpg weight using example.txt, by(rep78)
Acknowledgements
Special thanks to Ian Watson (author of tabout, the command that inspired this one) for his valuable programming advice, and to Zumin Shi for his constructive feedback.
Author
Andrés González Rangel
MD, MSc Clinical Epidemiology
Universidad Nacional de Colombia
algonzalezr@unal.edu.co