Summary statistics
fsum [varlist] [weight fweight aweight] [if exp] [in range] [ , stats(n miss abspct mean vari sd se p1 p5 p25 p50 median p75 p95 p99 min max lci uci sum) addstats(optional statistic) format(format) pctvar(varlist) not(varlist) complete label varname uselabel decsum catvar(optional categorical variable) mcatvar(optional categorical variable) ]
Description
fsum provides summary statistics, including N, # missing, percent missing, mean, variance, standard deviation, standard error, P1, P5, P25, median, P75, P95, P99, minimum, maximum, confidence intervals, sum, and percent for 0/1 variables. It can also display categories of a variable in a manner similar to tabulate. fsum allows Stata labels and special user-defined labels. Display width is automatically adjusted for variable and label length. Column width is adjustable by format, and variable labels are available optionally.
Remarks
fsum is a general-purpose summary statistics program. Its display can be formatted with appropriate labels and variable formats so that the output can be pasted into a word processor without further editing.
fsum can draw on 3 potential "labels" to identify each variable: the varname, the Stata variable label, and a user-defined label. Stata abbreviation of varnames can lead to names that are not suitable for presentation tables or, in some instances, may not be easily readable. Substituting variable labels for varnames may not be a complete solution, for variable labels may be informative yet not suitable for publication tables.
fsum provides a facility to use 2 labels: the Stata variable label and a user-defined label. The user-defined label is actually a variable characteristic (char) of the form char varname[tlabel] description. See below for a description of entering user-defined labels.
By default, fsum uses the user-defined label if it exists. If it does not exist, fsum uses the varname. If the -uselabel- option is specified, fsum still looks for the user-defined label first, then falls back to the Stata variable label, and only then to the varname.
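The lookup order described above can be tried on the auto dataset that ships with Stata. This is a minimal sketch, assuming fsum is installed; the label text "Price (USD)" is illustrative only.

```stata
* Sketch only: assumes fsum is installed; auto.dta ships with Stata.
sysuse auto, clear

* price gets a user-defined label; mpg keeps only its Stata variable label.
char price[tlabel] "Price (USD)"

* Default: price is identified by its user-defined label,
* mpg falls back to its varname.
fsum price mpg

* With uselabel: price still uses the user-defined label,
* mpg is identified by its Stata variable label instead of its varname.
fsum price mpg, uselabel
```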
By default, fsum reports N, mean, standard deviation, minimum and maximum, essentially mimicking -summarize-. However, many additional statistics are optionally available, including N, # missing, percent missing, mean, standard deviation, standard error, variance, median, p1, p5, p25, p75, p95, p99, minimum, maximum, confidence intervals, sum, and percent for 0/1 variables.
Percent calculation and reporting are done automatically for a variable if either 1) the variable is entered in the pctvar() option or 2) a "%" sign is found in its user-defined label. Calculating percentages in this way can save a great deal of time in the creation of presentation tables, as unrelated statistics are left blank.
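Both routes to percent reporting can be sketched with the 0/1 variable foreign from the auto dataset. The example assumes fsum is installed, and the label text is made up for illustration.

```stata
* Sketch only: assumes fsum is installed.
sysuse auto, clear

* Route 1: name the 0/1 variable in pctvar().
fsum price mpg foreign, pctvar(foreign)

* Route 2: put a "%" sign in the user-defined label (illustrative text).
char foreign[tlabel] "Foreign (% yes)"
fsum price mpg foreign
```

The second route has the advantage that the label is saved with the data, so later calls need no extra option.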
If a variable in the varlist is also entered in catvar() or mcatvar(), a tabulation of categories for the variable will be performed. If a "%" sign or the word "code" is found in the user-defined label, only N and mean (expressed as a percentage) will be displayed (miss and abspct can also be optionally displayed). Otherwise the full range of selected statistics will be displayed.
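As a hedged illustration of the difference between catvar() and mcatvar(), the sketch below uses rep78 from the auto dataset, which takes the values 1 to 5 and has a few missing observations. It assumes fsum is installed.

```stata
* Sketch only: assumes fsum is installed.
sysuse auto, clear

* rep78 must appear in the varlist as well as in catvar().
fsum price rep78, catvar(rep78)

* mcatvar() does the same, but shows missing observations
* as a separate category.
fsum price rep78, mcatvar(rep78)
```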
If the label option is specified, fsum will display the Stata variable labels at the right of the screen. If the varname option is specified, fsum will display the Stata varnames at the right of the screen. The two options cannot be selected at the same time.
The default variable format is %9.2f. This results in a compact output. If a result is too wide for this format, it will be automatically displayed in exponential form. However, changing the format with the format() option will automatically re-space the output and can allow results with greater output widths to be displayed. The format can be entered as f(w.d) or f(%w.ds).
If the sum statistic is selected, it is automatically formatted as %n.0f. However, this can optionally be changed with the decsum option so that sum is displayed in the general default format.
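The effect of format() is easiest to see by running the same request with and without it. The sketch below assumes fsum is installed and uses the two ways of writing a format that appear in this help file.

```stata
* Sketch only: assumes fsum is installed.
sysuse auto, clear

* Default %9.2f format.
fsum price length

* Wider columns with one decimal place, written with the % prefix.
fsum price length, format(%12.1f)

* The same request in the abbreviated w.d form.
fsum price length, f(12.1)
```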
Options
stats(statistic names) allows individual specification of requested statistics. Default is to display N, mean, standard deviation, minimum and maximum. Allowed stats are N, miss, abspct, mean, vari, sd, se, p1, p5, p25, p50 (or median), p75, p95, p99, min, max, lci, uci, and sum. miss is the number of missing values; abspct is percent missing.
addstats(statistic name) allows individual statistics to be added to the default so as to avoid having to type all stats() when only one or a few additional statistics are needed.
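The two calls below are a sketch of how stats() and addstats() relate, assuming fsum is installed. Going by the descriptions above, both should report the default statistics plus the median, though the column order may differ.

```stata
* Sketch only: assumes fsum is installed.
sysuse auto, clear

* stats() replaces the default list, so every statistic is typed out.
fsum price mpg, stats(n mean sd min max median)

* addstats() keeps the defaults and adds to them.
fsum price mpg, addstats(median)
```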
pctvar(varlist) names variables to be treated as 0/1 categorical variables, with results presented as a percent. Variables will also be treated as a percent if the % sign is in the user-defined label.
complete indicates that observations with missing values for the varlist will be excluded.
label requests that the Stata variable labels be displayed at the right of the table.
not(varlist) requests that the listed variables be excluded from the varlist. This is useful in handling variables that differ by stubs or suffixes, as in fsum t*, not(totinc).
varname requests that the varnames be displayed at the right of the table.
uselabel requests that the Stata variable labels be used for the "variable name" if the user-defined label is not present.
catvar requests that a tabulation of categories for varname be performed. If a "%" sign or the word "code" is found in the user specified label, only N, miss, abspct, and mean will be displayed. Otherwise the full range of selected statistics will be displayed.
mcatvar acts similarly to catvar except that it displays missing observations as a separate category.
The format option allows any formatting desired. The default format is %9.2f.
decsum requests that the normal format be applied to the sum statistic. The default is to apply %n.0f.
User-defined labels
User-defined labels provide the opportunity to make word-processor-ready tables. In addition, they can trigger identification of a variable as one for which percent should be calculated if the % sign is part of the label. User-defined labels are actually variable characteristics in the form of char varname[tlabel] description. See help for char. Characteristics (labels) are saved with the data set. They can be entered from the keyboard with the char command. Since such labels will probably be used repeatedly, they can be entered in a do file or program and called when needed. An example of do file commands is shown directly below:
char haq_disa[tlabel] "HAQ (0-3)"
char sex[tlabel] "Sex (% male)"
char age[tlabel] "Age (years)"
char ethorig[tlabel] "Ethnic origin (code)"
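Because these labels are ordinary characteristics, built-in Stata commands can check which variables still lack one before a table is produced. The loop below is a sketch on the auto dataset with illustrative label text; it does not call fsum at all.

```stata
* Sketch only: uses built-in commands and illustrative labels.
sysuse auto, clear
char price[tlabel] "Price (USD)"
char foreign[tlabel] "Foreign (% yes)"

* Report every variable that has no user-defined label yet.
foreach v of varlist _all {
    local lbl : char `v'[tlabel]
    if `"`lbl'"' == "" {
        display as text "`v': no tlabel set"
    }
}
```

Typing char list shows all characteristics currently stored in the data set.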
As an aid, the program nlabel is provided. This program provides a simple way to create a series of labels at the same time.
Examples
. fsum
. fsum age sex income haq, f(10.3) s(n abspct mean median p95 sum)
. fsum age sex esr pcs, s(N mean median lci uci sum) l u
. fsum age sex ethorig pcs, mcat(ethorig) cat(sex)
. fsum t*, not(totinc) f(%9.1f)
Acknowledgements
Nick Cox made helpful suggestions that improved the program.
Author
Fred Wolfe, National Data Bank for Rheumatic Diseases, Wichita, KS fwolfe@arthritis-research.org
Also see
On-line: help for summarize, tabstat, univar if installed, nlabel if installed.