{smcl}
{* 14feb2005}{...}
{hline}
help for {hi:fsum}
{hline}

{title:Summary statistics} 

{p 8 16}{cmd:fsum} 
[{it:varlist}] 
[{cmd:weight} {it:fweight aweight}]  
[{cmd:if} {it:exp}]  
[{cmd:in} {it:range}] 
[ {cmd:,}
{cmdab:s:tats(}{it:n miss abspct mean vari sd se p1 p5 p10 p25 p50 median p75 p90 p95 p99  min max lci uci sum}{cmd:)}  
{cmdab:a:ddstats(}{it:optional statistic}{cmd:)} 
{cmdab:f:ormat(}{it:format}{cmd:)} 
{cmdab:p:ctvar(}{it:varlist}{cmd:)} 
{cmd:not(}{it:varlist}{cmd:)} 
{cmdab:com:plete} 
{cmdab:l:abel} 
{cmdab:v:arname} 
{cmdab:u:selabel} 
{cmdab:d:ecsum} 
{cmdab:cat:var(}{it:optional categorical variable}{cmd:)} 
{cmdab:mcat:var(}{it:optional categorical variable}{cmd:)} 
] 


{title:Description}

{p}{cmd:fsum} provides summary statistics, including N, # missing, percent 
missing, mean, variance, standard deviation, standard error, P1, P5, P10, P25, median, 
P75, P90, P95, P99, minimum, maximum, confidence intervals, sum, and percent for 0/1 
variables. It can also display categories of a variable in a manner similar 
to tabulate. {cmd:fsum} allows Stata labels and special user-defined labels. 
Display width is automatically adjusted for variable and label length. Column 
width is adjustable by format, and variable labels are available optionally. 

{title:Remarks}

{p}{cmd:fsum} is a general purpose summary statistic program. Its display can 
be formatted with appropriate labels and variable formats so that its output can
be pasted into a word processor without the need for further alterations within 
the word processor.

{p}{cmd:fsum} makes use of 3 potential "labels" to identify each variable. 
Stata abbreviation of varnames can lead to names that are not suitable
for presentation tables or, in some instances, may not be easily readable. 
Substitution of variable labels for varnames may not be a complete solution, 
for variable labels may be informative yet not suitable for publication tables.

{p}{cmd:fsum} provides a facility to use 2 labels: the Stata variable label and
a user-defined label. The user-defined label is actually a variable 
characteristic (char) in the form {cmd:char varname[tlabel] description}. 
See below for a description of entering user-defined labels.

{p}By default, {cmd:fsum} uses the user-defined label if it exists. 
If it does not exist, {cmd:fsum} uses the varname. However, if the {cmd:uselabel} 
option is specified, {cmd:fsum} first looks for the user-defined label, then 
for the Stata variable label, and uses the varname only if neither exists.
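
{p}As a sketch of this precedence, assume a dataset that contains a variable 
named {cmd:age}. The variable name and both label texts are illustrative only.

{p 4 8}{inp:. label variable age "Age at enrollment"} {p_end}
{p 4 8}{inp:. fsum age} {p_end}
{p 4 8}{inp:. fsum age, uselabel} {p_end}
{p 4 8}{inp:. char age[tlabel] "Age (years)"} {p_end}
{p 4 8}{inp:. fsum age, uselabel} {p_end}

{p}Under the rules described above, the first {cmd:fsum} call should identify 
the row by the varname, the second by the Stata variable label, and the third 
by the user-defined label.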

{p}By default, {cmd:fsum} reports N, mean, standard deviation, minimum and 
maximum, essentially mimicking -summarize-. However, many additional statistics are 
optionally available, including N, # missing, percent missing, mean, standard 
deviation, standard error, variance, median, p1, p5, p10, p25, p75, p90, p95, p99, 
minimum, maximum, confidence intervals, sum, and percent for 0/1 variables.

{p}Percent calculation and reporting is done automatically if either 1) the 
variable is entered in the {cmd:pctvar()} option or 2) a "%" sign is found in 
the user-defined label. Calculating percentages in this way can save a great 
deal of time in the creation of presentation tables, as statistics that do not 
apply to such variables are left blank.
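
{p}For instance, assuming {cmd:age} and a 0/1 coded {cmd:sex} exist in the 
data (both variables are illustrative), either of the two approaches below 
should cause {cmd:sex} to be reported as a percent:

{p 4 8}{inp:. fsum age sex, pctvar(sex)} {p_end}

{p 4 8}{inp:. char sex[tlabel] "Sex (% male)"} {p_end}
{p 4 8}{inp:. fsum age sex} {p_end}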

{p}If a varlist variable is also entered in {cmd:catvar()} or {cmd:mcatvar()}, 
a tabulation of categories for the variable will be performed. If a "%" sign or 
the word "code" is found in the user-defined label, only N and mean (expressed 
as a percentage) will be displayed (miss and abspct can also be optionally 
displayed). Otherwise the full range of selected statistics will be displayed.
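
{p}A brief sketch, assuming {cmd:ethorig} is a categorical variable in the 
data (the variable names are illustrative):

{p 4 8}{inp:. fsum age ethorig, catvar(ethorig)} {p_end}
{p 4 8}{inp:. fsum age ethorig, mcatvar(ethorig)} {p_end}

{p}According to the option descriptions in this help file, the second form 
should additionally show missing observations as a separate category.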

{p} If the {cmd:label} option is specified, {cmd:fsum} will display the Stata 
variable labels at the right of the screen. If the {cmd:varname} option is 
specified, {cmd:fsum} will display the Stata varnames at the right side of 
the screen. The two options cannot be specified at the same time.

{p}The default variable format is %9.2f, which results in compact output. If 
a result is too wide for this format, it will be automatically 
displayed in exponential form. Changing the format with the {cmd:format()} 
option automatically re-spaces the output and can allow wider results 
to be displayed. The format can be entered as f(w.d) or f(%w.ds).
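
{p}As an illustration, assume a hypothetical variable {cmd:income} whose 
values may be too wide for the default format. The two commands below sketch 
the two ways of entering a format; the widths chosen are arbitrary:

{p 4 8}{inp:. fsum income, format(12.2)} {p_end}
{p 4 8}{inp:. fsum income, format(%12.0f)} {p_end}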

{p}If the {cmd:sum} statistic is selected, it is automatically formatted as %n.0f. 
However, the {cmd:decsum} option can be used to display it in the general format instead.

{title:Options}

{p}{cmd:stats(statistic name)} allows individual specification of requested 
statistics. Default is to display N, mean, standard deviation, minimum and 
maximum. Allowed stats are N, mean, vari, sd, se, p1, p5, p10, p25, p50 (or median), p75, 
p90, p95, p99, min, max, lci, uci, sum, miss, and abspct. abspct is percent missing.

{p}{cmd:addstats(statistic name)} allows individual statistics to be added to
the default so as to avoid having to type all stats() when only one or a 
few additional statistics are needed. If you request p10 you should also request p1.
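
{p}For example, assuming a variable {cmd:age} (illustrative), the two commands 
below should request the same set of statistics, since the defaults are N, 
mean, standard deviation, minimum and maximum:

{p 4 8}{inp:. fsum age, addstats(median)} {p_end}
{p 4 8}{inp:. fsum age, stats(n mean sd min max median)} {p_end}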

{p}Variable names entered in {cmd:pctvar(varlist)} will be treated as 0/1
categorical variables, and results will be presented as a percent. Variables will 
also be treated as a percent if the % sign is in the user-defined label.

{p}{cmd:complete} indicates that observations with missing values for the varlist
will be excluded.

{p}{cmd:label} requests that the Stata variable labels be displayed at the right
of the table.

{p}{cmd:not(varlist)} requests that the listed variables be excluded from the 
summary. This is useful in handling variables that differ by stubs or suffixes.

{p}{cmd:varname} requests that the varnames be displayed at the right
of the table.

{p}{cmd:uselabel} requests that the Stata variable labels be used for the 
"variable name" if the user-defined label is not present.

{p}{cmd:catvar} requests that a tabulation of categories for varname be 
performed. If a "%" sign or the word "code" is found in the user specified 
label, only N, miss, abspct, and mean will be displayed. Otherwise the full 
range of selected statistics will be displayed.

{p}{cmd:mcatvar} acts similarly to {cmd:catvar} except that it displays missing 
observations as a separate category.

{p}The {cmd:format} option allows any formatting desired. The default format is 
%9.2f.

{p}{cmd:decsum} requests that the normal format be applied to the {cmd:sum} statistic. The 
default is to apply %n.0f.


{title:User-defined labels}

{p}User-defined labels provide the opportunity to make word processor ready 
tables. In addition, they can trigger identification of a variable as one for
which percent should be calculated if the % sign is part of the label. User-defined 
labels are actually variable characteristics in the form of 
{cmd:char varname[tlabel] description}. See help for {help char}. 
Characteristics (labels) are saved with the data set.
They can be entered from the keyboard with the {cmd:char} command. Since such 
labels will probably be used repeatedly, they can be entered in a do file or 
program and called when needed. An example of do file commands is shown directly 
below:

{p 4 8}{inp:char haq_disa[tlabel] "HAQ (0-3)"} {p_end}
{p 4 8}{inp:char sex[tlabel] "Sex (% male)"} {p_end}
{p 4 8}{inp:char age[tlabel] "Age (years)"} {p_end}
{p 4 8}{inp:char ethorig[tlabel] "Ethnic origin (code)"} {p_end}

{p}As an aid, the program {cmd:nlabel} is provided. This program provides a simple way to create a series of 
labels at the same time.


{title:Examples}

{p 4 8}{inp:. fsum} {p_end}
{p 4 8}{inp:. fsum age sex income haq, f(10.3) s(n abspct mean median p95 sum)} {p_end}
{p 4 8}{inp:. fsum age sex esr pcs, s(N mean median lci uci sum) l u} {p_end}
{p 4 8}{inp:. fsum age sex ethorig pcs,mcat(ethorig) cat(sex)} {p_end}
{p 4 8}{inp:. fsum t*, not(totinc) f(%9.1f)} {p_end}

{title:Acknowledgements}

{p}Nick Cox made helpful suggestions that improved the program.

{title:Author}

    Fred Wolfe, National Data Bank for Rheumatic Diseases, Wichita, KS  
    fwolfe@arthritis-research.org


{title:Also see}

{p 0 19}On-line:  help for {help summarize}, {help tabstat}, {help univar} if installed, 
{help nlabel} if installed. 
{p_end}