```
help gsum
-------------------------------------------------------------------------------

Title

gsum --      Summary statistics for grouped data

Syntax
gsum varlist [if] [in] [weight] [, options] [group definitions]

-----------------------------------------------------------------------------

Specifying Group Ranges

gsum accepts variables with codes from 0 to 25 (integers only).

Elements of group definitions can be g0(#-#), g1(#-#)...g#(#-#) where g#
identifies the group number and (#-#) identifies a numeric range.

If each category of varlist carries a value label of the form #-#, gsum can
read the ranges from the labels instead.

If you do not specify group definitions, gsum looks for these labels.  If you
do specify group definitions, gsum ignores the labels.

options                     Description
-------------------------------------------------------------------------
quantiles(q q ...)          quantiles to calculate; the default set is
                            0.25, 0.50, and 0.75
gen(newvarlist)             create a new variable, named newvarlist,
                            containing the category midpoints
table                       display the value table
save(filename)              save the value table to filename
-------------------------------------------------------------------------

Description

gsum calculates summary statistics for an ordinal variable where each
category represents a range of a conceptually continuous variable. gsum
reports the weighted N, the mean, the standard deviation, and, by default,
the 0.25, 0.50 (median), and 0.75 quantiles; you can request any other set
with quantiles().  Each quantile is reported two ways: as the midpoint of
the category in which the quantile falls, and as a linear interpolation of
that quantile based on methods presented by Blalock (1979).

gsum can also produce a value table (which can also be saved) listing
each category, the range, the midpoint of that range, the number of
cases, the weight of each case, and the cumulative distribution function
(CDF).

gsum can also create a new variable that contains the category midpoints.

gsum accepts any type of [weight] and is byable.

For example, you may have a variable age_cat where 1 represents 18-24
years of age, 2 represents 25-44 years of age, and 3 represents 45-100
years of age.  You can use gsum to calculate summary statistics such as
the mean, median, and standard deviation.

Examples

Use the 2010 GSS data on age

. use gssage.dta, clear

If the variable age_cat is labeled correctly,

. gsum age_cat

Or, if you are not sure, specify the ranges directly,

. gsum age_cat, g1(18-24) g2(25-44) g3(45-100)

To use weights,

. gsum age_cat [pweight = wtssall]

To see the value table,

. gsum age_cat, table

To save the value table in the file valuetable.dta,

. gsum age_cat, save(valuetable.dta)

To create the variable midpoint_age_cat,

. gsum age_cat, gen(midpoint_age_cat)

You can also enter data from a frequency table.  For example, a table in
Blalock (1979) shows the frequency of cases for different income ranges:

Income Range  Frequency
-----------------------
1950-2950        17
2950-3950        26
3950-4950        38
4950-5950        51
5950-6950        36
6950-7950        21
-----------------------
Total           189

You can input this table into Stata as a category variable and a frequency
variable:

. clear
. input y f
1. 1 17
2. 2 26
3. 3 38
4. 4 51
5. 5 36
6. 6 21
7. end

You can then label the categories

. label def money 1 "1950-2950" 2 "2950-3950" 3 "3950-4950" 4 "4950-5950"
>     5 "5950-6950" 6 "6950-7950"
. label val y money

Then use frequency weights

. gsum y [fweight = f], table quantiles(0.50)

Saved results

gsum saves the following in r():

Scalars
r(N)                the number of observations
r(sum_W)            the sum of the weights
r(mean)             the mean
r(var)              the variance
r(sd)               the standard deviation
r(mn)               the minimum
r(mx)               the maximum
r(qiq)              the q quantile using the interpolation method
r(qmq)              the q quantile using the midpoint method

Acknowledgments

The algorithms used in this program are based on:

Blalock, H. M. 1979. Social Statistics. 2nd ed. New York: McGraw-Hill.

Contact

This program was written by Eric Hedberg, National Opinion Research
Center at the University of Chicago.  Any questions or comments can be
directed to ech@uchicago.edu.

```
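
The midpoint and interpolation methods described above can be illustrated with a short Python sketch applied to the Blalock (1979) income table from the examples. This is not gsum's source code: the function name `grouped_summary`, the `TABLE` constant, and the returned keys are hypothetical, and the sketch assumes the usual textbook grouped-data formulas (category midpoints for the mean and standard deviation, and `lower + (q*N - F) / f * width` for an interpolated quantile, where `F` is the cumulative frequency below the category and `f` is the frequency within it). The n - 1 variance denominator is also an assumption; gsum may differ in such details.

```python
import math

# Frequency table from the Blalock (1979) example: (lower, upper, frequency).
TABLE = [
    (1950, 2950, 17),
    (2950, 3950, 26),
    (3950, 4950, 38),
    (4950, 5950, 51),
    (5950, 6950, 36),
    (6950, 7950, 21),
]


def grouped_summary(table, q=0.5):
    """Return mean, sd, and the q quantile (midpoint and interpolated).

    Hypothetical helper for illustration only; not part of gsum.
    """
    n = sum(f for _, _, f in table)
    mids = [(lo + hi) / 2 for lo, hi, _ in table]

    # Mean and variance treat every case as sitting at its category midpoint.
    mean = sum(m * f for m, (_, _, f) in zip(mids, table)) / n
    # Assumption: sample variance with n - 1 in the denominator.
    var = sum(f * (m - mean) ** 2 for m, (_, _, f) in zip(mids, table)) / (n - 1)

    target = q * n   # cumulative count at which the quantile is reached
    below = 0        # cumulative frequency below the current category
    for (lo, hi, f), mid in zip(table, mids):
        if f > 0 and below + f >= target:
            # Linear interpolation inside the category containing the quantile.
            interp = lo + (target - below) / f * (hi - lo)
            return {"N": n, "mean": mean, "sd": math.sqrt(var),
                    "q_mid": mid, "q_interp": interp}
        below += f
    raise ValueError("q must be between 0 and 1")


result = grouped_summary(TABLE, q=0.5)
print(round(result["mean"], 2))      # 5116.67
print(result["q_mid"])               # 5450.0
print(round(result["q_interp"], 2))  # 5214.71
```

For the median, 94.5 of the 189 cases must be accumulated; 81 fall below the 4950-5950 category, so the interpolated value is 4950 + (13.5 / 51) * 1000, while the midpoint method simply reports 5450.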