-------------------------------------------------------------------------------
help for groups
-------------------------------------------------------------------------------

List group frequencies

    groups varlist [weight] [if exp] [in range]
        [, fillin format(format) ge lt missing order({high|low})
        select({condition|#}) show(frequencies_to_show)
        showhead(header_text_for_frequencies) reverse list_options
        saving(filename[, save_options]) ]

by ... : may be used with groups; see help by. Note in particular that this is the key to controlling how percents are calculated; that is, under by percents sum to 100 within distinct categories defined by its varlist.
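As a hedged illustration of that point, the two calls below list the same groups but calculate percents over different totals. The sketch assumes that groups is already installed and uses the auto data shipped with Stata.

```stata
sysuse auto, clear

* percents sum to 100 over all foreign-by-rep78 groups taken together
groups foreign rep78

* percents sum to 100 separately within each category of foreign
bysort foreign: groups rep78
```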

fweights and aweights are allowed; see help weights.

Description

groups lists the distinct groups of varlist occurring in the dataset and their frequencies. groups is perhaps most useful with categorical variables, but has other uses. Groups are by default presented in the sort order of varlist. Note that there is no limit on the number of variables in varlist.

Frequencies are counts or other measures of abundance. Percents are percents of each total frequency. Cumulative frequencies and percents are cumulated in the order of groups and show frequency (percent) in each group and all earlier groups in the listing, unless the lt option is specified. Reverse cumulative frequencies and percents show frequency (percent) in all later groups in the listing, unless the ge option is specified. "Valid" percents are calculated relative to all pertinent non-missing values.
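The definitions above can be sketched by hand with official commands. This is an illustration of the arithmetic as described in this help, not the code that groups itself runs; the variable names alltotal, pct, cumfreq, cumpct, revfreq, cumfreq_lt and revfreq_ge are invented for the example.

```stata
sysuse auto, clear

* one observation per distinct non-missing value of rep78; count in _freq
contract rep78, nomiss

egen double alltotal = total(_freq)
generate double pct = 100 * _freq / alltotal

* cumulative: this group and all earlier groups
generate double cumfreq = sum(_freq)
generate double cumpct = 100 * cumfreq / alltotal

* reverse cumulative: all later groups only
generate double revfreq = alltotal - cumfreq

* as with lt: earlier groups only
generate double cumfreq_lt = cumfreq - _freq

* as with ge: this group and all later groups
generate double revfreq_ge = revfreq + _freq

list, noobs sep(0)
```

Because contract with nomiss drops missing values, these percents correspond to the default listing, in which "valid" percents and other percents coincide.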

Options

fillin specifies that groups (i.e. cross-combinations) of varlist which do not occur in the data are shown explicitly as having zero frequency. This has no effect with a single variable. Note that this option can bite hard as the number of cross-combinations can explode combinatorially.
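To gauge how many rows fillin would add before asking for them, one rough check uses the official contract command, whose zero option also creates the cross-combinations that do not occur. This is a sketch with the auto data, not part of groups.

```stata
sysuse auto, clear

* groups that actually occur
preserve
contract foreign rep78, nomiss
count
restore

* all cross-combinations, including those with zero frequency
preserve
contract foreign rep78, nomiss zero
count
list if _freq == 0, noobs
restore
```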

format() specifies a numeric format for percents and cumulative percents. The default is %6.2f.

ge specifies that reverse frequencies and percents are to be calculated for the current and all later groups, that is, they are for values greater than or equal to each value.

lt specifies that cumulative frequencies and percents are to be calculated for only the previous groups, that is, they are for values less than each value.

missing specifies that observations with missing values on any of the variables in varlist are to be included in the listing. By default they are omitted. Note that "valid" percents will be the same as other percents unless the missing option is specified.

order() specifies that groups should be listed in order of their frequencies. Ordering may be high (highest frequencies first) or low (lowest frequencies first). high and low may be abbreviated, down to as little as h or l respectively.

select() specifies that only selected groups are to be listed. There are two syntaxes.

In the first syntax, selection is according to a condition imposed on the frequencies, or on the percents, or on the cumulative frequencies, or on the cumulative percents, or on the reverse cumulatives. The syntax is exemplified by

    select(freq == 1)
    select(percent > 5)
    select(Percent < 50)

The element freq, percent, Freq, Percent, RFreq, RPercent, vpercent, Vpercent or rvpercent may be abbreviated down to unambiguous abbreviations. Note that case matters in distinguishing freq and Freq, percent and Percent, and vpercent and Vpercent. What follows must complete a simple true-or-false condition in Stata syntax, typically an inequality or equality.

In the second syntax, a positive or negative integer is specified. A positive integer specifies that only the first # groups are to be shown. A negative integer specifies that only the last |#| groups are to be shown. First and last are determined with respect to the listing which would otherwise have been given. Thus with order(h), select(5) shows the 5 groups with the 5 highest frequencies, while select(-5) shows the 5 groups with the 5 lowest frequencies, ties being broken according to the sort order of varlist. Similarly, with order(l) the opposite is true. Without order(), select(5) shows the first 5 groups of varlist and select(-5) shows the last 5 groups of varlist. The most obviously useful example is when varlist consists of a single variable, so that the listing is of the 5 lowest (highest) groups of values of that variable.

show() specifies which frequencies should be shown. By default, frequencies, percents and cumulative percents are shown with one variable, and frequencies and percents are shown with two or more variables, in that order. show() may be used to specify one or two or three of those, and/or cumulative frequencies, and/or reverse cumulative frequencies or reverse cumulative percents, and/or equivalent percents for "valid" values, or to change the order of presentation. The elements freq, percent, Freq, Percent, RFreq, RPercent, vpercent, Vpercent and rvpercent may be abbreviated, down to unambiguous abbreviations. Note that case matters in distinguishing freq and Freq, percent and Percent, or vpercent and Vpercent. Exceptionally, none may be used to specify that none of these should be shown. For example, with select(f == 1) the frequencies would all be 1, and thus unnecessary, while the percents and cumulative percents may not be of interest, so show(none) may be desired.

showhead() specifies alternative text for the header explaining frequency variables. There should be as many elements as the number of frequency, percent, cumulative frequency, cumulative percent, reverse cumulative frequency, reverse cumulative percent and valid percent variables listed and they should occur in the same order as those variables are listed. Text containing spaces should be bound in " ". Thus with show(f RF), showhead(# "# bigger") specifies that frequencies are indicated by "#" and the reverse cumulative frequencies are indicated by "# bigger".

reverse reverses what would otherwise be displayed, last values first.

list_options are options of list. These offer several ways of changing the appearance of the listing. Note that sum by itself produces sums only of frequencies and percents, where shown.

saving() specifies that the results listed will be saved to a named Stata .dta file using save. That does not include any sums, means or similar summaries. Options of save may be specified in the usual way. This option may not be combined with by:.
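A possible workflow with saving() is sketched below, assuming that groups is installed; mytable is just an example filename. The names of the variables in the saved file are not documented here, so the sketch inspects them with describe rather than assuming them.

```stata
sysuse auto, clear

* list the groups and save the listed results to mytable.dta
groups foreign rep78, saving(mytable, replace)

* look at the saved table without losing the data in memory
preserve
use mytable, clear
describe
list, noobs
restore
```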

Examples

. groups foreign
. groups foreign rep78
. groups foreign rep78, fillin
. groups foreign rep78, fillin saving(mytable, replace)

. bysort foreign: groups rep78, ord(h) N

. groups mpg, sel(f == 1) show(none)
. groups mpg, sel(5)
. groups mpg, sel(-5)
. groups mpg, sel(5) ord(h)

. groups foreign rep78, fill sel(f == 0) show(none)

. groups foreign rep78, sepby(foreign)
. groups foreign rep78, sepby(foreign) showhead(# %)

. groups rep78, missing show(freq percent vpercent) sep(0)

. groups rep78, show(freq rfreq Rpercent) ge
. groups rep78, show(F f Rf) lt showhead(< = >)

. groups mpg, reverse
. groups mpg, reverse show(f p RP) ge

Author

Nicholas J. Cox, Durham University, U.K.
n.j.cox@durham.ac.uk

Acknowledgments

Fred Wolfe made very helpful comments. He, Roger Harbord and Eric Zbinden all found a bug. A question from Stefan Gawrich led to the ge option. A question from James Keeler led to the reverse option. A question from William Parry led to the saving() option.

Also see

Online: help for tabulate, table, list; duplicates, contract, modes (if installed), fre (Ben Jann; if installed)