Make dataset of means, medians, and other summary statistics.
xcollapse clist [weight] [if exp] [in range] [, list( [varlist] [if exp] [in range] [ , [list_options] ] ) saving(filename[,replace]) norestore fast flist(global_macro_name) cw by(varlist) idnum(#) nidnum(newvarname) idstr(string) nidstr(newvarname) format(varlist_1 format_1 ... varlist_n format_n) float ]
where clist is a list of statistics and variables, defined as in the online help for collapse.
aweights, fweights, pweights, and iweights are allowed. See help for collapse for details of how these are handled.
Description
xcollapse is an extended version of collapse. It creates an output dataset of means, sums, medians, and other summary statistics. This output dataset may be listed to the Stata log, or saved to a disk file, or written to the memory (overwriting any pre-existing dataset).
Options for use with xcollapse
The options are listed in the following two groups:
1. Output-destination options. (These specify where the output dataset will be written.)
2. Other options. (These specify what the output dataset will contain.)
Output-destination options
list(varlist [if exp] [in range] [, list_options ] ) specifies a list of variables in the output dataset, which will be listed to the Stata log by xcollapse. The list() option can be used with the format() option (see below) to produce a list of summary statistics with user-specified numbers of decimal places or significant figures. The user may optionally also specify if or in qualifiers to list subsets of combinations of variable values, or change the display style using a list of list_options allowed as options by the list command.
saving(filename[,replace]) saves the output dataset to a disk file. If replace is specified, and a file of that name already exists, then the old file is overwritten.
norestore specifies that the output dataset will be written to the memory, overwriting any pre-existing dataset. This option is automatically set if fast is specified. Otherwise, if norestore is not specified, then the pre-existing dataset is restored in the memory after the execution of xcollapse.
fast is a stronger version of norestore, intended for use by programmers. It specifies that the pre-existing dataset in the memory will not be restored, even if the user presses Break during the execution of xcollapse. If norestore is specified and fast is absent, then xcollapse will go to extra work so that it can restore the original data if the user presses Break.
Note that the user must specify at least one of the four options list(), saving(), norestore and fast. These four options specify whether the output dataset is listed to the Stata log, saved to a disk file, or written to the memory (overwriting any pre-existing dataset). More than one of these options can be specified.
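As a rough illustration of how the output-destination options combine, the sketch below lists and saves the output dataset in one call, then repeats the collapse with norestore. It assumes that xcollapse has been installed (for instance with ssc install xcollapse) and uses the auto dataset shipped with Stata; the filename mysumm_demo is a hypothetical name chosen for this example.

```stata
* Sketch only: assumes xcollapse is installed and the auto data are available.
sysuse auto, clear

* List and save in one call. Neither norestore nor fast is specified,
* so the auto data should still be in memory afterwards.
xcollapse (mean) mpg price, by(foreign) list(, clean) saving(mysumm_demo, replace)
describe, short

* With norestore, the collapsed dataset replaces the data in memory.
xcollapse (mean) mpg price, by(foreign) norestore
describe, short
```

Comparing the two describe results shows whether the original data or the collapsed data are in memory after each call.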
flist(global_macro_name) specifies the name of a global macro, containing a filename list (possibly empty). If saving() is also specified, then xcollapse will append the name of the dataset specified in the saving() option to the value of the global macro specified in flist(). This enables the user to build, in a global macro, a list of the filenames of a sequence of output datasets. These files may later be concatenated using append, or using dsconcat (downloadable from SSC) if installed.
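The following sketch shows one way the flist() option might be used to accumulate filenames and then concatenate the files. The macro name myfiles and the filenames summ_mean and summ_med are hypothetical, and the final step assumes that the filenames are stored in the macro in a form that append accepts directly.

```stata
* Sketch only: assumes xcollapse is installed and the auto data are available.
sysuse auto, clear

* Start with an empty filename list in a (hypothetical) global macro.
global myfiles ""

* Each call saves an output dataset and adds its name to the list.
xcollapse (mean) mpg price, by(foreign) saving(summ_mean, replace) flist(myfiles)
xcollapse (median) mpg price, by(foreign) saving(summ_med, replace) flist(myfiles)

* Inspect the accumulated list.
display `"$myfiles"'

* Concatenate the saved files using the official append command.
clear
append using $myfiles
list, sepby(foreign)
```

In the concatenated dataset the mean and median rows are not labelled; the idnum() and idstr() options described under "Other options" can be used to tell them apart.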
Other options
cw specifies casewise deletion. If not specified, all observations possible are used for each calculated statistic.
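A small sketch of the effect of cw, assuming xcollapse is installed and using the auto dataset, in which rep78 is missing for some cars while mpg is complete. The output variable names nmpg and nrep78 are chosen for this example.

```stata
* Sketch only: assumes xcollapse is installed and the auto data are available.
sysuse auto, clear

* Without cw, each count uses every observation nonmissing for that variable,
* so the two counts can differ.
xcollapse (count) nmpg=mpg nrep78=rep78, list(,)

* With cw, observations missing on any variable in the clist are dropped first,
* so the two counts agree.
xcollapse (count) nmpg=mpg nrep78=rep78, cw list(,)
```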
by(varlist) specifies the groups over which the summary statistics are to be calculated. If not specified, the resulting dataset will contain one observation. If specified, varlist may refer to either string or numeric variables. Note that, if the if expression or the weight expression contains the reserved names _n and _N, then these will be interpreted as the observation sequence number and the number of observations, respectively, within the whole dataset, not within the by-group.
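The note about _n can be illustrated with the sketch below, which assumes xcollapse is installed and relies on the auto dataset being sorted by foreign, with the domestic cars first. Because _n counts observations in the whole dataset, the condition selects the first 10 cars overall rather than the first 10 in each by-group.

```stata
* Sketch only: assumes xcollapse is installed and the auto data are available.
sysuse auto, clear

* _n refers to the observation number in the whole dataset, so only the
* first 10 observations (all domestic cars in auto) enter the calculation.
xcollapse (mean) mpg if _n <= 10, by(foreign) list(,)
```

If _n were interpreted within by-groups, both values of foreign would appear in the listing; under the behavior described above, only the domestic group should appear.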
idnum(#) specifies an ID number for the output dataset. It is used to create a numeric variable, with default name idnum, in the output dataset, with that value for all observations. This is useful if the output dataset is concatenated with other xcollapse output datasets using append, or using dsconcat if installed.
nidnum(newvarname) specifies a name for the numeric ID variable evaluated by idnum(). If idnum() is present and nidnum() is absent, then the name of the numeric ID variable is set to idnum.
idstr(string) specifies an ID string for the output dataset. It is used to create a string variable, with default name idstr, in the output dataset, with that value for all observations. This is useful if the output dataset is concatenated with other xcollapse output datasets using append, or using dsconcat if installed.
nidstr(newvarname) specifies a name for the string ID variable evaluated by idstr(). If idstr() is present and nidstr() is absent, then the name of the string ID variable is set to idstr.
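The four ID options might be combined as in the sketch below, which labels two output datasets before concatenating them. It assumes xcollapse is installed; the variable names tabnum and stat and the filenames id_mean and id_med are hypothetical choices for this example.

```stata
* Sketch only: assumes xcollapse is installed and the auto data are available.
sysuse auto, clear

* Give each output dataset a numeric ID and a string ID, using the same
* ID variable names in both so that the files line up when appended.
xcollapse (mean) mpg price, by(foreign) idnum(1) nidnum(tabnum) idstr(mean) nidstr(stat) saving(id_mean, replace)
xcollapse (median) mpg price, by(foreign) idnum(2) nidnum(tabnum) idstr(median) nidstr(stat) saving(id_med, replace)

* Concatenate the two files; tabnum and stat identify the source of each row.
use id_mean, clear
append using id_med
list tabnum stat foreign mpg price, sepby(tabnum)
```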
format(varlist_1 format_1 ... varlist_n format_n) specifies a list of pairs of variable lists and display formats. The formats will be allocated to the variables in the output dataset specified by the corresponding varlist_i lists. If the format() option is absent, then the percent variables have the format %8.2f, the frequency variables have the format %12.0g, and the other variables have the same formats as the variables of the same names in the input dataset.
float specifies that numeric output variables in the output dataset, specified by the clist, will not have storage type double, but will be recast to storage type float, even if this causes loss of precision. Whether or not float is specified, numeric output variables in the output dataset, specified by the clist, will be compressed to the lowest storage type possible without loss of precision.
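One way to see the effect of float is to inspect the storage types of the output variables, as in this sketch. It assumes xcollapse is installed and uses the auto dataset; the exact types reported may depend on the data, since the variables are also compressed.

```stata
* Sketch only: assumes xcollapse is installed and the auto data are available.
sysuse auto, clear

* Request that the summary statistics be stored as float rather than double.
xcollapse (mean) mpg price, by(foreign) float norestore

* Show the storage types of the output variables.
describe mpg price
```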
Examples
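The examples in this section use variables from the auto dataset distributed with Stata, which can be loaded with sysuse auto. The following examples use the list() option to list the output dataset to the Stata log. After these examples are executed, the dataset in the memory is unchanged, and no new dataset has been created either in the memory or on disk.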
. xcollapse mpg weight price, list(,)
. xcollapse (median) mpg weight price, list(,)
. xcollapse mpg weight price, by(foreign rep78) list(,clean)
. xcollapse (mean) mpg weight price, by(foreign rep78) format(mpg weight price %8.2f) list(*, sepby(foreign))
. xcollapse (count) nmpg=mpg nweight=weight nprice=price (median) medmpg=mpg medweight=weight medprice=price, by(foreign rep78) format(med* %8.2f) list(foreign rep78 *mpg *weight *price, sepby(foreign) abbrev(16))
The following examples use the norestore option to create an output dataset in the memory, overwriting any pre-existing dataset.
. xcollapse mpg weight price, norestore
. xcollapse (median) mpg weight price, norestore
. xcollapse mpg weight price, by(foreign rep78) norestore
. xcollapse (mean) mpg weight price, by(foreign rep78) format(mpg weight price %8.2f) norestore
. xcollapse (count) nmpg=mpg nweight=weight nprice=price (median) medmpg=mpg medweight=weight medprice=price, by(foreign rep78) format(med* %8.2f) norestore
The following examples use the saving() option to create an output dataset in a disk file.
. xcollapse mpg weight price, saving(mysumm1)
. xcollapse (median) mpg weight price, saving(mysumm2,replace)
. xcollapse mpg weight price, by(foreign rep78) saving(mysumm3,replace)
. xcollapse (mean) mpg weight price, by(foreign rep78) format(mpg weight price %8.2f) saving(mysumm4,replace)
. xcollapse (count) nmpg=mpg nweight=weight nprice=price (median) medmpg=mpg medweight=weight medprice=price, by(foreign rep78) format(med* %8.2f) saving(mysumm5,replace)
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
Also see
Manual: [R] collapse, [R] contract
Online: help for collapse, contract, egen, statsby, summarize, tabdisp, table
        help for xcontract, dsconcat if installed