-------------------------------------------------------------------------------
help for xcontract                                               (Roger Newson)
-------------------------------------------------------------------------------

Create dataset of variable combinations with frequencies and percents

xcontract varlist [weight] [if exp] [in range] [,
    list( [varlist] [if exp] [in range] [ , [list_options] ] )
    saving(filename[,replace]) norestore fast flist(global_macro_name)
    freq(newvarname) percent(newvarname) cfreq(newvarname) cpercent(newvarname)
    ptype(storage_type) by(by_varlist) idnum(#) nidnum(newvarname)
    idstr(string) nidstr(newvarname)
    format(varlist_1 format_1 ... varlist_n format_n) zero nomiss ]

fweights are allowed; see help for weights.

Description

xcontract is an extended version of contract. It creates an output data set with one observation per combination of values of the variables in varlist, containing the frequencies and percents of those combinations in the existing data set and, optionally, their cumulative frequencies and cumulative percents. If the by() option is used, then the output data set has one observation per combination of values of the varlist variables per by-group, and percents are calculated within each by-group. The output data set created by xcontract may be listed to the Stata log, saved to a disk file, or written to the memory (overwriting any pre-existing data set).

Options for use with xcontract

xcontract has a large number of options, which are listed in 3 groups:

1. Output-destination options. (These specify where the output data set will be written.)

2. Output-variable options. (These specify the variables in the output data set.)

3. Other options. (These specify the observations in the output data set.)

Output-destination options

list([varlist] [if exp] [in range] [, list_options ] ) specifies a list of variables in the output data set, which will be listed to the Stata log by xcontract. If the varlist is omitted, as in list(,), then all variables in the output data set are listed. The list() option can be used with the format() option (see below) to produce a list of frequencies and/or percents with user-specified numbers of decimal places or significant figures. The user may optionally also specify if or in qualifiers to list subsets of combinations of variable values, or change the display style using any list_options accepted by the list command. If by(by_varlist) is used, then the combinations are listed by the by-groups defined by by_varlist.

saving(filename[,replace]) saves the output data set to a disk file. If replace is specified, and a file of that name already exists, then the old file is overwritten.

norestore specifies that the output data set will be written to the memory, overwriting any pre-existing data set. This option is automatically set if fast is specified. If neither norestore nor fast is specified, then the pre-existing data set is restored in the memory after the execution of xcontract.

fast is a stronger version of norestore, intended for use by programmers. It specifies that the pre-existing data set in the memory will not be restored, even if the user presses Break during the execution of xcontract. If norestore is specified and fast is absent, then xcontract will go to extra work so that it can restore the original data if the user presses Break.

Note that the user must specify at least one of the four options list(), saving(), norestore and fast. These four options specify whether the output data set is listed to the Stata log, saved to a disk file, or written to the memory (overwriting any pre-existing data set). More than one of these options can be specified.
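
As a brief illustration of combining these options (a sketch, assuming that the auto data are in the memory, for example after typing sysuse auto; the filename mycombo.dta is an arbitrary choice), the following command lists the output data set, saves it to a disk file, and leaves it in the memory:

. xcontract foreign rep78, list(,) saving(mycombo.dta,replace) norestore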

flist(global_macro_name) specifies the name of a global macro, containing a filename list (possibly empty). If saving() is also specified, then xcontract will append the name of the data set specified in the saving() option to the value of the global macro specified in flist(). This enables the user to build, in a global macro, a list of the filenames of a sequence of output data sets. These files may later be concatenated using append, or using dsconcat (downloadable from SSC) if installed.
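
A possible use of flist() is sketched below, assuming that the auto data are in the memory. The macro name myfiles and the filenames are arbitrary choices. The exact form in which the filenames are stored in the macro is not described here, so it may be worth displaying the macro before passing it to append. Note also that append accepts more than one filename only in more recent versions of Stata.

. global myfiles ""
. xcontract foreign, saving(myflist1.dta,replace) flist(myfiles)
. xcontract rep78, saving(myflist2.dta,replace) flist(myfiles)
. display `"$myfiles"'
. clear
. append using $myfiles

Because neither norestore nor fast is specified, the auto data are restored after each xcontract command, so that the second command works on the original data. The final two commands then overwrite the data in the memory with the concatenated output data sets.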

Output-variable options

freq(newvarname) specifies a name for the frequency variable. If not specified, _freq is used.

percent(newvarname) specifies a name for the percent variable. If not specified, _percent is used. If the by() option is used, then the percent for each combination of values of the varlist variables in each by-group is calculated as a percent of the by-group.

cfreq(newvarname) specifies a name for the cumulative frequency variable. If not specified, no cumulative frequency variable is created. If the by() option is used, then the cumulative frequency for each combination of values of the varlist variables in each by-group is calculated as a cumulative frequency within the by-group.

cpercent(newvarname) specifies a name for the cumulative percent variable. If not specified, no cumulative percent variable is created. If the by() option is used, then the cumulative percent for each combination of values of the varlist variables in each by-group is calculated as a cumulative percent of the by-group.
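
The by-group calculations described above can be checked by hand. The following sketch, which assumes that the auto data are in the memory and is not the code used by xcontract, rebuilds the frequencies using contract and then computes percents and cumulative quantities within each level of foreign. The variable names grptotal, pct, cumfreq and cumpct are illustrative only.

. preserve
. contract foreign rep78
. bysort foreign: egen double grptotal = total(_freq)
. generate double pct = 100*_freq/grptotal
. bysort foreign (rep78): generate double cumfreq = sum(_freq)
. generate double cumpct = 100*cumfreq/grptotal
. list foreign rep78 _freq pct cumfreq cumpct, clean noobs
. restore

These values can be compared with the output of xcontract rep78, by(foreign) cfreq(cumfreq) cpercent(cumperc) list(,). The comparison assumes that the cumulative quantities are accumulated in ascending order of the varlist variables within each by-group.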

ptype(storage_type) specifies a storage type for generating the percent variables specified by percent() and cpercent(). If ptype() is not specified, then these variables will be generated as variables of type float. All generated variables are then compressed to the smallest storage type possible without loss of precision. See help for compress.

by(by_varlist) specifies a list of by-variables. If by() is specified, then all percents will be calculated as percents of their by-groups. Note that, if the if expression or the weight expression contains the reserved names _n and _N, then these will be interpreted as the observation sequence number and the number of observations, respectively, within the whole data set, not within the by-group.
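
To illustrate the note about _n (a sketch, assuming that the auto data are in the memory), the if qualifier in the following command selects the first 40 observations of the whole data set, not the first 40 observations of each by-group. Frequencies and percents are then computed for those observations within each level of foreign.

. xcontract rep78 if _n<=40, by(foreign) list(,)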

idnum(#) specifies an ID number for the output data set. It is used to create a numeric variable, with default name idnum, in the output data set, with that value for all observations. This is useful if the output data set is concatenated with other xcontract output data sets using append, or using dsconcat if installed.

nidnum(newvarname) specifies a name for the numeric ID variable evaluated by idnum(). If idnum() is present and nidnum() is absent, then the name of the numeric ID variable is set to idnum.

idstr(string) specifies an ID string for the output data set. It is used to create a string variable, with default name idstr, in the output data set, with that value for all observations. This is useful if the output data set is concatenated with other xcontract output data sets using append, or using dsconcat if installed.

nidstr(newvarname) specifies a name for the string ID variable evaluated by idstr(). If idstr() is present and nidstr() is absent, then the name of the string ID variable is set to idstr.
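
The ID options might be used as in the following sketch, which assumes that the auto data are in the memory. The filenames and ID values are arbitrary choices. Each output data set is labelled before the two are concatenated, so that the source of each observation can still be identified afterwards. Note that the use command overwrites the data in the memory.

. xcontract rep78 if foreign==0, idnum(1) idstr(Domestic) saving(myid1.dta,replace)
. xcontract rep78 if foreign==1, idnum(2) idstr(Foreign) saving(myid2.dta,replace)
. use myid1.dta, clear
. append using myid2.dta
. list idnum idstr rep78 _freq _percent, clean noobs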

format(varlist_1 format_1 ... varlist_n format_n) specifies a list of pairs of variable lists and display formats. The formats will be allocated to the variables in the output data set specified by the corresponding varlist_i lists. If the format() option is absent, then the percent variables have the format %8.2f, the frequency variables have the format %12.0g, and the other variables have the same formats as the variables of the same names in the input data set.
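
As a sketch of the paired syntax, assuming that the auto data are in the memory and that the default output variable names _freq and _percent are in use, the following command allocates one format to the percent variable and another to the frequency variable before listing. The formats are arbitrary choices.

. xcontract foreign rep78, format(_percent %6.1f _freq %8.0f) list(,clean noobs)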

Other options

zero specifies that combinations of values of the variables in varlist with zero frequency in the input data set will be included in the output data set.

nomiss specifies that observations with missing values for any of the variables in varlist will be excluded from the output data set. If not specified, all observations are included, except if excluded by the if and in qualifiers or given zero weights.

Examples

The following examples use the list() option to list the output data set to the Stata log. After these examples are executed, there is no new data set either in the memory or on disk.

. xcontract foreign rep78, list(,)

. xcontract foreign rep78, zero list(,clean noobs)

. xcontract foreign rep78, f(count) p(percent) cf(ccount) cp(cpercent) zero nomiss list(*,clean noobs)

. xcontract rep78, by(foreign) fr(frequency) per(percentage) cf(cumfreq) cp(cumperc) pty(double) format(percentage cumperc %4.0f) list(rep78-cumperc,clean noobs abbrev(16))

. xcontract _all, list(*,clean noobs)

The following examples use the norestore option to create an output data set in the memory, overwriting any pre-existing data set.

. xcontract foreign rep78, norestore

. xcontract foreign rep78, zero norestore

. xcontract foreign rep78, f(count) p(percent) cf(ccount) cp(cpercent) zero nomiss norestore

. xcontract rep78, by(foreign) fr(frequency) per(percentage) cf(cumfreq) cp(cumperc) pty(double) format(percentage cumperc %4.0f) norestore

. xcontract _all, norestore

The following examples use the saving() option to create an output data set in a disk file.

. xcontract foreign rep78, saving(myfreq1.dta)

. xcontract foreign rep78, zero saving(myfreq2.dta,replace)

. xcontract foreign rep78, f(count) p(percent) cf(ccount) cp(cpercent) zero nomiss saving(myfreq3.dta,replace)

. xcontract rep78, by(foreign) fr(frequency) per(percentage) cf(cumfreq) cp(cumperc) pty(double) format(percentage cumperc %4.0f) saving(myfreq4.dta,replace)

. xcontract _all, saving(myfreq5.dta,replace)

Acknowledgements

I would like to thank Nicholas J. Cox of Durham University, UK for some very helpful advice about writing efficient code, and also for writing the original version of contract, from which I re-engineered some of the code for xcontract. I would also like to thank StataCorp for writing fillin, from which I also re-engineered some of the code for xcontract.

Author

Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk

Also see

Manual: [R] contract, [R] collapse, [R] fillin, [R] compress, [R] format, [R] expand, [R] duplicates

Online: help for contract, collapse, fillin, compress, format, expand, duplicates
        help for xcollapse, dsconcat if installed