Group the data, creating a group variable and/or an output dataset
xgroup varlist [if] [in] , [ generate(varname) saving(filename [, replace]) missing label lname(name) truncate(#) ]
Description
xgroup is an extended version of the egen function group(varlist). It inputs a varlist of existing variables, and groups the data by the varlist. It generates as output a new variable, indicating the group to which each observation belongs, and/or an output dataset (or resultsset), with 1 observation per group, and data on the values of the varlist variables, and of the generated variable (if specified), corresponding to that group. The output dataset can be merged into other datasets, using the official Stata command merge or the SSC package addinby. xgroup is typically used together with the SSC packages parmest and factext, especially when a multiple-intercept model is fitted to the data, with 1 intercept for each combination of values for the varlist.
Options
generate(varname) specifies the name of an output variable to be generated. This output variable will contain, in each observation, the group to which that observation belongs, based on the values of the varlist in that observation. It will have 1 integer value per combination of values for the input variables in the varlist, in ascending order of these values, starting with 1.
saving(filename [, replace]) specifies a file containing an output dataset (or resultsset), with 1 observation per group specified by the varlist, and data on the corresponding value of the new variable specified by the generate() option (if present), and also on the values of the input variables in the varlist corresponding to that group. The output dataset is sorted primarily by the new grouping variable (if specified), and secondarily by the variables in the varlist. If replace is specified, then any existing file named filename will be replaced. The user must specify the generate() option, the saving() option, or both.
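As a rough sketch of how the saved dataset might be used with the official merge command, the commands below summarise the auto data by group and then merge the group definitions back in. This assumes that xgroup is installed from SSC; the names groups, frgroup, mean_mpg and n are illustrative choices, not part of xgroup.

```stata
* Assumes xgroup is installed (for example via: ssc install xgroup)
sysuse auto, clear
tempfile groups
* Create the group variable and save 1 observation per group
xgroup foreign rep78, generate(frgroup) saving(`"`groups'"', replace)
* Drop observations without a group (missing rep78), then summarise by group
drop if missing(frgroup)
collapse (mean) mean_mpg=mpg (count) n=mpg, by(frgroup)
* Recover the values of foreign and rep78 for each group
merge 1:1 frgroup using `"`groups'"', nogenerate
list frgroup foreign rep78 mean_mpg n
```

The merge is 1:1 because both the collapsed data and the saved dataset have 1 observation per value of frgroup.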
missing specifies that there will be 1 group per combination of missing or non-missing values of variables in the varlist. If missing is not specified, then there will be 1 group per combination of non-missing values of varlist, and the generate() variable (if specified) will be missing in any observation in which one or more of these variables has missing values.
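The effect of the missing option can be checked with a short sketch such as the following, which groups the auto data by foreign and rep78 (rep78 has missing values in some observations). It assumes that xgroup is installed from SSC; the variable names grp_nomiss and grp_miss are hypothetical.

```stata
* Assumes xgroup is installed (for example via: ssc install xgroup)
sysuse auto, clear
* Default: observations with missing rep78 receive a missing group
xgroup foreign rep78, generate(grp_nomiss)
* With missing: missing values of rep78 define groups of their own
xgroup foreign rep78, generate(grp_miss) missing
* Compare the numbers of ungrouped observations under each choice
count if missing(grp_nomiss)
count if missing(grp_miss)
tabulate grp_miss rep78, missing
```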
label, lname(name) and truncate(#) control the value labels of the variable specified by the generate() option, and function as the options of the same names in the egen function group(varlist). However, they are not likely to be used, as they duplicate the function of the saving() option.
Examples
. sysuse auto, clear
. xgroup foreign rep78, gene(frgroup) saving(frgroup1.dta, replace)

. sysuse auto, clear
. xgroup rep78 foreign, saving(rfgroup1.dta, replace)
The following example uses xgroup with the packages parmest, factext and addinby, downloadable from SSC. It defines a new binary variable odd, indicating that a car is odd-numbered in the sequence of the data. It then uses xgroup to create a new variable fogroup, grouping the data by the binary variables foreign and odd. It then fits a regression model predicting weight (lb) from length (inches), with a common length effect on weight, expressed in lb/inch, and 4 intercepts, 1 for each combination of the binary variables foreign and odd. Because length is centered at 180 inches (in the variable lm180), each intercept gives the expected weight of a car 180 inches long with that combination of foreign and odd. The parameters of this model are saved (using parmest) in an output dataset (or resultsset), with 1 observation per estimated model parameter, which is written to memory, overwriting the existing data. In this new dataset, the grouping variable fogroup is then reconstructed, using factext. The output dataset (or resultsset) generated by xgroup is then merged into the output dataset (or resultsset) generated by parmest, using addinby, to reconstruct the variables foreign and odd in the 4 observations corresponding to intercepts.
. sysuse auto, clear
. gene byte odd=mod(_n,2)
. lab var odd "Odd-numbered car"
. clonevar lm180=length
. replace lm180=lm180-180
. tempfile tf1
. xgroup foreign odd, gene(fogroup) saving(`"`tf1'"', replace)
. xi, noomit: regress weight lm180 i.fogroup, noconst
. parmest, label norestore
. factext fogroup
. addinby fogroup using `"`tf1'"', missing unmatched(keep)
. describe
. list label foreign odd estimate min* max* p
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
Also see
Manual: [D] egen, [D] merge, [R] xi
On-line: help for egen, merge, xi; help for addinby, parmest, factext if installed