Create a group variable and optionally an output dataset for a gsort key
gsgroup gsort_list , generate(varname) [ saving(filename [, replace]) nomissing mfirst ]
where gsort_list is a list of one or more elements of the form
[+|-]varname
as used by the gsort command.
Description
gsgroup inputs a gsort key (a list of elements of the form used by gsort) and generates as output a new variable, indicating the sequential order of the group to which each observation belongs. Optionally, it also creates an output dataset (or resultsset), with 1 observation per group, and data on the values of the generated variable and of the variables of the gsort key corresponding to that group. Unlike gsort, gsgroup does not change the sort order of the dataset in memory. The output dataset can be merged into other datasets, using the official Stata command merge or the SSC package addinby. gsgroup is typically used together with the SSC packages parmest and xcontract.
Options
generate(varname) must be present. It specifies the name of an output variable to be generated. This output variable will contain, in each observation, the group to which that observation belongs, based on the values of the variables of the gsort key in that observation. It will have 1 integer value per combination of values for the input variables in the gsort key, in ascending order of these values, starting with 1.
saving(filename [, replace]) specifies a file containing an output dataset (or resultsset), with 1 observation per group specified by the gsort key. This dataset contains the corresponding value of the new variable specified by the generate() option, and the values of the input variables in the gsort key corresponding to that group. The output dataset is sorted primarily by the grouping variable, and secondarily by the variables in the gsort key in order of appearance. If replace is specified, then any existing file named filename will be replaced.
nomissing specifies that observations with missing values in the variables of the input gsort key will have missing values in the generated group variable, and that these observations will not be included in the saving() dataset (if specified). By default, groups with missing values will be included, both in the group variable and in the output dataset.
mfirst functions as the option of the same name for gsort.
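As a minimal sketch of the nomissing option described above, the following compares the group variable with and without the option. It assumes that gsgroup is installed from SSC. The variable names rgroup1 and rgroup2 are hypothetical and chosen for illustration only.

```stata
* Sketch only: assumes gsgroup is installed (for example, via ssc install gsgroup)
sysuse auto, clear
* Default: observations with missing rep78 form a group of their own
gsgroup rep78, generate(rgroup1)
* nomissing: observations with missing rep78 get a missing group value
gsgroup rep78, generate(rgroup2) nomissing
* Compare the two group variables, showing missing values explicitly
tabulate rgroup1 rgroup2, missing
```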
Examples
.sysuse auto, clear
.gsgroup -foreign rep78, gene(frgroup)

.sysuse auto, clear
.gsgroup rep78 foreign, g(rfgroup) saving(rfgroup1.dta, replace)
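The Description notes that the output dataset can be merged into other datasets using the official Stata command merge. A possible sketch of this, assuming gsgroup is installed from SSC, is to collapse the data to 1 observation per group and then merge the key variables back in from the saved dataset. The use of collapse here is an illustrative choice, not part of gsgroup.

```stata
* Sketch only: assumes gsgroup is installed from SSC
sysuse auto, clear
gsgroup rep78 foreign, generate(rfgroup) saving(rfgroup1.dta, replace)
* Reduce to 1 observation per group, keeping only rfgroup and the mean of mpg
collapse (mean) mpg, by(rfgroup)
* Add the gsort key variables (rep78 and foreign) back from the saved dataset
merge 1:1 rfgroup using rfgroup1.dta
list
```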
The following advanced example requires Stata Version 11 or higher. It uses gsgroup with the packages parmest, xcontract, keyby and addinby, downloadable from SSC. It defines a new binary variable odd, indicating that a car is odd-numbered in the sequence of the data. It then uses gsgroup to create a new variable fogroup, grouping the data by the binary variables foreign and odd, and an output dataset with 1 observation per group. It then fits a regression model predicting fuel efficiency in miles per gallon (mpg) from car weight in US pounds (lb) to the data in each group defined by fogroup, and uses margins to estimate the expected mileage per gallon, in that group, of a car weighing 3000 lb. The estimated mileage for each group, with its confidence limits, is saved in an output dataset (or resultsset) in a temporary file, using parmest, with the idnum() and rename() options to create a numeric identifier variable, also with the name fogroup. We then concatenate the parmest resultssets into memory using append, and then use addinby to add in the values of the key variables from the gsgroup resultsset. Finally, we use keyby to key the resultsset by the groups and by their defining variables, and describe and list the resultsset. Note that this example uses the variable characteristic fogroup[varlist], documented in Saved results, and containing a list of the variables defining the groups.
Set-up:
.sysuse auto, clear
.gene byte odd=mod(_n,2)
.lab var odd "Odd-numbered car"
Calculations:
.gsgroup -foreign odd, gene(fogroup) saving(fogroup1, replace)
.global tflist ""
.levelsof fogroup, lo(fogroups)
.foreach GP of num `fogroups' {
.xcontract fogroup `fogroup[varlist]' if fogroup==`GP', list(,)
.regress mpg weight if fogroup==`GP'
.margins, at(weight=3000)
.tempfile tfcur
.parmest, bmat(r(b)) vmat(r(V)) idnum(`GP') rename(idnum fogroup) saving(`"`tfcur'"', replace) flis(tflist)
.}
.clear
.append using $tflist
.addinby fogroup using fogroup1
.keyby fogroup `fogroup[varlist]'
.describe
.list
Saved results
gsgroup assigns to the generate() variable a variable characteristic varlist, containing a varlist of the variables appearing in the input gsort key, in order of appearance in the key.
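As a brief sketch of how this characteristic might be inspected, assuming gsgroup is installed from SSC, the official Stata command char list and the macro expansion `varname[charname]' can be used as follows.

```stata
* Sketch only: assumes gsgroup is installed from SSC
sysuse auto, clear
gsgroup -foreign rep78, generate(frgroup)
* List all characteristics attached to the generated variable
char list frgroup[]
* Expand the varlist characteristic for use in later commands
display "`frgroup[varlist]'"
describe `frgroup[varlist]'
```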
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
Also see
Manual: [D] gsort, [D] save, [D] sort, [D] merge
On-line: help for gsort, save, sort, merge
         help for addinby, keyby, parmest, xcontract if installed