{smcl}
{hline}
help for {cmd:xgroup}{right:(Roger Newson)}
{hline}

{title:Group the data, creating a group variable and/or output dataset}

{p 8 21 2}
{cmd:xgroup} {varlist} {ifin} ,
[ {opth g:enerate(varname)} {cmdab:sa:ving}{cmd:(}{it:filename} [{cmd:, replace}]{cmd:)} {break}
{opt m:issing} {opt label} {opth l:name(name)} {opt t:runcate(#)}
]


{title:Description}

{pstd}
{cmd:xgroup} is an extended version of the {helpb egen} function {helpb egen:group({it:varlist})}.
It takes as input a {varlist} of existing variables, and groups the data by the {varlist}.
It generates as output a new variable, indicating the group to which each observation belongs,
and/or an output dataset (or resultsset) with 1 observation per group,
containing the values of the {varlist} variables, and of the generated variable (if specified),
for that group.
The output dataset can be merged into other datasets,
using the official Stata command {helpb merge} or the {help ssc:SSC} package {helpb addinby}.
{cmd:xgroup} is typically used together with the {help ssc:SSC} packages {helpb parmest} and {helpb factext},
especially when a multiple-intercept model is fitted to the data,
with 1 intercept for each combination of values of the {varlist} variables.


{title:Options}

{phang}
{opth generate(varname)} specifies the name of a new output variable to be generated.
In each observation, this variable contains the group to which that observation belongs,
based on the values of the {varlist} variables in that observation.
It has 1 integer value per combination of values of the input variables in the {varlist},
numbered in ascending order of these values, starting with 1.
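
{pmore}
As a minimal check, the commands below compare the {cmd:generate()} variable with the output of {helpb egen:group({it:varlist})}.
They assume, as the description of {cmd:generate()} suggests, that both number the groups in ascending order of the {varlist} values starting with 1,
and that both leave the new variable missing wherever a {varlist} variable is missing when {cmd:missing} is not specified.
The variable name {cmd:frgroup2} is hypothetical, chosen only for illustration.

{phang2}{cmd:.sysuse auto, clear}{p_end}
{phang2}{cmd:.xgroup foreign rep78, gene(frgroup)}{p_end}
{phang2}{cmd:.egen frgroup2 = group(foreign rep78)}{p_end}
{phang2}{cmd:.assert frgroup == frgroup2}{p_end}

{pmore}
{cmd:assert} treats two missing values as equal,
so this check also covers the observations in which {cmd:rep78} is missing.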

{phang}
{cmd:saving}{cmd:(}{it:filename} [{cmd:, replace}]{cmd:)} specifies a file in which to save an output dataset (or resultsset),
with 1 observation per group specified by the {varlist},
and data on the corresponding value of the new variable specified by the {cmd:generate()} option (if present),
and on the values of the input variables in the {varlist} corresponding to that group.
The output dataset is sorted primarily by the new grouping variable (if specified),
and secondarily by the variables in the {varlist}.
If {cmd:replace} is specified, then any existing file named {it:filename} will be overwritten.
The user must specify the {cmd:generate()} option, the {cmd:saving()} option, or both.

{phang}
{opt missing} specifies that there will be 1 group per combination of missing or non-missing values of the variables in the {varlist}.
If {cmd:missing} is not specified, then there will be 1 group per combination of non-missing values of the {varlist},
and the {cmd:generate()} variable (if specified) will be missing in any observation in which one or more of these variables has a missing value.

{phang}
{opt label}, {opth lname(name)} and {opt truncate(#)} control the {help label:value labels} of the variable specified by the {cmd:generate()} option,
and work like the options of the same names in the {helpb egen} function {helpb egen:group({it:varlist})}.
However, they are unlikely to be needed, as they duplicate the function of the {cmd:saving()} option.


{title:Examples}

{phang2}{cmd:.sysuse auto, clear}{p_end}
{phang2}{cmd:.xgroup foreign rep78, gene(frgroup) saving(frgroup1.dta, replace)}{p_end}

{phang2}{cmd:.sysuse auto, clear}{p_end}
{phang2}{cmd:.xgroup rep78 foreign, saving(rfgroup1.dta, replace)}{p_end}

{pstd}
The following example uses {cmd:xgroup} with the packages {helpb parmest}, {helpb factext} and {helpb addinby}, downloadable from {help ssc:SSC}.
It defines a new binary variable {cmd:odd}, indicating whether a car is odd-numbered in the sequence of the data.
It then uses {cmd:xgroup} to create a new variable {cmd:fogroup}, grouping the data by the binary variables {cmd:foreign} and {cmd:odd}.
Next, it fits a regression model predicting weight (lb) from length (inches),
with a common length effect on weight, expressed in lb/inch,
and 4 intercepts, 1 for each combination of the binary variables {cmd:foreign} and {cmd:odd}.
Because length enters the model as the variable {cmd:lm180} (length minus 180 inches),
each intercept is the expected weight of a car 180 inches long with that combination of {cmd:foreign} and {cmd:odd}.
The parameters of this model are saved (using {helpb parmest}) in an output dataset (or resultsset),
with 1 observation per estimated model parameter,
which replaces the existing dataset in memory.
In this new dataset, the grouping variable {cmd:fogroup} is then reconstructed, using {helpb factext}.
Finally, the output dataset (or resultsset) generated by {cmd:xgroup} is merged into the output dataset (or resultsset) generated by {helpb parmest},
using {helpb addinby},
to reconstruct the variables {cmd:foreign} and {cmd:odd} in the 4 observations corresponding to the intercepts.

{phang2}{cmd:.sysuse auto, clear}{p_end}
{phang2}{cmd:.gene byte odd=mod(_n,2)}{p_end}
{phang2}{cmd:.lab var odd "Odd-numbered car"}{p_end}
{phang2}{cmd:.clonevar lm180=length}{p_end}
{phang2}{cmd:.replace lm180=lm180-180}{p_end}
{phang2}{cmd:.tempfile tf1}{p_end}
{phang2}{cmd:.xgroup foreign odd, gene(fogroup) saving(`"`tf1'"', replace)}{p_end}
{phang2}{cmd:.xi, noomit: regress weight lm180 i.fogroup, noconst}{p_end}
{phang2}{cmd:.parmest, label norestore}{p_end}
{phang2}{cmd:.factext fogroup}{p_end}
{phang2}{cmd:.addinby fogroup using `"`tf1'"', missing unmatched(keep)}{p_end}
{phang2}{cmd:.describe}{p_end}
{phang2}{cmd:.list label foreign odd estimate min* max* p}{p_end}


{title:Author}

{pstd}
Roger Newson, National Heart and Lung Institute, Imperial College London, UK.{break}
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}


{title:Also see}

{p 4 13 2}
{bind: }Manual: {hi:[D] egen}, {hi:[D] merge}, {hi:[R] xi}
{p_end}

{p 4 13 2}
On-line: help for {helpb egen}, {helpb merge}, {helpb xi} {break}
help for {helpb addinby}, {helpb parmest}, {helpb factext} if installed
{p_end}