-------------------------------------------------------------------------------
help for prodvars                                                (Roger Newson)
-------------------------------------------------------------------------------

Create product variables for two lists of input variables

    prodvars lvarlist [if] [in] , rvarlist(rvarlist) [ generate(stub)
        prefix(string) suffix(string) separator(string)
        lprefix(string) lsuffix(string) lseparator(string)
        llname lrname nolabel
        lcharlist(charlist) rcharlist(charlist) ccharlist(charlist)
        ccprefix(string) ccsuffix(string) ccseparator(string)
        noconstant float replace fast ]

where lvarlist and rvarlist are the left and right variable lists, respectively.

Description

prodvars inputs 2 variable lists, known as the left variable list and the right variable list. It produces as output a list of generated variables, one for each pair of variables from the left and right variable lists, each with a variable name derived either from a stub or from the names of the pair of input variables, and values equal to the products of the values of the two input variables. Optionally, the generated variables may also have variable labels derived from the variable labels, or variable names, of the input variables.

prodvars is useful for calculating variables for the design matrix of a multiple-intercept model. Such a multiple-intercept model is fitted to the data using an estimation command with the noconstant option. Typically, one of the two input variable lists is a list of indicator variables (or dummy variables), each indicating membership of one of several groups, forming a partition of the sample, and corresponding to group-specific intercepts in the fitted model. Such a variable list may be produced using tabulate with the generate() option, or by xi with the noomit option. The other variable list is typically a list of variables corresponding to slopes, differences, or ratios in the fitted model. These variables are usually either quantitative variables, or group identifier variables corresponding to a factor with an omitted group, possibly produced using xi without the noomit option. The generated product variables will then be included in the design matrix, together with the variables in the first input list (corresponding to group-specific intercepts), and will correspond to group-specific slopes, differences, or ratios.
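For a single right variable, the products described above could be written by hand roughly as follows. This is only an illustrative sketch: the output names h0 and h1 are hypothetical, and the input names _Iforeign_0 and _Iforeign_1 assume the default naming used by xi with the noomit option.

```stata
sysuse auto, clear
xi i.foreign, noomit

* Hand-written equivalent of two product variables (h0 and h1 are hypothetical names)
generate double h0 = _Iforeign_0 * weight
generate double h1 = _Iforeign_1 * weight

* Each product equals weight inside its own group and zero elsewhere
list foreign weight h0 h1 in 1/5
```

With many left and right variables, writing one generate command per pair becomes tedious, which is the case prodvars is meant to handle.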

Options

rvarlist(rvarlist) specifies the right variable list. The generated variables will correspond to pairs of variables, the first variable from the left variable list lvarlist, and the second variable from the right variable list rvarlist. Each generated variable will contain the product of the corresponding pair of input variables, at least in observations selected by the if and in qualifiers.

generate(stub) specifies a stub from which the output variable names will be created. If generate() is specified, then the output product variables will have names prefixed with the stub, and suffixed with serial numbers, ordered primarily by the order of the corresponding input variables in the lvarlist and secondarily by the order of the corresponding input variables specified by rvarlist(). For instance, if there are 3 variables in the lvarlist and 2 variables in the variable list specified by rvarlist(), and the user specifies generate(b_), then the output product variables will be named b_1, b_2, b_3, b_4, b_5 and b_6. If the user specifies a generate() option, then the prefix(), suffix() and separator() options will be ignored.
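The 3-by-2 case described above might look like this with the auto data. The choice of input variables is arbitrary and purely illustrative; the pairings in the comments follow the ordering rule stated for generate() and have not been checked against the program's output.

```stata
sysuse auto, clear

* 3 left variables and 2 right variables give 3 x 2 = 6 product variables
prodvars price mpg weight, rvarlist(length turn) generate(b_)

* Expected pairing under the stated ordering rule (left list first, then right list):
*   b_1 = price*length   b_2 = price*turn
*   b_3 = mpg*length     b_4 = mpg*turn
*   b_5 = weight*length  b_6 = weight*turn
describe b_*
```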

prefix(string) specifies a prefix for generating the variable names of the generated product variables. The name of a product variable, corresponding to a left variable from the left variable list and a right variable from the right variable list, is formed by combining the prefix specified by prefix(), the left variable name, the separator specified by separator(), the right variable name, and the suffix specified by suffix(). The prefix and/or the separator and/or the suffix may be empty.

suffix(string) specifies a suffix for generating the variable names of the generated product variables.

separator(string) specifies a separator for generating the variable names of the generated product variables.
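As a small sketch of the naming rule for prefix(), separator() and suffix(), the following call uses hypothetical values p_ and _x_. Under the rule as described (prefix, left name, separator, right name, suffix), the single product variable should be named p_mpg_x_weight.

```stata
sysuse auto, clear

* Assumed rule: prefix + left name + separator + right name + suffix (suffix left empty here)
prodvars mpg, rvarlist(weight) prefix(p_) separator(_x_)

* If the rule holds, this lists one variable called p_mpg_x_weight
describe p_*
```

Because Stata variable names are limited in length, a short prefix and separator are advisable when the input variable names are long.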

lprefix(string) specifies a prefix for generating the variable labels of the generated product variables. The variable label of a product variable, corresponding to a left variable from the left variable list and a right variable from the right variable list, is formed by combining the prefix specified by lprefix(), the left variable label (or name), the separator specified by lseparator(), the right variable label (or name), and the suffix specified by lsuffix(). The prefix and/or the separator and/or the suffix may be empty.

lseparator(string) specifies a separator for generating the variable labels of the generated product variables.

lsuffix(string) specifies a suffix for generating the variable labels of the generated product variables.

llname specifies that the variable labels of the generated product variables will be generated using the variable names of the left variables in the list lvarlist. If llname is not specified, then the variable label of a generated product variable is generated using the variable label of the left variable, if this label is not empty, and using the variable name of the left variable otherwise.

lrname specifies that the variable labels of the generated product variables will be generated using the variable names of the right variables in the list rvarlist. If lrname is not specified, then the variable label of a generated product variable is generated using the variable label of the right variable, if this label is not empty, and using the variable name of the right variable otherwise.

nolabel specifies that no variable labels will be generated for the generated product variables. If nolabel is specified, then the options lprefix(), lseparator(), lsuffix(), llname, and lrname are ignored.
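The labelling options can be compared side by side with a sketch like the one below. The stub n_ is a hypothetical name chosen for illustration, and the exact label text produced depends on the labels that xi attaches to its indicator variables, so no expected output is shown.

```stata
sysuse auto, clear
xi i.foreign, noomit

* Labels built as lprefix + left label + lseparator + right NAME (lrname) + empty lsuffix
prodvars _I*, rvarlist(weight) prefix(_H) separator(X) lprefix("(") lseparator(")*") lrname
describe _H*

* Same products under a hypothetical stub n_, this time with no variable labels
prodvars _I*, rvarlist(weight) generate(n_) nolabel
describe n_*
```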

lcharlist(charlist) specifies a list of names of variable characteristics for the generated product variables, to be inherited from the corresponding left input variables specified by the lvarlist.

rcharlist(charlist) specifies a list of names of variable characteristics for the generated product variables, to be inherited from the corresponding right input variables specified by the rvarlist() option.

ccharlist(charlist) specifies a list of names of variable characteristics for the generated product variables, to be evaluated by combining the characteristics of the same names from the corresponding left input variables specified by the lvarlist and from the corresponding right input variables specified by the rvarlist() option.

ccprefix(string) specifies a prefix string, to be used when combining the variable characteristics specified by ccharlist() from the left and right input variables to form the characteristics of the same names for the generated product variables.

ccsuffix(string) specifies a suffix string, to be used when combining the variable characteristics specified by ccharlist() from the left and right input variables to form the characteristics of the same names for the generated product variables.

ccseparator(string) specifies a separator string, to be used when combining the variable characteristics specified by ccharlist() from the left and right input variables to form the characteristics of the same names for the generated product variables.
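A sketch of the characteristic options follows. The characteristic name units and its values are hypothetical, as is the stub p_. It is assumed here, by analogy with the naming and labelling options, that the combined characteristic is assembled as prefix, left value, separator, right value, suffix; the option descriptions do not state this order explicitly.

```stata
sysuse auto, clear

* Hypothetical characteristic "units" attached to both input variables
char define mpg[units] "miles/gallon"
char define weight[units] "lbs"

* Combine the units characteristic of the left and right variables
prodvars mpg, rvarlist(weight) generate(p_) ccharlist(units) ccseparator(" * ")

* Inspect the characteristics carried by the product variable
char list p_1[]
```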

noconstant specifies that generated product variables which are constant in the sample will be dropped. This option can be useful if the generated product variables are used in a design matrix.

float specifies that the highest precision storage type allowed for a generated product variable will be float. If float is not specified, then the highest precision storage type allowed for a generated product variable will be double. Note that, whether or not float is specified, all generated product variables are compressed to the lowest precision possible without losing information.

replace specifies that, if any existing variables have the same names as those specified for the generated product variables, then these existing variables will be dropped. If replace is not specified, then prodvars checks for existing variables with these names, and fails if it finds any.

fast is an option for programmers. It specifies that prodvars will do no extra work to preserve the original data (without any generated product variables) if the user presses Break.

Remarks

prodvars is intended to produce design matrices for regression models with multiple intercepts, estimated using estimation commands with the noconstant option. This practice is in contrast to the more traditional practice of estimating regression parameters for models with a single intercept, which is identified in Stata by the parameter name _cons, if the noconstant option is not specified.

The variable labels of the generated indicator variables can be made as informative as possible. They are similar to those generated by xi and tabulate, but a lot more flexible. In particular, the parameters, and the corresponding variable labels, can be output to output datasets (or resultssets) by the parmest package, and the categorical factors can be reconstructed in these resultssets, using the descsave and factext packages. The packages parmest, descsave and factext can all be downloaded from SSC, using the ssc command in Stata.

Examples

The following example works if the descsave and parmest packages are installed from SSC. (This can be done in Stata using the ssc command.) xi and prodvars are used together to create a design matrix, with variables prefixed by _I, corresponding to one intercept for each level of the variable foreign, and variables prefixed by _H, corresponding to one slope of fuel consumption with respect to weight for each level of foreign. These parameters are estimated using regress and displayed using parmest. Note that descsave and parmest display the variable labels of the product variables produced by prodvars.

    . sysuse auto, clear
    . gene gpm=1/mpg
    . lab var gpm "Fuel consumption (gallons/mile)"
    . xi i.foreign, noomit
    . prodvars _I*, rvar(weight) pre(_H) sep(X) lpre("(") lsep(")*") lrname
    . descsave, list(, abbr(32) subvar noobs)
    . regress gpm _I* _H*, noconst
    . parmest, label list(, abbr(32))

The following example works if the descsave and parmest packages are installed from SSC. We create a categorical factor mod3, containing the sequence order (modulo 3) of the car model in the dataset, and having values 0, 1 and 2. We then use xi, with the noomit option, to produce a list of variables, prefixed by _I, indicating membership of groups defined by all values of the variable foreign. We then use xi, without the noomit option, to produce a list of variables, prefixed by _J, indicating membership of groups defined by all non-zero values of the variable mod3. We then use prodvars to produce product variables, prefixed by _H, corresponding to combinations of all values of foreign and non-zero values of mod3. The final regression model contains an intercept for each value of foreign, defined as a mean weight for cars with that value of foreign and a baseline zero value of mod3, and a weight difference for each combined value of foreign and non-zero value of mod3, comparing mean car weights with the mean car weight for cars with the same value of foreign and a zero value of mod3.

    . sysuse auto, clear
    . gene mod3=mod(_n,3)
    . lab var mod3 "Model sequence (modulo 3)"
    . xi i.foreign, noomit
    . xi i.mod3, pref(_J)
    . prodvars _I*, rvar(_J*) pre(_H) lsep(" & ")
    . descsave, list(, abbr(32) subvar noobs)
    . regress weight _I* _H*, noconst
    . parmest, label list(, abbr(32))

Saved results

prodvars saves the following in r():

Macros

    r(prodvars)    list of generated product variables
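One possible use of the saved result is to pass the generated variable list straight to an estimation command, rather than relying on a wildcard. The sketch below assumes that r(prodvars) is still available immediately after the prodvars call; the local macro name hvars is hypothetical.

```stata
sysuse auto, clear
xi i.foreign, noomit
prodvars _I*, rvarlist(weight) prefix(_H) separator(X)

* Copy the saved list before another r-class command overwrites r()
local hvars "`r(prodvars)'"
display "`hvars'"

* Multiple-intercept model: one intercept and one weight slope per level of foreign
regress mpg _I* `hvars', noconstant
```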

Author

Roger Newson, Imperial College London, UK.
Email: r.newson@imperial.ac.uk

Also see

Manual:   [R] tabulate, [R] xi
On-line:  help for tabulate, xi
          help for parmest, descsave, factext if installed