-------------------------------------------------------------------------------
help for prodvars                                                (Roger Newson)
-------------------------------------------------------------------------------

Create product variables for two lists of input variables

prodvars lvarlist [if] [in] , rvarlist(rvarlist) [ generate(stub)
         prefix(string) suffix(string) separator(string)
         lprefix(string) lsuffix(string) lseparator(string)
         llname lrname nolabel
         lcharlist(charlist) rcharlist(charlist) ccharlist(charlist)
         ccprefix(string) ccsuffix(string) ccseparator(string)
         noconstant float replace fast ]

where lvarlist and rvarlist are the left and right variable lists, respectively.

Description

prodvars inputs 2 variable lists, known as the left variable list and the right variable list. It produces as output a list of generated variables, one for each pair of variables from the left and right variable lists, each with a variable name derived either from a stub or from the names of the pair of input variables, and values equal to the products of the values of the two input variables. Optionally, the generated variables may also have variable labels derived from the variable labels, or variable names, of the input variables.

prodvars is useful for calculating variables for the design matrix of a multiple-intercept model. Such a multiple-intercept model is fitted to the data using an estimation command with the noconstant option.

Typically, one of the two input variable lists is a list of indicator variables (or dummy variables), each indicating membership of one of several groups, forming a partition of the sample, and corresponding to group-specific intercepts in the fitted model. Such a variable list may be produced using tabulate with the generate() option, or by xi with the noomit option. The other variable list is typically a list of variables corresponding to slopes, differences, or ratios in the fitted model. These variables are usually either quantitative variables, or group identifier variables corresponding to a factor with an omitted group, possibly produced using xi without the noomit option. The generated product variables will then be included in the design matrix, together with the variables in the first input list (corresponding to group-specific intercepts), and will correspond to group-specific slopes, differences, or ratios.

Options

rvarlist(rvarlist) specifies the right variable list. The generated variables will correspond to pairs of variables, the first variable from the left variable list lvarlist, and the second variable from the right variable list rvarlist. Each generated variable will contain the product of the corresponding pair of input variables, at least in observations selected by the if and in qualifiers.

generate(stub) specifies a stub from which the output variable names will be created. If generate() is specified, then the output product variables will have names prefixed with the stub, and suffixed with serial numbers, ordered primarily by the order of the corresponding input variables in the lvarlist and secondarily by the order of the corresponding input variables specified by rvarlist(). For instance, if there are 3 variables in the lvarlist and 2 variables in the variable list specified by rvarlist(), and the user specifies generate(b_), then the output product variables will be named b_1, b_2, b_3, b_4, b_5 and b_6. If the user specifies a generate() option, then the prefix(), suffix() and separator() options will be ignored.
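
As an illustration of the numbering rule, the following sketch uses the auto data with three left variables and two right variables. The variable mod3, the stub m_ passed to tabulate, and the choice of weight and length as right variables are assumptions made only for this illustration.

```stata
sysuse auto, clear
* Hypothetical three-level factor, used only to obtain three indicator variables
generate mod3 = mod(_n, 3)
tabulate mod3, generate(m_)
* 3 left variables and 2 right variables give 6 products, named b_1 to b_6
prodvars m_1 m_2 m_3, rvarlist(weight length) generate(b_)
describe b_*
```

Under the ordering described above, b_1 and b_2 should be the products of m_1 with weight and length, respectively, b_3 and b_4 the corresponding products of m_2, and b_5 and b_6 those of m_3.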

prefix(string) specifies a prefix for generating the variable names of the generated product variables. The name of a product variable, corresponding to a left variable from the left variable list and a right variable from the right variable list, is formed by combining the prefix specified by prefix(), the left variable name, the separator specified by separator(), the right variable name, and the suffix specified by suffix(). The prefix and/or the separator and/or the suffix may be empty.

suffix(string) specifies a suffix for generating the variable names of the generated product variables.

separator(string) specifies a separator for generating the variable names of the generated product variables.
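
A minimal sketch of the naming rule, with prefix(), separator() and suffix() all given explicitly. The indicator stub for_ and the strings H_, X and _p are arbitrary choices made for this illustration.

```stata
sysuse auto, clear
* Indicators for_1 (domestic) and for_2 (foreign)
tabulate foreign, generate(for_)
* Names are built as prefix + left name + separator + right name + suffix
prodvars for_1 for_2, rvarlist(weight) prefix(H_) separator(X) suffix(_p)
describe H_*
```

By the rule given under prefix(), the two product variables should be named H_for_1Xweight_p and H_for_2Xweight_p.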

lprefix(string) specifies a prefix for generating the variable labels of the generated product variables. The variable label of a product variable, corresponding to a left variable from the left variable list and a right variable from the right variable list, is formed by combining the prefix specified by lprefix(), the left variable label (or name), the separator specified by lseparator(), the right variable label (or name), and the suffix specified by lsuffix(). The prefix and/or the separator and/or the suffix may be empty.

lseparator(string) specifies a separator for generating the variable labels of the generated product variables.

lsuffix(string) specifies a suffix for generating the variable labels of the generated product variables.

llname specifies that the variable labels of the generated product variables will be generated using the variable names of the left variables in the list lvarlist. If llname is not specified, then the variable label of a generated product variable is generated using the variable label of the left variable, if this label is not empty, and using the variable name of the left variable otherwise.

lrname specifies that the variable labels of the generated product variables will be generated using the variable names of the right variables in the list rvarlist. If lrname is not specified, then the variable label of a generated product variable is generated using the variable label of the right variable, if this label is not empty, and using the variable name of the right variable otherwise.

nolabel specifies that no variable labels will be generated for the generated product variables. If nolabel is specified, then the options lprefix(), lseparator(), lsuffix(), llname, and lrname are ignored.
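
The label options can be tried out in the same way. The following sketch, modelled on the first example in the Examples section, builds each label from the left variable label and the right variable name. The stub for_ and the strings H_ and X are assumptions made for this illustration.

```stata
sysuse auto, clear
tabulate foreign, generate(for_)
* Label = lprefix + left label + lseparator + right name (because of lrname)
prodvars for_1 for_2, rvarlist(weight) prefix(H_) separator(X) lprefix("(") lseparator(")*") lrname
* describe shows the generated variable labels
describe H_*
```

Because lrname is specified and llname is not, each label should consist of the left variable label in parentheses, followed by an asterisk and the name weight. Adding nolabel to the call should leave the product variables unlabelled.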

lcharlist(charlist) specifies a list of names of variable characteristics for the generated product variables, to be inherited from the corresponding left input variables specified by the lvarlist.

rcharlist(charlist) specifies a list of names of variable characteristics for the generated product variables, to be inherited from the corresponding right input variables specified by the rvarlist() option.

ccharlist(charlist) specifies a list of names of variable characteristics for the generated product variables, to be evaluated by combining the characteristics of the same names from the corresponding left input variables specified by the lvarlist and from the corresponding right input variables specified by the rvarlist() option.

ccprefix(string) specifies a prefix string, to be used when combining the variable characteristics specified by ccharlist() from the left and right input variables to form the characteristics of the same names for the generated product variables.

ccsuffix(string) specifies a suffix string, to be used when combining the variable characteristics specified by ccharlist() from the left and right input variables to form the characteristics of the same names for the generated product variables.

ccseparator(string) specifies a separator string, to be used when combining the variable characteristics specified by ccharlist() from the left and right input variables to form the characteristics of the same names for the generated product variables.
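
A minimal sketch of the three characteristic options used together. The characteristic names group, units and term, their values, and the separator string are all hypothetical, defined only for this illustration.

```stata
sysuse auto, clear
tabulate foreign, generate(for_)
* Hypothetical characteristics, defined only for this illustration
char for_1[group] "Domestic"
char for_2[group] "Foreign"
char weight[units] "lbs"
char for_1[term] "Domestic"
char for_2[term] "Foreign"
char weight[term] "weight"
* group is inherited from the left variable, units from the right variable,
* and term is combined from both, using the ccseparator() string
prodvars for_1 for_2, rvarlist(weight) generate(hw_) lcharlist(group) rcharlist(units) ccharlist(term) ccseparator(" x ")
* List the characteristics of the two generated product variables
char list hw_1[]
char list hw_2[]
```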

noconstant specifies that generated product variables which are constant in the sample will be dropped. This option can be useful if the generated product variables are used in a design matrix.
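
A constant product arises, for instance, when two indicator variables are never both nonzero. In the following sketch, the variable domestic and the strings P_ and X are assumptions made for this illustration. The product of for_2 (foreign cars) with domestic is zero in every observation, so noconstant would be expected to drop it and keep only the product of for_1 with domestic.

```stata
sysuse auto, clear
tabulate foreign, generate(for_)
* Hypothetical indicator of domestic cars (equal to for_1)
generate byte domestic = 1 - foreign
* for_2*domestic is zero everywhere, so it is constant in the sample
prodvars for_1 for_2, rvarlist(domestic) prefix(P_) separator(X) noconstant
describe P_*
```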

float specifies that the highest precision storage type allowed for a generated product variable will be float. If float is not specified, then the highest precision storage type allowed for a generated product variable will be double. Note that, whether or not float is specified, all generated product variables are compressed to the lowest precision possible without losing information.

replace specifies that, if any existing variables have the same names as those specified for the generated product variables, then these existing variables will be dropped. If replace is not specified, then prodvars checks whether any such existing variables exist, and fails if any exist.
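
The following sketch shows the effect of replace when prodvars is called twice with the same stub. The stubs for_ and hw_ are arbitrary choices for this illustration.

```stata
sysuse auto, clear
tabulate foreign, generate(for_)
prodvars for_1 for_2, rvarlist(weight) generate(hw_)
* hw_1 and hw_2 now exist, so a second call without replace should fail
capture noisily prodvars for_1 for_2, rvarlist(weight) generate(hw_)
display _rc
* With replace, the existing hw_1 and hw_2 are dropped and generated again
prodvars for_1 for_2, rvarlist(weight) generate(hw_) replace
```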

fast is an option for programmers. It specifies that prodvars will do no extra work to preserve the original data (without any generated product variables) if the user presses Break.

Remarks

prodvars is intended to produce design matrices for regression models with multiple intercepts, estimated using estimation commands with the noconstant option. This practice is in contrast to the more traditional practice of estimating regression parameters for models with a single intercept, which is identified in Stata by the parameter name _cons, if the noconstant option is not specified.

The variable labels of the generated product variables can be made as informative as possible. They are similar to those generated by xi and tabulate, but considerably more flexible. In particular, the parameters, and the corresponding variable labels, can be output to output datasets (or resultssets) by the parmest package, and the categorical factors can be reconstructed in these resultssets, using the descsave and factext packages. The packages parmest, descsave and factext can all be downloaded from SSC, using the ssc command in Stata.

Examples

The following example works if the descsave and parmest packages are installed from SSC. (This can be done in Stata using the ssc command.) xi and prodvars are used together to create a design matrix, with variables prefixed by _I, corresponding to one intercept for each level of the variable foreign, and variables prefixed by _H, corresponding to one slope of fuel consumption with respect to weight for each level of foreign. These parameters are estimated using regress and displayed using parmest. Note that descsave and parmest display the variable labels of the product variables produced by prodvars.

    . sysuse auto, clear
    . gene gpm=1/mpg
    . lab var gpm "Fuel consumption (gallons/mile)"
    . xi i.foreign, noomit
    . prodvars _I*, rvar(weight) pre(_H) sep(X) lpre("(") lsep(")*") lrname
    . descsave, list(, abbr(32) subvar noobs)
    . regress gpm _I* _H*, noconst
    . parmest, label list(, abbr(32))

The following example works if the descsave and parmest packages are installed from SSC. We create a categorical factor mod3, containing the sequence order (modulo 3) of the car model in the dataset, and having values 0, 1 and 2. We then use xi, with the noomit option, to produce a list of variables, prefixed by _I, indicating membership of groups defined by all values of the variable foreign. We then use xi, without the noomit option, to produce a list of variables, prefixed by _J, indicating membership of groups defined by all non-zero values of the variable mod3. We then use prodvars to produce product variables, prefixed by _H, corresponding to combinations of all values of foreign and non-zero values of mod3. The final regression model contains an intercept for each value of foreign, defined as a mean weight for cars with that value of foreign and a baseline zero value of mod3, and a weight difference for each combined value of foreign and non-zero value of mod3, comparing mean car weights with the mean car weight for cars with the same value of foreign and a zero value of mod3.

    . sysuse auto, clear
    . gene mod3=mod(_n,3)
    . lab var mod3 "Model sequence (modulo 3)"
    . xi i.foreign, noomit
    . xi i.mod3, pref(_J)
    . prodvars _I*, rvar(_J*) pre(_H) lsep(" & ")
    . descsave, list(, abbr(32) subvar noobs)
    . regress weight _I* _H*, noconst
    . parmest, label list(, abbr(32))

Saved results

prodvars saves the following in r():

Macros
    r(prodvars)    list of generated product variables
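
The saved list can be passed directly to an estimation command. A minimal sketch, assuming the variable gpm from the first example and the arbitrary stubs for_ and hw_:

```stata
sysuse auto, clear
generate gpm = 1/mpg
tabulate foreign, generate(for_)
prodvars for_1 for_2, rvarlist(weight) generate(hw_)
* Copy r(prodvars) at once, before another r-class command overwrites it
local slopes `r(prodvars)'
* One intercept and one weight slope for each level of foreign
regress gpm for_1 for_2 `slopes', noconstant
```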

Author

Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk

Also see

Manual:   [R] tabulate, [R] xi
On-line:  help for tabulate, xi
          help for parmest, descsave, factext if installed