Extract factor values from a label variable created by parmest or parmby
factext [newvarlist] [if] [in] [ , from(varlist) string dofile(dofilename) parse(parse_string) fmissing(newvarname) ]
Description
factext is intended for use after the programs parmest or parmby. These are part of the parmest package, which can be downloaded from SSC, and create output datasets (or resultssets) with one observation per parameter of the most recently fitted model. factext is used when the fitted model contains factors (categorical variables), in which case some of the parameters correspond to dummy variables in the original dataset, indicating individual values of these factors. These dummy variables are usually created by xi, by tabulate, or by John Hendrickx's desmat package. For continuous predictor variables, similar dummy-like variables, known as reference splines, can be created using the flexcurv module of the SSC package bspline, with the labprefix() option.
factext creates new factors with the same names in the new dataset created by parmest. These new factors can be used to make confidence interval plots and/or tables. Each new factor is assigned the appropriate value in observations belonging to parameters belonging to the factor, and missing values in other observations. The values of these factors are usually extracted from the label variable in the dataset created by parmby or parmest. If the model contains categorical factors, then the label variable will have values of the form
"factor_name==value"
in observations belonging to parameters belonging to these factors. The names of the factors to be re-created are specified in the newvarlist if it is present, and are otherwise taken from the factor_name parts of these label values. The factor values are taken from the value parts.
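The decomposition of a label value into a factor name and a factor value can be illustrated by hand. The following is a minimal sketch using only built-in Stata string functions on a toy dataset; it is not the code used by factext, and the variable names pos, facname and facvalue are hypothetical names chosen for illustration.

```stata
* Toy label variable, resembling the one created by parmest with the label option
clear
input str20 label
"rep78==1"
"rep78==2"
"Weight (lbs.)"
end

* Locate the default parse string "==" (pos is 0 if it is absent)
generate pos = strpos(label, "==")

* Text before "==" is the factor name; text after it is the factor value
generate str20 facname  = substr(label, 1, pos - 1) if pos > 0
generate str20 facvalue = substr(label, pos + 2, .) if pos > 0

* Re-create the numeric factor, missing where the label names no factor
generate rep78 = real(facvalue) if facname == "rep78"
list label facname facvalue rep78
```

In this sketch the third observation has no "==" in its label, so the re-created factor is missing there, in line with the behaviour described above for observations not belonging to the factor.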
Users of Stata versions 11 or above should probably not use factext. Instead, they should probably use the fvregen package, also downloadable from SSC, which regenerates factor variables (introduced in Stata version 11) in a parmest output dataset by extracting their names and values from the parameter name variable, whose default name is parm.
Options
from(varlist) specifies a list of input string variables, from which the factors and their values are extracted. If this option is absent, then factext attempts to extract the factors from a single string variable named label. The from() option is used when the fitted model contains interactions, in which case the user must create a list of new string variables from label and specify these as the from() option (see Remarks). Factor values found in later variables in the from() list overwrite values for the same factors found in earlier variables in the from() list.
string specifies that the factors generated will be string variables. Otherwise they will be numeric variables.
dofile(dofilename) specifies a Stata do-file to be called by factext after the new factors have been created. This do-file is usually created by descsave, and contains commands to reconstruct the new factors with the storage types, display formats, value labels, variable labels and selected characteristics of the old factors with the same names in the original dataset.
parse(parse_string) specifies the string used to parse the input string variables specified in the from() option. This parse_string separates the factor_names from the values. If absent, it defaults to "==".
fmissing(newvarname) specifies the name of a new binary variable to be generated, containing missing values for observations excluded by the if and in qualifiers, 1 for other observations in which all the generated factors are missing, and 0 for other observations in which at least one of the generated factors is nonmissing.
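As an illustration of the fmissing() option, the following sketch fits a model with one factor and one continuous covariate, and flags the parameters for which no factor value could be extracted. It assumes that the parmest package and factext are installed; the variable name nofactor is a hypothetical name chosen for this example.

```stata
* Assumes the SSC packages parmest and factext are installed
sysuse auto, clear
xi: regress mpg i.rep78 weight
parmest, label norestore

* Extract factors from label, and create an indicator of "no factor found"
factext, fmissing(nofactor)

* Parameters whose labels contain no "==" (such as the one for weight)
* should have nofactor equal to 1, according to the option description
list parm label rep78 nofactor
```

The indicator can then be used to restrict plots or tables to the factor parameters, for instance by adding "if nofactor==0" to a later command.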
Remarks
factext is typically used with the parmest and descsave packages to create a new dataset with one observation per parameter of the most recently fitted model, and data on the estimates, confidence intervals, P-values and other attributes of these parameters. These data are used to create tables and/or plots. Confidence interval plots are often produced using the eclplot package, which can also be downloaded from SSC. More information about the use of factext in combination with parmest, descsave and eclplot can be found in Newson (2003).
In its default setting, with no from() option, factext can only handle labels for dummy variables corresponding to single factors, and cannot extract higher-order interactions. If there are higher-order interactions in the fitted model, then some of the values of label may be of a form such as
"factor_name1==value1 & factor_name2==value2"
or
"(factor_name==value)*varname"
(as created by xi). In this case, the user may use the split command to split the variable label into two or more string variables, each possibly containing values of the form
"factor_name==value"
These new string variables may then be input as the from() option of factext to extract the values. (See Examples below.)
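The effect of split on interaction labels can be seen on a toy dataset, without fitting a model. This is a minimal sketch using only the built-in split command; the stub s_ is an arbitrary choice, and the label values are made-up examples of the form described above.

```stata
* Toy label values of the kind created by xi for interactions
clear
input str40 label
"foreign==1 & rep78==3"
"rep78==4"
end

* Split at " & ", creating s_1, s_2, ... each holding one "factor_name==value"
split label, parse(" & ") generate(s_)
list label s_1 s_2
```

The variables s_1 and s_2 are of the form that can be passed to factext as from(s_*).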
If the model contains reference splines generated using the flexcurv module of the SSC package bspline, and the user has used flexcurv with the option labprefix("variable_name=="), where variable_name is the X-axis variable input to flexcurv, then the label variable may contain values of the form
"variable_name==value"
and factext can create a variable in the output dataset with the name and reference values of the X-axis variable. See the on-line help for bspline if installed.
To add extra observations to the dataset containing reference levels for the factors created by factext, the user may use the factref package, or merge in a dataset created using xcontract. To merge multiple factors and generate string variables containing the factor values, names and labels, use the factmerg package. The factmerg, factref and xcontract packages can be downloaded from SSC.
Examples
The following examples will work with the auto data if the SSC packages parmest and eclplot are installed. They will create confidence interval plots of the parameters corresponding to values of the factor rep78.
. sysuse auto, clear
. tab rep78, gene(rep_)
. parmby "regress mpg rep_*, noconst", label norestore
. factext rep78
. eclplot estimate min95 max95 rep78

. sysuse auto, clear
. xi: regress mpg i.rep78
. parmest, label norestore
. factext
. eclplot estimate min95 max95 rep78, yline(0)
The following example will work with the auto data if descsave is installed in addition to parmest and eclplot. The reconstructed categorical variables rep78 and foreign will have the variable and value labels belonging to the variables of the same names in the original dataset.
. sysuse auto, clear
. tab foreign, gene(orig_) nolab
. tempfile tf1
. descsave, do(`"`tf1'"', replace)
. parmby "xi: regress mpg orig_* i.rep78, noconst", label norestore
. factext, do(`"`tf1'"')
. describe
. eclplot estimate min95 max95 rep78, yline(0)
. eclplot estimate min95 max95 foreign, xlab(0 1)
. list foreign rep78 estimate min95 max95 p
The following example demonstrates higher-order interactions. It will work with the auto data if descsave is installed in addition to parmest.
. sysuse auto, clear
. tempfile tf1
. descsave, do(`"`tf1'"', replace)
. parmby "xi: regress mpg i.foreign*i.rep78", label norestore
. split label, parse(" & ") gene(s_)
. factext, from(s_*) do(`"`tf1'"')
. list foreign rep78 parm estimate min95 max95 p, nodisp
The parmest, descsave and eclplot packages can be installed from SSC.
Saved results
factext saves the following results in r():
Macros
r(faclist)    list of factors created
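The saved result r(faclist) can be used to loop over the factors that were created. The following sketch assumes that parmest and factext are installed; the local macro name facs is a hypothetical name. The list is copied to a local macro immediately, because later r-class commands replace the contents of r().

```stata
* Assumes the SSC packages parmest and factext are installed
sysuse auto, clear
xi: regress mpg i.rep78 i.foreign
parmest, label norestore
factext

* Copy the saved result before another r-class command overwrites it
local facs `r(faclist)'

* Tabulate each re-created factor, including its missing values
foreach f of local facs {
    tabulate `f', missing
}
```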
Author
Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk
References
Newson, R. 2003. Confidence intervals and p-values for delivery to the end user. The Stata Journal 3(3): 245-269. Download from the Stata Journal website.
Also see
Manual: [R] describe, [R] label, [R] tabulate, [R] xi, [D] split, [G] graph
On-line: help for describe, label, tabulate, xi, split, graph
help for parmest, descsave, desmat, factref, factmerg, eclplot, xcontract, fvregen, bspline if installed