Generating dummy variables from a categorical variable using value label names
Syntax
dummieslab varname [if exp] [in range] [, word(integer) from(string) to(string) template(string) truncate(integer) novarlabel ]
Description
dummieslab generates a set of dummy variables from a categorical variable, one for each level of the original variable. Names for the dummy variables are derived from the value labels of the categorical variable. If the categorical variable has no value labels attached, the raw (unlabelled) values are used instead.
Two different behaviours can be chosen for the variable names: (i) use the full value labels; (ii) use the s-th word of each label. In both cases, all invalid characters are stripped from the new variable names.
Any user-defined prefix and/or suffix can be added using the template option.
In all cases, no new variable will be generated unless all implied new names are valid.
When applied to a variable with no value labels, dummieslab appends each level to the original variable name (much as tabulate does with its generate() option).
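As an illustration of the unlabelled case, the following sketch applies dummieslab to rep78 from the auto dataset shipped with Stata, which carries no value label. It assumes dummieslab is installed; the exact names created are not asserted here and can be read from r(names).

```stata
* Sketch only: rep78 in auto.dta has no value label attached
sysuse auto, clear
describe rep78

* One dummy per observed level, named from the variable name plus the level
dummieslab rep78

* List the names that were actually created
display "`r(names)'"
```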
Options
word(s) requests that the s-th word of the label be used as the new variable name. Use word(-1) to specify the last word of the label.
from(string) and to(string) are used together to make replacements to the strings used to create the new variables. from(string) contains a list of words to be replaced by the list of words supplied in to(string), i.e. the first item in from is substituted by the first item in to, the second item in from is substituted by the second item in to, etc. By default, all invalid characters are dropped from the value labels to create new variable names. This behaviour can be overridden by the use of from(string) and to(string). For example, use from(" ") and to("_") to replace all blanks by underscores.
template(word) specifies a template for the new variable name. @ is used as a placeholder for inserting the extracted label. This option is used to insert a prefix (anything before @ in word) and/or a suffix (anything after @ in word).
truncate(n) truncates new variable names after n characters.
novarlabel prevents automated variable labelling of the generated dummies.
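The options above might be tried out as follows. This is a minimal sketch using the auto dataset and its shipped value labels for foreign; the prefix d_ and the suffix _nl are arbitrary choices made for illustration, and the three calls are kept separate so that their new names do not clash.

```stata
sysuse auto, clear

* template(): add a prefix (text before @) to each extracted label
dummieslab foreign, template("d_@")
describe d_*

* truncate(): cut the new names after 4 characters, then show the result
dummieslab foreign, truncate(4)
display "`r(names)'"

* novarlabel: create the dummies without automatic variable labels
dummieslab foreign, template("@_nl") novarlabel
describe *_nl
```

Comparing the two describe listings shows whether the dummies carry a variable label.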
Saved results
local r(names)    List of names of created dummies
local r(from)     Name of the original categorical variable
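The saved results can be picked up immediately after the call, before another r-class command overwrites them. A minimal sketch, assuming the auto dataset; the local macro names dummies and source are illustrative only.

```stata
sysuse auto, clear
dummieslab foreign

* Copy the saved results into local macros straight away
local dummies `r(names)'
local source  `r(from)'

display "created from `source': `dummies'"

* Use the stored list wherever a varlist is expected
summarize `dummies'
```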
Examples
. sysuse auto
. label define newfor 0 "Domestic car" 1 "Foreign (European or Japanese) car"
. label values foreign newfor
. dummieslab foreign
. dummieslab foreign, word(1)
. dummieslab foreign, word(-1)
. dummieslab foreign, from(" ") to("_")
. dummieslab foreign, from(car or Foreign) to("" "_" "")
. dummieslab foreign, from(car Foreign or) to("" "" "_")
. dummieslab foreign, word(1) template("My_@_car")
Acknowledgments
Patrick Joly made helpful suggestions on the first version of dummieslab, which led to the addition of the from and to options. Daniel Klein suggested option novarlabel.
Authors
Philippe Van Kerm, CEPS/INSTEAD, Differdange, G.-D. Luxembourg philippe.vankerm@ceps.lu
Nicholas J. Cox, Durham University, U.K. n.j.cox@durham.ac.uk
Also see
On-line: tabulate
On-line (if installed): dummies