Title
todummies -- Create indicator variables from categorical variable or vice versa
Syntax
Create indicator variables from categorical variable
todummies varname [levspec [levspec ...]] [if] [in] [ , to_options]
where levspec is
# ["varlabel"] | (numlist ["varlabel"])
and parentheses and double quotes are required.
Create categorical variable from indicator variables
fromdummies varlist [if] [in] , generate(newvarname) [from_options]
Description
todummies creates indicator variables (also called dummy variables) from one categorical variable.
One indicator variable is created for each specified level of the categorical variable. Enclosing more than one level in parentheses creates one dummy variable indicating observations for which varname equals one of these values. Omitted levels are coded as missing values in the created indicator variables. Omitting all levels results in one indicator for each level of the original variable (see tabulate).
min and max may be used as # and in numlist to refer to the minimum and maximum of varname. Consecutive values may be specified as a numlist of the form from/to (with no spaces between from, /, and to). No more than 249 distinct values may be specified.
If no variable labels are specified, the indicator variables are labeled according to the value labels of the categorical variable. If varname has no value label attached, the dummies are not labeled.
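The coding rule described above can be reproduced by hand with base Stata commands. The following is a minimal sketch, assuming the auto dataset shipped with Stata and two hypothetical variable names, rep_low and rep_mid. It mirrors what a call such as todummies rep78 (1 2) 3 is described as doing; it is not the command's actual implementation.

```stata
* Illustration only: clears the data in memory.
sysuse auto, clear

* Levels 1 and 2 are collapsed into one indicator; level 3 gets its own.
* Levels 4 and 5 (and missing rep78) are not listed, so the if qualifier
* leaves both indicators missing for those observations.
generate byte rep_low = inlist(rep78, 1, 2) if inlist(rep78, 1, 2, 3)
generate byte rep_mid = (rep78 == 3)        if inlist(rep78, 1, 2, 3)

* Cross-check the coding against the original variable.
tabulate rep78 rep_low, missing
tabulate rep78 rep_mid, missing
```

The if qualifier is what turns unlisted levels into missing values instead of zeros.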
fromdummies creates one categorical variable from binary indicator variables. It thus reverses
tabulate varname ,generate(stubname)
A value label is defined for newvarname, associating variable labels attached to variables in varlist with the corresponding values in newvarname.
Options
to_options
generate(namelist|stub) specifies names for the indicator variables. If one stub is specified, dummies stub1, stub2, ... will be created. If generate is not specified, stub defaults to varname.
reference(numlist) specifies values of varname to be used as the reference category. No additional indicator variable will be created. Instead, observations for which varname equals one of the values in numlist are coded 0 in all indicator variables. rest may be specified as reference, meaning all (nonmissing) values not specified in levspec. This option is ignored if no levels are specified.
sic changes the default stub to varname as typed (possibly abbreviated), if generate is not specified. If only one indicator variable is created, sic suppresses numeric suffixes.
novarlabel prevents labeling the indicator variables.
from_options
generate(newvarname) specifies the name for the categorical variable. This is a required option.
reference creates the categorical variable even if the dummies do not add up to 1. Observations for which all dummies are 0 are coded k + 1, where k is the number of indicator variables. The value label "reference" is added. fromdummies aborts with an error if the sum of the dummies exceeds 1.
names associates the variable names in varlist with the corresponding values in newvarname in the created value label.
[no]vallabel[(lblname)] defines lblname as a value label for the categorical variable. Default lblname is newvarname. novallabel does not create a value label for the categorical variable.
varlabel(name) specifies a variable label for the categorical variable. There is no default variable label.
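To make the reference option concrete, here is a rough base-Stata sketch of the coding it describes, using a small made-up dataset. The variable names d1, d2, nset, and newcat are hypothetical, and the code illustrates the documented behavior only; it does not claim to show how fromdummies is written.

```stata
* Illustration only: clears the data in memory.
clear
input byte(d1 d2)
1 0
0 1
0 0
end

* Stop with an error if any observation has more than one indicator set.
egen byte nset = rowtotal(d1 d2)
assert nset <= 1

* Code the position of the indicator that equals 1 (k = 2 indicators here).
generate byte newcat = 1 * d1 + 2 * d2

* Observations with all indicators 0 get k + 1 = 3, labeled "reference".
replace newcat = 3 if nset == 0
label define newcat 3 "reference"
label values newcat newcat

list
```

Without the reference coding, the third observation would have no category, because its dummies do not add up to 1.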
Remarks
todummies vs. todummy
todummies is useful for creating indicator variables from one categorical variable when some of the levels are to be collapsed.
Say we have a variable, foo, with 5 levels: values 1 to 5. We want to create one binary variable indicating level 1 and another indicator variable representing levels 2 and 3. Level 4 is the reference, and level 5 represents missing values.
There are many ways to create our variables. Here is one:
. recode foo (3 = 2)(5 = .), generate(bar)
. tabulate bar ,generate(foobar)
We can now use variables foobar1 and foobar2 in our analysis, omitting variable foobar3 as the reference.
Here is how we use todummies to create the indicator variables:
. todummies foo 1 (2 3) ,reference(4) generate(foobar)
In effect, we collapsed the two lines of code above into one. Note that foobar3 is not created in this example.
We can also use todummy to create the indicator variables by typing
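For readers who want to check the resulting coding by hand, the sketch below builds the same two indicators with generate alone. It assumes a toy dataset in which foo takes the values 1 to 5 once each, and it reflects the coding described in this section rather than the internals of todummies.

```stata
* Illustration only: clears the data in memory.
clear
set obs 5
generate byte foo = _n

* Levels 1 to 4 take part in the coding. Level 5 is omitted, so the
* if qualifier leaves both indicators missing for it.
generate byte foobar1 = (foo == 1)        if inrange(foo, 1, 4)
generate byte foobar2 = inlist(foo, 2, 3) if inrange(foo, 1, 4)

* The reference level 4 is 0 in both indicators.
list foo foobar1 foobar2
```

Level 4 ends up as 0 in both variables and level 5 as missing in both, which is the pattern the reference option and the omitted level are meant to produce.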
. todummy foo if (foo != 5) ,values(1 \ 2 3 \ 4) stub(foobar)
As with todummies, we have one line of code, but the need to specify an if qualifier makes this approach less convenient. The if qualifier makes sure that all observations with value 5 in variable foo are coded . (missing value) in the created indicator variables. Because specifying an if qualifier is easily forgotten, I recommend using todummies. Also, from a more technical point of view, todummy is very slow.
Examples
. sysuse nlsw88
. todummies race 2 (1 3) ,generate(black other)
. fromdummies black other ,generate(race2)
. todummies occ (1/2 "High occ.") 3/4 ,reference(9 10) sic
. fromdummies occ? ,generate(occu) names reference
Acknowledgments
dummies2 is inspired by dummies and dummieslab by Nick Cox and Philippe van Kerm.
Author
Daniel Klein, University of Kassel, klein.daniel.81@gmail.com
Also see
Online: tabulate, fvvarlist (Stata 11 and higher)
if installed: dummies, dummieslab, todummy