{smcl} {* version 1.2.2 23aug2012}{...} {cmd:help todummy} {hline} {title:Title} {p 5} {cmd:todummy} {hline 2} Create dummy variables {title:Syntax} {p 8} {cmd:todummy} {varlist} {ifin} {cmd:,} { {opt v:alues(vlist)}|{hi:{it:keyword}} } [{it:options}] {p 5} where {it:vlist} is a list of values specified in one or more {it:{help numlist}s} of the form {p 8} [{cmd:=}]{it:{help numlist}} [{cmd:\} [{cmd:=}]{it:{help numlist}} {it:...}] {synoptset 21 tabbed}{...} {synopthdr} {synoptline} {p2coldent:* {opt v:alues(vlist)}}specify values to be coded 1 {p_end} {synopt:{opt p:ercentile}}interpret {it:vlist} as list of percentiles {p_end} {synopt:{opt c:ut}}interpret {it:vlist} as cutpoints {p_end} {syntab:*{it:Keywords}} {synopt:{opt l:evels}}create one dummy for each level of the original variable {p_end} {synopt:{opt med:ian}}assign value 1 if the original variable is greater or equal to the 50th percentile {p_end} {synopt:{opt q}}create one dummy for each quartile of the original variable {p_end} {syntab:{it:Names}} {synopt:{opt g:enerate(namelist)}}create dummies {it:name{hi:1}}, {it:name{hi:2}}, ... {p_end} {synopt:{opt pre:fix(pre)}}use {it:pre} as prefix for created dummies {p_end} {synopt:{opt suff:ix(suff)}}use {it:suff} as suffix for created dummies {p_end} {synopt:{opt stub(stub)}}use {it:stub{hi:1}}, {it:stub{hi:2}}, ... as dummies' names {p_end} {synopt:{opt replace}}replace existing variables with dummies {p_end} {synopt:{opt nonam:es}}do not use value labels as variable names ({opt levels}) {p_end} {syntab:{it:Labels}} {synopt:[{ul:{cmd:r}}]{opt l:abel(lbllist)}}use {it:label{hi:1}}, {it:label{hi:2}}, ... as variable labels {p_end} {synopt:{opt novarl:abel}}do not assign variable labels {p_end} {syntab:{it:Missing values}} {synopt:{opt m:issing}}create dummy for missings ({opt levels}) or copy missing values {p_end} {syntab:{it:Advanced}} {synopt:{opt nosk:ip}[{cmd:(}drop{cmd:)}]}do not skip creation of existing dummies {p_end} {synopt:{opt ro(rel. operator)}}specify {help operator:relational operator} {p_end} {synopt:{opt noexc:lude}}use all observations to create dummies, even if excluded by {hi:if} and/or {hi:in} qualifiers {p_end} {synoptline} {p 5}* one of {opt values()} or {hi:{it:keyword}} must be specified {title:Description} {pstd} {cmd:todummy} creates indicator variables (also called dummies) from variables in {it:varlist}. There may either one or multiple dummies be created from each variable. If one dummy per variable is created, default names are {it:{hi:d_}varname}. {marker opt} {title:Options} {phang} {opt values(vlist)} assigns value 1 if the original variable equals the values specified in {it:vlist}, 0 otherwise. There will be as many dummies per variable as there are {it:numlists} in {it:vlist}. The first created dummy will be coded 1 if the original variable equals the values in the first {it:numlist}, the second dummy will be 1 if the original variable equals the values in the second {it:numlist} and so on. If more than one dummy is created the default names are {it:varname{hi:J}}, where {it:{hi:}J} indicates the number of the dummy created from the original variable. The dummies will not have variable labels. Non-integer values and missing values (i.e. {hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) are allowed in {it:numlists}. If {it:numlist} has missing values, the created dummy will {hi:not} have missing values. {phang} {opt percentile} interprets {it:vlist} as a list of percentiles (which must be between 0 and 100). If a {it:numlist} contains only one percentile, the created dummy variable will be coded 1 if the original variable is greater or equal to this percentile. Specifying {it:k} percentiles, where {it:k} > 1, will result in {it:k} + 1 dummies created. The first dummy will be coded 1 if the original variable is lower than or equal to the first specified percentile, the second dummy will be coded 1 if the original variable takes on values between the first and the second percentile and so on. An equal sign ({bf:=}) in front of a {it:numlist} causes the first and last dummy not to be created. Thus, specifying {it:k} percentiles will result in {it:k} - 1 dummies. If more than one dummy per variable is created, default names are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy created from the original variable. The dummies' variable labels are {it:varname} ({it:P}), where {it:P} indicates the values of the percentiles the dummy represents. {phang} {opt cut} interprets {it:vlist} as cutpoints. If a {it:numlist} contains only one value, the created dummy variable will be coded 1 if the original variable is greater or equal to this value. Specifying {it:k} values, where {it:k} > 1, will result in {it:k} + 1 dummies created. The first dummy will be coded 1 if the original variable is lower than or equal to the first specified value, the second dummy will be coded 1 if the original variable falls into the range between the first and the second value and so on. An equal sign ({bf:=}) in front of a {it:numlist} causes the first and last dummy not to be created. Thus, specifying {it:k} values will result in {it:k} - 1 dummies. If more than one dummy per variable is created, default names are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy created from the original variable. The dummies' variable labels are {it:varname} ({it:R}), where {it:R} indicates the range of values the dummy represents. Values may contain missings (i.e. {hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) and non-integers. If {it:numlist} has missing values, the created dummies will {hi:not} have missing values. {phang} {opt levels} creates one dummy for each level of the original variable. This is similar to what {help tabulate} does (note however, that only numerical variables are allowed with {cmd:todummy}). Extended missing values ({hi:.a}, {hi:.b}, ..., {hi:.z}) are copied from the original variable. Value labels from the original variable are used as variable names for the created dummies. If there are no value labels, default names are {it:varname{hi:J}}, where {it:{hi:}J} indicates the number of the dummy created from the original variable. The dummies' variable labels are {it:varname} ({it:L}), where {it:L} is the level. {phang} {opt median} assigns value 1 if the original variable is greater or equal to its median. The created dummies will not have variable labels. {phang} {opt q} creates one dummy for each quartile of the original variable. Thus, four dummies will be created from each variable. The first dummy will be coded 1 if the original variable is lower than or equal to its 25th percentile, the second dummy will be 1 if the original variable takes on values between the 25th and 50th percentile, and so on. The dummies' variable labels are {it:varname} ({it:P}), where {it:P} indicates the values of the percentile the dummy represents. {phang} {opt generate(namelist)} creates dummies {it:name}{hi:{it:1}}, {it:name}{hi:{it:2}}, ... . The number of names specified must equal the number of dummies to be created. {phang} {opt prefix(pre)} uses {it:pre} as prefix for created dummies. If {opt generate} and {opt suffix} are not specified, default prefix is {it:d_}, if one dummy per variable is to be created. Option {opt prefix} may be combined with {opt generate}, {opt suffix} and {opt stub}. {phang} {opt suffix(suff)} uses {it:suff} as suffix for created dummies. The option may be combined with {opt generate}, {opt prefix} and {opt stub}. {phang} {opt stub(stub)} uses {it:stub}{hi:{it:J}} as dummies' names. Here {it:J} is the number of the created dummy per variable. The number of stubs specified must equal the number of variables in {it:varlist}. The option may be combined with {opt prefix} and {opt suffix}. {phang} {opt replace} replaces existing variables in {it:varlist} with dummies. May not be specified with {opt generate}, {opt prefix}, {opt suffix} or {opt stub}. If more than one dummy per variable is created, {opt replace} is not allowed. {phang} {opt nonames} does not use value labels as dummies' names. If specified, dummies' names are {it:varname{hi:{it:J}}}, where {it:J} indicates the number of the dummy created from the original variable. Value labels will be used as variable labels for the created dummies. Only allowed with {opt levels}. {phang} [{cmd:r}]{opt label(lbllist)} specifies variable labels for the created dummies. If more dummies are created than names are specified, the dummies will not be labeled. Specifying {opt rlabel} allows re-using the labels for each original variable, meaning that dummies created from {it:varname1} will have the same labels as dummies created from {it:varname2}. Specify {it:{bf:"}lbl{bf:"}} if {it:lbl} contains embedded spaces. {phang} {opt novarlabel} does not use variable labels for the dummies. May not be specified with [{cmd:r}]{opt label}. {phang} {opt missing} creates a dummy for missing values in the original variable if specified with {opt levels}. If specified with {opt values}, {opt median}, or {opt q} it causes missing values ({hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) to be copied from the original variable. These values will by default be coded as system missings (.) if {it:numlist} has no missing values. If {it:numlist} has missing values, there will {hi:not} be missing values in the created dummies, unless {opt missing} is specified. {phang} {opt noskip}[{cmd:(}drop{cmd:)}] specifies how to handle existing dummies. In some cases {cmd:todummy} checks the existence of dummy names 'on the fly', meaning not until the dummies are created. If a dummy's name already exists in the dataset, default is to skip the creation of this dummy. This is not considered an error. Therefore a message is displayed but the program will not terminate. Specifying {opt noskip} will create a dummy in these cases, choosing a valid variable name. If {opt noskip(drop)} is specified, the existing variable will be {help drop}ped before creating the dummy. Note that this option differs from {opt replace}, which allows variables specified in {it:varlist} to be replaced with dummies. {phang} {opt ro(rel. operator)} specifies the relational operator used with {opt percentile} or {opt cut}. Default is {hi:>=}, meaning value 1 is assigned if the original variable is greater or equal to the specified value. Specifying {opt ro} has no effect if more than one dummy per variable is created. {phang} {opt noexclude} specifies that observations excluded by the {hi:if} and/or {hi:in} qualifiers are to be used to calculate the percentile or get the levels of the original variable. Only allowed with {opt percentile} or {opt levels}. {title:Examples} {phang2} {cmd: . sysuse nlsw88 ,clear} {pstd} Create a dummy variable indicating observations with wages above the median wage. {phang2} {cmd:. todummy wage ,values(50) percentile} {p_end} {pstd} Do the same using a {it:keyword} instead of {opt values} and {opt percentile} {phang2} {cmd:. todummy wage ,median} {p_end} {pstd} Create three dummy variables, the first indicating persons older than 45, the second indicating persons older than 40 and a third indicating persons between ages 38 and 40. {phang2} {cmd:. todummy age ,values(45 \ 40 \ = 38 40) cut} {p_end} {pstd} Create a dummy indicating persons working less than 40 hours. {phang2} {cmd:. todummy hours ,values(40) cut ro(<) generate(workhrs)} {p_end} {pstd} Create 3 x 4 dummies, representing the four quartiles for the variables age, wage and hours. {phang2} {cmd:. todummy age wage hours ,q rlabel("1st Q" "2nd Q" "3rd Q" "4th Q")} {p_end} {pstd} Create two dummies, one indicating managers, the second indicating sales. {phang2} {cmd:. todummy occupation ,values(2 \ 3) generate(managers sales)} {p_end} {pstd} Create a dummy for each level of race. Dummies names are white, black and other. {phang2} {cmd:. todummy race ,levels} {p_end} {pstd} Create two dummies, one indicating whites, the other indicating blacks or others. {phang2} {cmd:. todummy race ,values(1 \ 2 3) generate(white other)} {p_end} {title:Remarks} {pstd} Major changes have been introduced in version 1.2.0 21jul2011 of the program. The most important one regards the handling of missing values. In the current version missing values in the original variable will, in some cases, be coded 0 in the created dummies. This was not the case in versions prior to 1.2.0. Make sure to specify option {opt missing} to prevent this behavior if you do not find it convenient. Also option {opt noexclude} has changed. The default now is to only use observations not excluded by the {hi:if} and/or {hi:in} qualifiers, calculating percentiles and getting the levels of variables. It was the other way round in earlier versions. {pstd} Old syntax is still supported if compatible with new functionalities. No longer supported are options {opt binary} (introduced in version 1.1.1) and {opt cut(numlist)} if {it:numlist} contains more than one number. Also, in the current version, at least one option must be specified. {pstd} An older version (1.1.2 21may2011) of {cmd:todummy} is available from the author. {title:Acknowledgments} {pstd} The programs {help dummies} by Nicholas J. Cox and {help dummieslab} by Philippe Van Kerm and Nick Cox were inspiring. The latter is especially useful to create dummies for each level of the original variable in a more sophisticated way. {title:Author} {pstd}Daniel Klein, University of Kassel, klein.daniel.81@gmail.com {title:Also see} {psee} Online: {help tabulate}{p_end} {psee} if installed: {help dummies2}, {help dummieslab}, {help dummies} {p_end}