{smcl} {* version 1.2.0 21jul2011}{...} {viewerjumpto "Description" "todummy##desc"}{...} {viewerjumpto "Options" "todummy##opt"}{...} {viewerjumpto "Examples" "todummy##exmpl"}{...} {cmd:help todummy} {hline} {title:Title} {p 5} {cmd:todummy} {hline 2} Create dummy variables {title:Syntax} {p 8} {cmd:todummy} {varlist} {ifin} {cmd:,} {opt v:alues(vlist)} | {hi:{it:keyword}} [{it:options}] {p 5} where {it:vlist} has the form {p 8} [=]{it:{help numlist}} [ {cmd:\} [=]{it:{help numlist}}] {synoptset 21 tabbed}{...} {synopthdr} {synoptline} {p2coldent:*{opt v:alues(vlist)}}specify values to be coded '1' {p_end} {syntab:*{it:keywords}} {synopt:{opt l:evels}}create one dummy for each level of the original variable {p_end} {synopt:{opt med:ian}}assign value 1 if the original variable is greater than or equal to the 50th percentile {p_end} {synopt:{opt q}}create one dummy for each quartile of the original variable {p_end} {syntab:{it:options}} {synopt:{opt p:ercentile}}interpret {it:vlist} as a list of percentiles {p_end} {synopt:{opt c:ut}}interpret {it:vlist} as cutpoints {p_end} {synopt:{opt g:enerate(namelist)}}create dummies {it:name{hi:1}}, {it:name{hi:2}}, ... {p_end} {synopt:{opt pre:fix(pre)}}use {it:pre} as prefix for created dummies {p_end} {synopt:{opt suff:ix(suff)}}use {it:suff} as suffix for created dummies {p_end} {synopt:{opt stub(stub)}}use {it:stub{hi:1}}, {it:stub{hi:2}}, ... as dummies' names {p_end} {synopt:{opt replace}}replace existing variables with dummies {p_end} {synopt:{opt nosk:ip}[{cmd:(}drop{cmd:)}]}do not skip creation of existing dummies {p_end} {synopt:[{ul:{cmd:r}}]{opt l:abel(lbllist)}}use {it:label{hi:1}}, {it:label{hi:2}}, ... as variable labels {p_end} {synopt:{opt novarl:abel}}do not assign variable labels {p_end} {synopt:{opt m:issing}}create dummy for missings ({opt levels}) or copy missing values {p_end}
{synopt:{opt ro(rel. operator)}}specify {help operator:relational operator} {p_end} {synopt:{opt noexc:lude}}use all observations to calculate percentiles or get levels, even if excluded by {it:if} and/or {it:in} qualifiers {p_end} {synopt:{opt nonam:es}}do not use value labels as variable names ({opt levels}) {p_end} {synoptline} {p 5}* one of {opt values()} or {hi:{it:keyword}} must be specified {marker desc} {title:Description} {pstd} {cmd:todummy} creates dummies from variables in {it:varlist}. Either one or multiple dummies may be created from each variable. If one dummy per variable is created, default names are {it:{hi:d_}varname}. {marker opt} {title:Options} {dlgtab:Options} {phang} {opt values(vlist)} assigns value 1 if the original variable equals the values specified in {it:vlist}, 0 otherwise. If more than one {it:numlist} is specified, the first created dummy will be coded '1' if the original variable equals the values in the first {it:numlist}, the second dummy will be '1' if the original variable equals the values in the second {it:numlist}, and so on. If more than one dummy is created, the default names are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy created from the original variable. The dummies will not be labeled. Non-integers and missing values (i.e. {hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) are allowed in {it:numlist}. If {it:numlist} has missing values, the created dummies will {hi:not} have missing values. {phang} {opt levels} creates one dummy for each level of the original variable. This is similar to what {help tabulate} does (note, however, that only numeric variables are allowed with {cmd:todummy}). Extended missing values ({hi:.a}, {hi:.b}, ..., {hi:.z}) are copied from the original variable. Value labels from the original variable are used as variable names for the created dummies. If there are no value labels, default names are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy created from the original variable. 
The dummies are labeled {it:varname} ({it:L}), where {it:L} is the level. {phang} {opt median} assigns value 1 if the original variable is greater than or equal to its median. The created dummies will not be labeled. {phang} {opt q} creates one dummy for each quartile of the original variable. Thus, four dummies will be created from each variable. The first dummy will be coded '1' if the original variable is lower than or equal to its 25th percentile, the second dummy will be '1' if the original variable takes on values between the 25th and 50th percentile, and so on. The dummies will be labeled {it:varname} ({it:R}), where {it:R} indicates the values of the percentile the dummy represents. {phang} {opt percentile} interprets {it:vlist} as a list of percentiles (which must be between 0 and 100). If a specified {it:numlist} contains only one percentile, the created dummy variable will be coded '1' if the original variable is greater than or equal to this percentile. Specifying {it:k} percentiles, where {it:k} > 1, will create {it:k} + 1 dummies. The first dummy will be coded '1' if the original variable is lower than or equal to the first specified percentile, the second dummy will be coded '1' if the original variable takes on values between the first and the second percentile, and so on. An equal sign ({it:=}) in front of a {it:numlist} causes the first and last dummy not to be created. Thus, specifying {it:k} percentiles will result in {it:k} - 1 dummies. If more than one dummy per variable is created, default names are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy created from the original variable. The dummies will be labeled {it:varname} ({it:R}), where {it:R} indicates the values of the percentiles the dummy represents. {phang} {opt cut} interprets {it:vlist} as cutpoints. If a specified {it:numlist} contains only one value, the created dummy variable will be coded '1' if the original variable is greater than or equal to this value. 
Specifying {it:k} values, where {it:k} > 1, will create {it:k} + 1 dummies. The first dummy will be coded '1' if the original variable is lower than or equal to the first specified value, the second dummy will be coded '1' if the original variable falls into the range between the first and the second value, and so on. An equal sign ({it:=}) in front of a {it:numlist} causes the first and last dummy not to be created. Thus, specifying {it:k} values will result in {it:k} - 1 dummies. If more than one dummy per variable is created, default names are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy created from the original variable. The dummies will be labeled {it:varname} ({it:R}), where {it:R} indicates the range of values the dummy represents. Values may contain missings (i.e. {hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) and non-integers. If {it:numlist} has missing values, the created dummies will {hi:not} have missing values. {phang} {opt generate(namelist)} creates dummies {it:name}{hi:{it:1}}, {it:name}{hi:{it:2}}, ... . The number of names specified must equal the number of dummies to be created. {phang} {opt prefix(pre)} uses {it:pre} as prefix for created dummies. If {opt generate} and {opt suffix} are not specified, the default prefix is {it:d_} when one dummy per variable is created. Option {opt prefix} may be used together with {opt generate}, {opt suffix} and {opt stub}. {phang} {opt suffix(suff)} uses {it:suff} as suffix for created dummies. The option may be used together with {opt generate}, {opt prefix} and {opt stub}. {phang} {opt stub(stub)} uses {it:stub}{hi:{it:J}} as dummies' names. Here {it:J} is the number of the created dummy per variable. The number of names specified must equal the number of variables in {it:varlist}. The option may be used with {opt prefix} and {opt suffix}. {phang} {opt replace} replaces existing variables in {it:varlist} with dummies. 
May not be specified with {opt generate}, {opt prefix}, {opt suffix} or {opt stub}. If more than one dummy per variable is to be created, {opt replace} is not allowed. {phang} {opt noskip}[{cmd:(}drop{cmd:)}] specifies how to handle existing dummies. In some cases {cmd:todummy} checks the existence of dummy names 'on the fly', meaning not until the dummies are created. If a dummy's name already exists in the dataset, the default is to skip the creation of this dummy. This is not considered an error: a message is displayed, but the program does not terminate. Specifying {opt noskip} will create a dummy in these cases, choosing a valid variable name. If {opt noskip(drop)} is specified, the existing variable will be {help drop}ped before creating the dummy. Note that this option differs from {opt replace}, which allows variables specified in {it:varlist} to be replaced with dummies. {phang} [{cmd:r}]{opt label(lbllist)} specifies variable labels for the created dummies. If more dummies are created than labels are specified, the dummies will not be labeled. Specifying {opt rlabel} allows re-using the labels for each original variable, meaning that dummies created from {it:varname1} will have the same labels as dummies created from {it:varname2}. Specify {it:{bf:"}lbl{bf:"}} if {it:lbl} contains embedded spaces. {phang} {opt novarlabel} does not use variable labels for the dummies. May not be specified with [{cmd:r}]{opt label}. {phang} {opt missing} creates a dummy for missing values in the original variable if specified with {opt levels}. If specified with {opt values}, {opt median}, or {opt q}, it causes missing values ({hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) to be copied from the original variable. These values will by default be coded as system missings (.) if {it:numlist} has no missing values. If {it:numlist} has missing values, there will {hi:not} be missing values in the created dummies, unless {opt missing} is specified.
{phang} {opt ro(rel. operator)} specifies the relational operator. Default is {it:>=}, meaning value 1 is assigned if the original variable is greater than or equal to the specified value. Specifying {opt ro} has no effect if more than one dummy per variable is to be created. {phang} {opt noexclude} specifies that observations excluded by the {it:if} and/or {it:in} qualifiers are to be used to calculate the percentile or get the levels of the original variable. Only allowed with {opt percentile} or {opt levels}. {phang} {opt nonames} does not use value labels as dummies' names. If specified, the created dummies' names will be {it:varname{hi:{it:J}}}, where {it:J} indicates the number of the dummy created from the original variable. Value labels will be used as variable labels for the created dummies. Only allowed with {opt levels}. {marker exmpl} {title:Examples} {phang2} {cmd:. sysuse nlsw88 ,clear} {pstd} Create a dummy variable indicating observations with wages at or above the median wage. {phang2} {cmd:. todummy wage ,values(50) percentile} {p_end} {pstd} Do the same using a {it:keyword} instead of {opt values} and {opt percentile}. {phang2} {cmd:. todummy wage ,median} {p_end} {pstd} Create three dummy variables, the first indicating persons aged 45 or older, the second indicating persons aged 40 or older, and a third indicating persons between ages 38 and 40. {phang2} {cmd:. todummy age ,values(45 \ 40 \ = 38 40) cut} {p_end} {pstd} Create a dummy indicating persons working fewer than 40 hours. {phang2} {cmd:. todummy hours ,values(40) cut ro(<) generate(workhrs)} {p_end} {pstd} Create 3 x 4 dummies, representing the four quartiles for the variables age, wage and hours. {phang2} {cmd:. todummy age wage hours ,q rlabel("1st Q" "2nd Q" "3rd Q" "4th Q")} {p_end} {pstd} Do the same but use {opt q} in a {it:numlist} and do not label the dummies. Note that {opt q} expands to '25 50 75' inside {opt values}. Remember to also specify {opt percentile} so that the numbers are interpreted as percentiles. 
{phang2} {cmd:. todummy age wage hours ,values(q) percentile novarlabel} {p_end} {pstd} Create two dummies, one indicating managers, the second indicating sales. {phang2} {cmd:. todummy occupation ,values(2 \ 3) generate(managers sales)} {p_end} {pstd} Create a dummy for each level of race. The dummies' names are white, black and other. {phang2} {cmd:. todummy race ,levels} {p_end} {pstd} Create two dummies, one indicating whites, the other indicating blacks or others. {phang2} {cmd:. todummy race ,values(1 \ 2 3) generate(white other)} {p_end} {title:Remarks} {pstd} Major changes have been introduced in version 1.2.0 21jul2011 of the program. The most important one concerns the handling of missing values. In the current version, missing values in the original variable will, in some cases, be coded '0' in the created dummies. This was not the case in versions prior to 1.2.0. Specify option {opt missing} to prevent this behavior if it does not suit your needs. Option {opt noexclude} has also changed. The default now is to use only observations not excluded by the {it:if} and/or {it:in} qualifiers when calculating percentiles and getting the levels of variables. It was the other way round in earlier versions. {pstd} Old syntax is still supported if compatible with new functionalities. No longer supported are options {opt binary} (introduced in version 1.1.1) and {opt cut(numlist)} if {it:numlist} contains more than one number. Also, in the current version, at least one option must be specified. {pstd} An older version (1.1.2 21may2011) of {cmd:todummy} is available from the author. {title:Acknowledgments} {pstd} The programs {help dummies} by Nicholas J. Cox and {help dummieslab} by Philippe Van Kerm and Nick Cox were inspiring. The latter is especially useful for creating dummies for each level of the original variable in a more sophisticated way. 
{title:Author} {pstd}Daniel Klein, University of Bamberg, klein.daniel.81@gmail.com {title:Also see} {psee} Online: {help tabulate}{p_end} {psee} if installed: {help dummieslab}, {help dummies} {p_end}