{smcl}
{* version 1.2.0 21jul2011}{...}
{viewerjumpto "Description" "todummy##desc"}{...}
{viewerjumpto "Options" "todummy##opt"}{...}
{viewerjumpto "Examples" "todummy##exmpl"}{...}
{cmd:help todummy}
{hline}

{title:Title}

{p 5}
{cmd:todummy} {hline 2} Create dummy variables


{title:Syntax}

{p 8}
{cmd:todummy} {varlist} {ifin} 
{cmd:,} {opt v:alues(vlist)} | {hi:{it:keyword}} [{it:options}]


{p 5}
where {it:vlist} has the form

{p 8}
[=]{it:{help numlist}} [ {cmd:\} [=]{it:{help numlist}}]


{synoptset 21 tabbed}{...}
{synopthdr}
{synoptline}
{p2coldent:*{opt v:alues(vlist)}}specify values to be coded '1'
{p_end}

{syntab:*{it:keywords}}
{synopt:{opt l:evels}}create one dummy for each level of the original 
variable
{p_end}
{synopt:{opt med:ian}}assign value 1 if the original variable is 
greater than or equal to the 50th percentile
{p_end}
{synopt:{opt q}}create one dummy for each quartile of the original 
variable
{p_end}

{syntab:{it:options}}
{synopt:{opt p:ercentile}}interpret {it:vlist} as list of percentiles
{p_end}
{synopt:{opt c:ut}}interpret {it:vlist} as cutpoints
{p_end}
{synopt:{opt g:enerate(namelist)}}create dummies 
{it:name{hi:1}}, {it:name{hi:2}}, ...
{p_end}
{synopt:{opt pre:fix(pre)}}use {it:pre} as prefix for created dummies
{p_end}
{synopt:{opt suff:ix(suff)}}use {it:suff} as suffix for created dummies
{p_end}
{synopt:{opt stub(stub)}}use {it:stub{hi:1}}, {it:stub{hi:2}}, ... as 
dummies' names
{p_end}
{synopt:{opt replace}}replace existing variables with dummies
{p_end}
{synopt:{opt nosk:ip}[{cmd:(}drop{cmd:)}]}do not skip creation of 
existing dummies
{p_end}
{synopt:[{ul:{cmd:r}}]{opt l:abel(lbllist)}}use {it:label{hi:1}}, 
{it:label{hi:2}}, ... as variable labels
{p_end}
{synopt:{opt novarl:abel}}do not assign variable labels
{p_end}
{synopt:{opt m:issing}}create dummy for missings ({opt levels}) or copy 
missing values
{p_end}
{synopt:{opt ro(rel. operator)}}specify 
{help operator:relational operator}
{p_end}
{synopt:{opt noexc:lude}}use all observations to create dummies, even 
if excluded by {it:if} and/or {it:in} qualifiers  
{p_end}
{synopt:{opt nonam:es}}do not use value labels as variable names 
({opt levels})
{p_end}
{synoptline}
{p 5}* one of {opt values()} or {hi:{it:keyword}} must be specified

{marker desc}
{title:Description}

{pstd}
{cmd:todummy} creates dummies from the variables in {it:varlist}. 
Either one or multiple dummies may be created from each variable. 
If one dummy per variable is created, default names are 
{it:{hi:d_}varname}. 

{marker opt}
{title:Options}

{dlgtab:Options}

{phang}
{opt values(vlist)} assigns value 1 if the original variable equals the 
values specified in {it:vlist}, 0 otherwise. If more than one 
{it:numlist} is specified, the first created dummy will be coded '1' if 
the original variable equals the values in the first {it:numlist}, the 
second dummy will be '1' if the original variable equals the values in 
the second {it:numlist} and so on. If more than one dummy is created 
the default names are {it:varname{hi:J}}, where {it:J} indicates 
the number of the dummy created from the original variable. The dummies 
will not be labeled. Non-integers and missing values 
(i.e. {hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) are allowed in 
{it:numlist}. If {it:numlist} has missing values, the created dummies 
will {hi:not} have missing values.

{phang}
{opt levels} creates one dummy for each level of the original variable. 
This is similar to what {help tabulate} does (note, however, that only 
numeric variables are allowed with {cmd:todummy}). Extended missing 
values ({hi:.a}, {hi:.b}, ..., {hi:.z}) are copied from the original 
variable. Value labels from the original variable are used as variable 
names for the created dummies. If there are no value labels, 
default names are {it:varname{hi:J}}, where {it:J} indicates the 
number of the dummy created from the original variable. The dummies 
are labeled {it:varname} ({it:L}), where {it:L} is the level.

{phang}
{opt median} assigns value 1 if the original variable is greater than 
or equal to its median. The created dummies will not be labeled.

{phang}
{opt q} creates one dummy for each quartile of the original variable. 
Thus, four dummies will be created from each variable. The first dummy 
will be coded '1' if the original variable is lower than or equal to 
its 25th percentile, the second dummy will be '1' if the original 
variable takes on values between the 25th and 50th percentile, and so 
on. The dummies will be labeled {it:varname} ({it:R}), where {it:R} 
indicates the values of the percentile the dummy represents.

{phang}
{opt percentile} interprets {it:vlist} as a list of percentiles (which 
must be between 0 and 100). If a {it:numlist} contains only 
one percentile, the created dummy variable will be coded '1' if the 
original variable is greater than or equal to this percentile. Specifying 
{it:k} percentiles, where {it:k} > 1, will result in {it:k} + 1 dummies 
created. The first dummy will be coded '1' if the original variable is 
lower than or equal to the first specified percentile, the second dummy 
will be coded '1' if the original variable takes on values between the 
first and the second percentile and so on. An equal sign ({it:=}) in 
front of a {it:numlist} causes the first and last dummy not to be 
created. Thus, specifying {it:k} percentiles will result in {it:k} - 1 
dummies. If more than one dummy per variable is created, default names 
are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy 
created from the original variable. The dummies will be labeled 
{it:varname} ({it:R}), where {it:R} indicates the values of the percentiles 
the dummy represents.

{phang}
{opt cut} interprets {it:vlist} as cutpoints. If a {it:numlist} 
contains only one value, the created dummy variable will be 
coded '1' if the original variable is greater than or equal to this value. 
Specifying {it:k} values, where {it:k} > 1, will result in {it:k} + 1 
dummies created. The first dummy will be coded '1' if the original 
variable is lower than or equal to the first specified value, the 
second dummy will be coded '1' if the original variable falls into the 
range between the first and the second value and so on. An equal sign 
({it:=}) in front of a {it:numlist} causes the first and last dummy not 
to be created. Thus, specifying {it:k} values will result in {it:k} - 1 
dummies. If more than one dummy per variable is created, default names 
are {it:varname{hi:J}}, where {it:J} indicates the number of the dummy 
created from the original variable. The dummies will be labeled 
{it:varname} ({it:R}), where {it:R} indicates the range of values the 
dummy represents. Values may contain missings 
(i.e. {hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) and non-integers. If 
{it:numlist} has missing values, the created dummies will {hi:not} have 
missing values.

{phang}
{opt generate(namelist)} creates dummies {it:name}{hi:{it:1}}, 
{it:name}{hi:{it:2}}, ... . The number of names specified must equal 
the number of dummies to be created.

{phang}
{opt prefix(pre)} uses {it:pre} as prefix for created dummies. If one 
dummy per variable is to be created and neither {opt generate} nor 
{opt suffix} is specified, the default prefix is {it:d_}. Option 
{opt prefix} may be used together with {opt generate}, {opt suffix} 
and {opt stub}.

{phang}
{opt suffix(suff)} uses {it:suff} as suffix for created dummies. The 
option may be used together with {opt generate}, {opt prefix} and 
{opt stub}.

{phang}
{opt stub(stub)} uses {it:stub}{hi:{it:J}} as dummies' names. Here 
{it:J} is the number of the created dummy per variable. The number of 
stubs specified must equal the number of variables in {it:varlist}. 
The option may be used with {opt prefix} and {opt suffix}.

{phang}
{opt replace} replaces existing variables in {it:varlist} with 
dummies. It may not be specified with {opt generate}, {opt prefix}, 
{opt suffix} or {opt stub}. If more than one dummy per variable is 
to be created, {opt replace} is not allowed.

{phang}
{opt noskip}[{cmd:(}drop{cmd:)}] specifies how to handle existing 
dummies. In some cases {cmd:todummy} checks the existence of 
dummy names 'on the fly', meaning not until the dummies are created. 
If a dummy's name already exists in the dataset, default is to skip 
the creation of this dummy. This is not considered an error. Therefore 
a message is displayed but the program will not terminate. Specifying 
{opt noskip} will create a dummy in these cases, choosing a valid 
variable name. If {opt noskip(drop)} is specified, the existing 
variable will be {help drop}ped before creating the dummy. Note that 
this option differs from {opt replace}, which allows variables 
specified in {it:varlist} to be replaced with dummies. 

{phang}
[{cmd:r}]{opt label(lbllist)} specifies variable labels for the created 
dummies. If more dummies are created than labels are specified, the 
dummies will not be labeled. Specifying {opt rlabel} allows re-using 
the labels for each original variable, meaning that dummies created 
from {it:varname1} will have the same labels as dummies created from 
{it:varname2}. Specify {it:{bf:"}lbl{bf:"}} if {it:lbl} contains 
embedded spaces. 

{phang}
{opt novarlabel} does not use variable labels for the dummies. May 
not be specified with [{cmd:r}]{opt label}.

{phang}
{opt missing} creates a dummy for missing values in the original 
variable if specified with {opt levels}. If specified with 
{opt values}, {opt median}, or {opt q} it causes missing values 
({hi:.}, {hi:.a}, {hi:.b}, ..., {hi:.z}) to be copied from the original 
variable. These values will by default be coded as system missings (.) 
if {it:numlist} has no missing values. If {it:numlist} has missing 
values, there will {hi:not} be missing values in the created dummies, 
unless {opt missing} is specified.

{phang}
{opt ro(rel. operator)} specifies the relational operator. Default is 
{it:>=}, meaning value 1 is assigned if the original variable is 
greater than or equal to the specified value. Specifying {opt ro} has no 
effect if more than one dummy per variable is to be created.

{phang}
{opt noexclude} specifies that observations excluded by the {it:if} 
and/or {it:in} qualifiers are to be used to calculate the percentile 
or get the levels of the original variable. Only allowed with 
{opt percentile} or {opt levels}.

{phang}
{opt nonames} does not use value labels as dummies' names. If 
specified, the created dummies' names will be {it:varname{hi:{it:J}}}, 
where {it:J} indicates the number of the dummy created from the 
original variable. Value labels will be used as variable labels for the 
created dummies. Only allowed with {opt levels}.

{marker exmpl}
{title:Examples}

{phang2}
{cmd:. sysuse nlsw88 ,clear}

{pstd}
Create a dummy variable indicating observations with wages at or above 
the median wage.

{phang2}
{cmd:. todummy wage ,values(50) percentile}
{p_end}

{pstd}
Do the same using a {it:keyword} instead of {opt values} and 
{opt percentile}.

{phang2}
{cmd:. todummy wage ,median}
{p_end}
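
{pstd}
Both calls above use the default name {it:d_wage}, so the second call 
would find the dummy already in the dataset. A sketch based on the 
description of {opt noskip}, not verified against a run: drop the 
existing variable before creating the dummy again.

{phang2}
{cmd:. todummy wage ,median noskip(drop)}
{p_end}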

{pstd}
Create three dummy variables, the first indicating persons aged 45 or 
older, the second indicating persons aged 40 or older and the third 
indicating persons between ages 38 and 40.

{phang2}
{cmd:. todummy age ,values(45 \ 40 \ = 38 40) cut}
{p_end}
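
{pstd}
A sketch based on the description of {opt cut}, not verified against a 
run: two cutpoints without a leading equal sign should yield three 
dummies (up to 38, between 38 and 40, and above 40). The names passed 
to {opt generate} are illustrative, and the treatment of values that 
fall exactly on a cutpoint should be checked against the data.

{phang2}
{cmd:. todummy age ,values(38 40) cut generate(age_low age_mid age_high)}
{p_end}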

{pstd}
Create a dummy indicating persons working less than 40 hours.

{phang2}
{cmd:. todummy hours ,values(40) cut ro(<) generate(workhrs)}
{p_end}
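
{pstd}
A sketch based on the description of {opt percentile}, not verified 
against a run: with a leading equal sign, two percentiles should yield 
a single dummy indicating wages between the 25th and 75th percentile. 
The name {it:iqrwage} is illustrative.

{phang2}
{cmd:. todummy wage ,values(= 25 75) percentile generate(iqrwage)}
{p_end}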

{pstd}
Create 3 x 4 dummies, representing the four quartiles for the 
variables age, wage and hours.

{phang2}
{cmd:. todummy age wage hours ,q rlabel("1st Q" "2nd Q" "3rd Q" "4th Q")}
{p_end}

{pstd}
Do the same but use {opt q} in a {it:numlist} and do not label the 
dummies. Note that {opt q} expands to '25 50 75' inside {opt values}. 
Remember to also specify {opt percentile} so that the numbers are 
interpreted as percentiles.

{phang2}
{cmd:. todummy age wage hours ,values(q) percentile novarlabel}
{p_end}
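
{pstd}
A sketch based on the description of {opt stub}, not verified against 
a run: one stub per variable in {it:varlist} should name the quartile 
dummies a1-a4, w1-w4 and h1-h4. The stubs are illustrative.

{phang2}
{cmd:. todummy age wage hours ,q stub(a w h)}
{p_end}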

{pstd}
Create two dummies, one indicating managers, the second indicating 
sales.

{phang2}
{cmd:. todummy occupation ,values(2 \ 3) generate(managers sales)}
{p_end}

{pstd}
Create a dummy for each level of race. The dummies' names are white, 
black and other.

{phang2}
{cmd:. todummy race ,levels}
{p_end}
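
{pstd}
A sketch based on the descriptions of {opt missing} and {opt nonames}, 
not verified against a run: create one dummy for each level of union 
plus a dummy for missing values. {opt nonames} is specified here on the 
assumption that a value label of union would otherwise collide with 
the existing variable name; the dummies should then be named 
{it:union{hi:J}}.

{phang2}
{cmd:. todummy union ,levels missing nonames}
{p_end}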

{pstd}
Create two dummies, one indicating whites, the other indicating blacks 
or others.

{phang2}
{cmd:. todummy race ,values(1 \ 2 3) generate(white other)}
{p_end}
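
{pstd}
A sketch based on the description of {opt noexclude}, not verified 
against a run: restrict the call with an {it:if} qualifier but have 
the 50th percentile calculated from all observations. The name 
{it:highwage_u} is illustrative.

{phang2}
{cmd:. todummy wage if union == 1 ,values(50) percentile noexclude generate(highwage_u)}
{p_end}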


{title:Remarks}

{pstd}
Major changes have been introduced in version 1.2.0 21jul2011 of the 
program. The most important one regards the handling of missing 
values. In the current version missing values in the original variable 
will, in some cases, be coded '0' in the created dummies. This was not 
the case in versions prior to 1.2.0. Make sure to specify option 
{opt missing} to prevent this behavior if you do not find it 
convenient. Option {opt noexclude} has also changed. The default now 
is to use only observations not excluded by the {it:if} and/or {it:in} 
qualifiers when calculating percentiles and obtaining the levels of 
variables. It was the other way round in earlier versions.

{pstd}
Old syntax is still supported if compatible with new functionalities. 
No longer supported are options {opt binary} (introduced in version 
1.1.1) and {opt cut(numlist)} if {it:numlist} contains more than one 
number. Also, in the current version, at least one option must be 
specified.

{pstd}
An older version (1.1.2 21may2011) of {cmd:todummy} is available from 
the author.


{title:Acknowledgments}

{pstd}
The programs {help dummies} by Nicholas J. Cox and {help dummieslab} by 
Philippe Van Kerm and Nick Cox were inspiring. The latter is especially 
useful for creating dummies for each level of the original variable in 
a more sophisticated way.


{title:Author}

{pstd}Daniel Klein, University of Bamberg, klein.daniel.81@gmail.com


{title:Also see}

{psee}
Online: {help tabulate}{p_end}

{psee}
if installed: {help dummieslab}, {help dummies}
{p_end}