{smcl}
{hline}
help for {cmd:expgen} and {cmd:lexpgen}{right:(Roger Newson)}
{hline}

{title:Duplicate observations sorted in original order with generated variables}

{p 8 21 2}
{cmd:expgen} [ [ {help newvar:{it:newvarname1}} ] {cmd:=} ] {it:ncexp} {ifin} [ {cmd:,} 
{cmdab:o:ldseq}{cmd:(}{help newvar:{it:newvarname2}}{cmd:)}
{cmdab:c:opyseq}{cmd:(}{help newvar:{it:newvarname3}}{cmd:)}
{cmdab:s:ortedby}{cmd:(}{it:sortedby_interpretation}{cmd:)}
{cmdab:or:der}
{cmdab:m:issing} {cmdab:z:ero}
{cmd:fast}
]

{p 8 21 2}
{cmd:lexpgen} [ {help newvar:{it:newvarname1}} ] {ifin} {cmd:,} {cmdab:le:vels(}{help numlist:{it:numlist}}{cmd:)} [ {cmdab:clev:els}{cmd:(}{help newvarname:{it:newvarname4}}{cmd:)}
{it:expgen_options}
]

{pstd}
where {it:sortedby_interpretation} is one of

{pstd}
{cmdab:i:gnore} | {cmdab:g:roup} | {cmdab:u:nique}

{pstd}
or any abbreviation of any of these,
and {it:expgen_options} are any options used by {cmd:expgen}.


{title:Description}

{pstd}
{cmd:expgen} is an extended version of {helpb expand}.
It replaces each observation in the
current dataset with {it:nc} copies of the observation, sorted in the original order,
where {it:nc} is equal to the integer part {cmd:int(}{it:ncexp}{cmd:)} of the required expression {it:ncexp}.
{cmd:expgen} may also add new variables.
These are {help newvar:{it:newvarname1}} (containing the value of the expression {it:ncexp}),
{help newvar:{it:newvarname2}} (containing the sequential order of the original observation in the old dataset),
and {help newvar:{it:newvarname3}} (containing the sequential order of each copy in the set of copies
generated from its original observation).
If {cmd:missing} and/or {cmd:zero} is specified,
then observations in the old dataset with missing and/or non-positive values
for {cmd:int(}{it:ncexp}{cmd:)} are each replaced by one observation in the new dataset.
{cmd:lexpgen} is an alternative version of {cmd:expgen}.
It replaces each observation in the
current dataset with one copy of the observation for each of a user-specified input list of numbers,
sorted primarily in the original order,
and secondarily by the numbers in the list.
It may add the new variables added by {cmd:expgen},
and/or another new variable {help newvar:{it:newvarname4}},
containing the number in the input list of numbers corresponding to the copy.


{title:Options for {cmd:expgen} and {cmd:lexpgen}}

{phang}
{cmd:oldseq(}{help newvar:{it:newvarname2}}{cmd:)} specifies a new variable to be generated, containing the sequential
order of the observation in the old dataset from which the current
observation was copied, as defined in the old dataset.
This old sequential order is defined after the exclusion from the old dataset of observations
not satisfying the {helpb if} and {helpb in} requirements, and after the exclusion of
observations with missing values of {it:ncexp} (if {cmd:missing} was not specified)
and of observations with non-positive values of {cmd:int(}{it:ncexp}{cmd:)}
(if {cmd:zero} was not specified).
(See {cmd:missing} and {cmd:zero} below.)

{phang}
{cmd:copyseq(}{help newvar:{it:newvarname3}}{cmd:)} specifies a new variable to be generated, containing the sequential
order of the copied observation in the new dataset, in the set of
observations copied from the same original observation in the old dataset.
If an observation in the old dataset has a value {cmd:int(}{it:ncexp}{cmd:)} equal to {it:k},
then the {it:k} duplicates of that observation in the new dataset have values
of the {cmd:copyseq()} variable in ascending order from 1 to {it:k}.

{phang}
{cmd:sortedby(}{it:sortedby_interpretation}{cmd:)} specifies the interpretation given to the {varlist}
by which the input dataset is sorted,
given as output by the {help macro:extended macro function} {cmd:sortedby}.
This option is used to specify the {varlist} by which the output dataset will be sorted.
If {cmd:sortedby(ignore)} is specified,
then the {varlist} by which the input dataset is sorted is ignored,
and the output dataset is sorted primarily by the {cmd:oldseq()} variable (if specified),
and secondarily by the {cmd:copyseq()} variable (if specified).
If {cmd:sortedby(group)} is specified,
then the {varlist} by which the input dataset is sorted is assumed to be a group identifier,
possibly with multiple observations with the same combination of values,
and the output dataset is sorted primarily by these variables,
and secondly by the {cmd:oldseq()} variable (if specified),
and thirdly by the {cmd:copyseq()} variable (if specified).
If {cmd:sortedby(unique)} is specified,
then the {varlist} by which the input dataset is sorted is assumed to be a unique observation identifier,
with no more than one observation with the same combination of values
in the subset of observations specified by the {helpb in} and {helpb if} qualifiers,
and the output dataset is sorted primarily by these variables,
and secondly by the {cmd:copyseq()} variable (if specified).
If the user specifies {cmd:sortedby(unique)}
and the dataset is in fact not sorted,
or is in fact sorted by variables which do not uniquely identify the observations,
then {cmd:expgen} fails with an error message.
If the user does not specify a {cmd:sortedby()} option,
then {cmd:sortedby(ignore)} is assumed.
Note that, whichever {cmd:sortedby()} option is specified,
the order of the observations in the output dataset will be the same,
and will be primarily the order of the observations in the input dataset from which the copies were made,
and secondarily by the {cmd:copyseq()} variable (if specified),
or equivalently by the {cmd:clevel()} variable (if specified by {cmd:lexpgen}).
The {cmd:sortedby()} option only affects the list of variables recognized by Stata
as defining the sorting order,
as reported by {helpb describe}.

{phang}
{cmd:order} specifies that the variables by which the output dataset is sorted
will be reordered to the start of the {help order:variable order}.
These variables are as specified by the {cmd:sortedby()} option.
If {cmd:order} is specified,
then they will be the first variables in the output dataset listed by {helpb describe}.

{phang}
{cmd:missing} specifies that observations in the old dataset with missing values of
{it:ncexp} are each to be replaced in the new dataset with a single observation.
If {cmd:missing} is absent, then such old observations are deleted from the new dataset.

{phang}
{cmd:zero} specifies that observations in the old dataset with non-positive values
of {cmd:int(}{it:ncexp}{cmd:)} are each to be replaced in the new dataset with a single observation.
If {cmd:zero} is absent, then such old observations are deleted from the new dataset.

{phang}
{cmd:fast} is an option for programmers.
It specifies that {cmd:expgen} and {cmd:lexpgen} will take no action to restore the original dataset
if {cmd:expgen} or {cmd:lexpgen} fails,
or if the user presses {help break:Break}.


{title:Options for {cmd:lexpgen} only}

{phang}
{cmd:levels(}{help numlist:{it:numlist}}{cmd:)} is compulsory.
It specifies an input list of numeric levels
which {cmd:lexpgen} will use to create an expanded dataset,
with 1 observation per number in the list
per original observation in the unexpanded dataset.
Note that the input list may contain {help missing:numeric missing values},
but repeated values are removed.

{phang}
{cmd:clevel(}{help newvarname:{it:newvarname}}{cmd:)} specifies the name of a new numeric variable,
containing, in each observation in the expanded dataset,
the corresponding number in the list of numbers specified by the {cmd:levels()} option.
Observations in the new dataset will then be multiple copies of observations in the old unexpanded dataset,
sorted
primarily by the position of the corresponding observation in the old unexpanded dataset,
and secondarily by the value of the new variable specified by {cmd:clevels()}.


{title:Remarks}

{pstd}
{cmd:expgen} is designed as an improved version of {helpb expand}.
Unlike {helpb expand}, it returns duplicate observations sorted in the original order of the
corresponding observations in the old dataset, from which they were copied.
There is also the option of adding new variables to the duplicate observations.
{cmd:expgen} was designed for use, in a medical setting, in the case when the
original dataset is from a spreadsheet with one row per patient, and with
multiple columns containing multiple repeated measurements on the same patient
at different times and/or on different parts of the same patient's body.
We might want to expand this dataset to a new dataset with 1 observation per
repeated measurement, and then perform a repeated measures analysis.
The {cmd:copyseq()} variable can be used to generate a new variable in this new dataset,
containing the repeated measures.
The new variable might have values each copied from one of a list of variables in the original dataset,
depending on the value of the {cmd:copyseq()} variable.
(See Examples.
In SAS, this kind of duplication is done using a SAS data step
containing a SAS {cmd:OUTPUT} statement inside a SAS {cmd:DO}-loop,
iterating over a list of variables defined as a SAS array.)

{pstd}
{cmd:expgen} can be used with the {helpb keyby} package, downloadable from {help ssc:SSC},
to enforce the relational database model in Stata.
In the relational database model, a dataset is viewed as a mathematical function,
whose domain is the set of all available value combinations of a list of key variables,
and whose range is the set of all possible value combinations of the non-key variables.
If a dataset is keyed using {helpb keyby},
and is then expanded using {cmd:expgen},
specifying the {cmd:copyseq()} option and {cmd:sortedby(unique)},
then the expanded dataset generated by {cmd:expgen}
will automatically be a relational dataset,
keyed primarily by the original key variables specified by {helpb keyby},
and secondly by the {cmd:copyseq()} variable.
And, if the user also specifies the {cmd:order} option with {cmd:expgen},
then the key variables will be the first in the order of variables in the output dataset.


{title:Examples}

{p 8 12 2}{cmd:. expgen nreps=rep78, old(modseq) copy(repseq) miss zero}{p_end}

{p 8 12 2}{cmd:. sysuse auto, clear}{p_end}
{p 8 12 2}{cmd:. sort foreign make}{p_end}
{p 8 12 2}{cmd:. expgen nreps=rep78, copy(repseq) sortedby(unique) order}{p_end}
{p 8 12 2}{cmd:. describe}{p_end}

{pstd}
The following example uses the {help sysuse:census} dataset, and expands it from a dataset
of US states to a dataset of combinations of US state and age group:

{p 8 12 2}{cmd:. sysuse census, clear}{p_end}
{p 8 12 2}{cmd:. expgen =4, copy(agegp)}{p_end}
{p 8 12 2}{cmd:. lab def agegp 1 "0-4" 2 "5-17" 3 "18-64" 4 "65+"}{p_end}
{p 8 12 2}{cmd:. lab val agegp agegp}{p_end}
{p 8 12 2}{cmd:. gene popul = (agegp==1)*poplt5 + (agegp==2)*pop5_17 + (agegp==3)*(pop18p-pop65p) + (agegp==4)*pop65p}{p_end}
{p 8 12 2}{cmd:. drop pop poplt5 pop5_17 pop18p pop65p}{p_end}
{p 8 12 2}{cmd:. desc}{p_end}
{p 8 12 2}{cmd:. list state agegp popul}{p_end}

{pstd}
The following example uses the {help sysuse:housing} dataset,
and uses {cmd:lexpgen} to expand it from a dataset
of US states to a dataset of combinations of US state and current year.
Note that the option {cmd:sortedby(unique)}
specifies that the variable {cmd:state} identifies the input observations uniquely,
and ensures that the output dataset is identified as being sorted by {cmd:state} and {cmd:year}:

{p 8 12 2}{cmd:. use http://www.stata-press.com/data/r9/hsng.dta, clear}{p_end}
{p 8 12 2}{cmd:. desc}{p_end}
{p 8 12 2}{cmd:. gene pop70=pop*100/(100+popgrow)}{p_end}
{p 8 12 2}{cmd:. lab var pop70 "Population in 1970"}{p_end}
{p 8 12 2}{cmd:. lexpgen, levels(1970 1980) clevel(year) sortedby(unique)}{p_end}
{p 8 12 2}{cmd:. gene popul = pop70*(year==1970) + pop*(year==1980)}{p_end}
{p 8 12 2}{cmd:. format popul %10.1f}{p_end}
{p 8 12 2}{cmd:. lab var year "Current year"}{p_end}
{p 8 12 2}{cmd:. lab var popul "Current population"}{p_end}
{p 8 12 2}{cmd:. desc, fu}{p_end}
{p 8 12 2}{cmd:. list state year popul, sepby(state)}{p_end}


{title:Author}

{pstd}
Roger Newson, Queen Mary University London, UK.{break}
Email: {browse "mailto:r.newson@qmul.ac.uk":r.newson@qmul.ac.uk}


{title:Also see}

{p 4 13 2}
{bind: }Manual: {hi:[D] expand}, {hi:[D] expandcl}, {hi:[D] joinby}, {hi:[D] contract}, {hi:[D] collapse}, {hi:[D] reshape},
{hi:[D] order}, {hi:[D] describe}
{p_end}
{p 4 13 2}
On-line: help for {helpb expand}, {helpb expandcl}, {helpb joinby}, {helpb contract}, {helpb collapse}, {helpb reshape}, {helpb order}, {helpb describe}{break}
         help for {helpb keyby}, {helpb expandby} if installed
{p_end}