{smcl}
{hline}
help for {cmd:expgen}{right:(Roger Newson)}
{hline}

{title:Duplicate observations sorted in original order with generated variables}

{p 8 21 2}
{cmd:expgen} {help newvar:{it:newvarname1}} {cmd:=} {it:ncexp} {ifin} [ {cmd:,} 
{cmdab:o:ldseq}{cmd:(}{help newvar:{it:newvarname2}}{cmd:)}
{cmdab:c:opyseq}{cmd:(}{help newvar:{it:newvarname3}}{cmd:)}
{cmdab:m:issing} {cmdab:z:ero}
]


{title:Description}

{pstd}
{cmd:expgen} is an extended version of {helpb expand}.
It replaces each observation in the
current dataset with {it:nc} copies of the observation, sorted in the original order,
where {it:nc} is equal to the integer part {cmd:int(}{it:ncexp}{cmd:)} of the required expression {it:ncexp}.
{cmd:expgen} may also add new variables.
These are {help newvar:{it:newvarname1}} (containing the value of the expression {it:ncexp}),
{help newvar:{it:newvarname2}} (containing the sequential order of the original observation in the old dataset),
and {help newvar:{it:newvarname3}} (containing the sequential order of each copy in the set of copies
generated from its original observation).
If {cmd:missing} and/or {cmd:zero} is specified,
then observations in the old dataset with missing and/or non-positive values
for {cmd:int(}{it:ncexp}{cmd:)} are each replaced by one observation in the new dataset.


{title:Options for use with {cmd:expgen}}

{p 4 8 2}
{cmd:oldseq(}{help newvar:{it:newvarname2}}{cmd:)} specifies a new variable to be generated, containing the sequential
order of the observation in the old dataset from which the current
observation was copied, as defined in the old dataset.
This old sequential order is defined after the exclusion from the old dataset of observations
not satisfying the {helpb if} and {helpb in} requirements, and after the exclusion of
observations with missing values of {it:ncexp} (if {cmd:missing} was not specified)
and of observations with non-positive values of {cmd:int(}{it:ncexp}{cmd:)}
(if {cmd:zero} was not specified).
(See {cmd:missing} and {cmd:zero} below.)

{p 4 8 2}
{cmd:copyseq(}{help newvar:{it:newvarname3}}{cmd:)} specifies a new variable to be generated, containing the sequential
order of the copied observation in the new dataset, in the set of
observations copied from the same original observation in the old dataset.
If an observation in the old dataset has a value {cmd:int(}{it:ncexp}{cmd:)} equal to {it:k},
then the {it:k} duplicates of that observation in the new dataset have values
of the {cmd:copyseq()} variable in ascending order from 1 to {it:k}.

{p 4 8 2}
{cmd:missing} specifies that observations in the old dataset with missing values of
{it:ncexp} are each to be replaced in the new dataset with a single observation.
If {cmd:missing} is absent, then such old observations are deleted from the new dataset.

{p 4 8 2}
{cmd:zero} specifies that observations in the old dataset with non-positive values
of {cmd:int(}{it:ncexp}{cmd:)} are each to be replaced in the new dataset with a single observation.
If {cmd:zero} is absent, then such old observations are deleted from the new dataset.


{title:Remarks}

{pstd}
{cmd:expgen} is designed as an improved version of {helpb expand}.
Unlike {helpb expand}, it returns duplicate observations sorted in the original order of the
corresponding observations in the old dataset, from which they were copied.
There is also the option of adding new variables to the duplicate observations.
{cmd:expgen} was designed for use, in a medical setting, in the case when the
original dataset is from a spreadsheet with one row per patient, and with
multiple columns containing multiple repeated measurements on the same patient
at different times and/or on different parts of the same patient's body.
We might want to expand this dataset to a new dataset with 1 observation per
repeated measurement, and then perform a repeated measures analysis.
The {cmd:copyseq()} variable can be used to generate a new variable in this new dataset,
containing the repeated measures.
The new variable might have values each copied from one of a list of variables in the original dataset,
depending on the value of the {cmd:copyseq()} variable.
(See Examples.
In SAS, this kind of duplication is done using a SAS data step
containing a SAS {cmd:OUTPUT} statement inside a SAS {cmd:DO}-loop,
iterating over a list of variables defined as a SAS array.)


{title:Examples}

{p 8 12 2}{cmd:. expgen nreps=rep78, old(modseq) copy(repseq) miss zero}{p_end}

{pstd}
The following example uses the {help sysuse:census} dataset, and expands it from a dataset
of US states to a dataset of combinations of US state and age group:

{p 8 12 2}{cmd:. sysuse census, clear}{p_end}
{p 8 12 2}{cmd:. expgen =4, copy(agegp)}{p_end}
{p 8 12 2}{cmd:. lab def agegp 1 "0-4" 2 "5-17" 3 "18-64" 4 "65+"}{p_end}
{p 8 12 2}{cmd:. lab val agegp agegp}{p_end}
{p 8 12 2}{cmd:. gene popul = (agegp==1)*poplt5 + (agegp==2)*pop5_17 + (agegp==3)*(pop18p-pop65p) + (agegp==4)*pop65p}{p_end}
{p 8 12 2}{cmd:. drop pop poplt5 pop5_17 pop18p pop65p}{p_end}
{p 8 12 2}{cmd:. desc}{p_end}
{p 8 12 2}{cmd:. list state agegp popul}{p_end}


{title:Author}

{pstd}
Roger Newson, National Heart and Lung Institute, Imperial College London, UK.{break}
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}


{title:Also see}

{p 4 13 2}
{bind: }Manual: {hi:[D] expand}, {hi:[D] expandcl}, {hi:[D] joinby}, {hi:[D] contract}, {hi:[D] collapse}, {hi:[D] reshape}
{p_end}
{p 4 13 2}
On-line: help for {helpb expand}, {helpb expandcl}, {helpb joinby}, {helpb contract}, {helpb collapse}, {helpb reshape},{break}
         help for {helpb expandby} if installed
{p_end}