Simple data stacker
-------------------
^stak^ varlist [^if^ exp] [^in^ range]^,^ [ ^i^nto^(^newvar^)^ ^g^id^(^newvar^)^ ^l^abels ^r^etain { ^w^ide | ^d^ummy } ^clear^ ]
Description
-----------
^stak^ is a painless program for doing simple ^reshape^s or ^stack^s. It vertically stacks the variables in varlist into a single new variable and has options for retaining the varlist and the other variables in the dataset. Use ^stack^ if you need more than one stacked variable, or ^reshape^ for more complicated restructuring.
Variables to be stacked must be either all numeric or all string. Variables may be repeated in varlist.
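Before stacking, the type rule above can be checked with Stata's ^confirm^ command. The following is a minimal sketch of such a check, assuming the example variables a, b, and c used throughout this help file; it is an illustration, not stak's own code.

```stata
* Sketch: check that the variables to be stacked are all numeric or all
* string (a, b, c are the example variables used in this help file)
clear
input a b c
1 4 7
2 5 8
3 6 9
end

local vars a b c
capture confirm numeric variable `vars'
local allnum = (_rc == 0)
capture confirm string variable `vars'
local allstr = (_rc == 0)
if !`allnum' & !`allstr' {
    display as error "`vars' mix numeric and string variables"
    exit 109
}
display "`vars' can be stacked"
```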
Consider the following data:
        a   b   c
  1.    1   4   7
  2.    2   5   8
  3.    3   6   9
^stak a b c^ creates a new dataset containing:
        _into   _gid
  1.        1      a
  2.        2      a
  3.        3      a
  4.        4      b
  5.        5      b
  6.        6      b
  7.        7      c
  8.        8      c
  9.        9      c
The new dataset has 2 variables (^_into^ and ^_gid^) and k*n observations, where k is the number of variables in varlist and n equals _N in the old dataset. The first n observations of ^_into^ are the data from variable a, the second n observations are the data from variable b, and the third n observations are the data from variable c.
Variable ^_gid^ identifies the groups. ^_gid^ is a labeled numeric variable that incrementally numbers the groups starting from 1. The value labels are the names of the stacked variables.
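For the default case, the same result can be reproduced with Stata's built-in ^stack^ command, which names its group variable ^_stack^. The following is a minimal sketch under that assumption, not a description of how stak works internally; the value-label step mimics stak's default of labeling the groups with the variable names.

```stata
* Sketch: reproduce the default result of "stak a b c" with -stack-
clear
input a b c
1 4 7
2 5 8
3 6 9
end

* -stack- stacks a, b, c into one variable and numbers the groups in _stack
stack a b c, into(_into) clear
rename _stack _gid

* label the groups with the stacked variables' names, as stak does by default
label define _gid 1 "a" 2 "b" 3 "c"
label values _gid _gid
list
```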
Options
-------
^into(^newvar^)^ specifies the name of the stacked variable to be created. The default name is ^_into^. The name cannot be the same as any varname in varlist; when ^retain^ is specified, it cannot be the same as any varname in the dataset.
^gid(^newvar^)^ specifies the name of the group id variable to be created. The default name is ^_gid^. The name cannot be the same as any varname in varlist; when ^retain^ is specified, it cannot be the same as any varname in the dataset.
^labels^ specifies that the variable labels of the varlist variables are to be used as value labels of the ^gid()^ variable. The default is to use the variable names of the varlist variables.
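The ^labels^ rule can be sketched with the extended macro function ^: variable label^. This is an illustration only: the value label name group follows the Examples section, and falling back to the variable name when a variable has no label is an assumption, since this help file does not say what stak does in that case.

```stata
* Sketch: value labels for the gid() variable taken from variable labels
clear
input a b c
1 4 7
2 5 8
3 6 9
end
label variable a "'a' var"
label variable b "'b' var"

local j 0
foreach v in a b c {
    local ++j
    local lab : variable label `v'
    if `"`lab'"' == "" local lab `v'     // assumed fallback: use the name
    label define group `j' `"`lab'"', add
}
label list group
```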
^retain^ includes a stacked copy of each variable not specified in varlist. A "stacked copy" is k stacked replicates of the original data.
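A "stacked copy" can be pictured with ^expand^: every observation is replicated k times and each replicate is assigned to one group. The sketch below mimics ^stak a b, into(value) gid(group) retain^ on the example data. It is an assumed reconstruction, not stak's implementation (the helper variable _obs is hypothetical), and it omits value labels, storage-type handling, and ^compress^.

```stata
* Sketch of "stak a b, into(value) gid(group) retain" built from -expand-
clear
input a b c
1 4 7
2 5 8
3 6 9
end

local vars a b
local k : word count `vars'
gen long _obs = _n                  // remember the original row
expand `k'                          // k replicates of every observation
bysort _obs: gen long group = _n    // replicate j belongs to group j
sort group _obs                     // group-major order, as stak produces

* stacked values: group j takes its values from the jth variable in varlist
gen value = .
forvalues j = 1/`k' {
    local v : word `j' of `vars'
    replace value = `v' if group == `j'
}
drop `vars' _obs                    // without wide/dummy, varlist is not kept
order value group
list
```

Variable c is never touched, so it ends up as k stacked replicates of its original values, which is what ^retain^ describes.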
^wide^ includes a wide copy of each variable in varlist. The wide copy of the kth variable in varlist holds that variable's original data in the observations where the ^gid()^ variable has value k, and missing values otherwise. ^wide^ and ^dummy^ are alternatives; you can specify one but not both.
^dummy^ includes a dummy (or indicator) variable for each variable in varlist. The dummy for the kth variable in varlist has value 1 when the ^gid()^ variable has value k, and value 0 otherwise. Dummies are returned in the original variable names. ^dummy^ and ^wide^ are alternatives; you can specify one but not both.
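The ^wide^ and ^dummy^ rules can be sketched the same way, after replicating the data with ^expand^. The local macro mode is a hypothetical switch for this illustration (set it to wide or dummy), the helper variable _obs is likewise an assumption, and the stacked ^into()^ variable is left out to keep the sketch short. The dummy branch follows the label and format changes listed under Comments.

```stata
* Sketch of the wide and dummy options for "stak a b, gid(group)"
clear
input a b c
1 4 7
2 5 8
3 6 9
end

local mode wide                     // hypothetical switch: wide or dummy
local vars a b
local k : word count `vars'
gen long _obs = _n
expand `k'
bysort _obs: gen long group = _n
sort group _obs

forvalues j = 1/`k' {
    local v : word `j' of `vars'
    if "`mode'" == "wide" {
        replace `v' = . if group != `j'     // original data only in group j
    }
    else {
        replace `v' = (group == `j')        // 1 in group j, 0 otherwise
        label values `v'                    // existing value label dropped
        local vl : variable label `v'
        label variable `v' `"`vl' dummy"'   // " dummy" appended
        format `v' %8.0g
    }
}
drop _obs
list
```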
^clear^ indicates your understanding that the data in memory will be lost; if this option is not specified, you will be asked to confirm your intentions.
Comments
--------
The new, stacked data file will be unnamed and the changed flag will be set. The data label will be a modified version of the data label (if any) of the original data. The modified version has "stak: " added as a prefix.
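The data-label rule amounts to saving the old label before the data are replaced and prefixing it afterwards. A minimal sketch, assuming ^stack^ stands in for the stacking step and using "Example data" as a made-up label:

```stata
* Sketch of the "stak: " data-label prefix
clear
input a b c
1 4 7
2 5 8
3 6 9
end
label data "Example data"

local dlab : data label             // remember the original data label
stack a b c, into(_into) clear
label data `"stak: `dlab'"'
describe, short
```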
The treatment of existing value labels, variable labels, and formats depends on the type of variable:
    ^into()^ variable:   none are transferred.
    ^gid()^ variable:    none are transferred.
    ^wide^ variables:    all are transferred.
    ^retain^ variables:  all are transferred.
    ^dummy^ variables:   all are modified:
        - existing value labels are dropped;
        - the variable label has " dummy" appended;
        - the format is set to %8.0g.
By default, the name of the value label assigned to the ^gid()^ variable is the same as the name of the ^gid()^ variable itself. If that value label name is in use by another variable (and that variable will be transferred to the new data file), the value label name ^_gid^ is used instead. If a value label named ^_gid^ already exists, it is deleted without notice and reused for the ^gid()^ variable.
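The naming rule for the ^gid()^ value label can be sketched as below. The locals gid, kept, and lblname are hypothetical names for this illustration, and the check is an assumed reconstruction of the rule as described above, not stak's code.

```stata
* Sketch of the value-label naming rule for the gid() variable
clear
input a b c
1 4 7
2 5 8
3 6 9
end
label define group 7 "seven"
label values c group             // c will be retained and already uses "group"

local gid group                  // the gid() name
local kept c                     // variables transferred to the new data
local lblname `gid'
foreach v of local kept {
    local vl : value label `v'
    if "`vl'" == "`gid'" local lblname _gid
}
if "`lblname'" == "_gid" {
    capture label drop _gid      // an existing _gid label is dropped silently
}
display "gid() variable will use value label `lblname'"
```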
The storage type for the ^into()^ variable is automatically set to that of the "largest" type among the varlist variables so no precision is lost.
All variables in the new dataset are ^compress^ed. Although storage types of existing variables may change, precision will be maintained.
Characteristics? The _dta[] chars are retained. Variable-specific chars are retained when the variable is in the new dataset. ^dummy^ variables inherit the characteristics of their "parent" variables.
No attempt is made to determine whether the current ^set memory^ allocation is large enough to contain the new dataset. Nonetheless, the data are preserved and will be restored if a memory shortage occurs. The required memory allocation depends on the number of variables to be stacked and on whether the ^retain^, {^wide^|^dummy^}, and ^labels^ options are specified. In the worst case, for k stacked variables with options ^retain^, ^wide^, and ^labels^ specified, the new dataset will be slightly larger than k times the size of the original dataset.
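The worst-case figure is simple arithmetic: about k times the size of the current data. A rough estimate can be taken from the results r(N) and r(width) stored by ^describe^; the sketch ignores Stata's memory overhead, and k = 2 is only an example value.

```stata
* Sketch: rough worst-case size of the stacked data
clear
input a b c
1 4 7
2 5 8
3 6 9
end

local k 2                             // number of variables to be stacked
quietly describe
local worst = `k' * r(N) * r(width)   // bytes of data, excluding overhead
display "worst case: about `worst' bytes"
```

Here the three float variables give a width of 12 bytes, so 2 * 3 * 12 = 72 bytes of data.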
Examples
--------
Given data:
        a   b   c
  1.    1   4   7
  2.    2   5   8
  3.    3   6   9
^. stak a b, into(value) gid(group) retain wide clear^
^. list^
        value   group   a   b   c
  1.        1       a   1   .   7
  2.        2       a   2   .   8
  3.        3       a   3   .   9
  4.        4       b   .   4   7
  5.        5       b   .   5   8
  6.        6       b   .   6   9
Variable value contains the stacked values of variables a and b. The varname "value" was specified in the ^into()^ option.
Variable group is a labeled numeric variable identifying the stacked groups of data. The varname "group" was specified in the ^gid()^ option.
Variables a and b result from the ^wide^ option.
Variable c results from the ^retain^ option.
Option ^clear^ suppressed the request to confirm that the data in memory will be lost.
Alternatively, we could have specified:
^. stak a b, i(value) g(group) r dummy labels clear^
^. list^
        value     group   a   b   c
  1.        1   'a' var   1   0   7
  2.        2   'a' var   1   0   8
  3.        3   'a' var   1   0   9
  4.        4   'b' var   0   1   7
  5.        5   'b' var   0   1   8
  6.        6   'b' var   0   1   9
There are two differences from the first example. Specifying ^labels^ caused the value labels of group to be the variable labels of the stacked variables (rather than the variable names).
Specifying ^dummy^ (rather than ^wide^) caused variables a and b to be dummy (indicator) variables.
Author
------
Thomas J. Steichen, RJRT, steicht@@rjrt.com
Acknowledgments
---------------
Nicholas J. Cox made helpful comments (and was the original author of some code and code ideas that I willfully pilfered).
Also see
--------
 Manual:  ^[R] stack^
On-line:  help for @contract@, @reshape@, @xpose@