Key the dataset by a variable list
keyby varlist [ , noorder missing fast ]
keybygen [ varlist ] , generate(newvarname) [ replace noorder missing fast ]
Description
keyby sorts the dataset currently in memory by the variables in a varlist, checking that the variables in the varlist uniquely identify the observations. This makes the variables in the varlist a primary key for the dataset in memory. If the user does not specify otherwise, then keyby also reorders the variables in the varlist to the start of the variable order in the dataset, and checks that all values of these variables are nonmissing.
keybygen sorts the dataset currently in memory by the variables in a varlist, preserving the existing order of observations within each by-group, and then generates a new variable, containing the sequential order of each observation within its by-group, to form a primary key with the existing variables in the varlist.
keyby and keybygen can be useful if the user combines multiple datasets using merge, which may cause a dataset in memory to become unsorted.
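As a rough illustration of the behaviour described above, the default actions of keyby can be approximated with official Stata commands. This is a sketch only, not the implementation of keyby: it assumes the auto dataset shipped with Stata (loaded here with sysuse), in which foreign and make jointly identify the observations, and it does not restore the original dataset on failure.
. sysuse auto, clear
. * Assumption: auto data; isid checks that foreign and make uniquely identify
. * the observations, fails if either has missing values, and sorts by them
. isid foreign make, sort
. * order moves the key variables to the start of the variable order
. order foreign make
Adding the missok option to isid corresponds roughly to the missing option of keyby, and leaving out the order command corresponds roughly to the noorder option.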
Options for keyby and keybygen
noorder specifies that the variables in the varlist (and the generate() variable created by keybygen) are not reordered to the beginning of the variable order of the dataset in memory. If noorder is not specified, then the variables in the varlist (and the generate() variable created by keybygen) are reordered to the beginning of the variable order (see order).
missing specifies that missing values in the variables in the varlist are allowed. If missing is not specified, then missing values in the variables in the varlist cause keyby or keybygen to fail.
fast is an option for programmers. It specifies that keyby or keybygen will take no action to restore the existing dataset in memory in the event of failure, or if the user presses the Break key. If fast is not specified, then keyby or keybygen will take this action, which uses an amount of time depending on the size of the dataset in memory.
Options for keybygen only
generate(newvarname) is required. It specifies the name of a new variable to be generated, containing, in each observation, the sequential order of that observation within its by-group defined by the varlist, or the sequential order of that observation in the dataset, if the varlist is empty. The new variable is appended to the varlist to form the new primary key, by which the dataset is sorted, and which uniquely identifies observations in the dataset. Note that keybygen, unlike keyby, works with an empty varlist. Also, note that the new variable specified by generate() may not have the same name as any existing variable in the varlist.
replace specifies that any existing variable with the name specified by the generate() option will be replaced. If replace is not specified, and an existing variable has the same name as the generate() option, then keybygen will fail.
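The effect of keybygen can be sketched with official Stata commands. The lines below approximate keybygen foreign, generate(modseq), assuming the auto dataset shipped with Stata and no existing variable named modseq. They are an illustration only, not the code used by keybygen.
. sysuse auto, clear
. * a stable sort keeps the existing order of observations within each by-group
. sort foreign, stable
. * number the observations sequentially within each value of foreign
. by foreign: generate long modseq = _n
. * foreign and modseq should now form a primary key; isid confirms this
. isid foreign modseq, sort
. order foreign modseq
With an empty varlist, the corresponding sketch would reduce to generating the new variable as _n over the whole dataset.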
Remarks
keyby is a "clean" version of the sort command without the stable option. keybygen is a "clean" version of the sort command with the stable option. Either of them can be used to make a dataset conform to the relational database model, under which a dataset is viewed as a mathematical function, whose domain is the set of existing primary key value combinations, and whose range is the set of all possible value combinations for variables outside the primary key. If all datasets conform to the relational database model, then the user can use the addinby package, which can also be downloaded from SSC, to add variables from a disk dataset into the dataset in memory, based on the values of a list of variables in the dataset in memory, which is also the primary key of the dataset on disk. The addinby command is a "clean" version of the merge command.
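A possible workflow after a merge is sketched below. It assumes that keyby has been installed, that the auto dataset shipped with Stata is available, and that the current directory is writable. The file name mpgdata is hypothetical and chosen only for this illustration.
. sysuse auto, clear
. keep make mpg
. * mpgdata is a hypothetical file name, saved in the current directory
. save mpgdata, replace
. sysuse auto, clear
. keep make price
. merge 1:1 make using mpgdata
. * re-establish make as the primary key of the merged dataset;
. * keyby fails if make does not uniquely identify the observations
. keyby make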
Examples
. keyby foreign make
. keyby foreign make, noorder
. keyby rep78 make, missing
. keybygen foreign, gene(modseq)
. keybygen foreign, gene(modseq) replace noorder
. keybygen, gene(obsseq)
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
Also see
Manual: [D] generate, [D] sort, [D] gsort, [D] merge, [D] order, [U] 12.2.1 Missing values
On-line: help for generate, sort, gsort, merge, order, missing
help for addinby if installed