help strrec -------------------------------------------------------------------------------
Title
strrec -- Recode string variables, modify value labels or recode variables referring to value labels
Syntax
strrec varlist (rule) [(rule) ...] [if] [in] [, options]
where rule is one of
"str" ["str" ...] = # ["lbl"]
"str" ["str" ...] = "newstr"
str is a string. Double quotes may be omitted if str does not contain embedded spaces.
# is a number. Non-integers are allowed.
lbl is a value label. Double quotes may be omitted if lbl does not contain embedded spaces. Make sure to insert a blank between # and lbl if not using double quotes. This part of rule is optional.
newstr is a single string. Double quotes may be omitted if newstr does not contain embedded spaces.
Each rule must be enclosed in parentheses.
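The two rule forms can be sketched as follows. This is a minimal illustration, not taken from the original help file: the variable sex and its contents are hypothetical, and strrec is assumed to be installed.

```stata
* Hypothetical data: one string variable with inconsistent spellings
clear
input str6 sex
"male"
"m"
"female"
"f"
end

* Rule form 1: "str" ["str" ...] = # ["lbl"]
* Recodes into a numeric variable; with the default prefix the
* new variable should be named r_sex.
strrec sex ("male" "m" = 1 "Male") ("female" "f" = 2 "Female")

* Rule form 2: "str" ["str" ...] = "newstr"
* Quotes are omitted here because no string contains spaces.
strrec sex (m = male) (f = female), prefix(s_)

list
```

The first call uses the optional lbl part to supply label text that differs from the matched strings.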
options               Description
-------------------------------------------------------------------------
main options
  prefix(name)        use name as prefix for transformed variables
  generate(namelist)  create variables name1, ..., namek containing
                        transformed variables
  replace             replace var with transformed values
  sub                 recode if str is a substring in var
  casesensitive       case-sensitive recode
  string              force new variables to be string variables

numeric options
  define(name)        specify name for defined value label
  nolabel             do not define value label/s

string options
  elsemiss            set strings that do not meet the conditions of
                        rules to missing
  copyrest            copy strings that are excluded by the if and in
                        qualifiers from var

extended options
  vallab              apply rules to value labels (numeric variables)
  nodelete            do not delete value labels that are changed
-------------------------------------------------------------------------
Description
strrec recodes string variables according to rules. Variables may be recoded into either numeric or string variables. Any string in var that does not meet the conditions of rules is set to missing in created numeric variables and copied from var in created string variables. Value labels are defined for created numeric variables, assigning str (or lbl, if specified) to the corresponding numeric values.
Remarks: strrec may also be used to recode numeric variables by referring to their value labels. See option vallab.
Options
prefix(name) uses name as prefix for recoded variables. If not specified, the default is r_. May also be used with generate().
generate(namelist) creates new variables name1, ..., namek containing recoded variables. The number of names specified must equal the number of variables in varlist.
replace replaces var with transformed variable.
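A minimal sketch of generate() and replace with more than one variable in varlist. The variables smoker and drinker are hypothetical, and the comments restate the option descriptions above rather than verified output.

```stata
* Hypothetical data: two yes/no string variables
clear
input str3 smoker str3 drinker
"yes" "no"
"no"  "no"
"y"   "yes"
end

* generate() needs one new name per variable in varlist:
* smoker -> smoke, drinker -> drink
strrec smoker drinker ("yes" "y" = 1 "Yes") ("no" "n" = 0 "No"), ///
    generate(smoke drink)

* replace overwrites the original variables instead of creating new ones
strrec smoker drinker ("yes" "y" = 1 "Yes") ("no" "n" = 0 "No"), replace

describe
```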
sub recodes var if str is a substring of var. In str, "?" means any single character, "*" means zero or more characters. wildcards is a synonym for sub.
casesensitive specifies that str (as well as var) is treated "as is", that is, case-sensitive and including leading, trailing, and consecutive internal blanks. If specified, var will only be recoded if it matches str exactly.
string forces new variable/s to be string variables. strrec sets the new variables' type (numeric or string) according to the first rule specified. If newstr is a single number in the first rule and you want to create string variables, specify string.
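The effect of string can be sketched as follows. The variable answer is hypothetical, and the comments describe the behavior stated in the option description above, not verified output.

```stata
* Hypothetical data
clear
input str3 answer
"yes"
"no"
end

* The first rule maps to a single number, so per the description above
* strrec is expected to create a numeric variable here.
strrec answer ("yes" = "1") ("no" = "0"), prefix(n_)

* Same rules, but string asks for a string variable holding "1" and "0".
strrec answer ("yes" = "1") ("no" = "0"), prefix(s_) string

* Compare the storage types of the two new variables
describe n_answer s_answer
```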
define(name) specifies a name for the created value label. If not specified, the new variables' names are used as value label name.
nolabel specifies that no value labels will be defined.
elsemiss specifies that strings that do not meet the conditions of rules are set to missing (""). Default is to copy those strings from var.
copyrest specifies that strings are copied from var, even if they are excluded by the if and in qualifiers. Default is to set those strings to missing ("").
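The difference between the default behavior, elsemiss, and copyrest can be sketched with a small hypothetical dataset. The variables sex and age and the prefixes a_, b_, and c_ are assumptions chosen for illustration.

```stata
* Hypothetical data: "x" matches no rule; two rows fail the if condition
clear
input str1 sex age
"m" 25
"f" 17
"m" 16
"x" 40
end

* Default: unmatched strings ("x") are copied from var;
* rows excluded by if are set to missing ("").
strrec sex ("m" = "male") ("f" = "female") if age >= 18, prefix(a_)

* elsemiss: unmatched strings are set to missing ("") instead.
strrec sex ("m" = "male") ("f" = "female") if age >= 18, prefix(b_) elsemiss

* copyrest: rows excluded by if keep the original string from var.
strrec sex ("m" = "male") ("f" = "female") if age >= 18, prefix(c_) copyrest

list
```

Listing the three new variables side by side makes it easy to see which rows each option affects.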
vallab applies rules to the value labels of (numeric) variables in varlist. If "str" = "newstr" is used as rule/s, only the text in the variables' value labels will be changed. Using "str" = # "lbl" as rule/s will change the value labels (text and integer) and the values of all variables using the respective value label. The if and in qualifiers are ignored and only options sub, casesensitive and nodelete may be used. vlab and labrec are synonyms for vallab. See the example under "vallab option".
nodelete does not delete (old) value labels that are changed by rules. This option should be specified if the same text appears as str in one rule and as lbl in another.
Examples
. sysuse auto ,clear
(1978 Automobile Data)
. tabulate make
    Make and Model |      Freq.     Percent        Cum.
-------------------+-----------------------------------
       AMC Concord |          1        1.35        1.35
         AMC Pacer |          1        1.35        2.70
        AMC Spirit |          1        1.35        4.05
         Audi 5000 |          1        1.35        5.41
          Audi Fox |          1        1.35        6.76
[...]
         VW Dasher |          1        1.35       94.59
         VW Diesel |          1        1.35       95.95
         VW Rabbit |          1        1.35       97.30
       VW Scirocco |          1        1.35       98.65
         Volvo 260 |          1        1.35      100.00
-------------------+-----------------------------------
             Total |         74      100.00

. strrec make ("AMC*" = 1 "AMC")("Audi*" = 2 "Audi") ///
> [...] ("VW*" = 22 "VW")("Volvo*" = 23 "Volvo") ,sub generate(make_only)
make_only
(3 real changes made)
(2 real changes made)
(1 real change made)
(7 real changes made)
(3 real changes made)
(6 real changes made)
(4 real changes made)
(4 real changes made)
(1 real change made)
(2 real changes made)
(2 real changes made)
(3 real changes made)
(1 real change made)
(6 real changes made)
(7 real changes made)
(1 real change made)
(5 real changes made)
(6 real changes made)
(1 real change made)
(1 real change made)
(3 real changes made)
(4 real changes made)
(1 real change made)
. tabulate make_only
  make_only |      Freq.     Percent        Cum.
------------+-----------------------------------
        AMC |          3        4.05        4.05
       Audi |          2        2.70        6.76
[...]
         VW |          4        5.41       98.65
      Volvo |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00
vallab option
Consider the following value labels in a dataset:

. label list
agelbl:
         888 Don't know
         999 Refusal
pollbl:
          -9 REFUSAL
          -8 DON'T KNOW
           1 very interested
           2 quite interested
           3 hardly interested
           4 not at all interested
marlbl:
           0 not married
           1 married
           9 Refusal
tvlbl:
           1 no time at all
           2 less than 30 min
           3 30 to 60 min
           4 60 to 90 min
           5 90 to 180 min
           6 more than 180 min
           8 don't know
           9 refused
Note that missing value codes, as well as the spelling of the labels' text "don't know" and "refused" vary across variables.
. strrec _all ("don't know" = .a)(ref* = .b "refused") ,vallab sub
[...]

. label list
marlbl:
           0 not married
           1 married
          .b refused
agelbl:
          .a don't know
          .b refused
pollbl:
           1 very interested
           2 quite interested
           3 hardly interested
           4 not at all interested
          .a don't know
          .b refused
tvlbl:
           1 no time at all
           2 less than 30 min
           3 30 to 60 min
           4 60 to 90 min
           5 90 to 180 min
           6 more than 180 min
          .a don't know
          .b refused
All variables will be recoded accordingly. Note that old value labels (e.g. 888 Don't know) are deleted. To prevent strrec from doing so, add option nodelete. The option should be added if rules are used to swap value labels as in
. strrec anyvar ("one" = 5 "five") ("five" = 1 "one")

Further Remarks
Make sure to always use the typewriter apostrophe ('), even if left single quotes (`) are used as apostrophes in the labels. strrec will change those left single quotes to typewriter apostrophes.
Author
Daniel Klein, University of Bamberg, klein.daniel.81@gmail.com
Also see
Online: encode, recode, label
if installed: labrec (old)