Extending the label commands (cont'd)
Daniel Klein
International Centre for Higher Education Research Kassel
klein@incher.uni-kassel.de
klein.daniel.81@gmail.com
-------------------------------------------------------------------------------
Extending the label commands (cont'd)
    Motivation
        Specific problems and general solutions (ancillary)
    The elabel command
        Syntax elements
        Variables that have no value label attached (ancillary)
        Undefined value labels (ancillary)
        Label multiple variables (ancillary)
        Reverse label codings using transformation rules (ancillary)
        Replicating numlabel (ancillary)
    A close-to-real-life example
        Cleaning and preparing data
    Adding new commands to elabel
        How to?
        Detach value labels from variables
        Delete integer-to-text mappings from value labels (ancillary)
        A wrapper for labeldup (ancillary)
-------------------------------------------------------------------------------
Motivation
-------------------------------------------------------------------------------
Data management
Manipulating variables is convenient in Stata
- wildcard characters (*, ~, ?) in variable names
- arithmetic, logical, and relational expressions as well as functions
- transformation rules (recode)
- renaming variables is systematic
Defining and manipulating variable and value labels is not so convenient
- no wildcard characters (*, ~, ?) in value label names
- expressions and functions do not apply to variable or value labels
- no transformation rules
- renaming value labels is not straightforward >> digression
-------------------------------------------------------------------------------
Motivation
-------------------------------------------------------------------------------
Community-contributed commands to manipulate variable and value labels
Many community-contributed commands (and collections of commands)
- labutil (Cox, 2000; 2013)
- labeldup, labelrename (Weesie, 2005a)
- mlanguage (Weesie, 2005b)
- labelsof (Jann, 2007)
- varlabdef (Newson, 2009)
- labutil2 (Klein, 2011; 2013)
- vallabdef (Newson, 2018)
- ...
Great. However, ...
- many commands are tailored to solve one specific problem
- need to find the (one) specific command for any specific problem
- need to remember (sometimes cryptic) command names and syntax
-------------------------------------------------------------------------------
The elabel command
-------------------------------------------------------------------------------
Introducing the elabel command
Yet another command for manipulating variable and value labels: elabel
- based on Stata's label commands
- follows general command - subcommand structure
- basic syntax familiar to most Stata users
Syntax
elabel subcommand [ elblnamelist ] [ mappings ] [ iff eexp ] [ , options ]
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------
elabel subcommand elblnamelist mappings iff eexp , options
The subcommands are the same as those used with label
    elabel variable    Label variables
    elabel define      Define and modify value labels
    elabel values      Attach value label to variables
    elabel dir         List names of value labels
    elabel list        List names and contents of value labels
    elabel copy        Copy value label
    elabel drop        Drop value labels
    elabel save        Save value labels in do-file
There are some additional subcommands; currently
    elabel compare     Compare value labels
    elabel keep        Keep value labels
    elabel load        Define value labels from file
    elabel recode      Recode value labels
    elabel remove      Remove value labels from variables and memory
    elabel rename      Rename value labels
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------
elabel subcommand elblnamelist mappings iff eexp , options
We start the demonstration with
. sysuse auto , clear
. describe
Let us list the contents of value label origin
. elabel list origin
We can abbreviate origin using wildcard characters
. elabel list ori~
We do not even need to know the value label name
. elabel list (foreign)
Also, elabel sometimes returns additional results
. return list
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------
elabel subcommand elblnamelist mappings iff eexp , options
Mappings always resemble those used with the respective label commands
We could define a new value label for rep78 (Repair record)
elabel define reprec ///
    1 "very bad"     ///
    2 "bad"          ///
    3 "mediocre"     ///
    4 "good"         ///
    5 "very good"
Sometimes, there are extended mappings
We can group integer values and text; the above could be written as
. elabel define reprec (1/5) ("very bad" "bad" "mediocre" "good" "very good")
. elabel values rep78 reprec
We can also use expressions and functions; I will show you in a minute
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------
elabel subcommand elblnamelist mappings iff eexp , options
With elabel's iff qualifier, we refer to integer-to-text mappings
In eexp, the # character represents integer values
. elabel list reprec iff (# < 3)
while the @ character represents text
. elabel list reprec iff strpos(@, "oo")
Both characters can be combined, of course
. elabel list reprec iff (# >= 3) & !strpos(@, "oo")
We will not see a useful application in this talk; for an example, see >> digression
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------
elabel subcommand elblnamelist mappings iff eexp , options
I have told you that elabel define allows expressions and functions in mappings
Say we want to reverse the coding for rep78
. tabulate rep78
. elabel list (rep78)
Here is one way to reverse the coding
. replace rep78 = 6 - rep78
. tabulate rep78
But now the labels are messed up
Here is how we reverse the labels
. elabel define reprec (= 6 - #) (= @) , replace
. tabulate rep78
-------------------------------------------------------------------------------
A close-to-real-life example -- Cleaning and preparing data
-------------------------------------------------------------------------------
Rename value labels and modify variable labels
I have used the setup file for the Mikrozensus 2010 from MISSY (https://www.gesis.org/missy/materials/MZ/setups)
Here is the data (for which I have actually made up the observations)
. use mz2010 , clear
. describe
I do not like all uppercase names; I find them cumbersome to type
. rename * , lower
I can do the same with value label names
. elabel rename * , lower
Perhaps I want to remove those underscores in value label names >> digression
Let us also remove those F# prefixes from variable labels
. elabel variable (*) (= regexr(@, "^F[0-9]*[ ]", ""))
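To see what the regular expression does, it may help to apply regexr() to a single string first. The sketch below uses a made-up variable label; the actual label texts in mz2010 may look different.

```stata
* hypothetical label text, assumed to follow the "F# text" pattern
local oldlabel "F12 Age of person"

* strip a leading F, any digits that follow, and one blank
local newlabel = regexr("`oldlabel'", "^F[0-9]*[ ]", "")

display "`newlabel'"
```

Because the pattern is anchored with ^, an F followed by digits elsewhere in the label text is left alone.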
-------------------------------------------------------------------------------
A close-to-real-life example -- Cleaning and preparing data
-------------------------------------------------------------------------------
Identify and eliminate identical value labels
Based on variable labels, some of the variables seem to contain similar information
I wonder why they all have their own value label attached
Let us look at those value labels
. elabel list (ef670-ef673)
As expected, those labels look pretty similar
labeldup (not part of elabel) identifies and removes duplicate labels
. labeldup , select
. describe
How is ef673_vl different from ef670_vl?
. elabel compare ef673_vl ef670_vl
Both labels are identical if we exclude value 5 >> digression
We should attach ef673_vl to ef670, ef671, and ef672
. elabel values ef670-ef672 (ef673)
We no longer need value label ef670_vl
. elabel drop ef670_vl
-------------------------------------------------------------------------------
A close-to-real-life example -- Cleaning and preparing data
-------------------------------------------------------------------------------
Change numeric values to extended missing values
Next, I would like to define missing values
Consider
. elabel list
It seems as if consecutive negative numbers, -1, -2, ..., -5, indicate missing values
I would like to have consecutive extended missing values, .a, .b, ..., .e
Of course, I would like to keep the value labels
Here is how we do this for value label ef1109_vl
. elabel recode ef1109_vl (-1/-5 = .a/.e)
. return list
. local rules `r(rules)'
Value label ef1109_vl is attached to more than one variable
We need to recode all variables accordingly
. elabel list ef1109_vl , varlist
. return list
. local varlist `r(varlist)'
. recode `varlist' `rules'
-------------------------------------------------------------------------------
Adding new commands to elabel -- How to?
-------------------------------------------------------------------------------
Adding new commands to elabel
There are two ways to add commands to elabel
- write a program, elabel_cmd_newcmd (in a minute)
- write a program, elabel_fcn_newfcn (see elabel (pseudo-)functions)
We can also use elabel's programming tools in any program (or do-file)
-------------------------------------------------------------------------------
Adding new commands to elabel -- Detach value labels from variables
-------------------------------------------------------------------------------
Detach value labels from variables
label values attaches value labels to, or detaches them from, a list of variables
Suppose we wanted to detach a list of value labels from all variables
We want a command with the following syntax
elabel detach elblnamelist
Here is how we implement this
program elabel_cmd_detach
    version 15
    elabel parse elblnamelist : `0'
    foreach lbl of local lblnamelist {
        quietly elabel list `lbl' , varlist
        if ("`r(varlist)'" != "") elabel values `r(varlist)' .
    }
end
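A possible way to try the new subcommand, sketched under the assumption that elabel is installed and that elabel_cmd_detach has been defined as shown (for example, saved as elabel_cmd_detach.ado on the ado-path). The auto data are used for illustration only.

```stata
* assumes elabel is installed and elabel_cmd_detach is defined
sysuse auto , clear
describe foreign            // value label origin is attached

* detach origin from every variable that uses it
elabel detach origin

describe foreign            // no value label attached any more
label list origin           // the label itself is still in memory
```

Note that the program only detaches; it does not drop the value label.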
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------
Renaming a value label
Step 1 copy the old value label using a new name
Step 2 attach the new value label to all variables that previously had the old value label attached; make sure to do this in all label languages
Step 3 optionally, drop the old value label
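The three steps can be sketched with official Stata commands as follows. This is a minimal illustration using the auto data; the new name origin_new is made up, and the sketch covers the current label language only (a complete solution would repeat Step 2 for every language defined with label language).

```stata
* Sketch only: rename value label origin to origin_new (made-up name)
sysuse auto , clear

* Step 1: copy the old value label using a new name
label copy origin origin_new

* Step 2: re-attach wherever the old label was attached
*         (current label language only)
foreach var of varlist _all {
    local attached : value label `var'
    if ("`attached'" == "origin") label values `var' origin_new
}

* Step 3: optionally, drop the old value label
label drop origin
```

With elabel, the same result is obtained with a single call to elabel rename.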
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Specific problems and general solutions
The following assumes familiarity with the elabel command
We will look at two problems and compare solutions from the labutil package with elabel
Problem 1
"Define a value label for values which are base 10 logarithms containing the antilogged values" (from help lablog)
. lablog logs , values(1/6)
. elabel define logs2 (1/6) (= strofreal(10^#))
. elabel list logs*
Compared with elabel, lablog arguably has the simpler syntax
Problem 2
Define a value label for values which map minutes after midnight to hours
. labmap time , values(0(60)240) first(12) step(1) max(12) postfix(" am")
. elabel define time2 (0(60)240) (= strofreal(cond(#, #/60, 12)) + " am")
. elabel list time*
Compared with elabel, does labmap have the simpler syntax?
Compared with lablog, the more general labmap has a more complicated syntax
elabel solves both problems with basically the same syntax
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------
Suppose there is another set of value labels for a car's repair record
elabel define repair 0 "poor" 1 "excellent" .a "not available"
We would like to add the label for the missing record to reprec
. elabel copy repair reprec iff mi(#) , add
. elabel list reprec
We will not need the additional integer-to-text mapping in this talk
elabel define reprec .a "" , modify
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Variables that have no value label attached
What if we refer to non-existing value labels?
. sysuse auto , clear
. elabel list (mpg)
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Undefined value labels
elabel dir lists undefined value labels
. sysuse auto , clear
. elabel values mpg mpglbl
. elabel dir , nomemory
. return list
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Label multiple variables
The syntax diagram for elabel variable shows extended mappings
. sysuse auto , clear
. describe rep78 foreign
. elabel variable rep78 "Repair Rec." foreign "Manufactured outside US"
. elabel variable (rep78 foreign) ("Repair Record 1978" "Car type")
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Reverse label codings using transformation rules
Let us define a label for rep78
. sysuse auto , clear
. elabel define reprec (1/5) ("very bad" "bad" "mediocre" "good" "very good")
. elabel values rep78 reprec
We could use recode to reverse the coding
recode rep78             ///
    (1 = 5 "very bad")   ///
    (2 = 4 "bad")        ///
    (4 = 2 "good")       ///
    (5 = 1 "very good")  ///
    , generate(rep78r)
Defining value labels on the fly is a nice feature ...
But, do we really need to retype all those existing labels?
Here is a solution with elabel
. elabel recode reprec (1/5 = 5/1) , define(rep78r)
I have mentioned that elabel sometimes stores useful results
. return list
We can recode any variables accordingly
. recode rep78 `r(rules)' , generate(rep78r)
. elabel values rep78r rep78r
. tabulate rep78 rep78r
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Replicating numlabel
Remember our value label reprec?
. sysuse auto , clear
. elabel define reprec (1/5) ("very bad" "bad" "mediocre" "good" "very good") , replace
. elabel values rep78 reprec
. elabel list reprec
. tabulate rep78
Let us add the numeric values as a prefix to the labels
Stata's solution is the numlabel command
. numlabel reprec , add
Remove the prefix
. numlabel reprec , remove
We can replicate the above with elabel
. elabel define reprec (= #) (= strofreal(#) + ". " + @) , modify
. elabel define reprec (= #) (= subinstr(@, strofreal(#)+ ". ", "", .)) , modify
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------
. elabel rename (*_*) (**) , dryrun
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------
Compare parts of value labels
. elabel compare ef673_vl ef670_vl iff (# < 5)
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
Delete integer-to-text mappings from value labels
Syntax
elabel delete elblnamelist { numlist | iff eexp } [ , not ]
*! version 1.0.0 24may2019 daniel klein
program elabel_cmd_delete
    version 11.2

    elabel parse elblnamelist [ mappings ] [ iff ] [ , NOT ] : `0'

    if mi(`"`iff'`mappings'"') {
        display as err "iff eexp required"
        exit 100
    }
    else if ((`"`iff'"' != "") & (`"`mappings'"' != "")) {
        display as err "iff not allowed"
        exit 101
    }

    if ("`not'" == "not") local not !

    if (`"`mappings'"' != "") {
        elabel protectr
        elabel numlist `"`mappings'"'
        local numlist `r(numlist)'
        local numlist : subinstr local numlist " " ", " , all
        local iffeexp inlist(#, `numlist')
    }
    else gettoken ifword iffeexp : iff

    local iffeexp `not'(`iffeexp')

    elabel define `lblnamelist' (= #) (= cond(`iffeexp', "", @)) , modify
end
exit
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------
A wrapper for labeldup
Syntax
elabel dup elblnamelist1 [ iff eexp ] [ , select names(elblnamelist2) nodrop ]
*! version 1.0.0 daniel klein 24may2019
program elabel_cmd_dup
    version 11.2

    elabel parse [ elblnamelist ] [ iff ] ///
        [ , Select Names(string asis) * ] : `0'

    if (`"`names'"' != "") elabel unab labellist2 : `names' , elblnamelist

    preserve

    if (`"`iff'"' != "") {
        tempfile tmp
        quietly label save using "`tmp'"
        label drop _all
        elabel load `iff' using "`tmp'" , as(do)
    }

    labeldup `lblnamelist' , `select' names(`labellist2') `options'

    restore , not

    if (mi(`"`iff'"')) exit

    if ("`select'" == "select") elabel unab toload : *

    elabel load `toload' using "`tmp'" , modify as(do)
end
exit
-------------------------------------------------------------------------------