Extending the label commands (cont'd)

Daniel Klein
International Centre for Higher Education Research Kassel
klein@incher.uni-kassel.de
klein.daniel.81@gmail.com
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

Extending the label commands (cont'd)

    Motivation

        Specific problems and general solutions (ancillary)

    The elabel command

        Syntax elements

        Variables that have no value label attached (ancillary)

        Undefined value labels (ancillary)

        Label multiple variables (ancillary)

        Reverse label codings using transformation rules (ancillary)

        Replicating numlabel (ancillary)

    A close-to-real-life example

        Cleaning and preparing data

    Adding new commands to elabel

        How to?

        Detach value labels from variables

        Delete integer-to-text mappings from value labels (ancillary)

        A wrapper for labeldup (ancillary)
 
 
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Motivation
-------------------------------------------------------------------------------

Data management

Manipulating variables is convenient in Stata

- wildcard characters (*, ~, ?) in variable names

- arithmetic, logical, and relational expressions as well as functions

- transformation rules (recode)

- renaming variables is systematic

Defining and manipulating variable and value labels is not so convenient

- no wildcard characters (*, ~, ?) in value label names

- expressions and functions do not apply to variable or value labels

- no transformation rules

- renaming value labels is not straightforward (see digression)
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Motivation
-------------------------------------------------------------------------------

Community-contributed commands to manipulate variable and value labels

Many community-contributed commands (and collections of commands)

- labutil (Cox, 2000; 2013)

- labeldup, labelrename (Weesie, 2005a)

- mlanguage (Weesie, 2005b)

- labelsof (Jann, 2007)

- varlabdef (Newson, 2009)

- labutil2 (Klein, 2011; 2013)

- vallabdef (Newson, 2018)

- ...

Great. However, ...

- many commands are tailored to solve one specific problem

- need to find the (one) specific command for any specific problem

- need to remember (sometimes cryptic) command names and syntax
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The elabel command
-------------------------------------------------------------------------------

Introducing the elabel command

Yet another command for manipulating variable and value labels: elabel

- based on Stata's label commands

- follows general command - subcommand structure

- basic syntax familiar to most Stata users

Syntax

elabel subcommand [ elblnamelist ] [ mappings ] [ iff eexp ] [ , options ]
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------

elabel subcommand elblnamelist mappings iff eexp , options

The subcommands are the same as those used with label

    elabel variable    Label variables
    elabel define      Define and modify value labels
    elabel values      Attach value labels to variables
    elabel dir         List names of value labels
    elabel list        List names and contents of value labels
    elabel copy        Copy value label
    elabel drop        Drop value labels
    elabel save        Save value labels in do-file

There are some additional subcommands; currently:

    elabel compare     Compare value labels
    elabel keep        Keep value labels
    elabel load        Define value labels from file
    elabel recode      Recode value labels
    elabel remove      Remove value labels from variables and memory
    elabel rename      Rename value labels
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------

elabel subcommand elblnamelist mappings iff eexp , options

We start the demonstration with

. sysuse auto , clear
. describe

Let us list the contents of value label origin

. elabel list origin

We can abbreviate origin using wildcard characters

. elabel list ori~

We do not even need to know the value label name

. elabel list (foreign)

Also, elabel sometimes returns additional results

. return list
-------------------------------------------------------------------------------
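For comparison, here is roughly what we would type with official commands only. This is a minimal sketch: it looks up the name of the value label attached to foreign with the extended macro function : value label and then lists that value label.

```stata
* look up and list the value label attached to foreign,
* using official Stata commands only
sysuse auto , clear

* extended macro function: name of the value label attached to foreign
local lbl : value label foreign

* list the contents of that value label (origin in auto.dta)
label list `lbl'
```

With elabel, the two steps collapse into elabel list (foreign).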

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------

elabel subcommand elblnamelist mappings iff eexp , options

Mappings always resemble those used with the respective label commands

We could define a new value label for rep78 (Repair record)

    elabel define reprec     ///
        1 "very bad"         ///
        2 "bad"              ///
        3 "mediocre"         ///
        4 "good"             ///
        5 "very good"

Sometimes, there are extended mappings

We can group integer values and text; the above could be written as

. elabel define reprec (1/5) ("very bad" "bad" "mediocre" "good" "very good")
. elabel values rep78 reprec

We can also use expressions and functions; I will show you in a minute

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------

elabel subcommand elblnamelist mappings iff eexp , options

With elabel's iff qualifier, we refer to integer-to-text mappings

In eexp, the # character represents integer values

. elabel list reprec iff (# < 3)

while the @ character represents text

. elabel list reprec iff strpos(@, "oo")

Both characters can be combined, of course

. elabel list reprec iff (# >= 3) & !strpos(@, "oo")

We will not see a useful application in this talk; for an example, see digression

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The elabel command -- Syntax elements
-------------------------------------------------------------------------------

elabel subcommand elblnamelist mappings iff eexp , options

I have told you that elabel define allows expressions and functions in mappings

Say, we want to reverse the coding for rep78

. tabulate rep78
. elabel list (rep78)

Here is one way to reverse the coding

. replace rep78 = 6 - rep78
. tabulate rep78

But now the labels are messed up

Here is how we reverse the labels

. elabel define reprec (= 6 - #) (= @) , replace
. tabulate rep78
-------------------------------------------------------------------------------
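For comparison, a minimal sketch of the same reversal with official commands only. It reads each label text with the extended macro function : label and redefines reprec so that the text of value i moves to value 6 - i. It assumes that the values 1 to 5 are all labeled; the local names mappings and txt are only illustrative.

```stata
* reverse the coding of rep78 and its value label reprec
* with official commands only
sysuse auto , clear
label define reprec 1 "very bad" 2 "bad" 3 "mediocre" 4 "good" 5 "very good"
label values rep78 reprec

* reverse the coding of the variable
replace rep78 = 6 - rep78

* collect the reversed mappings: the text of value i goes to value 6 - i
local mappings
forvalues i = 1/5 {
    local txt : label reprec `i'
    local mappings `mappings' `= 6 - `i'' "`txt'"
}

* redefine reprec with the reversed mappings
label define reprec `mappings' , replace
tabulate rep78
```

The loop is what (= 6 - #) (= @) spells out: the expression in the first parentheses computes the new value, @ carries the existing text along.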

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
A close-to-real-life example -- Cleaning and preparing data
-------------------------------------------------------------------------------

Rename value labels and modify variable labels

I have used the setup file for the Mikrozensus 2010 from MISSY (https://www.gesis.org/missy/materials/MZ/setups)

Here is the data (for which I have actually made up the observations)

. use mz2010 , clear
. describe

I do not like all uppercase names; I find them cumbersome to type

. rename * , lower

I can do the same with value label names

. elabel rename * , lower

Perhaps I also want to remove the underscores from value label names (see digression)

Let us also remove those F# prefixes from variable labels

. elabel variable (*) (= regexr(@, "^F[0-9]*[ ]", ""))
-------------------------------------------------------------------------------
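The same cleanup with official commands only might look like the loop below. This is a sketch; the variables ef10 and ef11 and their labels are made up for illustration because the mz2010 data are not reproduced here.

```stata
* strip "F#" prefixes from variable labels with official commands only
* the two variables below are hypothetical
clear
set obs 1
generate ef10 = 1
generate ef11 = 2
label variable ef10 "F10 Age"
label variable ef11 "F11 Sex"

foreach v of varlist _all {
    * current variable label
    local vl : variable label `v'
    * remove a leading F, followed by any digits and one blank
    local vl = regexr(`"`vl'"', "^F[0-9]*[ ]", "")
    label variable `v' `"`vl'"'
}
describe
```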

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
A close-to-real-life example -- Cleaning and preparing data
-------------------------------------------------------------------------------

Identify and eliminate identical value labels

Based on variable labels, some of the variables seem to contain similar information

I wonder why they all have their own value label attached

Let us look at those value labels

. elabel list (ef670-ef673)

As expected, those labels look pretty similar

labeldup (not part of elabel) identifies and removes duplicate labels

. labeldup , select
. describe

How is ef673_vl different from ef670_vl?

. elabel compare ef673_vl ef670_vl

Both labels are identical if we exclude value 5 (see digression)

We should attach ef673_vl to ef670, ef671, and ef672

. elabel values ef670-ef672 (ef673)

We no longer need value label ef670_vl

. elabel drop ef670_vl
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
A close-to-real-life example -- Cleaning and preparing data
-------------------------------------------------------------------------------

Change numeric values to extended missing values

Next, I would like to define missing values

Consider

. elabel list

It seems as if the consecutive negative numbers -1, -2, ..., -5 indicate missing values

I would like to have consecutive extended missing values, .a, .b, ..., .e

Of course, I would like to keep the value labels

Here is how we do this for value label ef1109_vl

. elabel recode ef1109_vl (-1/-5 = .a/.e)
. return list
. local rules `r(rules)'

Value label ef1109_vl is attached to more than one variable

We need to recode all variables accordingly

. elabel list ef1109_vl , varlist
. return list
. local varlist `r(varlist)'
. recode `varlist' `rules'
-------------------------------------------------------------------------------
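Without elabel, we would move the label texts and recode the variable by hand. The sketch below does this for a single made-up variable x with a made-up value label xlbl; it assumes that only the codes -1 to -5 are to be converted and that each variable is handled separately.

```stata
* turn the codes -1, ..., -5 into .a, ..., .e for one variable,
* keeping their value labels, with official commands only
* the variable x and the value label xlbl are hypothetical
clear
input x
1
-1
-3
end
label define xlbl 1 "yes" -1 "not applicable" -3 "refused"
label values x xlbl

local letters a b c d e
forvalues k = 1/5 {
    local mv : word `k' of `letters'
    * text for value -k; empty if -k is not labeled (strict)
    local txt : label xlbl -`k' , strict
    if (`"`txt'"' != "") {
        * move the text from -k to .<letter>; "" deletes the old mapping
        label define xlbl .`mv' "`txt'" -`k' "" , modify
    }
}

* recode the variable accordingly
recode x (-1 = .a) (-2 = .b) (-3 = .c) (-4 = .d) (-5 = .e)
label list xlbl
```

elabel recode does the label part in one line and, via r(rules), hands us the matching recode rules for all variables that share the value label.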

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Adding new commands to elabel -- How to?
-------------------------------------------------------------------------------

Adding new commands to elabel

There are two ways to add commands to elabel

- write a program, elabel_cmd_newcmd (in a minute)

- write a program, elabel_fcn_newfcn (see elabel (pseudo-)functions)

We can also use elabel's programming tools in any program (or do-file)
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Adding new commands to elabel -- Detach value labels from variables
-------------------------------------------------------------------------------

Detach value labels from variables

label values attaches value labels to, and detaches them from, a list of variables

Suppose we want to detach a list of value labels from all variables

We want a command with the following syntax

elabel detach elblnamelist

Here is how we implement this

    program elabel_cmd_detach
        version 15
        elabel parse elblnamelist : `0'
        foreach lbl of local lblnamelist {
            quietly elabel list `lbl' , varlist
            if ("`r(varlist)'" != "") elabel values `r(varlist)' .
        }
    end

elabel parse resembles a rudimentary version of Stata's syntax command

Let us try our command

. describe
. elabel detach *673*
-------------------------------------------------------------------------------
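For comparison, a sketch of the same task with official commands only. It detaches every value label listed in the (illustrative) local lblnamelist, here origin from auto.dta; like label values itself, it leaves the value labels in memory.

```stata
* detach a list of value labels from all variables
* with official commands only
sysuse auto , clear

local lblnamelist origin

foreach v of varlist _all {
    * name of the value label attached to `v' (empty if none)
    local lbl : value label `v'
    * 1 if that name is among the listed value labels
    local listed : list lbl in lblnamelist
    if ("`lbl'" != "" & `listed') label values `v'
}
describe foreign
```

The elabel version saves us from writing the loop and the bookkeeping each time.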

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------

Renaming a value label

Step 1: copy the old value label using a new name

Step 2: attach the new value label to all variables that previously had the old value label attached; make sure to do this in all label languages

Step 3: optionally, drop the old value label
-------------------------------------------------------------------------------
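A minimal sketch of the three steps with official commands only, renaming origin to carorigin in auto.dta. It assumes a single label language; with several label languages, Step 2 would have to be repeated in each of them (see help label language). The new name carorigin is only illustrative.

```stata
* rename value label origin to carorigin with official commands only
* (assumes a single label language)
sysuse auto , clear
local old origin
local new carorigin

* Step 1: copy the old value label using the new name
label copy `old' `new'

* Step 2: attach the new value label to all variables
*         that have the old value label attached
foreach v of varlist _all {
    if ("`: value label `v''" == "`old'") label values `v' `new'
}

* Step 3 (optional): drop the old value label
label drop `old'
```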

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Specific problems and general solutions

The following assumes familiarity with the elabel command

We will look at two problems and compare solutions from the labutil package with elabel

Problem 1

"Define a value label for values which are base 10 logarithms containing the antilogged values" (from help lablog)

. lablog logs , values(1/6)

. elabel define logs2 (1/6) (= strofreal(10^#))

. elabel list logs*

Compared with elabel, lablog arguably has the simpler syntax

Problem 2

Define a value label for values which map minutes after midnight to hours

. labmap time , values(0(60)240) first(12) step(1) max(12) postfix(" am")

. elabel define time2 (0(60)240) (= strofreal(cond(#, #/60, 12)) + " am")

. elabel list time*

Compared with elabel, does labmap have the simpler syntax?

Compared with lablog, the more general labmap has a more complicated syntax

elabel solves both problems with basically the same syntax
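For reference, both labels can also be built with loops and official commands only. This is a sketch; the names logs3 and time3 are hypothetical.

```stata
* build the two value labels with loops and official commands only
clear

* Problem 1: values 1, ..., 6 labeled with 10^value
local mappings
forvalues i = 1/6 {
    local mappings `mappings' `i' "`= 10^`i''"
}
label define logs3 `mappings'

* Problem 2: minutes after midnight (0, 60, ..., 240) labeled as hours
local mappings
forvalues m = 0(60)240 {
    local h = cond(`m', `m'/60, 12)
    local mappings `mappings' `m' "`h' am"
}
label define time3 `mappings'

label list logs3 time3
```

The loops work, but every new problem needs a new loop; elabel keeps the one-line mapping syntax.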

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------
Suppose there is another value label for a car's repair record

. elabel define repair 0 "poor" 1 "excellent" .a "not available"

We would like to add the label for missing record to reprec

. elabel copy repair reprec iff mi(#) , add
. elabel list reprec

We will not need the additional integer-to-text mapping in this talk

elabel define reprec .a "" , modify
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Variables that have no value label attached

What if we refer to the value label of a variable that has none attached?

. sysuse auto , clear
. elabel list (mpg)
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Undefined value labels

elabel dir lists undefined value labels

. sysuse auto , clear
. elabel values mpg mpglbl
. elabel dir , nomemory
. return list
-------------------------------------------------------------------------------
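A rough official-commands equivalent loops over all variables, looks up the attached value label, and checks whether label list can find it in memory. This is a sketch; it does not reproduce elabel's stored results, and the local name undefined is only illustrative.

```stata
* find value labels that are attached to variables
* but not defined in memory, with official commands only
sysuse auto , clear
label values mpg mpglbl

local undefined
foreach v of varlist _all {
    local lbl : value label `v'
    if ("`lbl'" == "") continue
    * label list fails if the value label is not in memory
    capture label list `lbl'
    if (_rc) local undefined : list undefined | lbl
}
display "`undefined'"
```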

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Label multiple variables

The syntax diagram for elabel variable shows extended mappings

. sysuse auto , clear
. describe rep78 foreign
. elabel variable rep78 "Repair Rec." foreign "Manufactured outside US"
. elabel variable (rep78 foreign) ("Repair Record 1978" "Car type")
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Reverse label codings using transformation rules

Let us define a label for rep78

. sysuse auto , clear
. elabel define reprec (1/5) ("very bad" "bad" "mediocre" "good" "very good")
. elabel values rep78 reprec

We could use recode to reverse the coding

    recode rep78               ///
        (1 = 5 "very bad")     ///
        (2 = 4 "bad")          ///
        (3 = 3 "mediocre")     ///
        (4 = 2 "good")         ///
        (5 = 1 "very good")    ///
        , generate(rep78r)

Defining value labels on the fly is a nice feature ...

But, do we really need to retype all those existing labels?

Here is a solution with elabel

. elabel recode reprec (1/5 = 5/1) , define(rep78r)

I have mentioned that elabel sometimes stores useful results

. return list

We can recode any variables accordingly

. recode rep78 `r(rules)' , generate(rep78r)
. elabel values rep78r rep78r
. tabulate rep78 rep78r
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Replicating numlabel

Remember our value label reprec?

. sysuse auto , clear
. elabel define reprec (1/5) ("very bad" "bad" "mediocre" "good" "very good") , replace
. elabel values rep78 reprec
. elabel list reprec
. tabulate rep78

Let us add the numeric values as a prefix to the labels

Stata's solution is the numlabel command

. numlabel reprec , add

Remove the prefix

. numlabel reprec , remove

We can replicate the above with elabel

. elabel define reprec (= #) (= strofreal(#) + ". " + @) , modify
. elabel define reprec (= #) (= subinstr(@, strofreal(#) + ". ", "", .)) , modify
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------

. elabel rename (*_*) (**) , dryrun
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------

Compare parts of value labels

. elabel compare ef673_vl ef670_vl iff (# < 5)
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Delete integer-to-text mappings from value labels

Syntax

elabel delete elblnamelist { numlist | iff eexp } [ , not ]

    *! version 1.0.0 24may2019 daniel klein
    program elabel_cmd_delete
        version 11.2

        elabel parse elblnamelist [ mappings ] [ iff ] [ , NOT ] : `0'

        if mi(`"`iff'`mappings'"') {
            display as err "iff eexp required"
            exit 100
        }
        else if ((`"`iff'"' != "") & (`"`mappings'"' != "")) {
            display as err "iff not allowed"
            exit 101
        }

        if ("`not'" == "not") local not !

        if (`"`mappings'"' != "") {
            elabel protectr
            elabel numlist `"`mappings'"'
            local numlist `r(numlist)'
            local numlist : subinstr local numlist " " ", " , all
            local iffeexp inlist(#, `numlist')
        }
        else gettoken ifword iffeexp : iff

        local iffeexp `not'(`iffeexp')

        elabel define `lblnamelist' (= #) (= cond(`iffeexp', "", @)) , modify
    end

 
 
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

A wrapper for labeldup

Syntax

elabel dup elblnamelist1 [ iff eexp ] [ , select names(elblnamelist2) nodrop ]

    *! version 1.0.0 daniel klein 24may2019
    program elabel_cmd_dup
        version 11.2

        elabel parse [ elblnamelist ] [ iff ] ///
            [ , Select Names(string asis) * ] : `0'

        if (`"`names'"' != "") elabel unab labellist2 : `names' , elblnamelist

        preserve

        if (`"`iff'"' != "") {
            tempfile tmp
            quietly label save using "`tmp'"
            label drop _all
            elabel load `iff' using "`tmp'" , as(do)
        }

        labeldup `lblnamelist' , `select' names(`labellist2') `options'

        restore , not

        if (mi(`"`iff'"')) exit

        if ("`select'" == "select") elabel unab toload : *

        elabel load `toload' using "`tmp'" , modify as(do)
    end

 
 
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------