Proper name case for foreign names in variables
Syntax
sproper [varlist] [if] [in] [, generate(stubname) replace oe ]
supper [varlist] [if] [in] [, generate(stubname) replace oe ]
slower [varlist] [if] [in] [, generate(stubname) replace oe ]
Description
The proper(), upper(), and lower() string functions do not include foreign letters in their definition of letters. sproper, supper, and slower treat foreign names correctly, provided the character encoding maps codes 065 to 090 to A to Z and codes 192 to 222 to characters with diacritical marks, with the lowercase version of each character at the code 32 higher, as shown below for charset latin1 (ISO-8859-1). The problem was posed by a question on Statalist.
Note that this package operates on variables, as a program, not on strings, as a function would. However, the Mata functions included with the programs may be of use in other contexts; use viewsource on the programs to see them.
Option oe adds to the definition of letters by specifying that character codes 138 (S hachek), 140 (OE), 142 (Z hachek), and 159 (Y double dot) are also letters, with lowercase versions at 154, 156, 158, and 255 respectively. The codes below 160 are not officially part of charset latin1, but they are represented in many Stata users' charsets.
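The case rule described above can be sketched in a few lines of Mata. This is only an illustration under stated assumptions: the function name sk_upper is hypothetical, it is not the Mata code shipped with the package (use viewsource for that), and it assumes a single-byte encoding such as latin1, working byte by byte on the code ranges given in this help file.

```stata
mata:
// Hypothetical helper for illustration; NOT the Mata code shipped with sproper.
// Uppercases s byte by byte, using the code ranges stated in this help file.
string scalar sk_upper(string scalar s, real scalar oe)
{
    real rowvector c
    real scalar    i

    c = ascii(s)                        // one character code per byte
    for (i = 1; i <= cols(c); i++) {
        if ((c[i] >= 97 & c[i] <= 122) | (c[i] >= 224 & c[i] <= 254)) {
            c[i] = c[i] - 32            // uppercase sits 32 codes lower
        }
        else if (oe) {
            // extra pairs enabled by option oe: 154/138, 156/140, 158/142, 255/159
            if (c[i] == 154 | c[i] == 156 | c[i] == 158) c[i] = c[i] - 16
            else if (c[i] == 255) c[i] = 159
        }
    }
    return(char(c))
}
end
```

With this sketch, sk_upper("jos" + char(233), 0) returns JOS followed by the character at code 201, and the codes listed under option oe are changed only when the second argument is nonzero.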
-------------------------------------------------------------------------------
The natural way to display the 256-character latin1 (ISO-8859-1) character set is in a 16 by 16 hexadecimal grid, 0 to F in each dimension. The grid below skips the first 32 reserved characters, numbered 0 to 31, and starts with the space character at position 0x20 (32 in decimal):
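      0123456789ABCDEF
    2  !"#$%&'()*+,-./
    3 0123456789:;<=>?
    4 @ABCDEFGHIJKLMNO
    5 PQRSTUVWXYZ[\]^_
    6 `abcdefghijklmno
    7 pqrstuvwxyz{|}~
    8
    9
    A  ¡¢£¤¥¦§¨©ª«¬ ®¯
    B °±²³´µ¶·¸¹º»¼½¾¿
    C ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
    D ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
    E àáâãäåæçèéêëìíîï
    F ðñòóôõö÷øùúûüýþÿ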
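Here you can see that adding 16 moves down one row within the same column, and adding 32 moves down two rows, which is the step from an uppercase letter to its lowercase version (for example, from A at 0x41 to a at 0x61). You can reproduce this mapping on your own Stata with this code: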
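    forv i=2/15 {
        * header row (two leading spaces align it with the row labels)
        if `i'==2 di "  0123456789ABCDEF"
        * row label: high hex digit, 2 to 9 then A to F
        if `i'<10 di %1.0f `i' " " _c
        if `i'>9 di in smcl "{c `=55+`i''} " _c
        forv j=0/15 {
            di in smcl "{c `=`i'*16+`j''}" _c
        }
        di
    }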
A less natural but perhaps more intuitive way to display the latin1 (ISO-8859-1) character set is in a 23 by 10 decimal grid, 0 to 9 in the x dimension, again skipping the first 32 reserved characters and starting with the space at position 32:
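       0123456789
    03    !"#$%&'
    04 ()*+,-./01
    05 23456789:;
    06 <=>?@ABCDE
    07 FGHIJKLMNO
    08 PQRSTUVWXY
    09 Z[\]^_`abc
    10 defghijklm
    11 nopqrstuvw
    12 xyz{|}~
    13
    14
    15
    16  ¡¢£¤¥¦§¨©
    17 ª«¬ ®¯°±²³
    18 ´µ¶·¸¹º»¼½
    19 ¾¿ÀÁÂÃÄÅÆÇ
    20 ÈÉÊËÌÍÎÏÐÑ
    21 ÒÓÔÕÖ×ØÙÚÛ
    22 ÜÝÞßàáâãäå
    23 æçèéêëìíîï
    24 ðñòóôõö÷øù
    25 úûüýþÿ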
You can reproduce this table on your own system (to see any differences) with this code:
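    forv i=3/25 {
        * header row (three leading spaces align it with the row labels)
        if `i'==3 di "   0123456789"
        di %02.0f `i' _c
        * first row starts at code 32, last row ends at code 255
        loc s=cond(`i'==3,2,0)
        loc e=cond(`i'==25,5,9)
        forv j=0/`s' {
            di " " _c
        }
        forv j=`s'/`e' {
            di in smcl "{c `=`i'*10+`j''}" _c
        }
        di
    }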
-------------------------------------------------------------------------------
Example
    clear
    set charset latin1
    set obs 5
    g p="ZUÑIGA RODRÍGUEZ "+char(86)+char(85)+char(138)+char(159)+char(223)+char(142)+char(68)
    list
    replace p=proper(p)
    list
    sproper p in 1, replace
    supper p in 2/3, replace
    list
    slower p in 3, replace
    slower p in 4, replace oe
    g t="sproper"
    replace t="supper" in 2
    replace t="slower" in 3
    replace t="slower, oe" in 4
    replace t="proper()" in 5
    list
Options
generate(stubname) specifies a stubname used to prefix the newly created proper case versions of variables.
replace specifies that variables should be overwritten by proper case versions.
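As a complement to the replace-based example above, here is a minimal sketch of the generate() option. The variable name, the stub p_, and the data are hypothetical, and the name of the new variable (assumed here to be the stub followed by the original name) should be checked with describe rather than taken from this sketch.

```stata
* Hypothetical data: the variable name and the stub p_ are illustrative only
clear
set obs 1
generate str20 name = "jos" + char(233) + " n" + char(250) + char(241) + "ez"
* Leave name untouched and write the proper case version to a new variable
sproper name, generate(p_)
* Check which variable was created (assumed: the stub followed by the name)
describe
list
```

Using generate() rather than replace keeps the original values available for comparison.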
Author
Austin Nichols
Urban Institute
Washington, DC, USA
austinnichols@gmail.com
Also see
On-line: help for set charset, FAQ on charset, proper(), lower(), upper(), char(), string functions, [M-4] string, asciiplot (if installed; findit asciiplot if not).