Title
labcenswdi -- Automatically manages datasets obtained from the US Census 2000 and World Development Indicators databases.
    +-------------------+
----+ Table of Contents +-----------------------------------------------

    Syntax
    General description
    Description of the options
    Examples
    Author information

-------------------------------------------------------------------------------
The syntax to display the default variable names with the variable descriptions is
labcenswdi
The syntax to manage the data and the variable descriptions is
labcenswdi [newvarlist], nstr(#) [truncby(("text1") [("text2")]) truncwith(("text3") [("text4")]) repdes((# "text5") [(# "text6")...]) force comma saving(filename [,sub_option])]
    options                                  Description
    -------------------------------------------------------------------------
    Options
      nstr(#)                                indicate the number of identifier variables in the dataset
      truncby(("text1") [("text2")])         truncate the variable descriptions
      truncwith(("text3") [("text4")])       replace the truncated characters with shorter ones
      repdes((# "text5") [(# "text6")...])   replace variable descriptions entirely
      force                                  convert nonnumeric strings to missing values
      comma                                  remove 1000-separator commas
      saving(filename [,sub_option])         save the original variable descriptions to a text file
    -------------------------------------------------------------------------
    +-------------+
----+ Description +------------------------------------------------------
labcenswdi automatically manages datasets obtained from databases that provide variable descriptions on the second row. Such databases include, but are not limited to, the US Census 2000 Summary Files, the American Community Survey, and the World Development Indicators (WDI). While renaming variables with the user's specified variable names, labcenswdi manages the variable descriptions: it removes them from the second row and places them into Stata variable labels, reduces their length to 80 characters or fewer, and optionally saves them to a text file. The new variable names should be supplied in newvarlist (see syntax) if the user elects to replace the original variable names with more meaningful names.
When a dataset containing variable descriptions on the second row is read with insheet, Stata reads all variables as strings regardless of their contents (string or numeric). labcenswdi attempts to convert back to numeric all variables with numerical contents. However, if such a variable takes on, for some observations, values with non-numeric characters or 1000-separator commas, it is not converted from string to numeric unless the user explicitly requests it (see options force and comma). To conserve memory, labcenswdi also attempts to demote both string and numeric variables automatically. For instance, storing an integer variable as double, or a string variable with a maximum length of 5 characters as str15, would waste memory (see compress and data types).

All of these tasks, together with moving the variable descriptions from the second row into variable labels, are performed by default when you type labcenswdi with option nstr(#). Typed by itself, labcenswdi displays the default variable names along with the variable descriptions.
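To make the default steps concrete, here is a rough manual equivalent for a single variable, written with official Stata commands only. It is a sketch, not labcenswdi's actual implementation. The variable name pop is hypothetical, and the sketch assumes the data were read with insheet and the names option, so that the description row is the first observation.

    . * copy the description held in the first observation into the variable label
    . local desc = pop[1]
    . label variable pop "`desc'"
    . * drop the description row, which is no longer needed
    . drop in 1
    . * convert to numeric; destring leaves the variable as a string if any value is non-numeric
    . destring pop, replace
    . * demote every variable to the smallest storage type that holds its values
    . compress

labcenswdi is meant to spare the user from repeating these steps for every variable in the dataset.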
Note 1: To use labcenswdi on a WDI dataset, the data must be extracted in long form with countries or time in rows and series in columns.
Note 2: labcenswdi requires at least Stata 9.2.
    +---------+
----+ Options +----------------------------------------------------------
nstr(#) specifies the number of identifier (string) variables present in the dataset. These variables are assumed to be at the beginning of the dataset and will not be converted from string to numeric even if they have numeric content. Option nstr(#) is required.
truncby(("text1") [("text2")]) specifies the set or sets of characters to be removed from the variable descriptions. This is important because Stata truncates all labels longer than 80 characters. Up to two sets of characters may be specified (see examples below). Naturally, the two sets of characters must be different. If your sets of characters contain spaces, consider using quotes.
truncwith(("text3") [("text4")]) specifies the set or sets of characters with which "text1" (and "text2") is (are) to be replaced. If truncwith() is not specified, truncby() returns the variable descriptions without "text1" (and "text2").
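The effect of pairing truncby() with truncwith() resembles a substring replacement. The display command below imitates that effect with Stata's subinstr() function. It is only an illustration, not the way labcenswdi is implemented, and the trailing "; Family households" in the description is a hypothetical example.

    . display subinstr("Households: Households with no people under 18 years; Family households", ///
          "Households: Households with no people under 18 years", "HH w/o people <18", 1)

This shortens the description to "HH w/o people <18; Family households". The corresponding labcenswdi call, using the strings from the first example in the Examples section and assuming a dataset with two identifier variables, would be

    . labcenswdi, nstr(2) ///
          truncby(("Households: Households with no people under 18 years")) ///
          truncwith(("HH w/o people <18"))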
repdes((# "text5") [(# "text6")...]) specifies a list of Census variable descriptions to be replaced with user-defined variable descriptions. # corresponds to the #th variable description to be replaced, and text5 (text6, ...) is the replacement text. For example, specifying repdes((1 "Workers 16 and plus in Agr. Sector")) will replace the first Census variable description with the quoted text. Before specifying this option, users are encouraged to use the first syntax to determine the order of the U.S. Census variable descriptions in a dataset. Also, if more than one variable description needs to be replaced, the corresponding order numbers should be specified in ascending order. Option repdes() supersedes option truncby() if both are applied to the same variable descriptions.
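A possible two-step workflow for repdes() is sketched here. It assumes the dataset from the first example in the Examples section is in memory, with two identifier variables. First list the default names and descriptions to find the order numbers, then replace the chosen descriptions, giving the numbers in ascending order.

    . * step 1: display the default variable names and descriptions to find their order
    . labcenswdi
    . * step 2: replace the first and second descriptions
    . labcenswdi, nstr(2) repdes((1 "County Fips Code") (2 "County Name"))

Because newvarlist is omitted in step 2, the default variable names are kept.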
force specifies that non-numeric character values of numeric variables be converted to missing values. If the numerical variables in your dataset take on non-numeric characters such as (D), NA, -, .., and ND, you should specify the force option.
comma specifies that 1000-separator commas be removed from numbers displaying them. You need to specify this option if one or more variables take on values with 1000-separator commas, or those variables will not be converted from string to numeric.
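For comparison, the official destring command offers similar controls for a single variable. The lines below are a sketch only, and whether labcenswdi calls destring internally is an assumption not confirmed here. The variable name hhinc is hypothetical.

    . * ignore(",") strips the commas; force turns remaining non-numeric values such as (D) or NA into missing
    . destring hhinc, replace ignore(",") force

With labcenswdi, both requests can be made for every non-identifier variable at once, here assuming two identifier variables:

    . labcenswdi, nstr(2) force comma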
saving(filename [,sub_option]) specifies that the original variable descriptions be saved to the text file filename. If used, sub_option must be replace, which overwrites an existing file. When option saving() is specified, the file name (path included) is displayed at the end of the process as a link which, if clicked on, displays the file contents.
    +----------+
----+ Examples +---------------------------------------------------------
1) Example with a US Census 2000 dataset
Load a US Census 2000 Summary File 1 dataset
. insheet using http://fmwww.bc.edu/repec/bocode/d/dc_dec_2000_sf1_u_data1.txt, names clear delimiter(|)
. drop geo_id sumlevel
Display all the variables and their descriptions
. labcenswdi
Manage the data by copy-pasting the following lines of code into your do-file editor
. labcenswdi fips county tot_hhs hhu18 fhhu18 fhhu18m fhhu18o fhhu18om fhhu18of ///
      nfhhu18 nfhhu18m nfhhu18f hhn18 fhhn18 fhhn18m fhhn18o fhhn18om fhhn18of ///
      nfhhn18 nfhhn18m nfhhn18f, nstr(2) saving(fem_hh, replace) ///
      truncby(("Households: Households with one or more people under 18 years") ///
      ("Households: Households with no people under 18 years")) ///
      truncwith(("HH w/ 1+ person <18") ("HH w/o people <18")) ///
      repdes((1 "County Fips Code") (2 "County Name") ///
      (8 "HH w/ 1+ person <18; Family HH; Other family; Male householder") ///
      (9 "HH w/ 1+ person <18; Family HH; Other family; Female householder") ///
      (17 "HH w/o people <18; Family HH; Other family; Male HHer no wife present") ///
      (18 "HH w/o people <18; Family HH; Other family; Female HHer; no husband present"))
. describe
2) Example with a long form WDI dataset downloaded with countries or time in rows and series in columns
Convert to numeric all string variables with numerical contents, rename all variables, replace the first three variable definitions (or series descriptors) with new ones, and use all variable definitions as variable labels
. labcenswdi country code year tractsk fertilha gdpcnst gdpcur ///
      gdppg agland irrigpct croplnd popdens popg ruraldens trade ///
      urbpg, nstr(3) force repdes((1 "Name of Countries") ///
      (2 "Country code") (3 "Year"))
-----------------------------------------------------------------------------
P. Wilner Jeanty, Research Economist, Dept. of Agricultural, Environmental, and Development Economics, The Ohio State University
Email: jeanty.1@osu.edu.
Also see