------------------------------------------------------------------------------- help for wdireshape -------------------------------------------------------------------------------
Title
wdireshape --- Reshapes the World Development Indicators (WDI) database for panel data, seemingly unrelated regression, or cross-sectional analysis
    +-------------------+
----+ Table of Contents +-----------------------------------------------
Syntax
General description of wdireshape
Description of the options
Examples
References
Acknowledgments
Author information
-------------------------------------------------------------------------------
Syntax
Check the number of indicators and their order of appearance in a WDI dataset
    wdireshape, sername(varname)
Reshape the dataset
    wdireshape newvarlist, prepend(letter(s)) ctyname(varname) sername(varname) ctycode(varname) sercode(varname) [other options]
options               Description
-------------------------------------------------------------------------
Options
  prepend(letter(s))  indicate the prepending letter(s)
  ctyname(varname)    indicate the variable holding the country names
  sername(varname)    indicate the variable holding the series names
  ctycode(varname)    indicate the variable containing the country code elements
  sercode(varname)    indicate the variable holding the series code elements
  byper(#)            reshape the data using sub-periods
  startyr(#)          indicate the first year of the time period
  endyr(#)            indicate the last year of the time period
  byvar               reshape the data variable by variable
  sur                 reshape in wide form for seemingly unrelated regression analysis
  cros                reshape in wide form for cross-sectional analysis
  nstring(#)          remove the WDI missing value symbols, the double dots (..)
-------------------------------------------------------------------------
    +-------------+
----+ Description +------------------------------------------------------
wdireshape reshapes a Stata dataset obtained by importing, with insheet, a text (.csv) file downloaded from the World Development Indicators (WDI) website or extracted from the WDI CD-ROM. The new dataset has a structure suitable for panel-data analysis, seemingly unrelated regression (SUR), or cross-sectional modeling. The panel-data structure is known as long form; the SUR and cross-sectional structures are known as wide form.
wdireshape is a wrapper for Stata's official reshape command (see reshape) that lets users rename the indicators with names of their own choosing. However, the number of variable names supplied and their order must match those of the indicators in the spreadsheet. After importing the raw dataset, users can run the first syntax of wdireshape to determine the number of indicators and their order of appearance in the dataset.
While reshaping the data, wdireshape places the WDI series descriptors into variable labels and attaches them to the user-supplied variable names. Series descriptors longer than 80 characters are truncated to 80 characters in the labels.
    +-------------------+
----+ Important Remarks +------------------------------------------------
Before extracting a .csv file from the WDI website or a recent CD release, users must choose a data orientation with series or countries in rows and time in columns. A WDI dataset downloaded in this orientation is ready to be imported with insheet because the year column names are already prefixed with the two letters "yr", so no further renaming is needed. However, the WDI missing value symbols, the double dots (..), must be removed. Otherwise, Stata will treat every column containing double dots as string data. wdireshape removes the double dots when the nstring(#) option is specified.
Older CD releases, such as the WDI-2005 CD-ROM, produce .csv files that must be prepared before they are imported. In particular, the years must be prefixed with a letter, which can be done in a spreadsheet or by using the procedure suggested in Baum and Cox (2007). The raw or prepared dataset saved as a .csv file can be imported into Stata using the insheet command as follows:
. insheet using filename.csv, names clear
Data may also be imported into Stata by copying from Excel and pasting directly into the Stata data editor.
    +------------------+
----+ Required options +-------------------------------------------------
prepend(letter(s)) specifies the prepending letter(s). For example, specify prepend(yr) if yr are the prepending letters.
sername(varname) specifies the variable holding the series names.
ctyname(varname) specifies the variable containing country names.
ctycode(varname) specifies the variable containing the country code elements.
sercode(varname) specifies the variable containing the series code elements.
With these required options specified, wdireshape will attempt to reshape the entire dataset at once, which is the default.
    +------------------+
----+ Optional options +-------------------------------------------------
byper(#) requests that wdireshape reshape the dataset 1 year, 5 years, or 10 years at a time, provided the time span contains no gaps. # must be 1, 5, or 10. If 5 or 10 is specified, wdireshape allows for a final subperiod shorter than 5 or 10 years. Stata will also check whether the current memory size is sufficient to reshape the data 5 or 10 years at a time.
startyr(#) specifies the first year of the time period.
endyr(#) specifies the last year of the time period.
Note 1: The byper(#), startyr(#), and endyr(#) options must be specified together.
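To make the idea of subperiods concrete, the following minimal sketch in plain Stata lists the subperiods implied by byper(10) startyr(1960) endyr(2004). It is an illustration only, not wdireshape's own code: the local macro names (first, last, step) are hypothetical, and the splitting rule is an assumption based on the description of byper(#) above.

```stata
* Illustration only: enumerate the subperiods implied by
* byper(10) startyr(1960) endyr(2004). Macro names are hypothetical.
local first 1960
local last 2004
local step 10
forvalues s = `first'(`step')`last' {
    * the final subperiod is cut off at the last year
    local e = min(`s' + `step' - 1, `last')
    display "subperiod: `s'-`e'"
}
```

Under these assumptions the loop displays four full decades, 1960-1969 through 1990-1999, followed by a shorter final subperiod, 2000-2004.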
byvar specifies that the dataset be reshaped one variable at a time, as proposed by Kossinets (2006). The byvar option may not be combined with byper(#), startyr(#), or endyr(#).
Due to memory issues, reshaping large datasets at once may not be successful. In such a case, Stata will prompt the user to specify the byvar or byper(#) option, or to increase the amount of memory allocated to Stata. Note that you can reset the size of memory only if you are using Stata/MP, Stata/SE, or Stata/IC.
sur requests a wide form suitable for SUR analysis (see sureg). By default, the dataset is reshaped in long form for panel-data analysis (see xtreg). When the sur option is specified, the countries are represented in the reshaped dataset by the suffixes c1, c2, and so on, which are appended to the user-supplied variable names. Describe the reshaped dataset to find out which countries c1, c2, ..., cn represent. In Stata 10 or higher, you can simply look at the variable labels in the Variables window. The SUR-reshaped structure displays the years in rows and the variables, for each country, in columns.
cros requests a wide form amenable to cross-sectional analysis. The CROS-reshaped structure displays the country names in rows and the variables, for each year, in columns. cros may not be combined with sur.
Note 2: When the sur or cros option is specified, Stata will issue an error if the resulting number of variables exceeds its limits, which are 99 for Small Stata, 2,047 for Intercooled Stata, and 32,767 for Stata/MP and Stata/SE.
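The three layouts described above can be compared on a toy dataset using Stata's official reshape command. This is a sketch under stated assumptions, not wdireshape's own code: the data values and country codes are invented, and the wide variable names produced here (for example gdpcapAAA) differ from the c1, c2, ... suffixes that wdireshape generates with sur.

```stata
* Toy data, invented for illustration: two countries, two years, two indicators
clear
input str3 countrycode year gdpcap trade
"AAA" 2000 1.0 10
"AAA" 2001 1.5 11
"BBB" 2000 2.0 20
"BBB" 2001 2.5 21
end

* Long form (the default layout): one row per country-year
list, sepby(countrycode)

* Wide form in the spirit of cros: one row per country,
* one column per variable-year (gdpcap2000, gdpcap2001, ...)
preserve
reshape wide gdpcap trade, i(countrycode) j(year)
list
restore

* Wide form in the spirit of sur: one row per year,
* one column per variable-country (gdpcapAAA, gdpcapBBB, ...)
reshape wide gdpcap trade, i(year) j(countrycode) string
list
```

In both wide forms the number of columns grows with the number of years or countries, which is why the variable limits in Note 2 can become binding.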
nstring(#) indicates that the dataset contains the WDI missing value symbols, the double dots (..), and that they should be removed. # is the number of identifier variables in the dataset. For example, nstring(4) must be specified when the dataset includes names and code elements for both countries and series as identifier variables. If an error occurs for any reason while nstring(#) is specified, the dataset to be reshaped must be reloaded before wdireshape is run again. Otherwise, Stata will abort with a type-mismatch error.
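For readers who want to see what removing the double dots amounts to, here is a minimal sketch using only official Stata commands. It is not wdireshape's own code, and it assumes the year columns are named yr1960, yr1961, and so on (hypothetical names matching the prepend(yr) convention).

```stata
* Illustration only, using official Stata commands; not wdireshape's own code.
* Assumes the year columns are named yr1960, yr1961, ... (hypothetical names).
foreach v of varlist yr* {
    capture confirm string variable `v'
    if _rc == 0 {
        * blank out the WDI missing-value symbol, then convert to numeric
        replace `v' = "" if trim(`v') == ".."
        destring `v', replace
    }
}
```

Blanking cells that are exactly ".." is a deliberate choice here: telling destring to ignore the period character would also strip decimal points from valid numbers.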
Note 3: For large datasets, and depending on your computer's performance, reshaping 10 years at a time (provided the time period spans at least 10 years) may be faster than reshaping variable by variable. However, the byper(#) option will not work when the time period contains gaps.
    +----------+
----+ Examples +---------------------------------------------------------
Check the number of indicators and their order of appearance in a WDI dataset
. wdireshape, sername(seriesname)
Reshape the entire dataset at once
. wdireshape v1-v10, prepend(yr) ctyname(countryname) sername(seriesname) ctycode(countrycode) sercode(seriescode)
. wdireshape gdpcap trade invest cropland tractor popdensity arable, prepend(yr) ctyname(countryname) ///
      sername(seriesname) ctycode(countrycode) sercode(seriescode)
Reshape variable by variable
. wdireshape export import gdpcur gdpcnst foodindx irrig, prepend(yr) ctyname(countryname) ///
      sername(seriesname) ctycode(countrycode) sercode(seriescode) byvar
Reshape 5 years at a time
. wdireshape myvar1-myvar16, prepend(yr) ctyname(countryname) sername(seriesname) ctycode(countrycode) ///
      sercode(seriescode) byp(5) start(1960) end(2004)
Reshape 10 years at a time, but for SUR analysis
. wdireshape myvar1-myvar16, prepend(yr) ctyname(countryname) sername(seriesname) ctycode(countrycode) ///
      sercode(seriescode) byp(10) start(1960) end(2004) sur
Note 4: To obtain series of averages, first run wdireshape without specifying the sur or cros option, and then run paverage (see paverage if installed).
References
Baum, C.F. and N.J. Cox. 2007. Getting those Data into Shape. Stata Journal 7: 268-271.
Kossinets, G. 2006. http://www.columbia.edu/acis/eds/data_tools/tips/reshape_manyvar.do
The World Bank Group. 2009. World Development Indicators (WDI) Online. http://publications.worldbank.org/WDI/
Acknowledgments
Thanks to Kit Baum for useful comments. Thanks to Simon Feeny for pointing out the need to allow for quotes in the WDI series descriptors.
Author information
P. Wilner Jeanty, Dept. of Agricultural, Environmental, and Development Economics, The Ohio State University
Email jeanty.1@osu.edu with any comments or suggestions.
Also see
Manual: [D] reshape