Introductory Econometrics datasets

This file describes the data sets accompanying Introductory Econometrics: A Modern Approach, by Jeffrey M. Wooldridge, published by South-Western College Publishing, 2000.

The raw data sets are stored as ASCII text in files ending in .RAW. Each row of a file represents a separate observation, and each column represents a different variable. (The number of columns in each file therefore equals the number of variables listed in the corresponding .DES file, which I describe below.) This is a standard format for storing data, and it should allow you to read the files with any spreadsheet or econometrics software package.
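As a rough illustration of the row-and-column layout, here is a minimal Python sketch that reads records and checks that each row has one column per variable name. It assumes the columns are separated by whitespace, which this file does not state explicitly. The function name read_raw, the variable names, and the sample values are made up for illustration. Values are left as text, and quoted strings containing spaces are not handled.

```python
import io

def read_raw(handle, names):
    """Read whitespace-separated records into a list of dicts keyed by variable name."""
    rows = []
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError(
                f"line {lineno}: expected {len(names)} columns, got {len(fields)}"
            )
        rows.append(dict(zip(names, fields)))
    return rows

# Hypothetical two-observation file with three variables (made-up values).
sample = io.StringIO("3.10 11 2\n3.24 12 22\n")
rows = read_raw(sample, ["wage", "educ", "exper"])
print(len(rows), rows[0])
```

With a real file, the StringIO object would be replaced by an open file handle, and the list of names would come from the corresponding .DES file.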

For time series data sets, the ordering of the observations is chronological, with the earliest time period in the first row and the most recent time period in the last row.

The independently pooled cross sections are arranged so that the cross section for the first year is followed by the cross section for the second year, and so on. The total number of rows is the sum of the cross-sectional sample sizes across all years.
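The stacking of pooled cross sections can be sketched as follows. Given the sample size for each year, the code computes which rows belong to which year and confirms that the total is the sum of the sizes. The years and sample sizes are hypothetical, and year_slices is an illustrative helper, not part of any package.

```python
from itertools import accumulate

def year_slices(sizes):
    """Map each year to its (start, stop) row range: 0-based, stop exclusive."""
    stops = list(accumulate(sizes.values()))
    starts = [0] + stops[:-1]
    return {year: (a, b) for year, a, b in zip(sizes, starts, stops)}

# Hypothetical sample sizes, listed in chronological order.
sizes = {1978: 3, 1985: 4}
slices = year_slices(sizes)
total_rows = sum(sizes.values())
print(slices, total_rows)
```

The row ranges can then be used to pull out a single year's cross section from the full list of records.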

Most panel data sets are stored as described in Chapter 13, where the observations for all time periods on the first cross-sectional unit are followed by all observations for the second unit, and so on. Within each cross-sectional unit, the time periods are chronological, with the earliest period coming first. For data sets arranged in this way, the total number of records is NT, where N is the number of cross-sectional units and T is the number of time periods for each unit. A few two-period panel data sets have the observations for both time periods in the same record, in which case the number of observations is simply the number of cross-sectional units. See Chapter 13 for more details.
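For the stacked panel layout, the position of a record determines its cross-sectional unit and time period. The sketch below assumes a balanced panel with exactly T records per unit, as described above, and uses 0-based indices. The function names are illustrative only.

```python
def unit_and_period(row, T):
    """Return (unit index, period index) for a 0-based row in an N*T stacked panel."""
    return divmod(row, T)

def group_by_unit(rows, T):
    """Split stacked records into one list of T records per cross-sectional unit."""
    if len(rows) % T != 0:
        raise ValueError("number of records is not a multiple of T")
    return [rows[i:i + T] for i in range(0, len(rows), T)]

# Hypothetical panel with N = 2 units and T = 3 periods; records are placeholders.
records = ["u0t0", "u0t1", "u0t2", "u1t0", "u1t1", "u1t2"]
units = group_by_unit(records, T=3)
print(len(units), unit_and_period(4, T=3))
```

Grouping with T = 2 gives one pair of records per unit, which is the information held in a single record by the two-period data sets that store both periods together.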

Each raw data file comes with a description file, with a .DES suffix. The first entry in each .DES file is a list of the variable names, as given in the text, that can be used in a statement to read the data into an econometrics package. Next comes the number of observations (rows in the data set, denoted "Obs."), followed by a brief description of each variable.

All missing data are denoted by "." and string variables are enclosed in double quotes. All data files were exported from Stata files, so some precision may have been lost to rounding. This should not materially affect any of the results.
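The two conventions above can be handled when parsing a record, as in the following Python sketch. It assumes that values are separated by whitespace, that quoted strings contain no embedded double quotes, and that every unquoted value other than "." is numeric. None of these assumptions is stated in this file, and the sample line is made up.

```python
import re

# A token is either a double-quoted string or a run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def parse_raw_line(line):
    """Parse one record: quoted text stays a string, '.' becomes None, the rest become floats."""
    values = []
    for match in TOKEN.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        if quoted is not None:
            values.append(quoted)       # string variable, quotes removed
        elif bare == ".":
            values.append(None)         # missing value
        else:
            values.append(float(bare))  # numeric value
    return values

# Made-up record: a number, a missing value, a string, and another number.
print(parse_raw_line('  3.5  .  "New York"  12'))
```

Because the quoted case is checked first, a string variable whose value happens to be "." is kept as text rather than treated as missing.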

Many of the data sources are described in the text, but several are not. (The ones not described typically did not come from published papers or books.) Eventually, each raw data file will have a corresponding .SRC file that provides, where possible, the exact source of the data. Some of the .SRC files will also contain suggestions as to how to use the data sets.

I have included a few data sets, such as APPLE.RAW (telephone survey data on demand for regular and ecologically friendly apples) and TRAFFIC2.RAW (data on traffic accidents in California), that are not explicitly used in either an example or the problems. This gives you a few additional data sets for class projects that the students will not have already worked with.

I hope you find the data sets to be useful. If you have questions regarding the data sets, please feel free to contact me at wooldri1@pilot.msu.edu.

Jeff Wooldridge

Annotated Listing of Data Sets available in Stata 6.0 format