Proposal for a Paper

"Enhancing Access to Information in Economics via the RePEc Project" 

for the Journal of Economic Perspectives 

Christopher F. Baum (Boston College)
Sune Karlsson (Stockholm School of Economics)
Thomas Krichel (University of Surrey)
Christian Zimmermann (Université du Québec à Montréal)

January 2000

We propose to discuss initiatives that enhance free information
access via the Internet for researchers in economics. The Journal of
Economic Perspectives has already made a
distinctive contribution to that topic by publishing the first paper
about economics resources on the Internet (Goffe (1994)). Although
much of what is contained in that paper is now dated, the
pioneering aspect of that publication cannot be
overestimated.  In 1997 the Journal published a related paper by
Goffe and Parks, with a narrower focus concentrating on
the consequences of technical and social innovations that will
impact the production and consumption of economic research. While
much of the technical infrastructure described in that paper
is a reality today, much of the social environment has not caught up
with these technological innovations.

In this paper, we aim to narrow the focus somewhat further by
discussing an ongoing initiative that enhances the scholarly
communications process in economics. Via this initiative,
researchers' departments and research institutes may participate in
a decentralized archival scheme which makes information about their
research papers (working papers or preprints) accessible via the
Internet. If the papers are accessible on the Internet, the archive
may also contain the appropriate addresses from which they may be
downloaded. The downloadable information can include not only the
paper, in a variety of formats, but ancillary files associated with
the paper, such as datasets and computer programs. The scheme may
also be used for the distribution of software components that their
authors wish to share, even if the components are not associated
with a specific research paper. Our emphasis in describing this
scheme differs from that of Goffe and Parks (1997); rather than
focusing on processes that are technically feasible but have not yet
been widely implemented,  we discuss current developments that are
being used today by over 100 departments and research institutes
worldwide.

A strong rationale for improving the scholarly communications
process in economics lies with the sizable publication delays common
in leading economics journals. It is common that a paper takes over
three years from submission to publication in an academic journal
(not counting rejections). The timely dissemination of research has
therefore become increasingly difficult.  For this reason,
economists cannot rely on journals to keep abreast of the frontier
of research.  Prepublication through working papers or discussion
papers is now commonplace. The dissemination of those materials has
been greatly enhanced by researchers' widespread establishment of
Internet Web and FTP servers. However, this causes a long-standing
problem in information retrieval to reemerge: given that there are
tens of millions of accessible Web pages, any search engine will
have limited success in locating a specific author or paper, let
alone a specific topic, without the confusion of thousands of
irrelevant 'hits'. Even the establishment of classification schemes
such as EDIRC (Christian Zimmermann's list of Economics Departments,
Institutes and Research Centers on the Internet at
http://ideas.uqam.ca/EDIRC/) will only help a researcher locate a
specific department; if the author has relocated, or has circulated
the research paper through her coauthor's department or a research
institute, it may be very difficult to track down a paper. A
dissertation student embarking on an on-line literature search for
papers related to her proposed topic faces an even greater
challenge. Various researchers have constructed and posted their own
lists of "useful links," but these lists are highly idiosyncratic.
Perhaps the greatest drawback to a fully decentralized scheme, in
which every department (or individual within a department)
"publishes" their own Web-accessible materials, is the tendency for
current research to circulate within a small
circle of insiders in the absence of broader publicity for specific
papers. To widen the circle and improve access by all researchers to
relevant materials, it is important to improve access to
working papers--and to do so in a way that does not require a
potential user to subscribe to and pay for the service. Access to these
materials for less advantaged researchers in less developed
countries (LDCs) is an important component of the strategy; many
graduate programs are
benefiting today from a supply of well-tooled students from Third
World countries, and access to leading research will enhance
economic skills in those countries, and reduce the "brain drain"
resulting from inadequate research facilities.

The need to improve scholarly communication is not unique to
economics, and in many other disciplines different models have been
adopted. For instance, a centralized archive is heavily used in
high-energy physics (http://xxx.lanl.gov), and to a lesser degree in
computer science and mathematics. Such an archive involves a
significant infrastructure and staff. The impediments to creation of
that framework in our discipline are not technical, but economic in
nature. There is little incentive for any single institution to bear
the cost of establishing such a digital library, cataloging and
housing a vast and rapidly-expanding collection of research
materials, and providing the technical means for its dissemination
throughout the Web. However, since every institution will benefit
from participation in such an effort, we may solve this incentive
problem by creating a virtual collection via a network of linked
metadata archives. Each institution need only maintain
its own collection of metadata describing research and
instructional materials using a set of standardized templates--a
modest and affordable effort. This idea is not new; library
scientists developed such methods (such as the MARC
record format) decades ago, and the concept underlies the worldwide
interchange of information on cataloged books,
journals and media. 

Free information access to many institutions' working papers and
publications is being provided today by the RePEc project. 
The acronym RePEc stands for Research Papers in Economics.  However,
the database also contains records on authors and institutions.  It
functions in a very decentralized way: a central archive holds
minimal information about the other participating archives
(basically where they are hosted on the Internet). Each of the other
archives holds, on its FTP or HTTP server, the necessary
bibliographic information in a common format, including
links to online versions of the texts, if available. The
participating archives have full authority over their data. For
example, they can withdraw a report or post a revision at
will, as their information is housed on their own server. This is a
major advantage of the decentralized archival scheme, since it
avoids the coordination problems inherent in an archival system
where a central staff must process each request for the alteration
of the metadata. In our system, all the metadata are held locally,
and changes to those locally controlled data are immediately
propagated through the system.
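
The division of labor described above can be illustrated with a
minimal Python sketch. It is an illustration under stated assumptions,
not the project's actual software: the archive codes, addresses,
handles and record fields are hypothetical, and the in-memory
dictionary SERVERS stands in for the FTP or HTTP servers that
participating archives would run.

```python
# Hypothetical stand-ins for remote servers: location -> that archive's records.
SERVERS = {
    "ftp://archive-a.example/": [{"handle": "a:wp:001", "title": "Paper One"}],
    "http://archive-b.example/": [{"handle": "b:wp:001", "title": "Paper Two"}],
}

# The central archive stores only where each participating archive is hosted.
CENTRAL_REGISTRY = {
    "a": "ftp://archive-a.example/",
    "b": "http://archive-b.example/",
}


def fetch(location):
    """Stand-in for an FTP/HTTP download; returns the archive's current records."""
    return list(SERVERS[location])


def harvest(registry):
    """Build a combined view by visiting every archive listed in the registry."""
    combined = {}
    for code, location in sorted(registry.items()):
        for record in fetch(location):
            # Copy the record and note which archive it came from.
            combined[record["handle"]] = dict(record, archive=code)
    return combined


db = harvest(CENTRAL_REGISTRY)

# Archive "a" withdraws its paper on its own server; the registry is untouched.
SERVERS["ftp://archive-a.example/"].clear()
db_after = harvest(CENTRAL_REGISTRY)

print(sorted(db))
print(sorted(db_after))
```

In this sketch the withdrawal made on archive a's own server is
reflected as soon as the data are gathered again, without any change
to the central registry and without any request to a central staff.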

Close to 120 archives in 20 countries currently participate in
RePEc, some of them representing several institutions. About 80
universities contribute their working papers, including Berkeley,
Boston College, Brown University, Maryland, MIT, Iowa, Iowa State,
Ohio State, UCLA, and Virginia. The RePEc collection also contains
information on all NBER Working Papers, CEPR Discussion Papers, the
contents of the Fed in Print, Bank of England Working Papers, and
several paper series from the IMF and OECD, as well as the
contributions of many other research centers and institutes
worldwide.

The bibliographic templates describing each item currently provide for
papers, articles, and software components. The article templates are
used to fully describe published articles, and are currently in use
by Econometrica, Journal of Applied Econometrics, Canadian Journal
of Economics, Federal Reserve Bulletin, RAND Journal of Economics
and IMF Staff Papers, to name only a few of the participating
journals. Participation does not imply that the articles are freely
available, but if the user's institution has made the requisite
arrangements with publishers (e.g. JSTOR for back issues of
Econometrica or Journal of Applied Econometrics), RePEc will contain
the links to access the material.

The RePEc collection of metadata also contains links to several
hundred "software components"--functions, procedures, or code
fragments in the Stata, Mathematica, MATLAB, Octave, GAUSS, Ox, and
RATS languages, as well as code in FORTRAN, C and Perl. Cataloging
and describing software components allows users of these
languages to search for code applicable to their
problem--even if it is written in a different language.
Language-specific software archives, such as those maintained by
individual software vendors or volunteers, do not share that
breadth. Since many programs in high-level languages may be readily
translated from, say, GAUSS to MATLAB, this breadth may be very
welcome to the user. And, of course, the description of each
software component will contain the link to retrieve it from the
archive.

Although RePEc constitutes a "virtual archive" of metadata
describing tens of thousands of working papers, journal articles and
software components, it does not provide a user interface per se.
The RePEc metadata are in the public domain, so that a variety of
"user services" may be established which take these data and present
them on the Web with different user interfaces. The only restriction
placed by the RePEc team on use of the metadata is that they may not
be sold or incorporated into a product that is sold. 

RePEc services like IDEAS (http://ideas.uqam.ca/) regularly
retrieve the information at each archive, process it into a database
and make it available on the web, where it can be browsed or searched.
IDEAS is the most popular of a dozen services using part or all of
this data, receiving over 500,000 page hits a month. Its database
includes about 62,000 working papers, 14,000 articles and over 400
software components from almost 1,000 publication series, with over 
21,000 items  available online. 

RePEc also provides a free current notification service, New
Economics Papers (NEP), that disseminates information on the latest
online entries in the RePEc database through email. NEP is a family
of about 40 field-specific mailing lists maintained by specialists
in those fields. Usually, it takes less than a week from the moment
a paper appears on the web until its presence is announced in email.
To view a list of the available field-specific lists and register
for a free subscription, please visit  http://netec.mcc.ac.uk/NEP/.

How your department may participate in RePEc

If your department chooses to contribute to the RePEc database, the
bibliographic data for your working paper series will reside on your
server (either an anonymous FTP server or an HTTP server).  You have
total control over the contents, and how frequently they are
updated. The RePEc templates describing each paper are plain text
files, with a few required entries, for example:

Author-Name: Bill Clinton 
Author-Email: president@whitehouse.gov
Title: My Foreign Policy 
Abstract: This text is a summary of my policy...
Series: Name of the series
File-URL: where the material is kept on the Web
Handle: a code that will permit RePEc software to integrate this
template
 
These files hold all the necessary information about the paper
series and its contents. If a paper is online, its address is
included. These text files are then regularly gathered by the
various RePEc scripts so that your information can be merged into the
RePEc database and made available through
services such as IDEAS and NEP.  Note that participating in RePEc is
entirely free, and will remain so. You do not have to send any
material anywhere; you need only maintain your template files on
your server. Your online papers stay on your server, since RePEc
only links to them. You retain the liberty to post revisions or to
withdraw a paper. RePEc provides a library of scripts that can
generate nicely formatted HTML listings of your papers from the
template files for your departmental web pages. Alternatively, you may
simply link to the IDEAS output. All the details for setting up these
template files are provided at
http://ideas.uqam.ca/ideas/stepbystep.html.
Additional help is available on the IDEAS web page. You may also
contact members of the RePEc team for any assistance or advice, via
email to  repec@netec.mcc.ac.uk or zimmermann.christian@uqam.ca.
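
To show how little machinery a plain-text template requires, here is
a minimal Python sketch that reads the example template above and
produces one HTML list item from it. It is not one of the RePEc
scripts. It assumes, for illustration only, that each field starts a
line as a name made of letters and hyphens followed by a colon, and
that any other non-blank line continues the preceding field; the
function names parse_template and to_html_item are hypothetical.

```python
import re
from collections import defaultdict
from html import escape

# Assumed field syntax: "Name:" at the start of a line, then whitespace and a value.
FIELD = re.compile(r"^([A-Za-z][A-Za-z-]*):(?:\s+(.*))?$")


def parse_template(text):
    """Parse one 'Field: value' template into {field: [value, ...]}."""
    record = defaultdict(list)
    current = None  # name of the field most recently seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            current = match.group(1)
            record[current].append((match.group(2) or "").strip())
        elif current is not None:
            # Continuation line: fold it into the previous field's value.
            record[current][-1] = (record[current][-1] + " " + line).strip()
    return dict(record)


def to_html_item(record):
    """Render a parsed template as a single HTML list item."""
    title = escape(record.get("Title", ["(untitled)"])[0])
    authors = escape(", ".join(record.get("Author-Name", [])))
    return f"<li>{authors}: <i>{title}</i></li>"


EXAMPLE = """\
Author-Name: Bill Clinton
Author-Email: president@whitehouse.gov
Title: My Foreign Policy
Abstract: This text is a summary of my policy...
Series: Name of the series
File-URL: where the material is kept on the Web
Handle: a code that will permit RePEc software to integrate this
template
"""

record = parse_template(EXAMPLE)
print(record["Handle"][0])
print(to_html_item(record))
```

Values are kept in lists so that a repeated field, such as several
Author-Name lines for a coauthored paper, is preserved in order. The
wrapped Handle line in the example is folded back into a single value.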

References

Goffe, William L. (1994), "Computer network resources for
economists", Journal of Economic Perspectives, Vol. 8, No. 3, 97-119.

Goffe, William L. and Parks, Robert P. (1997), "The future
information infrastructure in economics", Journal of Economic
Perspectives, Vol. 11, No. 3, 75-94.