Proposal for a Paper
"Enhancing Access to Information in Economics via the RePEc Project"
for the Journal of Economic Perspectives

Christopher F. Baum (Boston College)
Sune Karlsson (Stockholm School of Economics)
Thomas Krichel (University of Surrey)
Christian Zimmermann (Université du Québec à Montréal)

January 2000

We propose to discuss initiatives that enhance free information access via the Internet for researchers in economics. The Journal of Economic Perspectives has already made a distinctive contribution to that topic by publishing the first paper about economics resources on the Internet (Goffe, 1994). Although much of what is contained in that paper is now dated, the pioneering aspect of that publication cannot be overestimated. In 1997 the Journal published a related paper by Goffe and Parks, with a narrower focus on the consequences of technical and social innovations that will affect the production and consumption of economic research. While much of the technical infrastructure described in that paper is a reality today, much of the social environment has not caught up with these technological innovations.

In this paper, we aim to narrow the focus somewhat further by discussing an ongoing initiative that enhances the scholarly communications process in economics. Via this initiative, researchers' departments and research institutes may participate in a decentralized archival scheme which makes information about their research papers (working papers or preprints) accessible via the Internet. If the papers are accessible on the Internet, the archive may also contain the appropriate addresses from which they may be downloaded. The downloadable information can include not only the paper, in a variety of formats, but also ancillary files associated with the paper, such as datasets and computer programs.
The scheme may also be used for the distribution of software components that their authors wish to share, even if the components are not associated with a specific research paper. Our emphasis in describing this scheme differs from that of Goffe and Parks (1997): rather than focusing on processes that are technically feasible but have not yet been widely implemented, we discuss current developments that are being used today by over 100 departments and research institutes worldwide.

A strong rationale for improving the scholarly communications process in economics lies with the sizable publication delays common in leading economics journals. A paper commonly takes over three years from submission to publication in an academic journal (not counting rejections). The timely dissemination of research has therefore become increasingly difficult, and economists cannot rely on journals to keep abreast of the frontier of research. Prepublication through working papers or discussion papers is now commonplace. The dissemination of those materials has been greatly enhanced by researchers' widespread establishment of Internet Web and FTP servers.

However, this causes a long-standing problem in information retrieval to reemerge: given that there are tens of millions of accessible Web pages, any search engine will have limited success in locating a specific author or paper, let alone a specific topic, without the confusion of thousands of irrelevant 'hits'. Even the establishment of classification schemes such as EDIRC (Christian Zimmermann's list of Economics Departments, Institutes and Research Centers on the Internet at http://ideas.uqam.ca/EDIRC/) will only help a researcher locate a specific department; if the author has relocated, or has circulated the research paper through her coauthor's department or a research institute, it may be very difficult to track down a paper.
A dissertation student embarking on an on-line literature search for papers related to her proposed topic faces an even greater challenge. Various researchers have constructed and posted their own lists of "useful links," but these lists are highly idiosyncratic.

Perhaps the greatest drawback to a fully decentralized scheme, in which every department (or individual within a department) "publishes" its own Web-accessible materials, is the tendency for current research to circulate within a small circle of insiders in the absence of broader publicity for specific papers. To widen the circle and improve access by all researchers to relevant materials, it is important to improve access to working papers--and to do so in a way that does not require a potential user to subscribe and pay for the service. Access to these materials for less advantaged researchers in LDCs is an important component of the strategy; many graduate programs are benefiting today from a supply of well-tooled students from Third World countries, and access to leading research will enhance economic skills in those countries and reduce the "brain drain" resulting from inadequate research facilities.

The need to improve scholarly communication is not unique to economics, and other disciplines have adopted different models. For instance, a centralized archive is heavily used in high-energy physics (http://xxx.lanl.gov), and to a lesser degree in computer science and mathematics. Such an archive involves a significant infrastructure and staff. The impediments to the creation of that framework in our discipline are not technical, but economic in nature. There is little incentive for any single institution to bear the cost of establishing such a digital library, cataloging and housing a vast and rapidly-expanding collection of research materials, and providing the technical means for its dissemination throughout the Web.
However, since every institution will benefit from participation in such an effort, we may solve this incentive problem by creating a virtual collection via a network of linked metadata archives. Each institution need only maintain its own collection of metadata describing research and instructional materials using a set of standardized templates--a modest and affordable effort. This idea is not new; library scientists developed such methods (such as the MARC record format) decades ago, and the concept underlies the worldwide interchange of information on cataloged books, journals and media.

Free information access to many institutions' working papers and publications is being provided today by the RePEc project. The acronym RePEc stands for Research Papers in Economics. However, the database also contains records on authors and institutions. It functions in a very decentralized way: a central archive holds minimal information about the other participating archives (basically where they are hosted on the Internet). Each of the other archives holds, on its FTP or HTTP server, the necessary bibliographic information in a common format, including links to online versions of the texts, if available.

The participating archives have full authority over their data. For example, they can withdraw a report or post a revision at will, as their information is housed on their own server. This is a major advantage of the decentralized archival scheme, since it avoids the coordination problems inherent in an archival system where a central staff must process each request for the alteration of the metadata. In our system, all the metadata are held locally, and changes to those locally controlled data are immediately propagated through the system.

Close to 120 archives in 20 countries currently participate in RePEc, some of them representing several institutions.
About 80 universities contribute their working papers, including Berkeley, Boston College, Brown University, Maryland, MIT, Iowa, Iowa State, Ohio State, UCLA, and Virginia. The RePEc collection also contains information on all NBER Working Papers, CEPR Discussion Papers, the contents of the Fed in Print, Bank of England Working Papers, and several paper series from the IMF and OECD, as well as the contributions of many other research centers and institutes worldwide.

The bibliographic templates describing each item currently provide for papers, articles, and software components. The article templates are used to fully describe published articles, and are currently in use by Econometrica, Journal of Applied Econometrics, Canadian Journal of Economics, Federal Reserve Bulletin, RAND Journal of Economics and IMF Staff Papers, to name only a few of the participating journals. Participation does not imply that the articles are freely available, but if the user's institution has made the requisite arrangements with publishers (e.g. JSTOR for back issues of Econometrica or Journal of Applied Econometrics), RePEc will contain the links to access the material.

The RePEc collection of metadata also contains links to several hundred "software components"--functions, procedures, or code fragments in the Stata, Mathematica, MATLAB, Octave, GAUSS, Ox, and RATS languages, as well as code in FORTRAN, C and Perl. The ability to catalog and describe software components allows users of these languages to search for code applicable to their problem--even if it is written in a different language. Language-specific software archives, such as those maintained by individual software vendors or volunteers, do not share that breadth. Since many programs in high-level languages may be readily translated from, say, GAUSS to MATLAB, this breadth may be very welcome to the user.
And, of course, the description of each software component will contain the link to retrieve it from the archive.

Although RePEc constitutes a "virtual archive" of metadata describing tens of thousands of working papers, journal articles and software components, it does not provide a user interface per se. The RePEc metadata are in the public domain, so that a variety of "user services" may be established which take these data and present them on the Web with different user interfaces. The only restriction placed by the RePEc team on use of the metadata is that they may not be sold or incorporated into a product that is sold.

RePEc services such as IDEAS (http://ideas.uqam.ca/) regularly retrieve the information at each archive, process it into a database and make it available on the Web, where it can be browsed or searched. IDEAS is the most popular of a dozen services using part or all of these data, receiving over 500,000 page hits a month. Its database includes about 62,000 working papers, 14,000 articles and over 400 software components from almost 1,000 publication series, with over 21,000 items available online.

RePEc also provides a free current notification service, New Economics Papers (NEP), that disseminates information on the latest online entries in the RePEc database through email. NEP is a family of about 40 field-specific mailing lists maintained by specialists in those fields. Usually, it takes less than a week from the moment a paper appears on the Web until its presence is announced in email. To view the available field-specific lists and register for a free subscription, please visit http://netec.mcc.ac.uk/NEP/.

How your department may participate in RePEc

If your department chooses to contribute to the RePEc database, the bibliographic data for your working paper series will reside on your server (either an anonymous FTP server or an HTTP server). You have total control over the contents, and how frequently they are updated.
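
As a rough illustration of how a user service might gather the data held on participating servers and merge them into one database, the Python sketch below parses plain-text records and indexes them by handle. It is only a sketch under stated assumptions, not the RePEc software itself: it assumes each record is a small block of "Field: value" lines like the example template in this section, it works on in-memory strings rather than fetching files over FTP or HTTP, and the function names, archive names and handle values are hypothetical.

```python
from collections import defaultdict


def parse_template(text):
    """Parse one 'Field: value' record into a dict mapping field -> list of values.

    Repeated fields (e.g. several Author-Name lines) are kept in order.
    """
    record = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs in the value stay intact.
        field, value = line.split(":", 1)
        record[field.strip()].append(value.strip())
    return dict(record)


def build_database(archives):
    """Merge the records of every archive into one dict keyed by handle.

    `archives` maps a (hypothetical) archive name to the list of record
    texts found on that archive's server.
    """
    database = {}
    for archive_name, templates in archives.items():
        for text in templates:
            record = parse_template(text)
            handles = record.get("Handle")
            if not handles:
                continue  # without a handle the record cannot be integrated
            record["archive"] = archive_name  # remember where it came from
            database[handles[0]] = record
    return database


# Hypothetical data standing in for files gathered from two servers.
archives = {
    "dept-a": [
        "Author-Name: A. Author\n"
        "Author-Name: B. Coauthor\n"
        "Title: First Paper\n"
        "File-URL: http://example.org/papers/1.pdf\n"
        "Handle: example:dept-a:0001\n"
    ],
    "dept-b": [
        "Author-Name: C. Writer\n"
        "Title: Second Paper\n"
        "Handle: example:dept-b:0001\n"
    ],
}
database = build_database(archives)
```

Because each archive's records are re-read on every run, a withdrawn or revised paper simply disappears from or changes in the merged database the next time the service gathers the data; no request to a central staff is involved.
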
The RePEc templates describing each paper are plain text files, with a few required entries, e.g.:

    Author-Name: Bill Clinton
    Author-Email: president@whitehouse.gov
    Title: My Foreign Policy
    Abstract: This text is a summary of my policy...
    Series: Name of the series
    File-URL: where the material is kept on the Web
    Handle: a code that will permit RePEc software to integrate this template

These files hold all the necessary information about the paper series and its contents. If a paper is online, its address is included. These text files are then regularly gathered by the various RePEc scripts so that your information can be merged into the RePEc database and made available through services such as IDEAS and NEP.

Note that participating in RePEc is totally free, and will remain so. You do not have to send any material anywhere; you need only maintain your template files on your server. Your online papers stay on your server, since RePEc only links to them. You retain the liberty to post revisions or to withdraw a paper.

RePEc provides a library of scripts that can generate nicely-formatted HTML listings from the text file listings of the papers for your departmental Web pages. Alternatively, you may just link to the IDEAS output.

All the details for setting up these template files are provided at http://ideas.uqam.ca/ideas/stepbystep.html. Additional help is available on the IDEAS Web page. You may also contact members of the RePEc team for any assistance or advice, via email to repec@netec.mcc.ac.uk or zimmermann.christian@uqam.ca.

References

Goffe, William L. (1994), "Computer network resources for economists," Journal of Economic Perspectives, Vol. 8, No. 3, 97-119.

Goffe, William L. and Parks, Robert P. (1997), "The future information infrastructure in economics," Journal of Economic Perspectives, Vol. 11, No. 3, 75-94.