RePEc Team meeting, 12 December 2003

A meeting of RePEc Team members was held at the University of Connecticut, Storrs on 12 December 2003.
Present: Kit Baum, Sune Karlsson, Thomas Krichel, Christian Zimmermann (host).

1. Experience with new maintainers

KB spoke of a variety of difficulties that new archive maintainers (NAMs) are encountering, which usually involve repeated emails with him or with CZ or SK. One issue relates to the team's desire to prevent proliferation of archives, particularly within an institution; KB recounted that on several recent occasions maintainers were adamant that they could not have a working relationship with an existing archive maintainer in a separate administrative unit, and needed to have a second archive. To some degree, this may reflect local IT restrictions on access to web- or ftp-server directories, which may preclude multiple users from having rw access to the relevant directories. It was agreed that we should continue to dissuade multiplication of archives wherever feasible, and continue to refuse requests for purely personal archives.

KB also expressed concern that the existing 'step by step' instructions to which he refers NAMs are not on the "RePEc for Dummies" level that is perhaps appropriate. In particular, we need detailed step by step instructions for dealing with the "non-indexable HTTP directories" problem, which most commonly arises with NAMs running Microsoft IIS, but has occurred with Apache as well. Most NAMs do not have sufficient privilege over their servers to be able to install appropriate per-directory privileges (which would, in Apache terms, require addition of a stanza to httpd.conf -- a privilege jealously guarded by IT managers for a university-wide server). Thus, reworking of the step-by-step pages to provide more specific instructions for the initial setup is a high priority. These pages should be housed on, but KB would like assistance from the rest of the team in their reconstruction. The prospect that our software might try to locate 'papers.rdf' or 'seriesname.rdf' in the papers subdirectory, rather than requiring indexability, was rejected as not being technically feasible, and limiting: if a NAM created "papers2.rdf" or "papers2004.rdf" next year, it would not be located nor mirrored, and would require more human intervention to get it straight. Better to set it up as indexable from the start. SK also pointed out that our detailed instructions should be server-specific, since Apache will look for index.html (or default.html), whereas Microsoft IIS will search for default.htm.
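The case for indexability can be illustrated with a minimal sketch in Python. It assumes the server returns an ordinary HTML directory listing; the sample listing, the archive code "aaa" and the function names are hypothetical, and this is not the team's mirroring software. The point is only that any new .rdf file appears in the listing without further human intervention.

```python
from html.parser import HTMLParser


class RdfLinkParser(HTMLParser):
    """Collect href targets ending in .rdf from a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.rdf_files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".rdf"):
                self.rdf_files.append(value)


def find_rdf_files(listing_html):
    """Return the .rdf file names linked from an HTML directory listing."""
    parser = RdfLinkParser()
    parser.feed(listing_html)
    parser.close()
    return parser.rdf_files


# Hypothetical listing for a series directory of an archive coded "aaa".
SAMPLE_LISTING = """
<html><body><h1>Index of /RePEc/aaa/wpaper</h1>
<a href="../">Parent Directory</a>
<a href="papers.rdf">papers.rdf</a>
<a href="papers2004.rdf">papers2004.rdf</a>
<a href="readme.txt">readme.txt</a>
</body></html>
"""

print(find_rdf_files(SAMPLE_LISTING))
```

With an indexable directory, "papers2004.rdf" is found alongside "papers.rdf"; a scheme that probed only for fixed file names would miss it.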

A possibly useful development would be a set of CGI scripts which would write a valid archive-template, series-template, and sample paper-template as text files to be downloaded to the NAM's desktop. Detailed instructions would then direct the NAM to ensure that the files had type .rdf and that the appropriate subdirectory is created. This would ensure that the key features of these three template types would be customized and internally consistent for the NAM's initial setup. We want to stress that the series name is highly important (more so than the archive name), since it will be displayed by RePEc services; thus something like "Working Papers" should be discouraged, especially if the Provider-Institution is not recognizable (merely an acronym, for instance). These CGI scripts could also write the index.html (or default.htm) files, provided that the NAM could specify which flavor of server she would employ.
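What such a script might generate can be sketched as follows, in Python rather than as a finished CGI program. The field selection is a minimal one and should be checked against the ReDIF documentation; the archive code "aaa", the series code "wpaper", the file-naming scheme, and all names and addresses are assumptions made for illustration.

```python
# Names the script refuses outright; only the example from the discussion.
GENERIC_SERIES_NAMES = {"working papers"}


def archive_template(code, name, email, url):
    """Return archive-template text for an archive code such as 'aaa'."""
    return (
        "Template-Type: ReDIF-Archive 1.0\n"
        f"Handle: RePEc:{code}\n"
        f"Name: {name}\n"
        f"Maintainer-Email: {email}\n"
        f"URL: {url}\n"
    )


def series_template(code, series, name, provider, email):
    """Return series-template text, rejecting an uninformative series name."""
    if name.strip().lower() in GENERIC_SERIES_NAMES:
        raise ValueError("series name should identify the providing institution")
    return (
        "Template-Type: ReDIF-Series 1.0\n"
        f"Name: {name}\n"
        f"Provider-Name: {provider}\n"
        f"Maintainer-Email: {email}\n"
        "Type: ReDIF-Paper\n"
        f"Handle: RePEc:{code}:{series}\n"
    )


def sample_paper_template(code, series, number):
    """Return a placeholder paper template whose handle matches the series."""
    return (
        "Template-Type: ReDIF-Paper 1.0\n"
        "Author-Name: A. N. Author\n"
        "Title: Sample title, to be replaced\n"
        "Creation-Date: 2003-12\n"
        f"Number: {number}\n"
        f"Handle: RePEc:{code}:{series}:{number}\n"
    )


def starter_files(code, series, archive_name, series_name, provider, email, url):
    """Map each file path (assumed naming scheme) to its template text."""
    return {
        f"{code}arch.rdf": archive_template(code, archive_name, email, url),
        f"{code}seri.rdf": series_template(code, series, series_name, provider, email),
        f"{series}/{series}.rdf": sample_paper_template(code, series, "0301"),
    }


files = starter_files(
    "aaa", "wpaper",
    "Example University Archive",
    "Example University Economics Working Papers",
    "Example University, Department of Economics",
    "maintainer@example.edu",
    "http://www.example.edu/RePEc/aaa/",
)
for path in sorted(files):
    print(path)
```

Because every handle is built from the same archive and series codes, the three templates cannot drift out of step with one another, which is the consistency the proposal is after.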

A recent development that created some concern was the creation of two "vanity articles" series, e.g. "Articles by John Smith". The article template does not provide the ability to cite the journal name in which such an article is actually published (as opposed to the pseudo-journal "Articles by..."), so that this format is unworkable and clutters the journal listings with series that are clearly not published journals. Both vanity articles' authors have now been contacted and encouraged to recast these as "Papers by..." series, with the Publication-Status field used to properly cite the journal in which they appeared.

2. Potential collaboration with AEA

The group then broke for an excellent Chinese luncheon. On the way to and from the luncheon, and during, KB discussed the broad elements of his further communications with the AEA Electronic Publishing committee's chair, John Rust, and member Bob Parks. The original response from KB to Rust clearly stated that an EconWPA-like service, integrated with RAS, was a key element in RePEc's future strategy; we continue to see the need for a personal submission service to complement the use of institutional archives. At the same time, KB's communications with Rust stress RePEc's preference for institutional archives: it is more efficient for one person to catalog 20 departmental working papers than it is for 20 authors to each self-catalog their work, even if huge improvements in the user interface of an EconWPA-like service were forthcoming. Given that some authors would likely choose not to participate, a heavy reliance on personal submission (as Parks has characterized the real motivation for AEA's inquiries) would likely result in a less comprehensive coverage of many items. Parks argues that the personal submission route is more reliable, since the only EconWPA papers included in RePEc are those actually housed on Parks' server, removing the problem of broken links. However, except in rare cases, an entire department's series ought not to disappear (only when a server is reconfigured and a new archive URL must be installed, etc.). We also spoke of the topic of "certifiability": to what degree does inclusion in an archive imply anything about quality of the work? Rust has suggested that conference papers might be considered certified at least by the program committee's choices, and would be rated more highly than departmental series (which may be essentially open to members of that department), which may be in turn more highly rated than a personal submission archive. That is not to denigrate the latter, but only to indicate that (like much of the Web) there is no second opinion of quality associated with self-publication.

3. Enhancement of journal and citation metadata

KB indicated that he had signed a letter of understanding with Kluwer Academic Publishers, who have agreed to our terms of providing access to their metadata as individual series providers see fit. We also spoke at length of the practicability of enhancing our bare-bones bibliographic references for an important title such as Journal of Econometrics, under the notion that author, title, citation information and a link to the journal homepage are fair use, as long as abstracts are not included. It was agreed that KB should collect that metadata for JoE so that the many highly rated econometricians in RAS may claim their works in this journal.

With respect to the CitEc project, TK expressed interest in gathering any and all full-text PDFs that might be used for citation harvesting. KB will place such PDFs (including the latest CDROM contents from AER, JEL, JEP) on the carnation machine and notify JMBC.

4. Concerns with RAS and ACIS development

CZ recounted that problems with the existing RAS system, implemented in C, are an increasing concern. Beyond the interface flaws, a number of technical issues are arising that prevent users from properly accessing and updating their records, requiring additional manual intervention from the RePEc Team. The replacement software, ACIS Phase I, will largely replicate the functionality of RAS in a more maintainable language (Perl) and will deal with some of the interface details and search issues. The current version reads and writes the existing ReDIF format, but will also write AMF XML for future use.

TK described his plans for Phase II of ACIS, which would extend authors' ability to claim authorship of their own works to the ability to claim citations of those works in other items in RePEc, and would identify citations to authors' works by materials both within and outside of the RePEc database. This phase would not commence until RAS (the old HoPEc system) is replaced by ACIS Phase I.

CZ expressed concern that numerous enhancements should be considered in ACIS Phase I to overcome existing weaknesses. For instance, the new author registering in RAS (ACIS-I) should be directed to a screen in which s/he is invited to sign up for NEP lists. Ideally, updating of the personal information in ACIS-I should also be able to update email addresses in NEP, so that a user need not make this change in multiple services. It was suggested that the ACIS-I update step should create a log of name|date|old email|new email so that such a coordination feature could be installed in the future.
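The suggested log could take roughly the following shape, sketched here in Python. The field order follows the name|date|old email|new email layout mentioned above; the ISO date format, the function names and the sample addresses are assumptions, not a description of ACIS.

```python
from datetime import date


def format_change(name, changed_on, old_email, new_email):
    """Build one name|date|old email|new email log line."""
    fields = [name, changed_on.isoformat(), old_email, new_email]
    if any("|" in field or "\n" in field for field in fields):
        raise ValueError("fields must not contain '|' or newlines")
    return "|".join(fields)


def parse_change(line):
    """Split a log line back into its four fields."""
    name, changed_on, old_email, new_email = line.rstrip("\n").split("|")
    return {
        "name": name,
        "date": date.fromisoformat(changed_on),
        "old_email": old_email,
        "new_email": new_email,
    }


def current_addresses(log_lines):
    """Map every superseded address to the newest address that replaced it.

    Lines are assumed to be in chronological order, as an append-only
    log would be.
    """
    latest = {}
    for line in log_lines:
        change = parse_change(line)
        # Follow chains: if a -> b was logged earlier and b -> c arrives now,
        # a should point at c as well.
        for old, new in latest.items():
            if new == change["old_email"]:
                latest[old] = change["new_email"]
        latest[change["old_email"]] = change["new_email"]
    return latest


log = [
    format_change("Jane Doe", date(2003, 12, 12), "jdoe@old.example.edu", "jdoe@new.example.edu"),
    format_change("Jane Doe", date(2004, 1, 5), "jdoe@new.example.edu", "jane@example.org"),
]
print(current_addresses(log))
```

A service such as NEP could then read the log periodically and replace any subscriber address that appears as a key in the mapping.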

5. Issues with NEP

Other aspects of NEP were discussed at some length. The infrequency of some NEP lists' outputs was a concern -- basically a management problem, in which NEP managers are not holding their volunteer editors to high standards -- as well as the unprofessional appearance of the NEP emails (containing numerous mangled high-ASCII characters in authors' names, for instance). SK argued that we should filter the reports to enhance their appearance, removing tab characters and checking URLs where feasible. TK proposed that only freely downloadable materials should be included. Other participants strongly disagreed; excluding NBER and CEPR papers, for instance, would degrade the information content for all readers, and would annoy those who have access to the papers anyway. We can still URL-check from a site with access to these two series. It was also agreed that for future archival use the creation of a NEP report should write a machine-readable columnar text file, with listname | pubdate | item-handle records so that changes in archive handles (from fth, for instance) could be readily tracked without parsing the full-text report. A text database of these NEP contents would save considerable time in regenerating a given NEP report, or determining where a new item had been presented.
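A minimal sketch of how such a columnar file might be read and queried, in Python. The listname | pubdate | item-handle layout is the one agreed above; the list names, dates, handles and function names in the sample are illustrative only.

```python
from collections import defaultdict


def parse_record(line):
    """Split one 'listname | pubdate | item-handle' record into its fields."""
    listname, pubdate, handle = (part.strip() for part in line.split("|"))
    return listname, pubdate, handle


def index_by_handle(lines):
    """Map each item handle to the (listname, pubdate) reports that carried it."""
    index = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the file
        listname, pubdate, handle = parse_record(line)
        index[handle].append((listname, pubdate))
    return dict(index)


def rename_archive(lines, old_prefix, new_prefix):
    """Rewrite handles that start with old_prefix, e.g. after an archive moves."""
    renamed = []
    for line in lines:
        if not line.strip():
            continue
        listname, pubdate, handle = parse_record(line)
        if handle.startswith(old_prefix):
            handle = new_prefix + handle[len(old_prefix):]
        renamed.append(f"{listname} | {pubdate} | {handle}")
    return renamed


SAMPLE = [
    "nep-ecm | 2003-12-07 | RePEc:aaa:wpaper:0301",
    "nep-mac | 2003-12-07 | RePEc:aaa:wpaper:0301",
    "nep-ecm | 2003-12-14 | RePEc:bbb:dpaper:17",
]

index = index_by_handle(SAMPLE)
print(index["RePEc:aaa:wpaper:0301"])
print(rename_archive(SAMPLE, "RePEc:aaa:", "RePEc:ccc:")[0])
```

Looking up where an item was presented becomes a dictionary access, and a change of archive handle touches only the third column rather than the full-text reports.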

6. The move from ReDIF to AMF

Discussion of a move to AMF indicated that rech would be updated to read AMF, so that we would have the ability to accept NAMs who wanted to provide AMF metadata rather than ReDIF content. TK indicated that he would be implementing the Geneva protocol, which would essentially combine current archive- and series-templates into a single data structure. For the foreseeable future there will be a need for a converter that can read an AMF-formatted archive and return a data structure in ReDIF. TK indicated that he had prepared a "private namespace" that would allow the descriptors uniquely employed by ReDIF-Software templates to be properly documented in the AMF OAI-accessible form of the RePEc database. TK argued that we should be promoting the OAI accessibility of all materials contributed to RePEc as a selling point to archive maintainers: place your materials in RePEc (currently via ReDIF) and it will be accessible to any OAI harvester. KB argued (and SK, CZ concurred) that this is somewhat misleading since there is at present no mechanism for series retrieval from the RePEc OAI archive (as exists, e.g., in bepress' CDL OAI repository; we harvest from that repository via OAI directives which return the contents of a series). The consensus of the meeting was that the addition of a series-retrieval function (in OAI terms, the definition of sets) was essential before promotion of RePEc's OAI capabilities was undertaken.
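For reference, series retrieval through OAI sets would look roughly like this from the harvester's side, sketched in Python. The verbs ListSets and ListRecords and the arguments metadataPrefix and set are standard OAI-PMH; the base URL and the use of a series handle as the set name are assumptions, since no set scheme for the RePEc OAI archive was defined at the meeting.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real RePEc OAI endpoint is not given here.
BASE_URL = "http://oai.example.org/repec"


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


def list_sets_url(base_url):
    """URL asking the repository which sets (here: series) it defines."""
    return oai_request(base_url, "ListSets")


def list_series_records_url(base_url, series_handle, metadata_prefix="oai_dc"):
    """URL asking for all records in one set, assuming one set per series."""
    return oai_request(
        base_url,
        "ListRecords",
        metadataPrefix=metadata_prefix,
        set=series_handle,
    )


print(list_sets_url(BASE_URL))
print(list_series_records_url(BASE_URL, "RePEc:aaa:wpaper"))
```

Without set definitions on the repository side, the second request has nothing to select on, which is the gap KB, SK and CZ pointed to.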

7. The need for RAS-EconWPA integration

Over dinner at a local Italian restaurant, one further topic was taken up with vigor: the necessity of rescheduling the deliverables in ACIS Phase II (which involves citation "claiming") and ACIS Phase III (which would integrate the RAS functionality with the personal submission facility of an EconWPA). Even without interaction with the AEA e-Publications committee, it was agreed that the current state of affairs--KB directing those desiring a personal archive to either start a departmental archive or work with the EconWPA interface--is counterproductive. The existing EconWPA interface has been characterized as outdated, cumbersome and unintuitive, and its design precludes proper interaction with RAS. Many authors who submit via EconWPA complain that they then immediately try to find the paper in RAS and cannot. Integration of these two series, although not without technical hurdles, would allow us to provide a modern, reliable and highly integrated service. Users would have one point of contact for personal data, institutional affiliations, claiming of works (and ultimately citations) and entry of new works, where inclusion in an institutional repository is not feasible. With the added stimulus that such an integrated service might well dovetail nicely with the AEA E-Publications committee's recommendations, and attract their support, the consensus of the meeting led to a recommendation that TK explore the options for bringing forward the Phase III integration tasks, perhaps addressing them in parallel with the externally-sponsored citation tasks of Phase II.

CZ suggested that an invitation be offered to all RePEc Team members and NEP editors to gather at the ASSA San Diego meetings.

Respectfully submitted
Kit Baum
19 December 2003