This document discusses a number of issues related to the implementation of the Statistical Data Catalog - Local Server Access as a collaborative effort of the University Libraries (Barbara Mento), the FMRC's Graduate Statistical Assistant Program (Kit Baum), and Academic Technology Support (Ted Gaiser). It is largely based on discussions between Mento and Baum on 17 Sep 2001, followed by interactions with Tom Babbin of ATS and with the GSAs at the 20 Sep 2001 staff meeting.
The major design feature of the prototype DR system (implemented in Lasso/FileMaker Pro and ported to Quest/DR last summer) that needs reworking is the classification of each dataset's accessibility to the user (patron). The original DR schema included two fields, labelled "Available Format" and "Available Media", each filled in by selecting a single entry from a pop-up list. In the original system, these choices included:
Available Media:

In discussing this classification, it was recognized that this nomenclature is not sufficiently descriptive, in two respects. First, the meanings of the media choices are not clear. Second, neither field should be restricted to a single choice; each should allow a dataset to be recorded as available in multiple 'media' and 'formats'.
This has led to a decision to devise a new classification scheme and to reclassify the existing datasets under it. Both fields are entered in Quest/DR as 516 'type of file' notes, so no change to the Quest/DR design is necessary. We may want to add the revised scheme to the searchable entries in Advanced Search at a later date.
In the discussion of 17 Sep, Mento and Baum determined that there were seven distinct values for a new field, "Access options", which replaces "Available media" in the nomenclature and describes how the dataset may be accessed:
Many datasets may be available through more than one of these access options. Some datasets that we might want to catalog will not be available through any of them: for example, a dataset whose use requires special permission, or one that can only be accessed from a specific off-campus location. The nomenclature does cover datasets that may not be immediately accessible: for example, a dataset for which you must register as a user to be given the URL from which it may be downloaded. Such a dataset would fall into one or more of categories 1-3, with a special note indicating that the registration step is required.
Likewise, the "available format" of the original nomenclature, which permitted only a single choice, must be expanded to indicate that several formats may be available simultaneously. For instance, a dataset may be web-accessible (category 1), web-downloadable in a statistical package format (category 2), and accessible with custom interface software from a CD-ROM (category 4). Although the proposed nomenclature will not treat the format as a subfield of the access option, all available formats should be entered for each dataset. The new field should be titled "Data format", and its choices should be those listed above under "available formats", plus Excel.
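A minimal sketch of how a single record might hold several access options and several data formats at once, rendering each value as its own 516 note so that no entry is squeezed into one combined string. The class name `CatalogRecord`, the "Access option:"/"Data format:" prefixes, and the example format "Stata" are hypothetical; the agreed phrases have not yet been fixed, and the `516 ## $a` line is only a common plain-text rendering of a MARC field with blank indicators.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """Hypothetical in-memory form of one Quest/DR dataset record."""
    title: str
    # Sets allow any number of values per field, unlike the old single-choice pop-ups.
    access_options: set[int] = field(default_factory=set)  # category numbers 1-7
    data_formats: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # The 17 Sep discussion fixed seven access options, numbered 1-7.
        bad = {n for n in self.access_options if not 1 <= n <= 7}
        if bad:
            raise ValueError(f"unknown access option(s): {sorted(bad)}")

    def marc_516_notes(self) -> list[str]:
        """Render one 516 'type of file' note per value, in a stable order."""
        notes = [f"516 ## $a Access option: {n}" for n in sorted(self.access_options)]
        notes += [f"516 ## $a Data format: {f}" for f in sorted(self.data_formats)]
        return notes


if __name__ == "__main__":
    # The example from the text: categories 1, 2 and 4, in two formats.
    rec = CatalogRecord("Example survey", {1, 2, 4}, {"Stata", "Excel"})
    print("\n".join(rec.marc_516_notes()))
```

Because 516 is a repeatable MARC field, emitting one note per value keeps each access option and format a separate, consistently spelled entry.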
The Quest cataloging application requires considerably more manual data entry, and therefore greater attention to detail, than the FileMaker Pro/Lasso method, which provided pop-up menus for defined lists. Manual entry carries a significant risk that spelling, capitalization, and punctuation will not agree across records, causing Quest to treat them as different values. Baum agreed to produce a hardcopy template giving the MARC record codes and possible values for each controlled field, which may then be filled out and transcribed into the cataloging application. This task cannot be completed until the "Access options" entries listed above are abbreviated into agreed phrases.
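The spelling and capitalization risk could also be reduced by checking each transcribed entry against the controlled list before it is keyed into Quest. A minimal sketch, assuming a hypothetical "Data format" vocabulary (the real abbreviated phrases are still to be agreed): entries are compared after folding case, turning punctuation into spaces, and collapsing whitespace, and anything that still does not match is reported rather than guessed.

```python
import re

# Hypothetical controlled list for "Data format"; the real phrases are not yet agreed.
DATA_FORMATS = ["SAS", "SPSS", "Stata", "Excel", "ASCII text"]


def _fold(value: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    value = re.sub(r"[^\w\s]", " ", value.lower())
    return " ".join(value.split())


# Map each folded key back to its single canonical spelling.
_CANONICAL = {_fold(v): v for v in DATA_FORMATS}


def normalize(entry: str) -> str:
    """Return the canonical spelling of entry, or raise if it is not on the list."""
    key = _fold(entry)
    if key not in _CANONICAL:
        raise ValueError(f"{entry!r} is not a controlled value")
    return _CANONICAL[key]
```

Rejecting unmatched entries, rather than picking the closest value, leaves the cataloger to resolve typos such as "Statta" by hand.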
Baum and Mento agreed that the existing ad hoc structure of web pages describing one or more datasets, linked from Quest/DR records, should be replaced with a consistent set of pages organized by the major headings of the (ICPSR/Columbia) classification scheme used in DR. A page will be produced for every category defined in the scheme, including those not currently populated, and existing datasets' web links will be moved to the new pages. Additions to Quest/DR (for instance, the additional ICPSR datasets being housed on Glenmount by ARS) will be documented on these pages, which will reside on fmwww.bc.edu/DR.
The Quest/DR records refer to these web pages, which typically contain links to the provider's website, to the full codebook and PDF documentation, and to the datasets themselves. Where practicable, the Quest/DR records should also contain the direct URLs to these materials. If a logical 'dataset' consists of a large number of physical datasets (e.g. 7 chunks, each in 3 statistical package formats, or 21 files), it is not practical to place all of those direct links in Quest/DR. The link to an online codebook or other documentation, however, should be placed in Quest/DR.
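The rule for which URLs go directly into a Quest/DR record can be stated as a small function: the category web page and the documentation links (such as the online codebook) are always kept, while links to the physical data files are kept only when there are few of them. The function name, its arguments, the example.org URLs, and the cutoff of 3 data links are assumptions for illustration; the text says only that 21 files is too many.

```python
def record_links(page_url: str, doc_urls: list[str], data_urls: list[str],
                 max_data_links: int = 3) -> list[str]:
    """Choose the URLs to store in one Quest/DR record (hypothetical rule).

    The descriptive web page and all documentation links are always kept;
    direct data-file links are kept only if there are at most max_data_links.
    """
    links = [page_url, *doc_urls]
    if len(data_urls) <= max_data_links:
        links.extend(data_urls)
    return links


if __name__ == "__main__":
    # 7 chunks, each in 3 statistical package formats: 21 physical files.
    files = [f"http://example.org/dr/part{i}.{ext}"
             for i in range(1, 8) for ext in ("dta", "sav", "xpt")]
    print(record_links("http://example.org/dr/category.html",
                       ["http://example.org/dr/codebook.pdf"], files))
```

With 21 files only the category page and the codebook survive, which matches the memo's guidance; the cutoff itself would be a matter for local judgment.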
Server directories containing documentation and datasets should be made accessible to authorized users (patrons) on a platform-neutral basis. This implies that all data described in the Statistical Data Catalog - Local Server Access should be accessible to users regardless of platform (Windows, Mac OS, UNIX, Linux), which precludes the use of servers that cannot support the FTP and/or HTTP protocols. Note that some end-user access to datasets in a statistical package format will require HTTP rather than FTP for greatest ease of use and efficiency. Documents stored on servers should be in plain text or PDF format; proprietary formats such as MS-Word or Excel should be avoided, and such documents saved as PDFs to maximize accessibility (e.g. Microsoft Office applications do not run under UNIX or Linux, and not all users have StarOffice available).
All datasets described in the Statistical Data Catalog - Local Server Access should be accessible to authorized users, regardless of platform. If compressed file formats are used, they should be the standard gzip or zip formats, rather than platform-specific formats such as self-expanding DOS .exe archives or Macintosh BinHex (.hqx) files.
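One way to enforce the compression rule is to inspect the first bytes of each archive on the server rather than trusting its file extension. A minimal sketch: it accepts the gzip signature (bytes 1F 8B) and the zip local-file-header signature (`PK\x03\x04`), and flags DOS executables (`MZ`) and BinHex files, which normally contain the banner `(This file must be converted with BinHex` near the start. The function names and returned labels are hypothetical, and an empty zip archive (which begins with a different signature) would be reported as "unknown".

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"
ZIP_MAGIC = b"PK\x03\x04"
DOS_EXE_MAGIC = b"MZ"
BINHEX_BANNER = b"(This file must be converted with BinHex"


def classify_archive(path: Path) -> str:
    """Return 'gzip', 'zip', 'dos-exe', 'binhex', or 'unknown' from leading bytes."""
    with path.open("rb") as fh:
        head = fh.read(1024)  # BinHex text may be preceded by a few header lines
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(DOS_EXE_MAGIC):
        return "dos-exe"
    if BINHEX_BANNER in head:
        return "binhex"
    return "unknown"


def is_acceptable(path: Path) -> bool:
    """True only for the platform-neutral gzip and zip formats."""
    return classify_archive(path) in {"gzip", "zip"}
```

Checking content instead of extensions also catches self-extracting zip archives saved as .exe, which begin with `MZ` and are therefore rejected.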