Program Objectives
The UAPC has identified the following as one of its essential Technology Strategies: "Provide the specialized computing and communications support required to make research a central element in the mission for Boston College and its goals in the next decade." Those of us who are well acquainted with BC's support for research computing--especially statistical computing--recognize that past initiatives such as the Strategic Planning process have been generally successful in providing the 'bricks and mortar' for such a strategy. The creation of the campus network, the upgrading of desktop workstations and shared systems, and the acquisition of software and databases have been responsive to faculty and graduate student needs. Overall, however, Boston College faces a sizable and costly challenge in meeting the UAPC's quoted goal. Support constitutes a major missing element.
We have designed the GSA program to address issues of support for statistical computing. This includes many of the computing needs of researchers in the social sciences, and in the professional schools of management, nursing, education, and social work. These considerations likely address some research computing needs of science and humanities scholars as well. Identifiable statistical computing support needs entail four key elements, each taken up in turn below:
1. Statistical packages and data transformation
2. Large datasets
3. Specialized packages and programming languages
4. Database management systems and metadata
Few researchers may be concerned with all four of these elements, but all four make up essential components of any research university's computing environment. A selection of web pages from several research universities' support units, compiled by the Academic Technology Committee, highlights the breadth and depth of research-oriented services offered at those institutions. Of course, additional infrastructure is required to provide appropriate intranet and Internet connectivity, adequate computing power on the researcher's desktop, and reliable service for communications and computing hardware and software. To focus on the missing element in this discussion, however, we consider the four elements listed above and the degree of support currently available for them at Boston College.
The Associate Vice President for Research has initiated the GSA program to improve the breadth and depth of support along these four dimensions, and to elicit a clear statement of the infrastructure required elsewhere in the University to fully meet these needs at the level envisioned by the UAPC's stated goals.
The professional support cadre at A&RS (ex-IPS) has traditionally included one or more individuals with significant knowledge and expertise in the use of statistical packages. The needed expertise goes far beyond the introductory material provided in a number of quantitative methods courses. Faculty and graduate students can generally use packages effectively at that level. Increasingly, though, the sophistication of quantitative analysis requires the use of far more complex techniques in data analysis. Packages may or may not support those techniques, or may require that an external "module" (procedure, function) be used to provide this functionality. Statistical packages are also increasingly extensible, and libraries of user-developed "modules" may provide the functionality that a researcher seeks. Empirical researchers often require significant assistance to handle these challenges, going beyond the time and general skills available from A&RS consultants. A GSA in a particular area should be able to develop a discipline-specific understanding of the software available to meet these specific needs, and assist researchers with its use. The challenge of transforming data for use in a specific program's environment is often a significant hurdle--one that should be eased by widespread use of high-level software tools such as Stat/Transfer. GSAs will be responsible for providing web-based documentation covering statistical package and data transformation issues.
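To make the data transformation hurdle concrete, the following is a minimal sketch of one common case: turning fixed-width raw records into a delimited file that most statistical packages can import. It is not how Stat/Transfer works, and the column layout, the sample records, and the function name `fixed_width_to_csv` are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical column layout: (name, start, end), 0-based and end-exclusive.
# A real layout would come from the dataset's codebook.
LAYOUT = [("id", 0, 4), ("age", 4, 7), ("income", 7, 14)]


def fixed_width_to_csv(raw_lines, layout, out):
    """Write fixed-width records to `out` as CSV, with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])
    for line in raw_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Slice each field out of the record and trim its padding.
        writer.writerow([line[start:end].strip() for _, start, end in layout])


# Two made-up records standing in for a raw data file.
raw = ["0001 34  52000\n", "0002 51 118500\n"]
buffer = io.StringIO()
fixed_width_to_csv(raw, LAYOUT, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the input would be an open file rather than a list, and the layout would be transcribed from the codebook; a dedicated transfer tool remains preferable when variable labels and missing-value codes must survive the conversion.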
An allied set of issues relates to the support for large datasets. Here much of the difficulty is logistical: how does a researcher deal with the contents of a CD-ROM, or magnetic tape? The logistics of evaluating the format of a data set on tape, finding appropriate space for its translation and relocation to disk, and reading the contents into a statistical package have always been daunting tasks, and many researchers have been dissuaded from even attempting to deal with these issues.
One might expect that these difficulties would subside with the transition of many data sets to CD-ROM and/or Internet-accessible formats, but in many ways this has merely raised a separate set of problems. Many CD-ROMs now utilize customized retrieval programs, often geared toward the retrieval and display of small amounts of information. Extracting an entire dataset from such a program often poses sizable challenges. By the same token, the accessibility of many datasets on the Internet often provides the researcher with an overwhelming volume of information, outstripping the storage capacity (and perhaps processing power) of a desktop workstation. Issues of security also arise: is the desktop workstation regularly backed up, so that a researcher's work may be recovered after a disk failure? In many of these cases, the use of a shared workstation--increasingly, a UNIX system--may be much more appropriate, particularly if several researchers and research assistants are collaborating in the analysis of a particular dataset. The GSA should understand the pros and cons of working with a large dataset in a desktop environment versus a shared environment, and should assist the researcher in acquiring an account and elementary knowledge of the UNIX environment. BCIT has announced that the Alpha ("VAX") will be removed in June 2001. Researchers should be moving their use of shared computing resources to UNIX, and GSAs will assist in that transition, as well as with the dataset translation issues mentioned under point 1 above.
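When a dataset outstrips a workstation's memory, one option is to process it one record at a time rather than loading it whole. The sketch below, which assumes a delimited file with a header row, computes the count, mean, and sample standard deviation of a single column in one pass using Welford's running-update method. The column name and sample data are made up for illustration.

```python
import csv
import io
import math


def streaming_summary(rows, column):
    """Return (count, mean, sample std dev) for one numeric column.

    Rows are consumed one at a time, so memory use does not grow
    with the size of the dataset.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for row in rows:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # treat empty fields as missing
        x = float(value)
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # running sum of squared deviations
    std = math.sqrt(m2 / (count - 1)) if count > 1 else float("nan")
    return count, mean, std


# A small in-memory stand-in for a large file on disk.
sample = io.StringIO("id,income\n1,10\n2,20\n3,\n4,30\n")
count, mean, std = streaming_summary(csv.DictReader(sample), "income")
print(count, mean, std)
```

The same function works unchanged on an open file handle, which is the case that matters for a dataset too large to hold in memory; whether to run it on the desktop or on a shared system then depends mainly on disk space and backup arrangements.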
Large dataset usage also includes interaction with datasets available to the University community such as DRI Basic Economics, International Financial Statistics, COMPUSTAT, Panel Study of Income Dynamics (PSID), Survey of Consumer Finances (SCF), and Health and Retirement Study (HRS). In some cases, researchers may be able to access these data in more than one manner, and the GSA should be familiar with those methods and their ease of use. With respect to the large survey datasets such as PSID, SCF and HRS, GSAs will try to identify the user community and its specific needs for these resources, with a goal of providing a single publicized source or Web address for the raw datafiles, enhancing access to those resources. GSAs will be responsible for providing Web-based documentation (such as FAQs) dealing with issues of large dataset usage.
The University has acquired a site license for one of the leading specialized packages (Mathematica) and has acquired copies of other packages for UNIX systems in the FMRC, Economics and CSOM Finance. Some of these packages are available, and most conveniently used, in a desktop environment. For tasks whose processing demands outstrip the capabilities of a desktop workstation, use in a UNIX environment is more appropriate. This threshold may also be relevant for users of traditional programming languages, such as FORTRAN, C, or C++. The use of specialized packages or programming languages in a UNIX environment also requires end users to be familiar with various aspects of systems usage that go beyond the usual desktop workstation issues. GSAs are expected to gain familiarity with both the packages most germane to their clientele and, where appropriate, the potential for their use in the UNIX environment.
A major challenge for the development of a research computing environment is the provision of "metadata"--information about the availability, coverage and accessibility of datasets. Database management systems (DBMS) may provide significant capabilities in this direction, but require considerable design and specification of information databases. DBMS may also help researchers organize and administer their data--particularly datasets used by multiple researchers, or shared among collaborators at different institutions. GSAs will work with Library professionals on the development of metadata structures for datasets commonly accessed by their clients, with the goal of making use of the Library's new public catalog system to provide not only references to datasets but greater detail on their contents and information needed for their use. GSAs will also provide feedback on ongoing efforts to provide web front ends to relational DBMS housing widely used datasets, assisting in the refinement of their user interface design.
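As a rough illustration of what a metadata structure might record, the sketch below defines a simple record covering a dataset's coverage and access methods, plus a keyword lookup. The field names, the function `find_datasets`, and every field value are hypothetical placeholders; they do not describe the Library's catalog or the actual terms of access to any dataset.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One catalog entry; all fields are illustrative assumptions."""
    name: str
    coverage: str                                        # e.g. periods or populations covered
    access_methods: list = field(default_factory=list)   # how a researcher can reach the data
    keywords: list = field(default_factory=list)


def find_datasets(catalog, keyword):
    """Return names of records tagged with `keyword` (case-insensitive)."""
    key = keyword.lower()
    return [rec.name for rec in catalog
            if any(key == k.lower() for k in rec.keywords)]


# Placeholder entries: the values are not real catalog data.
catalog = [
    DatasetRecord("PSID", "(coverage placeholder)", ["(access placeholder)"], ["survey", "panel"]),
    DatasetRecord("COMPUSTAT", "(coverage placeholder)", ["(access placeholder)"], ["corporate"]),
    DatasetRecord("SCF", "(coverage placeholder)", ["(access placeholder)"], ["survey"]),
]

matches = find_datasets(catalog, "survey")
print(matches)
```

A production catalog would live in a relational DBMS or the Library's catalog system rather than in program code; the point here is only that availability, coverage and accessibility each need an explicit, searchable field.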
In considering each of these four areas of strategic importance to researchers' statistical computing, a common thread emerges. A higher level of support is essential to meet each challenge, and some support elements must be locally based and responsive to clients' needs. The effectiveness of the limited central resources available within Information Technology can be meaningfully leveraged by providing first-tier support in the field. The GSA program is designed to do just that, meeting some fraction of clients' needs on site, and providing referrals to A&RS consultants, Technology Consultants, or other support staff where appropriate.