SSC Archive Access Statistics

C F Baum, 23 October 2005

The Statistical Software Components (SSC) Archive, often termed the "Boston College" archive, is a RePEc repository containing user-written software for statistical analysis in a number of computer languages. The predominant language of SSC materials is Stata, and the archive contains Stata materials dating back to 1996. RePEc access to these materials is provided by IDEAS and EconPapers.

Every month, the SSC Archive maintainer posts the statistics from the prior month's activity to Statalist (usually within the first few days of the month). This document explains how those statistics are computed.

Materials set up for distribution via Stata's net install facility are organized into 'packages'. Each item that can be downloaded from the SSC Archive has a .pkg file, or manifest, listing its contents. However, when Stata or a web browser accesses the archive, the web server providing access to the SSC Archive records a "hit" on each individual ado-file rather than on the package as a whole. Each ado-file is part of a package, which may contain a single ado-file (and its hlp-file) or many ado-files (e.g., the egenmore package). Many ado-files have multiple authors, and the access statistics are computed to give each author credit for his or her work.

The web server logs are read into Stata and merged with a concordance file that lists each ado-file contained in a particular package. From the merged file, two reports are generated. The first lists each package|author combination and the number of 'hits' it received. 'Package hits' are computed by dividing the total 'hits' on a package's ado-files by the number of ado-files in that package. These numbers may be fractional, since Stata (or a human using a web browser) may not download all the ado-files associated with a package. For instance, if you use ssc install foo, replace, Stata will download only the new content in the 'foo' package. This report, then, indicates which packages were most often downloaded (and presumably installed) by Stata users.
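The 'package hits' arithmetic can be illustrated with a minimal sketch. It is written in Python rather than Stata purely for illustration and is not the maintainer's actual code. The concordance contents, the file names, and the logged hits below are all invented for the example.

```python
from collections import Counter

# Hypothetical concordance: package name -> ado-files it contains.
concordance = {
    "foo": ["foo.ado", "foo2.ado"],
    "bar": ["bar.ado"],
}

# Hypothetical ado-file names extracted from one month of web server logs,
# one entry per recorded hit.
logged_hits = ["foo.ado", "foo.ado", "foo2.ado", "bar.ado", "bar.ado", "bar.ado"]


def package_hits(concordance, logged_hits):
    """Return hits per package: total ado-file hits / number of ado-files."""
    ado_hits = Counter(logged_hits)
    result = {}
    for package, ado_files in concordance.items():
        total = sum(ado_hits[name] for name in ado_files)
        result[package] = total / len(ado_files)
    return result


hits = package_hits(concordance, logged_hits)
print(hits)
```

In this invented example, 'foo' has three ado-file hits spread over two ado-files and so receives 1.5 package hits, which shows how a fractional figure arises when not every file in a package is downloaded.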

The second report aggregates the first report by author (using Stata's collapse command), so that each author's total counts the hits on the ado-files of every package that he or she authored or co-authored.
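The aggregation by author can be sketched in the same spirit. The example is again in Python for illustration only, with invented package names, author names, and hit counts. It assumes that the collapse step sums package hits within author and that each co-author receives full credit for a package; the text above suggests this but does not state it explicitly.

```python
from collections import defaultdict

# Hypothetical first report: one row per package|author combination,
# given as (package, author, package hits).
first_report = [
    ("foo", "Author A", 1.5),
    ("foo", "Author B", 1.5),
    ("bar", "Author A", 3.0),
]


def hits_by_author(report):
    """Sum package hits over every package an author wrote or co-wrote."""
    totals = defaultdict(float)
    for _package, author, hits in report:
        # Assumption: each co-author gets the package's full hit count.
        totals[author] += hits
    return dict(totals)


author_totals = hits_by_author(first_report)
print(author_totals)
```

Here the hypothetical Author A is credited with both 'foo' and 'bar', while Author B is credited with 'foo' only.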

