
Running Stata on Unix / Linux

Graduate Statistical Assistant Program, FMRC, Boston College

Overview

The Stata statistical package for Unix or Linux comes in two versions: stata and xstata. If your Unix/Linux environment is character-mode, such as a telnet session, you must use stata. If you are using the X Window System, or a graphical desktop such as KDE or Gnome, you may use xstata, which offers a graphical environment virtually identical to that available under Windows or Macintosh. All versions of Stata can run the same programs (.do-files) and access the same data files (.dta files); the only significant limitations of the character-mode environment are that you cannot view graphs on the screen or use the graphical do-file editor.

Any version of Stata may be used in either interactive or batch mode. We recommend batch mode for all but exploratory data analysis. In batch mode, you prepare a program (.do-file) and run it. In character-mode Stata you cannot use the do-file editor, so you must use your favorite Unix or Linux text editor: pico, emacs, vi, etc. Prepare the .do-file as a complete Stata program (e.g. example.do):

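* example.do: describe and summarize the auto dataset
use /usr/local/stata/auto
desc
summ
* create the log of price and correlate it with weight, length and turn
gen logprice = log(price)
corr logprice weight length turn
* save a copy in the current directory and leave Stata
save myauto, replace
exit, clear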

and use the stata command to run it:

$ stata -b do example

By default, this command writes its log to example.log. If you would like a different log file name, include the line:

log using example1.log, replace

as the first line of your program (.do-file). If you specify log using example, replace (with no file extension), Stata will create a SMCL (Stata Markup and Control Language) file, which contains markup that improves its appearance when displayed by Stata but is otherwise a nuisance. If you give the file name as example1.log, or add the text option to the log command, a plain ASCII text log will be created. This log file may be viewed with the Unix more command, edited with your favorite editor, or printed with the Unix lpr command. If you generate a SMCL log file, you should use Stata to print it.
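
A minimal sketch of the batch workflow described above, written as shell commands. It uses the file names example.do and example1.log from the text; writing the do-file with a here-document and checking that a stata executable is on the PATH are added conveniences, not part of the original instructions.

```shell
# Write the do-file with a here-document instead of a text editor.
cat > example.do <<'EOF'
log using example1.log, replace text
use /usr/local/stata/auto
summ
log close
exit, clear
EOF

# Run it in batch mode only if stata is installed (assumed to be on the PATH).
if command -v stata >/dev/null 2>&1; then
    stata -b do example || echo "stata reported an error"
else
    echo "stata not found; example.do written but not run"
fi
```

The text option on the log command requests a plain ASCII log regardless of the file extension.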

Default settings

You may have Stata commands that you would like to execute routinely. The file profile.do, if located in your current directory, will be executed automatically when you run Stata in either interactive or batch mode. You may be working with datasets that require a greater memory allocation than the default value (which differs among systems). For instance, say that you routinely work with 20 MB .dta files. You might then place:

set mem 25m

in your profile.do, which would ensure that sufficient memory is available. Alternatively, you could place the set mem command at the top of your .do-files (prior to the use statement), or use the -k switch when executing Stata, e.g. $ stata -k25000. The -k switch is not recommended unless you are using Stata interactively; otherwise it is much more efficient to code the required memory allocation into the .do-file, or specify it in profile.do. Do not ask for a great deal more memory than you need, as the allocation ties up system resources even if you do not use it.

Another common choice for profile.do is a command to ensure that Stata's matrices can make use of their maximum size:

set matsize 800

Even if you are not using the matrix language explicitly, keep in mind that many Stata commands make use of matrices, and may fail if matsize is not sufficiently large. If you place this command in your .do-file, it must be given (like set mem) prior to accessing the data with use.
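
As a sketch, a profile.do holding the two settings discussed above could be created from the shell as follows. The values 25m and 800 are the ones used in the text; adjust them to your own data. Note that the redirection overwrites any profile.do already present in the current directory.

```shell
# Create profile.do in the current directory; Stata executes it at startup.
# Caution: ">" replaces an existing profile.do.
cat > profile.do <<'EOF'
set mem 25m
set matsize 800
EOF

# Show what will be executed at startup.
cat profile.do
```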

Dealing with long command lines

Stata commands must each be given on a single line. It is possible to change the default delimiter (with the #delimit command), but every line that follows, including comments, must then end with the new delimiter. Given that most commands are not that long, it is more efficient to deal with long lines when they appear. A long line may be continued using one form of the "comment" syntax, common to other programming languages:

regress price weight turn length /*
*/ displacement foreign

If we imagine that the regress command is too long to fit on one line, we may break it into two with this syntax. This is particularly useful for graph commands, which may become lengthy.
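
For comparison, the delimiter approach mentioned above can be confined to a single long command by switching to a semicolon delimiter and then back to the default carriage return with #delimit cr. This is only a sketch; the file name longline.do is hypothetical.

```shell
# Write a do-file that uses a semicolon delimiter for one long command only.
cat > longline.do <<'EOF'
use /usr/local/stata/auto
#delimit ;
regress price weight turn length
    displacement foreign ;
#delimit cr
summ price
EOF

# Display the do-file that was written.
cat longline.do
```

Between the two #delimit lines every command must end with a semicolon; after #delimit cr the usual one-command-per-line rule applies again.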

Graphics in a character-mode environment

Although Unix and Linux users in a character-mode version of Stata cannot view graphics on screen, it is altogether feasible to generate graphics files, save them, and move them to a desktop system where they may be viewed. Graphics may be printed from within character-mode Stata, but you may want to view the results first. To generate graphics, use:

graph price mpg, ti("Automobile price vs MPG") saving(scatter1, replace)

This command will generate a .gph binary file in your current working directory named scatter1.gph. Move it to the desktop system with your FTP client, using binary transfer mode. Then a desktop copy of Stata can access the file with File->Open, filtering on .gph (Stata Graph) files. There are also commands that permit you to alter aspects of the graphics produced: e.g. rotate all graphs to print as landscape, resize them, suppress the Stata logo, etc.; see help translate for details.

Running lengthy jobs

If you are running a lengthy job from a telnet session, you should consider running the job using your Unix or Linux system's batch facility. Using the -b switch on Stata's command line is not batch execution from the operating system's point of view; it merely tells Stata that you are not giving commands interactively. Nor is the "background" mode in Unix or Linux a true batch facility; if a process is running in the background and you log off (or lose the telnet session), the background process will abort. You should instead use the Unix / Linux command batch, which is given as:

$ batch
stata -b do myjob
ctrl-D

Any number of valid Unix or Linux commands may be given; ctrl-D (end of input) tells the operating system that the job specification is complete. For instance, if you had three separate .do-files to run, you could give three separate stata commands in the batch job, and they would run in sequence. A batch job may be viewed in the process table with the Unix / Linux prstat or top commands. When it is complete, the system will send mail to the submitter.
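
The three-job case described above might be submitted non-interactively as follows. This is a sketch: the do-file names job1, job2 and job3 and the file jobs.txt are hypothetical, and redirecting standard input takes the place of typing the commands followed by ctrl-D.

```shell
# List the commands the batch job should run, one per line.
cat > jobs.txt <<'EOF'
stata -b do job1
stata -b do job2
stata -b do job3
EOF

# Submit only if the batch command is available on this system.
if command -v batch >/dev/null 2>&1; then
    batch < jobs.txt || echo "batch submission failed"
else
    echo "batch not found; commands saved in jobs.txt"
fi
```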

Distributing Stata materials over the web

Unlike most statistical packages, Stata can access programs (.do-files), binary datasets (.dta files), graphics files (.gph files) and log files over the Web. To make files accessible to students or research collaborators, you need only place the files on your Web server and send those users the URLs. For instance, rather than typing in the command file given above, you could execute it over the Web from within Stata:

. do http://fmwww.bc.edu/ec-p/data/inst/example

You could also use the Stata type command to view that .do file on the screen, or the Stata copy command to make a copy of the .do file in your current directory. The same commands could be used to view .log files stored on your web server (e.g. if you wanted students to view the results of running a Stata job). Note that although .dta (binary data) files may be accessed via use or copy, they should never be referenced in a type command, nor should they be printed. The same rule applies for .gph (binary graphics) files.
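
The type and copy commands mentioned above could be used as in the following sketch. It assumes the do-file is served as example.do at the URL shown earlier (do appends the .do extension when none is given); the names webdemo.do and webexample.do are hypothetical.

```shell
# Write a do-file that views and then copies a do-file stored on a web server.
cat > webdemo.do <<'EOF'
* display the remote do-file on screen
type http://fmwww.bc.edu/ec-p/data/inst/example.do
* copy it into the current directory under a new name
copy http://fmwww.bc.edu/ec-p/data/inst/example.do webexample.do, replace
EOF

# Display the do-file that was written.
cat webdemo.do
```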

Extending Stata

The worldwide Stata user community makes available a number of Stata packages that add functionality to the program. These packages can be located from within Stata with the findit command, which searches packages published in the Stata Technical Bulletin (STB) and its successor, The Stata Journal (SJ), as well as packages available through the ssc command. The packages may be inspected from within interactive Stata and installed using the instructions provided by findit or ssc. Note that both of these commands are recent additions to version 7 of Stata; if you are using a shared Unix or Linux system and they are not available, contact the system administrator to have them installed. If you manage your own Unix or Linux system, use the update query command to determine whether your installation is up to date. If updates are available, they should be applied as soon as possible to take advantage of bug fixes and additional capabilities added to official Stata. Updates to Unix or Linux versions of Stata are even easier to install than those of the Windows or Macintosh desktop versions.


Last modified: 03 December 2001 cfb