-------------------------------------------------------------------------------
help for stylerules
-------------------------------------------------------------------------------

Suggestions on Stata programming style

Introduction

Programming in Stata, like programming in any other computer language, is partly a matter of syntax - Stata has rules that must be obeyed - and partly a matter of style. Good style includes - but is not limited to - writing programs that are, above all else, clear. They are clear to the programmer, who may revisit them repeatedly, and they are clear to other programmers, who may wish to understand them, to debug them, to extend them, to speed them up, to imitate them or to plagiarise them.

People who program a great deal know this: setting rules for yourself and then obeying them ultimately yields better programs and saves time.

I suggest one overriding rule:

Set and obey programming style rules for yourself.

Moreover,

Obey each of the rules I suggest unless you can make a case that your own rule is as good or better.

Enough pious generalities. The devil in programming is in the details:

Presentation

Always include a comment containing the version number of your program, your name or initials, and the date the program was last modified above the program line. For example,

        *! 1.0.0 Jane Roe 24jan2003
        program myprog

(As said, this line is indeed just a comment line: it bears no relation to the Stata version command. Both should be used.)
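A minimal sketch of how the two sit together in an ado-file. The program body is purely illustrative, and version 8 is an assumption: substitute the release you actually write for.

```stata
*! 1.0.0 Jane Roe 24jan2003
program myprog
        // the *! line above is only a comment; version below is the Stata command
        version 8
        display as text "myprog: nothing to do yet"
end
```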

Use sensible, intelligible names where possible, for programs, variables and macros.

Choose a name for your program that does not conflict with anything already existing. Suppose you are contemplating newname. If typing either which newname or which newname.class gives you a result, StataCorp are already using the name. Similarly, if ssc type newname.ado gives you a result, a program with your name is already on SSC. No result from either does not guarantee that the program is not in use elsewhere: findit newname may find such a program, although often it will also find much that is irrelevant to this point.

Brevity of names is also a virtue. However, no platform on which Stata is currently supported requires an 8-character limit. Tastes are in consequence slowly shifting: an intelligible long name for something used only occasionally would usually be considered preferable to something more cryptic.

Note that actual English words for program names are supposedly reserved for StataCorp.

Use the same names and abbreviations for command options that are in common use in official Stata's commands. Try to adopt the same conventions for options syntax: for example, allow a numlist where similar commands use a numlist. Implement sensible defaults wherever possible.

Use natural names for logical constants or variables. Thus local OK should be 1 if true and 0 if false, permitting idioms such as if `OK'. (But beware such logicals taking on missing values.)
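As an illustration only, here is one way such a logical macro might be set and used in a do-file. The auto dataset and the check on rep78 are assumptions chosen for the example.

```stata
sysuse auto, clear
quietly count if missing(rep78)
local OK = (r(N) == 0)          // 1 if rep78 has no missing values, 0 otherwise
if `OK' {
        display as text "rep78 is complete"
}
else {
        display as text "rep78 has missing values"
}
```

The caution about missing values matters because Stata treats any nonzero value, including missing, as true: if OK ever held missing, if `OK' would be satisfied.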

Type expressions so they are readable. Some possible rules are

put spaces around each binary operator except ^ (gen z = x + y is clear, but x ^ 2 looks odder than x^2)

* and / allow different choices. num / den is arguably clearer than num/den, but readers might well prefer 2/3 to 2 / 3. Overall readability is paramount: compare for example hours + minutes / 60 + seconds / 3600 with hours + minutes/60 + seconds/3600

put a space after each comma in a function, etc.

parenthesise for readability

Note, however, that such a spaced-out style may make it difficult to fit expressions on one line, another desideratum.

Adopt a consistent style for flow control. Stata has if, while, foreach and forvalues structures that resemble those in many mainstream programming languages. Programmers in those languages often argue passionately about the best layout. Choose one such layout for yourself.

Here is one set of rules:

tab lines consistently after if or else or while or foreach or forvalues (the StataCorp convention is that a tab is 8 spaces; following that convention is greatly preferable if Stata is to show your programs properly)

do not put anything on a line after a brace, either an opening { or a closing }.

put a space before braces

align the i of if and the e of else, and align closing braces } with the i, or the e, or the w of while, or the f of foreach or forvalues:

        if ... {
                ...
                ...
        }
        else {
                ...
                ...
        }

        while ... {
                ...
                ...
        }

        foreach ... {
                ...
                ...
        }

In Stata 8 and later, putting the opening and closing braces on lines above and below the body of each construct is compulsory (with the exceptions that the whole of an if construct or the whole of an else construct may legally be placed on one line). For earlier releases, it is strongly advised.

Write within 80 columns (72 are even better). The awkwardness of viewing (and understanding) long lines outweighs the awkwardness of splitting commands into two or more physical lines.

Use #delimit ; sparingly (Stata isn't C): commenting out end-of-lines is tidier where possible (although admittedly still ugly). The /// comment introduced in Stata 8 is most helpful here, and arguably more pleasing visually than /* */.
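A short sketch of the /// style in a do-file, using the auto dataset as an assumed example. Note that /// must be preceded by white space.

```stata
sysuse auto, clear
regress mpg weight length       ///
        displacement foreign    ///
        if !missing(rep78)
```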

Use blank lines to separate distinct blocks of code.

Consider putting quietly on a block of statements, rather than on each or many of them. An alternative in some cases is to use capture.
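For instance, a block like the following keeps intermediate output out of sight with a single quietly. The dataset and the standardisation are assumptions made for illustration.

```stata
sysuse auto, clear
tempvar z
quietly {
        summarize mpg
        generate double `z' = (mpg - r(mean)) / r(sd)
        summarize `z'
}
display as text "mean of standardised mpg: " as result %9.6f r(mean)
```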

You may express logical negation by either ! or ~. Choose one and stick with it. StataCorp have flipped from preferring ~ to preferring !.

Group tempname, tempvar and tempfile declarations.

Well-written programs don't need many comments. (Comment: We could certainly argue about that!)

Use appropriate display styles for messages and other output. All error messages (and no others) should be displayed as err. In addition, attach a return code to each error message: 198 (syntax error) will often be fine.
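A sketch of the pattern, in a hypothetical program mycheck with a made-up level() option: the message is shown as err and the program exits with return code 198.

```stata
program mycheck
        version 8
        syntax varname [, Level(integer 95)]
        if `level' < 10 | `level' > 99 {
                // error text in err style, then a non-zero return code
                display as err "level() must be between 10 and 99"
                exit 198
        }
        display as text "level is " as result `level'
end
```

Typing mycheck mpg, level(5) should then stop with the message and r(198).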

Helpful Stata features

Stata is very tolerant through version control of out-of-date features, but that does not mean that you should be. To maximise effectiveness and impact, and to minimise problems, write using the latest version of Stata and exploit its features.

Make yourself familiar with all the details of syntax. It can stop you re-inventing little wheels. Use wildcards for options to pass to other commands when appropriate.
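As a sketch of the wildcard idea, the hypothetical wrapper below accepts any options through * and hands them on untouched in the local macro options. The choice of summarize as the command wrapped is just for illustration.

```stata
program mysum
        version 8
        syntax varlist(numeric) [if] [in] [, *]
        // whatever options the user typed arrive in `options'
        summarize `varlist' `if' `in', `options'
end
```

So mysum price mpg if foreign, detail passes detail through to summarize, and an option that summarize does not know is rejected by summarize itself.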

Support if exp and in range where applicable. This is best done using marksample touse (or occasionally mark and markout). Have touse as a temporary variable if and only if marksample or a related command is used. See help on marksample.
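A minimal sketch of the marksample idiom, in a hypothetical program mycount. It restricts the varlist to numeric variables, which is an assumption made to keep the example simple.

```stata
program mycount
        version 8
        syntax varlist(numeric) [if] [in]
        // touse is 1 for observations satisfying if/in with no missing values
        marksample touse
        quietly count if `touse'
        if r(N) == 0 {
                error 2000
        }
        display as text "observations used: " as result r(N)
end
```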

_result() still works, but it is unnecessarily obscure compared with r(), e() or s() class results.

Make effective use of information available in e() and r(). If your program is to run in a context which implies results or estimates are available (say, after regress), make use of the stored information from the prior command.

Where appropriate, ensure that your command returns the information that it computes and displays, so that another user may employ it quietly and retrieve that information.
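One way this might look: a hypothetical rclass program myrange that both displays the range of a variable and leaves it behind in r().

```stata
program myrange, rclass
        version 8
        syntax varname(numeric) [if] [in]
        marksample touse
        summarize `varlist' if `touse', meanonly
        display as text "range of `varlist': " as result r(max) - r(min)
        // return what was displayed, so callers can pick it up
        return scalar range = r(max) - r(min)
        return scalar N = r(N)
end
```

Another user can then type quietly myrange mpg and work with r(range) and r(N) directly.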

Ensure that programs that focus on time-series or panel data work with time-series operators if at all possible. In short, exploit tsset.

Define constants to machine precision. Thus use _pi or c(pi) rather than some approximation such as 3.14159, or use -digamma(1) for the Euler-Mascheroni constant gamma, rather than 0.57721. Cruder approximations may give results adequate for your purposes, but that doesn't mean that you should eschew wired-in features.
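The built-in values can be inspected directly; the display format here is an arbitrary choice.

```stata
display %18.16f _pi
display %18.16f c(pi)
display %18.16f (-digamma(1))
```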

Familiarise yourself with the built-in material revealed by creturn list. Scrolling right to the end will show several features that may be useful to you.

SMCL is the standard way to format Stata output.

Respect for datasets

In general, make no change to the data unless that is the direct purpose of your program or that is explicitly requested by the user. For example,

your program should not destroy the data in memory unless that is essential for what it does

you should not create new permanent variables on the side unless notified or requested

do not use variables, matrices, scalars or global macros whose names might already be in use: there is absolutely no need to guess at names unlikely to occur, as temporary names can always be used (see help on tempvar, tempname, and tempfile)

do not change the type of a variable unless requested

do not even change the sort order of data: programs can easily be made sortpreserve.
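Several of these points can be sketched together. The hypothetical program myrank below sorts the data internally but is declared sortpreserve, works in a temporary variable, and creates a permanent variable only because the user asks for one through generate(). Ties are broken arbitrarily, which is a simplification for the example.

```stata
program myrank, sortpreserve
        version 8
        syntax varname(numeric) [if] [in], Generate(string)
        confirm new variable `generate'
        marksample touse
        tempvar order
        // excluded observations sort first; the rest are ordered by the variable
        sort `touse' `varlist'
        quietly generate long `order' = sum(`touse') if `touse'
        // only now does anything permanent appear, under the name requested
        rename `order' `generate'
end
```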

Speed and efficiency

Test for fatal conditions as early as possible. Do no unnecessary work before checking that a vital condition has been satisfied.

Use summarize, meanonly for speed when its returned results are sufficient. Also consider whether quietly count would fit the purpose better.

foreach and forvalues are cleaner and faster than most while loops, and much faster than for. Within programs, avoid for like the plague.

macro shift can be very slow when many variables are present. With 10,000 variables, for example, working all the way through a variable list with macro shift would require around 50 million internal macro renames. Using foreach or while without a macro shift is faster.
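A sketch of the foreach alternative, stepping through a variable list without any macro shift. The variables are taken from the auto dataset purely as an example, and the counter n is there only to show that each variable is visited once.

```stata
sysuse auto, clear
local n = 0
foreach v of varlist price mpg weight {
        quietly count if missing(`v')
        display as text "`v'" _col(12) as result r(N)
        local n = `n' + 1
}
```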

Avoid egen within programs: it is usually slower than a direct attack.

Try to avoid looping over observations, which is very slow. Fortunately, it can usually be avoided.

Avoid preserve if possible. preserve is attractive to the programmer but can be expensive in time for the user with large data files. Programmers should learn to master marksample.

Specify the type of temporary variables to minimise memory overhead. So if a byte variable can be used, specify generate byte `myvar' rather than let the default type be used, which would waste storage space.

Temporary variables will be automatically dropped at the end of a program, but also consider dropping them when they are no longer needed, to minimise memory overhead, and indeed to reduce the chances of your program stopping because there is no room to add more variables.
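Both suggestions in one small sketch: the indicator is created as a byte and dropped as soon as it has served its purpose. The threshold of 25 is arbitrary.

```stata
sysuse auto, clear
tempvar high
generate byte `high' = (mpg > 25) if !missing(mpg)
quietly count if `high' == 1
local nhigh = r(N)
// finished with the indicator, so release it now rather than at program end
drop `high'
display as text "cars above 25 mpg: " as result `nhigh'
```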

Avoid using a variable to hold a constant: a macro or a scalar is usually all that is needed.
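For example, a conversion factor needs only a scalar with a temporary name, not a variable with the same value in every observation. The conversion from miles per US gallon to kilometres per litre is just an assumed illustration.

```stata
sysuse auto, clear
tempname factor
tempvar kpl
// km per litre equivalent to 1 mile per US gallon
scalar `factor' = 1.609344 / 3.785411784
generate double `kpl' = mpg * `factor'
summarize `kpl', meanonly
display as text "mean km per litre: " as result %6.2f r(mean)
```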

Reminders

Remember to think about string variables as well as numeric variables. Does the task carried out by your program make sense for string variables? If so, will it work properly? If not, do you need to trap input of a string variable as an error, say through syntax?

Remember to think about making your program support by varlist: when natural. See byable.
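A minimal sketch of a byable program, under the assumption that byable(recall) suits the task: the hypothetical mybymean is simply called once per group, and marksample takes care of restricting touse to the current group.

```stata
program mybymean, byable(recall)
        version 8
        syntax varname(numeric) [if] [in]
        // under by:, marksample restricts touse to the current by-group
        marksample touse
        summarize `varlist' if `touse', meanonly
        display as text "mean of `varlist': " as result r(mean)
end
```

It could then be used as bysort foreign: mybymean mpg.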

Remember to think about weights and implement them when natural.

The job isn't finished until the .hlp is done. Use SMCL to set up your help files: old-style help files are still supported but are not documented, and help files not written in SMCL cannot take advantage of its paragraph mode, which allows lines to auto-wrap to fit the desired screen width of the user. For an introduction to the SMCL required to write a basic help file, see [U] 21.11.6 Writing on-line help or examplehelpfile.

Style in the large

Style in the large is difficult to prescribe, but here are some vague generalities.

Before writing a program, do check that it has not been written already! findit is the broadest search tool.

The best programs do just one thing well. There are exceptions, but what to a programmer is a Swiss army knife with a multitude of useful tools may look to many users like a confusingly complicated command.

Very large programs become increasingly difficult to understand, to build, and to maintain, roughly as some power of their length. Consider breaking such programs into subroutines and/or using a structure of command and subcommands.

The more general code is often both shorter and more robust.

Don't be afraid to realise that at some point you may be best advised to throw it all away and start again from scratch.

Note: Use the best tools

Find and use a text editor which you like and which supports programming directly. A good editor will, for example, be smart about indenting and will allow you to search for matching braces. Some even show syntax highlighting. For much more detailed comments on various text editors for Stata users, see http://fmwww.bc.edu/repec/bocode/t/textEditors.html.

Author

Nicholas J. Cox, University of Durham, U.K. n.j.cox@durham.ac.uk

Acknowledgements

Many thanks to Kit Baum, Bill Gould, Alan Riley and Vince Wiggins for general benedictions and numerous specific contributions.

History

this version: 12 April 2005

See also

the Stata manuals... the Stata .ado code...