{smcl}
{* 12april2005}{...}
{hline}
help for {hi:stylerules}
{hline}

{title:Suggestions on Stata programming style}

{title:Introduction}

{p 4 4 2}Programming in Stata, like programming in any other computer language, is partly a matter of syntax {c -} Stata has rules that must be obeyed {c -} and partly a matter of style. Good style includes {c -} but is not limited to {c -} writing programs that are, above all else, clear. They are clear to the programmer, who may revisit them repeatedly, and they are clear to other programmers, who may wish to understand them, to debug them, to extend them, to speed them up, to imitate them or to plagiarise them.

{p 4 4 2}People who program a great deal know this: setting rules for yourself and then obeying them ultimately yields better programs and saves time.

{p 4 4 2}I suggest one overriding rule:

{p 8 8 2}Set and obey programming style rules for yourself.

{p 4 4 2}Moreover,

{p 8 8 2}Obey each of the rules I suggest unless you can make a case that your own rule is as good or better.

{p 4 4 2}Enough pious generalities. The devil in programming is in the details:

{title:Presentation}

{p 4 4 2}Always include a comment containing the version number of your program, your name or initials, and the date the program was last modified above the {cmd:program} line. For example,

{p 8 8 2}{cmd:*! 1.0.0 Jane Roe 24jan2003}{break}
{cmd:program myprog}

{p 4 4 2}(As said, this line is indeed just a comment line: it bears no relation to the Stata {help version} command. Both should be used.)

{p 4 4 2}Use sensible, intelligible names where possible, for programs, variables and macros.

{p 4 4 2}Choose a name for your program that does not conflict with anything already existing. Suppose you are contemplating {it:newname}. If typing either {cmd:which} {it:newname} or {cmd:which} {it:newname}{cmd:.class} gives you a result, StataCorp are already using the name.
Similarly, if {cmd:ssc type} {it:newname}{cmd:.ado} gives you a result, a program with that name is already on SSC. Getting no result from any of these does not guarantee that the name is not in use elsewhere: {cmd:findit} {it:newname} may find such a program, although often it will also find much that is irrelevant to this point.

{p 4 4 2}Brevity of names is also a virtue. However, no platform on which Stata is currently supported requires an 8-character limit. Tastes are in consequence slowly shifting: an intelligible long name for something used only occasionally would usually be considered preferable to something more cryptic.

{p 4 4 2}Note that actual English words for program names are supposedly reserved for StataCorp.

{p 4 4 2}Use the same names and abbreviations for command options that are in common use in official Stata's commands. Try to adopt the same conventions for options syntax: for example, allow a {help numlist} where similar commands use a {cmd:numlist}. Implement sensible defaults wherever possible.

{p 4 4 2}Use natural names for logical constants or variables. Thus {cmd:local OK} should be 1 if true and 0 if false, permitting idioms such as {cmd:if `OK'}. (But beware such logicals taking on missing values.)

{p 4 4 2}Type expressions so they are readable. Some possible rules are

{p 8 8 2}put spaces around each binary operator except {cmd:^} ({cmd:gen z = x + y} is clear, but {cmd:x ^ 2} looks odder than {cmd:x^2})

{p 8 8 2}{cmd:*} and {cmd:/} allow different choices. {bind:{cmd:num / den}} is arguably clearer than {cmd:num/den}, but readers might well prefer {cmd:2/3} to {bind:{cmd:2 / 3}}. Overall readability is paramount: compare for example {bind:{cmd:hours + minutes / 60 + seconds / 3600}} with {bind:{cmd:hours + minutes/60 + seconds/3600}}

{p 8 8 2}put a space after each comma in a function, etc.
{p 8 8 2}parenthesise for readability

{p 4 4 2}Note, however, that such a spaced-out style may make it difficult to fit expressions on one line, another desideratum.

{p 4 4 2}Adopt a consistent style for flow control. Stata has {help if}, {help while}, {help foreach} and {help forvalues} structures that resemble those in many mainstream programming languages. Programmers in those languages often argue passionately about the best layout. Choose one such layout for yourself.

{p 8 8 2}Here is one set of rules:

{p 8 8 2}tab lines consistently after {cmd:if} or {cmd:else} or {cmd:while} or {cmd:foreach} or {cmd:forvalues} (the StataCorp convention is that a tab is 8 spaces; following it is greatly preferable if Stata is to show your programs properly)

{p 8 8 2}do not put anything on a line after a brace, either an opening {c -(} or a closing {c )-}.

{p 8 8 2}put a space before braces

{p 8 8 2}align the {cmd:i} of {cmd:if} and the {cmd:e} of {cmd:else}, and align closing braces {cmd:{c )-}} with the {cmd:i}, or the {cmd:e}, or the {cmd:w} of {cmd:while}, or the {cmd:f} of {cmd:foreach} or {cmd:forvalues}:

{p 8 16 2}{cmd:if} ... {cmd:{c -(}}{break}
...{break}
...{p_end}
{p 8 8 2}{cmd:{c )-}}

{p 8 16 2}{cmd:else {c -(}}{break}
...{break}
...{p_end}
{p 8 8 2}{cmd:{c )-}}

{p 8 16 2}{cmd:while} ... {cmd:{c -(}}{break}
...{break}
...{p_end}
{p 8 8 2}{cmd:{c )-}}

{p 8 16 2}{cmd:foreach} ... {cmd:{c -(}}{break}
...{break}
...{p_end}
{p 8 8 2}{cmd:{c )-}}

{p 8 8 2}In Stata 8 and later, putting the opening and closing braces on lines above and below the body of each construct is compulsory (with the exceptions that the whole of an {cmd:if} construct or the whole of an {cmd:else} construct may legally be placed on one line). For earlier releases, it is strongly advised.

{p 4 4 2}Write within 80 columns (72 are even better). The awkwardness of viewing (and understanding) long lines outweighs the awkwardness of splitting commands into two or more physical lines.
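
{p 4 4 2}As a minimal sketch of splitting one long command across physical lines, the following uses the {cmd:///} continuation comment available from Stata 8. The {cmd:auto} dataset and the particular variables are assumptions chosen only for illustration.

```stata
* Illustration only: the dataset and the model are arbitrary choices
sysuse auto, clear

* One logical command written on three physical lines, each well under
* 80 columns; the continuation marker at the end of a line joins it
* to the line that follows
regress price mpg weight length ///
        turn displacement gear_ratio ///
        if foreign == 0
```

{p 4 4 2}Indenting the continuation lines makes it plain at a glance that they belong to the command above.
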
{p 4 4 2}Use {cmd:#delimit ;} sparingly (Stata isn't C): commenting out end-of-lines is tidier where possible (although admittedly still ugly). The {cmd:///} comment introduced in Stata 8 is most helpful here, and arguably more pleasing visually than {cmd:/* */}.

{p 4 4 2}Use blank lines to separate distinct blocks of code.

{p 4 4 2}Consider putting {help quietly} on a block of statements, rather than on each or many of them. An alternative in some cases is to use {help capture}.

{p 4 4 2}You may express logical negation by either {cmd:!} or {cmd:~}. Choose one and stick with it. StataCorp have flipped from preferring {cmd:~} to preferring {cmd:!}.

{p 4 4 2}Group {help tempname}, {help tempvar} and {help tempfile} declarations.

{p 4 4 2}Well-written programs don't need many comments. (Comment: We could certainly argue about that!)

{p 4 4 2}Use appropriate {help display} styles for messages and other output. All error messages (and no others) should be {cmd:display}ed {cmd:as err}. In addition, attach a return code to each error message: 198 (syntax error) will often be fine.

{title:Helpful Stata features}

{p 4 4 2}Through version control, Stata is very tolerant of out-of-date features, but that does not mean that you should be. To maximise effectiveness and impact, and to minimise problems, write using the latest version of Stata and exploit its features.

{p 4 4 2}Make yourself familiar with all the details of {help syntax}. It can stop you re-inventing little wheels. Use wildcards for options to pass to other commands when appropriate.

{p 4 4 2}Support {cmd:if} {it:exp} and {cmd:in} {it:range} where applicable. This is best done using {cmd:marksample touse} (or occasionally {cmd:mark} and {cmd:markout}). Have {cmd:touse} as a temporary variable if and only if {cmd:marksample} or a related command is used. See help on {help marksample}.
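
{p 4 4 2}As a minimal sketch of the {cmd:marksample} idiom, consider a hypothetical command {cmd:mymean} (the name, its syntax and its returned results are inventions for illustration, not part of Stata). It supports {cmd:if} and {cmd:in}, tests for a fatal condition early, and returns what it computes.

```stata
*! 1.0.0 illustrative sketch only
program mymean, rclass
        version 8
        syntax varname(numeric) [if] [in]

        * touse is a temporary variable: 1 for observations to use, 0 otherwise
        marksample touse

        * stop at once if no observations qualify (error 2000: no observations)
        quietly count if `touse'
        if r(N) == 0 error 2000

        * meanonly is enough here, as only the mean and count are needed
        summarize `varlist' if `touse', meanonly
        display as text "mean of `varlist': " as result r(mean)
        return scalar mean = r(mean)
        return scalar N = r(N)
end
```

{p 4 4 2}After {cmd:sysuse auto}, typing {cmd:mymean price if foreign} should display the mean and leave it in {cmd:r(mean)}, so that the command can also be run {cmd:quietly} and its result retrieved.
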
{p 4 4 2}{cmd:_result()} still works, but it is unnecessarily obscure compared with {cmd:r()}, {cmd:e()} or {cmd:s()} class results.

{p 4 4 2}Make effective use of information available in {cmd:e()} and {cmd:r()}. If your program is to run in a context which implies results or estimates are available (say, after {cmd:regress}), make use of the stored information from the prior command.

{p 4 4 2}Where appropriate, ensure that your command returns the information that it computes and displays, so that another user may employ it {help quietly} and retrieve that information.

{p 4 4 2}Ensure that programs that focus on time-series or panel data work with time-series operators if at all possible. In short, exploit {help tsset}.

{p 4 4 2}Define constants to machine precision. Thus use {cmd:_pi} or {cmd:c(pi)} rather than some approximation such as 3.14159, or use {cmd:-digamma(1)} for the Euler-Mascheroni constant gamma, rather than 0.57721. Cruder approximations may give results adequate for your purposes, but that doesn't mean that you should eschew wired-in features.

{p 4 4 2}Familiarise yourself with the built-in material revealed by {cmd:creturn list}. Scrolling right to the end will show several features that may be useful to you.

{p 4 4 2}SMCL is the standard way to format Stata output.

{title:Respect for datasets}

{p 4 4 2}In general, make no change to the data unless that is the direct purpose of your program or that is explicitly requested by the user.
For example,

{p 8 8 2}your program should not destroy the data in memory unless that is essential for what it does

{p 8 8 2}you should not create new permanent variables on the side unless notified or requested

{p 8 8 2}do not use variables, matrices, scalars or global macros whose names might already be in use: there is absolutely no need to guess at names unlikely to occur, as temporary names can always be used (see help on {help tempvar}, {help tempname}, and {help tempfile})

{p 8 8 2}do not change the type of a variable unless requested

{p 8 8 2}do not even change the sort order of data: programs can easily be made {cmd:sortpreserve}.

{title:Speed and efficiency}

{p 4 4 2}Test for fatal conditions as early as possible. Do no unnecessary work before checking that a vital condition has been satisfied.

{p 4 4 2}Use {cmd:summarize, meanonly} for speed when its returned results are sufficient. Also consider whether a {cmd:qui count} is what fits the purpose better.

{p 4 4 2}{cmd:foreach} and {cmd:forvalues} are cleaner and faster than most {cmd:while} loops, and much faster than {cmd:for}. Within programs, avoid {cmd:for} like the plague.

{p 4 4 2}{cmd:macro shift} can be very slow when many variables are present. With 10,000 variables, for example, working all the way through a variable list with {cmd:macro shift} would require around 50 million internal macro renames. Using {cmd:foreach} or {cmd:while} without a macro shift is faster.

{p 4 4 2}Avoid {cmd:egen} within programs: it is usually slower than a direct attack.

{p 4 4 2}Try to avoid looping over observations, which is very slow. Fortunately, it can usually be avoided.

{p 4 4 2}Avoid {cmd:preserve} if possible. {cmd:preserve} is attractive to the programmer but can be expensive in time for the user with large data files. Programmers should learn to master {help marksample}.

{p 4 4 2}Specify the type of temporary variables to minimise memory overhead.
So if a {cmd:byte} variable can be used, specify {cmd:generate byte `myvar'} rather than let the default type be used, which would waste storage space.

{p 4 4 2}Temporary variables will be automatically dropped at the end of a program, but also consider dropping them when they are no longer needed, to minimise memory overhead, and indeed to reduce the chances of your program stopping because there is no room to add more variables.

{p 4 4 2}Avoid using a variable to hold a constant: a macro or a scalar is usually all that is needed.

{title:Reminders}

{p 4 4 2}Remember to think about string variables as well as numeric variables. Does the task carried out by your program make sense for string variables? If so, will it work properly? If not, do you need to trap input of a string variable as an error, say through {help syntax}?

{p 4 4 2}Remember to think about making your program support {cmd:by} {it:varlist}{cmd::} when natural. See {help byable}.

{p 4 4 2}Remember to think about weights and implement them when natural.

{p 4 4 2}The job isn't finished until the {cmd:.hlp} is done. Use SMCL to set up your help files: old-style help files, although supported, are not documented, and help files not written in SMCL cannot take advantage of its paragraph mode, which allows lines to auto-wrap to fit the desired screen width of the user. For an introduction to the SMCL required to write a basic help file, see {hi:[U] 21.11.6 Writing on-line help} or {help examplehelpfile}.

{title:Style in the large}

{p 4 4 2}Style in the large is difficult to prescribe, but here are some vague generalities.

{p 8 8 2}Before writing a program, do check that it has not been written already! {help findit} is the broadest search tool.

{p 8 8 2}The best programs do just one thing well. There are exceptions, but what to a programmer is a Swiss army knife with a multitude of useful tools may look to many users like a confusingly complicated command.
{p 8 8 2}Very large programs become increasingly difficult to understand, to build, and to maintain, roughly as some power of their length. Consider breaking such programs into subroutines and/or using a structure of command and subcommands.

{p 8 8 2}The more general code is often both shorter and more robust.

{p 8 8 2}Don't be afraid to realise that at some point you may be best advised to throw it all away and start again from scratch.

{title:Note: Use the best tools}

{p 4 4 2}Find and use a text editor which you like and which supports programming directly. A good editor will, for example, be smart about indenting and will allow you to search for matching braces. Some even show syntax highlighting. For much more detailed comments on various text editors for Stata users, see {browse "http://fmwww.bc.edu/repec/bocode/t/textEditors.html":http://fmwww.bc.edu/repec/bocode/t/textEditors.html}.

{title:Author}

{p 4 4 2}Nicholas J. Cox, University of Durham, U.K.{break}
n.j.cox@durham.ac.uk

{title:Acknowledgements}

{p 4 4 2}Many thanks to Kit Baum, Bill Gould, Alan Riley and Vince Wiggins for general benedictions and numerous specific contributions.

{title:History}

{p 4 4 2}this version: 12 April 2005{break}
{* previous versions: 15 November 2004, 28 January 2003, 21 August 2002, 19 January 2001,}
{* 20 January 2000, 30 November 1999, 29 October 1998, 22 September 1998}

{title:See also}

{p 4 4 2}the Stata manuals...{p_end}
{p 4 4 2}the Stata {cmd:.ado} code...