-------------------------------------------------------------------------------
help for mktab                                                      Nick Winter
-------------------------------------------------------------------------------

Print table of estimates in delimited or screen-presentation format

    Basic syntax:

     mktab (depvar1 varlist1) (depvar2 varlist2) ... (depvarN varlistN)
           [weight] [if exp] [in range] [, options ]


    Full syntax:

     mktab ([eqname1:] depvar1a [depvar1b ... =] varlist1 [, noconstant])
           ([eqname2:] depvar2a [depvar2b ...  =] varlist2 [, noconstant])
           ...
           ([eqnameN:] depvarNa [depvarNb ...  =] varlistN [, noconstant])
           [weight] [if exp] [in range] [, options ]


    Options:

           print(varlist) log(filename[, append | replace]) cmd(cmdname)
           aux(name[=label][,name[=label] ... ])
           est(name[=label][,name[=label] ... ]) flag(#=label[,#=label ... ])
           tag(string) delimit(string) noisily continue xlabel ylabel onetitle
           t1title t2title b1title b2title notitle nobtitle mif(conditions)
           miflabel(labels) plevels nose bfmt(%fmt) sfmt(%fmt) efmt(%fmt)
           pfmt(%fmt) onetail screen vspace(#) connect notags latex nocaption
           html command_options


Description

mktab estimates one or more single-equation models, and prints the results in
tab- (or other) delimited format for importation into a spreadsheet or
word-processor. This facilitates creating tables in "journal-article" format,
i.e. with standard errors in parentheses below parameter estimates,
significance flagged, and so on.

mktab estimates each model specified. It then produces a delimited table with
the dependent variables across the top, the RHS variables (followed by any
auxiliary parameters and returned estimation results) down the left, and the
coefficients and standard errors in the cells. Each column corresponds to a
single model; each row to a single RHS variable.

mktab is similar to John Gallup's exemplary outreg. The primary differences are
that it estimates all the models and generates the table with a single command,
and that it does not require preserving and restoring the data. In addition, it
has an option to display output in space-delimited columns on the screen.

if, in and weights are supported.

This command has been tested with regress, probit, logit, oprobit, and ologit,
along with their survey counterparts.  It should work with any single-equation
estimator. Time series operators are not supported.


Specification of Models to Run

Models may be specified with the "new" Stata 6.0 multi-equation syntax.  Thus,
they can be entered individually, each surrounded by parentheses.  If several
models involve the same RHS variables, they can be specified together, with an
equal sign separating the LHS and RHS variables, as in
(mpg price weight = rep78 foreign).
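
As a sketch of how the shorthand expands (assuming Stata's auto data are in
memory), the command

     . mktab (mpg price weight = rep78 foreign)

should produce the same three columns as writing each model out in full:

     . mktab (mpg rep78 foreign) (price rep78 foreign) (weight rep78 foreign)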


Options

print(varlist) specifies the list (and ordering) of RHS variables to include
     in the output. The default is to include all variables from all models.
     Wildcards can be used to refer to variables in this list (and will
     expand only to include variables in the models).

log(filename[, append | replace]) is the name of a log file for the results.
     The usual log options, replace and append, are valid. If a log file is
     already open when this option is specified, that log file continues to
     receive any output other than the delimited table (see noisily, below)
     and is re-opened after the table is printed.

cmd(cmdname) is the name of the command to run the model. The default is 
     regress.

aux(string) is a comma-separated list of auxiliary estimates to print
     immediately after varlist and in the listed order, e.g. aux(_cons),
     optionally with labels, e.g. aux(_cut1=Cut One,_cut2=Cut Two).  They
     need not exist for all the estimated models.

est(string) is a comma-separated list of returned estimates to print with
     optional labels (e.g. est(N,r2=R Squared)).

mif(conditions) specifies a set of if conditions that vary across the
     multiple equations. They are combined with the if condition, if any,
     with an &. Conditions should take the form:

          mif( [stub : ] cond1 [\ cond2 [...]])

     where stub is prepended to each of the conditions.  So, for example,
     mif(rep78== : 1 \ 2 \ 3 \ 4 \ 5 ) would apply if rep78==1 to the first
     equation, if rep78==2 to the second, and so on.  The number of
     mif() conditions specified must match the number of equations. However,
     if there is exactly one equation specified, and multiple mif()
     conditions, then the single equation will be duplicated for each mif()
     condition.  This means that if you want to run the same model across
     multiple subgroups, you need only specify the equation once, as in the
     example below.
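
     As an illustration (a sketch using the auto data, not output from the
     program), the single-equation form

          . mktab (mpg = price weight), mif(rep78== : 3 \ 4 \ 5)

     should estimate the same three models as running, in turn,

          . regress mpg price weight if rep78==3
          . regress mpg price weight if rep78==4
          . regress mpg price weight if rep78==5

     An if condition given before the comma, such as if price<10000, would
     be combined with each of these, e.g. price<10000 & rep78==3.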

miflabel(labels) specifies a set of labels for the columns, to replace those
     generated by default by the mif() option. For example,

          mktab (mpg = price weight length) , mif(foreign==1\foreign==0)
               miflabel("Foreign" "Domestic")

     would label the two columns of output "Foreign" and "Domestic", rather
     than "foreign==1" and "foreign==0".

flag(flaglist) specifies a comma-separated list of one or more significance
     (p) levels, expressed in percent, and corresponding symbols. The
     default is flag(1=**,5=*,10=^), which labels p<0.01 with **, p<0.05 with
     *, and p<0.10 with ^. Levels need not be integers (e.g. .1 for p<0.001),
     but must be listed in ascending order. To suppress significance
     marking, specify flag(0=*).
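
     For instance, a scheme with different symbols might be requested as
     follows (a sketch; the model and symbols are arbitrary):

          . mktab (mpg = price weight), flag(1=***,5=**,10=*)

     Here the levels 1, 5, and 10 are given in ascending order, so p<0.01 is
     marked ***, p<0.05 is marked **, and p<0.10 is marked *.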

connect indicates that the significance level flags (e.g. **) should be
     connected with the coefficients, rather than separated into their own
     output columns.

tag(string) is an identifying name that prints in the third column of output
     (e.g. tag(Model One)).

notags indicates that line numbers, line types, and tag information should
     not be included. This option is selected automatically when screen
     formatting is requested.

delimit(string) specifies a delimiter to be used to separate the columns. The
     default is to tab-delimit the output.

noisily indicates that the individual models should be displayed as they are
     run.  If a log file is open when mktab is executed, this output will be
     sent to that log file; the delimited table will be sent to the file
     indicated in the log() option.

continue indicates that the master row numbers should continue from the last
     invocation of mktab. This allows creation of a single log file with
     multiple tables, one after the other.
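
     A sketch of this use (the tags and models are illustrative only):

          . mktab (mpg = price weight), tag(Table One) log(mylog, replace)
          . mktab (price = mpg weight), tag(Table Two) log(mylog, append)
               continue

     The second command appends to mylog.log and continues the row
     numbering where the first table ended.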

xlabel uses variable labels (if any) to label the RHS variables.

ylabel uses variable labels (if any) to label the LHS variables.

onetitle indicates that all the equations have the same LHS variable, and
     that the column header (i.e., variable name or label) should span all
     the columns, rather than being repeated for each.

t1title, t2title, b1title, and b2title allow up to two lines of titles at the
     top and bottom of the table.

notitle and nobtitle suppress printing of the default top and bottom title
     information. This is helpful in conjunction with the continue option.

plevels indicates that p-levels should be displayed under the coefficients,
     in place of standard errors.

nose suppresses the display of standard errors (and p-levels).

bfmt, sfmt, efmt, and pfmt indicate the display format to use for
     coefficients, standard errors, returned estimates, and p-values,
     respectively. The defaults are %4.3f for coefficients and standard
     errors, %3.2f for estimates, and %5.4f for p-values.
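
     For example, to show four decimal places for coefficients and standard
     errors (a sketch; the formats chosen are arbitrary):

          . mktab (mpg = price weight), bfmt(%6.4f) sfmt(%6.4f)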

onetail indicates that one-tailed significance levels should be calculated.
     The default is two-tailed.

screen indicates that the table should be formatted for the screen, rather
     than delimited. This will produce a space-delimited table that will
     display appropriately in a fixed-pitch font. It is useful for reading
     model results during interactive use.

vspace(#) indicates the width of the column for variable names when results
     are formatted for the screen. The default is 20.

latex indicates that the table should be formatted for inclusion in a LaTeX
     file. See discussion of LaTeX, below.

nocaption indicates that the table caption (taken from t1title()) should be
     omitted from the LaTeX output.

html indicates that the table should be formatted as an HTML table.

command_options can include any options appropriate to the command being run
     (e.g. robust).


Output

mktab outputs the following columns:

1 row number:     consecutive numbering of the lines in the table as a whole,
                  which allows re-sorting into the correct order

2 line type:      0 for header/title rows
                  1 for coefficients
                  2 for standard errors
                  3 for estimates, and
                  4 for footer rows.
                  These allow re-sorting of the data for formatting

3 the tag:        Contents of the tag() option, to identify the table

4 variable names: The name of the variable for this row (blank for std error
                  rows)

5+ model columns: For each model, a column of coefficients followed by a
                  column of significance symbols.  Rows alternate between
                  coefficients and their standard errors. Standard errors are
                  surrounded by parentheses.


LaTeX output

The latex option specifies that the output table should be formatted as a
LaTeX table (i.e., in a tabular environment).  The table is formatted with
coefficient and standard error lines decimal-aligned, and other lines
centered.  Standard errors are printed in a smaller font under the
coefficients.

The resulting table makes use of two LaTeX packages, threeparttable and
booktabs, so be sure to include \usepackage{threeparttable,booktabs} in your
LaTeX preamble.

For example, the following command produces the following LaTeX output, and
saves it in the file mylog.tex:

     . mktab (mpg price rep78 = gratio weight), log(mylog.tex) latex

     \begin{table}[ht]
     \begin{center}
     \begin{threeparttable}
     \caption{{\em }}
     \begin{tabular}{ l r@{}l r@{}l r@{}l }
     \toprule
     & \multicolumn{2}{c}{{\em mpg}}& \multicolumn{2}{c}{{\em price}}&
          \multicolumn{2}{c}{{\em rep78}}
     \\
     \cmidrule(lr){2-7}
     gear\_ratio&0.&099&1454.&284&0.&535
     \\
     &\raisebox{.7ex}[0pt]{\scriptsize (1.}&\raisebox{.7ex}[0pt]{\scriptsize
          365)}
     &\raisebox{.7ex}[0pt]{\scriptsize
          (978.}&\raisebox{.7ex}[0pt]{\scriptsize 092)}
     &\raisebox{.7ex}[0pt]{\scriptsize (0.}&\raisebox{.7ex}[0pt]{\scriptsize
          389)}
     \\
     weight&--0.&006$^{**}$&2.&692$^{**}$&--0.&000
     \\
     &\raisebox{.7ex}[0pt]{\scriptsize (0.}&\raisebox{.7ex}[0pt]{\scriptsize
          001)}
     &\raisebox{.7ex}[0pt]{\scriptsize (0.}&\raisebox{.7ex}[0pt]{\scriptsize
          574)}
     &\raisebox{.7ex}[0pt]{\scriptsize (0.}&\raisebox{.7ex}[0pt]{\scriptsize
          000)}
     \\
     \bottomrule
     \end{tabular}
     \begin{tablenotes}[flushleft]
     \item \hspace{-0.2em}$^{**}$ p$<$0.01; $^{*}$ p$<$0.05;
          \raisebox{.7ex}[0pt]{\tiny $\wedge$} p$<$0.10 two tailed
     \end{tablenotes}
     \end{threeparttable}
     \end{center}
     \end{table}
     \clearpage

The following example LaTeX document would include this table:

     \documentclass[8pt]{extarticle}
     \usepackage{threeparttable,booktabs}

     \oddsidemargin 0.0in
     \evensidemargin 0.0in
     \textwidth 6.5in
     \topmargin 0.5in
     \textheight 9.0in

     \begin{document}
     \input{mylog.tex}
     \end{document}

In fact, this wrapper LaTeX file can be created automatically by mktab, if
you issue the following command:

     . mktab wrapper , log(mylog.tex)

This creates a file _mylog.tex, which contains the wrapper listed above.
This can be handy when fine-tuning a table or set of tables.


Examples

     . mktab (mpg price rep78 = gratio weight), log(mylog)

     This runs three regressions (with DVs mpg, price and rep78), each with
     RHS variables gratio and weight (and a constant). It prints the
     coefficients (and their standard errors) into a file called mylog.log.


     . mktab (vote90 vote92 = dem rep ideology), cmd(probit)
          aux(_cons=Intercept) est(N,ll=Log Likelihood,r2_p=Pseudo R2)
          tag(Table One) log(mylog, append) continue

     This runs two probit models (DVs vote90 and vote92), each with three
     independent variables (dem, rep and ideology). The table includes
     coefficients and standard errors for the three RHS variables and the
     constant term (labeled Intercept), as well as estimates N, ll, and r2_p
     (which are labeled "N", "Log Likelihood", and "Pseudo R2",
     respectively). It appends the table to mylog.log, and numbers the rows
     consecutively with the previous table.


     . mktab (rep78 mpg price) (rep78 mpg price weight), cmd(oprobit)
          f(.1=**,1=*) pr(5) log(mylog, replace) xlab ylab delimit(,)
          a(_cut1=Cut One,_cut2=Cut Two,_cut3=Cut Three,_cut4=Cut Four)

     This runs two ordered probit models of rep78, one with independent
     variables mpg and price, the other with mpg, price, and weight. It
     prints the results with five decimal places, uses the variable labels
     from the data set, flags coefficients p<0.001 with ** and p<0.01 with *,
     and delimits the output with commas. This produces the following output:

              1,0,,Estimates (using oprobit)
              2,1,,Variable,Repair Record,,Repair Record,
              3,1,, ,1978,,1978,
              4,2,,Mileage (mpg),0.11342,**,0.05978, 
              5,3,,,(0.02897),,(0.04091),
              6,2,,Price,0.00010, ,0.00014, 
              7,3,,,(0.00005),,(0.00005),
              8,2,,Weight (lbs.),--, ,-0.00055, 
              9,3,,, ,,(0.00031),
              10,2,,Cut One,0.96916, ,-1.66922, 
              11,3,, ,(0.82655),,(1.68018),
              12,2,,Cut Two,1.83700, ,-0.78752, 
              13,3,, ,(0.79926),,(1.66034),
              14,2,,Cut Three,3.22326,**,0.63996, 
              15,3,, ,(0.82574),,(1.65004),
              16,2,,Cut Four,4.18628,**,1.65129, 
              17,3,, ,(0.87507),,(1.65440),
              18,6,,** p<0.001, * p<0.01, two tailed
              19,7,,


     . mktab (rep78 mpg price) (rep78 mpg price weight), cmd(oprobit)
          f(.1=**,1=*) pr(5) log(mylog, replace) xlab ylab screen a(_cut1=Cut
          One,_cut2=Cut Two,_cut3=Cut Three,_cut4=Cut Four)

     This runs the same models, but formats the output for the screen.  This
     produces the following output:

              Variable               Repair      Repair   
                                     Record      Record   
                                      1978        1978    

              Mileage (mpg)           0.113**     0.060   
                                     (0.029)     (0.041)
              
              Price                   0.000       0.000   
                                     (0.000)     (0.000)

              Weight (lbs.)              --      -0.000   
                                                 (0.000)

              Cut One                 0.969      -1.669   
                                     (0.827)     (1.680)
              Cut Two                 1.837      -0.788   
                                     (0.799)     (1.660)
              Cut Three               3.223**     0.640   
                                     (0.826)     (1.650)
              Cut Four                4.186**     1.651   
                                     (0.875)     (1.654)

              ** p<0.001; * p<0.01; two tailed

        
Notes

This program is inspired by Christopher Ferrell's esthold and estprt commands
(and some of its programming is lifted from them). Code for parsing the
multiple-equation syntax was taken shamelessly from the reg3 command; thank
you, Vince Wiggins and StataCorp.

The tag, row number and row type columns can be used to sort the rows in
useful ways once inside the spreadsheet.  For example, sorting on row type,
one would then have blocks of coefficients, standard errors, and results
grouped together, possibly across a large number of tables (different tags).
So one can easily set formatting options for each type (block) of row,
and then re-sort by tag and row number to return to the original output order.

If you set things up right, however, you should be able to take these results
directly into a word processor.

The significance calculations are based on e(df_r) containing degrees of 
freedom for t ratios; for models that do not generate e(df_r) (e.g. probit)
the z ratio is used.
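
As a rough illustration of this rule (a sketch in terms of Stata's built-in
functions as named in current releases, not the program's actual code), the
default two-tailed p-value for a coefficient can be reproduced by hand after
estimation on the auto data:

     . regress mpg price weight
     . display 2*ttail(e(df_r), abs(_b[weight]/_se[weight]))

     . probit foreign mpg weight
     . display 2*normal(-abs(_b[mpg]/_se[mpg]))

The first display uses the t distribution because regress sets e(df_r); the
second uses the standard normal because probit does not. With the onetail
option, the corresponding one-tailed value would presumably be half of each.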


Author

Nicholas Winter
Department of Political Science
Cornell University
nw53@cornell.edu