-------------------------------------------------------------------------------
help for makematrix
-------------------------------------------------------------------------------

Make a matrix of results from other commands

    Outline syntax:

        makematrix [matrix_name] ,
                 from(results_list) [production_options] [list_options] :
                 ["]command["] [varlist] ... [, options ]

    A matrix name matrix_name may be specified. If so, a matrix with that
    name will be produced and remain in memory.

    command must be specified together with whatever elements from [varlist]
    [if exp] [in range] [weight] [, options ] are appropriate.

    Strictly, command is the first token following the colon :.  Hence bind
    commands specified by two or more words in double quotes.  Thus if
    command were run mydo.do, specify it as "run mydo.do".


Description

    makematrix runs command repeatedly for a specified variable list
    (optionally, two variable lists) to produce a matrix of results. As
    usual, a matrix could be a vector. The matrix will be listed using matrix
    list, unless the list option is specified, in which case it will be
    listed using the list command.

    There are various modes of operation, which are best appreciated by
    studying the detailed examples discussed in the next section.


Remarks 

    First, some terminology. We call a Stata (statistical) command
    essentially univariate if it requires only one variable; it may repeat
    itself if supplied with two or more variables.  summarize is an
    essentially univariate command; it does work for two or more variables,
    by repeating its operation for those variables. We call a command
    essentially bivariate if it requires only two variables, and may repeat
    itself otherwise.  correlate is an essentially bivariate command; it does
    work for three or more variables, by repeating its operation for pairs of
    those variables.  spearman is also an essentially bivariate command,
    although it does not in fact accept more than two variables. Finally, we
    call a command essentially multivariate if it produces just one set
    of results even if supplied with three or more variables.  regress is an
    essentially multivariate command. From now on the word "essentially" is
    not used, but should be understood.

    The output of correlate given a varlist of two or more variables is a
    matrix of correlations for every pair of variables in varlist. How could
    we produce an equivalent directly for spearman? We need to find out that
    spearman leaves a correlation behind in r(rho):

    . makematrix, from(r(rho)) : spearman head trunk length displacement
        weight
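
    If you do not know which results a command leaves behind, one way to
    find out (a general Stata technique, not something specific to
    makematrix) is to run the command once on a pair of variables and then
    list the saved results:

    . sysuse auto, clear
    . spearman headroom trunk
    . return list

    For an e-class command such as regress, ereturn list plays the same
    role. With r(rho) identified, the makematrix call above can be run.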

    The result is displayed using matrix list and we will normally want to
    tidy up the presentation, say by

    . makematrix, from(r(rho)) format(%4.3f) : spearman head trunk length
        displacement weight

    However, let us leave these details of presentation on one side. In this
    case, given a bivariate command, and a varlist, and a single result from
    which to compile the matrix, makematrix takes each pair of variables from
    varlist, runs a bivariate command for that pair, and puts a single result
    in the cell defined by each pair of variables. So both rows and columns
    are specified by varlist.
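
    To make the bookkeeping concrete, here is a minimal do-file sketch of
    compiling such a matrix by hand. It is an illustration only, not the
    code of makematrix: the matrix name R and the local macro names are
    arbitrary, the diagonal is simply set to 1, and each correlation is
    copied to both triangles because it is symmetric. None of these five
    variables has missing values in the auto data, so the choice of
    observations is not an issue here.

        sysuse auto, clear
        local vars headroom trunk length displacement weight
        local k : word count `vars'
        * start from an identity matrix, so that the diagonal is 1
        matrix R = I(`k')
        forvalues i = 1/`k' {
            local vi : word `i' of `vars'
            local first = `i' + 1
            * no iterations are done when `first' exceeds `k'
            forvalues j = `first'/`k' {
                local vj : word `j' of `vars'
                quietly spearman `vi' `vj'
                matrix R[`i', `j'] = r(rho)
                matrix R[`j', `i'] = r(rho)
            }
        }
        matrix rownames R = `vars'
        matrix colnames R = `vars'
        matrix list R, format(%4.3f)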

    Alternatively, we might want different sets of variables on the rows and
    the columns, perhaps specifying a submatrix of the full matrix.  The
    option cols() can be used to specify variables to appear as columns. The
    variables in the varlist of command will then appear as rows.  Say we did
    a principal component analysis of five variables and followed with
    calculation of scores:

    . pca head trunk length displacement weight
    . score score1-score5
    . makematrix, from(r(rho)) cols(score?) : correlate head trunk length
        displacement weight

    Here the full correlation matrix of variables and scores, as would be
    produced by correlate, is 10 X 10, and the submatrix produced by
    makematrix is only 5 X 5.

    We can show two or more scalar results from each command run.  This is
    possible in various ways. A univariate command can be repeated, each time
    yielding two or more scalars:

    . makematrix, from(r(mean) r(sd) r(skewness)) : su head trunk length
        displacement weight, detail

    makematrix reasons in this way: The user wants three scalars, which I
    will show in three columns. So I must run the command specified in turn
    on each variable supplied, which I will show on the rows. So for each
    variable in varlist, makematrix runs a univariate command, and puts two
    or more scalars in the cells of each row.
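
    As a cross-check, and only for statistics that tabstat happens to
    support, a table with the same layout (variables on rows, statistics on
    columns) can be obtained directly. Apart from display format, it should
    agree with the makematrix output for these variables, none of which has
    missing values in the auto data.

    . tabstat headroom trunk length displacement weight,
        statistics(mean sd skewness) columns(statistics)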

    A bivariate command can be repeated, each time yielding two or more
    scalars:

    . makematrix, from(r(rho) r(p)) lhs(rep78-foreign) : spearman mpg

    makematrix reasons in this way: The user wants two scalars, which I will
    show in two columns. So I must run the command specified in turn on the
    variable supplied. The option lhs() is also specified, so that must be
    used to supply the other variable. Whenever lhs() is specified, it
    specifies the rows of the matrix.  That is, in this case, the rows show
    the results of spearman rep78 mpg ...  spearman foreign mpg. Notice how
    the variables specified in lhs() appear on the left-hand side of the
    varlist which spearman runs.  (lhs() also names the left-hand side of the
    matrix, but that is a happy accident.) This is also allowed:

    . makematrix, from(r(rho) r(p)) rhs(rep78-foreign) : spearman mpg

    In this case, the rows show the results of spearman mpg rep78 ...
    spearman mpg foreign, and are exactly the same as in the previous
    example, because a rank correlation does not depend on the order of the
    two variables. Again, whenever rhs() is specified, it specifies the rows
    of the matrix.  Notice how the variables specified in rhs() appear on the
    right-hand side of the varlist which spearman runs.  (By a small stretch,
    you can also think of it as naming the right-hand side of the matrix,
    given that we could repeat the row names on that side.) In other cases,
    which is used may well matter:

    . makematrix, from(e(r2) e(rmse) _b[_cons] _b[mpg]) lhs(rep78-foreign) :
        regress mpg

    . makematrix, from(e(r2) e(rmse) _b[_cons] _b) rhs(rep78-foreign) :
        regress mpg

    The first series of regressions predicts rep78 ... foreign in turn from
    mpg. The second series predicts mpg from rep78 ...  foreign in turn. The
    r-square results will be the same, but not the root mean square errors,
    or the intercepts or slopes.  Note that _b by itself has the
    interpretation of _b[row_variable].
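
    To see why, it can help to run one pair of regressions by hand and
    display the same four quantities. This is a sketch for the single row
    rep78 only. The R-square values agree because, in a regression with one
    predictor, R-square is the squared correlation, which does not depend on
    which variable is the response.

    . regress rep78 mpg
    . display e(r2) "  " e(rmse) "  " _b[_cons] "  " _b[mpg]
    . regress mpg rep78
    . display e(r2) "  " e(rmse) "  " _b[_cons] "  " _b[rep78]
    . correlate mpg rep78
    . display r(rho)^2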

    In fact lhs() and rhs() can be used to produce a series of multivariate
    results. Suppose we have weightsq = weight^2.

    . makematrix, from(e(r2) e(rmse)) lhs(mpg-trunk length-foreign) :
        regress weight weightsq

    This series predicts mpg ... foreign in turn from weight and weightsq.
    Whichever of lhs() or rhs() is specified defines the varying rows, while
    the varlist supplied is fixed for each run of command.
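
    A rough by-hand equivalent is a loop over the response variables, with
    the predictors held fixed. This is a do-file sketch, not the internals
    of makematrix. It assumes that weightsq already exists; the macro name y
    is arbitrary; and, unlike makematrix by default, it lets each regression
    use every observation available to it (rep78 has missing values in the
    auto data).

        foreach y of varlist mpg-trunk length-foreign {
            quietly regress `y' weight weightsq
            * one line per response: name, R-square, root mean square error
            display "`y'" _col(16) %6.3f e(r2) %12.2f e(rmse)
        }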

    There is one more nuance to be explained. Say you want a table of sums
    for a set of variables. You might try

    . makematrix, from(r(sum)): su head trunk length displacement weight,
        meanonly

    However, makematrix cannot distinguish between this and a similar problem
    with a bivariate command, so it will attempt to run summarize on all
    distinct pairs of variables. This will succeed, except that what is left
    behind in r(sum) will be the sum of the second of each pair of variables.
    What you will prefer is a vector, and that is the option to specify:

    . makematrix, from(r(sum)) vector: su head trunk length displacement
        weight, meanonly
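
    The behaviour that makes vector necessary can be checked directly: after
    summarize with two variables, the saved results refer to the last
    variable only.

    . summarize headroom trunk, meanonly
    . display r(sum)
    . summarize trunk, meanonly
    . display r(sum)

    The two displayed sums should be identical.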


Options

    from(results_list) is required. The results_list may in particular
    contain names of e-class results containing scalars (such as e(rmse));
    names of r-class results containing scalars (such as r(rho)); estimates
    of intercepts, slopes or standard errors such as _b[_cons], _b[mpg] or
    _se[mpg]; or names of globals (such as S_1).  Do not prefix global names
    with $. _b or _se by itself has the interpretation of _b[row_variable] or
    _se[row_variable].  Expressions such as r(rho)^2 or log10(r(p)) are also
    allowed.  However, note that spaces must not occur within any individual
    expression and that the more complicated the expression you use, the more
    likely it is that it will not be acceptable as a matrix column name.
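
    For example, going only by the rules just stated (these particular calls
    are illustrations, not tested recipes), a squared correlation or a set
    of slopes with their standard errors could be requested by

    . makematrix, from(r(rho) r(rho)^2 r(p)) rhs(rep78-foreign) : spearman
        mpg
    . makematrix, from(_b _se) rhs(rep78-foreign) : regress mpg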

    production_options are

        cols(column_varlist) specifies a list of variables to appear on the
        columns of the matrix. (varlist then defines the rows only.)

        lhs(lhs_varlist) specifies a list of variables to appear on the
        left-hand side of the variable list supplied to command. These will
        appear on the rows of the matrix. (The list of results in from() then
        defines the columns of the matrix.)

        rhs(rhs_varlist) specifies a list of variables to appear on the
        right-hand side of the variable list supplied to command. These will
        appear on the rows of the matrix. (The list of results in from() then
        defines the columns of the matrix.)

        vector specifies that the results are to be compiled as a single
        vector. This option is necessary when, and only when, (1) there is a
        single result in from(); (2) there is no cols(), lhs() or rhs(); (3)
        command is a univariate command.  Without vector, makematrix would
        attempt to treat command as a bivariate command and carry out
        calculations for all pairs of variables.

        Only one of cols(), lhs(), rhs() and vector may be specified.

        listwise specifies that the results of command should be determined for
        as many observations as possible.  Note that as a consequence the
        number of observations used in each calculation may differ. By
        default casewise deletion is used to ensure consistency in
        observations selected.
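
        In the auto data rep78 is the only variable with missing values (5
        of 74 observations), which gives a way to see the difference. As a
        sketch, assuming that r(N) as left behind by spearman is an
        acceptable entry in from():

        . makematrix, from(r(N) r(rho)) rhs(rep78-foreign) : spearman mpg
        . makematrix, from(r(N) r(rho)) rhs(rep78-foreign) listwise :
            spearman mpg

        Going by the description above, the first call should report the
        same number of observations in every row, while the second should
        report more observations for rows that do not involve rep78.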

    list_options control the presentation of the matrix.

        list specifies that the list command be used to present the matrix.

        label specifies that an attempt be made to show variable labels
        wherever variables specify the rows of the matrix. Unless the list
        option is also specified, variable labels longer than 32 characters
        will not be shown and periods which matrix list does not construe as
        time series operators will be suppressed.

        format(format) specifies the format to be used for columns with list.
        As an extension of standard format options, multiple formats may be
        specified, one for each column.  For example, format(%3.2f %4.3f
        %5.4f) specifies that the columns of the matrix have the specified
        formats.

        dp(# [ # [ ... ] ]) is an alternative to format() and specifies the
        number of decimal places to be shown with list. Note that, for
        example, dp(2 3 4) is equivalent to format(%3.2f %4.3f %5.4f).

        rightjustify specifies that row names are to be shown right-justified
        with list. The default is to present them left-justified.

        Other options may be specified that are options of list if the list
        option is specified or of matrix list otherwise.
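
        For example, going by the equivalence between dp() and format()
        stated above, these two calls should produce the same listing:

        . makematrix, from(r(mean) r(sd) r(skewness)) list dp(1 1 3) sep(0) :
            su head trunk length displacement weight, detail
        . makematrix, from(r(mean) r(sd) r(skewness)) list format(%2.1f %2.1f
            %4.3f) sep(0) : su head trunk length displacement weight, detail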


Examples 

    . sysuse auto, clear

    . makematrix, from(r(rho)) : spearman head trunk length displacement
        weight
    . makematrix, from(r(rho)) format(%4.3f) : spearman head trunk length
        displacement weight

    . pca head trunk length displacement weight
    . score score1-score5
    . makematrix, from(r(rho)) cols(score?) : correlate head trunk length
        displacement weight
    . makematrix R, from(r(rho)) cols(score?) : correlate head trunk length
        displacement weight
    . matrix colnames R = "score 1" "score 2" "score 3" "score 4" "score 5"
    . matrix li R, format(%4.3f)

    . makematrix , from(r(rho) r(p)) label cols(price) : spearman mpg-foreign
    . makematrix , from(r(rho) r(p)) list label format(%4.3f %6.5f) sep(0)
        cols(price) : spearman mpg-foreign

    . makematrix, from(r(mean) r(sd) r(skewness)) : su head trunk length
        displacement weight, detail
    . makematrix, from(r(mean) r(sd) r(skewness)) list format(%2.1f %2.1f
        %4.3f) sep(0) : su head trunk length displacement weight, detail

    . makematrix, from(r(rho) r(p)) lhs(rep78-foreign) : spearman mpg
    . makematrix, from(r(rho) r(p)) rhs(rep78-foreign) : spearman mpg
    . makematrix, from(e(r2) e(rmse) _b[_cons] _b[mpg]) lhs(rep78-foreign) :
        regress mpg
    . makematrix, from(e(r2) e(rmse) _b[_cons] _b[mpg]) lhs(rep78-foreign)
        list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg
    . makematrix, from(e(r2) e(rmse) _b[_cons] _b) rhs(rep78-foreign) :
        regress mpg
    . makematrix, from(e(r2) e(rmse) _b[_cons] _b) rhs(rep78-foreign) list
        dp(3 2 2 3) abb(9) sep(0) divider : regress mpg

    . gen weightsq = weight^2
    . makematrix, from(e(r2) e(rmse)) lhs(mpg-trunk length-foreign) : regress
        weight weightsq
    . makematrix, from(e(r2) e(rmse)) lhs(mpg-trunk length-foreign) list dp(3
        2) sep(0) divider : regress weight weightsq

    . makematrix, from(r(sum)) vector: su head trunk length displacement
        weight, meanonly


Author

    Nicholas J. Cox, Durham University, U.K.
    n.j.cox@durham.ac.uk


Acknowledgments

    Ken Higbee made valuable comments on this help file.  Eric Uslaner
    pointed towards a bug. Hervé Stolowy pointed towards another bug, and
    Alan Riley explained it. Lisa Gilmore alerted me to a problem with
    handling double quotes.


Also see

    On-line:  help for list, matrix list; statsby, tabstat