Title
mm_collapse() -- Make matrix of summary statistics by subgroups
Syntax
real matrix mm_collapse(X, w, id [, f, ...])
real matrix _mm_collapse(X, w, id [, f, ...])
where
          X:  real matrix containing data (rows are observations, columns are variables)
          w:  real colvector containing weights or 1
         id:  real colvector containing subgroup ID variable
          f:  pointer scalar containing address of the function to be used, i.e. f = &functionname(); the default function is mean()
        ...:  up to 10 optional arguments to pass through to f
Description
mm_collapse() returns a matrix of summary statistics by subgroups. It is similar to Stata's collapse.
X provides the data. Rows are observations and columns are variables. Summary statistics are computed for each variable.
w specifies weights associated with the observations (rows) in X. Specify w as 1 to obtain unweighted results.
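The following sketch illustrates how weights enter the default statistic by computing a weighted mean by hand and with mean(). The two data values and weights are assumed, illustrative numbers (chosen to resemble the two rep78 == 1 observations in the auto data); they are not part of mm_collapse() itself.

```stata
mata:
// Assumed illustrative data: two observations and their weights
x = (4195 \ 4934)
w = (2730 \ 3470)

// Weighted mean by hand: sum of w*x divided by sum of w
byhand = sum(w :* x) / sum(w)

// Same statistic via mean(), the default function of mm_collapse()
viamean = mean(x, w)

// Display both results side by side
byhand, viamean
end
```

Both values come out at approximately 4608.60.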
id specifies the subgroup identification numbers associated with the observations (rows) in X. Each distinct value in id defines a subgroup or panel for which to compute the summary statistics.
The default is to compute arithmetic means using the mean() function. Alternatively, specify f, where f is a pointer to a function, i.e. f = &functionname() (see [M-2] ftof). For example, specify &variance() to compute variances. f is assumed to return a real scalar and take a data column vector as first argument and weights as second argument.
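As a minimal sketch of a user-defined statistic, the function below takes the data column as first argument, the weights as second argument, and one extra argument that is passed through by mm_collapse(). The function name my_share() and the cutoff 5000 are hypothetical and chosen only for illustration; the sketch assumes that moremata is installed.

```stata
sysuse auto, clear
mata:
// Hypothetical statistic: weighted share of observations with x above c.
// x = data column, w = weights (n x 1 or 1 x 1), c = pass-through argument.
real scalar my_share(real colvector x, real colvector w, real scalar c)
{
    return(mean(x :> c, w))
}

X  = st_data(., "price")
ID = st_data(., "rep78")
w  = st_data(., "weight")

// Weighted share of cars priced above 5000, by rep78
S = mm_collapse(X, w, ID, &my_share(), 5000)
S
end
```

The extra argument follows f in the call, in the same way that .25 follows &mm_quantile() in the examples under Remarks.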
_mm_collapse() is analogous to mm_collapse() but assumes that X, w, and id are already sorted by id. _mm_collapse() is faster and uses less memory than mm_collapse().
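The sketch below shows one way to bring the inputs into the required order before calling _mm_collapse(), using Mata's order() to obtain a sort permutation. The variable names p and S are illustrative, and the sketch assumes that moremata is installed.

```stata
sysuse auto, clear
mata:
X  = st_data(., ("price", "turn"))
ID = st_data(., "rep78")
w  = st_data(., "weight")

// Permutation vector that sorts the observations by ID
p = order(ID, 1)

// Apply the same permutation to data, weights, and ID
S = _mm_collapse(X[p, .], w[p], ID[p])
S
end
```

If the data are already sorted by the ID variable in Stata (for example after sort rep78), the permutation step can be skipped.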
The matrix returned by mm_collapse() or _mm_collapse() contains the subgroup codes in the first column; the second and following columns, one for each variable in X, contain the computed statistics.
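To illustrate the layout of the returned matrix, the sketch below separates the subgroup codes from the statistics and drops the row belonging to a missing subgroup code. The names S, codes, stats, and Snomiss are illustrative; the sketch assumes that moremata is installed.

```stata
sysuse auto, clear
mata:
X  = st_data(., ("price", "turn"))
ID = st_data(., "rep78")

S = mm_collapse(X, 1, ID)

// First column: subgroup codes
codes = S[., 1]

// Remaining columns: one column of statistics per variable in X
stats = S[., (2..cols(S))]

// Keep only rows whose subgroup code is nonmissing
Snomiss = select(S, S[., 1] :< .)
Snomiss
end
```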
Remarks
Examples:
    . sysuse auto
    (1978 Automobile Data)
    . preserve
    . collapse (mean) price turn, by(rep78)
    . list

         +---------------------------+
         | rep78     price      turn |
         |---------------------------|
      1. |     1   4,564.5        41 |
      2. |     2   5,967.6    43.375 |
      3. |     3   6,429.2   41.0667 |
      4. |     4   6,071.5      38.5 |
      5. |     5     5,913   35.6364 |
         |---------------------------|
      6. |     .   6,430.4      37.6 |
         +---------------------------+
    . restore
    . mata: X = st_data(., ("price", "turn"))
    . mata: ID = st_data(., "rep78")
    . mata: mm_collapse(X, 1, ID)
                     1             2             3
        +-------------------------------------------+
      1 |            1        4564.5            41  |
      2 |            2      5967.625        43.375  |
      3 |            3   6429.233333   41.06666667  |
      4 |            4        6071.5          38.5  |
      5 |            5          5913   35.63636364  |
      6 |            .        6430.4          37.6  |
        +-------------------------------------------+
    . mata: w = st_data(., "weight")
    . mata: mm_collapse(X, w, ID)
                     1             2             3
        +-------------------------------------------+
      1 |            1   4608.601613   41.11935484  |
      2 |            2   6230.200895   43.59932911  |
      3 |            3    7003.15439   41.80276852  |
      4 |            4    6240.01355   39.81842818  |
      5 |            5   6287.482192   35.74011742  |
      6 |            .   6736.482783   38.01827126  |
        +-------------------------------------------+
    . mata: mm_collapse(X, w, ID, &mm_median())
              1      2      3
        +----------------------+
      1 |     1   4934     42  |
      2 |     2   5104     44  |
      3 |     3   4816     42  |
      4 |     4   5798     42  |
      5 |     5   5719     36  |
      6 |     .   4453     38  |
        +----------------------+
    . mata: mm_collapse(X, w, ID, &mm_quantile(), .25)
              1      2      3
        +----------------------+
      1 |     1   4195     40  |
      2 |     2   4060     41  |
      3 |     3   4482     40  |
      4 |     4   4890     35  |
      5 |     5   4425     35  |
      6 |     .   4424     35  |
        +----------------------+
Conformability
mm_collapse(X, w, id, f, ...), _mm_collapse(X, w, id, f, ...):
          X:  n x k
          w:  n x 1 or 1 x 1
         id:  n x 1
          f:  1 x 1
        ...:  (depending on f)
     result:  g x (1 + k), where g is the number of subgroups
Diagnostics
mm_collapse() and _mm_collapse() cannot be used with built-in functions directly. To apply a built-in function, wrap it in a user-defined function and pass a pointer to the wrapper.
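A wrapper of this kind might look as follows. The sketch assumes that max() is one of Mata's built-in functions; the wrapper name my_max() is hypothetical, and the weights argument is accepted only to match the expected signature and is otherwise ignored. moremata is assumed to be installed.

```stata
sysuse auto, clear
mata:
// Hypothetical wrapper: exposes max() with the (data, weights) signature
// expected by mm_collapse(); the weights w are not used.
real scalar my_max(real colvector x, real colvector w)
{
    return(max(x))
}

X  = st_data(., "price")
ID = st_data(., "rep78")

// Maximum price by rep78
S = mm_collapse(X, 1, ID, &my_max())
S
end
```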
mm_collapse() and _mm_collapse() return J(0, 1 + cols(X), .) if X and id are void.
Source code
mm_collapse.mata
Author
Ben Jann, ETH Zurich, jannb@ethz.ch
Also see