{smcl} {* 18feb2009}{...} {cmd:help mata mm_collapse()} {hline} {title:Title} {pstd} {bf:mm_collapse() -- Make matrix of summary statistics by subgroups} {title:Syntax} {p 11 18 2} {it:real matrix} {cmd:mm_collapse(}{it:X}{cmd:,} {it:w}{cmd:,} {it:id} [{cmd:,} {it:f}{cmd:,} {it:...}]{cmd:)} {p 11 18 2} {it:real matrix} {cmd:_mm_collapse(}{it:X}{cmd:,} {it:w}{cmd:,} {it:id} [{cmd:,} {it:f}{cmd:,} {it:...}]{cmd:)} {p 4 4 2} where {p 12 18 2} {it:X}: {it:real matrix} containing data (rows are observations, columns are variables) {p_end} {p 12 18 2} {it:w}: {it:real colvector} containing weights or 1 {p_end} {p 11 18 2} {it:id}: {it:real colvector} containing subgroup ID variable {p_end} {p 12 18 2} {it:f}: {it:pointer scalar} containing address of the function to be used, i.e. {it:f} = {cmd:&}{it:functionname}{cmd:()}; the default function is {helpb mf_mean:mean()} {p_end} {p 10 18 2} {it:...}: up to 10 optional arguments to pass through to {it:f} {p_end} {title:Description} {pstd} {cmd:mm_collapse()} returns a matrix of summary statistics by subgroups. It is similar to Stata's {helpb collapse}. {pstd}{it:X} provides the data. Rows are observations and columns are variables. Summary statistics are computed for each variable. {pstd} {it:w} specifies weights associated with the observations (rows) in {it:X}. Specify {it:w} as 1 to obtain unweighted results. {pstd} {it:id} specifies the subgroup identification numbers associated with the observations (rows) in {it:X}. Each distinct value in {it:id} defines a subgroup or panel for which to compute the summary statistics. {pstd} The default is to compute arithmetic means using the {helpb mf_mean:mean()} function. Alternatively, specify {it:f}, where {it:f} is a pointer to a function, i.e. {bind:{it:f} = {cmd:&}{it:functionname}{cmd:()}} (see {helpb m2_ftof:[M-2] ftof}). For example, specify {cmd:&variance()} to compute variances. {it:f} is assumed to return a real scalar and take a data column vector as first argument and weights as second argument. {pstd}{cmd:_mm_collapse()} is analogous to {cmd:mm_collapse()} but but assumes {it:X}, {it:w}, and {it:id} to be sorted by {it:id}. {cmd:_mm_collapse()} is faster and uses less memory than {cmd:mm_collapse()}. {pstd}The matrix returned by {cmd:mm_collapse()} or {cmd:_mm_collapse()} contains the subgroup codes in the first column; the second and following columns, one for each variable in {it:X}, contain the computed statistics. {title:Remarks} {pstd}Examples: {com}. sysuse auto {txt}(1978 Automobile Data) {com}. preserve {txt} {com}. collapse (mean) price turn, by(rep78) {txt} {com}. list {txt} {c TLC}{hline 7}{c -}{hline 9}{c -}{hline 9}{c TRC} {c |} {res}rep78 price turn {txt}{c |} {c LT}{hline 7}{c -}{hline 9}{c -}{hline 9}{c RT} 1. {c |} {res} 1 4,564.5 41 {txt}{c |} 2. {c |} {res} 2 5,967.6 43.375 {txt}{c |} 3. {c |} {res} 3 6,429.2 41.0667 {txt}{c |} 4. {c |} {res} 4 6,071.5 38.5 {txt}{c |} 5. {c |} {res} 5 5,913 35.6364 {txt}{c |} {c LT}{hline 7}{c -}{hline 9}{c -}{hline 9}{c RT} 6. {c |} {res} . 6,430.4 37.6 {txt}{c |} {c BLC}{hline 7}{c -}{hline 9}{c -}{hline 9}{c BRC} {com}. restore {txt} {com}. mata: X = st_data(., ("price", "turn")) {res}{txt} {com}. mata: ID = st_data(., "rep78") {res}{txt} {com}. mata: mm_collapse(X, 1, ID) {res} {txt} 1 2 3 {c TLC}{hline 43}{c TRC} 1 {c |} {res} 1 4564.5 41{txt} {c |} 2 {c |} {res} 2 5967.625 43.375{txt} {c |} 3 {c |} {res} 3 6429.233333 41.06666667{txt} {c |} 4 {c |} {res} 4 6071.5 38.5{txt} {c |} 5 {c |} {res} 5 5913 35.63636364{txt} {c |} 6 {c |} {res} . 6430.4 37.6{txt} {c |} {c BLC}{hline 43}{c BRC} {com}. mata: w = st_data(., "weight") {res}{txt} {com}. mata: mm_collapse(X, w, ID) {res} {txt} 1 2 3 {c TLC}{hline 43}{c TRC} 1 {c |} {res} 1 4608.601613 41.11935484{txt} {c |} 2 {c |} {res} 2 6230.200895 43.59932911{txt} {c |} 3 {c |} {res} 3 7003.15439 41.80276852{txt} {c |} 4 {c |} {res} 4 6240.01355 39.81842818{txt} {c |} 5 {c |} {res} 5 6287.482192 35.74011742{txt} {c |} 6 {c |} {res} . 6736.482783 38.01827126{txt} {c |} {c BLC}{hline 43}{c BRC} {com}. mata: mm_collapse(X, w, ID, &mm_median()) {res} {txt} 1 2 3 {c TLC}{hline 22}{c TRC} 1 {c |} {res} 1 4934 42{txt} {c |} 2 {c |} {res} 2 5104 44{txt} {c |} 3 {c |} {res} 3 4816 42{txt} {c |} 4 {c |} {res} 4 5798 42{txt} {c |} 5 {c |} {res} 5 5719 36{txt} {c |} 6 {c |} {res} . 4453 38{txt} {c |} {c BLC}{hline 22}{c BRC} {com}. mata: mm_collapse(X, w, ID, &mm_quantile(), .25) {res} {txt} 1 2 3 {c TLC}{hline 22}{c TRC} 1 {c |} {res} 1 4195 40{txt} {c |} 2 {c |} {res} 2 4060 41{txt} {c |} 3 {c |} {res} 3 4482 40{txt} {c |} 4 {c |} {res} 4 4890 35{txt} {c |} 5 {c |} {res} 5 4425 35{txt} {c |} 6 {c |} {res} . 4424 35{txt} {c |} {c BLC}{hline 22}{c BRC} {txt} {title:Conformability} {pstd} {space 1}{cmd:mm_collapse(}{it:X}{cmd:,} {it:w}{cmd:,} {it:id}{cmd:,} {it:f}{cmd:,} {it:...}{cmd:)}, {p_end} {pstd} {cmd:_mm_collapse(}{it:X}{cmd:,} {it:w}{cmd:,} {it:id}{cmd:,} {it:f}{cmd:,} {it:...}{cmd:)}, {p_end} {it:X}: {it:n x k} {it:w}: {it:n x} 1 or 1 {it:x} 1 {it:id}: {it:n x} 1 {it:f}: 1 {it:x} 1 {it:...}: (depending on {it:f}) {it:result}: {it:g x }(1 + {it:k}), where {it:g} is the number of subgroups {title:Diagnostics} {pstd}{cmd:mm_collapse()} and {cmd:_mm_collapse()} cannot be used with built-in functions (use wrappers). {pstd}{cmd:mm_collapse()} and {cmd:_mm_collapse()} return {cmd:J(0, 1 + cols(}{it:X}{cmd:), .)} if {it:X} and {it:id} are void. {title:Source code} {pstd} {help moremata_source##mm_collapse:mm_collapse.mata} {title:Author} {pstd} Ben Jann, University of Bern, jann@soz.unibe.ch {title:Also see} {psee} Online: help for {bf:{help collapse}}, {bf:{help mf_panelsetup:[M-5] panelsetup()}}, {bf:{help moremata}}