{smcl}
{hline}
{cmd:help groupfunction}{right:v2.2 28/Jan/2021}
{hline}
{title:Title}
{p2colset 5 24 26 2}{...}
{p2col :{cmd:groupfunction} {hline 1}}Replaces several basic {cmd:collapse} functions (mean, sum, variance, standard deviation, first, max, min). The command is several orders of magnitude faster than {cmd:collapse} when summarizing multiple variables on larger datasets.{p_end}
{p2colreset}{...}
{title:Syntax}
{p 8 23 2}
{cmd:groupfunction} [{it:if}] [{it:in}] [{it:aw pw fw}] {cmd:,}
{opt by(varlist)}
{opt mean(varlist)}
{opt sum(varlist)}
{opt rawsum(varlist)}
{opt first(varlist)}
{opt min(varlist)}
{opt max(varlist)}
{opt count(varlist)}
{opt sd(varlist)}
{opt variance(varlist)}
{opt gini(varlist)}
{opt theil(varlist)}
{opt xtile(varlist)}
{opt nq(int)}
{opt missing}
{opt norestore}
{opt slow}
{opt merge}
{title:Description}
{pstd}
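{cmd:groupfunction} replaces several {cmd:collapse} functions (mean, sum, rawsum, first, min, max, count, standard deviation, variance) and also computes Gini and Theil measures by group. The command is several orders of magnitude faster than {cmd:collapse} when summarizing multiple variables on larger datasets. Unless {opt merge} is specified, the data are collapsed to one observation per group.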
{title:Options}
{phang}
{opt by(varlist)} Variables defining the groups for which the statistics are reported.
{phang}
{opt xtile(varlist)} Used together with {opt nq()}, creates a variable holding the quantile category of each observation and adds it to the {opt by()} grouping.
{phang}
{opt nq(int)} Number of quantiles. Only has an effect when {opt xtile()} is specified.
{phang}
{opt mean(varlist)} Calculates means of specified variables.
{phang}
{opt sum(varlist)} Calculates the sum of the specified variables. If weights are specified, it returns the population-expanded (weighted) total.
{phang}
{opt rawsum(varlist)} Calculates the sum of the specified variables, ignoring any weights that are specified.
{phang}
{opt first(varlist)} Returns the first observation of the specified variables within each group defined in {opt by()}.
{phang}
{opt min(varlist)} Returns the minimum value, by group, of the specified variables.
{phang}
{opt max(varlist)} Returns the maximum value, by group, of the specified variables.
{phang}
{opt count(varlist)} Returns the observation count, by group, of the specified variables.
{phang}
{opt sd(varlist)} Calculates standard deviations of specified variables.
{phang}
{opt variance(varlist)} Calculates variance of specified variables.
{phang}
{opt gini(varlist)} Calculates Gini coefficient of specified variables.
{phang}
{opt theil(varlist)} Calculates the Theil index of specified variables.
{phang}
{opt missing} Only relevant for {opt sum()} and {opt rawsum()}. If the variable is missing for every observation in a {opt by()} group, the result for that group is missing instead of zero.
{phang}
{opt norestore} Drops all non-relevant variables before calculations to improve memory management.
{phang}
{opt slow} Use this option if you run into memory issues; values are then obtained one by one.
{phang}
{opt merge} Requests that the data not be collapsed; the new group-level variables are instead merged into the dataset in memory.
{title:Example}
{p 8 12}{stata "sysuse auto, clear" :. sysuse auto, clear}{p_end}
{p 8 12}{stata "groupfunction [aw=weight], mean(price) min(weight) by(foreign)" :. groupfunction [aw=weight], mean(price) min(weight) by(foreign)}{p_end}
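{pstd}
A further sketch contrasting {opt sum()} and {opt rawsum()}. It assumes the options behave as described under Options, so that with weights {opt sum()} returns the weighted total and {opt rawsum()} the unweighted one. {cmd:price_raw} is a hypothetical copy of {cmd:price}, created on the cautious assumption that each variable should appear in only one statistic option per call.{p_end}
{p 8 12}{stata "sysuse auto, clear" :. sysuse auto, clear}{p_end}
{p 8 12}{stata "clonevar price_raw = price" :. clonevar price_raw = price}{p_end}
{p 8 12}{stata "groupfunction [aw=weight], sum(price) rawsum(price_raw) by(foreign)" :. groupfunction [aw=weight], sum(price) rawsum(price_raw) by(foreign)}{p_end}
{p 8 12}{stata "list" :. list}{p_end}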
{title:Example 2 (Time comparisons)}
clear all
set more off
set obs 300000
version 13
set seed 458267
gen region = int(runiform()*20)
replace region = 1 if inrange(region,0,3)
replace region = 2 if inrange(region,4,5)
//Income per capita
forval z=1/200{
gen x_`z' = region*runiform()*4000 + rnormal()*200
}
forval z=1/300{
gen y`z' = runiform()*100 + (rnormal()*20)^2
}
gen weight = abs(rnormal())
//time collapse
preserve
timer on 1
collapse (mean) y* x_* [aw=weight], by(region)
timer off 1
restore
//time fcollapse (requires the user-written ftools package; run here without weights)
preserve
timer on 2
fcollapse (mean) y* x_*, by(region)
timer off 2
restore
//time groupfunction
preserve
timer on 3
groupfunction [aw=weight], mean(y* x_*) by(region)
timer off 3
restore
timer list
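{title:Example 3 (Quantile groups)}
{pstd}
A minimal sketch of {opt xtile()} with {opt nq()}, assuming the behavior described under Options: a quartile variable for {cmd:mpg} is created and added to the {opt by()} grouping, so the mean of {cmd:price} would be reported for each combination of {cmd:foreign} and {cmd:mpg} quartile. The choice of variables is illustrative only.{p_end}
{p 8 12}{stata "sysuse auto, clear" :. sysuse auto, clear}{p_end}
{p 8 12}{stata "groupfunction [aw=weight], mean(price) xtile(mpg) nq(4) by(foreign)" :. groupfunction [aw=weight], mean(price) xtile(mpg) nq(4) by(foreign)}{p_end}
{p 8 12}{stata "list" :. list}{p_end}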
{title:Authors}
{pstd}
Paul Corral{break}
The World Bank - Poverty and Equity Global Practice {break}
Washington, DC{break}
pcorralrodas@worldbank.org{p_end}
{pstd}
Minh C. Nguyen{break}
The World Bank - Poverty and Equity Global Practice {break}
Washington, DC{break}
pcorralrodas@worldbank.org{p_end}
{pstd}
Joao Pedro Azevedo{break}
The World Bank - Poverty and Equity Global Practice {break}
Washington, DC{break}
jazevedo@worldbank.org{p_end}
{pstd}
Raul Andrés Castañeda provided valuable suggestions for this command.{p_end}