{smcl}
{* 3aug2004}{...}
{hline}
help for {hi:collapse2}
{hline}

{title:Augmented version of {cmd:collapse} to make dataset of means, medians, etc.}

{p 8 17 2}{cmd:collapse2}
{it:clist}
[{it:weight}]
[{cmd:if} {it:exp}]
[{cmd:in} {it:range}]
[{cmd:,}
{cmd:by(}{it:varlist}{cmd:)}
{cmd:cw}
{cmd:fast}]

{p 4 4 2}where {it:clist} is either

{p 8 17 2}[{cmd:(}{it:stat}{cmd:)}] {it:varlist} [ [{cmd:(}{it:stat}{cmd:)}] {it:...} ]{p_end}
{p 8 17 2}[{cmd:(}{it:stat}{cmd:)}] {it:target_var}{cmd:=}{it:varname} [{it:target_var}{cmd:=}{it:varname} {it:...}] [ [{cmd:(}{it:stat}{cmd:)}] {it:...}]

{p 4 4 2}or any combination of the {it:varlist} or {it:target_var} forms, and {it:stat} is one of

{p 8 20 2}{cmd:first}{space 7}first value{p_end}
{p 8 20 2}{cmd:last}{space 8}last value{p_end}
{p 8 20 2}{cmd:firstnm}{space 5}first nonmissing value{p_end}
{p 8 20 2}{cmd:lastnm}{space 6}last nonmissing value{p_end}
{p 8 20 2}{cmd:mean}{space 8}means (default){p_end}
{p 8 20 2}{cmd:sd}{space 10}standard deviations{p_end}
{p 8 20 2}{cmd:sum}{space 9}sums{p_end}
{p 8 20 2}{cmd:rawsum}{space 6}sums ignoring optionally specified weight{p_end}
{p 8 20 2}{cmd:count}{space 7}number of nonmissing observations{p_end}
{p 8 20 2}{cmd:max}{space 9}maximums{p_end}
{p 8 20 2}{cmd:min}{space 9}minimums{p_end}
{p 8 20 2}{cmd:median}{space 6}medians{p_end}
{p 8 20 2}{cmd:p1}{space 10}1st percentile{p_end}
{p 8 20 2}{cmd:p2}{space 10}2nd percentile{p_end}
{p 8 20 2}{it:...}{space 9}3rd -- 49th percentiles{p_end}
{p 8 20 2}{cmd:p50}{space 9}50th percentile (same as {cmd:median}){p_end}
{p 8 20 2}{it:...}{space 9}51st -- 97th percentiles{p_end}
{p 8 20 2}{cmd:p98}{space 9}98th percentile{p_end}
{p 8 20 2}{cmd:p99}{space 9}99th percentile{p_end}
{p 8 20 2}{cmd:iqr}{space 9}interquartile range{p_end}

{p 4 4 2}
If {it:stat} is not specified, {cmd:mean} is assumed.

{p 4 4 2}
{cmd:aweight}s, {cmd:fweight}s, {cmd:pweight}s, and {cmd:iweight}s are allowed;
see help {help weights} and the discussion of weights in help {help collapse}.
Statistic {cmd:sd} (standard deviation) is not allowed with {cmd:pweight}s.

{p 4 4 2}
{it:varlist} and {it:varname} in {it:clist} may contain time-series operators;
see help {help varlist}.


{title:Description}

{p 4 4 2}
{cmd:collapse2} is an extension of Stata's built-in {cmd:collapse} command,
which converts the data in memory into a dataset of means, sums, medians, etc.
It differs only in offering four additional statistics:
{cmd:first}, {cmd:last}, {cmd:firstnm}, and {cmd:lastnm}, listed above.
These are useful, for example, if you want to create a cross-sectional dataset
from annual data, with {it:initial} population as a variable.
You must {cmd:tsset} your data before using these additional statistics;
see help {help tsset}.

{p 4 4 2}For further details on the command, see help {help collapse}.


{title:Examples}

{p 4 8 2}{cmd:. collapse2 (first) startpop=pop (last) endpop=pop, by(country)}{p_end}

{p 4 8 2}{cmd:. collapse2 (firstnm) startknownpop=pop (lastnm) endknownpop=pop, by(country)}{p_end}

{p 4 8 2}{cmd:* Extract the first available observation of pop and the year it is for.}{p_end}
{p 4 8 2}{cmd:. gen popdatastart = year if pop < .}{p_end}
{p 4 8 2}{cmd:. collapse2 (firstnm) popdatastart startpop=pop, by(country)}{p_end}


{title:Author}

{p 4}David Roodman{p_end}
{p 4}Center for Global Development{p_end}
{p 4}Washington, DC{p_end}
{p 4}droodman@cgdev.org{p_end}


{title:Also see}

 Manual:  {hi:[R] collapse}

{p 4 13 2}
Online:  help for {help collapse}, {help contract}, {help egen}, {help statsby},
{help summarize}, {help tabdisp}, {help table}