{smcl} {* *! version 0.1.0 14Mar2022}{...} {viewerdialog gstats_hdfe "dialog gstats_hdfe"}{...} {vieweralsosee "[R] gstats_hdfe" "mansection R gstats_hdfe"}{...} {viewerjumpto "Syntax" "gstats_hdfe##syntax"}{...} {viewerjumpto "Description" "gstats_hdfe##description"}{...} {viewerjumpto "Statistics" "gstats_hdfe##statistics"}{...} {title:Title} {p2colset 5 20 23 2}{...} {p2col :{cmd:gstats hdfe} {hline 2}} Absorb HDFE (residualize variables) {p_end} {p2colreset}{...} {pstd} {it:Important}: Please run {stata gtools, upgrade} to update {cmd:gtools} to the latest stable version. {pstd} {it:Warning}: {opt gstats hdfe} is in beta; see {help gstats hdfe##missing:missing features}. (To enable beta, define {cmd:global GTOOLS_BETA = 1}.) {marker syntax}{...} {title:Syntax} {p 8 17 2} {cmd:gstats hdfe} {varlist} {ifin} [{it:{help gstats hdfe##weight:weight}}] [ {cmd:,} {opth absorb(varlist)} {c -(}{opth gen(newvarlist)}{c |}{opt prefix(str)}{c |}{cmd:replace}{c )-} {it:{help gstats hdfe##table_options:options}} ] {pstd} If none of {cmd:gen()}, {cmd:prefix()}, or {cmd:replace} are specified then {it:target}{cmd:=}{it:source} syntax must be supplied instead of {varlist}: {p 8 17 2} {it:target_var}{cmd:=}{varname} [{it:target_var}{cmd:=}{varname} {it:...}] {pstd} {cmd:gstats hdfe} (alias {cmd:gstats residualize}) provides a fast way of absorbing high-dimensional fixed effects (HDFE). It saves the number of levels in each absorbed variable, accepts weights, and optionally takes {opt by()} as an argument (in this case ancillary information is not saved by default and must be accessed via {opt mata()}). Missing values in the source and absorb variables are skipped row-size (the latter can be optionally retained via {opt absorbmissing}). {synoptset 23 tabbed}{...} {marker table_options}{...} {synopthdr} {synoptline} {syntab :Specify Targets} {synopt:{opth pre:fix(str)}}Generate all variables with prefix (e.g. residualized {it:x} saved to {it:prefix_x}, etc). {p_end} {synopt:{opth gen:erate(newvarlist)}}List of targets; must specify one per source. {p_end} {synopt:{opt replace}}Replace variables as applicable. (If no targets are specified, this replaces the sources.) {p_end} {synopt:{opt wild:parse}}Allow rename-style syntax if {it:target}{cmd:=}{it:source} is specified (e.g. {it:x*}{cmd:=}{it:prefix_x*}). {p_end} {syntab :HDFE Options} {synopt:{opth by(varlist)}}Group by variables. {p_end} {synopt:{opt mata:save}[{cmd:(}{it:str}{cmd:)}]}Save {opt by()} info (and absorb info by group) in mata object (default name is {bf:GtoolsByLevels}) {p_end} {synopt:{opt absorbmi:ssing}}Treat missing absorb levels as a group instead of dropping them. {p_end} {synopt:{opth algorithm(str)}}Algorithm used to absorb HDFE: CG (conjugate gradient), MAP (alternating projections), SQUAREM (squared extrapolation), IT (Irons and Tuck). {p_end} {synopt:{opth maxiter(int)}}Maximum number of algorithm iterations (default 100,000). Pass {it:.} for unlimited iterations. {p_end} {synopt:{opth tol:erance(real)}}Convergence tolerance (default 1e-8). {p_end} {synopt:{opth trace:iter}}Trace algorithm iterations. {p_end} {synopt:{opth stan:dardize}}Standardize variables before algorithm. {p_end} {syntab:Gtools Options} {synopt :{opt compress}}Try to compress strL {cmd:by()} variables to str#. {p_end} {synopt :{opt forcestrl}}Skip binary {cmd:by()} variables check and force gtools to read strL {cmd:by()} variables. {p_end} {synopt :{opt v:erbose}}Print info during function execution. {p_end} {synopt :{opt bench}{it:[(int)]}}Benchmark various steps of the plugin. Optionally specify depth level. {p_end} {synopt :{opth hash:method(str)}}Hash method for {cmd:by()} variables (default, biject, or spooky). Intended for debugging. {p_end} {synopt :{opth oncollision(str)}}Collision handling (fallback or error). Intended for debugging. {p_end} {synoptline} {p2colreset}{...} {p 4 6 2} {marker weight}{...} {p 4 6 2} {opt aweight}s, {opt fweight}s, and {opt pweight}s are allowed (see {manhelp weight U:11.1.6 weight} for more on the way Stata uses weights). {marker description}{...} {title:Description} {pstd} {opt gstats hdfe} (alias {opt gstats residualize}) is designed as a utility to embed in programs that require absorbing high-dimensional fixed effects, optionally taking in weights. The number of non-missing observations and the number of levels in each absorb variable are returned (see {it:{help gstats hdfe##results:stored results}}). {pstd} Mainly as a side-effect of being a {cmd:gtools} program, {opt by()} is also allowed. In this case, the fixed effects are absorbed sepparately for each group defined by {opt by()}. Note in this case the number of non-missing observations and the number of absorb levels varies by group. This is {bf:NOT} saved by default. The user can optionally specify {opt mata:save}[{cmd:(}{it:str}{cmd:)}] to save information on the by levels, including the number of non-missing rows per level and the number of levels per absorb variable per level. {pstd} {opt mata:save}[{cmd:(}{it:str}{cmd:)}] by default is stored in {opt GtoolsByLevels} but the user may specify any name desired. Run {opt mata GtoolsByLevels.desc()} for details on the stored objects (also see {it:{help gstats hdfe##results:stored results}} below). {marker examples}{...} {title:Examples} {pstd} See the {browse "http://gtools.readthedocs.io/en/latest/usage/gstats_hdfe/index.html#examples":online documentation} for examples. {marker results}{...} {title:Stored results} {pstd} {cmd:gstats hdfe} stores the following in {cmd:r()}: {synoptset 15 tabbed}{...} {p2col 5 20 24 2: Macros}{p_end} {synopt:{cmd:r(algorithm)}} algorithm used for HDFE absorption{p_end} {p2colreset}{...} {synoptset 15 tabbed}{...} {p2col 5 20 24 2: Scalars}{p_end} {synopt:{cmd:r(N) }} number of non-missing observations {p_end} {synopt:{cmd:r(J) }} number of {opt by()} groups {p_end} {synopt:{cmd:r(minJ) }} largest {opt by()} group size {p_end} {synopt:{cmd:r(maxJ) }} smallest {opt by()} group size {p_end} {synopt:{cmd:r(iter) }} (without {opt by()}) iterations of absorption algorithm {p_end} {synopt:{cmd:r(feval)}} (without {opt by()}) function evaluations in absorption algorithm {p_end} {p2colreset}{...} {synoptset 15 tabbed}{...} {p2col 5 20 24 2: Matrices}{p_end} {synopt:{cmd:r(nabsorb)}} (without {opt by()}) vector with number of levels in each absorb variable{p_end} {p2colreset}{...} {pstd} With {opt mata:save}[{cmd:(}{it:str}{cmd:)}], the following data is stored in the mata object: string matrix nj non-missing observations in each -by- group string matrix njabsorb number of absorbed levels in each -by- group by each absorb variable real scalar anynum 1: any numeric by variables; 0: all string by variables real scalar anychar 1: any string by variables; 0: all numeric by variables string rowvector byvars by variable names real scalar kby number of by variables real scalar rowbytes number of bytes in one row of the internal by variable matrix real scalar J number of levels real matrix numx numeric by variables string matrix charx string by variables real scalar knum number of numeric by variables real scalar kchar number of string by variables real rowvector lens > 0: length of string by variables; <= 0: internal code for numeric variables real rowvector map map from index to numx and charx real rowvector charpos position of kth character variable string matrix printed formatted (printf-ed) variable levels (not with option -silent-) {marker missing}{...} {title:Missing Features} {pstd} Check whether it's mathematically OK to apply SQUAREM. In general it's meant for contractions but my understanding is that it can be applied to any monotonically convergent algorithm. {pstd} Improve convergence criterion. Current criterion may not be sensible. {marker author}{...} {title:Author} {pstd}Mauricio Caceres{p_end} {pstd}{browse "mailto:mauricio.caceres.bravo@gmail.com":mauricio.caceres.bravo@gmail.com }{p_end} {pstd}{browse "https://mcaceresb.github.io":mcaceresb.github.io}{p_end} {title:Website} {pstd}{cmd:gstats} is maintained as part of the {manhelp gtools R:gtools} project at {browse "https://github.com/mcaceresb/stata-gtools":github.com/mcaceresb/stata-gtools}{p_end} {marker acknowledgment}{...} {title:Acknowledgment} {pstd} {opt gtools} was largely inspired by Sergio Correia's {it:ftools}: {browse "https://github.com/sergiocorreia/ftools"}, and this specific function was inspired by Sergio Correia's {it:reghdfe}: {browse "https: //github.com/sergiocorreia/reghdfe"}. {p_end} {pstd} The OSX version of gtools was implemented with invaluable help from @fbelotti; see {browse "https://github.com/mcaceresb/stata-gtools/issues/11"}. {p_end} {marker references}{...} {title:References} {pstd} See {browse "http://gtools.readthedocs.io/en/latest/usage/gstats_hdfe/index.html#references":online documentation} for the list of references. {title:Also see} {pstd} help for {help gtools}