{smcl}
{hline}
{hi:help xtbalance2}{right: v. 1.0 - 29 January 2021}
{hline}
{title:Title}

{p 4 4}{cmd:xtbalance2} - Create a balanced subsample from unbalanced panel data.{p_end}

{title:Syntax}

{p 4 13}{cmd:xtbalance2} [{varlist}] {ifin} {cmd:,}
{cmdab:gen:erate(}{it:newvarname}{cmd:)}
[{cmdab:o:ptimisation(}{it:T|N|NT}{cmd:)}]{p_end}

{p 4 4}The data must be {cmd:xtset} before using {cmd:xtbalance2}; see {help xtset}.
{it:varlist} may contain time-series operators; see {help tsvarlist}.
If {it:varlist} is omitted, {cmd:xtbalance2} uses all variables in the dataset.{p_end}

{title:Contents}

{p 4}{help xtbalance2##description:Description}{p_end}
{p 4}{help xtbalance2##options:Options}{p_end}
{p 4}{help xtbalance2##saved_vales:Saved Values}{p_end}
{p 4}{help xtbalance2##examples:Examples}{p_end}
{p 4}{help xtbalance2##about:About}{p_end}

{marker description}{title:Description}

{p 4 8}{cmd:xtbalance2} creates a balanced subsample from an unbalanced dataset.
{cmd:xtbalance2} does not drop any observations; instead, it creates a variable indicating whether
an observation (row) is part of the balanced subsample.{p_end}

{p 4 8}{cmd:xtbalance2} maximises either the number of cross-sectional units (ids),
the number of time periods, or the total number of observations.
To do so, it uses a simple algorithm that finds the largest subsquare in a matrix.
The maximisation problem can have more than one solution.
In that case, {cmd:xtbalance2} creates an indicator variable for each solution.{p_end}
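
{p 4 8}Because no observations are dropped, the indicator can be inspected before the sample is restricted.
The following is a minimal sketch, assuming the data are already {cmd:xtset} and that a single solution is found;
the variable name {cmd:inbal} is a placeholder chosen for illustration:{p_end}

{p 8}{stata xtbalance2 , generate(inbal)}{p_end}
{p 8}{stata count if inbal == 1}{p_end}
{p 8}{stata xtdescribe if inbal == 1}{p_end}

{p 4 8}If only the balanced subsample is needed afterwards, {cmd:keep if inbal == 1} would restrict the data in memory to it.{p_end}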

{marker options}{title:Options}

{p 4 8}{cmdab:gen:erate(}{it:newvarname}{cmd:)} Name of the indicator variable.{p_end}

{p 4 8}{cmdab:o:ptimisation(}{it:T|N|NT}{cmd:)} specifies which dimension is maximised. The default, {it:T}, maximises the length of the time series.
{it:N} maximises the number of cross-sectional units, and {it:NT} uses the maximum number of observations
obtained from {it:N} and {it:T}.{p_end}

{marker saved_vales}{title:Saved Values}

{col 4} Scalars
{col 8}{cmd: r(NumMax)}{col 27} Number of solutions that attain the maximum for the chosen criterion (N, T or NT). A value larger than 1 means the solution is not unique, which can cause problems.
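
{p 4 8}A minimal sketch of checking for a unique solution directly after a call.
The variable name {cmd:inbalN} is a placeholder, and the check assumes that no other r-class command
has overwritten {cmd:r(NumMax)} in between:{p_end}

{p 8}{stata xtbalance2 , generate(inbalN) optimisation(N)}{p_end}
{p 8}{stata display r(NumMax)}{p_end}
{p 8}{stata assert r(NumMax) == 1}{p_end}

{p 4 8}Here {cmd:assert} stops with an error if more than one solution was found, which may be useful in do-files.{p_end}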

{marker examples}{title:Examples}

{p 8}{stata "use http://www.stata-journal.com/software/sj12-1/st0246/manu_prod, clear"}{p_end}

{p 4 8}To maximise the number of cross-sectional units (N_g) in the balanced subsample:{p_end}

{p 8}{stata xtbalance2 , generate(balanceN) optimisation(N) }{p_end}

{p 4 8}{cmd:xtbalance2} creates a new variable called {cmd:balanceN}, which takes the value 1
if the observation is included in the new balanced panel and 0 otherwise.{p_end}

{p 4 8}To do the same but maximise the number of time periods instead:{p_end}

{p 8}{stata xtbalance2 , generate(balanceT) optimisation(T) }{p_end}
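
{p 4 8}The third criterion, {cmd:optimisation(NT)}, follows the same pattern.
The variable name {cmd:balanceNT} below is only an illustrative choice:{p_end}

{p 8}{stata xtbalance2 , generate(balanceNT) optimisation(NT) }{p_end}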

{p 4 8}To restrict optimisation to a specific set of variables:{p_end}

{p 8}{stata xtbalance2 lO lL lY, generate(balanceT2) optimisation(T) }{p_end}
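
{p 4 8}Since {it:varlist} may contain time-series operators, a lagged variable can be included as well.
This is a sketch only; the variable name {cmd:balanceL} is hypothetical:{p_end}

{p 8}{stata xtbalance2 lY L.lY, generate(balanceL) optimisation(T) }{p_end}

{p 4 8}The indicators can then be compared, assuming each earlier call returned a single solution
so that the variables carry exactly the requested names:{p_end}

{p 8}{stata tabulate balanceN balanceT}{p_end}
{p 8}{stata xtdescribe if balanceT2 == 1}{p_end}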


{marker about}{title:Author}

{p 4}Jan Ditzen (Free University of Bozen-Bolzano){p_end}
{p 4}Email: {browse "mailto:jan.ditzen@unibz.it":jan.ditzen@unibz.it}{p_end}
{p 4}Web: {browse "www.jan.ditzen.net":www.jan.ditzen.net}{p_end}

{p 4 8}{cmd:xtbalance2} was created for use in {help xtcse2}.
All remaining errors are my own.{p_end}

{p 4 8}Please cite as follows:{break}
Ditzen, J. 2021. xtbalance2:  Create a balanced subsample from unbalanced panel data.
{p_end}

{p 4 8}The latest version can be obtained via {stata "net from https://github.com/JanDitzen/xtbalance2"}.{p_end}

{marker ChangLog}{title:Changelog}
{p 4 8}This version: 1.0 - 29 January 2021{p_end}

{title:Also see}
{p 4 4}{help xtset}{p_end}