{smcl}
{* *! Version 1.1.0 by Francisco Perales 04-February-2013}{...}
{bf:help mundlak}
{hline}


{title:Title}

    {bf:mundlak} - Estimates random-effects regressions adding group-means of independent variables to the model

{title:Syntax}

{p 8 12}{cmd:mundlak} {depvar} {indepvars} {ifin} [, {it:options}]


{it:options}            description
{hline}
Main
 {cmdab:u:se}({varlist})      adds group-means of selected independent variables only
 {cmdab:perc:entage}{it:(#})     sets the minimum percentage of a variable's total variance that must be within-group for its group-mean to be added
 {cmdab:nocomp:arison}      suppresses the display of a comparison random-effects model with no added variables
 {cmdab:hyb:rid}            transforms the independent variables into group-mean deviations
 {cmdab:f:ull}              prints the full output for the estimated models
 {cmdab:st:ats}{it:(list})       allows users to select the model statistics to be reported
 {cmdab:se}                reports standard errors for the coefficients on model variables
 {cmdab:t}                 reports t-values for the coefficients on model variables
 {cmdab:p}                 reports p-values for the coefficients on model variables
 {cmdab:k:eep}              keeps any variables created by the command in the dataset

{hline}


{title:Description}

{p 0 4}	{cmd:mundlak} estimates random-effects regression models ({cmd:xtreg, re}), adding the group-means
 of the variables in {indepvars} that vary within groups. This technique was proposed by Mundlak (1978)
 as a way to relax the assumption in the random-effects estimator that the observed variables are uncorrelated
 with the unobserved variables. Additionally, the statistical significance of the estimated coefficients
 on the group-means can be used to test whether this assumption holds for individual regressors. See also Chapter 10
 in Wooldridge (2010) and Chapter 11 in Greene (2011). {cmd:mundlak} requires the data to be {cmd:xtset}. If no
 variables vary within groups, {cmd:mundlak} estimates the standard random-effects model with no additional
 variables and displays a warning message. The names of the added group-mean variables begin with the prefix {cmd:mean__}
 followed by the original variable name. The estimates from both the standard random-effects
 model and the Mundlak model are stored in memory; they can be listed with {cmd:estimates dir}
 and used in subsequent commands.

	Original random-effects model:		Y{it:ij} = A + B{it:1}*X{it:ij} + B{it:2}*Z{it:i} + v{it:ij}

	Mundlak model:				Y{it:ij} = A + B{it:1}*X{it:ij} + B{it:2}*Z{it:i} + B{it:3}*X_bar{it:i} + v{it:ij}
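
{p 0 4}	As a rough illustration of what the Mundlak model involves, the commands below add one group-mean by hand
 and test its coefficient. This is only a sketch under stated assumptions, not the internal code of {cmd:mundlak}: it uses
 the {cmd:nlswork} data, treats {cmd:age} alone, and the names {cmd:touse} and {cmd:m_age} are hypothetical. Computing the
 mean over the complete-case sample is also an assumption; {cmd:mundlak} may handle missing values differently.{p_end}

{p 4 8}{inp:. webuse nlswork, clear}{p_end}

{p 4 8}{inp:. xtset idcode year}{p_end}

{p 4 8}{inp:. * touse and m_age are hypothetical names; mundlak itself uses the prefix mean__}{p_end}

{p 4 8}{inp:. generate byte touse = !missing(ln_wage, age, south, race)}{p_end}

{p 4 8}{inp:. bysort idcode: egen m_age = mean(age) if touse}{p_end}

{p 4 8}{inp:. xtreg ln_wage age south race m_age if touse, re}{p_end}

{p 4 8}{inp:. * Wald test of the coefficient on the added group-mean}{p_end}

{p 4 8}{inp:. test m_age}{p_end}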


{title:Options}
	
{p 0 4}{cmdab:u:se}({varlist}) specifies the variables for which group-means will be added in the model. The
 default is to use all the variables within the provided list of independent variables which vary within groups,
 unless such variation is insufficient. The variables specified in this option do not need to be among those in
 {indepvars}, although that would be most common. If the variables specified in this option do not vary within groups,
 {cmd:mundlak} will display an error message.

{p 0 4}{cmdab:perc:entage}{it:(#}) excludes from the model the group-means of variables for which within-group
 variance accounts for less than {it:#} percent of the total variance. When {cmd:percentage}{it:(#})
 is not specified, {cmd:mundlak} operates as if {it:#} were 0. Note, however, that when 0% of the total variance of a
 variable is within groups, the group-mean of that variable cannot be included in the regression because of collinearity.
 If {cmd:use}({varlist}) is also specified, {cmd:mundlak} evaluates the percentage of the total variance that is
 within groups for the variables listed in that option, and only includes their group-means in the Mundlak
 model if they satisfy the criterion in {cmd:percentage}{it:(#}).
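
{p 0 4}	This help file does not state the exact formula {cmd:mundlak} uses for the within-group share of variance. One way
 to get a feel for it before choosing {it:#} is to compare the between and within standard deviations reported by Stata's
 {cmd:xtsum}; a variable that is constant within groups shows a within standard deviation of zero. This is a sketch for
 inspection only and is not guaranteed to reproduce the percentage computed by {cmd:mundlak}.{p_end}

{p 4 8}{inp:. webuse nlswork, clear}{p_end}

{p 4 8}{inp:. xtset idcode year}{p_end}

{p 4 8}{inp:. * overall, between and within standard deviations of each regressor}{p_end}

{p 4 8}{inp:. xtsum age south race}{p_end}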

{p 0 4}{cmdab:nocomp:arison} prevents the display of results from the original random-effects model. By default,
 {cmd:mundlak} displays the results from both the original random-effects model and the Mundlak model which
 includes the additional independent variables.

{p 0 4}{cmdab:hyb:rid} transforms the original independent variables into group-mean deviations and
 also adds their group-means to the model. With this option, {cmd:mundlak} estimates a 'hybrid model'
 equivalent to that described in Chapter 2 of Allison (2009). This can be expressed as:
 
						Y{it:ij} = A + B{it:1}*(X{it:ij}-X_bar{it:i}) + B{it:2}*Z{it:i} + B{it:3}*X_bar{it:i} + v{it:ij}
 
{p 0 4} The names of the added group-mean deviation variables begin with the prefix {cmd:diff__} followed by the original variable name.
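
{p 0 4}	The hybrid specification can be sketched by hand as follows. This is an illustration under stated assumptions, not
 the code run by {cmd:mundlak}: only {cmd:age} is transformed, the complete-case sample is used, and the names
 {cmd:touse}, {cmd:m_age} and {cmd:d_age} are hypothetical (the command itself uses the prefixes {cmd:mean__} and {cmd:diff__}).{p_end}

{p 4 8}{inp:. webuse nlswork, clear}{p_end}

{p 4 8}{inp:. xtset idcode year}{p_end}

{p 4 8}{inp:. generate byte touse = !missing(ln_wage, age, south, race)}{p_end}

{p 4 8}{inp:. * group-mean of age and the deviation of age from that mean}{p_end}

{p 4 8}{inp:. bysort idcode: egen m_age = mean(age) if touse}{p_end}

{p 4 8}{inp:. generate d_age = age - m_age if touse}{p_end}

{p 4 8}{inp:. * the deviation replaces the original variable; the group-mean enters alongside it}{p_end}

{p 4 8}{inp:. xtreg ln_wage d_age south race m_age if touse, re}{p_end}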
 
{p 0 4}{cmdab:f:ull} asks for the full regression output for both the original random-effects model and the
 Mundlak model to be displayed. When {cmd:full} is specified together with {cmd:nocomparison}, only the full output
 for the Mundlak model is displayed.
 
{p 0 4}{cmdab:st:ats}{it:(list}) allows users to specify the model summary statistics to be reported. These can be any scalars stored in {cmd:e()} by Stata's {cmd:xtreg, re} routine.
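
{p 0 4}	To see which scalars are available for this option, one approach is to fit a random-effects model directly and
 list its stored results; the names shown under "scalars" are candidates for {it:list}. The model below is just an example
 using the {cmd:nlswork} data.{p_end}

{p 4 8}{inp:. webuse nlswork, clear}{p_end}

{p 4 8}{inp:. xtset idcode year}{p_end}

{p 4 8}{inp:. xtreg ln_wage age south race, re}{p_end}

{p 4 8}{inp:. * lists the e() scalars, macros and matrices left behind by xtreg}{p_end}

{p 4 8}{inp:. ereturn list}{p_end}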

{p 0 4}{cmdab:se} asks for the standard errors for the parameters on model variables to be reported. Note that specifying the option {cmd:full} overrides this.

{p 0 4}{cmdab:t} asks for the t-values for the parameters on model variables to be reported. Note that specifying the option {cmd:full} overrides this.

{p 0 4}{cmdab:p} asks for the p-values for the parameters on model variables to be reported. Note that specifying the option {cmd:full} overrides this.

{p 0 4}{cmdab:k:eep} asks for the new variables (i.e. group-means and group-mean deviations) to be kept in the dataset.


{title:Examples}

{p 4 8}{inp:. webuse nlswork, clear}{p_end}

{p 4 8}{inp:. xtset idcode year}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, use(age)}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, percentage(45)}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, nocomparison}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, hybrid}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, full}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, stats(N N_g rho r2_o r2_w r2_b)}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, se t p}{p_end}

{p 4 8}{inp:. mundlak ln_wage age south race, keep}{p_end}

{p 4 8}{inp:. describe mean__*}{p_end}


{title:Also see}

	Online: {manhelp xtreg XT}


{title:References}
 
 
 Allison, P. D. (2009) {it:"Fixed-Effects Regression Models"} Thousand Oaks: Sage
 
 Greene, W. (2011) {it:"Econometric Analysis (7th edition)"} Prentice Hall
 
 Mundlak, Y. (1978) "On the Pooling of Time Series and Cross-section Data" {it:Econometrica}, 46: 69–85
 
 Wooldridge, J. M. (2010) {it:"Econometric Analysis of Cross Section and Panel Data (2nd edition)"} MIT Press
 
 
{title:Author}

    Francisco Perales
    School of Social Science
    The University of Queensland
    Brisbane
    QLD 4072
    Australia
    f.perales@uq.edu.au