{smcl}
{* *! version 1.0.0  08aug2022}{...}
{cmd:help rrp}
{hline}

{title:Title}

{p 8 20 2}
{hi:rrp} {hline 2} Rescaled Regression Prediction (RRP) using two samples{p_end}


{title:Syntax}

{p 8 17 2}
{cmd:rrp}
{indepvars}
{ifin}
{weight}{cmd:,}
{cmdab:impute(}{newvar}{cmd:)}
{cmdab:proxies(}{varlist}{cmd:)}
{cmdab:first(}{it:{help estimates_store:model}}{cmd:)}
{cmdab:partialrsq()}
{cmdab:r:obust}
{cmdab:cl:uster(}{it:clustvar}{cmd:)}
{p_end}


{title:Description}

{pstd}{cmd:rrp} implements Rescaled Regression Prediction (RRP) using two samples, in two steps.
First, it creates a new variable by imputing the dependent variable in the current sample from the
stored first-stage regression, which was fitted in the sample that contains both the dependent
variable and the proxies.
Second, it fits the second-stage regression of the imputed variable on {indepvars}.
The two samples can be held in separate datasets, or appended into one dataset and distinguished by
a sample identifier.
The proxy variables must have the same names in both samples, that is, in the first-stage regression
and in {hi:proxies()}; their order does not matter.
The user must supply the correct partial R-squared; the example shows how to compute it.
The command reports the results of the second-stage regression and creates the new imputed variable.
{p_end}


{title:Options}

{phang}
{cmdab:impute(}{newvar}{cmd:)} specifies the name of the new imputed variable.
{p_end}

{phang}
{cmdab:proxies(}{varlist}{cmd:)} specifies the variables, common to both datasets, that are used as
proxies for imputing the dependent variable.
{p_end}

{phang}
{cmdab:first(}{it:{help estimates_store:model}}{cmd:)} specifies the stored first-stage regression,
fitted in the dataset that contains the dependent variable.
{p_end}

{phang}
{cmdab:partialrsq(}{cmd:)} specifies the partial R-squared. It can be a number or the name of a
stored scalar.
{p_end}

{phang}
{cmdab:r:obust} computes standard errors that are robust to arbitrary heteroskedasticity.
{p_end}

{phang}
{cmdab:cl:uster(}{it:clustvar}{cmd:)} computes standard errors that are robust to arbitrary
heteroskedasticity and allow for intra-group correlation.
{p_end}


{title:Example}

    Design

{phang2}{cmd:. drop _all}{p_end}
{phang2}{cmd:. matrix C = (2, .5 \ .5, 2)}{p_end}
{phang2}{cmd:. mat A = cholesky(C)}{p_end}
{phang2}{cmd:. set obs 100}{p_end}
{phang2}{cmd:. gen sample = 1}{p_end}
{phang2}{cmd:. gen c1 = invnorm(uniform())}{p_end}
{phang2}{cmd:. gen c2 = invnorm(uniform())}{p_end}
{phang2}{cmd:. mat a1 = A[1,1...]}{p_end}
{phang2}{cmd:. matrix score x = a1}{p_end}
{phang2}{cmd:. matrix a2 = A[2,1...]}{p_end}
{phang2}{cmd:. matrix score w = a2}{p_end}
{phang2}{cmd:. gen y = 1 + 1*x + .5*w + rnormal(0,4)}{p_end}
{phang2}{cmd:. gen zA = 1 + 0.5*y - .0*w + rnormal(0,2)}{p_end}
{phang2}{cmd:. gen zB = 1 + 0.3*y - .0*w + rnormal(0,2)}{p_end}
{phang2}{cmd:. set obs 300}{p_end}
{phang2}{cmd:. replace sample = 2 in 101/300}{p_end}
{phang2}{cmd:. replace c1 = invnorm(uniform()) in 101/300}{p_end}
{phang2}{cmd:. replace c2 = invnorm(uniform()) in 101/300}{p_end}
{phang2}{cmd:. mat a1 = A[1,1...]}{p_end}
{phang2}{cmd:. matrix score x = a1 in 101/300, replace}{p_end}
{phang2}{cmd:. matrix a2 = A[2,1...]}{p_end}
{phang2}{cmd:. matrix score w = a2 in 101/300, replace}{p_end}
{phang2}{cmd:. replace y = 1 + 1*x + .5*w + rnormal(0,4) in 101/300}{p_end}
{phang2}{cmd:. replace zA = 1 + 0.5*y - .0*w + rnormal(0,2) in 101/300}{p_end}
{phang2}{cmd:. replace zB = 1 + 0.3*y - .0*w + rnormal(0,2) in 101/300}{p_end}
{phang2}{cmd:. drop c1 c2}{p_end}
{phang2}{cmd:. replace x = . if sample==1}{p_end}
{phang2}{cmd:. replace y = . if sample==2}{p_end}

    First-stage regression and partial R-squared calculation

{phang2}{cmd:. reg y w if sample==1}{p_end}
{phang2}{cmd:. scalar R2_A = e(r2)}{p_end}
{phang2}{cmd:. reg y zA zB w if sample==1}{p_end}
{phang2}{cmd:. est store stage1}{p_end}
{phang2}{cmd:. scalar R2_B = e(r2)}{p_end}
{phang2}{cmd:. scalar Rsq = (R2_B-R2_A)/(1-R2_A)}{p_end}

    Imputation and second-stage estimation

{phang2}{cmd:. rrp x w if sample==2, impute(yhat) proxies(zA zB) partialrsq(Rsq) first(stage1)}{p_end}


{title:Saved results}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(N)}}number of observations{p_end}
{synopt:{cmd:e(N_clust)}}number of clusters{p_end}
{synopt:{cmd:e(df_m)}}model degrees of freedom{p_end}
{synopt:{cmd:e(df_r)}}residual degrees of freedom{p_end}
{synopt:{cmd:e(F)}}F statistic{p_end}
{synopt:{cmd:e(r2)}}R-squared{p_end}
{synopt:{cmd:e(rmse)}}root mean squared error{p_end}
{synopt:{cmd:e(rank)}}rank of {cmd:e(V)}{p_end}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:e(cmd)}}{cmd:rrp}{p_end}
{synopt:{cmd:e(cmdline)}}command as typed{p_end}
{synopt:{cmd:e(depvar)}}name of imputed dependent variable{p_end}
{synopt:{cmd:e(title)}}title in estimation output{p_end}
{synopt:{cmd:e(clustvar)}}name of cluster variable{p_end}
{synopt:{cmd:e(vcetype)}}title used to label Std. Err.{p_end}
{synopt:{cmd:e(vce)}}vcetype specified{p_end}
{synopt:{cmd:e(properties)}}{cmd:b V}{p_end}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Matrices}{p_end}
{synopt:{cmd:e(b)}}coefficient vector{p_end}
{synopt:{cmd:e(V)}}variance-covariance matrix of the estimators{p_end}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Functions}{p_end}
{synopt:{cmd:e(sample)}}marks estimation sample{p_end}
{p2colreset}{...}


{title:Reference}

{phang}
Crossley, T.F., Levell, P., and Poupakis, S. (2022). Regression with an Imputed Dependent Variable.
{it:Journal of Applied Econometrics}. https://doi.org/10.1002/jae.2921
{p_end}


{title:Author}

{pstd}Stavros Poupakis{p_end}
{pstd}University College London{p_end}
{pstd}s.poupakis@ucl.ac.uk{p_end}