{smcl}
{* septiembre 22, 2013 @ 02:49:50}{...}
{hline}
help for {hi:genspec}
{hline}

{title:Title}

{p 8 20 2}
    {hi:genspec} {hline 2} A General-to-Specific modelling algorithm

{title:Syntax}

{p 8 20 2}
{cmdab:genspec} {it:{help varnames:depvar}} {it:{help varnames:indepvars}} {ifin} [{it:{help weight}}] [{cmd:,} {it:options}]

{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab :Options}
{synopt :{cmd:vce(}{it:vcetype}{cmd:)}}determines the type of standard error (robust, cluster, bootstrap, or jackknife) to be reported in
the estimated regression model
{p_end}
{...}
{synopt :{cmd:xt(be|fe|re)}}specifies that the model is based upon panel data, and whether a random-effects (RE), fixed-effects (FE),
or between-effects (BE) model should be estimated. {help xtset} must be specified prior to using this option
{p_end}
{...}
{synopt :{opt ts}}specifies that the model is based upon time-series data. {help tsset} must be specified prior to using this option
{p_end}
{...}
{synopt :{opt nodiag:nostic}}turns off the initial diagnostic tests for model misspecification; this should be used with caution
{p_end}
{...}
{synopt :{cmdab:tlimit(}#{cmdab:)}}sets the critical t-value for diagnostic tests (by default this value is 1.96)
{p_end}
{...}
{synopt :{cmdab:num:search(}#{cmdab:)}}defines the number of search paths to follow in the algorithm (5 by default), if a large dataset is used,
fewer search paths may be preferred
{p_end}
{...}
{synopt :{opt nopart:ition}}uses the full sample of data in all search paths, and does not run out of sample testing
{p_end}
{...}
{synopt :{opt noserial}}requests that no serial correlation test is performed on panel data models; this option should only
be specified with the {cmd:xt} option
{p_end}
{...}
{synopt :{opt verbose}}requests full program output for each search path explored
{p_end}
{...}
{synoptline}
{p2colreset}


{title:Description}

{p 6 6 2}
{hi:genspec} is an algorithm for general-to-specific model prediction in Stata.  It is designed to search a large number of variables, and from these
select the 'best' model based upon a criteria of relevance and explanatory power. From a user-defined general unrestricted model, or `GUM', (often
comprised of all independent variables the user considers potentially important, plus nonlinearities and lags), {cmd:genspec} 
searches for the best possible final model among optimal subsets of the general model, as per the general-to-specific modelling process described in 
the econometric literature. The user passes the GUM to {cmd:genspec} as a {it:{help varnames:depvar}} and a group of {it:{help varnames:indepvars}} which 
are potentially important elements in the GUM.  The initial GUM is tested for congruence, and then multiple search paths are followed.  A potential 
final specification is reached when no further restrictions of the GUM remain congruent, and/or no further insignificant variables remain.

{p 6 6 2}
{hi:genspec} allows the user to run the model prediction algorithm for time-series, cross-sectional, or panel data models.  The {hi:genspec} command runs a
series of linear regressions when searching for the final (specific) model, so is a wrapper for either the {cmd:regress} or {cmd:xtreg} command.  In 
the case of time series or panel data models, the user must specifiy the {cmd:ts} or {cmd:xt} option, and {help tsset} or {help xtset} the data 
respectively. For panel data models, the user written {help xtserial} command is used.  This option does not accept {help fvvarlist:factor variable} 
operators. If factor variable operators are used with the {cmd:xt} option, {cmd:noserial} should be specified.  The {hi:genspec} command accepts 
{help fvvarlist:factor variables} of the form # and c#, however does not accept the i{c 46} operator.  For users who wish to include a full set of 
dummy variables, these should be generated and passed as {it:{help indepvars}} {c 150} perhaps via Stata's {help tab:tab, gen()} command.

{p 6 6 2}
For further details regarding the functionality of {cmd:genspec} or general-to-specific modelling in general, refer to
{it: General to Specific Modelling in Stata} available at: {browse "https://sites.google.com/site/damiancclarke/research#TOC-Work-in-Progress":https://sites.google.com/site/damiancclarke/research}.


{marker examples}{...}
{title:Examples}

    {hline}
{pstd}Search the auto dataset for the significant predictors of car price{break}

{phang2}{cmd:. sysuse auto}{p_end}
{phang2}{cmd:. genspec price mpg rep78 headroom trunk weight length foreign turn displace}{p_end}

    {hline}

{pstd}Search the National Longitudinal (panel) Survey for significant predictors of log wages{p_end}

{pstd}Setup{p_end}
{phang2}{cmd:. webuse nlswork}{p_end}
{phang2}{cmd:. genspec ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south msp nev_mar union, xt(fe) numsearch(2)}{p_end}

    {hline}

{pstd}Predict variables for Hoover and Perez (1999)'s time-series model 5{p_end}

{pstd}Setup{p_end}
{phang2}{cmd:. webuse set http://users.ox.ac.uk/~ball3491/}{p_end}
{phang2}{cmd:. webuse gets_data}{p_end}
{phang2}{cmd:. qui ds y* u* time, not}{p_end}
{phang2}{cmd:. local xvars `r(varlist)'}{p_end}
{phang2}{cmd:. local lags l.dcoinc l.gd l.ggeq l.ggfeq l.ggfr l.gnpq l.gydq l.gpiq l.fmrra l.fmbase l.fm1dq l.fm2dq l.fsdj l.fyaaac l.lhc l.lhur l.mu l.mo}{p_end}

{phang2}{cmd:. genspec y5 `xvars' `lags' l.y5 l2.y5 l3.y5 l4.y5, ts}{p_end}

    {hline}


{marker results}{...}
{title:Saved results}

{pstd}
{cmd:genspec} saves the following in {cmd:e()}:

{synoptset 10 tabbed}{...}
{p2col 5 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(fit)}}Bayesian Information Criterion of final specification {p_end}

{synoptset 10 tabbed}{...}
{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:e(cmd)}}List of variables from the final specification {p_end}
	

{marker references}{...}
{title:References}

{marker Clarke2013}{...}
{phang}
Clarke D.C., 2013.
{browse "https://sites.google.com/site/damiancclarke/research":{it:General to Specific Modelling in Stata}.}
Manuscript.

{marker Drukker2003}{...}
{phang}
Drukker D.M., 2003.
{browse "http://www.stata-journal.com/article.html?article=st0039":{it: Testing for serial correlation in linear panel-data models}},
Stata Journal 3(2): 168-177.

{marker HooverPerez1999}{...}
{phang}
Hoover, K.D. and S.J. Perez., 1999.
{browse "http://ideas.repec.org/p/fth/caldec/97-27.html":{it: Data mining reconsidered: encompassing the general-to-specific approach to specification search}},
Econometrics Journal 2: 167-191.
{p_end}


{title:Acknowledgements}

    {p 4 4 2} I thank Marta Dormal, Dr. Bent Nielsen,  Dr. Nicolas Van de Sijpe and George Vega Yon for useful
		comments and advice.  I also thank the Comisi{c o'}n Nacional de Investigaci{c o'}n Cient{c i'}fica
		y Tecnol{c o'}gica of the Government of Chile who supported my research during the writing of this
		program. 


{title:Also see}

{psee}
Online:  {manhelp regress_postestimation R: regress postestimation}, {manhelp regress_postestimationts R:regress postestimation times series}, {manhelp xtreg_postestimation XT: xtreg postestimation}



{title:Author}

{pstd}
Damian C. Clarke, Department of Economics, University of Oxford. {browse "mailto:damian.clarke@economics.ox.ac.uk":damian.clarke@economics.ox.ac.uk}
{p_end}