{smcl}
{cmd:help genyhats}
{hline}
{title:Title}
{p2colset 5 20 22 2}{...}
{p2col :genyhats {hline 2}}Generates {it:y_hat} affinity measures for PTV analysis{p_end}
{p2colreset}{...}
{title:Syntax}
{p 8 16 2}
{opt genyhats} yhatname1: {indepvars}_1 [ || yhatname2: {indepvars}_2 ... ]
[{cmd:,} {it:options}]{p_end}
or
{p 8 16 2}
{opt genyhats} {varlist} [{cmd:,} {it:options}]
{p_end}
{synoptset 21 tabbed}{...}
{synopthdr}
{synoptline}
{synopt :{opt dep:var(varname)}}the dependent variable for which affinities are being estimated{p_end}
{synopt :{opt con:textvars(varlist)}}the variables identifying different electoral contexts{p_end}
{synopt :{opt sta:ckid(varname)}}a variable identifying different "stacks", for which y-hats will be
separately generated if {cmd:genyhats} is called after stacking{p_end}
{synopt :{opt nos:tack}}override the default behavior that treats each stack as a separate context
(has no effect if the command is used before the data have been stacked.){p_end}
{synopt :{opt ypr:efix(prefix)}}the prefix for new variable(s) generated from those in {it:varlist} (the
default prefix is "y_"){p_end}
{synopt :{opt log:it}}use a logit model instead of linear regression (the default){p_end}
{synopt :{opt adj:ust(mean | constant | no )}}adjust the y-hat by subtracting the mean (the default) or the constant term, or make no adjustment{p_end}
{synopt :{opt rep:lace}}drop all {it:indepvars} after the generation of y-hats{p_end}
{synopt :{opt eff:ects(window | rtf | csv | html)}}display a summary table of stack-specific effects from the regression used to generate a y-hat{p_end}
{synopt :{opt efm:t}}change the coefficients reported by the effects() option{p_end}
{synopt :{opt output}}(not recommended) directly flush the results of each stack-specific regression into the standard output.{p_end}
{synoptline}
{title:Description}
{pstd}
{cmd:genyhats} generates one or more {it:y-hat} affinity measures for {it:depvar}, one for each (set of)
{it:indepvars}, saving each under the corresponding {it:yhatname}. Estimates are produced separately for
each stack and context, if any. Multiple y-hat variables can be generated by specifying multiple models
separated by {bf:||}, but all of them must involve affinities with the same depvar. Consequently,
if issued before stacking, {cmd:genyhats} has to be issued repeatedly (once for each of
the item-specific depvars that will, after stacking, become a single generic depvar).{break}
{pstd}
Optionally, a variable list can be used instead of the "...: ...||" primary syntax. When a variable
list is employed with {cmd:genyhats} each variable in the list is treated as a single independent
variable in a set of separate predictions of {it:depvar}. With this syntax the y-hat affinity
variable will be named by prefixing the independent variable with "y_" or such other prefix as may
be established by the {cmd:yprefix()} option.{break}
{pstd}
The two syntaxes may be combined: wherever || appears, the variables preceding it are treated as
a variable list. Unless a name followed by ":" was given, the first variable in that list provides the
suffix for the new variable name, which is prefixed by "y_" or such other prefix as may be established
by the {cmd:yprefix()} option.
{pstd}
The {cmd:genyhats} command estimates the effect of each (set of) indepvar(s) on the depvar, separately for
each stack if the data are stacked (unless {cmd:nostack} is specified) and separately for each context
if the {cmd:contextvars()} option is employed. It uses Stata's {cmd:predict} command to produce predicted
values of the depvar for each case. Unless some other adjustment is requested by means of the
{cmd:adjust()} option, each set of these so-called "y-hats" is adjusted by subtracting its mean from the
predicted values (separately for each stack and context, if present). The result is saved under the
appropriate variable name as described above. Estimation is by OLS unless {cmd:logit} is specified.{break}
NOTE that, if the y-hat is not adjusted, the stack-specific mean will be included in the estimated y-hats,
creating inconsistencies between stacks and contexts that can cause large anomalies in subsequent
analyses using these variables. As a result, published work has mostly subtracted the mean
(the default in {cmd:genyhats}). However, the option of subtracting the constant term
is also available.
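{pstd}
As a rough illustration of the two adjustments for a single stack and context, the steps can be sketched
with standard Stata commands. This is a sketch under stated assumptions, not the code {cmd:genyhats} actually
runs: the variables {it:age} and {it:income} and the new names {it:yhat_raw}, {it:y_demo} and {it:y_demo_c}
are hypothetical, and estimation is by OLS.{p_end}
{phang2}
{cmd:. regress ptv age income}{p_end}
{phang2}
{cmd:. predict double yhat_raw if e(sample)}{p_end}
{phang2}
{cmd:. summarize yhat_raw, meanonly}{p_end}
{phang2}
{cmd:. generate double y_demo = yhat_raw - r(mean)}{p_end}
{phang2}
{cmd:. generate double y_demo_c = yhat_raw - _b[_cons]}{p_end}
{pstd}
Here {it:y_demo} would be the mean-adjusted prediction and {it:y_demo_c} the constant-adjusted one.{p_end}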
{pstd}
The {cmd:genyhats} command can be issued before or after stacking. If issued after stacking, by default
it treats each stack as a separate context to take into account along with any higher-level contexts.
This yields the same y-hat estimates as would have been created for separate unstacked depvars. However,
the {cmd:nostack} option can be employed to force {cmd:genyhats} to ignore the stack-specific contexts.
In addition, {cmd:genyhats} can be employed with or without the {cmd:contextvars()} option (that is, with or
without distinguishing between higher-level contexts, if any), depending on what makes methodological sense. If issued
after stacking, the command need only be issued once for the (generic) depvar instead of separately for
each unstacked depvar. This makes {cmd:genyhats} simpler to use and avoids creating a mass of temporary
variables that would hugely increase the size of the (often already very large) stacked file, but it takes longer
because estimation is performed with a much larger dataset, selecting a different stack on each pass.
{pstd}
NOTE that, when y-hat variables are used in subsequent analyses (for instance in regression models), their
estimated coefficients are not readily interpretable. In the absence of error
variance and multicollinearity, each coefficient calculated for a y-hat independent variable predicting
the {it:ptv} dependent variable would be +1.0. The actual values of these coefficients thus constitute a
quasi-measure of covariance, like a partial correlation coefficient. However, standard errors (along with
beta coefficients from OLS) retain their customary meanings.
{title:Options}
{phang}
{opt depvar(varname)} if specified, the variable for which affinities are estimated
(default is {it:ptv}).{p_end}
{phang}
{opt contextvars(varlist)} if specified, the variables whose combinations of values identify
different electoral contexts (by default all cases are treated as part of a single context).{p_end}
{phang}
{opt nostack} if present, overrides the default behavior of treating each stack as a separate context
(has no effect if the {cmd:genyhats} command is issued before stacking).{p_end}
{phang}
{opt stackid(varname)} if specified, a variable identifying different "stacks", for which
y-hats will be separately generated. The default is to use the "genstacks_stack"
variable if the {cmd:genyhats} command is issued after stacking.{p_end}
{phang}
{opt yprefix(prefix)} if specified, provides a prefix for y-hat affinities generated for each variable in
a variable list (the default is "y_"). NOTE that the prefix, whether default or provided, can be
overridden by explicitly specifying the y-hat variable name before a colon introducing the
variable(s) to be used in estimating this y-hat.{p_end}
{phang}
{opt logit} if specified, invokes a logit model instead of linear regression (the default).{p_end}
{phang}
{opt adjust( mean | constant | no )} if specified, adjusts the y-hat by subtracting the mean
(the default) or the constant term, or makes no adjustment. Note that, when a logit
model is specified, the adjustment is applied to propensity values, which are then mapped back to probability
values.{p_end}
{phang}
{opt replace} if specified, drops all {it:indepvars} for all specified models after the generation
of y-hats.{p_end}
{phang}
{opt effects(window | rtf | csv | html)} if specified, displays a table (in publication format) that
summarizes the different effects of the same predictors in different stacks. The {cmd:window} option
flushes the table to the standard output, while the other options save the table to an external file
in the chosen file format. By default, z-values are reported, along with significance stars.
The {cmd:efmt()} option can be used to change the coefficients reported in the table.{p_end}
{phang}
{opt efmt} if specified, changes the coefficient reported in tables generated by {cmd:effects()}.
{cmd:efmt()} accepts two types of values: either {cmd:beta} (in order to obtain beta coefficients) or
any format string that is accepted by the {bf:{help estout##cells:cells()}} option of the {cmd:estout} command.
As an example, {cmd:efmt(b(fmt(3)star))} displays b coefficients with three decimal digits and
significance stars.{p_end}
{phang}
{opt output} if specified, the results of each stack-specific regression used to generate a y-hat
are directly flushed into the standard output.{p_end}
{title:Examples}
{pstd}The following command generates two y-hat variables for {it:ptv} (the default dependent
variable), based on working conditions and issues, with observations clustered by {it:t102},
and drops the original independent variables. In this example {cmd:stackid()} is set to the
variable {it:stack}, implying that the name "stack" was specified in the {cmd:stackid()} option
of a previous {cmd:genstacks} command, or that the data were reshaped in some other fashion (e.g.
using Stata's {bf:{help reshape:reshape}} command with "stack" specified for the {cmd:j()} option).
{phang2}
{cmd:. genyhats ywork: work_* || yissues: q56-q67, context(t102) stackid(stack) replace}{p_end}
{pstd}The following command generates four y-hat variables for {it:chosen} (a binary dependent
variable), one each for {it:age, income, union} and {it:educ}, with observations clustered by
{it:t102}. Because the dependent variable is binary a logit model is used. The {cmd:yprefix}
option ensures that the resulting y-hat variables will be named {it:yl_age, yl_income,
yl_union} and {it:yl_educ}, possibly to distinguish them from similar variables created by OLS.
Because {cmd:replace} is not specified, all variables will be retained. Because the {cmd:stackid()}
option is not specified, {cmd:genyhats} is operating either on unstacked data or on stacked data
whose stacks are identified by the default "genstacks_stack" variable created by {cmd:genstacks}.
{phang2}
{cmd:. genyhats age income union educ, depvar(chosen) contextvars(t102) yprefix(yl_) logit}{p_end}
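{pstd}As a further sketch built only from the options documented above, and reusing the variable
names of the first example (assumed to exist in the data), the following command should flush to the
standard output a summary table of stack-specific effects, showing b coefficients with three decimal
digits and significance stars. The independent variables are retained because {cmd:replace} is omitted.
{phang2}
{cmd:. genyhats ywork: work_* || yissues: q56-q67, contextvars(t102) stackid(stack) effects(window) efmt(b(fmt(3)star))}{p_end}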
{title:Generated variables}
{pstd}
{cmd:genyhats} saves the following variables:
{synoptset 27 tabbed}{...}
{synopt:{it:yhatname1 [yhatname2...]}} a set of y-hat (predicted) variables,
each one either named before the colon introducing the corresponding (set of) indepvar(s) or (if no
colon was employed) constructed from the corresponding indepvar by prefixing it with "y_" or whatever
prefix may have been set using the {cmd:yprefix()} option.{p_end}