help genyhats


genyhats -- Generates y_hat affinity measures for PTV analysis


genyhats yhatname1: indepvars_1 [ || yhatname2: indepvars_2 ... ] [, options] or

genyhats varlist [, options]

options                              Description
-------------------------------------------------------------------------
depvar(varname)                      the dependent variable for which affinities
                                     are being estimated
contextvars(varlist)                 the variables identifying different
                                     electoral contexts
stackid(varname)                     a variable identifying different "stacks",
                                     for which y-hats will be separately
                                     generated if genyhats is called after
                                     stacking
nostack                              override the default behavior that treats
                                     each stack as a separate context (has no
                                     effect if the command is used before the
                                     data have been stacked)
yprefix                              the prefix for new variable(s) generated
                                     from those in varlist (default is to use a
                                     prefix of "y_")
logit                                use a logit model instead of linear
                                     regression (the default)
adjust(mean | constant | no)         adjust the y_hat by subtracting the mean
                                     (default) or subtracting the constant
                                     term. Alternatively, make no adjustment.
replace                              drop all indepvars after the generation of
                                     y-hats
effects(window | rtf | csv | html)   display a summary table of stack-specific
                                     effects from the regression used to
                                     generate a y-hat
efmt                                 change the coefficients reported by the
                                     effects() option
output                               (not recommended) directly flush the
                                     results of each stack-specific regression
                                     into the standard output
-------------------------------------------------------------------------


genyhats generates one or more y-hat affinities for depvar, one for each (set of) indepvars, and saves each under the corresponding yhatname. Estimation is performed separately for each stack and for each combination of values of the contextvars. Multiple y-hat variables can be generated by specifying multiple models separated by ||, but all of them must involve affinities with the same depvar. If issued before stacking, genyhats will therefore have to be issued repeatedly (once for each of the item-specific depvars that will, after stacking, become a single generic depvar).

Optionally, a variable list can be used instead of the "...: ...||" primary syntax. When a variable list is employed, each variable in the list is treated as the single independent variable in a separate prediction of depvar. With this syntax the y-hat affinity variable is named by prefixing the name of the independent variable with "y_" or whatever other prefix is set by the yprefix() option.

The two syntaxes may be combined: any appearance of || causes the preceding variables to be treated as a variable list, of which the first (unless it was followed by ":") provides the suffix for the new variable name. That suffix is prefixed by "y_" or whatever other prefix is set by the yprefix() option.
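
The naming rules described above can be sketched as follows. This is only an illustrative reading of the description, written in Python rather than Stata. The function yhat_names is hypothetical and is not part of genyhats, and the handling of a ||-separated segment without a colon (one model, named after its first variable) is an assumption drawn from the wording above.

```python
def yhat_names(spec, prefix="y_"):
    """Map each y-hat variable name to the indepvars used to estimate it.

    Hypothetical helper: mirrors the naming rules as described in this
    help file, not the actual genyhats parser.
    """
    segments = [s.strip() for s in spec.split("||")]

    # Plain variable list: one separate model per variable.
    if len(segments) == 1 and ":" not in segments[0]:
        return {prefix + v: [v] for v in segments[0].split()}

    models = {}
    for seg in segments:
        if ":" in seg:
            # Explicit name before the colon overrides the prefix.
            name, rhs = seg.split(":", 1)
            models[name.strip()] = rhs.split()
        else:
            # Assumption: the first variable supplies the suffix.
            variables = seg.split()
            models[prefix + variables[0]] = variables
    return models


print(yhat_names("ywork: work_* || yissues: q56-q67"))
print(yhat_names("age income union educ", prefix="yl_"))
```

Variable patterns such as work_* and ranges such as q56-q67 are passed through untouched here; in Stata they would be expanded into the matching variables.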

The genyhats command estimates the effect of each (set of) indepvars on the depvar, separately for each stack if the data are stacked (unless nostack is optioned) and separately for each context if the contextvars() option is employed. It uses Stata's predict command to produce predicted values of the depvar for each case. Each set of these so-called "y-hats" is adjusted by subtracting its mean (separately for each stack and context, if present), unless some other adjustment is requested by means of the adjust() option, and is saved under the appropriate variable name as described above. Estimation is by OLS unless logit is optioned.

NOTE that, if the y-hat is not adjusted, the stack-specific mean will be included in the estimated y-hats, creating inconsistencies between stacks and contexts that can cause large anomalies in subsequent analyses using these variables. As a result, published work has mostly subtracted the mean (which is the default in genyhats). However, the option of subtracting the constant term is also available.
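
The procedure described above (estimate within each stack and context, predict, then subtract the mean) can be illustrated with a minimal sketch. It is written in Python purely for exposition and is not the genyhats implementation. The function centered_yhats and the toy data are hypothetical, and the sketch assumes a single independent variable and OLS.

```python
from collections import defaultdict
from statistics import mean


def centered_yhats(rows, depvar, indepvar, group_keys):
    """Fit y = a + b*x by OLS within each group; return mean-centered predictions."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[k] for k in group_keys)].append(i)

    yhat = [None] * len(rows)
    for idx in groups.values():
        xs = [rows[i][indepvar] for i in idx]
        ys = [rows[i][depvar] for i in idx]
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        intercept = y_bar - slope * x_bar
        preds = [intercept + slope * x for x in xs]
        # The "mean" adjustment: remove the group-specific mean prediction.
        p_bar = mean(preds)
        for i, p in zip(idx, preds):
            yhat[i] = p - p_bar
    return yhat


# Toy stacked data: two stacks with very different average ptv.
rows = [
    {"stack": 1, "ptv": 2.0, "age": 20},
    {"stack": 1, "ptv": 4.0, "age": 40},
    {"stack": 1, "ptv": 6.0, "age": 60},
    {"stack": 2, "ptv": 9.0, "age": 20},
    {"stack": 2, "ptv": 8.0, "age": 40},
    {"stack": 2, "ptv": 7.0, "age": 60},
]
y_age = centered_yhats(rows, "ptv", "age", ["stack"])
print(y_age)
```

Without the centering step, the predictions for the second stack would average 8 and those for the first stack 4, which is the kind of between-stack inconsistency that the NOTE above warns about. After centering, the y-hats in each stack average zero.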

The genyhats command can be issued before or after stacking. If issued after stacking, by default it treats each stack as a separate context, to be taken into account along with any higher-level contexts. This yields the same y-hat estimates as would have been created for separate unstacked depvars. However, the nostack option can be employed to force genyhats to ignore the stack-specific contexts. In addition, genyhats can be employed with or without distinguishing between higher-level contexts, if any (that is, with or without the contextvars() option), depending on what makes methodological sense. If issued after stacking, the command need only be issued once for the (generic) depvar instead of separately for each unstacked depvar. This makes genyhats simpler to use and avoids creating a mass of temporary variables that would hugely increase the size of the (often already very large) stacked file, but it takes longer because estimation is performed on a much larger dataset, selecting a different stack on each pass.

NOTE that when used in subsequent analyses (for instance in regression models) estimated coefficients for y-hat variables are not readily interpretable. In the absence of error variance and multicollinearity, each coefficient calculated for a y-hat independent variable predicting the ptv dependent variable would be +1.0. The actual values of these coefficients thus constitute a quasi-measure of covariance, like a partial correlation coefficient. However, standard errors (along with beta coefficients from OLS) retain their customary meanings.
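
The +1.0 benchmark can be checked with a small worked example. The sketch below is in Python for exposition only. The helper functions and the toy data are hypothetical, the two predictors are constructed to be uncorrelated, and ptv is built as an exact linear function of them (no error variance).

```python
from statistics import mean


def center(v):
    m = mean(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered_yhat(y, x):
    """Mean-adjusted prediction of y from a bivariate OLS on x."""
    xc, yc = center(x), center(y)
    slope = dot(xc, yc) / dot(xc, xc)
    return [slope * a for a in xc]


def ols2(y, z1, z2):
    """Slopes from an OLS of y on two regressors plus a constant."""
    yc, c1, c2 = center(y), center(z1), center(z2)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, yc), dot(c2, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


x1 = [-1, -1, 1, 1]          # uncorrelated with x2 by construction
x2 = [-1, 1, -1, 1]
ptv = [5 + 2 * a - 3 * b for a, b in zip(x1, x2)]   # no error term

b1, b2 = ols2(ptv, centered_yhat(ptv, x1), centered_yhat(ptv, x2))
print(b1, b2)
```

Both coefficients come out at 1 even though the underlying effects of x1 and x2 are 2 and -3, which is why coefficients on y-hat variables say nothing about the size or direction of the original effects. With error variance or correlated predictors, the coefficients depart from 1.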


depvar(varname) if specified, the variable for which affinities are estimated (default is ptv).

contextvars(varlist) if specified, the variables whose combinations of values identify different electoral contexts (by default all cases are treated as part of a single context).

nostack if present, overrides the default behavior of treating each stack as a separate context (has no effect if the genyhats command is issued before stacking).

stackid(varname) if specified, a variable identifying different "stacks", for which y-hats will be separately generated. The default is to use the "genstacks_stack" variable if the genyhats command is issued after stacking.

yprefix if specified, provides a prefix for y-hat affinities generated for each variable in a variable list (the default is "y_"). NOTE that the prefix, whether default or provided, can be overridden by explicitly specifying the y-hat variable name before a colon introducing the variable(s) to be used in estimating this y-hat.

logit if specified, invokes a logit model instead of linear regression (the default).

adjust(mean | constant | no) if specified, adjusts the y-hat by subtracting the mean (default) or subtracting the constant term. Alternatively, makes no adjustment. Note that, when a logit model is optioned, the adjustment is applied to the propensity values, which are then mapped back to probability values.

replace if specified, drops all indepvars for all specified models after the generation of y-hats.

effects(window | rtf | csv | html) if specified, displays a table (in publication format) that summarizes the different effects of the same predictors in different stacks. The window option flushes the table to the standard output, while the other options save the table in an external file, according to the chosen file format. By default, z-values are reported, along with significance stars. The efmt() option can be used to change the coefficients reported in the table.

efmt if specified, changes the coefficients reported in tables generated by effects(). efmt() accepts two types of values: either beta (in order to obtain beta coefficients) or any format string that is accepted by the cells() option of the estout command. As an example, efmt(b(fmt(3)star)) displays b coefficients with three decimal digits and significance stars.

output if specified, the results of each stack-specific regression used to generate a y-hat are directly flushed into the standard output.
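
The note under adjust() about logit models can be illustrated with a short sketch. It is written in Python for exposition and is not the genyhats implementation. It assumes that "propensity values" means the linear predictor (log-odds) and that the mean adjustment is the one in use; both are assumptions, and the function adjust_logit_yhats is hypothetical.

```python
import math


def logit(p):
    """Probability -> log-odds (the assumed "propensity" scale)."""
    return math.log(p / (1 - p))


def invlogit(z):
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-z))


def adjust_logit_yhats(probs):
    """Subtract the mean on the log-odds scale, then map back to probabilities."""
    z = [logit(p) for p in probs]
    z_bar = sum(z) / len(z)
    return [invlogit(v - z_bar) for v in z]


# Predicted probabilities for one stack and context (toy values).
adjusted = adjust_logit_yhats([0.1, 0.2, 0.4])
print(adjusted)
```

Under this reading the adjusted values remain valid probabilities between 0 and 1 and keep their original ordering, which would not be guaranteed if the mean were subtracted from the probabilities directly.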


The following command generates two y-hat variables for ptv (the default dependent variable), based on working conditions and issues, with observations clustered by t102, and drops the original independent variables. In this example stackid is set to the variable stack, implying that the name "stack" was specified in the stackid() option of a previous genstacks command, or that the data were reshaped in some other fashion (e.g. using Stata's reshape command with "stack" specified for the j() option).

. genyhats ywork: work_* || yissues: q56-q67, contextvars(t102) stackid(stack) replace

The following command generates four y-hat variables for chosen (a binary dependent variable), one each for age, income, union and educ, with observations clustered by t102. Because the dependent variable is binary, a logit model is used. The yprefix option ensures that the resulting y-hat variables will be named yl_age, yl_income, yl_union and yl_educ, possibly to distinguish them from similar variables created by OLS. Because replace is not optioned, all variables will be retained. Because the stackid variable is not defined, genyhats is operating either on unstacked data or on stacked data whose stacks are identified by the default "genstacks_stack" variable created by genstacks.

. genyhats age income union educ, depvar(chosen) contextvars(t102) yprefix(yl_) logit

Generated variables

genyhats saves the following variables:

yhatname1 [yhatname2...] a set of y-hat (predicted) variables, each one either named before the colon introducing the corresponding (set of) indepvar(s) or (if no colon was employed) constructed from the corresponding indepvar by prefixing it with "y_" or whatever prefix may have been set using the yprefix option.