help genyhats
-------------------------------------------------------------------------------

Title

    genyhats -- Generates y-hat affinity measures for PTV analysis

Syntax

    genyhats yhatname1: indepvars_1 [ || yhatname2: indepvars_2 ... ] [, options]

or

    genyhats varlist [, options]

    options                        Description
    -------------------------------------------------------------------------
    depvar(varname)                the dependent variable for which affinities
                                     are being estimated
    contextvars(varlist)           the variables identifying different electoral
                                     contexts
    stackid(varname)               a variable identifying different "stacks",
                                     for which y-hats will be separately
                                     generated if genyhats is called after
                                     stacking
    nostack                        override the default behavior that treats
                                     each stack as a separate context (has no
                                     effect if the command is used before the
                                     data have been stacked)
    yprefix()                      the prefix for new variable(s) generated from
                                     those in varlist (default is to use a
                                     prefix of "y_")
    logit                          use a logit model instead of linear
                                     regression (the default)
    adjust(mean|constant|no)       adjust the y-hat by subtracting the mean
                                     (default) or subtracting the constant term;
                                     alternatively, make no adjustment
    replace                        drop all indepvars after the generation of
                                     y-hats
    effects(window|rtf|csv|html)   display a summary table of stack-specific
                                     effects from the regression used to
                                     generate a y-hat
    efmt()                         change the coefficients reported by the
                                     effects() option
    output                         (not recommended) directly flush the results
                                     of each stack-specific regression into the
                                     standard output
    -------------------------------------------------------------------------

Description

genyhats generates one or more y-hat affinities for depvar, one for each set of indepvars, saving each under the corresponding yhatname, separately for each combination of stack and context variables. Multiple y-hat variables can be generated by specifying multiple models separated by ||, but all of them must involve affinities with the same depvar. If issued before stacking, genyhats therefore has to be issued repeatedly (once for each of the item-specific depvars that will, after stacking, become a single generic depvar).

Optionally, a variable list can be used instead of the "...: ...||" primary syntax. When a variable list is employed with genyhats, each variable in the list is treated as a single independent variable in a set of separate predictions of depvar. With this syntax the y-hat affinity variable is named by prefixing the independent variable with "y_" or such other prefix as may be established by the yprefix() option.

The two syntaxes may be combined, in that any appearance of || causes the previous variables to be treated as a variable list of which the first (unless it was followed by ":") provides the suffix for the new variable name, being prefixed by "y_" or such other prefix as may be established by the yprefix() option.

The genyhats command estimates the effect of each (set of) indepvar(s) on the depvar, separately for each stack if the data are stacked (unless nostack is optioned) and separately for each context if the contextvars() option was employed. It uses Stata's predict command to produce predicted values of the depvar for each case. Each set of these so-called "y-hats" is adjusted by subtracting the mean from the predicted values (separately for each stack and context, if present), unless some other adjustment is requested by means of the adjust() option, and is saved under the appropriate variable name as described above. Estimation is by OLS unless logit is optioned. NOTE that, if the y-hat is not adjusted, the stack-specific mean will be included in the estimated y-hats, creating inconsistencies between stacks and contexts that can cause large anomalies in subsequent analyses using these variables. As a result, published work has mostly subtracted the mean (which is the default in genyhats). However, the option of subtracting the constant term is also available.

The genyhats command can be issued before or after stacking. If issued after stacking, by default it treats each stack as a separate context to take into account along with any higher-level contexts. This yields the same y-hat estimates as would have been created for separate unstacked depvars. However, the nostack option can be employed to force genyhats to ignore the stack-specific contexts. In addition, genyhats can be employed with or without distinguishing between higher-level contexts, if any (that is, with or without the contextvars() option), depending on what makes methodological sense. If issued after stacking, the command need only be issued once for the (generic) depvar instead of separately for each unstacked depvar. This makes genyhats simpler to use and avoids creating a mass of temporary variables that would hugely increase the size of the (often already very large) stacked file, but it takes longer because estimation is performed with a much larger dataset, selecting a different stack on each pass.

NOTE that when y-hat variables are used in subsequent analyses (for instance in regression models), their estimated coefficients are not readily interpretable. In the absence of error variance and multicollinearity, each coefficient calculated for a y-hat independent variable predicting the ptv dependent variable would be +1.0. The actual values of these coefficients thus constitute a quasi-measure of covariance, much like a partial correlation coefficient. However, standard errors (along with beta coefficients from OLS) retain their customary meanings.
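
The estimation steps described above (fit a model within each stack, predict, subtract the stack-specific mean) can be sketched with built-in Stata commands. This is only an illustration on simulated data, not the code that genyhats itself runs; the variables stack, x1, x2 and y_demo are invented for the example. The final regression illustrates the NOTE above: with a single mean-adjusted y-hat and no competing predictors, its coefficient should be 1 up to rounding.

```stata
* Sketch only: hand-rolled approximation of the default OLS path
* (separate regression per stack, mean adjustment). Not genyhats's own code.
clear
set seed 12345
set obs 600

* Hypothetical stacked data: three stacks with different intercepts and slopes
generate stack = mod(_n, 3) + 1
generate x1 = rnormal()
generate x2 = rnormal()
generate ptv = stack + 0.5*stack*x1 - 0.3*x2 + rnormal()

generate double y_demo = .
levelsof stack, local(stacks)
foreach s of local stacks {
    * Fit the model on one stack only
    quietly regress ptv x1 x2 if stack == `s'
    * Linear prediction for the cases used in this regression
    tempvar xb
    quietly predict double `xb' if e(sample), xb
    * Subtract the stack-specific mean of the prediction
    quietly summarize `xb', meanonly
    quietly replace y_demo = `xb' - r(mean) if stack == `s'
    drop `xb'
}

* Each stack's adjusted y-hat is centred on zero
bysort stack: summarize y_demo

* Regressing the dependent variable on its own y-hat
regress ptv y_demo
```

In a model with several y-hat variables and other predictors, the coefficients would generally depart from 1, which is what makes them informative in the sense described above.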

Options

depvar(varname)  if specified, the variable for which affinities are estimated (default is ptv).

contextvars(varlist)  if specified, the variables whose combinations of values identify different electoral contexts (by default all cases are treated as part of a single context).
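
To see what "combinations of values" means here, the following sketch uses Stata's egen group() function to number the distinct combinations of two context variables. The variables country and year are invented for the example, and how genyhats identifies contexts internally is not documented here, so this is only an illustration of the idea.

```stata
* Sketch only: hypothetical context variables, not part of genyhats
clear
input country year
1 1999
1 2004
2 1999
2 2004
1 1999
end

* One identifier per distinct (country, year) combination
egen long ctx_demo = group(country year)
tabulate ctx_demo
```

Specifying contextvars(country year) would, on this reading, treat each of the four combinations as a separate context.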

nostack  if present, overrides the default behavior of treating each stack as a separate context (has no effect if the genyhats command is issued before stacking).

stackid(varname)  if specified, a variable identifying different "stacks", for which y-hats will be separately generated. The default is to use the "genstacks_stack" variable if the genyhats command is issued after stacking.

yprefix()  if specified, provides a prefix for y-hat affinities generated for each variable in a variable list (the default is "y_"). NOTE that the prefix, whether default or provided, can be overridden by explicitly specifying the y-hat variable name before a colon introducing the variable(s) to be used in estimating this y-hat.

logit  if specified, invokes a logit model instead of linear regression (the default).

adjust(mean|constant|no)  if specified, adjusts the y-hat by subtracting the mean (default) or subtracting the constant term; alternatively, makes no adjustment. Note that, when a logit model is optioned, the adjustment is applied to the propensity values, which are then mapped back to probability values.
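
The logit case can be sketched as follows, under the assumption (not confirmed by this help file) that "propensity values" means the linear predictor on the log-odds scale and that the mapping back to probabilities is the inverse logit. The data and the variable names age, chosen, xb_demo and yl_demo are invented for the example; this is not the code that genyhats runs.

```stata
* Sketch only: mean adjustment on the assumed propensity (log-odds) scale
clear
set seed 2024
set obs 500

* Hypothetical binary dependent variable driven by one predictor
generate age = rnormal()
generate chosen = runiform() < invlogit(-0.5 + 0.8*age)

quietly logit chosen age

* Linear predictor (log-odds), not the predicted probability
predict double xb_demo, xb

* Subtract the mean on the log-odds scale, then map back to a probability
quietly summarize xb_demo, meanonly
generate double yl_demo = invlogit(xb_demo - r(mean))
summarize yl_demo
```

Because the shift happens before the inverse logit is applied, the adjusted values remain valid probabilities between 0 and 1.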

replace  if specified, drops all indepvars for all specified models after the generation of y-hats.

effects(window|rtf|csv|html)  if specified, displays a table (in publication format) that summarizes the different effects of the same predictors in different stacks. The window option flushes the table to the standard output, while the other options save the table in an external file in the chosen file format. By default, z-values are reported, along with significance stars. The efmt() option can be used to change the coefficients reported in the table.

efmt()  if specified, changes the coefficient reported in tables generated by effects(). efmt() accepts two types of values: either beta (in order to obtain beta coefficients) or any format string that is accepted by the cells() option of the estout command. As an example, efmt(b(fmt(3)star)) displays b coefficients with three decimal digits and significance stars.

output  if specified, the results of each stack-specific regression used to generate a y-hat are directly flushed into the standard output.

Examples

The following command generates two y-hat variables for ptv (the default dependent variable), based on working conditions and issues, with observations clustered by t102, and drops the original independent variables. In this example stackid is set to the variable stack, implying that the name "stack" was specified in the stackid() option of a previous genstacks command, or that the data were reshaped in some other fashion (e.g., using Stata's reshape command with "stack" specified for the j() option).

    . genyhats ywork: work_* || yissues: q56-q67, contextvars(t102) stackid(stack) replace

The following command generates four y-hat variables for chosen (a binary dependent variable), one each for age, income, union and educ, with observations clustered by t102. Because the dependent variable is binary, a logit model is used. The yprefix() option ensures that the resulting y-hat variables will be named yl_age, yl_income, yl_union and yl_educ, possibly to distinguish them from similar variables created by OLS. Because replace is not optioned, all variables will be retained. Because the stackid variable is not defined, genyhats is operating either on unstacked data or on stacked data whose stacks are identified by the default "genstacks_stack" variable created by genstacks.

    . genyhats age income union educ, depvar(chosen) contextvars(t102) yprefix(yl_) logit
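
The first example above mentions data stacked with Stata's reshape command and "stack" given in the j() option. A minimal sketch of that alternative route, using invented variables id and ptv1 to ptv3, might look like this; it only shows where a stack identifier of that name could come from and does not reproduce what genstacks does.

```stata
* Sketch only: hypothetical wide data with one PTV item per party
clear
input id ptv1 ptv2 ptv3
1 7 2 5
2 3 9 4
end

* Stack the items: one row per respondent-party pair,
* with the party number stored in the new variable "stack"
reshape long ptv, i(id) j(stack)
list, sepby(id)
```

After such a reshape, stackid(stack) would point genyhats at the stack identifier, as in the first example.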

Generated variables

genyhats saves the following variables:

yhatname1 [yhatname2 ...]  a set of y-hat (predicted) variables, each one either named before the colon introducing the corresponding (set of) indepvar(s) or (if no colon was employed) constructed from the corresponding indepvar by prefixing it with "y_" or whatever prefix may have been set using the yprefix() option.