help mfpigen                                                    Patrick Royston
-------------------------------------------------------------------------------

Title

mfpigen -- Modelling interactions between pairs of covariates

Syntax

mfpigen [, options] : regression_cmd [yvar] mainvarlist [if] [in] [weight] [, regression_cmd_options]

where

regression_cmd may be clogit, cnreg, glm, intreg, logistic, logit, mlogit, nbreg, ologit, oprobit, poisson, probit, qreg, regress, rreg, stcox, stpm2 (if installed), streg, xtgee.

options                     Description
-------------------------------------------------------------------------
against(against_var)        variable to plot interaction function against
alpha(alpha_list)           significance level(s) for selecting FP functions of continuous predictors
df(df_list)                 degrees of freedom for FP functions of continuous predictors
forward(#)                  forward selection of interaction(s) (linear terms only)
fplot([%]list)              define plotting values for an interaction
linadj(xvarlist_lin)        adjust for linear effects of variables in xvarlist_lin
interactions(intlist)       adjust for predefined interactions
mfpadj(xvarlist_mfp)        adjust for effects of variables in xvarlist_mfp, as selected by mfp
nomfp                       prevent MFP being applied to variables in mainvarlist
noverbose                   suppress the display of interaction results
outcome(outcome)            outcome for prediction (regression_cmd = mlogit only)
plotopts(plot_options)      options for graph twoway
pvalue(#)                   P-value for screening interactions
select(select_list)         significance level for selecting variables
se                          standard error of predicted functions (see fplot())
mfp_options                 options for mfp, excluding select(), alpha(), df() (described separately above)
regression_cmd_options      options for regression_cmd
-------------------------------------------------------------------------

All weight types supported by regression_cmd are allowed; see weight.

yvar is not allowed for streg, stcox and stpm2; for these commands, you must first stset your data.

Description

mfpigen is designed to investigate interactions between each pair of covariates in mainvarlist. Typically these are continuous covariates, but linear effects of binary or categorical covariates are allowed. Factor variables are supported. Fractional polynomials are used to model the main effects of continuous variables. The statistical significance of each interaction between pairs of selected FP (or linear) functions is reported.

For each pair of variables in mainvarlist, mfpigen applies mfp to the remaining variables in mainvarlist and also to variables defined by mfpadj(xvarlist_mfp) to select a `confounder model' which is used to adjust an interaction model for possible confounding by other covariates. Variables defined by linadj(xvarlist_lin) are included as linear in the confounder part of the model, and are included in every model. Variables in mainvarlist and xvarlist_mfp are subject to FP transformation if required, as determined by mfp, whereas those in xvarlist_lin are modelled as linear. The best-fitting FP functions of each pair of variables modelled with an interaction and of variables in the confounder model, including the adjustment variables, are selected simultaneously in single runs of mfp.

Options

against(against_var) defines the variable against which interaction function(s) are to be plotted. See fplot() for more details.

alpha(alpha_list) sets the significance levels for testing between FP models of different degrees. The rules for alpha_list are the same as for df_list in the df() option. The default nominal p-value (significance level, selection level) is 0.05 for all variables.

df(df_list) sets the df for each predictor in mainvarlist and (if the mfpadj() option is used) in xvarlist_mfp. See the df() option of mfp for further details. Models with all terms linear are specified as df(1).

forward(#) performs forward selection of interaction(s) at significance level #. This option applies only to models with all terms linear; use of the forward() option therefore implies df(1). The procedure searches for the most significant interaction. If it is significant at the # level, the interaction is reported and the procedure continues to search for another interaction. The process stops when no further significant interactions are found.

fplot([%]list) plots the interaction between the last pair of items in mainvarlist, say, item1 and item2. Typically, both items are continuous variables. list is a set of values of item1. The fitted function of item2 is evaluated at each value in list and plotted against item2. The functions are adjusted for other variables in the selected model, if any. Examples:

. mfpigen, fplot(30 40 50 60) : regress y age bmi
. mfpigen, select(0.05) fplot(30 40 50 60) : regress y sex chol age bmi

If list is preceded by a percent sign (%) then its values are interpreted as centiles of the distribution of item1. If list is only a percent sign, default centiles of 25, 50 and 75 are used. Examples:

. mfpigen, fplot(%10 50 90) : regress y age bmi
. mfpigen, select(0.05) fplot(%) : regress y sex chol age bmi
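To see which plotting values a centile specification might correspond to, the centiles of item1 can be computed directly. The sketch below uses the official _pctile command on the auto dataset shipped with Stata, with weight standing in for item1 (an assumption made purely for illustration). How mfpigen itself computes centiles is not described here and may differ in detail.

```stata
* Illustration only: the default fplot(%) centiles (25, 50, 75)
* of a stand-in item1 (weight from the auto dataset)
sysuse auto, clear
_pctile weight, percentiles(25 50 75)
display "25th: " r(r1) "  50th: " r(r2) "  75th: " r(r3)
```

The three displayed values are the ones that would be listed explicitly in fplot() to request the same curves without the % prefix, under the assumption above.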

A second possibility is for item1 to be a factor variable. Then list consists of factor levels of item1, and fplot(%) means plot at all available levels. Example:

. mfpigen, fplot(1 2 3) : stcox i.grade age

A third possibility is for item1 to be of the form (varlist), i.e. a list of variables enclosed in parentheses. varlist could comprise any combination of binary, categorical or continuous variables. list defines values of each variable in varlist at which the function of item2 is to be plotted. For example, fplot(0 0 0 1 1 0 1 1) might define the four possible combinations of two binary variables, each of which takes the value 0 or 1. This would plot four fitted curves against item2, one for each combination of the two binary variables. Example:

. mfpigen, fplot(0 0 0 1 1 0 1 1): regress y (sex treat) age

An abbreviated syntax is available. If pairs of values in list are enclosed within parentheses, all combinations of the values are generated. For example, fplot(0 0 0 1 1 0 1 1) could be abbreviated as fplot((0 1)(0 1)). All combinations of three such binary variables could be specified as fplot((0 1)(0 1)(0 1)), which is much easier than spelling out the required 2^3 = 8 triples = 24 values. Examples:

. mfpigen, fplot((0 1)(0 1)(0 1)): regress y (sex treat group) age
. mfpigen, fplot((25 50)(10 100)): regress y (age pgr) bmi

Items within parentheses do not have to be 0 and 1; for example, they could be values of a continuous variable. However, there must be exactly two values within each pair of parentheses. More general combinations of values should be spelled out explicitly using the standard syntax.
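The expansion performed by the abbreviated syntax can be reproduced by hand. The following sketch builds the explicit list equivalent to fplot((0 1)(0 1)(0 1)) with nested loops. It assumes the last variable varies fastest, which matches the ordering of the two-variable example fplot(0 0 0 1 1 0 1 1); the ordering mfpigen uses internally is an assumption here.

```stata
* Expand (0 1)(0 1)(0 1) into an explicit list of value triples
local list
foreach a in 0 1 {
    foreach b in 0 1 {
        foreach c in 0 1 {
            * append one combination (three values) to the list
            local list `list' `a' `b' `c'
        }
    }
}
local nvals : word count `list'
display "fplot(`list')"
display "number of values: `nvals'"
```

Three binary variables give 2^3 = 8 combinations of three values each, so the explicit list holds 24 values.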

item2 could consist of a single variable, as already discussed, or take the form (varlist). varlist might be an FP transformation created outside mfpigen. For example, to plot an interaction between sex and an FP2 function of age with powers (-2, 2) centered on age 50, we could code:

. fracgen age -2 2, center(50)
. mfpigen, fplot(0 1) adjust(no) against(age): regress y sex (age_1 age_2)

fracgen creates FP-transformed variables called age_1 and age_2, centered on age 50, that is, such that each of age_1 and age_2 equals zero at age 50. The adjust(no) option of mfpigen prevents mfp from re-centering the already-centered variables age_1 and age_2. mfpigen computes the interaction between sex and both of age_1 and age_2. The example assumes that sex is coded as 0 and 1, but this coding is not mandatory. Note the use of the option against(age). Without this option, the plots would be against the first member of varlist, in this case age_1, which is unlikely to be what is wanted.
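What centering on age 50 means for an FP2 function with powers (-2, 2) can be checked with a hand computation. The sketch below is not a substitute for fracgen: it assumes no rescaling of age (fracgen may rescale the variable by default) and uses a small made-up set of ages.

```stata
* Hand computation of FP2 terms with powers (-2, 2), centered at age 50.
* Assumption: age is not rescaled; fracgen may apply scaling by default.
clear
set obs 5
generate age = 30 + 10*(_n - 1)              // ages 30, 40, 50, 60, 70
generate double age_1 = age^(-2) - 50^(-2)   // zero when age == 50
generate double age_2 = age^2 - 50^2         // zero when age == 50
list age age_1 age_2
```

Both terms are zero at age 50, so the fitted function for each level of sex is expressed relative to that reference age.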

As well as being plotted, the fitted functions are saved under the names _fit1, _fit2, ... .

interactions(intlist) adjusts all investigated models for predefined interactions specified by intlist. The syntax of intlist is var11 var12 [, var21 var22 ...]. Each pair of variables is translated to model terms of the form c.var11##c.var12 if var11 and var12 are both continuous. If either of the variables is an FP transformation with more than one term, the terms are enclosed in parentheses. For example, to include an interaction between an FP2 function of age and binary sex, specify interactions((age_1 age_2) i.sex), where age_1 and age_2 are the FP2-transformed terms for age. The interaction terms are included as linear terms in all interaction models investigated.

Note that continuous variables should be entered as they are and categorical predictors preceded by i., for example, interactions((age_1 age_2) i.race).
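The term c.var11##c.var12 is standard factor-variable notation for both main effects plus their product. The sketch below illustrates this on the auto dataset with official commands only; weight and length are stand-ins for var11 and var12, and wl is a hypothetical name for the hand-made product.

```stata
* c.weight##c.length is equivalent to weight, length and their product
sysuse auto, clear
quietly regress mpg c.weight##c.length
scalar r2_fv = e(r2)

generate double wl = weight*length       // product term built by hand
quietly regress mpg weight length wl
scalar r2_manual = e(r2)

display "R-squared, factor-variable form: " r2_fv
display "R-squared, manual product:       " r2_manual
```

The two fits should agree up to numerical precision, which is why a predefined interaction of two continuous variables enters the model as ordinary linear terms.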

linadj(xvarlist_lin) includes xvarlist_lin as confounder variables in all the fitted models. They are always modelled as linear and are not subject to selection. xvarlist_lin may include factor variables.

mfpadj(xvarlist_mfp) includes xvarlist_mfp as confounder variables in all the fitted models. Members of xvarlist_mfp are subject to selection and to determination of FP functional form by mfp, according to the options used for model selection (see the alpha(), df() and select() options). xvarlist_mfp may include factor variables.

nomfp prevents MFP being applied to variables in mainvarlist, and prevents them being candidates for an adjustment model. The default is to select these variables using mfp, if necessary with FP transformation.

noverbose suppresses the display of interaction results. This is useful when you are building up a model including multiple interactions and you wish to see which interaction has the lowest P-value.

outcome(outcome) specifies the outcome in mlogit models for which the linear predictor is to be calculated. For details of the syntax, see the description of outcome() in mlogit postestimation.

plotopts(plot_options) are options for the graph of fitted function to be used by graph twoway.

pvalue(#) defines the P-value to be used for screening interactions. Interactions that are not significant at the # level are not displayed, thus reducing clutter in the output. The default # is 1, meaning that results for all interactions are displayed. Note that the pvalue() option has no effect on estimation; it is merely a convenience when inspecting many interactions for "interesting" ones.

se requests standard errors of the fitted functions provided by the fplot() option. These are saved under the names _sefit1, _sefit2, ... .

select(select_list) sets the nominal p-values (significance levels) for variable selection by backward elimination. A variable is dropped if its removal causes a non-significant increase in deviance. The rules for select_list are the same as those for df_list in the df() option. Using the default selection level of 1 for all variables forces them all into the model. Setting the nominal p-value to be 1 for a given variable forces it into the model, leaving others to be selected or not. The nominal p-value for elements of mainvarlist or xvarlist_mfp bound by parentheses is specified by including (xvarlist) or (xvarlist_mfp) in select_list. Note that variables in xvarlist_lin may not be included in select_list.

showmfp displays each mfp command that is run by mfpigen, and its results. This is to enable you to check that the commands are correct and as expected.

regression_cmd_options are any options for regression_cmd.

mfp_options are any options for mfp, excluding alpha(), df() and select().

Methodology

The algorithm provided in mfpigen can be summarized as follows. Suppose we have continuous variables z1 and z2 and potential confounders x:

1. Apply MFP to z1, z2 and x with significance level a* for selecting members of x and FP functions of continuous variables. Force z1 and z2 into the model and apply the FP function selection procedure to them. This step requires a single run of MFP.

2. Calculate multiplicative interaction terms between the FP transformations selected for z1 and z2, or between untransformed z1 and z2 if no FP transformation is needed. For example, if both variables need FP2 transformation, four interaction terms are created.

3. Refit the model selected on x, z1, z2 with the interaction terms included. Test the latter in the usual way using a likelihood ratio test. If k interaction terms are added to the model, the interaction chi-square test has k d.f. For example, if FP2 functions were selected for both z1 and z2 then k = 2 × 2 = 4.

4. Consider all pairs of predictors for possible interaction, irrespective of the statistical significance of their main effects in the MFP model. If z1 and/or z2 is binary or forced to be linear, the procedure simplifies to the usual situation. If z1 and/or z2 are categorical, joint tests on all dummy variables are performed. An option is to treat the dummy variables as separate predictors.

5. Check all interactions for artefacts and ignore any that fail the check. See section 7.4.2 of Royston & Sauerbrei (2008) for further details.

6. If more than one interaction is detected, apply a forward stepwise procedure to extend the main-effects model.
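Steps 2 and 3 can be sketched by hand with official commands. The example below uses simulated data and assumes, hypothetically, that step 1 selected an FP2 function with powers (-1, 2) for z1 and a linear function for z2, so that k = 2 × 1 = 2. All variable names, powers and coefficients are illustrative and are not output of mfpigen.

```stata
* Hypothetical data: z1, z2 continuous, x a binary confounder
clear
set seed 12345
set obs 500
generate z1 = 1 + 4*runiform()
generate z2 = 1 + 4*runiform()
generate x  = runiform() < 0.5
generate y  = 1/z1 + 0.1*z1^2 + 0.5*z2 + 0.3*x + rnormal()

* Assumed result of step 1: FP2 powers (-1, 2) for z1, linear z2
generate z1a = z1^(-1)
generate z1b = z1^2

* Step 2: multiplicative interaction terms (k = 2 x 1 = 2)
generate z1a_z2 = z1a*z2
generate z1b_z2 = z1b*z2

* Step 3: likelihood-ratio test of the k interaction terms
quietly regress y z1a z1b z2 x
estimates store main
quietly regress y z1a z1b z2 x z1a_z2 z1b_z2
estimates store inter
lrtest main inter
display "LR chi2(" r(df) ") = " r(chi2) ", p = " r(p)
```

The test has as many degrees of freedom as interaction terms added. The simulated outcome contains no interaction, so the printed test statistic only illustrates the mechanics.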

There is one main difference between this algorithm, MFPIgen, and MFPI (Royston & Sauerbrei 2004, 2009). In MFPI, the confounder model x is selected independently of z1 and z2, whereas in MFPIgen, a joint model is selected. The reason for the difference is that MFPI is principally intended for use with data from a randomized trial in which the effect of the treatment covariate z1 is by design independent of other covariate effects. Therefore, adjustment by x is less important. In observational studies, however, it may be necessary fully to adjust the effects of z1 and z2 for confounders before investigating their interaction.

Since MFPIgen addresses dozens of potential interactions, multiple testing is an issue. Results must be checked in detail and interpreted cautiously as hypothesis-generating only.
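The scale of the multiple-testing problem follows from the number of distinct pairs, p(p-1)/2 for p covariates in mainvarlist. A minimal calculation, assuming each pair is tested once and taking p = 9 as in the auto-data example, which lists nine covariates:

```stata
* Number of distinct pairs among p covariates in mainvarlist
local p = 9
display "pairs to examine: " comb(`p', 2)
```

Nine covariates already give 36 candidate interactions, so some nominally significant results are to be expected by chance alone.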

Examples

. mfpigen, alpha(0.2): logit y x1 x2 x3 x4 x5

. mfpigen: stcox x1 x2 x3 x4 x5, strata(group)

. mfpigen, select(0.05) dfdefault(2) linadj(x1 x4 x5) mfpadj(x6 x7) fplot(%33 67) se: logit y x2 x3

. mfpigen, select(0.05) fplot(0 1) dfdefault(2) alpha(1): regress mpg price headroom trunk weight length turn displacement foreign gear_ratio

Author

Patrick Royston
MRC Clinical Trials Unit
London, UK
pr@ctu.mrc.ac.uk

References

Royston, P., and W. Sauerbrei. 2008. Multivariable model-building. A pragmatic approach based on fractional polynomials for modelling continuous variables, pp. 172-181. Chichester, John Wiley and Sons.

Also see

Manual: [R] fracpoly, [R] mfp

Online: mfp, fracpoly, mfpi (if installed)