help mfpigen                                                    Patrick Royston
-------------------------------------------------------------------------------

Title

mfpigen -- Modelling interactions between pairs of covariates

Syntax

    mfpigen [, options] : regression_cmd [yvar] mainvarlist [if] [in] [weight] [, regression_cmd_options]

regression_cmd may be clogit, cnreg, glm, intreg, logistic, logit, mlogit, nbreg, ologit, oprobit, poisson, probit, qreg, regress, rreg, stcox, stpm2 (if installed), streg, xtgee.

    options                    Description
    -------------------------------------------------------------------------
    against(against_var)       variable to plot interaction function against
    alpha(alpha_list)          significance level(s) for selecting FP functions
                               of continuous predictors
    df(df_list)                degrees of freedom for FP functions of continuous
                               predictors
    forward(#)                 forward selection of interaction(s) (linear terms
                               only)
    fplot([%]list)             define plotting values for an interaction
    linadj(xvarlist_lin)       adjust for linear effects of variables in
                               xvarlist_lin
    interactions(intlist)      adjust for predefined interactions
    mfpadj(xvarlist_mfp)       adjust for effects of variables in xvarlist_mfp,
                               as selected by mfp
    nomfp                      prevent MFP being applied to variables in
                               mainvarlist
    noverbose                  suppresses the display of interaction results
    outcome(outcome)           outcome for prediction (regression_cmd = mlogit
                               only)
    plotopts(plot_options)     options for graph twoway
    pvalue(#)                  P-value for screening interactions
    select(select_list)        significance level for selecting variables
    se                         standard error of predicted functions (see
                               fplot())
    mfp_options                options for mfp, excluding select(), alpha(),
                               df() (which are described separately, see above)
    regression_cmd_options     options for regression_cmd
    -------------------------------------------------------------------------

All weight types supported by regression_cmd are allowed; see weight.

yvar is not allowed for streg, stcox and stpm2; for these commands, you must first stset your data.

Description

mfpigen is designed to investigate interactions between each pair of covariates in mainvarlist. Typically these are continuous covariates, but linear effects of binary or categorical covariates are allowed. Factor variables are supported. Fractional polynomials are used to model the main effects of continuous variables. The statistical significance of each interaction between pairs of selected FP (or linear) functions is reported.

For each pair of variables in mainvarlist, mfpigen applies mfp to the remaining variables in mainvarlist and also to variables defined by mfpadj(xvarlist_mfp) to select a `confounder model' which is used to adjust an interaction model for possible confounding by other covariates. Variables defined by linadj(xvarlist_lin) are included as linear in the confounder part of the model, and are included in every model. Variables in mainvarlist and xvarlist_mfp are subject to FP transformation if required, as determined by mfp, whereas those in xvarlist_lin are modelled as linear. The best-fitting FP functions of each pair of variables modelled with an interaction and of variables in the confounder model, including the adjustment variables, are selected simultaneously in single runs of mfp.

Options

against(against_var) defines the variable against which interaction function(s) are to be plotted. See fplot() for more details.

alpha(alpha_list) sets the significance levels for testing between FP models of different degrees. The rules for alpha_list are the same as for df_list in the df() option. The default nominal p-value (significance level, selection level) is 0.05 for all variables.

df(df_list) sets the df for each predictor in mainvarlist and (if the mfpadj() option is used) in xvarlist_mfp. See the df() option of mfp for further details. Models with all terms linear are specified as df(1).

forward(#) performs forward selection of interaction(s) at significance level #. This option applies only to models with all terms linear; therefore, use of the forward() option implies df(1). The procedure searches for the most significant interaction. If it is significant at the # level, the interaction is reported and the procedure continues to search for another interaction. The process stops when no further significant interactions are found.
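As a hedged illustration of forward(), the sketch below assumes that mfpigen is installed and uses the auto dataset shipped with Stata. The choice of outcome, covariates and the 0.05 level is arbitrary and is not taken from the original examples.

```stata
* Illustrative only: assumes the user-written command mfpigen is installed
sysuse auto, clear

* Forward selection of linear-by-linear interactions at the 5% level.
* forward() implies df(1), so no FP functions are searched.
mfpigen, forward(0.05): regress mpg price weight length foreign
```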

fplot([%]list) plots the interaction between the last pair of items in mainvarlist, say, item1 and item2. Typically, both items are continuous variables. list is a set of values of item1. The fitted function of item2 is evaluated at each value in list and plotted against item2. The functions are adjusted for other variables in the selected model, if any. Examples:

    . mfpigen, fplot(30 40 50 60) : regress y age bmi
    . mfpigen, select(0.05) fplot(30 40 50 60) : regress y sex chol age bmi

If list is preceded by a percent sign (%) then its values are interpreted as centiles of the distribution of item1. If list is only a percent sign, default centiles of 25, 50 and 75 are used. Examples:

    . mfpigen, fplot(%10 50 90) : regress y age bmi
    . mfpigen, select(0.05) fplot(%) : regress y sex chol age bmi

A second possibility is for item1 to be a factor variable. Then list consists of factor levels of item1, and fplot(%) means plot at all available levels. Example:

    . mfpigen, fplot(1 2 3) : stcox i.grade age

A third possibility is for item1 to be of the form (varlist), i.e. a list of variables enclosed in parentheses. varlist could comprise any combination of binary, categorical or continuous variables. list defines values of each variable in varlist at which the function of item2 is to be plotted. For example, fplot(0 0 0 1 1 0 1 1) might define the four possible combinations of two binary variables, each of which takes the value 0 or 1. This would plot four fitted curves against item2, one for each combination of the two binary variables. Example:

    . mfpigen, fplot(0 0 0 1 1 0 1 1): regress y (sex treat) age

An abbreviated syntax is available. If the pairs of values in list are enclosed within parentheses, all combinations of the values are generated. For example, fplot(0 0 0 1 1 0 1 1) could be abbreviated as fplot((0 1)(0 1)). All combinations of three such binary variables could be specified as fplot((0 1)(0 1)(0 1)), much easier than spelling out the required 2^3 = 8 triples = 24 values. Examples:

    . mfpigen, fplot((0 1)(0 1)(0 1)): regress y (sex treat group) age
    . mfpigen, fplot((25 50)(10 100)): regress y (age pgr) bmi

Items within parentheses do not have to be 0 and 1; for example, they could be values of a continuous variable. However, there must be exactly two values within each pair of parentheses. More general combinations of values should be spelled out explicitly using the standard syntax.
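The expansion performed by the abbreviated syntax can be reproduced by hand. The following is a minimal sketch in plain Stata, independent of mfpigen. The ordering of the combinations (first variable varying slowest) is an assumption inferred from the spelled-out example fplot(0 0 0 1 1 0 1 1), and the macro name list is purely illustrative.

```stata
* Build the spelled-out list equivalent to fplot((0 1)(0 1)).
* Assumed ordering: the first variable varies slowest.
local list
foreach a of numlist 0 1 {
    foreach b of numlist 0 1 {
        local list `list' `a' `b'
    }
}
* Displays: fplot(0 0 0 1 1 0 1 1)
display "fplot(`list')"
```

Adding a third nested loop gives the 8 triples (24 values) implied by fplot((0 1)(0 1)(0 1)).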

item2 could consist of a single variable, as already discussed, or take the form (varlist). varlist might be an FP transformation created outside mfpigen. For example, to plot an interaction between sex and an FP2 function of age with powers (-2, 2) centered on age 50, we could code:

    . fracgen age -2 2, center(50)
    . mfpigen, fplot(0 1) adjust(no) against(age): regress y sex (age_1 age_2)

fracgen creates FP-transformed variables called age_1 and age_2, centered on age 50, that is, such that each of age_1 and age_2 equals zero at age 50. The adjust(no) option of mfpigen prevents mfp from re-centering the already-centered variables age_1 and age_2. mfpigen computes the interaction between sex and both of age_1 and age_2. The example assumes that sex is coded as 0 and 1, but this coding is not mandatory. Note the use of the option against(age). Without this option, the plots would be against the first member of varlist, in this case, age_1. We would be unlikely to want this.

As well as being plotted, the fitted functions are saved under the names _fit1, _fit2, ... .

interactions(intlist) adjusts all investigated models for predefined interactions specified by intlist. The syntax of intlist is var11 var12 [, var21 var22 ...]. Each pair of variables is translated to model terms of the form c.var11##c.var12 if var11 and var12 are both continuous. If either of the variables is an FP transformation with more than one term, the terms are included in parentheses. For example, to include an interaction between an FP2 function of age and binary sex, we would specify interactions((age_1 age_2) i.sex), where age_1 age_2 are the FP2 transformed terms for age. The interaction terms are included as linear terms in all interaction models investigated.

Note that continuous variables should be entered as they are and categorical predictors preceded by i., for example, interactions((age_1 age_2) i.race).

linadj(xvarlist_lin) includes xvarlist_lin as confounder variables in all the fitted models. They are always modelled as linear and are not subject to selection. xvarlist_lin may include factor variables.

mfpadj(xvarlist_mfp) includes xvarlist_mfp as confounder variables in all the fitted models. Members of xvarlist_mfp are subject to selection and to determination of FP functional form by mfp, according to the options used for model selection (see the alpha(), df() and select() options). xvarlist_mfp may include factor variables.

nomfp prevents MFP being applied to variables in mainvarlist, and prevents them being candidates for an adjustment model. The default is to select these variables using mfp, if necessary with FP transformation.

noverbose suppresses the display of interaction results. This is useful when you are building up a model including multiple interactions and you wish to see which interaction has the lowest P-value.

outcome(outcome) specifies the outcome in mlogit models for which the linear predictor is to be calculated. For details of the syntax, see the description of outcome() in mlogit postestimation.

plotopts(plot_options) are options for the graph of fitted functions, to be used by graph twoway.

pvalue(#) defines the P-value to be used for screening interactions. Interactions that are not significant at the # level are not displayed, thus reducing clutter in the output. The default # is 1, meaning results for all interactions are displayed. Note that the pvalue() option has no effect on estimation; it is merely for convenience when inspecting many interactions for "interesting" ones.

se requests standard errors of the fitted functions provided by the fplot() option. These are saved under the names _sefit1, _sefit2, ... .
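One possible use of the saved standard errors is to draw a pointwise confidence band around a fitted function. The sketch below is illustrative only: it assumes mfpigen is installed, uses the auto dataset shipped with Stata, and relies on a normal approximation for a 95% band, which is an assumption rather than something mfpigen prescribes. The variable names lb1 and ub1 are hypothetical.

```stata
* Illustrative only: assumes the user-written command mfpigen is installed
sysuse auto, clear

* Fitted function of weight at foreign = 0 and foreign = 1, with SEs
mfpigen, fplot(0 1) se: regress mpg foreign weight

* Normal-approximation 95% pointwise band for the first fitted function
* (lb1 and ub1 are hypothetical names chosen for this sketch)
generate double lb1 = _fit1 - invnormal(0.975) * _sefit1
generate double ub1 = _fit1 + invnormal(0.975) * _sefit1

twoway (rarea lb1 ub1 weight, sort) (line _fit1 weight, sort)
```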

select(select_list) sets the nominal p-values (significance levels) for variable selection by backward elimination. A variable is dropped if its removal causes a non-significant increase in deviance. The rules for select_list are the same as those for df_list in the df() option. Using the default selection level of 1 for all variables forces them all into the model. Setting the nominal p-value to be 1 for a given variable forces it into the model, leaving others to be selected or not. The nominal p-value for elements of mainvarlist or xvarlist_mfp bound by parentheses is specified by including (xvarlist) or (xvarlist_mfp) in select_list. Note that variables in xvarlist_lin may not be included in select_list.

showmfp displays each mfp command that is run by mfpigen, and its results. This is to enable you to check that the commands are correct and as expected.

regression_cmd_options are any options for regression_cmd.

mfp_options are any options for mfp, excluding alpha(), df() and select().

Methodology

The algorithm provided in mfpigen can be summarized as follows. Suppose we have continuous variables z1 and z2 and potential confounders x:

1. Apply MFP to z1, z2 and x with significance level a* for selecting members of x and FP functions of continuous variables. Force z1 and z2 into the model and apply the FP function selection procedure to them. This step requires a single run of MFP.

2. Calculate multiplicative interaction terms between the FP transformations selected for z1 and z2, or between untransformed z1 and z2 if no FP transformation is needed. For example, if both variables need FP2 transformation, four interaction terms are created.

3. Refit the model selected on x, z1, z2 with the interaction terms included. Test the latter in the usual way using a likelihood ratio test. If k interaction terms are added to the model, the interaction chi-square test has k d.f. For example, if FP2 functions were selected for both z1 and z2 then k = 2 × 2 = 4.

4. Consider all pairs of predictors for possible interaction, irrespective of the statistical significance of their main effects in the MFP model. If z1 and/or z2 is binary or forced to be linear, the procedure simplifies to the usual situation. If z1 and/or z2 are categorical, joint tests on all dummy variables are performed. An option is to treat the dummy variables as separate predictors.

5. Check all interactions for artefacts and ignore any that fail the check. See section 7.4.2 of Royston & Sauerbrei (2008) for further details.

6. If more than one interaction is detected, apply a forward stepwise procedure to extend the main-effects model.
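Steps 2 and 3 can be sketched by hand for the simplest case, in which both z1 and z2 enter linearly, so that k = 1. The code below is a minimal illustration using the auto dataset shipped with Stata; it is not mfpigen's own code, and the roles assigned to the variables (z1 = weight, z2 = length, x = displacement) are arbitrary assumptions.

```stata
* Hand-worked sketch for two linear terms (k = 1); not mfpigen's own code
sysuse auto, clear

* Main-effects model: z1 = weight, z2 = length, confounder x = displacement
regress mpg weight length displacement
estimates store main

* Same model plus the multiplicative interaction term weight x length
regress mpg c.weight##c.length displacement
estimates store inter

* Likelihood ratio test of the interaction (chi-square on 1 d.f.)
lrtest main inter
```

With FP2 functions for both variables, the interaction model would instead add the four pairwise products of the transformed terms, and the test would have 4 d.f.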

There is one main difference between this algorithm, MFPIgen, and MFPI (Royston & Sauerbrei 2004, 2009). In MFPI, the confounder model x is selected independently of z1 and z2, whereas in MFPIgen, a joint model is selected. The reason for the difference is that MFPI is principally intended for use with data from a randomized trial in which the effect of the treatment covariate z1 is by design independent of other covariate effects. Therefore, adjustment by x is less important. In observational studies, however, it may be necessary fully to adjust the effects of z1 and z2 for confounders before investigating their interaction.

Since MFPIgen may test many potential interactions (one for each of the p(p - 1)/2 pairs among p covariates in mainvarlist), multiple testing is an issue. Results must be checked in detail and interpreted cautiously as hypothesis-generating only.
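To gauge the extent of multiple testing, the number of pairs can be computed directly. For instance, assuming nine covariates in mainvarlist (the figure nine is chosen only for illustration):

```stata
* Number of distinct covariate pairs among p = 9 covariates
* Displays: 36
display comb(9, 2)
```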

Examples

    . mfpigen, alpha(0.2): logit y x1 x2 x3 x4 x5

    . mfpigen: stcox x1 x2 x3 x4 x5, strata(group)

    . mfpigen, select(0.05) dfdefault(2) linadj(x1 x4 x5) mfpadj(x6 x7) fplot(%33 67) se: logit y x2 x3

    . mfpigen, select(0.05) fplot(0 1) dfdefault(2) alpha(1): regress mpg price headroom trunk weight length turn displacement foreign gear_ratio

Author

Patrick Royston
MRC Clinical Trials Unit
London, UK
pr@ctu.mrc.ac.uk

References

Royston, P., and W. Sauerbrei. 2008. Multivariable model-building. A pragmatic approach based on fractional polynomials for modelling continuous variables, pp. 172-181. Chichester, John Wiley and Sons.

Also see

Manual:  [R] fracpoly, [R] mfp

Online:  mfp, fracpoly, mfpi (if installed)