help qhapipf
-------------------------------------------------------------------------------

Title

Analysis of quantitative traits using regression and log-linear modelling when phase is unknown

Syntax

qhapipf varlist [if] [using] [, options]

options               Description
-------------------------------------------------------------------------
Main
  qt(varname)         specifies the dependent variable.
  ipf(string)         specifies the log-linear model for haplotype frequencies.
  regress(string)     specifies the regression model for the quantitative trait.
  start               specifies that the starting posterior weights of the EM
                      algorithm are chosen at random.
  display             specifies whether to output parameter estimates.
  known               specifies that phase is known.
  phase(varname)      specifies a variable that identifies whether phase is
                      known for a subset of subjects.
  acc(real)           specifies the convergence threshold for the change in
                      the full log-likelihood.
  ipfacc(real)        specifies the convergence threshold for the change in
                      the log-likelihood of the log-linear model.
  nolog               specifies that the likelihood output is suppressed.
  model(#)            specifies a label for the model being fitted.
  lrtest(numlist)     performs a likelihood ratio test between the two models
                      saved by the model() option.
  convars(string)     specifies a list of variables in the constraints file.
  confile(string)     specifies the name of the constraints file.
  mv                  specifies that missing data will be imputed.
  mvdel               specifies that subjects with missing data will be deleted.
  hap(string)         specifies the haplotype of interest.
  menu                specifies that the command is run through a window
                      interface.
-------------------------------------------------------------------------

Description

This command models the relationship between a normally distributed continuous variable, measured in a population-based random sample, and individuals' haplotypes. An EM algorithm is used to resolve haplotype phase. Covariates are constructed from the haplotypes and used in a regression model. The EM algorithm also handles missing typings, assuming they are missing at random (MAR).
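The phase-resolution step can be illustrated with a minimal Python sketch for two biallelic loci. It is not the qhapipf code: it assumes Hardy-Weinberg equilibrium and complete genotypes, and it estimates haplotype frequencies only, whereas qhapipf maximises a full likelihood that also contains the regression model for the trait. The helper names phase_resolutions, pair_prob and em_haplotypes are hypothetical. Each unphased genotype is expanded into its compatible haplotype pairs, the E-step weights each pair by its probability under the current frequencies, and the M-step re-estimates the frequencies from the weighted haplotype counts.

```python
from collections import Counter


def phase_resolutions(a1, a2, b1, b2):
    """Haplotype pairs compatible with an unphased genotype at two loci.

    Alleles a1, a2 belong to locus 1 and b1, b2 to locus 2; a haplotype is
    the tuple (locus-1 allele, locus-2 allele). Each pair is stored sorted so
    that the same pair reached in two ways is kept only once.
    """
    pairs = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(pairs)


def pair_prob(pair, freq):
    """Probability of an unordered haplotype pair under Hardy-Weinberg equilibrium."""
    h1, h2 = pair
    p = freq[h1] * freq[h2]
    return p if h1 == h2 else 2.0 * p


def em_haplotypes(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased two-locus genotypes."""
    expanded = [phase_resolutions(*g) for g in genotypes]
    haps = sorted({h for pairs in expanded for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(max_iter):
        counts = Counter()
        for pairs in expanded:
            # E-step: posterior weight of each phase given current frequencies
            probs = [pair_prob(pair, freq) for pair in pairs]
            total = sum(probs)
            for pair, p in zip(pairs, probs):
                w = p / total
                counts[pair[0]] += w
                counts[pair[1]] += w
        # M-step: weighted haplotype counts divided by the 2N chromosomes
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq
```

A subject who is heterozygous at both loci, for example em_haplotypes([(1, 2, 1, 2), (1, 1, 1, 1), (2, 2, 2, 2)]), contributes fractional counts to both of its possible phases; subjects homozygous at either locus have a single resolution and a weight of 1.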

There are two distinct models. The first is the log-linear model for haplotype frequencies; further details of this procedure are given in the help for the Stata command hapipf. Haplotype frequencies are estimated under the assumption of Hardy-Weinberg equilibrium.

The second is the regression model, which relates the haplotypes to the quantitative trait. This model is specified in the regress() option, with the dependent variable specified by the qt() option. The regress() option uses a special syntax to define the dummy variables of the regression model; this syntax can specify within-locus, between-loci and between-chromosome effects.

Latest Version

The latest version is always kept on the SSC website. To install the latest version, type the following command:

. ssc install qhapipf, replace

Options

ipf(string) specifies the log-linear model for the haplotype frequencies. It requires a special syntax of the form l1*l2+l3. Here l1*l2 allows all the interactions between the first two loci, while locus 3 is independent of them. This syntax is used in most books on log-linear modelling; "-" terms and brackets are not allowed.

regress(string) specifies the regression model. The program then creates "dummy" variables for all the effects. A fuller description of this option is given in the examples.

start specifies that the starting posterior weights of the EM algorithm are chosen at random.

display specifies whether to output parameter estimates.

known specifies that phase is known.

phase(varname) specifies a variable that identifies whether phase is known for a subset of subjects. The variable must contain 1 where phase is known and 0 where phase is unknown.

acc(real) specifies the convergence threshold for the change in the full log-likelihood.

ipfacc(real) specifies the convergence threshold for the change in the log-likelihood of the log-linear model.

model(integer) specifies a label for the model being fitted. This label is used in the lrtest() option.

lrtest(#,#) performs a likelihood ratio test between the two models saved by the model() option.

convars(string) specifies a list of variables in the constraints file.

confile(string) specifies the name of the constraints file.

mv specifies that the algorithm should replace missing locus data (".") with a copy of each of the possible alleles at that locus. This is performed at the same stage as the handling of missing phase, when the dataset is expanded into all possible observations. If this option is not specified but some of the alleles contain missing data, the algorithm treats the symbol "." as another allele.

hap(string) specifies the haplotype of interest. The dummy variables in the regression are all defined relative to this haplotype. If the user does not select a particular haplotype, one is chosen at random.

mvdel specifies that all subjects with missing alleles are deleted.

menu specifies that the command is run through a window interface.

qt(varname) specifies the dependent variable in the regression model.

nolog specifies that the likelihood output is suppressed.
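The expansion performed by the mv option can be pictured with a small Python sketch. It is an illustration only: the function name expand_missing and the representation of alleles as strings are assumptions, not part of qhapipf. Each missing allele "." is replaced by every allele at the locus, so one subject becomes several candidate observations, which the EM algorithm then weights in the same way as alternative phases.

```python
def expand_missing(a1, a2, alleles, missing="."):
    """Replace each missing allele at one locus with every possible allele.

    a1, a2  -- the two observed alleles at the locus (strings)
    alleles -- all alleles that occur at this locus
    Returns the list of candidate (allele, allele) pairs for the subject.
    """
    first = list(alleles) if a1 == missing else [a1]
    second = list(alleles) if a2 == missing else [a2]
    return [(x, y) for x in first for y in second]
```

For a locus with alleles "1" and "2", expand_missing("1", ".", ["1", "2"]) yields the two candidates ("1", "1") and ("1", "2"). Without mv, qhapipf instead treats "." as an extra allele, which this sketch does not model.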

Examples of Single-point Analyses

To execute the menu interface version of this command, type

. qhapipf, menu

For the examples I shall assume there are three loci, a, b and c. The pairs of alleles are contained in the six variables a1, a2, b1, b2, c1 and c2. Let the quantitative trait variable be y.

All the models described here assume that the saturated model is fitted for the haplotype frequencies. For a single locus a, this saturated model is specified by the option ipf(l1). Given this, the regression models are specified in the regress() option; the more common models are described below. All the regression models assume that there are two alleles per locus: multiple alleles are recoded by the algorithm into an allele of interest, with all the remaining alleles forming the reference group.

The one-parameter constant model is specified by reg(1). To add a parameter for the additive effect of the allele of interest, the model is specified by the option reg([l1+l1]), where l1 represents the first locus in the varlist. This is the one-locus single-point additive model (one-locus SAM).

The terms between the [] brackets represent the within-locus model. In the SAM the two chromosomes are independent but share the same parameter for the effect of the allele of interest. If the allelic effect depended on the chromosome there would be two parameters; this is specified by the option reg([l1a+l1b]) and corresponds to parental imprinting, where the effect is not additive. Additionally, the within-locus between-chromosome interaction can be included by replacing the + symbol with *. This parameter is usually called the dominance parameter. The two models become reg([l1*l1]) and reg([l1a*l1b]), respectively. The commands to fit these models are given below.

. qhapipf a1 a2, ipf(l1) reg(1) qt(y)
. qhapipf a1 a2, ipf(l1) reg([l1+l1]) qt(y)
. qhapipf a1 a2, ipf(l1) reg([l1a+l1b]) qt(y)
. qhapipf a1 a2, ipf(l1) reg([l1*l1]) qt(y)
. qhapipf a1 a2, ipf(l1) reg([l1a*l1b]) qt(y)
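To make the bracket syntax concrete, the Python sketch below shows one plausible way of turning a subject's two alleles at locus a into regression covariates for each of the five single-locus models above. It is a hypothetical illustration, not qhapipf's internal coding: the function name, the use of the model strings as keys, and the homozygote indicator chosen for the dominance term are all assumptions. Alleles are collapsed to "allele of interest" versus the reference group, as described above, and the covariate lists have 1, 2, 3, 3 and 4 entries, the first being the constant.

```python
def single_locus_terms(allele_a, allele_b, allele_of_interest, model):
    """Hypothetical covariate coding for the single-locus reg() models.

    allele_a, allele_b -- the alleles on chromosome a and chromosome b
    Any allele other than allele_of_interest falls into the reference group.
    """
    xa = int(allele_a == allele_of_interest)  # chromosome a carries the allele
    xb = int(allele_b == allele_of_interest)  # chromosome b carries the allele
    if model == "1":
        return [1]                            # constant only
    if model == "[l1+l1]":
        return [1, xa + xb]                   # additive: allele count 0/1/2
    if model == "[l1a+l1b]":
        return [1, xa, xb]                    # separate effect per chromosome
    if model == "[l1*l1]":
        return [1, xa + xb, xa * xb]          # additive plus dominance term
    if model == "[l1a*l1b]":
        return [1, xa, xb, xa * xb]           # imprinting plus dominance term
    raise ValueError(f"unsupported model: {model}")
```

With phase unknown, chromosomes a and b are not identified for a heterozygote, which is one reason the chromosome-specific models rely on the EM weighting rather than on a single coding per subject.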

To test whether locus a is associated with the quantitative trait, compare the two regression models 1 and [l1+l1]:

. qhapipf a1 a2, ipf(l1) reg([l1+l1]) model(0) qt(y)
. qhapipf a1 a2, ipf(l1) reg(1) model(1) lrtest(0,1) qt(y)
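A likelihood ratio test of this kind compares the maximised log-likelihoods of the two labelled fits. The Python sketch below assumes the usual statistic 2(ll_general - ll_restricted), referred to a chi-square distribution whose degrees of freedom equal the difference in the number of parameters; chi2_sf and lr_test are hypothetical helper names, and the log-likelihood values in the usage line are made up for illustration. For reg([l1+l1]) against reg(1) the difference is one parameter.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # odd df: the 1-df tail plus correction terms for each extra 2 df
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def lr_test(ll_general, ll_restricted, k_general, k_restricted):
    """Likelihood ratio statistic, degrees of freedom and p-value."""
    stat = 2.0 * (ll_general - ll_restricted)
    df = k_general - k_restricted
    return stat, df, chi2_sf(stat, df)
```

For example, lr_test(-100.0, -102.0, 2, 1) gives a statistic of 4.0 on 1 degree of freedom, with a p-value of about 0.046.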

Examples of Multipoint Analyses

When modelling more than one locus there are, in addition, between-loci interaction terms. Within-locus interactions are specified inside the [] brackets and between-loci interactions are specified between the [] brackets. The two-locus SAM now becomes the model [l1+l1]+[l2+l2], where the + symbol between the two sets of brackets specifies that the two loci are independent. An extension of this model allows one between-loci interaction (or "haplotype" effect); this is the two-locus multipoint additive model (two-locus MAM), specified by the option reg([l1+l1]x[l2+l2]). Note that the x symbol only indicates that there is a between-loci interaction and that the "haplotype" effect is additive. This is a 4-parameter regression model: the constant term, the additive effect of the first locus, the additive effect of the second locus and an additive haplotype effect. There is one other between-chromosome "haplotype" effect, which arises when the "haplotype" can be formed between chromosomes. This model is specified by the option reg([l1+l1]*[l2+l2]), and the "haplotype" effect is then no longer additive.

The saturated model that ignores parental imprinting is specified by the option reg([l1*l1]*[l2*l2]). This model contains between-chromosome interactions, which can be further divided into within-locus between-chromosome interactions (the dominance parameters) and all the between-loci between-chromosome interactions. The full saturated model including parental imprinting is specified by the option reg([l1a*l1b]*[l2a*l2b]). The commands to fit these models are given below.

2-point SAM
. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y)

2-point MAM
. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]x[l2+l2]) qt(y)

2-point MAM (non-additive haplotype effect)
. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]*[l2+l2]) qt(y)

2-point saturated model
. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1*l1]*[l2*l2]) qt(y)

2-point saturated model with parental imprinting
. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1a*l1b]*[l2a*l2b]) qt(y)
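The difference between x and * in the multipoint models is easiest to see when phase is known. The following Python sketch codes a subject's two haplotypes at loci a and b into covariates for the first three two-locus models. The function name and the exact form of the within-chromosome and between-chromosome terms are assumptions made for illustration, not qhapipf's internal coding.

```python
def two_locus_terms(hap_a, hap_b, interest, model):
    """Hypothetical covariate coding for two-locus reg() models.

    hap_a, hap_b -- (locus-1 allele, locus-2 allele) on chromosomes a and b
    interest     -- the haplotype of interest, e.g. ("A", "B")
    """
    x1a = int(hap_a[0] == interest[0])
    x2a = int(hap_a[1] == interest[1])
    x1b = int(hap_b[0] == interest[0])
    x2b = int(hap_b[1] == interest[1])
    # [l1+l1]+[l2+l2]: constant plus an additive allele count at each locus
    terms = [1, x1a + x1b, x2a + x2b]
    if model == "[l1+l1]+[l2+l2]":
        return terms
    # number of chromosomes carrying the haplotype of interest
    within = x1a * x2a + x1b * x2b
    if model == "[l1+l1]x[l2+l2]":
        return terms + [within]
    # the "haplotype" formed across the two chromosomes
    between = x1a * x2b + x1b * x2a
    if model == "[l1+l1]*[l2+l2]":
        return terms + [within, between]
    raise ValueError(f"unsupported model: {model}")
```

The subjects with haplotypes (("A", "B"), ("a", "b")) and (("A", "b"), ("a", "B")) have the same unphased genotype but receive haplotype terms of 1 and 0, respectively, under reg([l1+l1]x[l2+l2]). This is why phase has to be resolved, or averaged over by the EM algorithm, before such covariates can be used.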

The algorithm calculates the haplotype frequencies internally, and the log-linear model for them is specified by the ipf() option. Generally this is taken to be the saturated model. It may be advantageous to use an intermediate model to reduce the number of parameters in the full joint likelihood; whether the simpler model is adequate can also be assessed with this command using the likelihood ratio test.
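An intermediate log-linear model such as ipf(l1*l2+l3) can be fitted to a table of haplotype counts by iterative proportional fitting. The Python sketch below is a generic IPF for biallelic loci, not the hapipf or qhapipf implementation; the helper names parse_ipf, margin and ipf are hypothetical, and in qhapipf the counts would be EM-weighted counts from the expanded data rather than directly observed haplotypes. Each term of the model string fixes one margin of the fitted table, and the cells are rescaled to match each observed margin in turn until the fitted values stop changing.

```python
from itertools import product


def parse_ipf(model):
    """Split an ipf()-style string such as 'l1*l2+l3' into generating classes.

    Returns zero-based locus indices, e.g. [(0, 1), (2,)]. As in the ipf()
    syntax, '-' terms and brackets are rejected.
    """
    if any(ch in model for ch in "-()[]"):
        raise ValueError("'-' terms and brackets are not allowed")
    return [
        tuple(int(locus.strip()[1:]) - 1 for locus in term.split("*"))
        for term in model.split("+")
    ]


def margin(table, loci):
    """Sum a {cell: count} table over every locus not listed in `loci`."""
    totals = {}
    for cell, count in table.items():
        key = tuple(cell[i] for i in loci)
        totals[key] = totals.get(key, 0.0) + count
    return totals


def ipf(observed, model, n_loci, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of a hierarchical log-linear model.

    `observed` maps haplotypes coded as 0/1 tuples to (possibly weighted) counts.
    """
    generators = parse_ipf(model)
    cells = list(product((0, 1), repeat=n_loci))
    fitted = {cell: 1.0 for cell in cells}  # start from a flat table
    for _ in range(max_iter):
        change = 0.0
        for gen in generators:
            obs_m = margin(observed, gen)
            fit_m = margin(fitted, gen)
            for cell in cells:
                key = tuple(cell[i] for i in gen)
                # rescale so this margin of the fit matches the observed margin
                new = fitted[cell] * obs_m.get(key, 0.0) / fit_m[key] if fit_m[key] > 0 else 0.0
                change = max(change, abs(new - fitted[cell]))
                fitted[cell] = new
        if change < tol:
            break
    return fitted
```

Because l1*l2+l3 is a decomposable model, each fitted cell equals n12(i,j) * n3(k) / N, where n12 and n3 are the observed margins and N is the total count, and IPF reaches this after one cycle. The saturated model l1*l2*l3 simply reproduces the observed table.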

Author

Adrian Mander, MRC Human Nutrition Research, Cambridge, UK.

Email adrian.mander@mrc-hnr.cam.ac.uk

Also see

On-line: help for hapipf (if installed), ipf (if installed).