{smcl}
{* 21August2006}{...}
{cmd:help qhapipf}
{hline}

{title:Title}

    {hi: Analysis of quantitative traits using regression and log-linear modelling when phase is unknown}

{title:Syntax}

{p 8 17 2}
{cmdab:qhapipf} {varlist} [if] [using]
[{cmd:,} {it:options}]

{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Main}
{synopt:{opt qt:}({help varname})}specifies the dependent variable. {p_end}
{synopt:{opt ipf:}({it:string})}specifies the log-linear model for haplotype frequencies. {p_end}
{synopt:{opt reg:ress}({it:string})}specifies the regression model for the quantitative trait. {p_end}
{synopt:{opt start}}specifies that the starting posterior weights of the EM algorithm are 
chosen at random. {p_end}
{synopt:{opt dis:play}}specifies whether to output parameter estimates. {p_end}
{synopt:{opt known}}specifies that phase is known. {p_end}
{synopt:{opt phase}({help varname})}specifies a variable that identifies whether phase is known for a subset of subjects. {p_end}
{synopt:{opt acc}({it:real})}specifies the convergence threshold for the change in the full log-likelihood. {p_end}
{synopt:{opt ipfacc}({it:real})}specifies the convergence threshold for the change in the log-likelihood of the log-linear model. {p_end}
{synopt:{opt nolog}}specifies that the likelihood output is suppressed. {p_end}
{synopt:{opt model}({it:#})}specifies a label for the log-linear model being fitted. {p_end}
{synopt:{opt lrtest}({it:numlist})}performs a likelihood ratio test between the two models saved by the {hi: model()} option. {p_end}
{synopt:{opt convars}({it:string})}specifies a list of variables in the constraints file. {p_end}
{synopt:{opt confile}({it:string})}specifies the name of the constraints file. {p_end}
{synopt:{opt mv}}specifies that missing data will be imputed. {p_end}
{synopt:{opt mvdel}}specifies that subjects with missing data will be deleted. {p_end}
{synopt:{opt hap}({it:string})}specifies the haplotype of interest. {p_end}
{synopt:{opt menu}}specifies that the command is run through a window interface. {p_end}
{synoptline}
{p2colreset}{...}


{title:Description}

{p 0 0}
This command models the relationship between a normally distributed continuous
trait, measured in a population-based random sample, and individuals' haplotypes.
An EM algorithm is used to resolve haplotype phase. Covariates
are constructed from the haplotypes and used in a regression model. The EM
algorithm also handles missing genotypes, assuming they are missing at random (MAR).

{p 0 0}
There are two distinct models: a log-linear model for haplotype
frequencies and a regression model for the quantitative trait. Further details
of the log-linear modelling procedure are found in the help for the
Stata command {cmd:hapipf}. Haplotype frequencies are estimated under
the assumption of Hardy-Weinberg Equilibrium.

{p 0 0}
The regression model relates the haplotypes to the quantitative trait.
This model is specified in {cmd:regress()} with the dependent variable specified
by the {cmd:qt()} option. 

{p 0 0}
The {cmd:regress()} option uses a special syntax to specify the dummy variables for the regression
model. The syntax can specify within-locus, between-loci and between-chromosome effects.

{title:Latest Version}

{p 0 0}
The latest version is always kept on the SSC website. To install the latest version click
on the following link 

{stata ssc install qhapipf, replace}.

{title:Options}

{p 0 0}
{opt ipf:}({it:string}) specifies the log-linear model for the haplotype frequencies.
It requires special syntax of
the form {hi:l1*l2+l3}. The term {hi:l1*l2} allows all the interactions
between the first two loci, and {hi:+l3} makes locus 3 independent of them.
This syntax is used in most books on log-linear modelling; "-" terms and brackets are not allowed.

{p 0 0}
{opt reg:ress}({it:string}) specifies the regression model. 
The program then creates "dummy"
variables for all the effects. A fuller description of this option is given in the
examples.

{p 0 0}
{cmdab:start} specifies that the starting posterior weights of the EM algorithm are 
chosen at random.

{cmdab:dis:play} specifies whether to output parameter estimates.

{cmdab:known} specifies that phase is known.

{p 0 0}
{cmdab:phase}{cmd:(}{it:varname}{cmd:)} specifies a variable that identifies whether phase is known for a subset of subjects. The
variable must contain 1 where phase is known and 0 where phase is unknown.
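
{p 0 0}
As a minimal sketch, assume a hypothetical indicator variable {hi:phaseknown} (not created by this
command) that is coded 1 where phase is known and 0 otherwise, together with the allele variables
a1, a2, b1, b2 and the trait y used in the examples later in this help file. The first line only
checks the coding; the second passes the variable to the command.

{inp:. assert inlist(phaseknown, 0, 1)}
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y) phase(phaseknown)}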

{p 0 0}
{cmdab:acc}{cmd:(}{it:real}{cmd:)} specifies the convergence threshold for the change in the full log-likelihood.

{p 0 0}
{cmdab:ipfacc}{cmd:(}{it:real}{cmd:)} specifies the convergence threshold for the change in the log-likelihood of the log-linear model.

{p 0 0}
{cmdab:model}{cmd:(}{it:integer}{cmd:)} specifies a label for the log-linear model being fitted. This
label is used in the {hi: lrtest()} option.

{p 0 0}
{cmdab:lrtest}{cmd:(}{it:#,#}{cmd:)} performs a likelihood ratio test between the two models saved by the {hi: model()} option.

{cmdab:convars}{cmd:(}{it:string}{cmd:)} specifies a list of variables in the constraints file.

{cmdab:confile}{cmd:(}{it:string}{cmd:)} specifies the name of the constraints file.

{p 0 0}
{cmdab:mv} specifies that the algorithm should replace missing locus data (".") with a copy of each
of the possible alleles at this locus. This is performed at the same stage as the handling of
missing phase, when the dataset is expanded into all possible observations. If this option is not
specified but some of the alleles contain missing data, the algorithm treats the symbol "." as
another allele.
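
{p 0 0}
The following is only a sketch of how the two missing-data options might be used. It assumes
numeric allele variables a1, a2, b1, b2 and a trait y, as in the examples later in this help file.
The {hi:count} line is standard Stata and simply reports how many subjects have at least one
missing allele. The second line imputes missing alleles and the third deletes those subjects instead.

{inp:. count if missing(a1, a2, b1, b2)}
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y) mv}
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y) mvdel}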

{p 0 0}
{cmdab:hap}{cmd:(}{it:string}{cmd:)} specifies the haplotype of interest. The dummy variables in the
  regression are all defined relative to this haplotype. If the user does not select
  a particular haplotype then one is chosen at random.

{p 0 0}
{cmdab:mvdel} specifies that all subjects with missing alleles are deleted.

{cmdab:menu} specifies that the command is run through a window interface.

{cmdab:qt}{cmd:(}{it:varname}{cmd:)} specifies the dependent variable in the regression model.

{opt nolog} specifies that the likelihood output is suppressed.

{p}

{title:Examples of Singlepoint Analyses}

To execute the menu interface version of this command type

{inp:. qhapipf,menu}

{p 0 0}
For the examples I shall assume there are three loci a, b and c.
The pairs of alleles are contained in the 6 variables a1, a2, b1, b2, c1 and c2. 
Let the quantitative trait variable be y. 

{p 0 0}
The models described here all assume that the saturated model is fitted for the haplotype frequencies. For a single locus {hi:a} this saturated model is specified
by the option {hi:ipf(l1)}. Given this, the regression models are specified in the
{hi:regress()} option; the more common models are described below. All the regression
models assume that there are two alleles per locus. Multiple alleles are recoded by
the algorithm into an allele of interest, with all the remaining alleles forming the reference group.

{p 0 0}
The one-parameter constant model is specified by {hi: reg(1)}.
To add a parameter for the additive effect of the allele of interest, specify
the option {hi: reg([l1+l1])}, where {hi: l1} represents the first locus in the {hi: varlist}.
This is the one-locus single-point additive model (one-locus SAM). The terms between the []
brackets represent the within-locus model; in the SAM the two chromosomes are independent
but share the same parameter for the effect of the allele of interest. If the allelic effect
depends on the chromosome then there are two parameters, and this is specified by
the option {hi: reg([l1a+l1b])}; this model represents the effect of parental imprinting.
The within-locus between-chromosome interaction can be included by
replacing the {hi:+} symbol with {hi:*}. This parameter is usually called the dominance parameter.
The two models then become {hi: reg([l1*l1])} and {hi: reg([l1a*l1b])}, respectively.
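
{p 0 0}
As a rough sketch of how these one-locus models can be read, the expected value of the trait under
each specification might be written as follows. The notation is illustrative and not part of the
command: xa and xb are assumed to be 0/1 indicators that chromosomes a and b carry the allele of
interest, b0 is the constant and d is the dominance parameter.

    reg(1)            E(y) = b0
    reg([l1+l1])      E(y) = b0 + b1*(xa + xb)
    reg([l1a+l1b])    E(y) = b0 + b1a*xa + b1b*xb
    reg([l1*l1])      E(y) = b0 + b1*(xa + xb) + d*xa*xb
    reg([l1a*l1b])    E(y) = b0 + b1a*xa + b1b*xb + d*xa*xb

{p 0 0}
The exact coding of the dummy variables created by the program may differ from this sketch, so the
displayed parameter estimates should be checked against it.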

{p 0 0}
The commands to fit these models are given below.

{inp:. qhapipf a1 a2, ipf(l1) reg(1) qt(y)}
{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) qt(y)}
{inp:. qhapipf a1 a2, ipf(l1) reg([l1a+l1b]) qt(y)}
{inp:. qhapipf a1 a2, ipf(l1) reg([l1*l1]) qt(y)}
{inp:. qhapipf a1 a2, ipf(l1) reg([l1a*l1b]) qt(y)}

To test whether locus a is associated with the quantitative trait, compare the
two regression models {hi:1} and {hi:[l1+l1]} using a likelihood ratio test.

{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) model(0) qt(y)}
{inp:. qhapipf a1 a2, ipf(l1) reg(1) model(1) lrtest(0,1) qt(y)}
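
{p 0 0}
The same pattern could plausibly be used to test the dominance parameter by comparing the
regression models {hi:[l1*l1]} and {hi:[l1+l1]}. This is a sketch that reuses only the options
shown above. It assumes, as in the test above, that the larger model is saved first and that
the labels 0 and 1 are free to reuse.

{inp:. qhapipf a1 a2, ipf(l1) reg([l1*l1]) model(0) qt(y)}
{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) model(1) lrtest(0,1) qt(y)}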

{title:Examples of Multipoint Analyses}

{p 0 0}
When modelling more than one locus there are additional between-loci interaction terms.
The within-locus terms are specified inside the [] brackets and the between-loci
interactions are specified between the sets of [] brackets.
The two-locus SAM becomes the model {hi: [l1+l1]+[l2+l2]}, where the "+" symbol between
the two sets of brackets specifies that the two loci are independent.
An extension of this model allows one between-loci interaction (or "haplotype" effect).
This is the two-locus multipoint additive model (two-locus MAM) and is specified by
the option {hi: reg([l1+l1]x[l2+l2])}. The {hi: x} symbol simply indicates that there is a
between-loci interaction and that the "haplotype" effect is additive. This is a
4-parameter regression model: the constant term, the first-locus additive effect, the second-locus
additive effect and an additive haplotype effect. There is one other "haplotype" effect,
a between-chromosome one, which arises when the "haplotype" can be formed between chromosomes.
This model is specified by the option {hi: reg([l1+l1]*[l2+l2])}, and the "haplotype" effect is
then no longer additive.

{p 0 0}
The saturated model that ignores parental
imprinting is specified by the option {hi: reg([l1*l1]*[l2*l2])}. This model contains
between-chromosome interactions, which can be further divided into
within-locus between-chromosome interactions (dominance parameters) and between-loci
between-chromosome interactions. The full saturated model including parental
imprinting is specified by the option {hi: reg([l1a*l1b]*[l2a*l2b])}.

The commands to fit these models are given below.

2-point SAM
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y)}
2-point MAM
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]x[l2+l2]) qt(y)}
2-point MAM (non-additive haplotype effect)
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]*[l2+l2]) qt(y)}
2-point saturated model
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1*l1]*[l2*l2]) qt(y)}
2-point saturated model with parental imprinting
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1a*l1b]*[l2a*l2b]) qt(y)}
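
{p 0 0}
As a sketch, the additive "haplotype" effect might be tested by comparing the 2-point MAM with the
2-point SAM, following the {hi:model()} and {hi:lrtest()} pattern from the singlepoint examples.
The model labels 0 and 1 are arbitrary choices for this illustration.

{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]x[l2+l2]) model(0) qt(y)}
{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) model(1) lrtest(0,1) qt(y)}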

{p 0 0}
The algorithm calculates the haplotype frequencies internally, using the log-linear
model specified by the {hi: ipf()} option. Generally this is taken to be
the saturated model. It may be advantageous to use an intermediate model to
reduce the number of parameters in the full joint likelihood. Such a model can also
be tested with this command using a likelihood ratio test.
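
{p 0 0}
For instance, a sketch of such a test for two loci could hold the regression model fixed and
compare the saturated haplotype frequency model {hi:ipf(l1*l2)} with the model in which the
two loci are independent, {hi:ipf(l1+l2)}. The choice of the 2-point SAM as the regression model
and the labels 0 and 1 are assumptions made for illustration only.

{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) model(0) qt(y)}
{inp:. qhapipf a1 a2 b1 b2, ipf(l1+l2) reg([l1+l1]+[l2+l2]) model(1) lrtest(0,1) qt(y)}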

{title:Author}

{p}
Adrian Mander, MRC Human Nutrition Research, Cambridge, UK.

Email {browse "mailto:adrian.mander@mrc-hnr.cam.ac.uk":adrian.mander@mrc-hnr.cam.ac.uk}

{title:Also see}

On-line: help for {help hapipf} (if installed), {help ipf} (if installed).