-------------------------------------------------------------------------------
help for swblock
-------------------------------------------------------------------------------

Stepwise hapipf routine to identify the parsimonious model to describe the haplotype block pattern

    swblock [varlist] [, mv(string) pvalue(#) stop noise acc(#) ipfacc(#) store replace]

Description

This command systematically fits a series of hapipf log-linear models that model the linkage disequilibrium (LD) structure within a set of loci.

The log-linear models are fitted by iterative proportional fitting, using the command ipf (version 1.36 or later), which is available from ssc. The user will also have to install hapipf (version 1.44 or later). This algorithm can handle very large contingency tables and converges to maximum likelihood estimates even when the likelihood is badly behaved.

If you are connected to the Web you can install the latest version by typing

    . ssc install hapipf
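Before running swblock it can help to confirm that both prerequisites are installed at the required versions. The following is a minimal sketch, assuming both packages are distributed on SSC under the names ipf and hapipf; which reports the path of an installed command together with its *! version line.

```stata
* Show where each prerequisite is installed and its *! version line.
* capture noisily keeps a do-file running if a command is not found.
capture noisily which ipf
capture noisily which hapipf

* Install or update from SSC if either is missing or older than required
* (ipf 1.36 or later, hapipf 1.44 or later).
ssc install ipf, replace
ssc install hapipf, replace
```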

The varlist consists of paired variables representing the alleles at each locus. If phase is known then the paired variables are in fact the genotypes. When phase is unknown the algorithm assumes Hardy-Weinberg equilibrium, so that models are based on chromosomal data and not genotypic data.

This algorithm can handle missing alleles at the loci by using the mv() option.

Options

mv(string) specifies how the missing data will be handled; the default is mv. If the string is mv, i.e. mv(mv), then the missing data are assumed to be missing at random (MAR) and the EM algorithm expands the unknown phase to consider all possible values for the missing value. The main assumption of this algorithm is that the missing data can only take the alleles observed for a given locus. Relaxing this assumption would make little difference, because alleles that are never observed usually give expected frequencies that are close to 0; however, it would increase the number of cells and hence reduce power. The only other string this option takes is mvdel, i.e. mv(mvdel). Here the missing data are assumed to be missing completely at random (MCAR) and subjects are deleted when they have missing data at any locus. Under this assumption complete subjects are representative of the whole dataset and hence deletion will give unbiased estimates.
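To judge how much data mv(mvdel) would discard, one can count the subjects with at least one missing allele before fitting. This is a sketch only: it assumes missing alleles are coded as Stata missing values, it borrows the variable names l1_1-l7_2 from the Examples section, and nmiss is a hypothetical helper variable.

```stata
* Number of missing allele values per subject across all allele variables
* (nmiss is a hypothetical name; any unused variable name will do)
egen nmiss = rowmiss(l1_1-l7_2)

* Subjects with nmiss > 0 are the ones that listwise deletion would drop
count if nmiss > 0

* Distribution of the number of missing alleles per subject
tabulate nmiss
```

If a large share of subjects would be dropped, the default mv(mv) may be preferable, since it keeps incomplete subjects under the weaker MAR assumption.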

stop specifies that the search should stop when including the LD terms of the next higher order does not significantly change the log likelihood. For example, if none of the third-order LD terms included in the model is significant, then the algorithm will not fit the fourth-order terms.

acc(#) specifies the tolerance of hapipf convergence. The default is 0.0001.

ipfacc(#) specifies the tolerance of convergence for the ipf fits carried out within hapipf. The default is 1.000e-07.

pvalue(#) specifies the significance level for inclusion in the model; terms with p > pvalue() are not eligible for inclusion.

noise specifies that the test statistic values are included in the output.

store specifies that all the model output is saved to a file called fresults.dta.

replace specifies that the old fresults.dta can be overwritten.

Examples

Take a dataset with 7 loci, where the pair of alleles at locus i is held in the variables li_1 and li_2.

    . swblock l1_1-l7_2, mv(mvdel)

Here mvdel is specified as the missing-data mechanism, so all subjects with any missing data are deleted.

The following command changes the inclusion significance level to 1%:

    . swblock l1_1-l7_2, mv(mvdel) pvalue(0.01)

To store the results in a Stata dataset, type

    . swblock l1_1-l7_2, mv(mvdel) pvalue(0.01) store replace
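Once the results have been stored, they can be inspected like any other dataset. The sketch below assumes that fresults.dta is written to the current working directory. The variables it contains are not documented here, so the example only describes and lists them rather than referring to any by name.

```stata
* Fit with a 1% inclusion level, stop early, and save the model output
swblock l1_1-l7_2, mv(mvdel) pvalue(0.01) stop store replace

* Loading the results replaces the data in memory, so preserve it first
preserve
use fresults, clear

* Show the stored variables, then the stored output itself
describe
list

* Bring the original allele data back into memory
restore
```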

Author

Adrian Mander, Glaxo Smithkline, Harlow, UK.
Email adrian.p.mander@gsk.com

Also see

On-line: help for hapipf (MUST be installed), ipf (MUST be installed), hapblock (if installed).