Haplotype Block Edge Identification using hapipf
hapblock [varlist] [using] [, mv mvdel hlen(numlist) start(#) replace block(filename) ]
Description
This command systematically fits a series of hapipf log-linear models in an attempt to find the edges of regions of high LD within a set of loci.
The log-linear models are fitted using iterative proportional fitting, implemented in the command ipf (version 1.36 or later), which is available from ssc. The user must also install hapipf (version 1.44 or later). This algorithm can handle very large contingency tables and converges to maximum likelihood estimates even when the likelihood is badly behaved.
If you are connected to the Web you can install the latest version of hapipf with ssc install hapipf. The latest version of hapblock can be installed with ssc install hapblock, replace.
The varlist consists of paired variables representing the alleles at each locus. If phase is known then the paired variables are in fact the genotypes. When phase is unknown the algorithm assumes Hardy-Weinberg equilibrium, so that models are based on chromosomal data and not genotypic data.
This algorithm can handle missing alleles at the loci by using the mv or mvdel option.
Options
mv specifies that the algorithm should replace missing data (".") with a copy of each of the possible alleles at this locus. This is performed at the same stage as the handling of missing phase, when the dataset is expanded into all possible observations. If this option is not specified and some of the alleles are missing, the algorithm treats the symbol "." as another allele.
mvdel specifies that people with missing alleles are deleted.
hlen(numlist) specifies the width, in loci, of the sliding window of models.
start(#) specifies the starting locus in the varlist. This is useful when the algorithm is taking a long time: the command can be rerun from the locus at which the algorithm ended prematurely.
replace specifies that the results file created can be overwritten.
block(filename) specifies that the calculated block sizes and p-values are saved to a file named filename.dta.
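The difference between the mv and mvdel options can be illustrated with a small sketch. This is not the code used by hapipf or hapblock; it is a Python illustration under stated assumptions: each person is a list of allele pairs (one pair per locus), missing alleles are coded as ".", and the helper names expand_missing and drop_missing are hypothetical.

```python
from itertools import product

MISSING = "."  # missing-allele code, as in the help text


def expand_missing(person, alleles_at_locus):
    """mv-style handling (illustrative): replace each missing allele with
    every possible allele at that locus, returning all completed records."""
    choices = []
    for locus, pair in enumerate(person):
        for allele in pair:
            if allele == MISSING:
                # one candidate per allele observed at this locus
                choices.append(sorted(alleles_at_locus[locus]))
            else:
                choices.append([allele])
    expanded = []
    for combo in product(*choices):
        # regroup the flat tuple of alleles back into per-locus pairs
        expanded.append([tuple(combo[i:i + 2]) for i in range(0, len(combo), 2)])
    return expanded


def drop_missing(people):
    """mvdel-style handling (illustrative): delete people with any missing allele."""
    return [p for p in people
            if all(a != MISSING for pair in p for a in pair)]


# Two people typed at two loci; the second has one missing allele at locus 1.
people = [
    [("1", "2"), ("1", "1")],
    [("1", "."), ("2", "1")],
]
alleles = [{"1", "2"}, {"1", "2"}]  # assumed allele sets per locus

print(expand_missing(people[1], alleles))
print(len(drop_missing(people)))
```

Under mv-style expansion the second person contributes two candidate records, one for each possible allele; under mvdel-style deletion that person is dropped and only one person remains.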
Examples
Take a dataset with 70 loci, where the pair of alleles at locus i is held in the variables li_1 and li_2.
. hapblock l1_1-l70_2, hlen(6) s(10) mvdel
This will make the following comparisons:
    l10*l11*l12+l13*l14*l15 vs l10*l11*l12*l13*l14*l15
    l11*l12*l13+l14*l15*l16 vs l11*l12*l13*l14*l15*l16
    l12*l13*l14+l15*l16*l17 vs l12*l13*l14*l15*l16*l17
    l13*l14*l15+l16*l17*l18 vs l13*l14*l15*l16*l17*l18
    etc.
If you specify the mvdel missing data option then these models might not be fitted on the same subjects.
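The pattern of comparisons listed above can be reproduced with a short sketch. It is written in Python purely for illustration and is not part of hapblock. It assumes a single even window width, a window that moves one locus at a time, and a final window that ends at the last locus; the stopping rule and the function name window_comparisons are assumptions, not documented behaviour.

```python
def window_comparisons(n_loci, hlen, start=1):
    """Return (split_model, full_model) strings for each window position.

    Assumes an even window width hlen, split into two equal halves, and
    that the window slides one locus at a time until it reaches locus n_loci.
    """
    half = hlen // 2
    comparisons = []
    for first in range(start, n_loci - hlen + 2):
        loci = [f"l{i}" for i in range(first, first + hlen)]
        left, right = loci[:half], loci[half:]
        # two independent half-windows versus one full interaction
        split_model = "*".join(left) + "+" + "*".join(right)
        full_model = "*".join(loci)
        comparisons.append((split_model, full_model))
    return comparisons


# Mirrors the example: 70 loci, hlen(6), start at locus 10.
comps = window_comparisons(n_loci=70, hlen=6, start=10)
for split_model, full_model in comps[:4]:
    print(f"{split_model} vs {full_model}")
```

The first four lines printed match the four comparisons shown in the example; each compares a model in which the two halves of the window are independent with a model containing the full interaction across the window.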
Author
Adrian Mander, Cambridge, UK. Email junk.ade@ntlworld.com
Also see
Related commands
HELP FILES    Installation status    SSC installation links
hapipf        (MUST be installed)    (ssc install hapipf)
ipf           (MUST be installed)    (the above installs ipf)
swblock       (if installed)         (ssc install swblock)
gipf          (if installed)         (ssc install gipf)