-------------------------------------------------------------------------------
help for genass
-------------------------------------------------------------------------------

Genetic Case-control Association tests

genass [varlist] [if exp] [in range], group(casevar) id(study_id) output(filename) [replace text all hw allelic genotypic dominant recessive trend pwld(statistic) map(marker map) table graph]

Description

genass is a wrapper for various genetic analysis routines which test for Hardy-Weinberg Equilibrium, perform allelic and genotypic tests of association (general, recessive and dominant models), and test for trends across genotypes (additive/multiplicative models). It will carry out the tests selected for the given range of SNPs, and accumulate the results into one Stata formatted data set for subsequent review.

Dependencies

By virtue of being a wrapper command, genass requires a number of user-written ado-files in order to function. These dependencies are gencc, genhw, gtab, labmask, listtex, and pwld. If you do not have these installed then genass will not work. See findit for information on how to locate and install user-written commands.
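A quick way to check the dependencies listed above is sketched below. It is only an illustration, not part of genass itself: it loops over the six command names and uses the built-in which command to report any that Stata cannot find on the ado-path.

```stata
* Sketch: report any genass dependency that is not installed
foreach cmd in gencc genhw gtab labmask listtex pwld {
    capture which `cmd'
    if _rc {
        display as error "`cmd' not found; try: findit `cmd'"
    }
}
```

If nothing is displayed, all six commands were found.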

IMPORTANT

genass assumes that your genotype data is numerically encoded, with the wild-type allele as 1 and the mutant (rarer) allele as 2. This assumption can be erroneous because, in some genes, the mutations (even though they cause disease) are selectively neutral (because fecundity is not affected), and the frequencies of the alleles are therefore determined under the Neutral Model of Molecular Evolution (Kimura 1985). If you believe that the wild-type allele is in fact allele 2 and the mutant allele is 1, then the results under the recessive model become the dominant results, and vice-versa.

Options

group(casevar) specifies the variable that defines case-control status. Any observations without a case-control status will cause the program to crash.
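Because observations with no case-control status crash the program, it may be worth checking for them before calling genass. The sketch below assumes the case-control variable is called status, as in the Examples section; substitute your own variable name.

```stata
* Sketch: check for missing case-control status before running genass
* (status is the variable name used in the Examples; yours may differ)
count if missing(status)

* Halt the do-file at this point if any observation lacks a status
assert !missing(status)
```

The assert line stops a do-file with an error when the condition fails, so genass is never reached with incomplete data.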

id(study_id) specifies the variable that uniquely identifies an observation within your dataset.

output(filename) specifies the name of the file in which the results are saved as a Stata formatted data set.

text specifies that the data is to be written to a tab-delimited text file as well as being saved as a Stata formatted data set.

replace will over-write the output dataset if it already exists.

hw specifies that Hardy-Weinberg equilibrium is to be tested in both cases and controls.

all specifies that all statistics are to be calculated (the default option).

allelic specifies that allelic association tests are to be performed. Note that the confidence intervals that are reported are exact and will differ from those obtained using the gencc command, which uses the Cornfield approximation.

genotypic specifies that a general genotypic association test is to be performed. This is a Chi-squared test on a 2x3 table.
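To see what a Chi-squared test on a 2x3 table looks like, the sketch below uses Stata's immediate tabulate command. The counts are hypothetical and purely illustrative: the first row holds cases, the second controls, and the three columns hold genotypes 11, 12 and 22. This shows the kind of test described above, not necessarily the exact code genass runs internally.

```stata
* Sketch with hypothetical counts (not real data)
*        genotype:  11  12  22
* row 1 = cases, row 2 = controls
tabi 40 45 15 \ 55 35 10, chi2
```

The test has two degrees of freedom because it makes no assumption about the mode of inheritance.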

dominant specifies that a dominant test of genotypic association is to be performed. The Odds-Ratio and 95% CI for the dominant model are also reported.

recessive specifies that a recessive test of genotypic association is to be performed. The Odds-Ratio and 95% CI for the recessive model are also reported.
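The dominant and recessive models can both be viewed as collapsing the 2x3 genotype table into a 2x2 table. The sketch below uses hypothetical counts (cases: 40, 45 and 15 with genotypes 11, 12 and 22; controls: 55, 35 and 10) and the immediate command cci from epitab. It assumes that the models are defined with respect to allele 2, in line with the coding described under IMPORTANT; it is an illustration rather than the code genass itself uses.

```stata
* Sketch with hypothetical counts (not real data)
* cci takes: exposed cases, unexposed cases, exposed controls, unexposed controls

* Dominant model: "exposed" = at least one copy of allele 2 (12 or 22)
*   cases 45+15 vs 40, controls 35+10 vs 55
cci 60 40 45 55, exact

* Recessive model: "exposed" = two copies of allele 2 (22 only)
*   cases 15 vs 40+45, controls 10 vs 55+35
cci 15 85 10 90, exact
```

If the allele coding is reversed, the two tables swap roles, which is why the recessive results become the dominant results and vice-versa.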

trend specifies that a Trend Test for Proportions is to be performed. This is similar to the Mantel-Haenszel test for trend in Odds-Ratio across ordered groups, and is a valid test of association even if samples are not in Hardy-Weinberg equilibrium (Sasieni 1997).

pwld(statistic) specifies that pair-wise linkage disequilibrium is to be calculated. By default D' is calculated; however, it is possible to calculate any measure described in Devlin & Risch (1995). For further details of the statistics please see pwld. Results are saved as a Stata formatted data set with the same name as specified for saving results, prefixed with pwld_.

map(filename) specifies the tab-delimited ASCII text file that contains the map information. The file should have two columns, and the first row should contain a header (this defines the variables that are in the file). The column of marker names should be called locus, whilst the column containing the marker's position in base-pairs should be called pos. Each subsequent row should contain the marker name (as referred to in your data set) followed by the position in base-pairs (the two columns should be separated by a tab).
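A map file can be checked against the layout described above before it is passed to genass. The sketch below assumes the file is called map.txt (the name used in the Examples) and that you are running Stata 13 or later, where import delimited is available. Note that clear drops any data currently in memory, so run this before loading your genotype data.

```stata
* Sketch: read a tab-delimited map file and check its layout
* (map.txt is the file name used in the Examples; yours may differ)
import delimited using "map.txt", delimiters("\t") varnames(1) clear

* Both required columns must be present, and pos must be numeric
confirm variable locus pos
confirm numeric variable pos

* Each marker should appear only once
isid locus
list locus pos
```

Any of the confirm or isid lines will stop with an error if the file does not match the expected layout.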

table specifies that an HTML page consisting of ten tables (one for each group of tests) is to be generated. These tables can then be inserted into web-sites or copied into word-processing software for publication (depends upon Roger Newson's listtex).

graph specifies that graphs of the results are to be plotted. You MUST specify the marker map, which should be a tab-delimited ASCII text file where the first column contains the locus names, and the second column contains their position in base-pairs. All graphs are generated in Portable Network Graphics format (.png) and are saved in the sub-directory graphs (which is created if it does not exist).

Results

The results data set is saved in the file specified under the output() option and is a Stata formatted data set with all specified statistics collated into one data set. All variables are labelled with a description; to see a description of the statistics that have been calculated use the describe command.

If you have specified the graph option then a number of graphs will have been generated and saved as Portable Network Graphics files (.png), and if you have specified the pwld() option then you will have a file containing the pair-wise linkage disequilibrium statistics.

There are three classes of graphs: one that plots the allele frequencies, a second group which plots the odds-ratios, and a third set which plots the -log10(p-values). If you are not familiar with interpreting such graphs then the easiest way to do so is to take your calculator and determine -log10() of the numbers 0.05, 0.1, 0.01, 0.001, 0.0001 and 0.00001, and you will see a pattern emerging.
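The calculator exercise suggested above can also be done in Stata itself. The following sketch simply displays -log10() for each of the p-values mentioned.

```stata
* Sketch: display -log10(p) for a range of p-values
foreach p in 0.1 0.05 0.01 0.001 0.0001 0.00001 {
    display "p = `p'" _column(16) "-log10(p) = " %6.3f (-log10(`p'))
}
```

Each ten-fold decrease in the p-value adds one to -log10(p), so smaller p-values appear higher on the graph, and the conventional 5% threshold sits at roughly 1.3.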

The graphs have been saved to the sub-directory graphs, which if it did not already exist has been created. The table below details the filenames of the graphs that have been generated and what they are displaying.

File                | Description
--------------------+--------------------------------------------------------------
allele_frq.png      | Allele frequencies in cases and controls
hw_eqm.png          | P-values for Hardy-Weinberg equilibrium in cases and controls
allele1_or.png      | Odds-Ratio & 95% CI for allele 1
allele2_or.png      | Odds-Ratio & 95% CI for allele 2
genotype_ass.png    | Genotypic, dominant and recessive p-values
dominant_or.png     | Odds-Ratio & 95% CI for dominant model
recessive_or.png    | Odds-Ratio & 95% CI for recessive model
trend.png           | P-values for trend test

Remarks

It should be noted that the p-values that are reported by this program are uncorrected. The exact p-values reported for Hardy-Weinberg equilibrium are calculated using the method proposed by Guo and Thompson (1992). All other exact p-values are calculated using Fisher's (1970) method, which avoids relying on asymptotic test statistics that become unreliable when cell counts are low, but does not correct for multiple testing. The issue of correcting for multiple testing can be approached in a number of ways, but is particularly problematic when correcting multiple tests of association with SNPs, which because of their syntenic nature will often demonstrate a degree of correlation.
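As one illustration of a simple correction, the sketch below applies a Bonferroni adjustment to a column of p-values in the results data set. It assumes the results were saved as gen_res with one row per marker, and it uses the variable hw_controls_p from the Examples section; the new variable name is hypothetical, and the same pattern could be applied to any other p-value variable (use describe to find their names). Bonferroni treats the tests as independent, so for correlated SNPs it will tend to be conservative.

```stata
* Sketch: Bonferroni adjustment of one p-value variable
* (assumes gen_res holds one row per marker; hw_controls_p_bonf is a
*  hypothetical name for the adjusted values)
use gen_res, clear

* Number of markers with a non-missing p-value
count if !missing(hw_controls_p)
local m = r(N)

* Multiply by the number of tests and cap at 1; the if qualifier is needed
* because min() ignores missing arguments and would otherwise return 1
generate double hw_controls_p_bonf = min(1, hw_controls_p * `m') if !missing(hw_controls_p)

list hw_controls_p hw_controls_p_bonf
```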

One of the appealing features of generating a dataset of your results is that you can merge in details of a genetic map (using the map() option) to aid in the graphing of results. In fact, if you specify the graph option as well, the graphs will be generated for you.

However, because of a limitation of 80 characters in local macros, it is NOT possible to automatically generate the graphs with the x-axis labelled with the marker names. If you wish to generate such graphs you can use the code within the genass command as a basis, but you will need to explicitly specify the location of each of the marker labels under the xlabel() option.

If you are receiving an error message saying "no; data in memory would be lost" then it means that you have modified your data set prior to running the genass command. To rectify this simply insert a line in your do-file to save, replace your data, and re-run your do-file.

Examples

The example below shows how to run your analysis on a range of SNPs, then load your results and list the Hardy-Weinberg Equilibrium statistics for those markers that show significant deviation from Hardy-Weinberg (at the 5% level of significance).

. genass snp*, group(status) id(study_id) output(gen_res) map(map.txt) graph

. use gen_res

. list hw_* if (hw_controls_p < 0.05 | hw_cases_p < 0.05)

You can of course substitute the conditions and variables that are listed with any of your choice. This provides a quick and efficient way of determining which markers show association, deviate from Hardy-Weinberg Equilibrium etc.

The following example includes the pwld() option and calculates the statistic R^2 (r-squared). The Stata file that contains the calculated pairwise LD statistics is then loaded into memory, sorted and listed.

. genass snp*, group(status) id(study_id) output(gen_res) pwld(R2) map(map.txt) graph

. use pwld_gen_res

. order col row

. sort col row

. list

References

Altman D.G. (1999) Practical Statistics for Medical Research. Chapman & Hall/CRC

Armitage P. (1955) Tests for Linear Trends in Proportions and Frequencies. Biometrics 11(3):375-386

Fisher R.A. (1970) Statistical Methods for Research Workers. 14th Edition. Oxford University Press

Guo S.W., Thompson E.A. (1992) Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361-372

Hardy G. (1908) Mendelian proportions in a mixed population. Science 28:49-50

Kimura M. (1985) The Neutral Theory of Molecular Evolution. Cambridge University Press

Sasieni P. (1997) From Genotypes to Genes: Doubling the Sample Size. Biometrics 53:1253-1261

Weinberg W. (1908) On the demonstration of heredity in man. Naturkunde in Wurttemberg, Stuttgart 64:368-382

Author

Neil Shephard, ARC Epidemiology Unit

The University of Manchester

http://slack.ser.man.ac.uk

Please email nshephard@gmail.com if you encounter problems with this program.

Also see

Online: help for describe, epitab, gencc, genhw, graph bar, graph twoway,