snpfetch -- Retrieve information from dbSNP for a set of SNP IDs. The following information is retrieved:
Gene name, gene ID, chromosome, base-pair position, alleles, heterozygosity, locus type, strand orientation, species, validation status of the SNP, minimum probability, and maximum probability.
snpfetch rs_list [, bundle(integer) assembly(string)]
snpfetch uses the EFetch utility of the NCBI (National Center for Biotechnology Information) E-utilities to retrieve information for a given list of SNPs (single nucleotide polymorphisms).
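To make the retrieval step concrete, here is a minimal Python sketch of the kind of EFetch request involved. It is not snpfetch's actual code: the helper names `efetch_url` and `fetch_batch` are hypothetical, only the `db` and `id` parameters are set, and the response format options (`rettype`/`retmode`) as well as NCBI's usage guidelines (request-rate limits, `tool`/`email` parameters) are deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(ids):
    """Build an EFetch URL for a batch of numeric rs IDs (without the 'rs' prefix)."""
    # safe="," keeps the commas literal, so the id list reads "6025,1042522".
    query = urlencode({"db": "snp", "id": ",".join(ids)}, safe=",")
    return f"{EFETCH_URL}?{query}"


def fetch_batch(ids, timeout=30):
    """Download the raw EFetch response for one batch (needs network access)."""
    with urlopen(efetch_url(ids), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `efetch_url(["6025", "1042522"])` only builds the request string; `fetch_batch` is the part that actually contacts NCBI.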
snpfetch is intended primarily for data management and retrieval. It creates several variables with fixed names (for example, Gene_ID, Chr_ID, and Alleles) and fills them with the retrieved data, overwriting any existing data in variables with those names.
Ideally, use snpfetch to annotate an existing dataset in one of two ways: run it on a separate dataset containing only your SNP IDs and merge the results back into your data, or make sure beforehand that none of your variable names clash with the ones snpfetch creates.
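The "fetch separately, then merge" workflow can be sketched in Python as a left join on the SNP ID. This is an illustration under stated assumptions, not part of snpfetch: `annotate` is a hypothetical helper, rows are plain dicts keyed by an assumed `rsid` column, and the annotation values are made-up placeholders. The helper refuses to overwrite an existing column, which mirrors the advice to take care with variable names.

```python
def annotate(rows, annotations, key="rsid"):
    """Left-join per-SNP annotation dicts onto rows by SNP ID.

    Rows without a matching SNP are kept unchanged; the input rows are
    not modified. Raises ValueError instead of silently overwriting an
    existing column with an annotation of the same name.
    """
    merged = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data stays untouched
        for name, value in annotations.get(row[key], {}).items():
            if name in new_row:
                raise ValueError(f"variable name clash: {name}")
            new_row[name] = value
        merged.append(new_row)
    return merged
```

In Stata itself, the corresponding step would be a many-to-one merge (`merge m:1`) of your data with the snpfetch results on the SNP ID variable, since your data may contain the same SNP in several observations.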
snpfetch can handle duplicate SNP IDs in the list.
Follow the command with the name of a variable that contains the SNP IDs. The variable can be numeric (with the rs prefix omitted) or string (with or without the rs prefix); snpfetch works with either. Be aware that a large amount of data is downloaded for each set of records.
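Accepting numeric IDs, string IDs with or without the rs prefix, and duplicates boils down to normalizing every ID to one form and de-duplicating before download. The Python sketch below shows one way to do that; `normalize_rs_id` and `unique_ids` are hypothetical names, and treating the prefix case-insensitively and accepting whole-number floats (as Stata numeric variables may hold) are assumptions, not documented snpfetch behavior.

```python
def normalize_rs_id(value):
    """Return the numeric part of an rs ID as a string, e.g. 'rs6025' -> '6025'."""
    # Numeric storage may give floats such as 6025.0; accept whole numbers only.
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"not a valid rs ID: {value!r}")
        return str(int(value))
    if isinstance(value, int):
        return str(value)
    text = str(value).strip().lower()
    if text.startswith("rs"):
        text = text[2:]  # drop the optional prefix
    if not text.isdigit():
        raise ValueError(f"not a valid rs ID: {value!r}")
    return text


def unique_ids(values):
    """Normalize every ID and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        rs_id = normalize_rs_id(value)
        if rs_id not in seen:
            seen.add(rs_id)
            result.append(rs_id)
    return result
```

Downloading each distinct SNP once and then attaching the results to every matching observation is what lets duplicates in the input list cost nothing extra.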
The bundle option sets how many records Stata attempts to download at once. snpfetch downloads one set (bundle) of records from your list of SNPs, parses that set, adds the data to your dataset, and moves on to the next set. Downloading in bundles was necessary because of string-length limitations within Stata.
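The download, parse, add, repeat cycle described above can be sketched in Python as follows. The names `bundles` and `fetch_in_bundles` are hypothetical, and the fetching and parsing steps are passed in as functions so the loop itself can be shown (and tested) without a network connection or a real dbSNP parser.

```python
def bundles(ids, bundle):
    """Split a list of IDs into consecutive batches of at most `bundle` items."""
    if bundle < 1:
        raise ValueError("bundle must be a positive integer")
    return [ids[i:i + bundle] for i in range(0, len(ids), bundle)]


def fetch_in_bundles(ids, bundle, fetch, parse):
    """Fetch and parse one batch at a time, accumulating the parsed records.

    `fetch` takes a batch of IDs and returns the raw response text;
    `parse` turns that text into a list of records. Only one batch's
    raw response is held at a time, which keeps each response small.
    """
    records = []
    for batch in bundles(ids, bundle):
        raw = fetch(batch)
        records.extend(parse(raw))
    return records
```

A smaller bundle means shorter responses per request but more requests in total; a larger one means the opposite, which is the trade-off the bundle option exposes.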
The assembly option specifies which reference assembly to retrieve data from. If it is not specified, the assembly is set to "GRCh37.p5" (the latest human reference assembly at the time of writing).
Important! If you are retrieving data for organisms other than humans, you must specify the name of the assembly.
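The default-assembly rule above can be summarized as a small decision, sketched here in Python. `resolve_assembly` and its `is_human` flag are hypothetical: snpfetch has no such flag (its only options are bundle() and assembly()), so in practice it is up to you to pass assembly() for non-human organisms.

```python
DEFAULT_ASSEMBLY = "GRCh37.p5"  # default named in this help text


def resolve_assembly(assembly=None, is_human=True):
    """Return the assembly name to request, following the rule described above."""
    if assembly:
        return assembly  # an explicitly named assembly always wins
    if not is_human:
        raise ValueError("specify assembly() when retrieving data for non-human organisms")
    return DEFAULT_ASSEMBLY
```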
For more information on assemblies, see:
GRCh37.p5 - http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
HuRef - http://huref.jcvi.org/