genefetch -- Retrieve information from PubMed for a set of Entrez Gene IDs or gene names. The following information is retrieved:
    Gene Name
    Gene ID
    Full Name
    Species
    Location
    From (start bp)
    To (end bp)
    Chromosome
    genefetch Gene_Names_OR_IDs [, bundle(integer) organism(string)]
genefetch uses the efetch utilities provided by the NCBI (National Center for Biotechnology Information) to retrieve information for a given list of Gene IDs or gene names.
You should use genefetch primarily for data management and retrieval. It creates several variables with fixed names and fills them with data, overwriting any pre-existing data in variables of the same name (for example, Gene_ID or Chr_ID).
Ideally, you will use this plugin to annotate an existing dataset in one of two ways: run it on only the list of genes from that dataset and merge the results back in, or run it on the full dataset after making sure none of your variable names clash with the ones genefetch creates.
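The merge approach might look like the following sketch, meant to be run from a do-file. It assumes the dataset in memory has a string variable named gene (a hypothetical name) holding gene names, and that genefetch leaves that variable in place and adds no rows. Adjust the names to match your data.

```stata
* Sketch only: gene is an assumed variable name, not one genefetch requires
preserve

* Reduce the data to the distinct gene names so genefetch cannot
* overwrite any other variable in the working dataset
keep gene
duplicates drop

* Fetch the annotation for the reduced list
genefetch gene

* Store the annotated list, then bring back the original dataset
tempfile annot
save `annot'
restore

* Attach the annotation to every observation that shares a gene name
merge m:1 gene using `annot'
```

By default, merge keeps the master dataset's values for any variable that exists in both files, so an existing variable that shares a name with a genefetch variable is left unchanged by this workflow.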
genefetch can handle duplicates.
Method 1: Providing Gene Names
There are two ways to use genefetch. One is to provide a list of gene names. If you do this, genefetch looks up the Gene IDs by searching PubMed's gene database and taking the top result for each name.
(!Important!) Because only the top search result is used, this method has the potential to retrieve incorrect information. However, as long as you use the HGNC approved symbol for your gene, genefetch should always retrieve the correct record.
(!Important!) If you provide a list of gene names and are working with a species other than Homo sapiens, you must use the organism option, specifying the taxonomic name (e.g. Mus musculus, Rattus norvegicus, Nomascus leucogenys). If you do not specify an organism, it is set to Homo sapiens.
I would recommend using a variable named gene containing your list of gene names when using this method. This variable remains intact when the gene information is retrieved, so you can compare the gene and Gene_Name variables to ensure that the correct record has been retrieved.
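As an illustration of Method 1, the sketch below fetches records for a non-human species and then lists any rows where the retrieved name differs from the name supplied. It assumes gene is a string variable of mouse gene symbols and that Gene_Name is stored as a string. Whether the organism option needs quotes around a two-word name is also an assumption, so check it against your installed version.

```stata
* Assumed: gene holds mouse gene symbols as strings
genefetch gene, organism(Mus musculus)

* Show rows where the supplied and retrieved names disagree (case-insensitive)
list gene Gene_Name if upper(gene) != upper(Gene_Name)
```

Any rows listed are worth checking by hand, since they may point to a search that returned a different gene than intended.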
Method 2: Providing a list of Gene IDs
The second way to use genefetch is to provide it with a list of Entrez Gene IDs. This method will always pull in the correct gene information, provided the ID exists.
Follow the command with the name of a variable containing your list of Gene IDs. The list can be stored as a numeric or a string variable. genefetch will work with both and download a large amount of data for each set.
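A minimal call for Method 2 could look like this. The variable name entrez_id is hypothetical; it is chosen to differ from Gene_ID because genefetch overwrites its own output variables.

```stata
* entrez_id is an assumed variable holding Entrez Gene IDs
genefetch entrez_id
```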
The option bundle can be used to adjust how many records Stata will attempt to download at once. genefetch works by downloading a set of records for part of your list, parsing that set, adding the data to your dataset, and moving on to the next set. This approach is necessary because of string length limitations within Stata.
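For example, a smaller bundle can be requested as sketched below. The value 10 is arbitrary and is not a documented default or recommendation, and entrez_id is a hypothetical variable name.

```stata
* Download records in sets of 10 (arbitrary illustrative value)
genefetch entrez_id, bundle(10)
```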