{smcl}
{hline}
{hi:help htmltab2stata}{right: v. 1.0 - January 2019}
{hline}
{title:Title}

{p 4 4}{cmd:htmltab2stata} - Loading HTML tables into Stata.{p_end}

{title:Contents}

{p 4}{help htmltab2stata##syntax:Syntax}{p_end}
{p 4}{help htmltab2stata##description:Description}{p_end}
{p 4}{help htmltab2stata##options:Options}{p_end}
{p 4}{help htmltab2stata##Examples:Examples}{p_end}
{p 4}{help htmltab2stata##about:About}{p_end}

{marker syntax}{title:Syntax}

{p 4 13}The general syntax is:{p_end}

{p 6 13}{cmd: htmltab2stata , url({it:url}) [tablenumber({it:integer}) firstrow href]}{p_end}

{marker options}{title:Options}

{p 4 8}{cmd:url({it:url})} the URL of the HTML page to be processed. The page must be downloadable. {it:url} can be a web address or a local HTML file.{p_end}

{p 4 8}{cmd:tablenumber({it:integer})} the number of the table within the HTML document. The default is 1, i.e. the first table is processed.{p_end}

{p 4 8}{cmd:firstrow} uses the first row of the table as variable names.{p_end}

{p 4 8}{cmd:href} links enclosed in {cmd:<a>} are added to the content transferred to Stata.{p_end}

{marker description}{title:Description}

{p 4 4}{cmd:htmltab2stata} parses HTML code from websites. It detects tables enclosed in the HTML {cmd:<table>} environment and transforms the selected table into a Stata dataset. To do so, {cmd:htmltab2stata} uses {cmd:<tr>} as row identifiers and code enclosed in {cmd:<td>} as columns. Only content that is not enclosed in {cmd:< >} is transferred to Stata, unless option {cmd:href} is used for links. Empty cells remain empty in the Stata dataset.{p_end}

{marker Examples}{title:Examples}

{p 4 8}For all examples, the following HTML code, saved in {it:table.html}, is processed:{p_end}

{col 12}{com}Content
{col 12}Table 1
{col 12}<table>
{col 12}<tr><td>Country</td><td>Population</td><td>GDP</td></tr>
{col 12}<tr><td>Country A</td><td>10</td><td>100</td></tr>
{col 12}<tr><td>Country B</td><td>20</td><td>5</td></tr>
{col 12}<tr><td>Country C</td><td>500</td><td>10</td></tr>
{col 12}</table>
{col 12}More Content
{col 12}Table 2
{col 12}<table>
{col 12}<tr><td>Firstname</td><td>Surname</td><td>Webpage</td></tr>
{col 12}<tr><td>Adam</td><td>Smith</td><td>none</td></tr>
{col 12}<tr><td>Allan</td><td>Richards</td><td><a href="www.google.com">webpage</a></td></tr>
{col 12}<tr><td>Richard</td><td>Johnson</td><td></td></tr>
{col 12}</table>
{reset}

{p 4 8}Loading the first table into Stata as a dataset:{p_end}

{col 12}{stata htmltab2stata , url(table.html)}

{p 4 8}Returns:{p_end}

{col 12}{com}. list
{col 12}
{col 12}     +---------------------------------+
{col 12}     |    myvar1       myvar2   myvar3 |
{col 12}     |---------------------------------|
{col 12}  1. |   Country   Population      GDP |
{col 12}  2. | Country A           10      100 |
{col 12}  3. | Country B           20        5 |
{col 12}  4. | Country C          500       10 |
{col 12}     +---------------------------------+
{reset}

{p 4 8}To use the first row as variable names, the option {cmd:firstrow} is required:{p_end}

{col 12}{stata htmltab2stata , url(table.html) firstrow}

{col 12}{com}. list
{col 12}
{col 12}     +----------------------------+
{col 12}     |   Country   Popula~n   GDP |
{col 12}     |----------------------------|
{col 12}  1. | Country A         10   100 |
{col 12}  2. | Country B         20     5 |
{col 12}  3. | Country C        500    10 |
{col 12}     +----------------------------+
{reset}

{p 4 8}To process Table 2, use the first row as variable names, and add the URL of the hyperlink as text to the contents:{p_end}

{col 12}{stata htmltab2stata , url(table.html) firstrow tablenumber(2) href}

{col 12}{com}. list
{col 12}
{col 12}     +----------------------------------------------+
{col 12}     | Firstn~e    Surname                  Webpage |
{col 12}     |----------------------------------------------|
{col 12}  1. |     Adam      Smith                     none |
{col 12}  2. |    Allan   Richards   www.google.com webpage |
{col 12}  3. |  Richard    Johnson                          |
{col 12}     +----------------------------------------------+
{reset}

{marker about}{title:Author}

{p 4}Jan Ditzen (Heriot-Watt University){p_end}
{p 4}Email: {browse "mailto:j.ditzen@hw.ac.uk":j.ditzen@hw.ac.uk}{p_end}
{p 4}Web: {browse "www.jan.ditzen.net":www.jan.ditzen.net}{p_end}

{marker changelog}{title:Changelog}

{p 4 8}This version: 1.0{p_end}
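
{marker destring}{title:Example: converting imported columns}

{p 4 8}A minimal sketch of a possible follow-up step; it is not part of {cmd:htmltab2stata} itself. It assumes that the columns loaded in the {cmd:firstrow} example above are stored as strings, which this help file does not state. If they are already numeric, {help destring} reports this and leaves them unchanged. The variable names are those of Table 1 in {it:table.html}.{p_end}

{col 12}{com}. htmltab2stata , url(table.html) firstrow
{col 12}. describe
{col 12}. destring Population GDP, replace
{col 12}. summarize Population GDP
{reset}

{p 4 8}{cmd:describe} shows the storage types after loading, so the assumption can be checked before {cmd:destring} is run.{p_end}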