R: readHTMLTable

I recently found another useful function that can save you time when working with R. Let's say we have some tabular data on an HTML page, such as the Wikipedia page on transistor counts. With the readHTMLTable function from the XML package, reading it into R could not be simpler:

> install.packages("XML")
> library(XML)
> d <- readHTMLTable("http://en.wikipedia.org/wiki/Transistor_count")
> class(d)
[1] "list"

Each table on the page was extracted and converted into a data.frame, and the results were returned together as a list. Let's have a look at the first one:

> table_a <- d[[1]]
> head(table_a)
            Processor Transistor count Date of introduction   Manufacturer Process   Area
1          Intel 4004            2,300                 1971          Intel   10 µm 12 mm²
2          Intel 8008            3,500                 1972          Intel   10 µm 14 mm²
3 MOS Technology 6502         3,510[1]                 1975 MOS Technology    8 μm 21 mm²
4       Motorola 6800            4,100                 1974       Motorola    6 μm 16 mm²
5          Intel 8080            4,500                 1974          Intel    6 μm 20 mm²
6            RCA 1802            5,000                 1974            RCA    5 μm 27 mm²

The header cells from the table were used as column names.
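
The printed values also show that the numbers arrive as text: the counts contain thousands separators and, in one row, a footnote marker ("3,510[1]"). Here is a minimal sketch of one way to turn them into numbers. The small data frame is rebuilt by hand from the rows printed above so that the example runs without a network connection, and clean_count is a hypothetical helper name, not something provided by the XML package.

```r
# Toy data frame copied from the first rows printed above; in a real session
# this would be table_a as returned by readHTMLTable.
table_a <- data.frame(
  Processor = c("Intel 4004", "Intel 8008", "MOS Technology 6502"),
  `Transistor count` = c("2,300", "3,500", "3,510[1]"),
  `Date of introduction` = c("1971", "1972", "1975"),
  check.names = FALSE,        # keep the column names with spaces as they are
  stringsAsFactors = FALSE
)

# clean_count is a hypothetical helper, not part of the XML package.
clean_count <- function(x) {
  x <- as.character(x)                  # factors become plain strings first
  x <- gsub("\\[[^]]*\\]", "", x)       # drop footnote markers such as "[1]"
  x <- gsub(",", "", x, fixed = TRUE)   # drop thousands separators
  as.numeric(x)
}

table_a$transistors <- clean_count(table_a[["Transistor count"]])
table_a$year <- as.integer(as.character(table_a[["Date of introduction"]]))
print(table_a[, c("Processor", "transistors", "year")])
```

The as.character step matters because, depending on the R version and options, text columns may come back as factors, and calling as.numeric on a factor returns its internal level codes rather than the values shown.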

Other read functions

It’s worth mentioning that other read functions (such as read.table or read.csv) can also read documents hosted on a server: pass the URL in place of a file name and there is no need to download the file first. I wish I had learned this sooner.
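
As a rough illustration, the sketch below passes a URL to read.csv in place of a file name. To keep it runnable offline, it writes a tiny made-up CSV to a temporary file and addresses it through a file:// URL. The file contents and the variable names (tmp, csv_url, cpus) are assumptions made for the example. An http:// address would be passed in the same way.

```r
# Write a small CSV to a temporary file so the example runs offline.
tmp <- tempfile(fileext = ".csv")
writeLines(c("processor,year", "Intel 4004,1971", "Intel 8008,1972"), tmp)

# Build a file:// URL that stands in for an address on a remote server.
csv_url <- paste0("file://", normalizePath(tmp, winslash = "/"))

# read.csv opens the URL itself; no separate download step is needed.
cpus <- read.csv(csv_url, stringsAsFactors = FALSE)
print(cpus)
```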
