Extracting tables from Wikipedia using a bookmarklet

Continuing the theme of bookmarklets from my previous post, I decided to create another bookmarklet to extract Wikipedia tables into CSV files. A similar tool can be found here.

Credit goes to this Stack Overflow answer for the code that downloads JavaScript arrays as CSV files. This project’s GitHub repository is linked here.
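The download step can be sketched roughly as follows. This is a minimal illustration of the general technique (build CSV text, wrap it in a Blob, click a temporary download link), not the Stack Overflow code itself. The names csvField, toCsv and downloadCsv are hypothetical, and downloadCsv only works in a browser.

```javascript
// Quote a field when it contains a comma, a double quote or a line break,
// doubling any embedded double quotes (the usual RFC 4180 convention).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Join fields with commas and rows with CRLF line endings.
function toCsv(rows) {
  return rows.map(row => row.map(csvField).join(",")).join("\r\n");
}

// Browser-only: wrap the CSV in a Blob and click a temporary <a download> link.
function downloadCsv(filename, rows) {
  const blob = new Blob([toCsv(rows)], { type: "text/csv;charset=utf-8" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename; // suggested name for the saved file
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Free the object URL once the download has had time to start.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

For example, downloadCsv("example.csv", [["Name", "Population"], ["A", "1"]]) would save a two-row CSV file.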

Why this bookmarklet?

Several of my Data Science projects need data from Wikipedia pages. Instead of copying, pasting and formatting the tables one by one, I thought it would be great to have a bookmarklet that downloads the data I need into a CSV file that libraries such as pandas can read. It was also a good opportunity to learn how a bookmarklet can download content as a file.

Features

How to get the bookmarklet?
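A bookmarklet is an ordinary bookmark whose URL begins with javascript: instead of http:. As a rough sketch (makeBookmarklet is a hypothetical helper, not part of this project), source code can be turned into such a URL like this:

```javascript
// Hypothetical helper: wrap JavaScript source in an immediately invoked
// function and percent-encode it as a javascript: URL.
// The wrapper keeps variables out of the page's global scope, and because the
// function returns undefined, the browser does not replace the page content.
function makeBookmarklet(source) {
  const wrapped = "(function(){" + source + "})();";
  return "javascript:" + encodeURIComponent(wrapped);
}
```

Pasting the returned string into the URL field of a new bookmark creates the bookmarklet; clicking that bookmark then runs the code on the current page.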

Using the bookmarklet
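Under the hood, extracting a table means walking its rows and cells. A minimal sketch, assuming the target tables carry the "wikitable" class that Wikipedia commonly uses for data tables (tableToRows and findWikiTables are hypothetical names, not this project's code):

```javascript
// Read a <table> element into an array of rows of trimmed cell text.
// It relies only on the standard table.rows / row.cells collections, so merged
// cells (rowspan/colspan) are not expanded, and <br> line breaks are dropped
// by textContent rather than turned into newlines.
function tableToRows(table) {
  return Array.from(table.rows, row =>
    Array.from(row.cells, cell => cell.textContent.trim())
  );
}

// Browser-only: collect the tables on the page that look like data tables.
function findWikiTables(doc) {
  return Array.from(doc.querySelectorAll("table.wikitable"));
}
```

In a browser, tableToRows(findWikiTables(document)[0]) would give the first such table as an array of string arrays, ready to be converted to CSV.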

Example 1

Example 2

Gotchas/To-do

For this section, refer to the original table on this Wikipedia page:

…and its resulting CSV file:

Multi-line content
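Assuming the trouble here comes from cells whose text spans several lines, one possible workaround is to flatten each cell before writing it out. This is a hypothetical sketch (flattenCellText is not part of the project); quoting the field, as RFC 4180 allows, is the alternative that keeps the line breaks.

```javascript
// Collapse line breaks and runs of whitespace inside a cell into single spaces,
// so each table row stays on one CSV line. In JavaScript, \s also matches the
// non-breaking space (\u00A0), which is common in Wikipedia tables.
function flattenCellText(text) {
  return text.replace(/\s+/g, " ").trim();
}
```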

Multi-row/multi-column tables
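Cells that span several rows or columns break the simple one-cell-per-slot assumption. One possible approach, sketched here under the assumption that each cell has already been read into a hypothetical { text, rowspan, colspan } object (for example from cell.textContent, cell.rowSpan and cell.colSpan), is to copy each merged cell's text into every slot it covers:

```javascript
// Expand rowspan/colspan into a rectangular grid of strings.
function expandSpans(rows) {
  const grid = [];
  rows.forEach((row, r) => {
    grid[r] = grid[r] || [];
    let c = 0;
    for (const cell of row) {
      // Skip slots already filled by a rowspan from an earlier row.
      while (grid[r][c] !== undefined) c++;
      const rowspan = cell.rowspan || 1;
      const colspan = cell.colspan || 1;
      for (let dr = 0; dr < rowspan; dr++) {
        grid[r + dr] = grid[r + dr] || [];
        for (let dc = 0; dc < colspan; dc++) {
          grid[r + dr][c + dc] = cell.text;
        }
      }
      c += colspan;
    }
  });
  // Drop rows created only by overlong rowspans, and fill any holes with "".
  return grid.slice(0, rows.length).map(row => Array.from(row, v => v ?? ""));
}
```

Duplicating the text keeps every CSV row the same length, which is usually what pandas expects, at the cost of repeating values that appear only once in the original table.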

Conclusion

This bookmarklet is a convenient way to extract simple tables into CSV files, and its automatic naming means the files do not have to be renamed by hand into something interpretable. More complex tables, such as those with multi-line content or cells spanning several rows or columns, will need more work.
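As an illustration of what automatic naming can look like, here is a hypothetical scheme (makeFilename is not necessarily how this bookmarklet does it): combine the page title, for example from document.title, with the table caption, for example from table.caption, or with the table's position on the page when there is no caption.

```javascript
// Hypothetical naming scheme: "<page title>_<caption or table_N>.csv",
// with characters that are awkward in file names replaced by underscores.
function makeFilename(pageTitle, caption, index) {
  const title = pageTitle.replace(/ - Wikipedia$/, ""); // English Wikipedia title suffix
  const label = caption ? caption : "table_" + (index + 1);
  const slug = (title + "_" + label)
    .trim()
    .replace(/[^\w-]+/g, "_") // replace runs of unsafe characters with "_"
    .replace(/^_+|_+$/g, ""); // drop leading/trailing underscores
  return slug + ".csv";
}
```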

Let me know in the Disqus section below if you’ve used this bookmarklet, and what you think of it!
