Friday, June 13, 2008

From PDFs to Google Earth


I've added a service to bioGUID that takes a PDF and attempts to extract latitude and longitude data from it, returning the co-ordinates either as a Google Earth KML file or as JSON. It's one of several services I'm adding to bioGUID to support the data mining that I'm doing.
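For a sense of what the two output formats involve, here is a minimal Python sketch that writes a list of named points as KML and as JSON. It is not the bioGUID code: the function names, the JSON field names, and the sample point are all made up for illustration, and I'm assuming the OGC KML 2.2 namespace. The one thing worth remembering is that KML writes co-ordinates as longitude first, then latitude.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the OGC KML 2.2 namespace; the real service may declare another.
KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points):
    """Serialise (name, latitude, longitude) tuples as a KML document string."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects "longitude,latitude", not the other way round.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


def to_json(points):
    """Serialise the same tuples as a JSON array (field names are hypothetical)."""
    return json.dumps(
        [{"name": n, "latitude": lat, "longitude": lon} for n, lat, lon in points]
    )


# Hypothetical sample point, not taken from any paper.
sample = [("Site 1", 6.75, 80.5)]
kml_text = to_kml(sample)
json_text = to_json(sample)
```

Saving `kml_text` to a file with a `.kml` extension should be enough for Google Earth to open it.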

To see what it can do, try this URL to get a list of localities in the paper "Description of eight new species of shrub frogs (Ranidae: Rhacophorinae: Philautus) from Sri Lanka".

Then try this one to get the KML file, and open it in Google Earth. The service uses a bunch of regular expressions to try to extract latitude and longitude pairs from the text (needless to say, there are nearly as many different ways to write a latitude and longitude as there are authors).
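To give a flavour of the regular expression approach, here is a rough Python sketch that handles just one common style: whole degrees, optional minutes and seconds, and a hemisphere letter, with latitude written before longitude. The pattern and function names are my own for this example and are not the expressions the service actually uses, which have to cope with many more variants (decimal degrees, hemisphere letter first, and so on).

```python
import re

# One co-ordinate: whole degrees, optional minutes, optional seconds.
COORD = (
    r"(\d{1,3})\s*°\s*"                     # degrees
    r"(?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?"   # optional minutes
    r"(?:(\d{1,2}(?:\.\d+)?)\s*[\"″]\s*)?"  # optional seconds
)

# A latitude (N/S) followed by a longitude (E/W), separated by spaces, commas or semicolons.
PAIR = re.compile(COORD + r"([NS])[\s,;]+" + COORD + r"([EW])")


def to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds strings to signed decimal degrees."""
    value = float(degrees) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in "SW" else value


def extract_pairs(text):
    """Return a list of (latitude, longitude) tuples found in the text."""
    pairs = []
    for match in PAIR.finditer(text):
        groups = match.groups()
        pairs.append((to_decimal(*groups[:4]), to_decimal(*groups[4:])))
    return pairs


# Hypothetical snippets in the style of a locality description.
print(extract_pairs("collected at 06°45'N, 80°30'E, alt. 1500 m"))
print(extract_pairs("33°51'30\"S 151°12'36\"E"))
```

Each new way of writing a co-ordinate means another pattern, which is why the real thing ends up as a bunch of expressions rather than one.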

The ultimate aim is to assemble a bunch of Open Access PDFs (say, from Zootaxa), run them through this service, then display the result on Google Earth. Think of it as a geography of taxonomy.

Oh, and the irony of me criticising GBIF for displaying poor quality data, then adding to the problem by providing a service that extracts yet more co-ordinates of possibly doubtful validity, has not entirely escaped me...
