Wednesday, November 28, 2007

Transitive reduction

Quick note to self, having stumbled on the Wikipedia page on transitive reduction. Given a graph like this:

the transitive reduction is:

Note that the original graph has an edge a -> d, but this is absent after the reduction because we can get from a to d via b (or c).


What's the point? Well, it occurs to me that a quick way of harvesting information about existing taxonomies (e.g., if we want to assemble an all embracing classification of life to hep navigate a database) is to make use of the titles of taxonomic papers, e.g., the title Platyprotus phyllosoma, gen. nov., sp. nov., from Enderby Land, Antarctica, an unusual munnopsidid without natatory pereopods (Crustacea: Isopoda: Asellota) gives us:

Crustacea -> Isopoda -> Asellota ->Platyprotus -> phyllosoma

From the paper, and or other sources we can get paths such as Asellota -> Munnopsididae -> Platyprotus and Isopoda -> Munnopsididae. Imagine that we have a set of these paths, and want to assemble a classification (for example, we want to grow the Species 2000 classification, which lacks this isopod). Here's the graph:


This clearly gives us information on the classification of the isopod, but it's not a hierarchy. The transitive reduction, however, is:



It would be fun to explore using this technique to mine taxonomic papers and automate the extraction of classifications, as well as names.

No comments: