I was thinking about scuttering and useful ways to gather information. It occurred to me that, as well as following rdfs:seeAlso links as usual, it was worth scanning the associated HTML for semantic links (e.g. FOAF autodiscovery metadata). I decided to use the blogroll at Planet RDF as a testing ground, on the assumption that if anyone was going to embed useful metadata in their HTML, it would be the hackers listed there.
I should note that when Dave puts together the blogroll, he generally insists that the RSS we point to is parseable RDF, and not just tag soup that carries no meaning to a semantic web-aware client. This is well in keeping with the theme of the site and gives us a useful starting point for scuttering.
I hacked up some rough code pretty quickly (cribbing and adapting from some Mark Pilgrim code where necessary). It visits each RSS file in the blogroll with an RDF parser and finds the channel link. It then downloads the HTML from that link and looks for link tags pointing to RDF/XML. Finally, it outputs a new blogroll augmented with extra rdfs:seeAlso links, and combines all the discovered RDF into a single model.
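For the curious, the HTML-scanning step might look roughly like the sketch below. This is not the code I actually ran, just a minimal illustration using only the Python standard library. The names RDFLinkFinder and find_rdf_links are made up for this example, and it assumes the autodiscovery links are marked with type="application/rdf+xml".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RDFLinkFinder(HTMLParser):
    """Collect the href of every <link> tag typed as application/rdf+xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr = dict(attrs)
        link_type = (attr.get("type") or "").strip().lower()
        href = attr.get("href")
        # Assumption: RDF autodiscovery links carry this MIME type.
        if link_type == "application/rdf+xml" and href:
            self.links.append(urljoin(self.base_url, href))


def find_rdf_links(html, base_url):
    """Return absolute URLs of RDF documents advertised in the HTML."""
    finder = RDFLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Each URL returned would then be a candidate for an extra rdfs:seeAlso entry in the augmented blogroll. Fetching the pages and parsing the RDF itself is left out here.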
I ran the code (full log text), and it gathered a big bundle of information about the bloggers and an augmented blogroll.
I discovered that in the 33 weblogs listed:
When I've got some more hacking time, I'll get back to this dataset and do some more analysis. Smushing the blogroll against the gathered data (via the weblog property, an IFP), it'll be possible to build a little visualisation of who knows who on the Planet RDF planet, and gather some extra info to put on the site itself (thumbnail author images, for example). It'd be great to see more people put a link to their FOAF in their blog HTML.
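To give a flavour of what smushing on an IFP means, here is a rough sketch over plain Python tuples rather than a real RDF model. The smush function and the tuple representation are assumptions made for illustration; a real implementation would work against an RDF store. It is a single pass that treats any two nodes sharing a foaf:weblog value as the same node, and it does not handle a node with several weblogs or chains of merges.

```python
WEBLOG = "http://xmlns.com/foaf/0.1/weblog"


def smush(triples, ifp=WEBLOG):
    """Merge nodes that share a value for an inverse functional property.

    triples is an iterable of (subject, predicate, object) tuples.
    Returns a set of triples with merged nodes renamed to one canonical node.
    """
    triples = list(triples)
    canonical = {}  # IFP value -> first subject seen with that value
    rename = {}     # subject -> canonical subject it is merged into

    for s, p, o in triples:
        if p == ifp:
            # The first subject seen for a weblog becomes the canonical node.
            rename[s] = canonical.setdefault(o, s)

    def resolve(node):
        return rename.get(node, node)

    # Rewrite both subjects and objects so links like foaf:knows still join up.
    return {(resolve(s), p, resolve(o)) for s, p, o in triples}
```

After smushing, statements gathered from a blogger's FOAF file and statements from the blogroll end up hanging off the same node, which is what makes the who-knows-who picture possible.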
There's a job being advertised at BBC Radio and Music Interactive in London, where I work. It involves Python, XML, CMSes, digital radio and other interesting technologies. You'd like it there.
UPDATE: Applications have now closed.
FURTHER UPDATE: for unknown reasons, this page is (at the time of writing) the number one hit on Google for the search term "work at the bbc" (and perhaps some similar terms). For those who come here looking for work, I suggest you look at the BBC Jobs site or perhaps BBC Talent, where budding writers, presenters and DJs are sought.