hackdiary

Sha1ing, smushing and aggregating FOAF

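To normalise and aggregate FOAF metadata related to photographs, I needed some new code to convert foaf:mbox addresses to foaf:mbox_sha1sum, smush merged RDF sources, and extract depictions from the result.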

So I wrote foaftool, a Java class that uses Jena. The tarball also contains a couple of servlets that can be used to transform existing content on the web.

The first servlet transforms foaf:mbox triples in FOAF data into foaf:mbox_sha1sum triples, whose value is the hex-encoded SHA1 hash of the mailto: URI. This makes Edd's FOAF file look like this. An extra querystring parameter optionally converts foaf:depicts triples to foaf:depiction triples. Because foaf:depicts points from the picture to the thing depicted, the conversion also swaps subject and object. foaf:depicts isn't in the official FOAF schema at the time of writing, although it is in informal use in many places because it sometimes makes for more elegant modeling. Normalising to foaf:depiction makes working with large amounts of FOAF data simpler, since code only has to look for one property.
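The two transformations the first servlet performs can be sketched in plain Java. This is a hypothetical illustration, not foaftool's code: to keep it runnable with only the JDK, triples are modelled as string triples rather than Jena objects, and the names FoafNormaliser, sha1Hex and normalise are made up. It assumes the foaf:mbox object is a full mailto: URI and hashes that whole string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not foaftool's code: plain string triples stand in
// for Jena statements so the example needs nothing beyond the JDK.
final class FoafNormaliser {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";
    static final String MBOX = FOAF + "mbox";
    static final String MBOX_SHA1 = FOAF + "mbox_sha1sum";
    static final String DEPICTS = FOAF + "depicts";
    static final String DEPICTION = FOAF + "depiction";

    /** A subject-predicate-object triple; objects may be URIs or literals. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    /** Lowercase hex SHA-1 of the UTF-8 bytes of the input. */
    static String sha1Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support SHA-1", e);
        }
    }

    /**
     * Rewrites "X foaf:mbox mailto:..." to "X foaf:mbox_sha1sum <hex sha1>".
     * If flipDepicts is true, "img foaf:depicts X" becomes "X foaf:depiction img".
     * All other triples pass through unchanged.
     */
    static List<Triple> normalise(List<Triple> in, boolean flipDepicts) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : in) {
            if (t.p.equals(MBOX)) {
                out.add(new Triple(t.s, MBOX_SHA1, sha1Hex(t.o)));
            } else if (flipDepicts && t.p.equals(DEPICTS)) {
                out.add(new Triple(t.o, DEPICTION, t.s)); // swap subject and object
            } else {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, normalising "_:edd foaf:mbox mailto:edd@example.org" (a made-up address) yields a single foaf:mbox_sha1sum triple whose value is sha1Hex("mailto:edd@example.org"); the plaintext address no longer appears in the output.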

Writing the smushing code was an entertaining diversion. Smushing, merging nodes that share an identifying property such as foaf:mbox_sha1sum, is important when combining multiple RDF sources. Say you have two sources, edd1.rdf and edd2.rdf, showing where to find photos of Edd. When merged, the graph structure looks like this:

[Figure: merged graph of edd1.rdf and edd2.rdf before smushing]

Without smushing, the two anonymous nodes that both carry Edd's email address are not recognised as the same node. The smushed version (smushed on mbox_sha1sum using a foaftool servlet) looks like this:

[Figure: the same graph after smushing on mbox_sha1sum]
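A minimal smushing sketch, under similar assumptions: string triples instead of Jena, blank nodes from different sources already given distinct ids (_:a, _:b), and smushing on foaf:mbox_sha1sum only. Nodes sharing a sha1sum are grouped with a small union-find, so chains of shared values are merged as well; every triple is then rewritten to use its group's representative, and exact duplicates are dropped. Smusher and its methods are hypothetical names, not foaftool's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of smushing on foaf:mbox_sha1sum, not foaftool's code.
final class Smusher {
    static final String MBOX_SHA1 = "http://xmlns.com/foaf/0.1/mbox_sha1sum";

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // Union-find over node ids: a node absent from the map is its own root.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        String p = parent.getOrDefault(x, x);
        if (p.equals(x)) return x;
        String root = find(p);
        parent.put(x, root); // path compression
        return root;
    }

    private void union(String a, String b) {
        String ra = find(a), rb = find(b);
        if (!ra.equals(rb)) parent.put(rb, ra); // earlier group's root wins
    }

    static List<Triple> smush(List<Triple> merged) {
        Smusher uf = new Smusher();
        Map<String, String> firstOwner = new HashMap<>(); // sha1sum -> first node seen
        for (Triple t : merged) {
            if (t.p.equals(MBOX_SHA1)) {
                String owner = firstOwner.putIfAbsent(t.o, t.s);
                if (owner != null) uf.union(owner, t.s);
            }
        }
        // Rewrite subjects and objects to representatives. This sketch does not
        // distinguish literals from node ids; literals are never union-find keys.
        Set<Triple> out = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (Triple t : merged) {
            out.add(new Triple(uf.find(t.s), t.p, uf.find(t.o)));
        }
        return new ArrayList<>(out);
    }
}
```

Merging "_:a foaf:mbox_sha1sum X" and "_:a foaf:name Edd" from one source with "_:b foaf:mbox_sha1sum X" and "_:b foaf:depiction img" from another leaves three triples, all with _:a as subject.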

With the data normalised and merged, I want to extract only the triples of the form "X foaf:depiction [picture URI]", together with the related foaf:name and foaf:mbox_sha1sum triples. With the foaftool code, I can now merge and extract depictions from any number of RDF sources.
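The extraction step might look like the following, again as a hypothetical JDK-only sketch over string triples rather than foaftool's actual code. It assumes the input has already been normalised and smushed, collects every subject that has a foaf:depiction, and then keeps the depiction triples plus those subjects' foaf:name and foaf:mbox_sha1sum triples, in input order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of depiction extraction, not foaftool's code.
final class DepictionExtractor {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";
    static final String DEPICTION = FOAF + "depiction";
    static final String NAME = FOAF + "name";
    static final String MBOX_SHA1 = FOAF + "mbox_sha1sum";

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    static List<Triple> extract(List<Triple> smushed) {
        // Pass 1: which subjects have at least one depiction?
        Set<String> depicted = new HashSet<>();
        for (Triple t : smushed) {
            if (t.p.equals(DEPICTION)) depicted.add(t.s);
        }
        // Pass 2: keep depictions and the identifying triples of depicted subjects.
        List<Triple> out = new ArrayList<>();
        for (Triple t : smushed) {
            boolean identifying = t.p.equals(NAME) || t.p.equals(MBOX_SHA1);
            if (t.p.equals(DEPICTION) || (identifying && depicted.contains(t.s))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Running the smushing step first matters here: without it, a name and a depiction stated about the same person in different files would sit on different nodes, and the name would be dropped.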

Comments and bug fixes are very welcome; the code has only been tested as far as the JUnit tests included in the tarball go.

foaf, java, rdf
Posted by Matt Biddulph at February 3, 2003 12:02 AM
