Sha1ing, smushing and aggregating FOAF

February 3rd, 2003  |  Published in foaf, java, rdf  |  1 Comment

To normalise and aggregate FOAF metadata related to photographs, I needed some new code to:

  • convert foaf:mbox entries to privacy-protected foaf:mbox_sha1sum entries.
  • normalise statements of the form “PICTURE depicts PERSON” to “PERSON depiction PICTURE”.
  • smush disparate references to the same person into references to a single definition of that person.
  • extract depiction triples from a model and copy just the bare minimum of information related to those depictions.

So I wrote foaftool, a Java class that uses Jena. The tarball also contains a couple of servlets that can be used to transform existing content on the web.


The first servlet will transform foaf:mbox triples in FOAF data into appropriately-encoded foaf:mbox_sha1sum triples. This makes Edd’s FOAF file look like this. Using an extra querystring parameter, it optionally converts foaf:depicts triples to foaf:depiction. foaf:depicts isn’t actually in the official FOAF schema at the time of writing, although it is in informal use in many places as it sometimes makes for more elegant modeling. Normalising to foaf:depiction makes working with large amounts of FOAF data simpler.
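To make the two transformations concrete, here is a minimal sketch in plain Java. It is not foaftool's actual Jena code: triples are modelled as three-element lists of strings, and the class name, method names and example address are made up for illustration. It assumes the usual FOAF convention that the SHA-1 is taken over the whole mailto: URI, including the scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the foaftool tarball.
class FoafNormaliseSketch {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    /** Lowercase hex SHA-1 of a string, e.g. of a full "mailto:..." URI. */
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /**
     * Rewrites {subject, predicate, object} triples:
     * foaf:mbox becomes foaf:mbox_sha1sum, and, if requested,
     * "PICTURE depicts PERSON" becomes "PERSON depiction PICTURE".
     */
    static List<List<String>> normalise(List<List<String>> triples, boolean invertDepicts) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> t : triples) {
            if (t.get(1).equals(FOAF + "mbox")) {
                out.add(Arrays.asList(t.get(0), FOAF + "mbox_sha1sum", sha1Hex(t.get(2))));
            } else if (invertDepicts && t.get(1).equals(FOAF + "depicts")) {
                out.add(Arrays.asList(t.get(2), FOAF + "depiction", t.get(0)));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> in = Arrays.asList(
            Arrays.asList("_:person", FOAF + "mbox", "mailto:someone@example.org"),
            Arrays.asList("http://example.org/photo.jpg", FOAF + "depicts", "_:person"));
        for (List<String> t : normalise(in, true)) {
            System.out.println(t);
        }
    }
}
```

The boolean flag plays the role of the servlet's optional querystring parameter: with it off, foaf:depicts triples pass through untouched.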

Writing the smushing code was an entertaining diversion. Smushing is important when merging multiple RDF sources. Say you have two sources, edd1.rdf and edd2.rdf, showing where to find photos of Edd. When merged, the graph structure looks like this:

[Image: graph of edd1.rdf and edd2.rdf merged without smushing]

This is because without smushing, the anonymous nodes that both have Edd’s email address are not equated. The smushed version (smushed on mbox_sha1sum using a foaftool servlet) looks like this:

[Image: graph of edd1.rdf and edd2.rdf after smushing]
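The idea behind smushing can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the foaftool implementation: triples are three-element string lists, anonymous nodes are written as made-up labels such as "_:a", and any two nodes sharing a foaf:mbox_sha1sum value are treated as the same person. A small union-find keeps the merging well-behaved even if a node carries more than one hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the foaftool tarball.
class SmushSketch {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    /** Union-find lookup: follows parent links to the representative node. */
    static String find(Map<String, String> parent, String node) {
        String p = parent.get(node);
        if (p == null || p.equals(node)) {
            return node;
        }
        String root = find(parent, p);
        parent.put(node, root); // path compression
        return root;
    }

    /** Merges nodes that share an mbox_sha1sum and drops duplicate triples. */
    static Set<List<String>> smush(List<List<String>> triples) {
        Map<String, String> parent = new HashMap<>();
        Map<String, String> firstNodeForHash = new HashMap<>();

        // Pass 1: link every node to the first node seen with the same hash.
        for (List<String> t : triples) {
            if (!t.get(1).equals(FOAF + "mbox_sha1sum")) {
                continue;
            }
            String first = firstNodeForHash.putIfAbsent(t.get(2), t.get(0));
            if (first != null) {
                String a = find(parent, first);
                String b = find(parent, t.get(0));
                if (!a.equals(b)) {
                    parent.put(b, a);
                }
            }
        }

        // Pass 2: rewrite subjects and objects to their representative node.
        Set<List<String>> merged = new LinkedHashSet<>();
        for (List<String> t : triples) {
            merged.add(Arrays.asList(
                find(parent, t.get(0)), t.get(1), find(parent, t.get(2))));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> merged = Arrays.asList(
            Arrays.asList("_:a", FOAF + "mbox_sha1sum", "abc123"),
            Arrays.asList("_:a", FOAF + "depiction", "http://example.org/1.jpg"),
            Arrays.asList("_:b", FOAF + "mbox_sha1sum", "abc123"),
            Arrays.asList("_:b", FOAF + "depiction", "http://example.org/2.jpg"));
        for (List<String> t : smush(merged)) {
            System.out.println(t);
        }
    }
}
```

In the example, "_:a" and "_:b" stand in for the two anonymous Edd nodes; after smushing, both depictions hang off a single node and the duplicated mbox_sha1sum triple collapses into one.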

With the data in normalised and merged form, I want to extract just the triples of the form “X foaf:depiction [picture uri]” and the related foaf:name and foaf:mbox_sha1sum triples. With the foaftool code, I can now merge and extract depictions from any number of RDF sources.
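The extraction step might look roughly like the following. Again this is a sketch in plain Java rather than foaftool's Jena code, with triples as three-element string lists and all class and method names invented for the example. It keeps the foaf:depiction triples plus the foaf:name and foaf:mbox_sha1sum triples of any subject that has at least one depiction, and discards everything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the foaftool tarball.
class DepictionExtractSketch {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    /** Returns only the depiction, name and mbox_sha1sum triples of depicted subjects. */
    static List<List<String>> extract(Collection<List<String>> triples) {
        // Subjects that have at least one foaf:depiction.
        Set<String> depicted = new HashSet<>();
        for (List<String> t : triples) {
            if (t.get(1).equals(FOAF + "depiction")) {
                depicted.add(t.get(0));
            }
        }

        // The "bare minimum" properties to carry over for those subjects.
        Set<String> keep = new HashSet<>(Arrays.asList(
            FOAF + "depiction", FOAF + "name", FOAF + "mbox_sha1sum"));

        List<List<String>> out = new ArrayList<>();
        for (List<String> t : triples) {
            if (depicted.contains(t.get(0)) && keep.contains(t.get(1))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> model = Arrays.asList(
            Arrays.asList("_:a", FOAF + "name", "Example Person"),
            Arrays.asList("_:a", FOAF + "mbox_sha1sum", "abc123"),
            Arrays.asList("_:a", FOAF + "homepage", "http://example.org/"),
            Arrays.asList("_:a", FOAF + "depiction", "http://example.org/1.jpg"),
            Arrays.asList("_:b", FOAF + "name", "Nobody Pictured"));
        for (List<String> t : extract(model)) {
            System.out.println(t);
        }
    }
}
```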

Comments and bugfixes are very welcome; the code has only been tested as far as the JUnit tests included in the tarball.

Responses

  1. Raw Blog says:

    July 11th, 2003 at 12:56 am (#)

    Identifying things in FOAF

    danbri : Identifying things in FOAF Longish piece starting with “identity management” Short version: In FOAF, we use URIs to…