<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hackdiary &#187; java</title>
	<atom:link href="http://www.hackdiary.com/category/java/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.hackdiary.com</link>
	<description></description>
	<lastBuildDate>Mon, 05 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Update: Screenscraping HTML with TagSoup and XPath</title>
		<link>http://www.hackdiary.com/2003/12/28/update-screenscraping-html-with-tagsoup-and-xpath/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=update-screenscraping-html-with-tagsoup-and-xpath</link>
		<comments>http://www.hackdiary.com/2003/12/28/update-screenscraping-html-with-tagsoup-and-xpath/#comments</comments>
		<pubDate>Sun, 28 Dec 2003 20:14:41 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=44</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><i>UPDATE: <a href="http://blog.oroup.com/2006/11/05/the-joys-of-screenscraping/">Oliver Roup</a> has published updated code that uses the builtin XPath processor in JDK 1.5</i></p>
<p>Some emails and comments on <a href="http://www.hackdiary.com/archives/000029.html">Screenscraping HTML with TagSoup and XPath</a> alerted me to the fact that the example I gave on that page has gone out of sync with the current release of JDOM and no longer works. I&#8217;ve reworked the example using <a href="http://xml.apache.org/xalan-j">Xalan 2.5</a>.</p>
<p><span id="more-44"></span><br />
The problem seems to be that JDOM is asking the TagSoup parser for full namespace support, which it&#8217;s not able to give. This new example uses Xalan&#8217;s SAX2DOM class to make a DOM tree out of the TagSoup SAX stream, then uses the simple XPathAPI wrapper to make the XPath call.</p>
<pre class="codeblock">import java.net.URL;
import org.apache.xalan.xsltc.trax.SAX2DOM;
import org.apache.xpath.XPathAPI;
import org.apache.xpath.objects.XObject;
import org.ccil.cowan.tagsoup.Parser;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
&nbsp;
public class example {
&nbsp;public final static void main(String[] args) throws Exception {
&nbsp;&nbsp;URL url = new URL("http://example.com");
&nbsp;&nbsp;Parser p = new Parser();
&nbsp;&nbsp;p.setFeature("http://xml.org/sax/features/namespace-prefixes",true);
&nbsp;&nbsp;// to define the html: prefix (off by default)
&nbsp;&nbsp;SAX2DOM sax2dom = new SAX2DOM();
&nbsp;&nbsp;p.setContentHandler(sax2dom);
&nbsp;&nbsp;p.parse(new InputSource(url.openStream()));
&nbsp;&nbsp;Node doc = sax2dom.getDOM();
&nbsp;&nbsp;String titlePath = "/html:html/html:head/html:title";
&nbsp;&nbsp;XObject title = XPathAPI.eval(doc,titlePath);
&nbsp;&nbsp;System.out.println("Title is '"+title+"'");
&nbsp;}
}</pre>
<p>This code example can be compiled and run with just the TagSoup classes and the Xalan 2.5 main jar on the classpath.</p>
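<p>For comparison, here is a rough sketch of the JDK 1.5 route mentioned in the update note, using only the built-in javax.xml.xpath API. It assumes the page has already been turned into a namespace-aware DOM: a small XHTML string stands in for TagSoup's output, and the class name XPathTitle is made up for illustration. The part worth noticing is that the html: prefix has to be bound to the XHTML namespace by hand.</p>

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathTitle {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Evaluate the title XPath against an XHTML string (a stand-in for TagSoup output).
    static String title(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for prefixed XPath steps to match
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "html".equals(prefix) ? XHTML : "";
                }
                // Reverse lookups are not needed for evaluating this expression.
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            return xpath.evaluate("/html:html/html:head/html:title", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Example Domain</title></head><body/></html>";
        System.out.println("Title is '" + title(page) + "'");
    }
}
```

<p>With TagSoup in the picture, the DOM would come from its SAX stream rather than from DocumentBuilder; the XPath half stays the same.</p>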
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/12/28/update-screenscraping-html-with-tagsoup-and-xpath/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>An RDF crawler</title>
		<link>http://www.hackdiary.com/2003/04/21/an-rdf-crawler/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=an-rdf-crawler</link>
		<comments>http://www.hackdiary.com/2003/04/21/an-rdf-crawler/#comments</comments>
		<pubDate>Mon, 21 Apr 2003 14:33:33 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=33</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>I wrote an RDF crawler (aka <a href="http://rdfweb.org/topic/ScutterSpec">scutter</a>) using Java and the Jena RDF toolkit that spiders the web gathering up semantic web data and storing it in any of Jena&#8217;s backend stores (in-memory, Berkeley DB, mysql, etc). <a href="http://www.hackdiary.com/src/hackscutter-0.1.tar.gz">Download it here</a>.</p>
<p><span id="more-33"></span><br />
The system is multithreaded and so can simultaneously download from many sources while the aggregation thread does the processing. It builds a model that remembers the provenance of the RDF and takes care to delete and replace triples if it hits the same URL twice, so you can run it as often as you like to keep the data fresh without bloating the store with out-of-date information. As yet it doesn&#8217;t do anything with what it gathers; the information&#8217;s just sitting there waiting for interesting applications to be built on top of it.</p>
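<p>The delete-and-replace behaviour can be pictured with a toy store like the one below. This is not the Jena-based code from the tarball: ProvenanceStore and its methods are hypothetical names, and triples are plain strings to keep the sketch short.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ProvenanceStore {
    // source URL -> the triples most recently fetched from that URL
    private final Map<String, Set<String>> bySource = new HashMap<String, Set<String>>();

    // Replace everything previously recorded for this source with the fresh triples.
    void replace(String sourceUrl, Set<String> freshTriples) {
        bySource.put(sourceUrl, new HashSet<String>(freshTriples));
    }

    // The merged view: the union of the triples from every source.
    Set<String> allTriples() {
        Set<String> all = new HashSet<String>();
        for (Set<String> triples : bySource.values()) {
            all.addAll(triples);
        }
        return all;
    }

    public static void main(String[] args) {
        ProvenanceStore store = new ProvenanceStore();
        String url = "http://example.com/foaf.rdf";
        store.replace(url, new HashSet<String>(Arrays.asList("a knows b", "a name Alice")));
        // Second visit to the same URL: the stale "a knows b" triple is dropped.
        store.replace(url, new HashSet<String>(Arrays.asList("a name Alice")));
        System.out.println(store.allTriples());
    }
}
```

<p>Keying on the source URL is what makes repeated runs safe: a re-fetch overwrites that source's triples instead of piling new ones on top.</p>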
<p>To use it as distributed, set up a mysql database called &#8220;scutter&#8221; and set the username and password in the DBConnection setup in Scutter.java, then recompile using &#8216;ant compile&#8217; (sorry, no handy config files in this 0.1 release). Run the script scutter.sh, passing in as many starting-point URLs as you like. These will be added to the queue, and any rdfs:seeAlso pointers in the downloaded RDF will be recursively followed until no more unique URLs can be found. The biggest known issue at the moment is that it doesn&#8217;t do proper management to work out when it&#8217;s run out of URLs &#8211; it just stops. The standard log4j.properties file can be edited to change what gets logged &#8211; with full debugging information turned on, you get <a href="http://www.hackdiary.com/misc/scutter.log.txt">quite a lot of output</a>.</p>
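<p>The queue-and-follow loop described above boils down to something like this single-threaded sketch. The names are invented for illustration, and the seeAlso map is a stand-in for downloading a document and reading its rdfs:seeAlso pointers; the real scutter does the downloading on several threads.</p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SeeAlsoCrawl {
    // Visit every URL reachable from the starting points, each exactly once.
    static Set<String> crawl(List<String> start, Map<String, List<String>> seeAlso) {
        Set<String> seen = new LinkedHashSet<String>(start);
        Deque<String> queue = new ArrayDeque<String>(seen);
        while (!queue.isEmpty()) {
            String url = queue.removeFirst();
            List<String> links = seeAlso.get(url);
            if (links == null) {
                continue; // nothing fetched, or no pointers in this document
            }
            for (String next : links) {
                if (seen.add(next)) { // add() returns false for a URL we already know
                    queue.addLast(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = new HashMap<String, List<String>>();
        web.put("http://a.example/foaf.rdf", Arrays.asList("http://b.example/foaf.rdf"));
        web.put("http://b.example/foaf.rdf",
                Arrays.asList("http://a.example/foaf.rdf", "http://c.example/foaf.rdf"));
        System.out.println(crawl(Collections.singletonList("http://a.example/foaf.rdf"), web));
    }
}
```

<p>In a single thread an empty queue is an unambiguous signal that the crawl is finished. With several downloader threads the queue can be empty while a download that will add more URLs is still in flight, so a multithreaded version would presumably also need to track outstanding work before deciding it is done.</p>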
<p>Plans for the future include tying <a href="http://rdfweb.org/foaf/">FOAF</a>-related processing into the aggregation such as <a href="http://www.hackdiary.com/archives/000021.html">smushing and mbox_sha1sum normalising</a>, and making a publish/subscribe-based system so that people who can&#8217;t run their own aggregators can subscribe to the RDF that&#8217;s gathered.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/04/21/an-rdf-crawler/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Screenscraping HTML with TagSoup and XPath</title>
		<link>http://www.hackdiary.com/2003/04/13/screenscraping-html-with-tagsoup-and-xpath/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=screenscraping-html-with-tagsoup-and-xpath</link>
		<comments>http://www.hackdiary.com/2003/04/13/screenscraping-html-with-tagsoup-and-xpath/#comments</comments>
		<pubDate>Sun, 13 Apr 2003 14:40:18 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=32</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Often I find I need to pull out a bit of information from a webpage to reuse inside some code. I&#8217;ve always done this from the commandline using a combination of <a href="http://www.gnu.org/software/wget/wget.html">wget</a>, <a href="http://www.w3.org/People/Raggett/tidy/">HTML TIDY</a> and <a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a>. Recently I&#8217;ve been doing the same thing in program code using some very handy tools written in Java.</p>
<p><i>Note: the example code below has been <a href="http://www.hackdiary.com/archives/000041.html">updated</a>.</i></p>
<p><span id="more-32"></span><br />
The commandline version looks like this:</p>
<p><code>wget -O - http://example.com | tidy -asxml - | xsltproc somexsl.xsl -</code></p>
<p>where somexsl.xsl looks something like this:</p>
<p><code>&lt;?xml version='1.0'?&gt;<br />
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'&gt;<br />
&lt;xsl:output method="text" /&gt;<br />
&lt;xsl:template match="/"&gt;<br />
&lt;xsl:value-of select="/html/head/title" /&gt;<br />
&lt;/xsl:template&gt;<br />
&lt;/xsl:stylesheet&gt;</code></p>
<p>It&#8217;s also possible to do the same thing entirely in Java. John Cowan wrote a wonderful HTML parser called <a href="http://mercury.ccil.org/~cowan/XML/tagsoup/">TagSoup</a> that outputs SAX events using a do-the-best-I-can approach (&#8220;Just Keep On Truckin&#8217;&#8221; as he describes it) that attempts to make the best job of even the nastiest badly-written HTML. It produces output in cases when HTML TIDY gives up and tells you that errors in the input must be corrected before it can continue.</p>
<p>Because the SAX events just look like XML to any downstream code, it can be plugged into an XPath processor such as <a href="http://jaxen.sf.net/">Jaxen</a>. XPath processors need DOM trees to work with (because of the backwards-and-forwards-looking nature of the language which makes streaming processing difficult). <a href="http://www.jdom.org">JDOM</a> contains a nice class called SAXBuilder that can do this SAX-to-DOM conversion, and handily Jaxen can work with JDOM trees directly. So, the Java equivalent of the commandline above is:</p>
<p><code>URL url = new URL("http://example.com");<br />
SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser"); // build a JDOM tree from a SAX stream provided by tagsoup<br />
Document doc = builder.build(url);<br />
JDOMXPath titlePath = new JDOMXPath("/h:html/h:head/h:title");<br />
titlePath.addNamespace("h","http://www.w3.org/1999/xhtml");<br />
String title = ((Element)titlePath.selectSingleNode(doc)).getText();<br />
System.out.println("Title is "+title);</code></p>
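<p>If you would rather not depend on JDOM for the SAX-to-tree step, the JDK's own transformer API can do a similar job. The sketch below is an assumption-laden alternative rather than the code above: SaxToDom and jdkReader are names I made up, and the JDK's stock XML parser stands in for TagSoup so that the example runs without extra jars.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxToDom {
    // Pump the SAX events from any XMLReader into a DOM tree using an identity transform.
    static Document build(XMLReader reader, InputSource input) {
        try {
            DOMResult result = new DOMResult();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new SAXSource(reader, input), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The JDK's default namespace-aware SAX parser, used here in place of TagSoup.
    static XMLReader jdkReader() {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            return spf.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With TagSoup on the classpath, new org.ccil.cowan.tagsoup.Parser() would replace jdkReader().
        Document doc = build(jdkReader(), new InputSource(new StringReader(
                "<html><head><title>Example</title></head></html>")));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```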
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/04/13/screenscraping-html-with-tagsoup-and-xpath/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Using HTTP conditional GET in java for efficient polling</title>
		<link>http://www.hackdiary.com/2003/04/09/using-http-conditional-get-in-java-for-efficient-polling/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=using-http-conditional-get-in-java-for-efficient-polling</link>
		<comments>http://www.hackdiary.com/2003/04/09/using-http-conditional-get-in-java-for-efficient-polling/#comments</comments>
		<pubDate>Wed, 09 Apr 2003 15:52:59 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=31</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re going to download a resource over HTTP from a URL more than once, there are a couple of features of HTTP you should make sure you&#8217;re using. By giving the server some metadata about what you saw when you last downloaded the resource, it can give you a status code indicating that the resource hasn&#8217;t changed and you should continue to use the version you already have.</p>
<p>This issue has been highlighted recently by the bandwidth load caused by the growth in popularity of RSS readers, which repeatedly download RSS files looking for changes. There&#8217;s a good writeup of the details at <a href="http://fishbowl.pastiche.org/archives/001132.html">The Fishbowl</a>. I didn&#8217;t find any sample Java source when I went looking recently, so here&#8217;s some code.</p>
<p><span id="more-31"></span><br />
If you&#8217;re using <a href="http://jakarta.apache.org/commons/httpclient/index.html">Jakarta Commons HttpClient</a> and you have an etag and lastModified string cached with a document then use these lines on your GetMethod instance:</p>
<p><code>GetMethod get = new UrlGetMethod(url);<br />
get.addRequestHeader(new Header("If-None-Match",etag));<br />
get.addRequestHeader(new Header("If-Modified-Since",lastModified));</code></p>
<p>then check the response code like this:</p>
<p><code>client.executeMethod(get);<br />
if(get.getStatusCode() &lt; 300) {<br />
&nbsp;            // server gave us a document<br />
&nbsp;            // keep the raw header value: an ETag is a quoted string and must be sent back unchanged<br />
&nbsp;            Header etagHeader = get.getResponseHeader("ETag");<br />
&nbsp;            if(etagHeader != null) {<br />
&nbsp;&nbsp;               String newEtag = etagHeader.getValue(); // stash this somewhere<br />
&nbsp;            }<br />
<br />
&nbsp;            // Last-Modified contains a comma, so splitting it into header elements would truncate the date<br />
&nbsp;            Header modHeader = get.getResponseHeader("Last-Modified");<br />
&nbsp;            if(modHeader != null) {<br />
&nbsp;&nbsp;                String newLastModified = modHeader.getValue(); // stash this somewhere<br />
&nbsp;            }<br />
} else {<br />
&nbsp;            // server didn't give us a document (e.g. 304 Not Modified), no update<br />
}</code></p>
<p>The equivalent lines (taken from <a href="http://www.methodize.org/nntprss/">nntp//rss</a>) for the standard JDK java.net package are:</p>
<p><code>HttpURLConnection httpCon = ....<br />
httpCon.setRequestProperty("If-None-Match", etag);<br />
httpCon.setIfModifiedSince(lastModified);</code></p>
<p>and</p>
<p><code>if(httpCon.getResponseCode() == HttpURLConnection.HTTP_OK) {<br />
&nbsp;newEtag = httpCon.getHeaderField("ETag");<br />
&nbsp;newLastModified = httpCon.getHeaderFieldDate("Last-Modified", 0);<br />
}<br />
if(httpCon.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {<br />
&nbsp; // no change<br />
}</code></p>
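<p>Whichever HTTP library you use, the bookkeeping around it looks roughly like the sketch below. ValidatorCache and its method names are hypothetical, and it tracks only the ETag to stay short; Last-Modified could be stored and replayed as If-Modified-Since in the same way.</p>

```java
import java.util.HashMap;
import java.util.Map;

class ValidatorCache {
    // url -> last ETag seen for it (absent if the server never sent one)
    private final Map<String, String> etags = new HashMap<String, String>();

    // Conditional headers to add to the next request for this URL.
    Map<String, String> requestHeaders(String url) {
        Map<String, String> headers = new HashMap<String, String>();
        String etag = etags.get(url);
        if (etag != null) {
            headers.put("If-None-Match", etag); // sent back exactly as received
        }
        return headers;
    }

    // Record the outcome of a request; returns true if there is a new body to process.
    boolean onResponse(String url, int status, String etagHeader) {
        if (status == 200) {
            if (etagHeader != null) {
                etags.put(url, etagHeader);
            } else {
                etags.remove(url); // no validator this time, so forget the old one
            }
            return true;
        }
        return false; // 304 Not Modified, redirects and errors: keep the cached copy
    }

    public static void main(String[] args) {
        ValidatorCache cache = new ValidatorCache();
        String url = "http://example.com/index.rss";
        System.out.println(cache.requestHeaders(url));           // first poll: no conditions
        System.out.println(cache.onResponse(url, 200, "\"v1\"")); // new document
        System.out.println(cache.requestHeaders(url));           // next poll carries If-None-Match
        System.out.println(cache.onResponse(url, 304, null));    // unchanged
    }
}
```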
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/04/09/using-http-conditional-get-in-java-for-efficient-polling/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>A freetext-indexing IMAP spider</title>
		<link>http://www.hackdiary.com/2003/02/06/a-freetext-indexing-imap-spider/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-freetext-indexing-imap-spider</link>
		<comments>http://www.hackdiary.com/2003/02/06/a-freetext-indexing-imap-spider/#comments</comments>
		<pubDate>Thu, 06 Feb 2003 23:00:09 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=26</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Because the Exchange mailserver at work is frustratingly slow and doesn&#8217;t have a flexible cross-folder search option, I wanted an indexing spider for IMAP. After a bit of struggling with the <a href="http://java.sun.com/products/javamail/">javamail</a> API and almost no work at all plugging the messages into <a href="http://jakarta.apache.org/lucene/">Lucene</a> (which is impressively clean, flexible and powerful), I had some working code that will start at a folder and work down through its subfolders, indexing messages as it goes.</p>
<p><span id="more-26"></span><br />
This <a href="http://www.hackdiary.com/src/mailindex-0.4.tar.gz">tarball</a> contains the source, compiled class files and support jars, along with a <a href="http://www.mortbay.org/jetty/">Jetty</a> setup that will let you run the demo servlet without needing an install of Tomcat or any other servlet engine. Point the indexer at your IMAP host and give it a folder to start from and it will recursively build an index of subject, date, from and mail body. Run Jetty via queryserver.sh and point your browser at http://localhost:9999</p>
<p>The indexer uses the Message-ID as a primary key; it will only index mail it hasn&#8217;t seen before when it does a run. This means it will work nicely from a regular cronjob. The query code uses the standard Lucene <a href="http://jakarta.apache.org/lucene/docs/queryparsersyntax.html">query parser</a> so will support queries such as <i>+foo +bar</i>, <i>subject:fish</i> and <i>&#8220;phrase search&#8221;</i>. The spider is independent of the indexer and just fires message events at a MessageListener interface, so it might be useful for other things. The main limitation at the moment (apart from some kind of nice interface) is that the code only copes with single-part messages of type text/plain. The MailDocument class is the place to start improving that.</p>
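<p>The split between spider and indexer, and the Message-ID check, can be sketched as follows. MailListener is a simplified, hypothetical stand-in for the listener interface in the tarball, and the "index" here is just a counter; the real code hands each message to Lucene and can consult the index itself to see what it has already stored.</p>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified version of the interface the spider fires events at.
interface MailListener {
    void onMessage(String messageId, String subject, String body);
}

class DedupingIndexer implements MailListener {
    private final Set<String> seenIds = new HashSet<String>();
    private int indexed = 0;

    public void onMessage(String messageId, String subject, String body) {
        if (!seenIds.add(messageId)) {
            return; // this Message-ID was already indexed, skip it
        }
        indexed++; // a real indexer would add a document to the index here
    }

    int indexedCount() {
        return indexed;
    }

    public static void main(String[] args) {
        DedupingIndexer indexer = new DedupingIndexer();
        indexer.onMessage("<1@example.com>", "fish", "first body");
        indexer.onMessage("<2@example.com>", "chips", "second body");
        indexer.onMessage("<1@example.com>", "fish", "first body"); // seen again on a later crawl
        System.out.println(indexer.indexedCount());
    }
}
```

<p>Because the spider only knows about the listener interface, anything else that wants to walk a folder tree can plug in its own implementation.</p>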
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/02/06/a-freetext-indexing-imap-spider/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Sha1ing, smushing and aggregating FOAF</title>
		<link>http://www.hackdiary.com/2003/02/03/sha1ing-smushing-and-aggregating-foaf/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=sha1ing-smushing-and-aggregating-foaf</link>
		<comments>http://www.hackdiary.com/2003/02/03/sha1ing-smushing-and-aggregating-foaf/#comments</comments>
		<pubDate>Mon, 03 Feb 2003 00:02:38 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[foaf]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=24</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>To normalise and aggregate FOAF metadata related to photographs, I needed some new code to:</p>
<ul>
<li>convert foaf:mbox entries to privacy-protected foaf:mbox_sha1sum entries.</li>
<li>normalise statements of the form &#8220;PICTURE depicts PERSON&#8221; to &#8220;PERSON depiction PICTURE&#8221;.</li>
<li>smush disparate references to the same person into references to a single definition of that person.</li>
<li>extract depiction triples from a model and copy just the bare minimum of information related to those depictions.</li>
</ul>
<p>So I wrote <a href="http://www.hackdiary.com/src/foaftool-0.2.tar.gz">foaftool</a>, a Java class that uses <a href="http://www.hpl.hp.com/semweb/jena.htm">Jena</a>. The tarball also contains a couple of servlets that can be used to transform existing content on the web.</p>
<p><span id="more-24"></span><br />
The first servlet will transform foaf:mbox triples in FOAF data into appropriately-encoded foaf:mbox_sha1sum triples. This makes <a href="http://heddley.com/edd/foaf.rdf">Edd&#8217;s FOAF file</a> look <a href="http://www.hackdiary.com/foaf/apps/foafToSha1?foaf=http://heddley.com/edd/foaf.rdf">like this</a>. Using an extra querystring parameter, it optionally <a href="http://www.hackdiary.com/foaf/apps/foafToSha1?foaf=http://heddley.com/edd/foaf.rdf&#038;convertDepicts=1">converts foaf:depicts triples to foaf:depiction</a>. foaf:depicts isn&#8217;t actually in the official <a href="http://xmlns.com/foaf/0.1/">FOAF schema</a> at the time of writing, although it is in informal use in many places as it sometimes makes for more elegant modeling. Normalising to foaf:depiction makes working with large amounts of FOAF data simpler.</p>
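<p>The mbox-to-sha1sum conversion itself is small. Here is a minimal sketch using only the JDK, on the understanding that the hash is taken over the whole mailbox URI including the mailto: scheme. MboxSha1 is an illustrative name and this is not lifted from foaftool.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MboxSha1 {
    // Lowercase hex SHA-1 of the full mailbox URI, e.g. "mailto:someone@example.com".
    static String sha1sum(String mboxUri) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(mboxUri.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1sum("mailto:someone@example.com"));
    }
}
```

<p>The hash is one-way, so two files can agree that they describe the same person without either of them publishing the address.</p>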
<p>Writing the <a href="http://rdfweb.org/2001/01/design/smush.html">smushing</a> code was an entertaining diversion. Smushing is important when merging multiple RDF sources. Say you have two sources, <a href="http://www.hackdiary.com/misc/edd1.rdf">edd1.rdf</a> and <a href="http://www.hackdiary.com/misc/edd2.rdf">edd2.rdf</a>, showing where to find photos of Edd. When merged, the graph structure looks like this:</p>
<p><a href="http://www.hackdiary.com/images/edd1.png"><img class="noborder" width="500" height="166" alt="edd1.rdf" src="/images/edd1_small.png" /></a><br />
<a href="http://www.hackdiary.com/images/edd2.png"><img class="noborder" width="468" height="166" alt="edd2.rdf" src="/images/edd2_small.png" /></a></p>
<p>This is because without smushing, the anonymous nodes that both have Edd&#8217;s email address are not equated. The <a href="http://www.hackdiary.com/foaf/apps/aggregateDepictions?rdf=http://www.hackdiary.com/misc/edd1.rdf&#038;rdf=http://www.hackdiary.com/misc/edd2.rdf">smushed version</a> (smushed on mbox_sha1sum using a foaftool servlet) looks like this:</p>
<p><a href="http://www.hackdiary.com/images/eddsmush.png"><img class="noborder" width="502" height="209" alt="smushed edd1.rdf and edd2.rdf" src="/images/eddsmush_small.png" /></a></p>
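<p>Stripped of the RDF machinery, smushing on mbox_sha1sum amounts to grouping nodes by that value and pooling their properties. The sketch below is a toy version under that assumption: Smush and Node are invented names, and a node carries nothing but its hash and the pictures it is linked to.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class Smush {
    // One anonymous person node: an identifying hash plus the pictures it is linked to.
    static class Node {
        final String mboxSha1sum;
        final Set<String> depictions = new TreeSet<String>();

        Node(String mboxSha1sum, String... pics) {
            this.mboxSha1sum = mboxSha1sum;
            depictions.addAll(Arrays.asList(pics));
        }
    }

    // Merge nodes that share an mbox_sha1sum into a single node per person.
    static Map<String, Node> smush(List<Node> nodes) {
        Map<String, Node> merged = new LinkedHashMap<String, Node>();
        for (Node n : nodes) {
            Node target = merged.get(n.mboxSha1sum);
            if (target == null) {
                target = new Node(n.mboxSha1sum);
                merged.put(n.mboxSha1sum, target);
            }
            target.depictions.addAll(n.depictions);
        }
        return merged;
    }

    public static void main(String[] args) {
        // The hash value is a placeholder, not a real mbox_sha1sum.
        Node fromFirstFile = new Node("hash-of-edd", "http://example.com/pic1.jpg");
        Node fromSecondFile = new Node("hash-of-edd", "http://example.com/pic2.jpg");
        Map<String, Node> people = smush(Arrays.asList(fromFirstFile, fromSecondFile));
        System.out.println(people.size() + " person, depictions: "
                + people.get("hash-of-edd").depictions);
    }
}
```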
<p>With the data in normalised and merged form, I want to extract just the triples of the form &#8220;X foaf:depiction [picture uri]&#8221; and the related foaf:name and foaf:mbox_sha1sum triples. With the foaftool code, I can now <a href="http://www.hackdiary.com/foaf/apps/aggregateDepictions?rdf=http://www.picdiary.com/rss/xcom.rss&#038;rdf=http://www.picdiary.com/rss/foafmeet.rss&#038;rdf=http://www.picdiary.com/rss/barcelona_conf.rss&#038;rdf=http://www.picdiary.com/rss/pantsconkeevil.rss">merge and extract depictions</a> from any number of RDF sources.</p>
<p>Comments and bugfixes are very welcome; the code has only been tested as far as the junit tests included in the tarball.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/02/03/sha1ing-smushing-and-aggregating-foaf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Template for Java projects</title>
		<link>http://www.hackdiary.com/2003/01/24/template-for-java-projects/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=template-for-java-projects</link>
		<comments>http://www.hackdiary.com/2003/01/24/template-for-java-projects/#comments</comments>
		<pubDate>Fri, 24 Jan 2003 18:53:05 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=22</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Every time I start a new java project, no matter what size, the first thing I do is go hunting through my java directories looking for one to use as a template. Over time I&#8217;ve gathered some pretty useful <a href="http://ant.apache.org">ant</a> targets and settled on a fairly rational directory structure. Today I got round to building a skeleton set of directories and files that I can reuse in the future. Here&#8217;s a <a href="http://www.hackdiary.com/misc/ant-project-template-1.0.tar.gz">tarball</a> of the results.</p>
<p><span id="more-22"></span><br />
There are some features of the template that are specifically for deploying web apps into Tomcat. They can be easily ignored if you&#8217;re not doing that. The directory structure goes like this:</p>
<ul>
<li><em>etc</em> &#8211; put any misc files to go in WEB-INF in here and they&#8217;ll be copied over when the project is deployed. <em>web.xml</em> lives in here.</li>
<li><em>lib</em> &#8211; put the .jar files your code depends on in here and they will be automatically put into the classpath for compilation and deployed to the right place in the webapp.</li>
<li><em>src</em> &#8211; your java source.</li>
<li><em>tests</em> &#8211; the java source of your junit tests for the code in <em>src</em>.</li>
<li><em>web</em> &#8211; the web document tree for deployment.</li>
</ul>
<p>Here are the targets in the ant buildfile:</p>
<ul>
<li><em>compile</em> &#8211; compiles the java from <em>src</em> into <em>build</em>.</li>
<li><em>test</em> &#8211; compiles the source and the tests and runs every test in <em>tests</em> using junit.</li>
<li><em>clean</em> &#8211; nukes the build and testbuild directories.</li>
<li><em>with.jikes</em> &#8211; add this before any target that compiles to set jikes as the build compiler.</li>
<li><em>with.clover</em> &#8211; add this before any target that compiles to instrument the .class files with <a href="http://www.thecortex.net/clover/">clover</a>.</li>
<li><em>clover.report</em> &#8211; runs the clover reporter. Best used after running tests, as in <em>ant clean with.clover test clover.report</em>.</li>
<li><em>clover.report.html</em> &#8211; runs the clover html reporter.</li>
<li><em>deploy</em> &#8211; compiles and deploys the project into a tomcat webapp directory, copying the classes, config files from <em>etc</em>, web tree from <em>web</em> and the jars from <em>lib</em>.</li>
<li><em>ctags</em> &#8211; runs <em>ctags -R</em> over the <em>src</em> and <em>tests</em> directories. This is run automatically by the <em>compile</em> targets.</li>
<li><em>javadoc</em> &#8211; compile javadoc into <em>javadoc</em>.</li>
<li><em>run</em> &#8211; run some code from your source with the classpath all set up correctly for your libs and source.</li>
<li><em>pause</em> &#8211; pauses and waits for carriage-return before running the next target. I use this to make ant more responsive; I find that it can take several seconds for ant to start up and parse its build file, and I like to run the compiler a lot while I&#8217;m coding so that I can check my tests as often as possible. In the shell, I use this line: <em>while true; do ant with.jikes pause test; done</em>. This means that every time you hit enter, the compile-and-test run is done then ant makes itself ready for the next run.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/01/24/template-for-java-projects/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Photo-annotating bot</title>
		<link>http://www.hackdiary.com/2003/01/09/photo-annotating-bot/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=photo-annotating-bot</link>
		<comments>http://www.hackdiary.com/2003/01/09/photo-annotating-bot/#comments</comments>
		<pubDate>Thu, 09 Jan 2003 13:02:47 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[bots]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[photos]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[rest]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=19</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>A background project for a while has been to write a bot to help me annotate the fairly large number of pictures I post to <a href="http://www.picdiary.com/cgi-bin/frontpage.pl">picdiary</a> (1496 at the last count). Creating a document of <a href="http://www.picdiary.com/rss/highwalk.rss">RSS-based metadata</a> is a slightly cumbersome text-editor job every time I post a new set of pics.</p>
<p><span id="more-19"></span><br />
After much thought, coding and discussion over the summer with <a href="http://husk.org/bots/">#bots</a> people, I came to the conclusion that a good architectural style for bots is, as <a href="http://space.frot.org/techdetails.html">jo puts it</a>, &#8220;a stateful conversational interface to an aspect of the semantic web&#8221;. Querying and updating information are modeled by HTTP GET and POST respectively. The code is structured after the Model-View-Controller pattern commonly used in GUI development: the View is a representation over IRC or another messaging system, the Controller responds to messages generated by running a <a href="http://www.webgain.com/products/java_cc/">grammar-based parser</a> over the user&#8217;s input, and the Model lives on the web. The bot only keeps state relevant to the conversation (e.g. which picture we are currently talking about).</p>
<p>To get the bot up and running, I started by creating a GET-able query system based on <a href="http://www.hpl.hp.com/semweb/jena-top.html">jena</a>. It has a nice RDF query language that&#8217;s somewhat SQLish, and can use berkeleydb and rdbms backends. I modeled <a href="http://www.picdiary.com/~mattb/queries.rdf">a store of queries</a> in RDF and wrote a java servlet to run queries and return results in <a href="http://www.picdiary.com:8180/rss/query/depicts?mbox=mailto:edd@usefulinc.com">naive XML</a> or <a href="http://www.picdiary.com:8180/rss/query/depicts?mbox=mailto:edd@usefulinc.com&#038;rdf=1">RDF</a> formats.</p>
<p>With this flexible backend, I was able to write a commandline version of a bot using simple HTTP requests and the <a href="http://jakarta.apache.org/commons/digester/">Jakarta Digester</a> to extract information from the XML. Adding a new feature is mostly a case of augmenting the parser&#8217;s grammar and adding a new query to the query store.</p>
<p>Here&#8217;s a sample session with the commandline version of the bot:</p>
<pre>
> find mailto:jo@abduction.org
mailto:jo@abduction.org depicted in http://www.picdiary.com/xcom/IMG_1676.jpg
> read xcom
0: http://www.picdiary.com/xcom/IMG_1674.jpg
1: http://www.picdiary.com/xcom/IMG_1676.jpg
2: http://www.picdiary.com/xcom/IMG_1677.jpg
3: http://www.picdiary.com/xcom/IMG_1678.jpg
> current file
xcom
> use 0
Using pic with uri http://www.picdiary.com/xcom/IMG_1674.jpg
> current pic

http://www.picdiary.com/xcom/IMG_1674.jpg

> depicts
mailto:edd@usefulinc.com
mailto:ben@hammersley.com
> use 1
Using pic with uri http://www.picdiary.com/xcom/IMG_1676.jpg
> depicts
mailto:jo@abduction.org
</pre>
<p>The next step is to extend the query system to allow updates and creates, and I&#8217;ll be nearly done.</p>
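<p>To make the "only conversational state" idea concrete, here is a much-reduced sketch of a controller for a few of the commands in the session above. It is not the bot's actual code: PicSession is a made-up name, a plain string split replaces the grammar-based parser, and an in-memory map stands in for the model that really lives behind HTTP queries.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PicSession {
    private final Map<String, List<String>> files; // stand-in for the model on the web
    private String currentFile; // conversational state: the file being discussed
    private String currentPic;  // conversational state: the picture being discussed

    PicSession(Map<String, List<String>> files) {
        this.files = files;
    }

    // Handle one line of user input and return the reply.
    String handle(String line) {
        String[] words = line.trim().split("\\s+");
        if (words.length != 2) {
            return "Sorry?";
        }
        if (words[0].equals("read")) {
            if (!files.containsKey(words[1])) {
                return "No such file";
            }
            currentFile = words[1];
            currentPic = null;
            StringBuilder reply = new StringBuilder();
            int i = 0;
            for (String pic : files.get(currentFile)) {
                reply.append(i++).append(": ").append(pic).append('\n');
            }
            return reply.toString().trim();
        }
        if (words[0].equals("use") && currentFile != null) {
            try {
                currentPic = files.get(currentFile).get(Integer.parseInt(words[1]));
                return "Using pic with uri " + currentPic;
            } catch (RuntimeException e) { // not a number, or index out of range
                return "No such pic";
            }
        }
        if (words[0].equals("current") && words[1].equals("file") && currentFile != null) {
            return currentFile;
        }
        if (words[0].equals("current") && words[1].equals("pic") && currentPic != null) {
            return currentPic;
        }
        return "Sorry?";
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new HashMap<String, List<String>>();
        files.put("xcom", Arrays.asList(
                "http://www.picdiary.com/xcom/IMG_1674.jpg",
                "http://www.picdiary.com/xcom/IMG_1676.jpg"));
        PicSession session = new PicSession(files);
        for (String input : new String[] {"read xcom", "use 0", "current pic"}) {
            System.out.println("> " + input);
            System.out.println(session.handle(input));
        }
    }
}
```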
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/01/09/photo-annotating-bot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Java utility class for the Wordnet namespace</title>
		<link>http://www.hackdiary.com/2003/01/02/a-java-utility-class-for-the-wordnet-namespace/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-java-utility-class-for-the-wordnet-namespace</link>
		<comments>http://www.hackdiary.com/2003/01/02/a-java-utility-class-for-the-wordnet-namespace/#comments</comments>
		<pubDate>Thu, 02 Jan 2003 23:32:14 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[wordnet]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=16</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>To support the work I&#8217;ve been doing with Wordnet and RDF, I wrote a <a href="http://www.hackdiary.com/src/WordnetNamespace.java">utility Java class</a> to handle URIs from the <a href="http://xmlns.com/wordnet/1.6/">Wordnet ontology for RDF</a> devised by <a href="http://www.w3.org/People/DanBri/">Dan Brickley</a>.</p>
<p><span id="more-16"></span><br />
The class is implemented using the <a href="http://sourceforge.net/projects/jwordnet">Java Wordnet Library</a> package, which means it requires JDK 1.4.</p>
<p>The major methods are:</p>
<ul>
<li>public Synset lookup(String uri)
<p>Takes a URI string such as <a href="http://xmlns.com/wordnet/1.6/Dog">http://xmlns.com/wordnet/1.6/Dog</a> and returns a JWNL Synset object representing the wordnet synset of that URI.</p></li>
<li>public String uri(Synset synset)
<p>Takes a JWNL Synset object and returns its canonical URI string from the wordnet namespace.</p></li>
</ul>
<p>The constructor takes a filename of a JWNL config file from which to configure the JWNL system.</p>
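<p>The string-handling half of that mapping looks roughly like the sketch below. It deliberately leaves JWNL out, so it only goes from a URI to a bare word and back rather than to a Synset. WordnetUri and its methods are illustrative names, not the API of the class linked above.</p>

```java
class WordnetUri {
    static final String NS = "http://xmlns.com/wordnet/1.6/";

    // "http://xmlns.com/wordnet/1.6/Dog" gives "Dog"; null if the URI is outside the namespace.
    static String localName(String uri) {
        if (uri == null || !uri.startsWith(NS) || uri.length() == NS.length()) {
            return null;
        }
        return uri.substring(NS.length());
    }

    // The reverse direction: prepend the namespace to a local name.
    static String uri(String localName) {
        return NS + localName;
    }

    public static void main(String[] args) {
        System.out.println(localName("http://xmlns.com/wordnet/1.6/Dog"));
        System.out.println(uri("Dog"));
    }
}
```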
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2003/01/02/a-java-utility-class-for-the-wordnet-namespace/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Movable Type categories in RDF</title>
		<link>http://www.hackdiary.com/2002/12/30/movable-type-categories-in-rdf/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=movable-type-categories-in-rdf</link>
		<comments>http://www.hackdiary.com/2002/12/30/movable-type-categories-in-rdf/#comments</comments>
		<pubDate>Mon, 30 Dec 2002 02:28:52 +0000</pubDate>
		<dc:creator>Matt Biddulph</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://www.hackdiary.com/?p=12</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m collating writings about my various hacks and projects in a Movable Type system, but hopefully without actually creating a blog as such. I&#8217;d rather generate ad-hoc navigation based on the categories of the items, so creating RDF-based sitemaps seems like a good idea.</p>
<p><span id="more-12"></span><br />
Step 1 is a <a href="http://www.hackdiary.com/src/metadata.tmpl">template</a> that generates RDF for each entry (eg <a href="http://www.hackdiary.com/archives/000004.rdf">the metadata for this entry</a>).<br />
Step 2 is a simple RDF widget (<a href="http://www.hackdiary.com/src/com/picdiary/rdf/servlet/Posts.java">src</a>) that builds an <a href="http://www.picdiary.com:8180/rss/servlet/com.picdiary.rdf.servlet.Posts">aggregate model</a> of the metadata for each entry.<br />
Step 3 will be to build navigation based on that. To get a flavour of the possibilities, <a href="http://bender.ilrt.bris.ac.uk:1234/brownsauce/browse?source=http%3A%2F%2Fwww.picdiary.com%3A8180%2Frss%2Fservlet%2Fcom.picdiary.rdf.servlet.Posts%3Fcachedfoo&#038;resource=http%3A%2F%2Fwww.hackdiary.com%2Farchives%2F000004.html">browse the aggregate model using BrownSauce</a>.</p>
<p>There&#8217;s an <a href="http://www.picdiary.com:8180/rss/servlet/com.picdiary.rdf.servlet.Posts?augment=1">extra mode</a> in the aggregate model system that adds an <i>inSubject</i> reverse triple for each <i>dc:subject</i> triple. Automatic inferencing of that sort might be handy for simplifying query code, and it&#8217;s something I&#8217;d like to play with, particularly if based on DAML rules.</p>
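<p>The reverse-triple augmentation is simple enough to sketch without an RDF toolkit. The version below uses plain strings for subject, predicate and object; InverseTriples and Triple are made-up names, and the servlet itself works on a real RDF model rather than a list.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InverseTriples {
    static class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // For every (entry dc:subject category) add (category inSubject entry).
    static List<Triple> augment(List<Triple> model) {
        List<Triple> out = new ArrayList<Triple>(model);
        for (Triple t : model) {
            if (t.predicate.equals("dc:subject")) {
                out.add(new Triple(t.object, "inSubject", t.subject));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> model = Arrays.asList(
                new Triple("http://www.hackdiary.com/archives/000004.html", "dc:subject", "rdf"),
                new Triple("http://www.hackdiary.com/archives/000004.html", "dc:title", "Movable Type categories in RDF"));
        for (Triple t : augment(model)) {
            System.out.println(t);
        }
    }
}
```

<p>With the inverse triples in place, "everything in category X" becomes a lookup on X's own properties instead of a scan over every entry.</p>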
<p>Perhaps the aggregator could add other inferred metadata based on automatic analysis of the text, or other toys.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hackdiary.com/2002/12/30/movable-type-categories-in-rdf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

