June 13, 2004

Taking automated webpage screenshots with embedded Mozilla

The other day I discovered Hotlinks, a rather nice link aggregator. It collects links from sites (including those of a couple of my respected colleagues) and combines them into a good-looking summary page. I particularly like the automatic webpage thumbnails it makes, which are created using khtml2png. I couldn't get khtml2png to compile on my machine, but after finding that there are now Python wrappers for GtkMozEmbed, I made my own screenshotter-and-thumbnailer by embedding the Mozilla browser component in a little Python script.

UPDATE: Ross Burton picked up the script and made a couple of enhancements. Miguel de Icaza posted a C# version.

To run the script, you'll need PyGtkMoz, GTK and the Python Imaging Library. Because it's a GTK app, it needs an X server; to run it headless, point it at an X server like Xnest or VNC. VNC's working well for me.

The way it works is very simple:

Starting with the example.py shipped with PyGtkMoz, I created a stripped-down browser window app that loads a URL given on the command line. By connecting the net_stop signal to a method, you can tell when network activity has finished. I couldn't find a way to be notified when rendering has finished (images decompressed, etc.), so I put in a 3-second pause after net_stop fires before taking the screenshot.

Following clues in a GTK mailing list post, I wrote these lines to save a PNG of my widget's window:

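# Grab the GDK window backing the embedded browser widget
window = self.widget.window
(x, y, width, height, depth) = window.get_geometry()
# Allocate an 8-bit RGB pixbuf (no alpha) matching the window size
pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
# Copy the window contents into the pixbuf and write them out as a PNG
pixbuf.get_from_drawable(window, self.widget.get_colormap(), 0, 0, 0, 0, width, height)
pixbuf.save("screenshot.png", "png")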

After that, producing a thumbnail just takes a few more lines of PIL code to open, thumbnail and save the PNG under a new name:
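
A minimal sketch of that thumbnailing step, not necessarily the exact lines from the script. It assumes the modern Pillow import path (`from PIL import Image`) rather than the old PIL `import Image`, and the function name `make_thumbnail`, the output filename and the default size are all made up for illustration:

```python
from PIL import Image


def make_thumbnail(src="screenshot.png", dst="thumbnail.png", size=(160, 120)):
    """Open src, shrink it to fit within size, and save it as dst."""
    with Image.open(src) as img:
        # thumbnail() resizes in place and preserves the aspect ratio,
        # so the result fits inside `size` rather than being stretched to it.
        img.thumbnail(size)
        img.save(dst, "PNG")
```

Calling `make_thumbnail()` after the screenshot has been written would leave a `thumbnail.png` next to `screenshot.png`; pass a different `size` tuple to change the bounding box.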

Posted by Matt Biddulph at 02:35 PM

June 01, 2004

Too much spam, time for dogfood

Over the weekend I was hit by over 200 MT comment spams, from a range of IP addresses. Using MT-Blacklist I was able to clean up the damage, but wasted at least half an hour. In temporary despair, I've removed the comment forms from individual entries and disabled the comments CGI. For now, if you want to write something about one of my entries, go get your own website or something.

I've been wanting to move away from Movable Type for some time. My other personal site has been running on an RDF-based homebrew system for a couple of years, and it looks like it's time to make hackdiary an RDF dogfood site too.

I've restarted a coding project that tailed off last year when I lost a laptop: to create my own graph-oriented content wrangler. Redland, my toolkit of choice, has moved on quite a bit over the last year, and has now gained a parser for "scribbleable" non-XML syntax and an RDQL query engine (for which I've just contributed Python wrappers). These are expressive tools.

My intention is to keep it simple and avoid building Yet Another Generic CMS. So far I've got a simple templating engine that reads a config file and gathers RDF from internal and external sources for rendering in Cheetah templates. I plan to add proper two-way web features to it, and hopefully demonstrate (to myself at least) the value of treating the information relevant to my site as one big directed labelled graph.

Posted by Matt Biddulph at 12:28 AM