April 11, 2004

Moyles-proof code

While the rest of the UK was enjoying a Good Friday lie-in, I dragged myself into Yalding House (home of BBC Radio 1) at 7.30am. I was there to see our new text message system get its first live broadcast use on the Chris Moyles show in a preview of the Ten Hour Takeover.

This is a long piece... skip to the end if you want to see how it went on the day.

The project

I work for the beeb at Radio and Music Interactive, in the software architecture team. We're responsible for the behind-the-scenes plumbing, running the systems behind things like LiveText, and providing toolkits for the guys on the apps team who build public services. When Radio 1 came up with the takeover day (ten hours of radio with music selected entirely by the listeners via text message and phone), they asked us to build them something to deal with the expected deluge of messages.

As you probably know, text messaging is huge in the UK and it's been taken up enthusiastically by the BBC's radio networks. Radio 1's incoming SMS provider have a web console that's used by the broadcast assistants and DJs in the studio. It works rather like an email inbox, and like most email clients the user interface treats the incoming messages as a chronological stream of text rather than a database to be mined for information. This works for running a regular radio show, but the takeover day needs special treatment.

Since the web console had been enough for the networks so far, this was our department's first project based on machine-processing of SMS. Knowing how important text is to the radio networks, we wanted to build a foundation of components that would support future applications. As every software engineer knows, it's hard to "build one to throw away" after the users have got their hands on it.

The build

In our team the language of choice is Python, whereas the apps team prefer Perl. Contrary to what some might expect, this mixed economy hasn't yet given us any interop problems. We're very keen on using abstractions such as asynchronous messaging, RESTful web services and relational databases to act as language-agnostic intermediaries for our data flows.

My immediate thoughts about the architecture were that it should be built from a series of loosely-coupled layers. Even solving the simple problem of how to pull in a high-volume external XML feed of text messages and redistribute it to applications would show benefits later if it was simple to hook new code into. Further layers could process the data, with a web user interface layered on top of that. Working from a specification of the XML format used by the provider, Paul Clifford quickly built a gateway that collected messages and rebroadcast them over a message bus. Using asynchronous messaging as a transport gives us a good level of resilience and clusterability for free, while keeping the logic very simple. Building on this, he then started work on code to push messages sent to Radio 1 into a database and analyse their contents.
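A minimal sketch of the layering idea, not the code we ran: the gateway publishes each parsed message to a bus, and any number of handlers subscribe without the gateway knowing about them. The `MessageBus` class, the topic name and the message fields are all hypothetical, and this toy bus delivers synchronously in-process, whereas the real transport was asynchronous.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process publish/subscribe bus (hypothetical, synchronous)."""

    def __init__(self):
        # topic name -> list of handler callables
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive every message published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver message to each current subscriber of topic."""
        for handler in list(self._subscribers.get(topic, [])):
            handler(message)


# Two independent consumers: one stands in for the database loader,
# the other for any later application hooked into the same feed.
stored = []
seen_by_other_app = []

bus = MessageBus()
bus.subscribe("sms.radio1", stored.append)
bus.subscribe("sms.radio1", seen_by_other_app.append)

# The gateway only ever calls publish(); it never references the consumers.
bus.publish("sms.radio1", {"sender": "+447700900000", "body": "Weezer - Buddy Holly"})
```

Adding a third subscriber needs no change to the publishing side, which is the property the layered design relies on.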

We wanted to provide simple, useful features to the users such as freetext search, but we thought we could do more given that the incoming texts would have a certain amount of implicit structure. The programme was going to ask people to text in "Artist - Track - Name - Dedication", but we assumed that there would be a lot of variation in the accuracy of the texts and a lack of consistent punctuation to act as delimiters. We planned a system of 'fuzzy matching' to group together texts using stopwords, sounds-alike phonetic matching and statistical analysis of text prefixes. If you see enough text strings beginning with words that sound like "Bob Dylan" then you can start to guess that Bob Dylan might be an artist, rather than a track called Dylan by a band named Bob. Paul did some great work on refining these techniques, and an on-air test a few weeks before the real broadcast showed that we could pretty accurately infer that "Weezer", "Wheezer", "Weazer" and "Wheeser" were all the creators of tracks called "Buddy Holly", "Budy Holly" and "Buddy Holy".
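To give a flavour of the sounds-alike grouping, here is a rough sketch. It uses classic Soundex as a stand-in, since the phonetic algorithm actually used isn't described here, and it leaves out the stopword and prefix-statistics parts entirely. The function names are made up for the example.

```python
from collections import defaultdict

# Classic Soundex consonant classes; vowels, h, w and y get no digit.
_CODES = {}
for _letters, _digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the four-character Soundex code for a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a later repeat is kept
    return (encoded + "000")[:4]


def phonetic_key(phrase):
    """Key for a multi-word phrase: the Soundex code of each word."""
    return " ".join(soundex(word) for word in phrase.split())


def group_by_sound(phrases):
    """Group phrases that share the same phonetic key."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[phonetic_key(phrase)].append(phrase)
    return dict(groups)


artists = group_by_sound(["Weezer", "Wheezer", "Weazer", "Wheeser", "Bob Dylan"])
```

With this key the four Weezer spellings fall into one group and "Bob Dylan" into another. Soundex is crude and will also merge names that merely share consonants, so treat it as a starting point rather than the finished matcher.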

While Paul worked on the analysis, I created a Perl wrapper around the database so that the apps team could get to work on the user interface. At the same time, I passed on a few lines of sample client code to Matt Webb to see what he could do with it. When working on a toolkit or API, I always like to have more than one client using it, to make sure that the requirements of the main project haven't kept the interface from being generic and useful in other contexts. He built a nice rolling graph of texts received per minute, tapping in at the message bus layer. As with any messaging system, adding a new message recipient affected neither the code providing the messages nor any other clients of the bus. Neil Slater and Conal Jones in the apps team did great work building a web interface with guidance from the Radio 1 team, and within a few weeks we were ready to go live.
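The counting behind a texts-per-minute graph can be sketched in a few lines. This is an illustration only, not Matt Webb's code: the `PerMinuteCounter` class and the `received` field on each message are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


class PerMinuteCounter:
    """Counts messages per minute; usable as a bus subscriber callback."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, message):
        # Truncate the (assumed) 'received' timestamp to the whole minute.
        minute = message["received"].replace(second=0, microsecond=0)
        self.counts[minute] += 1

    def series(self):
        """Return (minute, count) pairs in chronological order."""
        return sorted(self.counts.items())


counter = PerMinuteCounter()
for stamp in (
    datetime(2004, 4, 9, 8, 45, 3),
    datetime(2004, 4, 9, 8, 45, 41),
    datetime(2004, 4, 9, 8, 46, 7),
):
    counter({"received": stamp, "body": "example text"})
```

Because the counter is just another callable that receives messages, it can be attached to the feed without touching the gateway or the database loader.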

On air

Chris Moyles likes to break things. He once managed to get listeners to send 14,000 texts at once. When I arrived, Aled the BA asked me how many messages the new system could take, and told me that Chris was going to do his best to overload it. At 8.45am, Chris started trailing the feature, and the text messages started trickling in. Then flooding. I was impressed: on a UK Bank Holiday, thousands of people were not just listening but involved enough to get on their phones and interact with the programme.

Chris and his team did a great job of understanding our system. They caught on to the freetext search right away and used it to find interesting tracks and messages to read out from the people who requested them. We put a 'quick stats' box on every screen of the app, which they used to goad on the listeners: "3000 text messages so far. That is simply not enough. I want that quadrupled!". Once there were enough messages in the system for the pattern matching to kick in, they used it to navigate through the data and see which tracks by which artists were getting the most attention. As the listeners realised that the playlist had gone out the window and they could request anything, we got some great stuff coming in. Our system even got a little mention on air as Chris teased us for matching an Elvis Costello track to a bunch of requests for Elvis.

As the programme started I was sitting nervously next-door to the studio tailing logfiles and watching process tables, but as it went on and the code coped with everything they could throw at it, I relaxed and enjoyed the show. I don't normally listen to the Radio 1 Breakfast Show but listening to this gave me new respect for the skills of a daytime DJ, sitting in a tiny room in a basement working a crowd of millions that they never see.

If you want to hear the show, there's a Listen Again stream available from the BBC until April 16th. The request section is in the last hour. Tune into Radio 1 on Monday April 12th from 10am for the ten hour version.

Posted by Matt Biddulph at 02:32 PM | Comments (1) | TrackBack

April 01, 2004

More notes on installing Debian on a Dell Latitude X200

Last year I bought a Dell Latitude X200 laptop, which was a wonderful machine. In October it was stolen from my flat in a break-in. I made do with a refurbished HP Omnibook 500 but I wasn't happy with it. When a 2nd-hand X200 came up on eBay last week at a good price, I couldn't resist snapping it up. In the time since I last installed Linux on one of these machines, there have been a few developments and new releases that make installation and configuration easier.

Last time round, I had to install by resizing the installed Windows XP partition via a complicated process using Knoppix booted from the C: drive. This was because none of the Debian boot disks had the requisite Firewire support to see the CDROM after the BIOS handed over control during the boot sequence. This time round, I discovered that there are now newer boot ISO images available which include Firewire. Using the Windows CD recording software, I burnt the bf2.4 image and booted from it. Once in Linux, I just needed to remove and reload the sbp2 module to get it to recognise the drive as /dev/sda (for some reason it didn't work when the module was first autoloaded). I then finished the install over the net, first using the 3c59x driver to get wired ethernet, then (once it was bootstrapped up to a level where wireless-tools would run) the orinoco_cs driver for the internal TrueMobile 1150 wireless.

The 2.6 kernel series has moved on since I last tried it, and I found 2.6.4 works very well. Here's the .config I'm currently using.

Lincoln Stein has been doing great work documenting his struggles with the X200, and provides a very useful howto and patch to modify the ACPI DSDT to allow ACPI to read your battery levels under Linux. If you're going to do this, I highly recommend patching your kernel to allow DSDTs to be loaded on the fly at boot rather than compiling them in.

ACPI S3 suspend still locks my machine dead (to the extent that it doesn't even listen to the power button and the battery needs to be removed and replaced before it'll turn on again). I'd love to get suspend working if anyone knows how. Edd tells me that he has exactly the same problem with his Sony TR1MP, which also uses an Intel i8xx chipset.

I've had a brief go at getting Software Suspend working, and had middling success. The combination of 2.6 kernel, swsusp and AGP is known not to work on several systems, and the X200 is one of them. However, if I prevent X and hotplug from loading the i830, intel_agp and agpgart modules (thus losing XVideo support in X), it works. I need to do a bit of fine-tuning of module reloading and device reconfiguration on resume, but it looks promising.

With the 2.6 kernel, the performance and powersaving modes of the Pentium 3M CPU can now be controlled from linux. I installed the cpudyn daemon and now my system switches automatically into performance mode when CPU usage goes over 50%, and stays in powersaving mode otherwise.
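The threshold rule is simple enough to sketch. This is not how cpudyn itself is implemented, just an illustration of the policy described above: work out CPU usage from two samples of the aggregate "cpu" line in /proc/stat and pick a mode. The function names and the mode strings are invented for the example.

```python
def parse_cpu_line(line):
    """Return (idle, total) jiffies from a /proc/stat 'cpu ...' line."""
    values = [int(v) for v in line.split()[1:]]
    idle = values[3]  # fields are user, nice, system, idle, ...
    return idle, sum(values)


def cpu_usage(before, after):
    """Fraction of non-idle time between two 'cpu' line samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        return 0.0  # no time elapsed between the samples
    return 1.0 - (idle1 - idle0) / total_delta


def choose_mode(usage, threshold=0.5):
    """Pick a hypothetical mode name: fast above the threshold, else slow."""
    return "performance" if usage > threshold else "powersave"


# Two made-up samples taken a moment apart.
busy = cpu_usage("cpu 100 0 100 800", "cpu 400 0 200 900")
quiet = cpu_usage("cpu 100 0 100 800", "cpu 120 0 110 1270")
```

A real daemon would also read the samples from /proc/stat on a timer and write the chosen setting back to the kernel, which is left out here.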

As of March 2004, there's a very neat driver for the touchpad available in Debian sid. The package is called xfree86-driver-synaptics and lets you use the touchpad as more than just an emulated PS/2 mouse. The driver enables basic features such as acceleration and tap control, and adds some very neat stuff like the ability to scroll the current window by moving up and down on the far right of the touchpad (comparable to mousewheel scroll on a wheelmouse).

My first X200 came with a nice external firewire DVD/CDRW drive, which could be hotplugged with no problems. The new one came with the docking station, which has the same drive in it, but now it only appears on the firewire bus if I boot the laptop in the docking station. If I boot without it then later dock the laptop, the drive doesn't appear even if I unload and reload the Firewire kernel modules.

Posted by Matt Biddulph at 11:06 PM | Comments (0) | TrackBack

XML Europe is nearly here

I'm off to XML Europe in a couple of weeks (better get in quick if you're thinking of going, registration closes April 9th). I've been looking at the schedule, which has lots of semantic web goodness in it, and thinking about what talks I'm going to go to. Here's my list so far:

Take REST: An Analysis of Two REST APIs: Paul Prescod's always interesting on this topic. I found his writings on REST convincing when I first discovered them a couple of years ago, and it was great to meet him at last year's XML Europe in London.

Lessons From an XML Query-Driven SVG+XHTML Web Site: always interesting to see how new technologies actually get used, rather than just sit listening to people talk through the specs. At a BOF at last year's WWW conference, Liam and others were talking about the possibility of adapting XQuery as a query language for RDF.

Topic Maps are Emerging - Why Should I Care?: because I don't know enough about topic maps despite being an RDFhead.

Semantic Blogging: Spreading the Semantic Web Meme: I've never met Steve Cayzer, but I know several people from ILRT who've worked with him and it all sounds like interesting work. He's on in the same session as my talk, followed by two more fascinating semantic web talks from Damian Steer and Dave Beckett.

Hmm, RDF/XHTML: A New RDF Syntax or RDF, XForms, and the Law - Staying Out of Gaol?

The tutorials look rather good this year (Amazon web services, .NET and XML, XSLT2, XQuery, XML schemas, etc) but my employer didn't have the money for me to attend. At least I'll get to hear Jeff Barr at his keynote.

Oh, and I just found out I've got the go-ahead to attend www2004. Anyone else going?

Posted by Matt Biddulph at 01:38 PM | Comments (3) | TrackBack