Algorithmic recruitment with GitHub

February 10th, 2010  |  Published in web  |  20 Comments

In my new job in Berlin I’ve been asked to hire some people to help prototype new, secret projects. Berlin has a superb tech scene but as I’m new in town it’s taking me a little time to get to know everyone. While that’s going on, I wrote some code to help me explore Berlin’s developer community.

When I’m hiring, one of the things I always want to see is evidence of personal projects. Over the last two years, GitHub has become an amazing treasure trove of code, with the best social infrastructure I’ve ever seen on a developer site. GitHub profiles let users set their location, so I started with a few web searches for Berlin developers. These turned up hundreds of interesting people, but how do I prioritise them?

Another thing that I look for when building a good team is someone’s personal network. I’ve always believed strongly in spending lots of time at conferences meeting passionate people who are smarter than me. A good developer can make themselves even more productive by knowing who to email, IM or DM to answer a question when they’re stuck.

A recent article by Stowe Boyd on centrality and influence in social networks reminded me of some of the network analysis we use behind the scenes to calculate recommendations for the Dopplr Social Atlas. So I wrote some code to query the GitHub API and analyse the social graph of the Berlin subset of their users.

The JRuby code uses Yahoo BOSS to do the web search. After querying the GitHub API for each user’s followers, it builds an in-memory graph using the Java Universal Network/Graph Framework (JUNG). Then it ranks each user node in the graph using the Betweenness Centrality algorithm, which scores a node by how many shortest paths between other nodes pass through it. You can see the simple source code on my GitHub.
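
To make the ranking step concrete, here is a minimal sketch of betweenness centrality in plain Ruby. It is not the code from the repository: it swaps JUNG for a hand-written version of Brandes' algorithm, skips the web search and GitHub API calls entirely, and assumes an unweighted, directed "follows" graph. The usernames and the `betweenness_centrality` helper are made up for illustration.

```ruby
# Betweenness centrality via Brandes' algorithm (unweighted, directed graph).
# `graph` maps each node to the array of nodes it points at.
# Assumption: an edge "alice" => ["bob"] means alice follows bob.
def betweenness_centrality(graph)
  nodes = (graph.keys + graph.values.flatten).uniq
  scores = {}
  nodes.each { |n| scores[n] = 0.0 }

  nodes.each do |source|
    stack = []                                   # nodes in order of discovery
    predecessors = Hash.new { |h, k| h[k] = [] } # previous hops on shortest paths
    path_count = Hash.new(0.0)                   # number of shortest paths from source
    distance = { source => 0 }
    path_count[source] = 1.0

    # Breadth-first search from the source, counting shortest paths.
    queue = [source]
    until queue.empty?
      v = queue.shift
      stack.push(v)
      (graph[v] || []).each do |w|
        unless distance.key?(w)
          distance[w] = distance[v] + 1
          queue.push(w)
        end
        if distance[w] == distance[v] + 1
          path_count[w] += path_count[v]
          predecessors[w] << v
        end
      end
    end

    # Walk back from the farthest nodes, accumulating each node's share
    # of the shortest paths that pass through it.
    dependency = Hash.new(0.0)
    until stack.empty?
      w = stack.pop
      predecessors[w].each do |v|
        dependency[v] += (path_count[v] / path_count[w]) * (1.0 + dependency[w])
      end
      scores[w] += dependency[w] unless w == source
    end
  end

  scores
end

# Hypothetical follower data: every follow here happens to be mutual.
follows = {
  "alice" => ["bob"],
  "bob"   => ["alice", "carol", "dave"],
  "carol" => ["bob"],
  "dave"  => ["bob", "erin"],
  "erin"  => ["dave"]
}

ranked = betweenness_centrality(follows).sort_by { |_, score| -score }
ranked.first(2).each { |name, score| puts "#{name}: #{score}" }
# prints:
# bob: 10.0
# dave: 6.0
```

In this toy graph bob sits between the most pairs of other users, so he ranks first, and dave comes second because he is the only route to erin. The scores are raw counts over ordered pairs; a library may normalise them differently.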

To sanity-check the results I ran it for a couple of cities I already know well: London and San Francisco. Here are the top 5 for each, which seem quite plausible to me:

San Francisco

  1. Chris Wanstrath, GitHub
  2. Tatsuhiko Miyagawa, Six Apart
  3. Leah Culver, Six Apart
  4. Square Inc
  5. Aman Gupta, ruby eventmachine maintainer

London

  1. James Darling
  2. London Ruby User Group
  3. Mark Norman Francis
  4. Dan Webb (recently moved to Twitter in SF)
  5. Carlos Villela, Thoughtworks

My choice of metric biases these lists towards connectedness and influence; it can’t measure ability. It also only measures GitHub users, who skew towards Ruby, Perl and JavaScript. But seeing names there that I trust gives me confidence that it’ll help me find interesting people in Berlin.

Hopefully some of those people are reading this blog post right now. Others outside Berlin might be interested to know that Nokia does a superb job of relocating people, with everything taken care of by shipping companies and local agents. If you love the web, JavaScript, mobile, user experience, social networks, location, enormous datasets and currywurst, you should get in touch.

Responses

  1. Matt Jones says:

    February 10th, 2010 at 3:50 pm (#)

    so you’re going to hire all of LRUG? ;-)

  2. Matt Biddulph says:

    February 10th, 2010 at 3:51 pm (#)

    If they’ll relocate to Berlin then I’m up for it. I never met an LRUGger I didn’t like.

  3. James Darling says:

    February 10th, 2010 at 3:55 pm (#)

    Hah. Completely abusing this data: I am better than the whole of LRUG. Fact.

  4. Mark Norman Francis says:

    February 10th, 2010 at 4:11 pm (#)

    So does this mean I’m “twinned” with Leah Culver? Hmm.

  5. James Wheare says:

    February 10th, 2010 at 4:45 pm (#)

    You’re gonna have to make this a queryable web service now to stroke the egos of everyone who didn’t make the top 5.

    More seriously though, please do a kloc analysis and deduct points for people whose repositories mostly consist of untouched forks.

  6. James Wheare says:

    February 10th, 2010 at 4:56 pm (#)

    OK, I know it’s vulgar to criticise a public figure, and I quite like James D, and you already mentioned your metric is biased; but just to illustrate the extent to which this is little more than a popularity contest: I just checked his repos and he’s only pushed about 50 lines of code to GitHub in the last 5 months! He is a lovely chap though :)

  7. Patrick says:

    February 10th, 2010 at 5:05 pm (#)

    So you miss the (perhaps excellent) people who would move to Berlin for a job, right?

  8. Matt Biddulph says:

    February 10th, 2010 at 5:06 pm (#)

    Patrick,

    I hope not – this is just a toy to start the process, and hopefully a way to attract some of those (perhaps excellent) people to get in touch.

  9. Patrick says:

    February 10th, 2010 at 5:26 pm (#)

    Matt, it is very likely that you will attract some of those. Thanks for the nice write up!

  10. Damien Tanner says:

    February 10th, 2010 at 5:40 pm (#)

    Good to see New Bamboo in there at #16 ;)

  11. Tom says:

    February 10th, 2010 at 6:12 pm (#)

    I read out your list (of things to love) and Christi here thought that I said “curry versed”. Clearly the Haskell books I leave lying around haven’t gone unnoticed.

    I would definitely be interested in seeing you adapt this method to include (a) people on GitHub and Dopplr who already travel to Berlin a lot and (b) people who actively contribute on GitHub, not just people who are well connected.

    But I get that contributing is harder to measure, and that you probably don’t have a github-dopplr username mapping just sitting around, even with all your connections ;)

  12. Daniel Haran says:

    February 10th, 2010 at 6:16 pm (#)

    Betweenness is a great metric for finding the people that will know who’s available. Some “PeopleRank” like algorithm would be more likely to find those people that have deep expertise without being very social.

    I’d love to see results for my city (Montreal, QC) for Rails.

  13. Charles says:

    February 10th, 2010 at 6:47 pm (#)

    Not all people doing interesting, secret stuff are checking it into GitHub (in fact, anyone doing secret stuff DEFINITELY wouldn’t). Bad source dataset to start from!

  14. Eric says:

    February 10th, 2010 at 6:58 pm (#)

    Berlin is the shittiest city I’ve ever lived in. I hope you fail in fooling people into moving here.

  15. Tonći Galić says:

    February 10th, 2010 at 11:10 pm (#)

    Having many connections doesn’t mean one produces quality code, but it’s sure a nice approach to finding interesting people. I’ll have a look at it; perhaps it could be applied to Twitter and other SNs?

    Thanx for sharing :)

  16. Jilles van Gurp says:

    February 14th, 2010 at 10:40 am (#)

    Hey Matt, this is a great way of scouting out new talent for our floor :-). Only thing is, many developers (me included) actually don’t commit that much code to OSS projects. I don’t code much outside of work. It’s my way of staying sane: I need my downtime after a long day of programming on the job.

    And working for Nokia makes on-the-job contributions kind of a grey area, since it is not up to me to decide on what are technically and legally Nokia-owned IPR contributions to outside projects.

    Does that make me a bad engineer?

    You might want to check out ohloh.net. They don’t seem to have a geographic feature but there are a lot of nice statistics about what people do on which projects.

  17. Matt Biddulph says:

    February 14th, 2010 at 10:49 am (#)

    Jilles, nice to see someone from Nokia drop by. I hope nobody thinks that by calculating this highly biased metric I’m dismissing anyone who doesn’t score well in it. Of course there are all sorts of reasons why people don’t have code on GitHub that don’t make them bad engineers.

    As an aside: you may not contribute to OSS but you care enough to write interesting stuff at jillesvangurp.com, another good indicator of interesting people (that the metric in this post also ignores).

  18. Recruiting smart people | s-anand.net says:

    February 14th, 2010 at 6:31 pm (#)

    [...] Biddulph talks about Algorithmic recruitment with Github. The premise is that smart programmers are at the centre of the social networks in their respective [...]

  19. In The Know v1.07 Five Links To Expand Your HR View | HR Examiner with John Sumser says:

    February 19th, 2010 at 2:43 am (#)

    [...] Algorithmic recruitment with GitHub Future talent markets will include demonstrations of skills and social network analysis. This piece (from a geek, not an HR player) shows the beginnings of that process. GitHub is a website where coders can leave examples of their work. The author steps through the use of an algorithm to find the right talent. [...]

  20. Jim Lindstrom says:

    February 19th, 2010 at 1:50 pm (#)

    I had trouble getting the Java portion of this running, so I adapted it to run all in Ruby, using the igraph library. If anyone’s interested I’ll put it up here briefly:

    http://www.columbia.edu/~jbl2132/graph_jbl.rb

    I also ran it for New York. If you’re interested, let me know (jim at researchmob dot com).