My syndicated links thumbnails seemed like such a good idea at the time. I loved what ma.gnolia was doing, but I didn’t want to switch bookmarking services just for that. And I wanted all my content on one site anyway. But the pain, oh the pain.
The script that actually does the thumbnailing uses Python and gtkmozembed. It requires an X server (for Mozilla) and I run it on my colo, which is headless, so I call it from a wrapper script that starts up an Xvfb headless server and runs it. All a little fragile, I’m afraid. But it works surprisingly well. I call it from a cronned delicious syndication script (danger! ugly!) that pulls my delicious links into /links here every so often.
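For anyone who wants the shape of that wrapper, here is a minimal sketch in Python. It is not my actual wrapper script. The display number `:99`, the screen geometry, the two-second startup wait and the function names are all assumptions made for illustration. The only real things in it are the `Xvfb` binary and the `DISPLAY` environment variable.

```python
import os
import subprocess
import time


def xvfb_command(display=":99", screen="1024x768x24"):
    """Build the command line for a virtual framebuffer X server.

    The display number and screen geometry are illustrative defaults.
    """
    return ["Xvfb", display, "-screen", "0", screen]


def child_env(display, base_env=None):
    """Copy the environment and point DISPLAY at the virtual server."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def run_under_xvfb(cmd, display=":99", startup_wait=2.0):
    """Start Xvfb, run cmd against it, then always shut the server down."""
    server = subprocess.Popen(xvfb_command(display))
    try:
        # Crude: give the server a moment to start accepting connections.
        time.sleep(startup_wait)
        return subprocess.call(cmd, env=child_env(display))
    finally:
        server.terminate()
        server.wait()
```

Something like `run_under_xvfb(["python", "thumbnail.py", url, "out.png"])` would then run a thumbnailer against the virtual display (the script name and arguments there are made up). The `try`/`finally` is the bit that matters: if the thumbnailer dies, you don’t want stray X servers piling up on the box.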
It was based a long time ago on mattb’s script/fragment and basically it’s all a bit nasty. I’ve tried to re-write it in various languages, but you’ll quickly find that the gtkmozembed bindings are awful – on most platforms they flat-out don’t work. My wrapper script has to munge LD_LIBRARY_PATH to get it to work under Debian, my server platform. Having to run an X server is a pain as well. And because you’re wrapping Mozilla, and not the underlying renderer, you can’t automatically bypass the security stuff, so it’s impossible to thumbnail sites with bad SSL certificates: you’ll just get a screenshot of the security confirmation dialog. I also can’t find a way of getting a ‘page loaded’ callback, so I have to sleep for some arbitrary amount of time and then blindly take a screenshot of whatever I’ve managed to render so far. That sort of thing.
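The LD_LIBRARY_PATH munging is the sort of thing that looks like this in Python. Again, a sketch rather than my real wrapper: the function name is made up, and the directory you need to prepend depends entirely on where your distribution puts the Mozilla libraries, so the path in the usage note is a placeholder.

```python
import os


def with_library_path(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir first on LD_LIBRARY_PATH.

    Prepending (rather than replacing) keeps whatever was already set.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        env["LD_LIBRARY_PATH"] = extra_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = extra_dir
    return env
```

You’d pass the result as the `env` argument to `subprocess.call`, for example `with_library_path("/path/to/mozilla/libs")`. The variable has to be set before the child process starts, because the dynamic linker reads it at load time, which is why this lives in the wrapper and not in the thumbnailing script itself.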
At the time I wrote this script, I wanted an automated solution. I looked at various 3rd-party thumbnailing services but most of them just thumbnailed the root path of the domain you asked for, not the page itself. Most services also want you to deep-link their thumbnails rather than pulling them to a local server and serving from there, and they charge/quota by number of thumbnails served, rather than generated. I don’t know why. And I wanted thumbnails taken at the time that I bookmarked the page, not at the time you looked at the page. Picky, I know.
Were I to do it again, I think I’d seriously consider 3 alternatives:
Doing the thumbnailing on a Mac
Paul Hammond has a webkit2png script that does the same as my script, but using WebKit on a Mac. It’s almost as annoying, because you still need a windowing server, but the overheads are smaller: there are sensible callbacks so you can thumbnail faster, and it’s a more reliable environment, in that the bindings work. Of course, there are downsides – you need a Mac, for a start. And if you want an automated solution you’ll need a Mac connected to the internet and turned on 24 hours a day. But I have one of those under my telly now, so it’s tempting. Not sure if I’ll be able to solve the SSL certificate problem for this one, but it’s not a deal breaker.
Using a third-party service
Simon Willison found a thumbnailing service that doesn’t suck for Oxford Geeks. It’ll actually thumbnail the page itself (or at least, it did last time I poked at it) so if you don’t want the overhead of running your own server, this might work.
A simpler version
I now have a pure-C version of the thumbnailer script. I don’t use this version, but only because I wrote it as a thought experiment some time after I got everything working, and I don’t want to mess with something that works. I see no reason why the C version won’t do just as well, and it avoids most of the bindings pain. It’ll still need the wrapper, but dropping the Python side of things might help.