Tom Insam

python and unicode

I like python's unicode handling. Unlike perl, where file handles are assumed by default to be latin-1, python file handles (including STDIN/STDOUT) are assumed by default to be ASCII. Forget nasty things like '☃': in python, you can't even print 'é' without explicitly telling it how. Lovely.
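For comparison, here's roughly what "explicitly telling it how" looks like on the perl side. This is a minimal sketch, assuming the source file is saved as UTF-8 and the terminal expects UTF-8. It puts an `:encoding(UTF-8)` layer on STDOUT, and on an in-memory handle so you can see the bytes that actually get written.

```perl
use strict;
use warnings;
use utf8;   # treat the string literals in this file as characters, not bytes

# Tell perl the output encoding once, instead of trusting the default.
binmode(STDOUT, ':encoding(UTF-8)');
print "é and ☃\n";

# The same layer on an in-memory handle shows the actual bytes written.
open(my $fh, '>:encoding(UTF-8)', \my $buffer) or die "open: $!";
print {$fh} "é";
close($fh);

# é (U+00E9) comes out as the two bytes C3 A9.
print join(' ', map { sprintf '%02X', ord } split //, $buffer), "\n";
```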

In a previous life, I was trying to import content from a Movable Type blog into Hayfever. Then I wanted to write an importer from Hayfever into Wordpress. And wow, the MT import format is nasty. Things that have annoyed me, in no particular order:

  • There are no charset considerations in the spec. I care deeply about explicit charsets nowadays. I'm sure the implementation does something with them, but what?

  • The DATE atom is an annoying US date, with no timezone information.

  • The whole serialization format is just nasty. The Wordpress importer, for instance, splits records on '--------\nAUTHOR:', which is presumably more reliable than splitting on the dashes alone (in case there are lines of '-' in the data), but it's a fairly nasty assumption that bit me quite badly for my own importer.

  • The PING and COMMENT atoms seem to contain nested sets of atoms, but this isn't indicated in any sort of general way - you just have to know that PING is special.
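To make those complaints concrete, here is a deliberately naive perl sketch of reading the format. It assumes that entries end with a line of eight dashes, that sections within an entry end with a line of five dashes, and that DATE looks like `01/31/2005 03:31:05 PM`. The sample data and the `parse_mt_date` helper are made up for illustration. It trusts the separators completely, so a line of dashes inside a post body breaks it. It also stores COMMENT and PING sections as raw text, because pulling out their nested fields needs special-case knowledge.

```perl
use strict;
use warnings;

# Made-up sample in the MT export layout.
my $export = <<'END';
AUTHOR: tom
TITLE: Hello
DATE: 01/31/2005 03:31:05 PM
-----
BODY:
First post.
-----
--------
AUTHOR: tom
TITLE: Second
DATE: 02/01/2005 12:15:00 AM
-----
BODY:
Another post.
-----
--------
END

# Hypothetical helper. DATE carries no timezone, so whoever calls this
# still has to decide which zone the result is in.
sub parse_mt_date {
    my ($date) = @_;
    my ($mon, $day, $year, $h, $m, $s, $ampm) =
        $date =~ m{^(\d\d)/(\d\d)/(\d{4}) (\d\d):(\d\d):(\d\d) ([AP]M)$}
        or return;
    $h = 0 if $h == 12;           # 12 AM is midnight, 12 PM is noon
    $h += 12 if $ampm eq 'PM';
    return sprintf '%04d-%02d-%02d %02d:%02d:%02d', $year, $mon, $day, $h, $m, $s;
}

my @entries;
for my $chunk (split /^--------\n/m, $export) {
    next unless $chunk =~ /\S/;
    my %entry;
    for my $section (split /^-----\n/m, $chunk) {
        if ($section =~ /\A([A-Z ]+):\n(.*)\z/s) {
            # "KEY:" alone on a line, then free text (BODY, COMMENT, PING...).
            # A later section with the same KEY (a second COMMENT, say)
            # overwrites the earlier one in this sketch.
            $entry{$1} = $2;
        } else {
            # Plain "KEY: value" header lines.
            for my $line (split /\n/, $section) {
                $entry{$1} = $2 if $line =~ /^([A-Z ]+):\s*(.*)$/;
            }
        }
    }
    push @entries, \%entry;
}

printf "%s (%s)\n", $_->{TITLE}, parse_mt_date($_->{DATE}) for @entries;
```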

I'm just grouchy, I guess.

More on the NSLU2

Well, having played with a real linux distro on the NSLU2 for a while, I've reverted it to the stock firmware. It's now nothing smarter than a disk-sharing box. Sure, it was cute being able to do these interesting things with it, but after a while you realise that you never actually ssh into it. And mt-daapd, which was the real reason I wanted to be able to install software on it, eats all the CPU, takes about an hour to start up with all the music on the drive, and takes almost 10 minutes to connect to, so I never use it. Let's just revert to something I trust.

Aaah, technology.

Humax PVR 8000T

Again, in the 'new shiny things' category, I have got myself a Humax PVR 8000T - it's a Freeview box with an 80-gig PVR built into it, and was recommended by a friend of mine who knows about these things. I think I quite like it. Going from 5-channel terrestrial to lots-of-channels plus time-shifting plus an EPG plus scheduled programme recording is great. Not that I watch a lot of telly, but the little I do watch is now more interesting and more convenient.

Of course, after amazingly little time, the annoying things start to grate. Not that any of these are onerous - in every respect my experience is better now than it was, but still. For instance, I can't look at the EPG when the telly is paused. That's tiny, because I can push play, look at the EPG, drop out of the EPG and rewind back to where I was. Much more annoying is that I can't convert time-shifted TV into a recorded programme. I can press 'record' to start recording into a named slot on the box, and I can rewind the current programme to any point since I last changed the channel, but I can't rewind to a point in the past and start recording into a named slot from there - I can only record from 'now'. That's annoying.

On the bright side, if I ever need to upgrade the hard disk in it, it's trivially easy. And another thing I consider a good thing, although others might disagree: there's nowhere on the box for a top-up (danger - ugly) slot, so I can't talk myself into paying a monthly subscription for things (something I insist I don't let myself do - I don't watch enough telly to make it worth it, and neither do I want to).

Conclusion - it's a good box. I'd prefer a TiVo, of course, but then I also want a pony. And a Jacuzzi.

— Later —

Argh, more annoyances. It's only got one tuner, so you can't watch something and record something else. Fair enough. But you also can't record something and watch a recorded programme, which must be within the capability of the technology. Grr.


Random side note - half of these links are just the top Google hit or a straight Wikipedia lookup for the words in the link text. I want a shortcut key to do that for me, or maybe a markup method that'll link to the 'I'm feeling lucky' page for it.

New shiny thing

I was feeling a lack of toys, so I've acquired an NSLU2. This is a cute little (_really_ little) box that plugs into your external (USB2) hard disks and shares them over ethernet with samba. Very nifty. For the most part, setting this stuff up was trivially easy. The only thing that annoyed me was that the box wants to format the drive as ext3, so I have to do the juggling dance with a spare external drive and pour data from one to the other repeatedly, and now I'm sitting here watching 220 gigs of data move over the (100 meg) ethernet link, which is dull and slow.

Of course, out-of-the-box toys are never fun enough, and indeed, this thing runs linux and has now been hacked / upgraded / whatever, so I can ssh into it and cron rsyncs of the data to another computer, and it runs mt-daapd so the mp3 collection on it is automatically shared across the network, etc, etc. The important thing to do now is to not mess with it so much that I break it. That would be bad.

photo gallery

All I want, and I don't feel that this is a lot, is to be able to put photos on my web page from iPhoto. Because writing iPhoto plugins is a pain, this requires me to use either Flickr, which I don't want to (because I'd like to control my own photos, please, and not pay money for it), or php gallery, which has its own issues for me, mostly that it's written in php.

As a perl (most of the time) programmer, I resent the fact that my web page is increasingly powered by php, but alas that's where all these little toys come from nowadays. I've dabbled in php myself a little now, and it seems like a bearable language, although not one I'd actually want to write serious code in. Of course, with my colo going through.. pain recently, it's tempting to restrict myself to a very simple subset of things, specifically, a subset of the stuff that the main box admin uses, because I know that it works and I don't have to think about it. No more compiling weird perl/C modules on solaris as root with bizarre things tacked onto the end of my library path for me. The other colo has debian on it, which I like, but only 64 megs of memory, which I don't. Hardware (even virtual hardware) sucks.

Of course, having got a gallery, I want to do things like have '5 most recent' pictures on the front page of the site, and this is where things fall down a little. No one else seems to want to do this stuff - I may end up subscribing to my own RSS feed, syndicating from one bit of my site to another, which seems disgustingly wasteful.

unit tests

Tests are a blessing and a curse. They are the developer's friend - I can barely write code nowadays without a test suite. How else will I know when I've broken something? I'm very lazy - I can't be bothered to manually run through every feature, or even to start up the web server (or whatever) in most cases. Edit, save, run tests, edit again, is my preferred cycle, and for this, the tests must be as comprehensive as possible - not only testing the features, but testing the things that shouldn't work, and testing every nasty, hard-to-track-down bug you've ever found - there's nothing worse than spending hours tracking down the same bug that you fixed last month.
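A small sketch of that style using perl's core Test::More: tests for the feature, tests for input that must be rejected, and a regression test pinned to an old bug. The `parse_port` function and its "trailing newline" bug are hypothetical, defined inline so the file runs on its own.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test.
sub parse_port {
    my ($s) = @_;
    # \z (not $) so that "8080\n" is not silently accepted.
    die "not a port\n"
        unless defined $s && $s =~ /\A\d+\z/ && $s >= 1 && $s <= 65535;
    return 0 + $s;
}

# The features.
is(parse_port("80"),    80,    'plain port');
is(parse_port("65535"), 65535, 'upper bound');

# The things that shouldn't work.
ok(!eval { parse_port("0");    1 }, 'port 0 rejected');
ok(!eval { parse_port("http"); 1 }, 'non-numeric rejected');

# Regression test for the (hypothetical) bug fixed last month:
# a trailing newline from user input used to slip through.
ok(!eval { parse_port("8080\n"); 1 }, 'trailing newline rejected');

done_testing();
```

Each bug you fix gets a test like the last one, so it can't quietly come back.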

But for deployment, tests are just annoying. Take CPAN modules, for instance. Most of the time, yay, tests pass, install the module. But as with all things, all the interesting things happen when things break, and the tests don't pass. In my experience, if there are any failures at all, either all the tests fail, because there's a dependency that failed to build and you can't even 'use' the module, or one or two out of 45,000 tests failed, because there's a tiiiny little broken case on whatever bizarre architecture I'm using this week, and I'm just going to force install it anyway. This would seem to be served better by a much simpler 'does the thing superficially work?' test suite used for deployment, separate from the development test suite.
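One way to get that split with Test::More is to keep the quick checks unconditional and gate the exhaustive ones behind an environment variable. A sketch, with two assumptions: `AUTHOR_TESTING` is a common CPAN convention rather than anything Test::More enforces, and List::Util is only a stand-in for whatever module is actually being installed.

```perl
use strict;
use warnings;
use Test::More;

# Deployment smoke test: does it load, and does it basically work?
# (List::Util stands in for the module being installed.)
use_ok('List::Util');
is(List::Util::sum(1, 2, 3), 6, 'sum works at all');

# The long development suite only runs when someone asks for it.
SKIP: {
    skip 'exhaustive tests; set AUTHOR_TESTING=1 to run them', 2
        unless $ENV{AUTHOR_TESTING};

    is(List::Util::sum(1 .. 1000), 500500, 'larger input');
    is(List::Util::max(-3, -1, -2), -1, 'negative numbers');
}

done_testing();
```

An install then only fails on the superficial checks, while `AUTHOR_TESTING=1 prove t/` still runs everything during development.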

I discovered the other day that you can do quite horrifying things with perl. A closure in perl is a nice concept - it's a block that can reference things in the scope that it's declared in, but that can be passed around and used in quite different scopes. For instance, suppose I wanted a function that, say, converted a string to utf8 bytes (yes, I'm obsessed with utf8). I can do this like this:

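use strict;
use warnings;
use utf8;     # the literal "héllo" below is text, not raw bytes
use Encode;

my $closure;
$closure = sub {
  my $val = shift;
  return Encode::encode("utf8", $val);   # characters in, UTF-8 bytes out
};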

And call it later as:

print $closure->("héllo");

This is dead nifty. But because a closure can reference things in its scope, and $closure is in its scope, it can call itself, or at least, it can call the function pointed at by $closure. So we can make this function recursive:

$closure = sub {
  my $val = shift;
  if (ref $val eq "ARRAY") {
    # encode every element of the array
    return [ map { $closure->($_) } @$val ];
  } elsif (ref $val eq "HASH") {
    # keep the keys, encode the values
    return { map { $_ => $closure->($val->{$_}) } keys(%$val) };
  } else {
    return Encode::encode("utf8", $val);
  }
};

Until the assignment is complete, the inside of the closure won't work, because $closure is undefined. But by the time we call it later..

my $bytes = $closure->( [ "héllo", { foo => "bår" } ] );

..everything works.

Crazy, I tell you.
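One caveat worth knowing: because the sub holds on to $closure and $closure holds the sub, the pair forms a reference cycle that perl's reference counting won't free unless you break it (for example with `undef $closure`). On perl 5.16 and later, `__SUB__` (enabled by the `current_sub` feature) gives a sub a reference to itself without capturing any variable. Below is a sketch of the same encoder written that way; the name `$encode` is just for illustration. On older perls, the usual workaround is to weaken the self-reference with Scalar::Util's `weaken`.

```perl
use strict;
use warnings;
use utf8;
use feature 'current_sub';   # provides __SUB__ (perl 5.16+)
use Encode qw(encode);

# Recursively turn every string in a nested array/hash structure into
# UTF-8 bytes. __SUB__ is the currently running sub, so no outer variable
# is captured and no reference cycle is created.
my $encode = sub {
    my $val = shift;
    if (ref $val eq 'ARRAY') {
        return [ map { __SUB__->($_) } @$val ];
    } elsif (ref $val eq 'HASH') {
        # unary + makes it explicit that this {...} is a hash reference
        return +{ map { $_ => __SUB__->($val->{$_}) } keys %$val };
    } else {
        return encode('UTF-8', $val);
    }
};

my $bytes = $encode->([ "héllo", { foo => "bår" } ]);
```

`map` blocks are not subs in their own right, so `__SUB__` inside them still refers to the enclosing anonymous sub.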