Blog · Tom Insam

Ignoring resource fork files with subversion

31 Oct 2006

If you've ever edited files over Samba, or on a FAT partition, using a Mac, you'll know that it scatters annoying ._foo.txt files all over the place when you save things. These files are the system's way of compensating for these filesystems not supporting 'real' resource forks, and they're a complete pain. I feel this pain especially when I'm trying to see what's changed in a subversion checkout using svn st, and it produces 30 lines of ? ._foo.pl complaints.

Fortunately, subversion allows you to ignore this stuff. Edit the file ~/.subversion/config (which is created the first time you use the subversion client), and search for the 'miscellany' section. Uncomment the line [miscellany] if it's not already, and also uncomment the line beginning global-ignores. This line is a list of glob patterns for files that should be ignored when doing a svn st, or svn add (if you svn add on a folder, it won't add any backup files in the folder, for instance). Add the pattern ._* to the end of it, and your resource fork woes are over...
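
For illustration, the edited section might end up looking something like the fragment below. The patterns in front of ._* are an assumption about what a stock config of this era ships with; your default list may differ between Subversion versions, so keep whatever is already on your line and just append ._* to it.

```ini
[miscellany]
global-ignores = *.o *.lo *.la #*# .*.rej *.rej .*~ *~ .#* .DS_Store ._*
```

Patterns are separated by whitespace, and the setting only affects unversioned files: anything already added to the repository keeps showing up as normal.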

Adding a metaweblog interface to django

23 Oct 2006

I tend to reimplement the CMS that drives jerakeen.org more often than I add content to it, but the current Django-based incarnation seems to have decent sticking power. A lot of this is down to Django's magic admin interface. When I add, say, a tagging engine to the site, I only need to worry about the object model and presenting it on the site itself. All the boring and much harder to write admin pages to add and remove tags just write themselves. But the other reason I'm staying with it is that I've now added so many features to it (because it's easy!) that a re-write in another language would be a huge amount of effort.

This weekend, for instance, I've added an implementation of the metaweblog API to the site, using the excellent code on allyourpixel as a base. The main source of pain is the persistent weirdness of implementing the Movable Type extensions to the metaweblog extensions to the Blogger XMLRPC API. How can you call something a metaweblog API and not allow for post excerpts, for instance? So annoying.
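
To give a feel for the layering, here is a minimal sketch of what a client might send for metaWeblog.newPost with the Movable Type excerpt field bolted on. The call signature (blogid, username, password, struct, publish) and the mt_excerpt member name come from the metaweblog and Movable Type APIs; the tiny serializer, the helper names and all the values are made up for illustration, and this is not the code running on the site.

```javascript
// Minimal XML-RPC serializer: handles strings, booleans and structs only.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function toValue(v) {
  if (typeof v === "boolean") {
    return "<value><boolean>" + (v ? 1 : 0) + "</boolean></value>";
  }
  if (typeof v === "string") {
    return "<value><string>" + escapeXml(v) + "</string></value>";
  }
  if (v && typeof v === "object") {
    var members = Object.keys(v).map(function (k) {
      return "<member><name>" + escapeXml(k) + "</name>" + toValue(v[k]) + "</member>";
    });
    return "<value><struct>" + members.join("") + "</struct></value>";
  }
  throw new TypeError("unsupported XML-RPC type");
}

function methodCall(name, params) {
  var body = params.map(function (p) {
    return "<param>" + toValue(p) + "</param>";
  });
  return '<?xml version="1.0"?><methodCall><methodName>' + name +
    "</methodName><params>" + body.join("") + "</params></methodCall>";
}

// Hypothetical blog id, credentials and post content.
var post = {
  title: "Hello",                 // metaweblog: RSS-style item fields
  description: "The full post body",
  mt_excerpt: "A short summary"   // Movable Type extension
};
var body = methodCall("metaWeblog.newPost", ["1", "tom", "secret", post, true]);
```

The body would then be POSTed to the XML-RPC endpoint with a Content-Type of text/xml. A server that only speaks plain metaweblog has no defined place to put mt_excerpt, which is the whole annoyance.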

While implementing it, I found the TextPattern API reference to be far more useful than the official spec, mostly because it covers everything up to the Movable Type extensions, which you need if you want to edit page excerpts. The other problem I encountered was that Ecto won't talk to an endpoint over HTTPS with a self-signed certificate unless the SSL cert is in the local machine's X.509 database. The way it fails is incredibly unhelpful and annoying, too. The simplest way to fix it (assuming a recent Mac OS X) is to visit the endpoint in Safari. It'll complain about the certificate - click the 'always trust this site' box, and it'll stop.

PythonDaap 0.4 release

06 Aug 2006

I've put this off way too long. But Fernando Herrera has found a bug with python-daap and Tangerine (a very cool app, although subject to an annoying variety of disconnect bugs). The fix for this, combined with various safer handling of non-UTF-8 ID3 tags, is easily enough to encourage a 0.4 release.

Bot::BasicBot 0.7

11 Jun 2006

Updates for new PoCo::IRC
No longer do 2 server connects on startup
the connect test doesn't break itself by faking a connection first

E4X - A native XML datatype for JavaScript

30 May 2006

I gave a talk on E4X. In a Just and Decent world, I wouldn't have to write a blog entry on this, because there would be a nice front page to jerakeen.org that listed all the recent things I've done, with the option to subscribe to RSS (or whatever) feeds of various subsets. But I've been too lazy to write this so far, so I'll just link to it here until I get django to do what I want.

E4X is a lovely extension to JS (well, compared to messing with the DOM, and it's in core, so embedded users get it too), despite its crazy inconsistent syntax and annoying brokenness in Firefox. Fortunately, I don't have to care about web browser-based JS implementations, so I get to use it, and you don't.

http://jerakeen.org/talks/e4x/

JavaScript strings - a followup

12 May 2006

Having played around with the JavaScript string type some more, I think I understand why it acts as it does. I'm a Perl monkey normally, so I'm not used to the concept of immutable strings, but JavaScript strings are immutable. Playing with the === operator (approximately, 'is this the same object') gives:

js> "a" === "a";
true
js> "a" + "b" === "ab";
true
js> "ab".replace(/./, "c") === "cb";
true

but

js> new String("a") === new String("a");
false

If strings were to magically upgrade themselves to objects, they'd change behaviour - previously equivalent strings would suddenly not be equivalent. Likewise, suppose this worked:

var a = "string";
var b = "string";
a === b; // true
a.foo = 1;

Should a still be equivalent to b? If not, a clearly isn't immutable, as we've changed it. But if it is, then we've changed b at a distance - it's grown a foo attribute.
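
For contrast, here is a small sketch of what happens when you ask for String objects explicitly, which is roughly the behaviour the basic strings would have to adopt if they upgraded themselves. The variable names are only for illustration.

```javascript
// Two String wrapper objects holding the same text are distinct objects.
var a = new String("string");
var b = new String("string");

var sameObject = (a === b);                   // compares object identity
var sameText = (a.valueOf() === b.valueOf()); // compares the wrapped values

// Objects can carry extra properties, and they stay put.
a.foo = 1;
var aHasFoo = ("foo" in a);
var bHasFoo = ("foo" in b);
```

Here sameObject is false while sameText is true, and only a ends up with a foo. That is consistent, but it's exactly the 'previously equivalent strings are suddenly not equivalent' problem.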

Still all very annoying, of course, but I understand why now.

JavaScript string weirdness

29 Apr 2006

Recently, I mentioned a peculiar difference between uneval and toSource. Specifically (using the SpiderMonkey JS console):

js> uneval("");
""
js> "".toSource();
(new String(""))

"" and new String("") are different types of thing. The first is the basic string type, and only really has a value. The second is a full Object, which happens to have a value. However, it turns out that if you treat a basic string type as an Object, say by putting '.' after it in an expression, the SpiderMonkey runtime will implicitly promote the string to a String. Hence, "".toSource() promotes the basic string, then calls toSource on the new String object.

Annoyingly, the String Object doesn't hang around - it'll get thrown away as soon as you're done with it. This leads to the weird case that you can set attributes on a basic string type (because it'll get promoted to an Object, and Objects have attributes) but they don't stay set (because the Object you've set them on gets thrown away as soon as the set call finishes).
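
A minimal sketch of the disappearing attribute, next to an explicit String object for comparison. The variable names are made up, and the try/catch is only there because engines running the code in strict mode refuse the assignment outright instead of silently dropping it.

```javascript
var s = "hello";
var kind = typeof s;        // the basic string type
var len = s.length;         // property access works via a temporary String

// Setting a property on a basic string: the temporary String takes it,
// then gets thrown away. (Strict mode throws a TypeError here instead.)
try {
  s.foo = 1;
} catch (e) {
  // ignored: either way, nothing is stored on s
}
var fooOnPrimitive = s.foo; // a fresh temporary String, with no foo on it

// An explicit String object is a real Object and keeps its properties.
var o = new String("hello");
o.foo = 1;
var objectKind = typeof o;
var fooOnObject = o.foo;
```

After this runs, fooOnPrimitive is undefined and fooOnObject is 1, while kind and objectKind come out as "string" and "object" respectively.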

By the way, all of this applies very specifically to the current CVS trunk SpiderMonkey. I don't know what most web browser engines do with strings, so don't assume this applies in, say, Internet Explorer. But I'd be interested if someone wants to find out and tell me...

On blog comment spam

28 Apr 2006

Blog comment spam, the scourge of the internet. Having written yet another CMS to power jerakeen.org, I wanted comments on pages again. Django rocks hard - adding commenting was easy. And a day later, I have comment spam. Bugger.

From a purely abstract point of view, I find this interesting. There must be a spider looking for forms that look like things that can take comments. And the robots must be reasonably flexible - it's not like my CMS is off-the-shelf. But from a more concrete, 'spam bad' point of view, it's bloody annoying.

So begins my personal battle against spam. Others have fought this battle, but of course the downside of rolling your own site is that you can't use anything off the shelf. My plan was to forget about trying to recognise and filter spam, and preferably to avoid moderating anything - I don't want the spam to be submitted at all. And this really can't be that hard. Unless there's a human surfing for blogs and typing in the spam themselves, this should really just be a measure of my ability to write a Turing test. Right?

My first plan was to require a form value in the comment submission, but to not include that field in the form itself - instead, I added it with client-side JavaScript. This should stop simplistic robots, at the cost of requiring JS to be turned on in the client, which is something I'm willing to live with, frankly. Alas, it didn't work. Clearly too simple - either there's a human typing spam into the box, or the robot doing the work is using something like Mozilla::Mechanize that'll do the JavaScript. Or maybe they just handle some obvious cases. After all, my 'clever' code was merely document.write("<input name=......
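
The shape of that first attempt, as a rough sketch: the page's script writes an extra hidden field, and the server refuses any submission that lacks it. The field name and value here are hypothetical (the real ones are elided above), and the server-side half is shown in JavaScript purely to keep the example in one language; the site itself is Django.

```javascript
// Hypothetical field name and value, for illustration only.
var FIELD = "js_check";
var EXPECTED = "yes";

// The markup the page's script would hand to document.write() in a browser.
function hiddenFieldMarkup() {
  return '<input type="hidden" name="' + FIELD + '" value="' + EXPECTED + '">';
}

// Server-side check: form is a plain object of submitted name/value pairs.
// A client that never ran the script never sends the field.
function passesJsCheck(form) {
  return form[FIELD] === EXPECTED;
}

var fromBrowser = passesJsCheck({ js_check: "yes", comment: "Nice post" });
var fromDumbBot = passesJsCheck({ comment: "Buy things" });
```

The weakness is plain from the sketch: anything that executes the script, or just spots the literal markup in the page source, gets through.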

Or perhaps they figure it out once, and use a replay attack to hit every page? Not really a good assumption with hindsight, but never mind. I added prefixes to the form fields that were generated from the current time, and checked at submit time that the fields weren't more than an hour old. This saves me from having to store state anywhere, and gains me forms that expire after a while, unless you reverse-engineer the timestamp format. But I'm premising the existence of some automated tool, perhaps with a little human interaction. I don't need to be perfect, I merely need to be not as bad as everyone else... But no, this failed too.
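
A minimal sketch of the time-stamped field names, assuming an encoding I've invented for the example (a 't', the issue time in seconds in base 36, then an underscore and the real name). The actual format on the site isn't given here, and all the function names are hypothetical.

```javascript
var MAX_AGE_SECONDS = 60 * 60; // forms are good for an hour

// Build the field name to put in the form when the page is rendered.
function stampedName(base, nowSeconds) {
  return "t" + nowSeconds.toString(36) + "_" + base;
}

// Recover the issue time and the real field name at submit time.
function parseStampedName(name) {
  var m = /^t([0-9a-z]+)_(.+)$/.exec(name);
  if (!m) {
    return null;
  }
  return { issuedAt: parseInt(m[1], 36), base: m[2] };
}

// No server-side state needed: the name itself says how old the form is.
function isFresh(name, nowSeconds) {
  var parsed = parseStampedName(name);
  if (!parsed) {
    return false;
  }
  var age = nowSeconds - parsed.issuedAt;
  return age >= 0 && age <= MAX_AGE_SECONDS;
}

var issued = 1146200000; // an arbitrary time in seconds since the epoch
var field = stampedName("comment", issued);
var okSoon = isFresh(field, issued + 60);
var okLate = isFresh(field, issued + MAX_AGE_SECONDS + 1);
```

Nothing here is signed, so anyone who works out the format can mint a fresh name whenever they like. That's the 'unless you reverse-engineer the timestamp format' caveat.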

Ok, so the JavaScript is too obvious. I split it up into sections, and also write the wrong value into the form, and change it to the right one using a regular expression later (BWHAHAHAH). At the same time (and I suspect this is the important bit) I changed the names of the fields completely. Calling them 'name', 'email' and 'comment' is a bit of a giveaway, really. 'foo', 'bar' and 'baz' they are, then. Now it should be practically impossible for an automated tool to even figure out that I accept comments. Sure, you could probably think 'hmm, two small input fields, and a textarea, on a page that has an RSS feed', but I'm assuming that, for 90% of the blogs out there, this isn't needed, so no-one does it.
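
Roughly the sort of thing this means, as a sketch rather than the site's actual script: the markup is assembled from fragments so the string '<input' never appears whole in the page source, a decoy value is written first, and a regular expression swaps in the real one. The field name, values and function name are all hypothetical.

```javascript
// Assemble the hidden field from pieces, with a decoy value in place.
function obfuscatedFieldMarkup(name, realValue) {
  var pieces = [
    "<in", "put ty", 'pe="hid', 'den" na',
    'me="' + name + '" va', 'lue="WRONG">'
  ];
  var markup = pieces.join("");
  // Swap the decoy for the real value. A replacer function is used so
  // that '$' sequences in realValue are not treated specially.
  return markup.replace(/WRONG/, function () {
    return realValue;
  });
}

// In a browser this string would be handed to document.write().
var markup = obfuscatedFieldMarkup("qux", "sesame");
```

A robot that scrapes the page source for a literal input tag or the final value finds neither; one that actually runs the script still gets the right answer, which is why the renamed fields are probably doing most of the work.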

And yes, I've received no blog spam comments since I did this. On the other hand, I've received no normal comments either. Hope I haven't raised the barrier too high. If the situation stays good, I may remove the client-side JavaScript requirement. Or figure out a noscript fall-back solution for people using lynx. Poor souls.