Tom Insam

It's the iPhone 3G

I guess I may as well write down my few thoughts on this iPhone thing.

On O2's offer to existing customers

To thank you for being an iPhone fan, we're offering you an early upgrade to the brand new version when it launches on 11th July 2008. You won't have to wait until the end of your existing contract, all you'll need to do is agree to a new 18-month minimum term contract

Is that an additional 18 months on top of my contract now, or merely a reset of the run to 18 months from now? Because if it's the latter, I want mine now.

An aside. When trying to tell O2 where I live, I get the exciting error [House Number must be numeric]. Um. No. Because mine isn't. Idiots.

Interestingly, the page about the iPhone upgrade makes mention of O2's «new iPhone Pay & Go SIM cards». Hmmm, interesting. Though if I can get me an iPhone 3G cheaply I'll probably end up just jailbreaking it instead. Or if the iPhone 3G isn't jailbreakable, and you can't buy old ones any more, I wonder if it'll have disgusting amounts of resale value...

On 3G

Let's (almost) gloss over the actual '3G' feature here. The standard Steve approach was used - '3G simply isn't necessary, EDGE is fine!'. Until the iPhone has 3G, at which point it's 'Look how much better 3G is!'. I have the iPhone 1, and I find EDGE just fine. My total web page download time is already faster than it was on the 3G phone I used to have, because my web browser doesn't take 20 seconds to load. 3G will be nice. But meh.

On Pricing

Gruber has a bit on the new pricing structure, and this is the interesting bit for me. What I read here is that 'subverting the old phone industry business model' didn't work. So they'll do the same thing everyone else does instead and just sell subsidised phones through carriers. So much for changing the world. But this implies that they'll do the other thing everyone else does, and sell unlocked versions of their phones for more money. If they're no longer getting a cut of the carrier revenues, why would they care any more?

Oh, and Gizmodo relays the interesting point that, sans monthly revenue to book against the iPhone, Apple may start charging for feature upgrades on SOX grounds. Except that the Apple TV gets free upgrades. The ongoing revenue thing is just an accounting 'profit from this thing is amortized over 18 months' device, no?

Disqus comments

I used to think that if I did my own thing to handle comment spam then I'd be low-hanging fruit and wouldn't have a problem. And this worked for quite a long time. But a couple of weeks ago it stopped working, and last week I turned off comments on jerakeen.org just so I didn't have to delete 30 spam comments every day. And I'm far too lazy to do anything properly to solve this.

Fortunately, Disqus have appeared recently, and they're great. I can just embed someone else's commenting framework and let them deal with the problem. The obvious downside is that the comments aren't 'really' on my page, so I won't get any google juice from them. But on the other hand, the comments aren't really on my page, so no one else will get any google juice from them. Maybe this'll make them less appealing as a spam target in the first place.

The other downside is that there's no import capability for my old comments. I've settled for just displaying the old comments in-place, with Disqus comments under them. I've also taken the opportunity to make it slightly clearer when I'm syndicating comments from Flickr rather than allowing them on the local site. I haven't managed to combine the count of old and new comments yet, but I'm sure I'll get it soon. Until then, pages with old-style comments will just have 2 figures for comment count. You'll live.

I went to DJUGL (pronounced 'juggle') yesterday, to watch tech talks and say hello to people. I learned the following things: (I know! Learning things! At a tech talk!)

  • IPython, an improved Python shell. Does tab-completion, amongst other things. The Django 'shell' command will use it automatically if it's installed.

  • The SEND_BROKEN_LINK_EMAILS setting - sends mail to addresses listed in the MANAGERS config variable when the Django server serves a 404. Not something I particularly want to turn on, but I liked it. I also like the way Django will send mail on every server error. The absolute fastest way to get live crash bugs fixed is to mail all the developers every time they happen.

  • There was some cool middleware that displayed profiling information. Must use it in something.

  • The django-tagging application bears looking into at some point.
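For reference, the mail settings from the list above look roughly like this in a project's settings.py. This is a minimal sketch rather than my actual config: the names and addresses are placeholders, and SEND_BROKEN_LINK_EMAILS is the setting as it exists in the Django of this era.

```python
# settings.py fragment (sketch; names and addresses are placeholders)

# Error mail is only sent when DEBUG is off.
DEBUG = False

# ADMINS get a mail with the traceback for every unhandled server error.
ADMINS = (
    ("Example Admin", "admin@example.com"),
)

# MANAGERS get the broken-link (404) mails.
MANAGERS = (
    ("Example Manager", "manager@example.com"),
)

# Mail MANAGERS when the server serves a 404.
SEND_BROKEN_LINK_EMAILS = True
```

Keeping ADMINS and MANAGERS separate means the 404 noise can go to a different set of people than the crash reports.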

Simon has talk notes up.

As is my wont, I'm in the middle of porting jerakeen.org to another back-end. This time, I'm porting it back to the Django-based Python version (it's been written in Rails for a few months now). It's grown a few more features, and one of them is somewhat smarter comment parsing.

This being a vaguely technical blog, I have vaguely technical people leaving comments. And most of them want to be able to use HTML. I've seen blogs that allow markdown in comments, but I hate that - unless you know you're writing it, it's too easy for markdown to do things like eat random underscores and italicise the rest of the sentence by accident. But at the same time, I need to let people who just want to type text leave comments.

The trick then is to turn plain text into HTML, but also allow some HTML through. Because the world is a nasty place, this means whitelisting based on tags and attributes, rather than removing known-to-be-nasty things. Glossing over the 'turn plain text into HTML' part, because it's easy, here's how I use BeautifulSoup to sanitise HTML comments, permitting only a subset of allowed tags and attributes:

import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

# Assume some evil HTML is in 'evil_html'

# allow these tags. Other tags are turned into bare span tags, so their child elements remain
whitelist = ['blockquote', 'em', 'i', 'img', 'strong', 'u', 'a', 'b', "p", "br", "code", "pre" ]

# allow only these attributes on these tags. No other tags are allowed any attributes.
attr_whitelist = { 'a':['href','title','hreflang'], 'img':['src', 'width', 'height', 'alt', 'title'] }

# remove these tags, complete with contents.
blacklist = [ 'script', 'style' ]

attributes_with_urls = [ 'href', 'src' ]

# BeautifulSoup is catching out-of-order and unclosed tags, so markup
# can't leak out of comments and break the rest of the page.
soup = BeautifulSoup(evil_html)

# now strip HTML we don't like.
for tag in soup.findAll():
    if tag.name.lower() in blacklist:
        # blacklisted tags are removed in their entirety
        tag.extract()
    elif tag.name.lower() in whitelist:
        # tag is allowed. Make sure all the attributes are allowed.
        # iterate over a copy, since attributes are removed from tag.attrs as we go
        for attr in list(tag.attrs):
            # allowed attributes are whitelisted per-tag
            if tag.name.lower() in attr_whitelist and attr[0].lower() in attr_whitelist[ tag.name.lower() ]:
                # some attributes contain urls..
                if attr[0].lower() in attributes_with_urls:
                    # ..make sure they're nice urls
                    if not re.match(r'(https?|ftp)://', attr[1].lower()):
                        tag.attrs.remove( attr )

                # ok, then
                pass
            else:
                # not a whitelisted attribute. Remove it.
                tag.attrs.remove( attr )
    else:
        # not a whitelisted tag. I'd like to remove it from the tree
        # and replace it with its children. But that's hard. It's much
        # easier to just replace it with an empty span tag.
        tag.name = "span"
        tag.attrs = []

# stringify back again
safe_html = unicode(soup)

# HTML comments can contain executable scripts, depending on the browser, so we'll
# be paranoid and just get rid of all of them
# e.g. <!--[if lt IE 7]>h4x0r();<![endif]-->
# TODO - I rather suspect that this is the weakest part of the operation..
# (?s) lets '.' match newlines too, so multi-line comments are also removed
safe_html = re.sub(r'(?s)<!--.*?-->', '', safe_html)

It's based on an Hpricot HTML sanitizer that I've used in a few things.
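The 'turn plain text into HTML' step that I glossed over can be sketched as follows. This is an illustration, not the code the site runs: text_to_html is a made-up name, it's written for Python 3, and I'm assuming blank lines separate paragraphs and single newlines become line breaks. It deliberately doesn't escape anything, because deciding which markup survives is the sanitiser's job.

```python
import re


def text_to_html(text):
    """Wrap plain text in <p> tags, turning single newlines into <br>."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    if not text:
        return ""
    # One or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    html_paragraphs = []
    for paragraph in paragraphs:
        # A single newline inside a paragraph becomes a line break.
        body = paragraph.strip().replace("\n", "<br>\n")
        html_paragraphs.append("<p>" + body + "</p>")
    return "\n".join(html_paragraphs)


print(text_to_html("Hello <em>world</em>\nsecond line\n\nNew paragraph"))
```

The output of this would then be fed through the whitelist pass, so any tags the commenter typed get the same treatment as everything else.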

Update 2008-05-23: My thanks to Paul Hammond and Mark Fowler, who pointed me at all manner of nasty things (such as javascript: URLs) that I didn't handle very well. I now also whitelist allowed URIs. I should also point out the test suite I use - all code needs tests!
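As a rough idea of what whitelisting URIs means in practice, here's a small standalone check with a few assertions in the spirit of a test suite. It's a sketch under assumptions: is_allowed_url and ALLOWED_SCHEMES are names invented for this example, the scheme list just mirrors the http/https/ftp check in the sanitiser, and it's written for Python 3 rather than taken from the site's code.

```python
from urllib.parse import urlsplit

# Schemes allowed in href/src attributes (an assumed list, mirroring
# the http/https/ftp check in the sanitiser).
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_allowed_url(url):
    """Return True only for absolute URLs with a whitelisted scheme."""
    try:
        # Browsers ignore surrounding whitespace, so check without it.
        parts = urlsplit(url.strip())
    except ValueError:
        # Anything the parser rejects outright is treated as unsafe.
        return False
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)


# A few checks: anything not explicitly allowed is rejected.
assert is_allowed_url("http://example.com/page")
assert is_allowed_url("HTTPS://example.com/")
assert not is_allowed_url("javascript:alert(1)")
assert not is_allowed_url("  javascript:alert(1)")
assert not is_allowed_url("/relative/path")
assert not is_allowed_url("data:text/html,<script>alert(1)</script>")
```

The point of doing it this way round is that a new nasty scheme is rejected by default, instead of needing to be added to a blacklist after someone has already abused it.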

I realised that the version of EmusicR (my Emusic download client) I've been using myself for months now wasn't actually the released version. Oops. I've added Sparkle into it (mental note - write up how to do this in PyObjC, because it's really easy and worth doing) and put up a new binary on its code page. If anyone cares. Me, I prefer it to the real client.


In the mobile world, what have we done? We created a series of elegant technology platforms optimized just for mobile computing. We figured out how to extend battery life, start up the system instantly, conserve precious wireless bandwidth, synchronize to computers all over the planet, and optimize the display of data on a tiny screen

http://mobileopportunity.blogspot.com/2008/02/mobile-applications-rip.html

Wow. Pity they never released any of that stuff and we had to put up with Windows Mobile and Symbian.

This doesn't undermine the core argument, though. Web applications suck on mobile devices - when you're underground, or there's no signal, you just can't use them. And mobile web browsers (other than the iPhone's) are nasty. None of this matters, because they're easier to write, easier to deploy, and I don't have to faff for 3 weeks to get a developer certificate to sign my app so it'll run.

Aside - last time I tried to get a Symbian developer certificate (naturally, the free ones are IMEI-locked and you have to pay money if you want to distribute something) the website was utterly broken, took minutes per page load, and eventually told me that they weren't giving out developer certificates this week and please try again later. When it's easier to jailbreak your iPhone and write apps for it than it is to legitimately develop for Nokia phones, something is wrong.

This is why I believe that the iPhone SDK will be nothing cleverer than off-line enabling of web apps. Recent buzz is sounding like it's closer to being a Real SDK, though. I think this would be a pity. I'd much rather have a standards-based platform that could be implemented by other phone providers at this point.

Released Shelf 0.0.12

Another week, another Shelf release - this one is 0.0.12 - read the release notes or download the binary.

Loads of stuff in this one, but muttley may like the fact that you can now turn off the background poller and have Shelf look for context only when you hit a global shortcut key. This will also make life nicer for people with smaller screens who don't want this window popping to the foreground every time it can figure out who you're looking at.

Other than that, there are lots of improvements. Shelf should be faster and make fewer gratuitous network requests. Feed display is prettier, and I make an effort to display recently updated feeds at the top, rather than in random order.