Hosting toy Rails and Django apps using Passenger

I like writing small self-contained applications, and I like writing them using nice high-level application frameworks like Django and Rails. Alas, I also like being able to run these services for the foreseeable future, and that’s a lot harder than writing them is. Running a single Rails or Django application consumes an appreciable chunk of the memory on my tiny colo, and I currently have about 5 projects I really want running all the time (this could easily grow to 50 if I had a sufficiently good way of hosting them). Ideally, I’d never stop hosting these things. Otherwise what’s the point?

It’s sometimes tempting to just write all my toys in PHP. I’m certain that PHP has the mind-share that it does primarily because it’s so incredibly easy to deploy. Ease of development is utterly trumped by ease of deployment for anything not written for internal use only for a large company. tar is easier to use than mongrel, so there are more deployed PHP apps than Rails apps. But I’m not that desperate. I like my nice frameworks.

I tried Heroku as an external host for my apps for a bit, and it’s great. Very easy to start things, very easy to leave them up, and the free hosting plan is perfectly adequate for your average web application. Alas, there are a couple of raw edges that only really became apparent after using them for a few weeks. Firstly, they want to charge me for using custom domains, and I’m not willing to park my apps on domains that don’t belong to me. Secondly, their service goes through odd periods of 500 errors. This doesn’t bother me – what does bother me is that there is no official reaction to any of the complaints about it on what seems to be the official mailing list. Finally, quite a lot of the things I do need cron scripts, for polling services, etc., and the Heroku crons (a) aren’t very reliable in my experience, and (b) cost money. So I’ve been edging away from them recently. I’d still recommend them for prototyping, but I’m not sure I’d want to host anything Real there just yet.

(An aside – I’m not unwilling to pay any money at all. I will happily pay money for things that matter. But these apps are toys. The average number of users they have is ‘1’. I’m not willing to pay a fiver a month per application to be able to host them on my domain rather than Heroku’s domain. A fiver a month for all of them at once? Sure. But the Heroku payment model assumes that you have a small number of apps that you care about, rather than a large number of apps that you don’t.)

Anyway, my current attempt at solving this problem is Phusion Passenger (via mattb), which does exactly what I want, for Rails apps. It’s an Apache 2 or nginx module, and it’s trivially easy to install, unless you’re using Debian, which I am. Short version? It was a lot easier to totally ignore the Debian packaging system except to install Ruby, then build rubygems and everything else I needed from source. Sigh. I understand there are horrible philosophical differences underlying this pain. But it’s still pain.

Once installed, you can just point your domain’s DocumentRoot at a Rails app’s ‘public’ folder, and the Right Thing happens – files in public are served directly, and any other request causes a Rails process to be started to serve your app. Enough idle time, and it’ll shut down again. Magic. My favourite part is that it’ll start up the application server as the user who owns the ‘environment.rb’ file of your application, meaning that your app is running as your user, and can do things like write files into temp folders that don’t need sudo to be able to delete again.

Not all of my projects are Rails apps, though. jerakeen.org is a Django app, for instance (this week, anyway). Unexpectedly, it turns out that Passenger will do the same thing for Django apps, though it’s not as well documented. I have a file called passenger_wsgi.py in the root folder of my Django application folder. It looks something like this (if you use this, you’ll need to change the settings module name):

import sys, os
current_dir = os.path.dirname( os.path.abspath( __file__ ) )
sys.path.append( current_dir )
os.environ['DJANGO_SETTINGS_MODULE'] = 'mydjango.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

And in my Apache config file, I have this:

<VirtualHost *:80>
  ServerName jerakeen.org
  ...
  DocumentRoot /home/tomi/web/jerakeen.org
  PassengerAppRoot /home/tomi/svn/Projects/mydjango
</VirtualHost>

and thus are all my toy projects now brought up and down on demand. I’m happy again. Till next week, probably. ONWARDS.

Shelf – Context for MacOS

I really miss Dashboard. It was an effort to display some context around whatever person you were interacting with at any given moment – look at an email from Paul, or open an IM chat with him and you’d see things that he’d blogged or uploaded to Flickr recently. Genius. From the screenshots, it looks practically magic, tying into incoming SMS messages, IM conversations, the RSS feed reader, etc.

Alas, I never had a fully working Dashboard setup locally, mostly because applications had to actively participate in the process – they sent things called ‘cluepackets’ to the dashboard application containing hints about the current context. Because of this design, every app involved needed its source code patched and a recompile. This was a complete pain. Obviously, had everything gone to plan, the patches would have been merged and everyone would have been happy. I presume that Dashboard failed because the bootstrapping process was so hard that no-one used it.

Anyway, inspired by both Dashboard and Aaron’s obsession with the address book, I’ve had a stab at doing it again, but worse.

Shelf

Shelf will look at the current foreground application, and try to figure out if what you’re looking at corresponds to a person in your Address Book. Then it’ll tell you things about them.

Update 2008/01/08: I have downloadable versions of Shelf now. Go to the project page and download one.

Shelf screenshot

It’s for MacOS. Because on MacOS, I have OSA – I can interrogate most (well-written) applications about their state in a beautiful, language-agnostic and fast manner. I can ask Mail.app for the email address of the current mail. I can ask Safari what the URL of the foreground window is. I can ask Adium for the account details of the current chat. I can ask NetNewsWire for the homepage URL of the current subscription. And I can ask the system what app is in the foreground. I can also interrogate the system address book via the Cocoa bindings for same and find out what users have got that email address, or URL, or AIM screen name. And then I can take all the other information about them in their address book entry, and figure out some context. Oh, and the thing’s written in Ruby, because the Ruby scripting bridge is a thing of serious beauty and should be played with by everyone.

Good thing

So, advantages. I don’t have the bootstrapping problem, because most MacOS applications already have enough of a scripting interface that I can extract information from them. Firefox is proving to be a serious problem, alas, but I’ve hit no other apps I can’t get something useful out of.

Once I have an Address Book record as context, I can update the interface with a picture of the person and their name/company (direct from the address book, so easy). As a ‘will this work?’ experiment, I’m parsing every referenced URL in the address book card for RSS feeds, and displaying those as context. And (because I work there) I have special-case Dopplr support that tells me where the person is in the world and where they’re going next. This means that when someone IMs me, a window pops up and tells me where they are, when they’re back, and what they’ve blogged recently. Awesome.

addressbook screenshot

The system address book is great – it has multiple email addresses and URLs for people, so I’m indicating things like Dopplr username by just putting the URL to my traveller page in my address book entry. I can parse the username out later and use it to call the API with. This has the advantage that if I visit my Dopplr page in Safari, hey, wow, that URL is in the address book, and it knows that it’s me again. Flickr is the next obvious choice for special-casing, but the principle extends to anything.
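A rough sketch of the ‘parse the username out later’ step. The URL shape (`/traveller/<username>`) and the helper name `dopplr_username` are my assumptions for illustration, not something lifted from Shelf’s source:

```ruby
require "uri"

# Hypothetical helper: pull a Dopplr username out of an address book URL.
# Assumes traveller pages look like http://www.dopplr.com/traveller/<username>.
def dopplr_username(url)
  uri = URI.parse(url)
  host = uri.host.to_s
  return nil unless host == "dopplr.com" || host.end_with?(".dopplr.com")

  match = uri.path.to_s.match(%r{\A/traveller/([^/]+)})
  match && match[1]
rescue URI::InvalidURIError
  # Address book cards contain all sorts of junk; treat it as "not Dopplr".
  nil
end

# Every URL on a card gets a look; most of them won't be Dopplr pages.
card_urls = [
  "http://example.org/blog/",
  "http://www.dopplr.com/traveller/someone"
]
usernames = card_urls.map { |url| dopplr_username(url) }.compact
```

The same pattern (one small recogniser per service, run over every URL on the card) would presumably extend to Flickr and friends.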

Bad thing

Disadvantages. Firstly, urgh, I’m polling. Every 2 seconds, I ask the system for the foreground application, then ask that application (if I know how) for context. This is probably a little heavy (is it? I’m guessing). Secondly, I have to do explicit work for every app out there. The huge advantages of Dashboard’s cluepacket approach over mine were that packets were pushed instantly on a change of context, and that each new application was responsible for sending its own cluepackets.

Actually, this is easy. My app should have a ‘change context’ OSA method that other applications can call. Smart apps can tell me when their context changes, and I’ll just poll everyone else. Once I’ve taken over the world, everyone will be pushing messages to me, and I can deprecate the poll interface. Genius.
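Something like this, as a minimal sketch in plain Ruby with the OSA plumbing left out entirely. The class name `ContextMonitor` and its methods are made up for illustration; the pollers are just callables standing in for ‘ask this app for its context’:

```ruby
# Sketch: prefer context that apps push to us, and poll only the apps that
# have never pushed anything. All names here are hypothetical.
class ContextMonitor
  def initialize(pollers)
    @pollers = pollers   # app name => callable returning a clue, or nil
    @pushed = {}         # app name => last clue that app pushed to us
  end

  # What a smart app would call when its own context changes.
  def change_context(app, clue)
    @pushed[app] = clue
  end

  # Called on the timer with the foreground app's name.
  def context_for(foreground_app)
    return @pushed[foreground_app] if @pushed.key?(foreground_app)

    poller = @pollers[foreground_app]
    poller && poller.call
  end
end

monitor = ContextMonitor.new({ "Safari" => -> { "http://example.org/" } })
monitor.context_for("Safari")                       # polled
monitor.change_context("Mail", "paul@example.org")  # pushed by the app
monitor.context_for("Mail")                         # no polling needed
```

Once every app pushes, the `@pollers` table is empty and the timer can go away.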

Recently, most of the crazy apps I’ve put here have been labelled as ‘proof of concept’. This one is different. This one probably won’t even build on your computer. I’m putting things up here as a way of musing about technique. For instance, Dashboard had a far better design than this app. It had a nice pipeline thing going for it, whereas I just have a class per foreground application. Each class must produce an Address Book record, and then I interrogate every context producer for information and display it. This is silly – if I’m looking at Paul’s Flickr photos page, I don’t need my app showing me the thumbnails again; I might be much more interested in where he is right now. Hell, in a perfect world, it would work out the dates of the photos I’m looking at, and show me where he was at that time.

Future

Clever things I could (and want to) do:

  • If the foreground URL doesn’t belong to a user, look for hcard markup in the source HTML and try to derive a person from that. Right now, for instance, I’ll only recognise your Flickr page as belonging to you if it’s one of the URLs against your address book card. But Flickr pages are marked up with enough hcard that I should be just able to figure it out.

  • More intelligence around context – as above, if I’m looking at a blog of a friend, I want to see other things, not their blog again.

  • Remembering connections – if I figure out a local person from a Flickr page via hcard markup rather than an Address Book URL, why not remember their Flickr username and display their photos when they email me?

Many of these features are difficult, mostly because of my core design right now – I derive an Address Book entry from the current application, then derive context from that entry. This hampers cleverness somewhat – I really need to pass around a lot more information about how I derived this person, and keep a local cache of conclusions about them. Maybe the person isn’t in my address book – I get email from people I don’t know! But their email address might correspond to a Gravatar so I could show a picture of them. Maybe the mail has some URLs in the .sig and I could find their blog. Maybe they’ve commented on my blog in the past and I’d like links to the comments. Likewise, if I find, via hcard in the source of a page, that a page is about someone I know, should I update Address Book and add URLs for them? Probably not a good idea. So I need a local store of connections as well.
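To make the ‘local store of connections’ idea a bit more concrete, here is a sketch of the sort of thing I mean: conclusions about people, kept apart from Address Book, each tagged with how it was derived. `Connection` and `ConnectionStore` are hypothetical names, and a real version would need to persist to disk:

```ruby
# Sketch: an in-memory store of things I've concluded about people,
# with a note of where each conclusion came from.
Connection = Struct.new(:person, :kind, :value, :source)

class ConnectionStore
  def initialize
    @connections = []
  end

  # Record that `value` (say, a Flickr username) belongs to `person`,
  # and how we worked that out. Duplicates are ignored.
  def remember(person, kind, value, source)
    existing = @connections.find do |c|
      c.person == person && c.kind == kind && c.value == value
    end
    return existing if existing

    connection = Connection.new(person, kind, value, source)
    @connections << connection
    connection
  end

  # Everything known about one person.
  def for_person(person)
    @connections.select { |c| c.person == person }
  end

  # Reverse lookup: who does this identifier belong to?
  def person_for(kind, value)
    found = @connections.find { |c| c.kind == kind && c.value == value }
    found && found.person
  end
end

store = ConnectionStore.new
store.remember("Paul", :flickr, "paulexample", :hcard)
store.person_for(:flickr, "paulexample")
```

Keeping the `source` around means a guess derived from hcard markup never has to be written back into Address Book as if it were fact.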

Now what?

I don’t know. It’s very tempting to rewrite the thing in Python before it gets any more complex. Partly this is because the Ruby feedparser dependencies are a bugger, but mostly it’s because I don’t want my Python sk1llz to atrophy down to nothing. Recently everything I do is in Ruby, and I don’t like that. Shelf also desperately needs some work done to make it asynchronous, and to cache things – when I look at an email right now, it’ll hang for 5 minutes while it goes off and fetches 20 RSS feeds, every time I change the person I’m looking at. Not exactly pleasant. But the ‘find out about a person’ side is really just a trivial example of the sort of things you can do once you know who they are. The ‘derive context from current machine state’ side of things is much more interesting.
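The caching half, at least, doesn’t need much. A minimal sketch, assuming a simple time-to-live is good enough for feeds; `TimedCache` is a made-up name, and this does nothing about the asynchronous half of the problem:

```ruby
# Sketch: remember the result of a slow lookup (such as a feed fetch) for a
# while, so that flipping between people doesn't refetch everything.
class TimedCache
  # `clock` is injectable so the expiry logic can be exercised without waiting.
  def initialize(ttl_seconds, clock = -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}   # key => [stored_at, value]
  end

  # Return the cached value for `key`, or run the block and cache its result.
  def fetch(key)
    stored_at, value = @entries[key]
    return value if stored_at && (@clock.call - stored_at) < @ttl

    value = yield
    @entries[key] = [@clock.call, value]
    value
  end
end

cache = TimedCache.new(15 * 60)
# The block stands in for the slow network fetch and parse.
items = cache.fetch("http://example.org/feed") { ["entry one", "entry two"] }
```

The first look at a person is still slow, but the second one isn’t.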

Learning Ruby using Rails

Recently I got dropped in the deep end and had to learn both Ruby and Rails very quickly. I didn’t think this would be a problem – everyone raves about how easy Rails is to pick up, right? – and it wasn’t. The problem is actually arriving now, as I start trying to use Ruby for things other than Rails applications. And I can’t, because I’ve learned all sorts of nice Ruby tricks that looked like they were core language features but actually turn out to be added to the built-in Ruby objects by Rails.

For instance, I really like the 3.days convention for turning numbers into time intervals. That’s added by this extension. In fact, in digging for this, I found out just how many things Rails adds to core Ruby. I’m scared.

I’m torn. I’d like to consider messing with the built-in objects confusing and dangerous. And I’ve been bitten by this before. I’ve also had problems where one module’s patching of a Ruby builtin interferes with another module’s patching of the same object. Lovely.

At the same time, though, I love it. I love both the huge convenience and readability of being able to write Time.now + 3.days, and the fact that the language lets me do this. All languages should be this consistent – none of this ‘some types are special’ crap.
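For the curious, the trick itself is tiny. This is a simplified stand-in rather than the actual Rails code (which, as far as I can tell, does rather more than return a bare number of seconds), but it shows what ‘adding a method to a built-in’ amounts to:

```ruby
# Simplified stand-in for the Rails extension: reopen Integer and add a
# method that converts a count of days into a count of seconds.
class Integer
  def days
    self * 24 * 60 * 60
  end
  alias_method :day, :days
end

# Time + Integer adds seconds, so this now reads naturally.
base = Time.utc(2008, 1, 1)
later = base + 3.days   # three days after `base`
```

Nothing stops a second library defining its own `Integer#days` with different behaviour, which is exactly the interference problem mentioned above.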

There are trade-offs. I love Python, but I hate that map is a global function and not a method on arrays, and I hate that certain types are special and immutable. But I’m sure there are scary speed benefits from doing things this way.

I wonder if part of the reason that Ruby has this ‘just for Rails’ reputation is because, having learned Ruby for Rails, you can’t use that Ruby for anything else without unlearning a stack of habits?

Irritating 2038 Ruby / Rails behaviour

In a Rails console:

>> (Time.now + 30.years).class
=> Time
>> (Time.now + 31.years).class
=> DateTime
>> (Time.now + 30.years).to_s
=> "Sat Oct 17 17:27:11 +0100 2037"
>> (Time.now + 31.years).to_s
=> "2038-10-17T17:27:13+01:00"

Time objects after about January 2038 can’t be expressed internally as ‘UNIX epoch (1st January 1970) plus N seconds’ where N is a signed 32-bit integer. Once a Time object tries to express a date after this point, it gets silently converted to a DateTime object, which presumably uses a different internal representation. The result also stringifies differently and has different methods, which is just generally annoying behaviour.
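A quick check of where the 32-bit limit actually falls. Later Ruby versions lifted this restriction on Time, so on a current interpreter the snippet only illustrates the arithmetic, not the class-switching behaviour:

```ruby
# The largest value a signed 32-bit integer can hold, read as a number of
# seconds since the UNIX epoch.
MAX_INT32 = 2**31 - 1

# The last moment such a counter can represent, in UTC.
last_representable = Time.at(MAX_INT32).utc
puts last_representable.year   # 2038
puts last_representable.month  # 1
puts last_representable.day    # 19
```

So ‘now plus 30 years’ from late 2007 still squeaks in under the limit, and ‘now plus 31 years’ doesn’t.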

When you have dates in a MySQL database, and use ActiveRecord, DATETIME columns come out of the table as either a Time or a DateTime object depending on what the expressed date is. Lovely. This bug indicates that dates before 1970 behave similarly (they can’t be expressed as unsigned epoch times either).

This is apparently desired behaviour.

Update: Apparently, DateTime objects are also much slower than Time objects.

Getting ActiveRecord objects by ID

I’m trying to speed up a Rails app here, and I’ve been making some assumptions that I’ve realised may not actually be true.

Specifically, if I have a list of IDs, I’ve been assuming that

list_of_ids.map{|id| Model.find(id) }

is going to be slower than

Model.find( list_of_ids )

Presumably, the latter will only make one SQL call to fetch all the objects, but the former will make a call per ID. I assume this matters because I’m used to Perl, where the ORMs are stupid, the language is fast, and the DB is always the bottleneck.

But the sort of SQL produced by the supposedly slower approach is much more cacheable. The supposedly faster approach will tend to generate a different SQL query every time, whereas a sufficiently smart cache layer could intercept the per-ID SQL calls of the slower approach and just hand back the models.
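To illustrate why the per-ID shape is so cache-friendly, here is a sketch with the database replaced by a plain block. `ByIdCache` is a hypothetical name, the block stands in for `Model.find(id)`, and there is no invalidation at all, so treat it as an illustration rather than something to deploy:

```ruby
# Sketch: a cache in front of per-ID lookups. Each ID is its own cache key,
# so overlapping lists of IDs share work.
class ByIdCache
  attr_reader :misses

  def initialize(&finder)
    @finder = finder   # stands in for Model.find(id)
    @store = {}
    @misses = 0
  end

  def find(id)
    @store.fetch(id) do
      @misses += 1
      @store[id] = @finder.call(id)
    end
  end

  def find_all(ids)
    ids.map { |id| find(id) }
  end
end

rows = { 1 => "one", 2 => "two", 3 => "three" }
cache = ByIdCache.new { |id| rows.fetch(id) }
cache.find_all([3, 1])      # two lookups hit the "database"
cache.find_all([1, 2, 3])   # only ID 2 is new
```

A cache keyed on the whole list of IDs would have treated `[3, 1]` and `[1, 2, 3]` as two unrelated queries.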

Initial benchmarking seems to have the one-SQL-call approach faster anyway. It does turn out to have a disadvantage, though – Model.find( ids ) doesn’t return objects in the same order that the IDs were in, whereas the map approach does. That’s fairly easy to fix, though:

class ActiveRecord::Base
  class << self
    # Return all instances with the passed ids, in the order that the ids are in.
    def find_in_order(ids)
      find(ids).sort_by { |object| ids.index(object.id) }
    end
  end
end
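One thing to watch: `ids.index` scans the list for every object, which is fine for a handful of IDs and less fine for thousands. A sketch of the same reordering using a hash lookup, written against plain structs so it runs without Rails; `Record` and `in_id_order` are hypothetical names:

```ruby
# Stand-in for an ActiveRecord model instance.
Record = Struct.new(:id, :name)

# Reorder `records` to match `ids`, using one hash lookup per ID instead of
# scanning the ID list for every record. IDs with no record are skipped.
def in_id_order(records, ids)
  by_id = records.each_with_object({}) { |record, index| index[record.id] = record }
  ids.map { |id| by_id[id] }.compact
end

# Pretend the database handed these back in primary key order.
fetched = [Record.new(1, "a"), Record.new(2, "b"), Record.new(3, "c")]
ordered = in_id_order(fetched, [3, 1, 2])
ordered.map(&:name)   # names now follow the order of the requested IDs
```

Unlike the `sort_by` version, this returns a record once for every time its ID appears in the list, which may or may not be what you want.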