Hosting toy Rails and Django apps using Passenger

I like writing small self-contained applications, and I like writing then using nice high-level application frameworks like Django and Rails. Alas, I also like being able to run these services for the foreseeable future, and that’s a lot harder than writing them is. Running a single Rails or Django application consumes an appreciable chunk of the memory on my tiny colo, and I currently have about 5 projects I really want running all the time (this could easily grow to 50 if I had a sufficiently good way of hosting them). Ideally, I’d never stop hosting these things. Otherwise what’s the point?

It’s sometimes tempting to just write all my toys in PHP. I’m certain that PHP has the mind-share that it does primarily because it’s so incredibly easy to deploy. Ease of development is utterly trumped by ease of deployment for anything not written for internal use only for a large company. tar is easier to use than mongrel, so there are more deployed PHP apps than Rails apps. But I’m not that desperate. I like my nice frameworks.

I tried Heroku as an external host for my apps for a bit, and it’s great. Very easy to start things, very easy to leave them up, and the free hosting plan is perfectly adequate for your average web application. Alas, there are a couple of raw edges that only really became apparent after using them for a few weeks. Firstly, they want to charge me for using custom domains, and I’m not willing to park my apps on domains that don’t belong to me. Secondly, their service goes through odd periods of 500 errors. This doesn’t bother me – what does bother me is that there is no official reaction to any of the complaints about it on what seems to be the official mailing list. Finally, quite a lot of the things I do need cron scripts, for polling services, etc, and the heroku crons (a) aren’t very reliable that I’ve found, and (b) cost money. So I’m edging away from them recently. Would still recommend them for prototyping, not sure I’d want to host anything Real there just yet.

(An aside – I’m not unwilling to pay any money at all. I will happily pay money for things that matter. But these apps are toys. The average number of users they have is ‘1’. I’m not willing to pay a fiver a month per application to be able to host them on my domain rather than Heroku’s domain. A fiver a month for all of them at once? Sure. But the Heroku payment model assumes that you have a small number of apps that you care about, rather than a large number of apps that you don’t.)

Anyway, my current attempt at solving this problem is Phusion Passenger (via mattb), which does exactly what I want, for Rails apps. It’s an Apache 2 or nginx module, and it’s trivially easy to install, unless you’re using Debian, which I am. Short verison? It was a lot easier to totally ignore the debian packaging system except to install ruby, then build rubygems and everything else I needed from source. Sigh. I understand there are horrible philosophical differences underlying this pain. But it’s still pain.

Once installed, you can just point your domain’s DocumentRoot at a Rails app’s ‘public’ folder, and the Right Thing happens – files in public are served directly, other requests will cause a rails process to be started, and serve your app. Enough idle time, and it’ll shut down again. Magic. My favourite part is that it’ll start up the application server as the user who owns the ‘environment.rb’ file of your application, meaning that your app is running as your user, and can do things like write files into temp folders that don’t need sudo to be able to delete again.

Not all of my projects are Rails apps, though. jerakeen.org is a Django app, for instance (this week, anyway). Unexpectedly, it turns out that Passenger will do the same thing for Django apps, though it’s not as well documented. I have a file called passenger_wsgi.py in the root folder of my Django application folder. It looks something like this (if you use this, you’ll need to change the settings module name):

import sys, os
current_dir = os.path.dirname( os.path.abspath( __file__ ) )
sys.path.append( current_dir )
os.environ['DJANGO_SETTINGS_MODULE'] = 'mydjango.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

And in my Apache config file, I have this:

<VirtualHost *:80>
  ServerName jerakeen.org
  ...
  DocumentRoot /home/tomi/web/jerakeen.org
  PassengerAppRoot /home/tomi/svn/Projects/mydjango

and thus are all my toy projects now brought up and down on demand. I’m happy again. Till next week, probably. ONWARDS.

Learning Ruby using Rails

Recently I got dropped in the deep end and had to learn both Ruby and Rails very quickly. I didn’t think this would be a problem – everyone raves about how easy Rails is to pick up, right? – and it wasn’t. The problem is actually arriving now, as I start trying to use Ruby for things other than Rails applications. And I can’t, because I’ve learned all sorts of nice Ruby tricks that looked like they were core language features but actually turn out to be added to the built-in Ruby objects by Rails.

For instance, I really like the 3.days convention for turning numbers into time intervals. That’s added by this extension. In fact, in digging for this, I found out just how many things Rails adds to core Ruby. I’m scared.

I’m torn. I’d like to consider messing with the built in objects confusing and dangerous. And I’ve been bitten by this before. I’ve also had problems where one module’s patching to a Ruby builtin interferes with another module’s patching of the same object. Lovely.

At the same time, though, I love it. I love both the huge convenience and readability of being able to write Time.now + 3.days, and the fact that the language lets me do this. All languages should be this consistent – none of this ‘some types are special’ crap.

There are trade-offs. I love Python, but I hate that map is a global function and not a method on arrays, and I hate that certain types are special and immutable. But I’m sure there are scary speed benefits from doing things this way.

I wonder if part of the reason that Ruby has this ‘just for Rails’ reputation is because, having learned Ruby for Rails, you can’t use that Ruby for anything else without unlearning a stack of habits?

Irritating 2038 Ruby / Rails behaviour

In a rails console:

>> (Time.now + 30.years).class
=> Time
>> (Time.now + 31.years).class
=> DateTime
>> (Time.now + 30.years).to_s
=> "Sat Oct 17 17:27:11 +0100 2037"
>> (Time.now + 31.years).to_s
=> "2038-10-17T17:27:13+01:00"

Time objects after about August 2028 can’t be expressed internally as ‘UNIX epoch (1st January 1970) plus N seconds’ where N is a 32 bit integer. Once a Time object tries to express a date after this point, it gets silently converted to a DateTime object, which presumably uses a different internal representation. It also stringifies differently, has different methods, and is just generally annoying behaviour.

When you have dates in a MySQL database, and use ActiveRecord, DATETIME columns come out of the table as either a Time or a DateTime object depending on what the expressed date is. Lovely. This bug indicates that dates before 1970 behave similarly (they can’t be expressed as unsigned epoch times either).

This is apparently desired behaviour.

Update: Apparently, DateTime objects are also much slower than Date objects.

Getting ActiveRecord objects by ID

I’m trying to speed up a rails app here, and I’ve been making some assumptions that I’ve realised may not actually be true.

Specifically, if I have a list of IDs, I’ve been assuming that

list_of_ids.map{|id| Model.find(id) }

is going to be slower than

Model.find( list_of_ids )

Presumably, the latter will only make one SQL call to fetch all the objects, but the former will make a call per ID. This is because I’m used to perl, where the ORMs are stupid, the language is fast, and the DB is always the bottleneck.

But the sort of SQL produced by the supposedly slower approach is much more cacheable. The supposedly faster approach will tend to generate a different SQL query every time, whereas a sufficiently smart cache layer could intercept the SQL calls of the later approach and just hand back the models.

Initial benchmarking seems to have the one-SQL-call approach faster anyway. It does turn out to have a disadvantage, though – Model.find( ids ) doesn’t return objects in the same order that the IDs were in, whereas the map approach does. That’s fairly easy to fix, though:

class ActiveRecord::Base
  class << self
    def find_in_order( ids )
      # return all instances with the passed ids, in the order that the ids are in
      objects = self.find( ids )
      objects = objects.sort_by{|o|ids.index(o.id)}
      return objects
    end
  end
end