I bake a lot of my site – everything except the blog, in fact, which is hosted on Tumblr. I do, however, also pull all the blog pages down and render them to another domain, as an emergency “Tumblr has died again” measure. I can repoint a DNS record and I’m entirely stand-alone. In theory.

Brent Simmons talks about how he bakes his blog, and it’s very similar to mine. However, he does seem to have a bit of cleverness in there that I’m jealous of:

I still get to write using MarsEdit, by the way. It talks to WEBrick running on my laptop.

I have an idea for this sketched out – clearly I have to finish implementing it. The obvious thing to do is to just expose the folder full of raw blog post sources via a metaweblog API. But it’s such a horrible protocol to write for. Not that it’s complicated. And I love XML-RPC. But it’s just grown so organically that there’s not really one authoritative source of “these are the methods you need”. You just need to keep adding extensions from various people till all your clients work.
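For a sense of the shape of the thing – and ignoring for a moment that I actually want PHP – here’s a hypothetical sketch using Python’s stdlib XML-RPC server. The posts/ directory, the post-id scheme and the complete absence of authentication are all assumptions for illustration, not a real implementation:

```python
import os
from xmlrpc.server import SimpleXMLRPCServer

POSTS_DIR = "posts"  # hypothetical folder of raw blog post sources


def get_recent_posts(blogid, username, password, count):
    """Return the `count` most recent post files as metaWeblog-style structs."""
    names = sorted(os.listdir(POSTS_DIR))[-count:]
    posts = []
    for name in names:
        with open(os.path.join(POSTS_DIR, name)) as f:
            # clients expect at least postid/description fields in each struct
            posts.append({"postid": name, "description": f.read()})
    return posts


def serve(port=8080):
    # register under the dotted method name a client like MarsEdit would call
    server = SimpleXMLRPCServer(("localhost", port))
    server.register_function(get_recent_posts, "metaWeblog.getRecentPosts")
    server.serve_forever()
```

A real server would also need metaWeblog.newPost, editPost, getPost and probably half the Blogger and MovableType extensions before every client stopped complaining – which is rather the point of the complaint above.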

Also, I want to make life difficult for myself and write it in PHP, so it’s easy to deploy.

Deploying sites with Apache, Jekyll, Sass and Git

After the huge rambling thing I did last time, let’s see if I can be a little more focussed. This is how I deploy the static bits of my site.

I keep the source files in git – I have a copy on github for convenient pointing/examples. I build the site with Jekyll, which uses _config.yml as its config file. This file is blank. This is because I like the defaults. In accordance with Jekyll defaults, the layout templates for the site are in _layouts.

In this directory, I run jekyll --auto . /tmp/jekyll, which processes the whole directory tree into /tmp/jekyll (just to keep it out of the way) – if a file starts with a YAML header it’ll process it as a Jekyll source file, otherwise it’ll just copy it straight across. The process also stays running (that’s the --auto), watching for changes. This way I can edit source files and see changes instantly.
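For reference, the YAML header that marks a file as a Jekyll source is just a pair of triple-dashed lines at the top; a minimal page (with a hypothetical layout name) looks like:

```
---
layout: default
title: An example page
---
<p>Anything below the header gets run through the layout template;
files without a header are copied across untouched.</p>
```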

I build several domains-worth of pages at once using this tree. tom corresponds to http://tom.movieos.org and toys to http://toys.movieos.org. To preview this on my local computer, I have a few entries in /etc/hosts pointing those names at the loopback address:

127.0.0.1 tom.movieos.local
127.0.0.1 toys.movieos.local

and a local apache config file containing:

<Directory "/tmp/jekyll/">
    Options Indexes MultiViews FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

<VirtualHost *:80>
  ServerName tom.movieos.local
  DocumentRoot /tmp/jekyll/tom
</VirtualHost>

<VirtualHost *:80>
  ServerName toys.movieos.local
  DocumentRoot /tmp/jekyll/toys
</VirtualHost>

This way, I can open http://tom.movieos.local in my browser, and see the local processed files. Useful!

Just to keep things interesting, I also don’t write CSS – I like to write sass. Now, there’s a branch of Jekyll that knows about sass, but I can’t get it to work – instead I use compass. config.rb in my site root is the compass configuration file. I run compass -w in the root of the repository and, like Jekyll, it will run and watch for changes, rebuilding the CSS files from the .sass files when I change them. Then Jekyll will see that they have changed and update the built site. More overhead, but it works very well.
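For completeness, the compass configuration can be tiny. Something along these lines would do – the directory names here are assumptions for illustration, not my actual layout:

```ruby
# Hypothetical minimal config.rb for compass: compile .sass files
# from css/ back into css/, so Jekyll copies the generated CSS
# into the built site alongside everything else.
sass_dir = "css"
css_dir = "css"
output_style = :compact
line_comments = false
```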

When I’m happy, or whenever I’ve made some interesting changes, obviously I check things into the local git repository. Then I push the changes to the bare repository that lives on my colo. In that repository, I have a post-update hook:

unset GIT_DIR \
  && cd /home/tomi/web/movieos \
  && git pull \
  && compass \
  && jekyll . /home/tomi/web/movieos_generated

So. I unset GIT_DIR because I’m about to talk to a different repository and don’t want to confuse things. Then I go to the directory where I have the source files checked out, and git pull to get them up to date. A single run of compass generates the CSS files in that directory, then I run jekyll to build the generated site.

Finally, Apache serves them. Rather than hard-code all my domains as I have locally, I’m more complicated on the server. I have a wildcard DNS entry that points *.movieos.org to the server. My apache config looks like this:

<VirtualHost *:80>
  # This virtualhost matches _anything_ in the movieos.org domain
  ServerName wildcard.movieos.org
  ServerAlias *.movieos.org
  ServerAdmin tom@jerakeen.org

  # not used
  DocumentRoot /home/tomi/web/movieos_generated/tom
  UseCanonicalName Off

  # match the first atom of the requested hostname, and serve files
  # out of the directory with that name
  RewriteEngine on
  RewriteCond %{HTTP_HOST} .
  RewriteCond %{HTTP_HOST} ^([^.]+)\.movieos\.org [NC]
  RewriteRule ^(.*) /home/tomi/web/movieos_generated/%1$1 [L]

  DefaultType text/plain
  DirectoryIndex index.php index.html
</VirtualHost>

This way, http://tom.movieos.org/ is served by files in /home/tomi/web/movieos_generated/tom, and likewise for all my other subdomains.

So that’s it. A post-update hook runs two sorts of processors over the source, building it into a directory tree that’s served using Apache rewrite rules. Complicated and maybe fragile, but only at deploy time – because they’re static files, once they’re there, they stay there.

Tools for the management of content

For movieos, I’m going to try to avoid building anything like the terrifying custom CMS that powered jerakeen.org, because, although writing web apps is what I do, I’ve always felt terribly hampered by the power of the thing. Once you have a custom CMS, it’s so easy to add more features to it. Every feature you add is probably something that you can’t get anywhere else. Every time I re-wrote the thing that powered jerakeen.org, I had to reimplement more features. Every time I tried to get out of running it myself and host it on a wordpress or something, I had to give up because I couldn’t get the new thing to respect some weirdness I’d done with permalinks five years ago, or it wouldn’t handle both my blog and my code pages from the same templates, so it would be too much trouble to manage.

The custom code is also a pain to run. PHP isn’t my thing, so I tend to write personal projects in Ruby or Python, depending on whim, and that doesn’t make them easy to deploy. I have to run dedicated daemons and reverse proxy through to them from apaches, which would be fair enough for some real application somewhere, but there’s not a lot of memory in this colo box. I’d rather not allocate half of it to mongrel processes. PHP looks ugly, but I’m going to learn it one of these days just because for projects of a certain size, ability to ship easily trumps just about every other consideration.

Hence Tumblr for blogging this time. I’m attracted to it for the same reason I suspect many of my friends are – it’s very easy to put content into it, because of the first-class support it has for different post types, and it’s very hard to customise it to any significant degree. You get to pick a template, and you get to decide if your posts get permalinks with words in, or permalinks with just numbers. End of story. Simple URLs mean that if I need to leave it at some point, it’ll be easy, though I may have dug myself into a bit of a hole with the domain name.

I can’t keep everything in Tumblr, though. There’s a few other pages that I want to be able to manage. In the interests of starting with the simplest thing I can make work, these are currently just HTML files served out of an apache, and I’m going to add intelligence to them using JavaScript rather than server-side stuff whenever possible. That being said, it’s nice to have something in the way of a templating engine for them, so I looked at a few modern page baking solutions, after a friend mentioned webby to me as something I should look at: webgen, jekyll, nanoc and webby.

In the end, I went for jekyll. Not for any particular reason, even, it just worked. I check all my raw pages into git, and a post-update hook on the colo updates the web checkout and runs compass and jekyll across the raw files, generating my pretty HTML.

Apache can serve it just fine, it doesn’t rely on a database being up to work, and I can be confident that I can survive a slashdotting. It’s also extremely simple. Let’s see how long I can resist the urge to tinker with it…

All change

I’ve been using the nick ‘jerakeen’ for at least… hmmm… 13 years? A long time, anyway. It was a pretty good name, I think – it’s quite easy to spell, it’s a slightly-obscure and yet nerdy reference, and (crucially) it’s decently unique; it’s been pretty easy to maintain myself as the top google hit for the word.

But that was 10 years ago. For a while now I’ve considered this whole ‘handle’ thing quite childish. And there are other Jerakeens now and the mis-addressed twitters and delicious links are getting annoying, not to mention the fact that I’d quite like my work to be associated with me. It’s clearly time to do what all the other Serious People have done, and just use my actual name everywhere.

Thus, I’m retiring jerakeen as a nick/handle/whatever it’s called. In hindsight, using the same name for myself and my domain was a mistake, so I’m retiring this domain entirely as well. I’m renaming myself on all the services that will safely let me do so, creating new accounts on most services that don’t, and just putting up with it on the few services (hi flickr!) that I’m pretty much tied to.

So, the new me can be found as

I’m going to move my web output to the movieos.org domain, which I’ve had knocking around unused for a while now. It’s in a terribly rough state, but I don’t expect I’ll break it too much. This gets me a new email address as well, which will probably be the most

This site will obviously stay around. Permalinks are important. But I’m going to bake it out to flat files and retire the terrifying CMS that powers it. Likewise, I assume I’ll keep watching the old accounts for a few months in case anything still gets @jerakeen-ed to me. And I’m sure I’ll forget things. But insofar as you have to pick a line and say ‘this is when I’m changing my name’? This is it.

Syndicating GeoRSS

Maybe I talk to Aaron too much, but I’m obsessed with geodata this week. So today the crazy mess of code that is jerakeen.org understands lat/longs on pages, and presents little ‘has a location’ links next to the tags (I’ve tagged this one with ‘where I’m sitting’ as an example). When I syndicate flickr photos, I’m pulling any geotagging data across too, so you’ll be able to see lots of geotagged photos as well.

In the same vein, I’ve added georss to the Dopplr journal feeds. Items are tagged with the lat/long of the trip or city that they’re about. Trip items get their location from the city the trip is to. Not every item will have a geotag, but everything that I can tie to a location will do.
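In the ‘simple’ GeoRSS encoding, that tagging amounts to one extra element per item – a hypothetical trip entry might carry something like this (with the georss namespace declared on the feed’s root element):

```xml
<item>
  <title>A trip to London</title>
  <!-- latitude then longitude, space-separated, per GeoRSS-Simple -->
  <georss:point>51.5074 -0.1278</georss:point>
</item>
```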

Feedparser (which I use to pull my Dopplr feed into jerakeen.org) needs a patch to be able to parse the GML properly (that patch doesn’t apply cleanly to the 4.1 release on code.google.com, but can be bullied into working pretty easily). So I also pull the lat/longs of my Dopplr updates into jerakeen.org.

I have no idea what I’m going to do with this data. It just seemed a shame to leave it lying around unexposed. I’ve put it into my RSS feeds as well (I love django), so the stream feed can be dropped into Google Maps for prettiness. And eventually I’ll make the ‘has a location’ links do something more interesting, I guess.