Tom Insam

It’s been pointed out that I could consider Twitter a microblogging platform, and that many people do. But (a) I prefer to keep Twitter an ambient sociability thing, and (b) the 140-character limit is just too annoyingly constraining.

No longer syndicating anything into the tumblr feed. Seems like a misuse of it, frankly. Let’s see if I can’t put real things here.

In a rails console:

>> (Time.now + 30.years).class
=> Time
>> (Time.now + 31.years).class
=> DateTime
>> (Time.now + 30.years).to_s
=> "Sat Oct 17 17:27:11 +0100 2037"
>> (Time.now + 31.years).to_s
=> "2038-10-17T17:27:13+01:00"

Time objects after 19 January 2038 can't be expressed internally as 'UNIX epoch (1st January 1970) plus N seconds' where N is a signed 32-bit integer. Once a Time object tries to express a date after this point, it gets silently converted to a DateTime object, which presumably uses a different internal representation. The DateTime also stringifies differently and has different methods, and the whole thing is just generally annoying behaviour.
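The cutoff is easy to check in plain Ruby. A minimal sketch, assuming a signed 32-bit count of seconds since the epoch and a Ruby whose Time is not itself limited to 32 bits; `MAX_INT32` and `fits_in_int32?` are names made up for illustration:

```ruby
# Largest value a signed 32-bit integer can hold.
MAX_INT32 = 2**31 - 1

# Hypothetical helper: does this Time fit in a signed 32-bit epoch counter?
def fits_in_int32?(time)
  time.to_i.between?(-(2**31), MAX_INT32)
end

last_moment = Time.at(MAX_INT32).utc
puts last_moment.year                 # 2038
puts fits_in_int32?(last_moment)      # true
puts fits_in_int32?(last_moment + 1)  # false
```

That lines up with the console session above: now plus 30 years still fits, now plus 31 years doesn't.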

When you have dates in a MySQL database, and use ActiveRecord, DATETIME columns come out of the table as either a Time or a DateTime object depending on what the expressed date is. Lovely. This bug indicates that dates before 1970 behave similarly (they can't be expressed as unsigned epoch times either).

This is apparently desired behaviour.

Update: Apparently, DateTime objects are also much slower than Time objects.

I like to run my own Jabber server, so that I can be contacted as tom@jerakeen.org. Also, I'm a sucker for punishment. I've run several different Jabber servers over the last year or so, and yesterday I started toying with ejabberd. It was probably the easiest to set up of any of the servers I've tried, and I recommend it.

I'm running Debian etch, and installing the daemon was a matter of:

sudo apt-get install ejabberd

Once installed, edit /etc/ejabberd/ejabberd.cfg. A '%' at the beginning of a line is a comment, and lines finish with a '.' character. This config file is read only once, and the settings are put into the ejabberd server database on startup. Unfortunately, that's probably already happened, so uncomment the override_acls. directive - this makes the server re-read the ACL settings from this file on next startup.

I'll assume that you own the 'example.com' domain and want the JID 'user@example.com'. Uncomment the line below '%% Admin user'. It wants to be something like

%% Admin user
{acl, admin, {user, "user", "example.com"}}.

Change the line below '%% Hostname' to set the hostname of the server:

%% Hostname
{hosts, ["example.com"]}.

You may want to look through the rest of the settings. But don't bother, they're all very boring. Now restart the server, to pick up the new settings:

sudo ejabberdctl restart

ejabberdctl can also register your admin / jabber user if you've turned off anonymous registration:

sudo ejabberdctl register user example.com <password>

Right, you're done. Assuming that the DNS A record for example.com resolves to the machine you've been playing with (it doesn't have to, see below), you now have a Jabber server with an admin user. You can visit http://example.com:5280/admin to administer your server, but there's not a huge amount to do there.

DNS SRV records

If the A record for example.com doesn't resolve to your server you can still run a server for example.com by pointing DNS SRV records to your server. In fact, you should do this anyway, in the same way that your email will arrive if the A record for your domain points to the mail server, but MX records are still a good idea.

Assuming your Jabber server runs on a machine called jabber.example.com, you'll want the following scary DNS records:

_xmpp-client._tcp 900 IN SRV 5 0 5222 jabber.example.com.
_xmpp-server._tcp 900 IN SRV 5 0 5269 jabber.example.com.
_jabber._tcp      900 IN SRV 5 0 5269 jabber.example.com.

You can check that they've been set properly using this excellent tool, but it'll probably take a while for the DNS updates to propagate. If you have the dig command line tool, you can also try

dig -t srv _xmpp-client._tcp.example.com

to ask your local DNS server for one of the SRV records.
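For reference, the numbers in each of those record lines are the TTL, then the SRV priority, weight and port. A rough Ruby sketch that pulls one of the lines above apart; `SrvRecord` and `parse_srv_line` are names made up for illustration, and it only handles the exact layout used here:

```ruby
# Hypothetical container for the fields of one SRV line.
SrvRecord = Struct.new(:name, :ttl, :priority, :weight, :port, :target)

# Parse a line laid out as: name TTL IN SRV priority weight port target
def parse_srv_line(line)
  name, ttl, klass, type, priority, weight, port, target = line.split
  unless klass == "IN" && type == "SRV"
    raise ArgumentError, "not an IN SRV record: #{line}"
  end
  SrvRecord.new(name, ttl.to_i, priority.to_i, weight.to_i, port.to_i, target)
end

record = parse_srv_line("_xmpp-client._tcp 900 IN SRV 5 0 5222 jabber.example.com.")
puts "#{record.name} -> #{record.target} port #{record.port}"
```

So clients get pointed at port 5222 on jabber.example.com, and other servers at port 5269.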

Alternatives

You don't have to use ejabberd. Viable alternatives are:

  • djabberd - lovely if you know Perl and want to extend/hack on a Jabber server. Unfortunately it's somewhat tricky to configure out of the box, isn't in Debian, and needs various things checked out from subversion repositories if you want to do esoteric things like preserve your friends roster across daemon restarts or have messages queued when you're offline.
  • jabberd - I really don't want to trust an internet server written in C any more. It was the original/first Jabber server, if this makes you approve of it more.
  • Not running your own Jabber server - Very worth considering. Unlike running your own mail server or web server, it's very hard to change your mind later and have someone else host it. I know of very few 3rd party Jabber hosting providers. Yet. Running your own server is purely a vanity thing, but hosting your own email domain used to be a vanity thing too. However, one company that will host your Jabber server for you is..
  • Google apps for your domain - One of the apps Google provide is a chat (Jabber) server. You can ignore everything else they do and just use the Jabber server part, assuming you have enough DNS access to your domain to point the SRV records to it.

I'm trying to speed up a rails app here, and I've been making some assumptions that I've realised may not actually be true.

Specifically, if I have a list of IDs, I've been assuming that

list_of_ids.map{|id| Model.find(id) }

is going to be slower than

Model.find( list_of_ids )

Presumably, the latter will only make one SQL call to fetch all the objects, but the former will make a call per ID. I assume this matters because I'm used to Perl, where the ORMs are stupid, the language is fast, and the DB is always the bottleneck.

But the sort of SQL produced by the supposedly slower approach is much more cacheable. The supposedly faster approach will tend to generate a different SQL query every time, whereas a sufficiently smart cache layer could intercept the SQL calls of the per-ID approach and just hand back the models.
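To make the caching argument concrete, here's a toy sketch in plain Ruby. Nothing in it is ActiveRecord: `FakeStore` stands in for the database and `IdCache` for a hypothetical cache layer, both made up for illustration. Per-ID lookups reuse cached rows, so only IDs that haven't been seen before reach the store:

```ruby
# Stand-in for the database; counts how many lookups reach it.
class FakeStore
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  def find(id)
    @queries += 1
    @rows.fetch(id)
  end
end

# Hypothetical cache layer keyed on a single id.
class IdCache
  def initialize(store)
    @store = store
    @cache = {}
  end

  def find(id)
    @cache.fetch(id) { @cache[id] = @store.find(id) }
  end
end

store = FakeStore.new({ 1 => "one", 2 => "two", 3 => "three" })
cache = IdCache.new(store)

[1, 2].map { |id| cache.find(id) }  # two misses
[2, 3].map { |id| cache.find(id) }  # 2 is a hit, only 3 is a miss
puts store.queries                  # 3
```

A cache keyed on the whole ID list would have treated [1, 2] and [2, 3] as two unrelated queries.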

Initial benchmarking seems to have the one-SQL-call approach faster anyway. It does turn out to have a disadvantage, though - Model.find( ids ) doesn't return objects in the same order that the IDs were in, whereas the map approach does. That's fairly easy to fix, though:

class ActiveRecord::Base
  class << self
    def find_in_order( ids )
      # return all instances with the passed ids, in the order that the ids are in
      objects = self.find( ids )
      objects = objects.sort_by{|o|ids.index(o.id)}
      return objects
    end
  end
end
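The same reordering can be tried outside Rails. A small sketch in plain Ruby, where `Record` and `in_id_order` are hypothetical stand-ins for a model and for the method above. It looks each ID up in a hash instead of calling `ids.index` for every object, which probably only matters for long lists:

```ruby
# Hypothetical stand-in for an ActiveRecord model instance.
Record = Struct.new(:id, :name)

# Return the records in the order given by ids.
def in_id_order(records, ids)
  by_id = {}
  records.each { |r| by_id[r.id] = r }
  ids.map { |id| by_id.fetch(id) }
end

unordered = [Record.new(1, "a"), Record.new(2, "b"), Record.new(3, "c")]
puts in_id_order(unordered, [3, 1, 2]).map(&:name).join(",")  # c,a,b
```

One difference from the sort_by version: if an ID appears twice in the list, this returns its record twice.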

URLs suck

jerakeen.org contains just about everything I produce, and I try to use sensible urls - every page has a unique url, of the form /category/slug/ or /category/yyyy/mm/dd/slug/. The index for a category is just /category/. Pages have tags, and you can see all pages belonging to a tag at /tags/tagname/ as well. You can search using /search?q=search_term. Finally, you can get a feed of any category at a /category-name/feed page.

Enough back-story. This system is just about sensible, but only because it's so simple, and even then there are odd bits. For instance, why category/feed? Why not feed/category? Why does the search page have a CGI parameter in it, but not the tag pages? Suppose I wanted an RSS feed of pages tagged with 'python', what would that URL look like? /feed/tag/python? /tag/python/feed/? Feed types are not sub-headings under categories. The single-page permalinks make sense, but why do they encode only the category, date and slug? Why not the tags?

Essentially, I want to describe several dimensions of filtering as well as a view type in a single URL, and I'm feeling constrained by the requirement to have a linear path. I want to describe lists of pages filtered by category, tags, search terms and dates, and I'd like to view this list of pages as an HTML file, or as RSS, or Atom, or JSON... Essentially, I'd like the innards of my site to be a pipeline - I perform searches, get a list of pages, sort them, then render the list. Each step has little to do with the other steps. I also shouldn't have to do anything special to my code to add an RSS feed to a tagged page list - it should have a feed automatically just because it's a list of pages.
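Roughly what that pipeline might look like, as a plain Ruby sketch. `Page`, `filter_pages`, `sort_pages` and `render` are all made up for illustration, as is the sample data, and none of it reflects the real site code. The point is only that filtering, sorting and rendering don't need to know about each other:

```ruby
require "json"
require "date"

# Hypothetical page record.
Page = Struct.new(:title, :category, :tags, :date)

# Step 1: filter by any combination of category and tag.
def filter_pages(pages, category: nil, tag: nil)
  pages.select do |page|
    (category.nil? || page.category == category) &&
      (tag.nil? || page.tags.include?(tag))
  end
end

# Step 2: newest first.
def sort_pages(pages)
  pages.sort_by(&:date).reverse
end

# Step 3: render the list; any list of pages can be given any view.
def render(pages, view)
  case view
  when "json"
    JSON.generate(pages.map { |p| { "title" => p.title, "date" => p.date.to_s } })
  when "text"
    pages.map { |p| "#{p.date} #{p.title}" }.join("\n")
  else
    raise ArgumentError, "unknown view: #{view}"
  end
end

# Invented sample data.
pages = [
  Page.new("First", "blog", ["web"], Date.new(2007, 11, 1)),
  Page.new("Second", "code", ["python"], Date.new(2007, 9, 1)),
  Page.new("Third", "blog", ["python"], Date.new(2007, 10, 1)),
]

puts render(sort_pages(filter_pages(pages, category: "blog", tag: "python")), "text")
```

Adding an Atom or RSS view would mean one more branch in `render`, and every filtered list would get it for free.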

At this point, something similar to ?category=blog&tag=python&view=atom makes a lot more sense as a sensible URL. It's actually describing intent properly. I could always just put each of those words into a normal URL path, but there's an implicit assumption that the ordering means something there, and it's not true. There are many ways of ordering parameters, of course, so the uniqueness of the URL is broken to a certain extent, but anyone trying to uniquely identify URLs really should be normalizing parameter order anyway.
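Normalizing parameter order is not much work. A minimal sketch using Ruby's standard uri library; `normalize_query` is a made-up name and example.com is just a placeholder host:

```ruby
require "uri"

# Hypothetical helper: sort query parameters so equivalent URLs compare equal.
def normalize_query(url)
  uri = URI.parse(url)
  return uri.to_s if uri.query.nil?

  pairs = URI.decode_www_form(uri.query).sort
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end

a = normalize_query("http://example.com/?tag=python&view=atom&category=blog")
b = normalize_query("http://example.com/?category=blog&tag=python&view=atom")
puts a == b  # true
```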

This leaves me with a few problems. Parameter-using URLs are certainly a lot uglier than path-based ones. Google seem to be ok with parameters in URLs now, but only up to a point, and I have three parameters on the trivial example above already. As an alternative, blech suggests something akin to Perl's hash interpolation - a URL like /category/blog/tag/python/view/html, which is an interesting idea, but still falls prey to the ordering problem. The ordering of the path atoms in a URL implies a strong hierarchy that doesn't exist here.
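The path-as-hash scheme is easy enough to parse. A sketch, assuming strictly alternating key and value segments; `path_to_params` is a name made up for illustration:

```ruby
# Hypothetical helper: read alternating path segments as key/value pairs.
def path_to_params(path)
  segments = path.split("/").reject(&:empty?)
  raise ArgumentError, "odd number of segments in #{path}" if segments.length.odd?

  segments.each_slice(2).to_h
end

one = path_to_params("/category/blog/tag/python/view/html")
two = path_to_params("/tag/python/view/html/category/blog")
puts one == two  # true
```

Two different paths, one meaning: the path looks like a hierarchy, but it's really just a hash.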

Essentially, URLs suck. They're not dimensional enough for my needs already, and this site is utterly trivial.