URLs suck

06 Aug 2007

jerakeen.org contains just about everything I produce, and I try to use sensible URLs - every page has a unique URL, of the form /category/slug/ or /category/yyyy/mm/dd/slug/. The index for a category is just /category/. Pages have tags, and you can see all pages belonging to a tag at /tags/tagname/ as well. You can search using /search?q=search_term. Finally, you can get a feed of any category at /category/feed.

Enough back-story. This system is just about sensible, but only because it's so simple, and even then there are odd bits. For instance, why category/feed? Why not feed/category? Why does the search page have a CGI parameter in it, but not the tag pages? Suppose I wanted an RSS feed of pages tagged with 'python', what would that URL look like? /feed/tag/python? /tag/python/feed/? Feed types are not sub-headings under categories. The single-page permalinks make sense, but why do they encode only the category, date and slug? Why not the tags?

Essentially, I want to describe several dimensions of filtering as well as a view type in a single URL, and I'm feeling constrained by the requirement to have a linear path. I want to describe lists of pages filtered by category, tags, search terms and dates, and I'd like to view this list of pages as an HTML file, or as RSS, or Atom, or JSON... What I'd really like is for the innards of my site to be a pipeline - I perform searches, get a list of pages, sort them, then render the list. Each step has little to do with the other steps. I also shouldn't have to do anything special to my code to add an RSS feed to a tagged page list - it should have a feed automatically just because it's a list of pages.
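As a rough sketch of that pipeline idea (this is not the site's actual code; the Page class, the function names and the two renderers are all made up for illustration), the filter, sort and render steps might be kept apart like this in Python:

```python
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Page:
    # Hypothetical page record; the real site's data model is not shown here.
    slug: str
    category: str
    published: date
    tags: set = field(default_factory=set)


def filter_pages(pages, category=None, tag=None):
    """Keep only pages matching every filter that was actually supplied."""
    result = []
    for page in pages:
        if category is not None and page.category != category:
            continue
        if tag is not None and tag not in page.tags:
            continue
        result.append(page)
    return result


def sort_pages(pages):
    """Newest first. Sorting knows nothing about filtering or rendering."""
    return sorted(pages, key=lambda p: p.published, reverse=True)


def render_html(pages):
    items = "".join(f"<li>{p.slug}</li>" for p in pages)
    return f"<ul>{items}</ul>"


def render_json(pages):
    return json.dumps([p.slug for p in pages])


# Any list of pages can be rendered by any view, so adding a view here
# makes it available to every filtered list at once.
RENDERERS = {"html": render_html, "json": render_json}


def handle(pages, category=None, tag=None, view="html"):
    """Run the whole pipeline: filter, then sort, then render."""
    return RENDERERS[view](sort_pages(filter_pages(pages, category, tag)))
```

The point of the sketch is only that a feed renderer added to RENDERERS would work for a tag list, a category list or a combination of both, without any of the filtering code knowing about it.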

At this point, something similar to ?category=blog&tag=python&view=atom makes a lot more sense as a URL. It's actually describing intent properly. I could always just put each of those words into a normal URL path, but there's an implicit assumption that the ordering means something there, and it's not true. There are many ways of ordering parameters, of course, so the uniqueness of the URL is broken to a certain extent, but anyone trying to uniquely identify URLs really should be normalizing parameter order anyway.
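Normalizing parameter order is cheap to do. A minimal sketch in Python, using only the standard library's urllib.parse (the function name normalize is my own, and sorting by key and then value is just one reasonable choice of canonical order):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Return the URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    # parse_qsl gives a list of (key, value) pairs; sorting it makes every
    # ordering of the same parameters collapse to one canonical string.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


a = normalize("/?category=blog&tag=python&view=atom")
b = normalize("/?view=atom&tag=python&category=blog")
print(a == b)
```

Both orderings come out as the same string, so anything keyed on the normalized form treats them as one resource.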

This leaves me with a few problems. Parameter-using URLs are certainly a lot uglier than path-based ones. Google seem to be ok with parameters in URLs now, but only up to a point, and I have three parameters on the trivial example above already. As an alternative, blech suggests something akin to Perl's hash interpolation - a URL like /category/blog/tag/python/view/html, which is an interesting idea, but still falls prey to the ordering problem. The ordering of the path atoms in a URL implies a strong hierarchy that doesn't exist here.
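To make the ordering problem concrete, here is a small sketch of how such a key/value path could be read back into a dictionary. It's in Python rather than Perl, and parse_pairs is a made-up helper, not anything blech or the site actually uses:

```python
def parse_pairs(path):
    """Read /key/value/key/value/... into a dict, like flattening a hash."""
    atoms = [atom for atom in path.split("/") if atom]
    if len(atoms) % 2:
        raise ValueError("path needs an even number of atoms: " + path)
    # Even positions are keys, odd positions are the values that follow them.
    return dict(zip(atoms[0::2], atoms[1::2]))


one = parse_pairs("/category/blog/tag/python/view/html")
two = parse_pairs("/view/html/tag/python/category/blog")
print(one == two)
```

The two paths parse to the same filters, yet they are different URLs, and nothing in the path says which one is the real one. The path looks hierarchical, but the data it carries is flat.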

Essentially, URLs suck. They're not dimensional enough for my needs already, and this site is utterly trivial.