
You really want your template language to automatically escape all strings unless they’re flagged as ‘I know this contains HTML and I know what I’m doing’. This stops many trivial forms of cross-site-scripting attacks.
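As a rough illustration of escape-by-default, here is a minimal sketch using only the Python standard library. The names `SafeHTML`, `escape` and `render` are made up for this example and are not any particular template engine's API; real engines do the same job with far more care.

```python
import html


class SafeHTML(str):
    """Marker type meaning 'I know this contains HTML and I know what I'm doing'."""


def escape(value):
    """Escape anything that has not been explicitly flagged as safe."""
    if isinstance(value, SafeHTML):
        return value
    return SafeHTML(html.escape(str(value), quote=True))


def render(template, **context):
    """Fill a str.format-style template, escaping every value on the way in."""
    return template.format(**{key: escape(val) for key, val in context.items()})


# Untrusted input is neutralised; flagged markup passes through untouched.
hostile = render("<p>{comment}</p>", comment="<script>alert(1)</script>")
trusted = render("<p>{comment}</p>", comment=SafeHTML("<b>hi</b>"))
```

The point of the design is that forgetting to think about escaping gives you the safe behaviour, and the unsafe behaviour has to be asked for by name.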

You probably also want certain columns of your database to be annotated in such a way that your CMS doesn’t accidentally display them to users.
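One way to picture the column annotation idea is sketched below, assuming a plain dataclass stands in for a database row. The `private` metadata key, the `User` class and the `displayable` helper are all hypothetical names for this example, not a feature of any CMS or ORM.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    name: str
    # Hypothetical annotation: anything marked private must never reach a template.
    email: str = field(metadata={"private": True})
    password_hash: str = field(metadata={"private": True})


def displayable(record):
    """Return only the fields that are not annotated as private."""
    return {
        f.name: getattr(record, f.name)
        for f in fields(record)
        if not f.metadata.get("private", False)
    }


user = User(name="tom", email="tom@example.org", password_hash="not-a-real-hash")
public_view = displayable(user)
```

Here the display code only ever sees `public_view`, so a new private column is hidden by its annotation rather than by remembering to update every page.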

the Blogger, Movable Type and MetaWeblog XMLRPC APIs

The first XMLRPC blog API was the Blogger API, which was very limited – you could create and edit very simple representations of posts. This was extended by both Movable Type and MetaWeblog, adding more complicated post types (you could set titles, keywords, categories, etc.) and the ability to get a list of posts available on the server. All of these APIs have their own namespaces, their own variable name prefixes, and varying support from clients. For example, each of the two extensions defines its own call for listing categories:

  • metaWeblog.getCategories ( MetaWeblog API )

    metaWeblog.getCategories (blogid, username, password) returns struct

    The struct returned contains one struct for each category,
    containing the following elements: description, htmlUrl and rssUrl.

  • mt.getCategoryList ( Movable Type API )

    Description: Returns a list of all categories defined in the weblog.

    Parameters: String blogid, String username, String password

    Return value: on success, an array of structs containing String
    categoryId and String categoryName; on failure, fault.

Because you don’t know which clients will be making which calls, you need to implement both. Further, the MetaWeblog API doesn’t support post excerpts or extended content, but the Movable Type API extends certain MetaWeblog calls with extra parameters to get and set these properties.

Essentially, if you have a blog that supports excerpts, keywords – anything other than posts with a title and content – then you need to implement all of the APIs. Then you find that some clients have arbitrarily decided that the blogid parameter must be an integer, despite the specs explicitly giving it a ‘String’ type.

It all works in the end, though.
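To make the duplication concrete, here is a minimal sketch of both category calls registered on one dispatcher from Python's standard `xmlrpc.server` module. The category data, `BASE_URL`, the fault code and the helper names are placeholders invented for this example, authentication is left out, and keying the MetaWeblog struct by category name is an assumption. Only the method names and return shapes come from the specs quoted above.

```python
from xmlrpc.client import Fault
from xmlrpc.server import SimpleXMLRPCDispatcher

# Hypothetical data standing in for a real category table.
CATEGORIES = {"1": "python", "2": "perl"}
BLOG_IDS = {"1"}
BASE_URL = "https://example.org"  # placeholder, not a real endpoint


def normalise_blogid(blogid):
    """Some clients send blogid as an integer although the specs say String."""
    return str(blogid)


def check_blog(blogid):
    """Reject unknown blogs; username/password checks are omitted in this sketch."""
    blogid = normalise_blogid(blogid)
    if blogid not in BLOG_IDS:
        raise Fault(404, "unknown blog %s" % blogid)
    return blogid


def metaweblog_get_categories(blogid, username, password):
    """MetaWeblog flavour: a struct containing one struct per category."""
    check_blog(blogid)
    return {
        name: {
            "description": name,
            "htmlUrl": "%s/tags/%s" % (BASE_URL, name),
            "rssUrl": "%s/tags/%s/rss" % (BASE_URL, name),
        }
        for name in CATEGORIES.values()
    }


def mt_get_category_list(blogid, username, password):
    """Movable Type flavour: an array of structs with categoryId and categoryName."""
    check_blog(blogid)
    return [
        {"categoryId": cid, "categoryName": name}
        for cid, name in sorted(CATEGORIES.items())
    ]


def build_dispatcher():
    """Register the same data under both API namespaces."""
    dispatcher = SimpleXMLRPCDispatcher(allow_none=False, encoding=None)
    dispatcher.register_function(metaweblog_get_categories, "metaWeblog.getCategories")
    dispatcher.register_function(mt_get_category_list, "mt.getCategoryList")
    return dispatcher
```

The same category table ends up serialised twice in two different shapes, which is most of the work of supporting both sets of clients.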

This post has been spillover from a post on the Zimki blog about implementing the MetaWeblog API on Zimki. As the Zimki blog has a very simple post structure at the moment (just titles and content right now, although annoyingly we use a markup engine that no GUI client understands) I could get away with a nice basic implementation of just the core MetaWeblog API, and so a rant about the complexity of the APIs didn’t really seem to fit there. Having written it, though, I felt it should go somewhere, and I did run into all these issues writing the jerakeen.org MetaWeblog server.

using python to access subversion repositories

I’m experimenting with a simple source code browser for jerakeen.org. Right now it’s trivial – just a list of folders and links to files, but what I’m aiming for is pages showing the check-in history of various folders, when they were last changed, etc – essentially, the sort of boring stuff I’d get for free were I to use svnweb or trac or something.

As usual, though, that’s not the point. I’d hate to have a web site that consisted of several different apps, written in different languages, needing hundreds of different apache modules, and all looking different – or needing different templates if I wanted to give them similar appearances. I’m not very good at design and building templates, so as a crazy insane developer, it’s easier for me to write a subversion browser than it would be to bully trac into looking the way I want it.

So, the pysvn bindings – Python bindings to the subversion client library. They’re lovely.

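import pysvn

# The client talks directly to the remote repository URL; no local checkout is needed.
client = pysvn.Client()
projects = client.ls("https://jerakeen.org/svn/tomi/Projects")
for project in projects:
    # The parenthesised form of print works under both Python 2 and Python 3.
    print(" * %s" % project.name)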

The logic behind the pages under /source isn’t much more complex than that. There’s no caching, I don’t have to have a local checkout, and it’s easily fast enough for a little website like this. The (fairly sparse) docs don’t make it sufficiently clear, to my mind, that you can point the client at a remote repository instead of a local checkout, but you can.

Another trick (hack) I use is providing a ‘short name’ method to the directory entry objects. I pass the objects returned from the ls call directly to the django template, but you’re not allowed to do anything clever in template space (the templates are touched by those designer people – can’t trust ‘em). To make it easier to print a human-readable name for the entries, I poke a short method into their namespace:

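from pysvn import PysvnDirent

def short_name(self):
    # Keep only what follows the last '/' in the entry's full URL.
    offset = self.name.rfind('/') + 1
    return self.name[offset:]

# Attach the function to the class so every entry returned by ls() gains .short
PysvnDirent.short = short_name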

Then the template needs a simple

<h2>files here</h2>
{% for file in files %}
  <p><a href="{{ file.name }}">{{ file.short }}</a></p>
{% endfor %}

Evil. I’m clearly still too much of a perl programmer…

Adding a metaweblog interface to django

I tend to reimplement the CMS that drives jerakeen.org more often than I add content to it, but the current Django based incarnation seems to have decent sticking power. A lot of this is Django’s magic admin interface middleware. When I add, say, a tagging engine to the site, I only need to worry about the object model and presenting it on the site itself. All the boring and much harder to write admin pages to add and remove tags just write themselves. But the other reason I’m staying with it is that I’ve now added so many features to it (because it’s easy!) that a re-write in another language would be a huge amount of effort.

This weekend, for instance, I’ve added an implementation of the metaweblog API to the site, using the excellent code on allyourpixel as a base. The main source of pain is the persistent weirdness of implementing the Movable Type extensions to the metaweblog extensions to the Blogger XMLRPC API. How can you call something a metaweblog API and not allow for post excerpts, for instance? So annoying.

editing jerakeen.org using ecto

While implementing it, I found the TextPattern API reference to be far more useful than the official spec, mostly because it covers everything up to the Movable Type extensions, which you need if you want to edit page excerpts. The other problem I encountered was that Ecto won’t talk to an endpoint over HTTPS with a self-signed certificate unless the SSL cert is in the local machine’s X.509 certificate store. The way it fails is incredibly unhelpful and annoying, too. The simplest way to fix it (assuming a recent macOS) is to visit the endpoint in Safari. It’ll complain about the certificate – click the ‘always trust this site’ box, and it’ll stop.

the movable type import format

In a previous life, I was trying to import content from a Movable Type blog into Hayfever. Then I wanted to write an importer from Hayfever into WordPress. And wow, the MT import format is nasty. Things that have annoyed me, in no particular order:

  • There are no charset considerations in the spec. I care deeply about explicit charsets nowadays. I’m sure the implementation does something with them, but what?

  • The DATE atom is an annoying US date, with no timezone information.

  • The whole serialization format is just nasty. The WordPress importer, for instance, splits records on ‘--------\nAUTHOR:’, which is presumably much more reliable than splitting on the separator alone (in the case that there are lines of ‘-’ in the data), but is a fairly nasty assumption that bit me quite badly for my own importer.

  • The PING and COMMENT atoms seem to contain nested sets of atoms, but this isn’t indicated in any sort of general way – you just have to know that PING is special.

I’m just grouchy, I guess.

Blogging and Content Management

I’ve been toying with architectures for the Ultimate Content Management Application, a bit of vaporware that’s suffering from Second System Effect before I even come up with a coherent plan, and to do this I’ve been looking at content management systems. Well, ok, I’ve been trying to look at content management systems, because almost everything I find that a) calls itself a content management system and b) is free, is not a CMS, it’s a bad Slashdot clone that lets you take the dates off the entries. I’m sure there’s more to content management than there is to a blog, but I can’t find any evidence of it. Of course, there might be expensive things out there that do what I’m imagining, but nothing I can download. Which is no bloody good, I want to play with a Real App.

This is in weird contrast to this megnut post, which complains that most blogging engines are bad content management systems that let you display date-based lists of entries and call themselves blogs. Weird. So, are there any CMSes out there I can play with? Preferably written in perl…

Thanks to blech for the megnut link.