Facebook’s Open Compute Project

On a large scale like this — not a small open-source project by good-willed individuals — “opening” something is almost always an effort to commoditize it, leveling the playing field as much as possible and marginalizing competitive advantages that others might have had.

[..]

Nobody “opens” the parts of their business that make them money, maintain barriers to competitive entry, or otherwise provide significant competitive advantages. That’s why Android’s basic infrastructure is “open”, but all of Google’s important applications and services for it aren’t — Google doesn’t care about the platform and doesn’t want it to matter.

marco. It’s interesting that open-source is now considered a weapon against other companies. Though I guess it was always intended to be such a weapon…

Talking to Jekyll using MarsEdit

Brent Simmons wrote a plea for Baked Weblogs the other week. It resonated – I’m the sort of nerd who obsessively re-writes his blogging engine more often than he actually uses it to blog things with, so I’ve been through a lot of solutions, and I keep coming back to baking.

Anyway. Brent wrote another piece in which he mentions,

I still get to write using MarsEdit, by the way. It talks to WEBrick running on my laptop.

Now, currently I blog in Tumblr, but I have the next generation version all set up and ready to cut across to when I feel like it, and it’s based on Jekyll, like the rest of my site is. I want to be able to post to it from MarsEdit! How hard could it be to build something that will let me?

Actually, it turns out to be really annoying. I’m extremely unimpressed by the MetaWeblog API. HOWEVER, finally today I have a releasable / working version of jekyll-metaweblog, a stand-alone ruby webrick server that will expose a Jekyll source tree via MetaWeblog and let you post, edit, delete, upload images, etc, etc, from MarsEdit (and hopefully anything else that supports MetaWeblog).
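To show the shape of what MarsEdit actually sends, here’s a sketch in Python of marshalling a MetaWeblog `newPost` call using the stdlib `xmlrpc.client` module. The struct keys (`title`, `description`, `categories`) come from the MetaWeblog spec; the blog id and credentials are placeholders.

```python
import xmlrpc.client

# metaWeblog.newPost takes (blogid, username, password, struct, publish).
# The post struct below uses the field names from the MetaWeblog spec;
# "jekyll", "user" and "secret" are placeholder values.
post = {
    "title": "Hello from MarsEdit",
    "description": "<p>Post body as HTML.</p>",
    "categories": ["blog"],
}
payload = xmlrpc.client.dumps(
    ("jekyll", "user", "secret", post, True),
    methodname="metaWeblog.newPost",
)
print(payload)  # the XML that a MetaWeblog client POSTs to the server
```

A real client would just do `xmlrpc.client.ServerProxy(url).metaWeblog.newPost(...)`, where `url` is wherever the local webrick server is listening — the marshalling above is what travels over the wire.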

Get the code and play.

An aside – Talking about baking, Brent writes

[Aaron] also wrote that he doesn’t care about performance. If getting fireballed were a thing back in 2002, he might have cared about performance. If he had seen system X go down for a day, he might have cared about performance. It’s interesting that performance — or robustness — arguably wasn’t an issue in 2002, but it is now.

The Wikipedia page for ‘slashdot effect’ goes back to at least September 2001. Performance did matter. Aaron’s position is actually a lot closer to mine:

Honestly, I don’t care about performance. I don’t care about performance! I care about not having to maintain cranky AOLserver, Postgres and Oracle installs. I care about being able to back things up with scp. I care about not having to do any installation or configuration to move my site to a new server. I care about being platform and server independent. I care about full-featured HTTP implementations, including ETags, Content-Negotiation and If-Modified-Since. (And I know that nobody else will care about it enough to actually implement it in a frying solution.)

Baking has many problems, of course, but it has (for me) one huge overriding advantage – if I get bored of my codebase and want to build something else (this happens a lot), my blog doesn’t go away. It just stops getting new content. Much safer. It’s easy to build a dynamic site that’ll cope with being Fireballed and still host it on a single system. It’s hard to have to host 50 megs of mongrel process for the rest of time because you thought it would be a good idea to build some part of your site in Rails and now you can’t turn it off.

titles as metadata

Are titles on blog entries good things or not? I’m feeling an obligation to give things I write a title. But nowadays this is mostly because it forms a useful bit of text to use as a link target. Without a title, I have to excerpt the first few words of a piece, which always feels a little out of place. If I put a photo up here without a title, linking to it gets really hard.

Metadata is good, no denying it. But titles aren’t metadata in the same way that, say, geolocation on a photo is. You are at a location when you take a photo – not writing the geodata down at the time doesn’t mean it wasn’t there, it just means you’re using a bad camera. But a title isn’t an inherent truth of a blog entry, it’s a thing I have to add.

REST

There have been a couple of things I’ve been linked to recently about how some APIs claim that they are RESTian, but aren’t really – Gareth and Jens. To my mind, there’s not a lot of clarity here. So.

Things that make your web service RESTian:

  • Resources are represented by URLs.
  • Resources can be cached according to their caching headers.
  • You perform operations on those things using HTTP verbs (GET, DELETE, PUT, etc).
  • You discover other resource URLs by examining other resources – for instance GETting the /books/ resource might return a document that contains the URLs of all the books.
  • You can request a representation of a resource in different formats using the Accept header.
  • Your API is stateless (presumably allowing for things like rate-limiting, of course).
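The discovery point is the one people trip over, so here’s a minimal sketch. The JSON shape is hypothetical — the point is only that the client collects URLs out of the representation rather than constructing them:

```python
import json

# Hypothetical response body from GETting the /books/ resource.
# The client never builds /books/{id} itself; it follows the
# "href" links that the server handed back.
books_response = json.loads("""
{
  "books": [
    {"title": "REST in Practice", "href": "http://api.example.com/books/17"},
    {"title": "HTTP: The Definitive Guide", "href": "http://api.example.com/books/42"}
  ]
}
""")

# Discovery = harvesting links from the representation, not constructing them.
book_urls = [book["href"] for book in books_response["books"]]
print(book_urls)
```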

Things that mean your API is NOT RESTian:

  • Your documentation describes how to construct URLs based on object IDs (/books/{id}) and is the only way of finding these URLs.
  • Your API has a single endpoint and you pass the method as another parameter.
  • HTTP verb use is restricted to just POST, or POST and GET.

Things that have no bearing on the RESTiness of your API:

  • Your APIs look meaningful (like /books/{id}).
  • You return JSON. Or XML. Or anything in particular.
  • You use the word REST in the documentation a lot.

Things that aren’t magically true just because your API is RESTian:

  • Writing clients is easy.
  • You don’t even need a client, you can just derive everything from a single endpoint.
  • Your API maps properly onto your business objects and therefore makes sense.
  • Your API will scale properly.
  • Your API is easy to extend.
  • You won’t get support requests from people who didn’t read the documentation.

Hopefully this helps things a little.

If you disagree with anything above, mail me, I’m genuinely interested if I’ve misunderstood something here.

I also have my own opinions on REST vs non-REST. But I’m trying to be factual here.

Opinions on REST

I wrote a thing about what is and isn’t REST. That was (what I believe are) the facts. This bit is opinion.

Things that actually matter:

  • You’ve shipped something.
  • Your API has client libraries, even if they’re trivial wrappers (because people will complain otherwise, or write bad ones).

REST is a nice dream. But I’m not personally a fan. RPC is how programmers’ minds work, is the problem. Call method, receive bacon.

Here are some top-of-the-head problems with a pure REST API:

  • Result pagination – suppose /books/ returns 1 million books? How do I know how many there are? Maybe it returns a list of pages? Suppose there are a million pages? Does it merely return the first page and a link to the next one? How do I get page 100? Stack Overflow mentions the Range header, but I can’t just magically use this as a client – the documentation will have to tell me what values are valid for this header. (Does the Range header even go in the OAuth signature? It seems not. Is this a security problem?)

  • How do I even find the links in the first place? Look for any string in the returned data structure that matches ^http://? Presumably the response format needs to be documented, and responses are going to differ based on the sort of object I’m requesting, so in practice, you’re going to have documentation that says ‘responses for URLs under /books/ look like this’, and your URL structure is exposed again.

  • Am I allowed to make verbs up? Suppose I want to be able to flag a particular resource as offensive. Do I POST to /books/4/offensive (and how do I discover that URL?), or POST offensive=1 to /books/4 (but I’m not changing the state of a book, so that doesn’t seem right either)?

These aren’t big problems. They have obvious solutions, even. But the solutions aren’t covered by just saying ‘It’s REST’ – you need to document them.
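Take the pagination one. The Range-header answer works, but the client still needs documentation to use it – a sketch, assuming a hypothetical “items” range unit and a server that replies with a matching Content-Range header:

```python
# Sketch of Range-header pagination, assuming a hypothetical "items"
# range unit. The server's reply would carry a header like
# Content-Range: items 0-49/1000000, and parsing it tells you what to
# ask for next -- none of which a client can know without documentation.

def next_range(content_range, page_size=50):
    """Given 'items start-end/total', return the Range header value for
    the next page, or None when everything has been fetched."""
    unit, _, spec = content_range.partition(" ")
    span, _, total = spec.partition("/")
    start, _, end = span.partition("-")
    next_start = int(end) + 1
    if next_start >= int(total):
        return None
    next_end = min(next_start + page_size - 1, int(total) - 1)
    return f"{unit}={next_start}-{next_end}"

print(next_range("items 0-49/1000000"))  # asks for the next 50 books
print(next_range("items 950-999/1000"))  # no more pages
```

Note that the range unit, the page size, and the Content-Range format all had to be stated up front – which is exactly the documentation that “it’s REST” was supposed to make unnecessary.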

As a new developer, the main thing a purely REST API is going to do differently is that, when I want to do pagination, the docs won’t say ‘add a page parameter’, they’ll start talking about Range headers. When I want to get JSON back instead of XML, I’ll have to look at Accept headers. When performing operations on objects, I’ll need to work out which verb is appropriate. I’ll need to store complete URLs to your objects in my database rather than just IDs. (In practice, I’ll just reverse-engineer your ID-to-URL mapping and store the IDs, then complain when you change something and my code breaks.)

Or maybe the developer will be using a high-level client that hides all of this, in which case there is no difference, except that your client libraries have to be a lot more complicated. But you’ve also hidden most of the benefits.

In practice, you will still end up documenting all of your different resource types, what URLs you can find them under, how to parse their representations, and how to find other objects based on those representations. You’ll just have to document them using lots of highly-specific HTTP language, which means your documentation is going to be much harder to use for all the people using clients that hide all this stuff.

The best flickr clients (to my mind) are the ones with a call_flickr( method, params ) function, that takes one of the Flickr API methods, adds the parameters to it, does the authentication signing dance, and returns you the response, parsed into some in-memory data structure. To do this requires some knowledge of the Flickr API, sure, and I can’t just point this client to some other web service’s endpoint and get data out. But REST doesn’t solve this problem either, and I believe it makes the simple things harder.
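The shape of that kind of client is tiny, which is the appeal. A sketch of the request-building half, using the md5-of-sorted-params signing that Flickr’s old `api_sig` scheme used – treat the signing details as illustrative, not authoritative:

```python
import hashlib
import urllib.parse

API_ENDPOINT = "https://api.flickr.com/services/rest/"

def build_call(method, params, api_key="key", secret="secret"):
    """Build the signed request URL for a call_flickr-style client.
    The signature scheme (md5 of the secret plus sorted key/value
    pairs) mirrors Flickr's old api_sig signing as a sketch; check
    the current Flickr docs before relying on it."""
    args = {"method": method, "api_key": api_key, "format": "json", **params}
    raw = secret + "".join(k + str(args[k]) for k in sorted(args))
    args["api_sig"] = hashlib.md5(raw.encode()).hexdigest()
    return API_ENDPOINT + "?" + urllib.parse.urlencode(args)

url = build_call("flickr.photos.getInfo", {"photo_id": "1234"})
print(url)
```

Everything past this point is ‘fetch the URL, parse the response into a data structure, hand it back’ – one function, every method, the same way.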

Every response should contain hyperlinks to other resources, where the structure/metadata around it indicates what’s on the other end. It should still be self-describing even if the URL is completely opaque — “nice URLs” do not make you RESTful.

You should never need an “endpoint” or any documentation at all, whether machine or human-readable. I should be able to explore your whole API just using HTTP. Ideally you’d use the exact same URLs for your API and browser-human interfaces, using the Accept: header to get different representations of the response.

Almost everybody fucks this up, and just uses ‘meaningful’ URLs that you’re expected to build yourself, and lots of sassy human documentation to tell you what the constants are.

Hacker News | True REST abhors the idea of separate “standardized, machine-readable documentat….

Total rubbish, of course. The only actual ‘proper’ REST interface anyone I talk to can come up with is AtomPub, and that’s awful. Me, I like XMLRPC. Forget the ‘XML’ part – it’s RPC. RPC that is very very simple and works, and every endpoint works the same. I’ve lost count of the number of times I’ve had to write yet another bloody ‘encode form variables and parse the response as JSON’ client for supposedly ‘REST’ APIs.