Tom Insam

further notes on JSON

Off I go, making random unsubstantiated claims about the danger of using JSON with non-ASCII characters. This called for a Test. So I wrote one. Visit my JavaScript unicode test page and see how your browser interprets external JavaScript files - I serve an 'é' using JavaScript to the page via 3 methods and 2 character set encodings, and try to render them all.

My conclusions from some limited testing? Owch. You can't include a JavaScript file and expect the client to interpret it properly, unless you control both the server serving the JavaScript and the HTML page requesting it, and can make sure that they're both in the same character set. Alternatively, you can escape all non-ascii characters in your JavaScript files using the xXX or uXXXX notations, which seems to work everywhere I've tried, but also seems like a pathetic work-around. Anyway, needing a work-around for only the non-obvious case means that no-one will actually do it, because no-one ever seems to bother testing with non-ASCII (see any on the JSON examples page, for instance?).

However, requesting JSON using XMLHTTPRequest seems to do the Right Thing in every browser I've tested, including those that include JavaScript wrongly. So if you're using JSON as an RPC transport, instead of XML, for instance, it looks safe. From a character-set point of view.