Tom Insam

Code Points

A commonly touted disadvantage of UTF-8 is that string indexing is O(n). Because code points take up a variable number of bytes, you won’t know where the 5th codepoint is until you scan the string and look for it. UTF-32 doesn’t have this problem; it’s always 4 * index bytes away.

The problem here is that indexing by code point shouldn’t be an operation you ever need!

[..]

Unicode itself gives the term “character” multiple incompatible meanings, and as far as I know doesn’t use the term in any normative text.

Let’s Stop Ascribing Meaning to Code Points