Code Points

A commonly touted disadvantage of UTF-8 is that string indexing is O(n). Because code points take up a variable number of bytes, you won’t know where the 5th codepoint is until you scan the string and look for it. UTF-32 doesn’t have this problem; it’s always 4 * index bytes away.

The problem here is that indexing by code point shouldn’t be an operation you ever need!


Unicode itself gives the term “character” multiple incompatible meanings, and as far as I know doesn’t use the term in any normative text.

– Let’s Stop Ascribing Meaning to Code Points