Gede: Regarding the “funny quotes”, since they are Unicode now, software has no excuse not to play ball with them. The problem is mostly due to those guys who think that “7 bits per character ought to be enough for anybody”.
Maighstir: There are loads of more-or-less useful characters that GOG completely bugs out at.
It does!? :-S
I have not noticed it. Do you remember any of them?
But that reminds me I have something to say to Wishbone:
Wishbone: I only recently learned that UTF-32 actually exists, although I cannot for the life of me figure out why anyone would need it, or why anyone would think using it for anything would be a good idea.
Here are my thoughts, though I have not done proper research on it.
If your text uses mainly code points above U+FFFF, UTF-32 at least stops being wasteful: each of those code points takes 4 bytes in UTF-8, 4 bytes in UTF-16 (a surrogate pair) and 4 bytes in UTF-32, so the overall document size comes out the same. It never gets smaller than the other two, though.
So I don't think size is the argument, but there is one more difference. Unlike UTF-8 and UTF-16, UTF-32 is a fixed-width encoding: every code point takes exactly 4 bytes. This means that addressing or transforming any code point is a trivial matter, since the n-th one always sits at byte offset 4 × n.
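
Here is a quick way to check both points, as a rough Python sketch rather than anything authoritative. The helper names `encoded_sizes` and `code_point_at` are made up for this post, and I use the little-endian codecs so that no byte order mark gets counted.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def code_point_at(utf32_le_bytes, index):
    """Read the code point at position index straight from UTF-32-LE bytes.

    Works without scanning because every code point occupies exactly 4 bytes.
    """
    start = 4 * index
    return int.from_bytes(utf32_le_bytes[start:start + 4], "little")


astral = "\U0001D11E\U0001F600"  # two code points above U+FFFF
mixed = "a\U0001D11Eb"           # ASCII, astral, ASCII

print(encoded_sizes(astral))     # all three encodings need 8 bytes
print(encoded_sizes("hello"))    # 5, 10 and 20 bytes respectively
print(hex(code_point_at(mixed.encode("utf-32-le"), 1)))  # 0x1d11e
```

With UTF-8 or UTF-16 the same lookup would have to walk the bytes from the start, because the width of each earlier code point is not known in advance.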