Sunday, July 22, 2007

Yeah! UTF-8 and Unicode in Py3k!

From what I just read in GvR's Python 3000 Status Update, it looks like a whole class of problems that crop up when using non-ASCII characters in string literals will finally be gone in Python 3.0. This has been a recurring problem while developing the new perli.net codebase: everything (database, code, and HTML output) is UTF-8 encoded, and if you are not careful, Python 2.4 will bite you with an exception, for example on string literals containing umlauts.
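To illustrate the kind of failure I mean, here is a minimal sketch, written in Python 3 syntax so it runs on current interpreters. It spells out explicitly the ASCII decode that Python 2 performs implicitly when a byte string is mixed with a unicode string. The sample text and the helper name `implicit_ascii_decode` are made up for illustration and are not taken from the perli.net code.

```python
# UTF-8 bytes for "Grüße", as they might arrive from a UTF-8 database or file.
raw = b"Gr\xc3\xbc\xc3\x9fe"


def implicit_ascii_decode(data):
    """Hypothetical helper: mimics Python 2's implicit str -> unicode coercion."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(raw)
    failed = False
except UnicodeDecodeError:
    # Bytes >= 0x80 are not valid ASCII, so the coercion raises.
    failed = True

# Decoding with the codec the data was actually written in works fine.
text = raw.decode("utf-8")
```

The point of the sketch is only that the bytes are fine; it is the assumed ASCII codec that is wrong.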

Having UTF-8 as the default source encoding will make things easier for code with non-ASCII string literals. The current behaviour is one of the quirks I dislike about Python today, and I'm glad it is being taken care of in Python 3.0.
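As a rough sketch of what this should look like in Python 3, assuming the source file itself is saved as UTF-8: no encoding declaration at the top of the file, string literals are Unicode text, and converting to bytes is an explicit step. The variable names and the sample string are just placeholders.

```python
# No "# -*- coding: utf-8 -*-" line: Python 3 reads source files as UTF-8 by default.
greeting = "Grüße aus Zürich"  # a str, i.e. a sequence of Unicode code points

# Turning text into bytes is explicit, e.g. before writing to a socket or a file.
payload = greeting.encode("utf-8")

# Decoding with the same codec gives back the original text.
roundtrip = payload.decode("utf-8")
```

The encoded form is longer than the text because each umlaut and the ß take more than one byte in UTF-8.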
