Tuesday, April 24, 2007

Python: Printing unicode

By default, Python assumes the stdout stream uses the ascii encoding. While this is the only safe assumption, it gets mighty annoying when your terminal supports utf8 or Microsoft's mbcs encoding (e.g. PyDev for Eclipse), especially when you are working with unicode data that you would like to print out while debugging.

It seems like this is a common problem. In fact, it comes up numerous times at work, and at the last BayPiggies meeting I met a gentleman who was looking for a solution himself. It doesn't help that sys.setdefaultencoding() is a red herring that seems to throw everyone off track.

Enough with the introduction. Here is the snippet I use to switch my stdout to a non-ascii encoding:
    import codecs, sys
    sys.stdout = codecs.getwriter('mbcs')(sys.stdout)
Of course, change 'mbcs' to 'utf8' or whatever encoding you need. You can get fancy and look up the appropriate encoding based on the terminal environment (actually, 'mbcs' does this for you on Windows), but if you're just looking to print unicode for testing/debugging, this short snippet gets you to the goal in two lines of code.
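
If you do want to get fancy, the lookup mentioned above might be sketched roughly like this. The helper names pick_encoding and wrap_stream are hypothetical (they are not part of the standard library), and falling back to the locale's preferred encoding is just one assumption about what "appropriate" means for your terminal. The demonstration writes to an in-memory byte stream rather than the real stdout so that the result is easy to inspect.

```python
import codecs
import io
import locale


def pick_encoding(stream, fallback="utf-8"):
    """Guess an output encoding for `stream` (hypothetical helper)."""
    # A terminal-backed stream usually reports its encoding; a pipe may not.
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # Assumption: the locale's preferred encoding is a sensible guess.
        encoding = locale.getpreferredencoding() or fallback
    return encoding


def wrap_stream(byte_stream, encoding, errors="replace"):
    """Return a writer that encodes unicode text before writing bytes."""
    # Same technique as the two-line snippet: codecs.getwriter() returns a
    # StreamWriter class, which is then instantiated around the stream.
    return codecs.getwriter(encoding)(byte_stream, errors=errors)


# Demonstration against an in-memory byte stream instead of sys.stdout.
buf = io.BytesIO()
out = wrap_stream(buf, "utf-8")
out.write(u"caf\u00e9\n")
encoded = buf.getvalue()  # the utf-8 bytes that would reach the terminal
```

To apply it for real, something like sys.stdout = wrap_stream(sys.stdout, pick_encoding(sys.stdout)) should do. Passing errors="replace" means a character the terminal cannot represent is written as "?" instead of raising UnicodeEncodeError, which is usually what you want while debugging. On Python 3, sys.stdout is already a text stream, so there you would wrap sys.stdout.buffer instead.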

3 comments:

Sam said...

Just wanted to say thanks! I've been looking for a long time for a solution to this problem. Really helped me out.

Anonymous said...

Thanks a whole lot for that little snippet. I've been breaking my back trying to get unicode out with any sense.

Kelly Yancey said...

No problem. Glad to have been of help. While the solution is simple, it relies on knowledge of python's unicode handling, conversions, and file object support, none of which is particularly well documented in the standard docs. :|