Don't Fear Unicode

Jacinta Richardson (Perl Training Australia)
Perl, Programming
Location: E146

Unicode isn’t new, but it still seems hard when you’re starting at the beginning and haven’t even been told the difference between a glyph, a codepoint, a character and a byte. Every year there are talks and tutorials at conferences about it, but if you haven’t grasped the basics, you can feel frustrated and lost much too quickly.
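
As a rough illustration of how those terms differ, the sketch below writes the same visible character two ways and counts code points and bytes. It uses Python purely as an assumption for illustration (the talk itself uses Perl), and the variable names are made up for the example.

```python
import unicodedata

# The same visible character, written two ways.
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

# Code points: what the string is made of.
composed_points = len(composed)
decomposed_points = len(decomposed)

# Bytes: what ends up on disk or on the wire once an encoding is chosen.
composed_bytes = len(composed.encode("utf-8"))
decomposed_bytes = len(decomposed.encode("utf-8"))

# Normalisation maps the two spellings onto the same sequence.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

print(composed_points, decomposed_points)   # 1 2
print(composed_bytes, decomposed_bytes)     # 2 3
print(same_after_nfc)                       # True
```

The glyph is whatever your font draws for either spelling, which is why the two strings can look identical on screen and still compare as unequal.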

Unicode sneaks into the most unexpected places. Do you ever wonder if your life would be much, much easier if your default encoding was not ASCII? Do you know what UTF-8 and Unicode strings are? Do you know what your default encoding is, or how to change it? Does it all seem too hard, and make you resent anything to do with the locale?
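
One way to start answering the "what is my default encoding?" question is simply to ask the runtime. The sketch below does that in Python, chosen only for illustration (the talk demonstrates in Perl). The values it prints depend on your platform and locale settings, so no expected output is shown for the locale-dependent lines.

```python
import locale
import sys

# Encoding Python uses for implicit str <-> bytes conversions.
default_encoding = sys.getdefaultencoding()

# Encoding suggested by the current locale (used by open() when you
# do not pass encoding= explicitly). Varies by platform and locale.
preferred_encoding = locale.getpreferredencoding(False)

print("default: ", default_encoding)
print("locale:  ", preferred_encoding)
print("stdout:  ", sys.stdout.encoding)
```

Rather than changing a global default, it is often less surprising to name the encoding explicitly wherever text crosses a boundary, for example `open(path, encoding="utf-8")`.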

If 7-bit ASCII was good enough for me, it should be good enough for you! Have you been left behind with this whole Unicode thing to the point that you’re confused and resentful of it all? I know I was. When your name, and everything you write, works wonderfully in ASCII, it can be hard to summon the enthusiasm to learn about Unicode, even when you know that you should be handling your data better.

Imagine your code is using a logging library that expects strings. What does it do when you pass it a Unicode object? It’ll probably write it, encoding it in your default encoding (probably ASCII). And it’ll probably work, on all of your test cases, and on most of your data. Until someone comes along with a non-ASCII character in their name, and causes your code to throw an exception. You probably weren’t expecting it, and it might not even be your library that’s at fault. Unicode works implicitly just often enough that Unicode characters can sneak in well before you realise your code isn’t robust enough to handle them.
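
The failure described above can be reproduced in a few lines. This is a minimal sketch in Python, not the talk's Perl material: `write_log_line` is a hypothetical stand-in for a logging library that encodes to ASCII, and the names being logged are invented test data.

```python
def write_log_line(message: str, encoding: str = "ascii") -> bytes:
    """Hypothetical logger back end: encode a message for writing to a file."""
    return (message + "\n").encode(encoding)

# Works on every ASCII-only test case...
ok_line = write_log_line("login: Jacinta")

# ...and fails the first time a non-ASCII name shows up.
try:
    write_log_line("login: José")
    failed = False
except UnicodeEncodeError:
    failed = True

# Choosing an encoding that covers all of Unicode avoids the exception.
utf8_line = write_log_line("login: José", encoding="utf-8")

print(ok_line)     # b'login: Jacinta\n'
print(failed)      # True
print(utf8_line)   # b'login: Jos\xc3\xa9\n'
```

The exception is raised at the point of encoding, which may be deep inside a library, far from the code that introduced the non-ASCII data.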

This talk will cover the essentials of Unicode and locale, and how they affect things like regular expressions, reading and writing files, passing Unicode in and out of databases, and sending it out to the world. Perl will be the programming language used to demonstrate these ideas, but much of the content should be accessible to all programmers.
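
For readers outside Perl, two of those topics can be sketched in Python (an assumption made for illustration, not the talk's own examples): how a regular expression's idea of a "word character" changes between Unicode and ASCII matching, and how a file round trip behaves when the encoding is stated explicitly. The sample text and file name are invented.

```python
import os
import re
import tempfile

text = "naïve café"

# \w matches Unicode word characters by default; re.ASCII restricts it.
unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Write and read the text with an explicit encoding at each boundary.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    with open(path, "rb") as fh:
        raw = fh.read()          # the bytes actually stored
    with open(path, encoding="utf-8") as fh:
        round_trip = fh.read()   # decoded back into characters

print(unicode_words)         # ['naïve', 'café']
print(ascii_words)           # ['na', 've', 'caf']
print(len(text), len(raw))   # 10 12
print(round_trip == text)    # True
```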


Jacinta Richardson

Perl Training Australia

Jacinta Richardson runs Perl Training Australia, a micro-business offering courses throughout Australia. Both as part of her job and as a massive free-time sink, she is involved in running conferences (linux.conf.au 2007, Open Source Developers’ Conference (Australia) 2004-2011, Australian System Administrators Conference (SAGE-AU) 2008-2009), attending conferences, writing perl-tips, speaking at Perl Monger meetings whenever she’s in the right town, participating in online Perl forums and promoting women in IT. For her work in the Perl community, Jacinta was awarded the White Camel Award in 2008. When away from the computer, Jacinta enjoys scuba diving, cycling and baking.


Comments

ben hengst
07/20/2012 10:28pm PDT

Thank you again for teaching me everything that I know about Unicode: how this all works, why it is important, and where the traps are.
