Unicode Support Shootout: The Good, the Bad, the Mostly Ugly

Location: Portland 255
Average rating: 3.91 (11 ratings)

How does Unicode support stack up across major platforms, including Java, Perl, Python, Ruby, and more? Who’s doing the best job, and who’s failing miserably? Is anyone doing a good job? Does anyone actually implement to standard, and to what extent? I’ll compare the major platforms to separate the losers from the not-so-losers.

It’s been my personal hell to find out the answers to these questions in my day-to-day work mining very large Unicode-only corpora. I’ll share my tales of woe and suffering, including my struggles with JDK7, along with a regex-rewriting library I’ve developed to enhance Java regexes’ Unicode sensitivity.
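To make the kind of difference at stake concrete, here is a minimal sketch in Python 3. It is an illustration only, not material from the talk; the helper names `word_tokens` and `same_text` are hypothetical. It shows two things a Unicode-aware platform has to get right: whether `\w` in a regex recognizes non-ASCII letters, and whether two canonically equivalent spellings of the same text compare as equal.

```python
import re
import unicodedata


def word_tokens(text, ascii_only=False):
    """Split text into runs of word characters.

    With ascii_only=True, \\w is limited to [a-zA-Z0-9_] via re.ASCII,
    which mimics a regex engine that is not Unicode-aware.
    """
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9 solo"   # e-acute as a single code point
decomposed = "cafe\u0301 solo"   # 'e' followed by a combining acute accent

print(word_tokens(precomposed))                   # Unicode-aware \w
print(word_tokens(precomposed, ascii_only=True))  # ASCII-only \w splits the word
print(precomposed == decomposed)                  # raw code point comparison
print(same_text(precomposed, decomposed))         # comparison after normalization
```

In the ASCII-only mode the accented letter is not a word character, so "café" is cut to "caf". The two spellings differ as raw code point sequences but match once both are normalized.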

Tom Christiansen


Tom Christiansen is a programmer, author, and lecturer who’s been involved with Perl since its initial public release back in 1987. Tom is the owner of the PERL.COM domain and website, and original author of much of Perl’s online documentation. Tom is lead author of The Perl Cookbook and co-author of Programming Perl, Learning Perl (2nd edition), and Learning Perl on Win32 Systems, all bestselling titles by O’Reilly & Associates.

He served two terms on the USENIX Association Board of Directors, and was president of The Perl Journal. Perl users selected Tom to receive the first White Camel Award in 1999. In 2000, members of the Open Source community voted Tom Best Newbie Helper in the first annual Andover.Net Slashdot Open Source Community Awards, which honor Open Source pioneers.

Tom holds a Master’s degree in Computer Science from the University of Wisconsin – Madison with a dual specialization in operating systems design and in computational linguistics. He previously received his Bachelor’s degree there in Spanish and Computer Science with minor fields of study in French, Mathematics, and Music. Tom has lived abroad in England and in Spain, where he studied Romance Philology, café solo, and vino tinto.

Residing at the western edge of Boulder, Colorado, Tom is an amateur naturalist who spends most of his summer hiking and camping high in the wilderness well above 10,000 feet of elevation, wandering about the vast Colorado Plateau, or relaxing under the glittering kaleidoscope of the Black Rock Desert’s starkly featureless playa. Over the past five years, Tom has become especially interested in how the exciting growth of affordable digital photography has opened up to mere mortals dramatic artistic opportunities previously possible to only the most dedicated and persistent of professional photographers, and often not even to them.

Comments on this page are now closed.


Peter Banka
07/29/2011 12:43pm PDT

Didn’t really enjoy the scattered stream-of-consciousness approach. Started late due to lack of preparedness.

Arvind Jayaprakash
07/28/2011 7:24pm PDT

Great to see one person talk about multiple programming languages in all earnest and not just trolling. Doubly great given that it comes from a father figure of one language. And yes, it was really informative.

Ilia Cheishvili
07/28/2011 6:14pm PDT

I doubt that there is anyone that knows more about Unicode than Tom. Great presentation.