OVERVIEW
You have a Perl-based website and it’s time to migrate it from a Latin-based encoding to UTF-8. Perl has many pieces to the encoding puzzle, and a road map is useful here. By the end of this talk you should understand the basics of converting your data to UTF-8, ensuring that your website outputs UTF-8 correctly, and debugging any encoding issues that crop up.
THE COMMON ENCODING TYPES
Brief overview of the Latin-1 (ISO-8859-1) and Windows-1252 encodings.
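The two encodings agree on most bytes but differ in the range 0x80 to 0x9F, where ISO-8859-1 has control characters and Windows-1252 has printable characters such as the euro sign and curly quotes. The sketch below is an illustration only, not taken from the talk; it uses the core Encode module, and the helper name byte_to_codepoint is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the Unicode code point that a single byte
# maps to when interpreted under the named encoding.
sub byte_to_codepoint {
    my ($encoding, $byte) = @_;
    return ord(decode($encoding, chr($byte)));
}

# 0xE9 means the same thing in both encodings; 0x80 and 0x93 do not.
for my $byte (0xE9, 0x80, 0x93) {
    printf "0x%02X  ISO-8859-1: U+%04X  cp1252: U+%04X\n",
        $byte,
        byte_to_codepoint('ISO-8859-1', $byte),
        byte_to_codepoint('cp1252',     $byte);
}
```

Data labelled Latin-1 that contains bytes in the 0x80 to 0x9F range is often really Windows-1252, so it may be worth checking for those bytes before choosing the source encoding for a conversion.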
UTF-8: A BRAVE NEW WORLD
Brief overview of the UTF-8 encoding standard with regard to the 1, 2, 3 and 4 byte encodings and how the bits are encoded.
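The bit layout can be made concrete with a small hand-written encoder. This is a minimal sketch for illustration, not the talk's own code: the function name utf8_bytes is hypothetical, and it assumes a valid code point (it does not reject surrogates or values above U+10FFFF). In real code, encode('UTF-8', $string) from the Encode module does this work.

```perl
use strict;
use warnings;

# Hypothetical helper: return the list of UTF-8 byte values for one code point.
sub utf8_bytes {
    my ($cp) = @_;

    # 1 byte:  0xxxxxxx                             (U+0000 .. U+007F)
    return ($cp) if $cp < 0x80;

    # 2 bytes: 110xxxxx 10xxxxxx                    (U+0080 .. U+07FF)
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)) if $cp < 0x800;

    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx           (U+0800 .. U+FFFF)
    return (0xE0 | ($cp >> 12),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)) if $cp < 0x10000;

    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  (U+10000 .. U+10FFFF)
    return (0xF0 | ($cp >> 18),
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F));
}

# Hypothetical helper: format the bytes of one code point as hex.
sub utf8_hex {
    my ($cp) = @_;
    return join ' ', map { sprintf '%02X', $_ } utf8_bytes($cp);
}

printf "U+%04X -> %s\n", $_, utf8_hex($_) for 0x41, 0xE9, 0x20AC, 0x1F600;
# U+0041 -> 41
# U+00E9 -> C3 A9
# U+20AC -> E2 82 AC
# U+1F600 -> F0 9F 98 80
```

The leading byte announces the sequence length through its high bits, and every continuation byte starts with 10, which is why a decoder can resynchronise after a damaged byte.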
PERL AND UTF-8
How to do the following in Perl:
ENCODING HELL
Some tips on how to debug some common encoding issues.
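One frequently seen problem is double encoding, where bytes that are already UTF-8 are encoded a second time. The sketch below is an assumption about the kind of issue meant here, not a tip taken from the talk. It dumps strings as hex so the terminal's own encoding cannot hide the problem; the helper name hex_dump is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show each character of a string as a hex value,
# so bytes and characters cannot be confused on screen.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $string;
}

my $chars = "\x{E9}";                  # one character, U+00E9
my $once  = encode('UTF-8', $chars);   # correct: two bytes
my $twice = encode('UTF-8', $once);    # bug: the bytes are treated as
                                       # characters and encoded again

print hex_dump($once),  "\n";   # C3 A9
print hex_dump($twice), "\n";   # C3 83 C2 A9

# Decoding the double-encoded data once recovers the correct UTF-8 bytes.
my $repaired = decode('UTF-8', $twice);
print hex_dump($repaired), "\n";   # C3 A9
```

The byte pattern C3 83 C2 followed by another byte is a typical signature of double-encoded Latin-range characters. The usual cure is to decode input exactly once at the boundary and encode output exactly once on the way out.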
CONCLUSION
The transition from traditional encodings to UTF-8 is not easy to navigate, but with perseverance it is doable. We have illustrated the common encodings, how to process data in this environment, and how to tackle any issues that arise.
Currently a Perl hacker for Xerox.com though in a previous life he worked for the military industrial complex.
Comments
Please try not to read the slides but instead use the slides to supplement the content which you are presenting.
Seems like this presenter is a stand-in for the original person. He’s basically just reading the slides…