How to Write Compilers and Optimizers (and solve Data Transformation Problems)

Shevek . (Nebula)
Sponsored Sessions
Location: E143
Average rating: ****.
(4.27, 11 ratings)

A presentation and discussion on building compilers and optimizers for data transformations or domain-specific languages. After this presentation, any programmer should be able to write a basic compiler, and people with prior knowledge will be able to write more advanced optimizing compilers.

Introduction
Compilation, loosely described, is the automated transformation of one piece of data into another; and the optimizer, the frontend’s partner in crime, is a key decision-maker in these transformations.

A good compiler-optimizer pair can solve a map-reduce scheduling problem, a timetabling problem, or even lay out the desks in an office; and the compiler-optimizer itself can be written in under 500 lines of code. (Compiling C to binary takes about 5,000.)

Yet we treat these techniques as arcane and shy away from them in everyday code; we write serializers and greedy algorithms by hand, often with poor performance, inconsistent results or badly understood corner cases.

We can fix that!

This is a huge topic, and the presentation is based on a wealth of experience of writing compilers, in different languages, for different platforms, with very different underlying behaviours. There will be considerable opportunity for the audience to guide the talk through a selection of available sub-topics.

Optimization
Most of the really interesting problems are in the optimizer, which is probably why it isn’t often discussed or taught. A good optimizer can ensure the success of a business, can fit 20 more customers onto a piece of hardware, or can make your program run an order of magnitude faster. Making the wrong decision can be worse than making no decision, especially in an automated system.

  • Basic local optimizations.
  • The problem with local optimization.
  • Global optimization.
  • Algebraic techniques.
  • NP-Hard problems and solution in generality.
  • Modeling for CSP and SAT.
  • Existing libraries.

There are also some interpreter-related party-tricks, like direct-threading, which can be used to scare people, but do make code run faster!

Applications and Front-End
We must be able to integrate our new techniques and tools into our workflow. We will discuss building incremental compilers, techniques for error management, reporting and recovery, and debugging – often touted as twice as hard as development.

During development of a program, the input is more often invalid than not, and a good frontend must handle invalid input elegantly, produce useful and appropriate error messages, and not fail at the first error. Even if we are just building a serializer, we may prefer to use compiler-building tools to construct it. The available front-end tools have reached such a level of maturity that they generally produce the most performant, reliable and correct solutions to many problems.

  • Lexing and parsing of text.
  • Avoiding shift-reduce conflicts and ambiguity.
  • Handling non-textual input, such as graph languages.
  • Available tools.

Also Available
It’s too much for one presentation, but let’s see where we end up!

  • Language Design: What makes a good language for describing a problem?
  • Backends – custom interpreter, existing VM, or hardware?
  • Parallel Systems – a personal favourite at the moment.
  • Worked Examples – shall we compile C, for example?

This session is sponsored by Nebula

Photo of Shevek .

Shevek .

Nebula

Shevek is an expert programmer with a strong interest in parallel and
distributed systems. He has worked on cutting edge research in compilers
and language design, algorithmic optimization, systems and security. He
is capable of maintaining a very straight face under questioning on
topics including “Why is our printer playing ‘happy birthday’?” or “What
is that message doing on the side of that building?” He received a
Doctorate in Computing on the Formalization of Protection Systems from
the University of Bath, England. He also holds a Masters in Pure
Mathematics and an epee.

Comments on this page are now closed.

Comments

Picture of Rob Reilly
Rob Reilly
07/20/2012 11:22am PDT

Shevek,

My printer doesn’t play anything and you can see from my picture, I’m actually smiling (sort of). My wife accuses me of being way too poker-faced.

I suspect you can answer why there’s a message on a building, perhaps from a technical or philosophical standpoint, of course. Non-techies do frequently ponder those things, and sometimes need insight.

Nevertheless, sorry I didn’t get to meet you. Definitely wanted to inquire about your attire.

Maybe next OSCON.

I hope you had a productive conference.

Best regards,

Rob

Sponsors

For information on exhibition and sponsorship opportunities at the conference, contact Sharon Cordesse at (707) 827-7065 or scordesse@oreilly.com.

View a complete list of OSCON contacts