Parsing Horrible Things with Python

A session at PyCon US 2012

Sunday 11th March, 2012

12:00pm to 12:30pm (PST)

If you've ever wanted to get started with parsers, here's your chance for a ground-floor introduction. A harebrained spare-time project gives birth to a whirlwind journey from basic algorithms to Python libraries and, at last, to a parser for one of the craziest syntaxes out there: the MediaWiki grammar that drives Wikipedia.

Some languages were designed to be parsed. The most obvious example is Lisp and its relatives, which are practically parsed when they hit the page. However, many others, including most wiki grammars, grow organically and get turned into HTML by sedimentary strata of regular expressions, all backtracking and warring with one another, which makes it difficult to output other formats or to change the language.
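
To make the Lisp point concrete, here is a minimal sketch in Python of an S-expression reader. It is not taken from the talk: the names `tokenize` and `parse` are illustrative, and the sketch handles only parentheses and bare atoms (no strings, quoting, or comments).

```python
def tokenize(text):
    """Split S-expression source into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and return one nested expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        # Collect sub-expressions until the matching close parenthesis.
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing closing parenthesis")
        tokens.pop(0)  # discard ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected closing parenthesis")
    return token  # a bare atom, kept as a string (hypothetical simplification)


tree = parse(tokenize("(define (square x) (* x x))"))
print(tree)  # ['define', ['square', 'x'], ['*', 'x', 'x']]
```

The whole grammar fits in one recursive function because the nesting is spelled out in the text itself. A wiki grammar built from layered regular expressions has no such single place where its structure is defined.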

We will explore the tools and techniques necessary to attack one of the hairiest lingual challenges out there: MediaWiki syntax. Join me for an introduction to the general classes of parsing algorithms, from the birth of the field to the state of the art. Learn how to pick the right one. Have a comparative look at a dozen different Python parsing toolkits. And finally, learn some optimization tricks to get a grammar going at a reasonable clip.

About the speaker

Erik Rose

Python wrangler, language hacker, elegance chaser

Books by speaker

  • Plone 3 for Education
