Sessions at PyCon US 2012 on Sunday 11th March in E3

  • Parsing Horrible Things with Python

    by Erik Rose

    If you've ever wanted to get started with parsers, here's your chance for a ground-floor introduction. A harebrained spare-time project gives birth to a whirlwind journey from basic algorithms to Python libraries and, at last, to a parser for one of the craziest syntaxes out there: the MediaWiki grammar that drives Wikipedia.

    Some languages were designed to be parsed. The most obvious example is Lisp and its relatives, which are practically parsed when they hit the page. However, many others—including most wiki grammars—grow organically and get turned into HTML by sedimentary strata of regular expressions, all backtracking and warring with one another, making it difficult to output other formats or make changes to the language.

    We will explore the tools and techniques necessary to attack one of the hairiest lingual challenges out there: MediaWiki syntax. Join me for an introduction to the general classes of parsing algorithms, from the birth of the field to the state of the art. Learn how to pick the right one. Have a comparative look at a dozen different Python parsing toolkits. And finally, learn some optimization tricks to get a grammar going at a reasonable clip.
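    Before the comparative tour, it helps to see how small a hand-rolled parser can be. Here is a minimal single-pass sketch (an illustration of the idea, not code from the talk) that converts wiki-style ''italic'' markup into HTML—the kind of job that layered regexes handle badly:

    ```python
    def parse_italics(text):
        """Convert ''italic'' wiki markup to <i>...</i> HTML.

        A deliberately tiny single-pass scanner: it walks the input once,
        toggling in and out of italic spans, instead of the layered,
        backtracking regexes the abstract warns about.
        """
        out = []
        i = 0
        in_italic = False
        while i < len(text):
            if text.startswith("''", i):
                out.append("</i>" if in_italic else "<i>")
                in_italic = not in_italic
                i += 2
            else:
                out.append(text[i])
                i += 1
        if in_italic:  # unterminated markup: close the tag anyway
            out.append("</i>")
        return "".join(out)
    ```

    Real wikitext nests and overlaps far more than this, which is why the talk reaches for full parsing toolkits—but a one-pass tokenizing approach like this is already easier to extend than a pile of regexes.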

    At 12:00pm to 12:30pm, Sunday 11th March

    In E3, Santa Clara Convention Center

  • Building A Python-Based Search Engine

    by Daniel Lindsley

    Search is an increasingly common request in all types of applications as the amount of data all of us deal with continues to grow. The technology/architecture behind search engines is wildly different from what many developers expect. This talk will give a solid grounding in the fundamentals of providing search using Python to flesh out these concepts in a simple library.

    • Core concepts
    • Terminology
    • Document-based
      • Show basic starting code for a document
    • Inverted Index
      • Show a simple inverted index class
    • Stemming
    • N-gram
      • Show a tokenizer/n-gram processor
    • Fields
      • Show a document handler which ties it all together
    • Searching
      • Show a simple searcher (& the whole thing working together)
    • Faceting (likely no demo)
    • Boost (likely no demo)
    • More Like This
    • Wrap up
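    As a rough sketch of the core pieces in that outline (the names here are illustrative, not the talk's actual library), a toy tokenizer and inverted index might look like this:

    ```python
    import re
    from collections import defaultdict


    def tokenize(text):
        """Lowercase and split on word characters -- a stand-in for the
        fuller tokenizer/stemming/n-gram pipeline the talk covers."""
        return re.findall(r"\w+", text.lower())


    class InvertedIndex:
        """Map each term to the set of document ids that contain it."""

        def __init__(self):
            self.index = defaultdict(set)
            self.docs = {}

        def add(self, doc_id, text):
            self.docs[doc_id] = text
            for term in tokenize(text):
                self.index[term].add(doc_id)

        def search(self, query):
            """AND-match: return ids of docs containing every query term."""
            terms = tokenize(query)
            if not terms:
                return set()
            result = self.index[terms[0]].copy()
            for term in terms[1:]:
                result &= self.index[term]
            return result
    ```

    For example, after `idx.add(1, "Python-based search engines")` and `idx.add(2, "Search is a common request")`, `idx.search("search")` returns both document ids, while `idx.search("python search")` narrows to the first. Ranking, faceting, and boosting all build on top of this same term-to-documents mapping.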

    At 1:30pm to 2:10pm, Sunday 11th March

    In E3, Santa Clara Convention Center

  • Parsing sentences with the OTHER natural language tool: LinkGrammar

    by Jeff Elmore

    Many of you are probably familiar with NLTK, the wonderful Natural Language Toolkit for Python. You may not be familiar with Link Grammar, a sentence-parsing system created at Carnegie Mellon University. Link Grammar is quite robust and works "out of the box" in a way that NLTK does not for sentence parsing.

    Abstract

    NLTK is a fantastic library with broad capabilities. But often I find that I want something that will just do what I want without my having to figure out all of the details. An example of this is sentence parsing. A quick Google search for parsing sentences with NLTK returns a number of articles describing how to write your own grammar, define a parser based on that grammar, and parse sentences with it. This is great for toy problems and education, but if you actually need to parse sentences "from the wild," writing your own grammar is a huge undertaking.
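    For reference, the roll-your-own approach described above looks something like this in NLTK (the grammar here is a toy of my own, exactly the kind that covers demo sentences but not text from the wild):

    ```python
    import nltk

    # A hand-written context-free grammar: every word the parser can
    # handle must be enumerated in the rules below.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N -> 'dog' | 'cat'
    V -> 'chased' | 'saw'
    """)

    parser = nltk.ChartParser(grammar)

    # Parses fine -- but any sentence with a word outside the grammar fails.
    for tree in parser.parse("the dog chased a cat".split()):
        print(tree)
    ```

    Scaling those rules up to cover real English is the "huge undertaking" in question, and it is the gap Link Grammar's ready-made dictionary fills.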

    Enter Link Grammar. Link Grammar was developed at Carnegie Mellon University and is now maintained by the developers of AbiWord as the basis for their grammar-checking capabilities. It works nicely out of the box and is tolerant of the irregularities found in authentic text.

    At 2:10pm to 2:55pm, Sunday 11th March

    In E3, Santa Clara Convention Center
