Sessions at PyCon US 2012 about NLTK in Santa Clara Convention Center


Thursday 8th March 2012

  • Introduction to NLTK

    by Jacob Perkins

    Learn the basics of natural language processing with NLTK, the Natural Language ToolKit. First we'll cover tokenization, stemming and WordNet. Next we'll get into part-of-speech tagging, chunking & named entity recognition. Then we'll close with text classification and sentiment analysis. You'll walk out with new super-powers and an appreciation of the difficulties of analyzing human language.

    This tutorial will be a hands-on approach to learning natural language processing using NLTK, the Natural Language ToolKit. We will cover everything from tokenizing sentences to phrase extraction, from splitting words to training your own text classifiers for sentiment analysis. Please come prepared with NLTK already installed so we can dive into the code & data immediately.

    Hour 1: Tokenization, Stemming & Corpora

    Tokenization & familiarity with corpus readers and models are required knowledge before you can get into the more interesting aspects of NLTK. This first hour will include:

    • an overview of modules & data
    • loading pickled models
    • sentence & word tokenization
    • stemming & lemmatization
    • an overview of WordNet and other included corpora
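
    As a rough illustration of the tokenization and stemming topics listed above, here is a minimal sketch, assuming NLTK 3 is installed. It deliberately uses only classes that need no downloaded data: the regular-expression patterns and the sample text are assumptions made for this example, and the naive sentence splitter is only a stand-in for the pickled sentence models the tutorial mentions.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Sample text invented for this sketch.
text = "The cats are running. They jumped over fences!"

# Naive sentence tokenization: runs of characters ending in ., ! or ?
# (an assumption for illustration; a trained model handles abbreviations etc.).
sentence_tokenizer = RegexpTokenizer(r"[^.!?]+[.!?]")
sentences = [s.strip() for s in sentence_tokenizer.tokenize(text)]

# Word tokenization: keep runs of word characters, dropping punctuation.
word_tokenizer = RegexpTokenizer(r"\w+")
words = word_tokenizer.tokenize(sentences[0])

# Stemming: reduce each lowercased word to a crude root form.
stemmer = PorterStemmer()
stems = [stemmer.stem(word.lower()) for word in words]

print(sentences)
print(words)
print(stems)
```

    Stemming is purely rule-based, so "running" becomes "run" without any dictionary lookup; lemmatization with WordNet would need the corpus data to be downloaded first.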

    Hour 2: Part-of-Speech Tagging & Chunking/NER

    Using tokenization and a working knowledge of corpus readers & pickled models, we'll dive into part-of-speech tagging and chunking/NER, including:

    • using a part-of-speech tagger
    • an overview of tags and tagged corpora
    • training a custom tagger with nltk-trainer
    • using a chunker for phrase extraction and named entity recognition
    • an overview of chunked corpora
    • training a custom chunker with nltk-trainer
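
    To give a feel for phrase extraction with a chunker, here is a minimal sketch, assuming NLTK 3. The tokens are tagged by hand so that no tagger model is required, and the one-rule noun-phrase grammar is an assumption made for this example rather than anything trained with nltk-trainer.

```python
from nltk import RegexpParser

# Hand-tagged tokens (Penn Treebank style tags), invented for this sketch.
tagged = [
    ("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# One chunk rule: optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = RegexpParser(grammar)

# parse() returns a tree whose NP subtrees are the extracted phrases.
tree = chunker.parse(tagged)
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]

print(noun_phrases)
```

    A trained chunker exposes the same `parse()` interface, so the extraction loop would stay the same if the rule-based chunker were swapped out.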

    Hour 3: Text Classification & Sentiment Analysis

    After using classifiers for training part-of-speech taggers and chunkers, this final hour will explain text classification in greater detail with:

    • an overview of classified corpora
    • text feature extraction
    • an overview of classification algorithms & when to use them
    • training a sentiment analysis classifier on movie reviews with nltk-trainer
    • using a classifier for sentiment analysis
    • hierarchical classification for sentiment analysis
    • binary vs multi-label classification
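
    The feature extraction and classification steps above can be sketched roughly as follows, assuming NLTK 3. The four training sentences are made up for this example (the tutorial itself refers to a movie reviews corpus), and `bag_of_words` is a hypothetical helper name, so treat this as a toy illustration rather than a usable sentiment model.

```python
from nltk import NaiveBayesClassifier


def bag_of_words(text):
    """Hypothetical helper: map each lowercased word to True."""
    return {word: True for word in text.lower().split()}


# Tiny invented training set of (text, label) pairs.
training_texts = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull and boring plot", "neg"),
    ("terrible acting and awful pacing", "neg"),
]

# NLTK classifiers train on (feature dict, label) pairs.
train_set = [(bag_of_words(text), label) for text, label in training_texts]
classifier = NaiveBayesClassifier.train(train_set)

positive_guess = classifier.classify(bag_of_words("a great and wonderful story"))
negative_guess = classifier.classify(bag_of_words("a boring and awful plot"))

print(positive_guess, negative_guess)
```

    With only four training examples the classifier is just memorizing word and label co-occurrences; a real model needs a far larger labeled corpus.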

    Wrapping Up

    Now that you know how to use NLTK to process some of the included English corpora, we'll wrap up by covering:

    • non-English corpora included with NLTK
    • other Python libraries for NLP
    • custom corpus creation

    At 1:20pm to 4:40pm, Thursday 8th March

    In D3, Santa Clara Convention Center

Sunday 11th March 2012

  • Parsing sentences with the OTHER natural language tool: LinkGrammar

    by Jeff Elmore

    Many of you are probably familiar with NLTK, the wonderful Natural Language Toolkit for Python. You may not be familiar with Linkgrammar, which is a sentence parsing system created at Carnegie Mellon University. Linkgrammar is quite robust and works "out of the box" in a way that NLTK does not for sentence parsing.


    NLTK is a fantastic library with broad capabilities. But often I find that I want something that will just do what I want without my having to figure out all of the details. An example of this is sentence parsing. A quick Google search for parsing sentences with NLTK returns a number of articles describing how to write your own grammar, define a parser based on that grammar, and then parse sentences. This is great for toy problems and education, but if you actually need to parse sentences "from the wild," writing your own grammar is a huge undertaking.
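
    For readers who have not seen the hand-written grammar approach, here is a minimal sketch, assuming NLTK 3. The toy grammar and both sentences are invented for this example. The second sentence contains a word the grammar does not list, which is the kind of gap that makes this approach hard to scale to authentic text.

```python
from nltk import CFG, ChartParser

# A toy context-free grammar covering only a handful of words.
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")
parser = ChartParser(grammar)

# A sentence that fits the grammar yields at least one parse tree.
parsed = list(parser.parse("the dog chased the cat".split()))

# A sentence with an unlisted word ('a') cannot be parsed; NLTK may
# signal this by raising ValueError, so treat that as "no parse".
try:
    unparsed = list(parser.parse("the dog chased a cat".split()))
except ValueError:
    unparsed = []

print(len(parsed), len(unparsed))
```

    Every new word or construction needs another rule, which is why a grammar that copes with arbitrary real-world sentences is a large project in its own right.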

    Enter Linkgrammar. Linkgrammar was developed at Carnegie Mellon University and is now maintained by the developers of AbiWord as the basis for their grammar checking capabilities. It works nicely out of the box and is tolerant of irregularities found in authentic text.

    At 2:10pm to 2:55pm, Sunday 11th March

    In E3, Santa Clara Convention Center

