Sessions at DjangoCon US 2011 about Web Scraping and Python with video

Thursday 8th September 2011

  • Y'all Wanna Scrape with Us? Content Ain't a Thing: Web Scraping With Our Favourite Python Libraries

    by Katharine Jarmul

    Love or hate them, the top Python scraping libraries have some hidden gems and tricks that you can use to enhance, update and diversify your Django models. This talk will teach you more advanced techniques for aggregating content from RSS feeds, Twitter, Tumblr and plain old websites for your Django projects.

    OUTLINE

    • lxml fu: etree vs html
    • lxml faves: iterlinks, prev/next, strip_tags, linepos
    • incorporating xpath
    • building your xml views/templates with lxml (this bullet is optional: may not have time but would love to hear if folks might find this useful)
    • learning how to build a good JSON API handler: what you can learn from some amazing API handlers when you have to build your own
    • feedparser, HTMLParser, re: the quick & dirty ways to parse when lxml isn't fast enough
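
As a rough illustration of the last outline item, the sketch below pulls link targets out of a snippet of HTML in two "quick & dirty" ways using only the standard library. It is not taken from the talk. The names `LinkCollector`, `links_with_parser`, `links_with_regex` and the `SAMPLE_HTML` markup are hypothetical, and the import assumes Python 3, where the old `HTMLParser` module lives at `html.parser`.

```python
import re
from html.parser import HTMLParser  # Python 3 location of the old HTMLParser module


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample markup, used only for illustration.
SAMPLE_HTML = '<p><a href="http://example.com/a">A</a> and <a href="/b">B</a></p>'


def links_with_parser(html):
    """Event-driven approach: tolerant of messy markup, no third-party packages."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumes double-quoted href attributes; real pages will break this pattern.
HREF_RE = re.compile(r'href="([^"]+)"')


def links_with_regex(html):
    """Regex approach: fastest to write, but only safe on markup you control."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    print(links_with_parser(SAMPLE_HTML))
    print(links_with_regex(SAMPLE_HTML))
```

The regex version is shorter but silently misses single-quoted or unquoted attributes, so it is probably best kept for feeds or pages whose markup is known in advance.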

    At 3:30pm to 4:10pm, Thursday 8th September