Y'all Wanna Scrape with Us? Content Ain't a Thing: Web Scraping With Our Favourite Python Libraries

A session at DjangoCon US 2011

Thursday 8th September, 2011

3:30pm to 4:10pm

Love them or hate them, the top Python scraping libraries have some hidden gems and tricks that you can use to enhance, update and diversify your Django models. This talk will teach you more advanced techniques to aggregate content from RSS feeds, Twitter, Tumblr and normal old web sites for your Django projects.

OUTLINE

  • lxml fu: etree vs html
  • lxml faves: iterlinks, prev/next, strip_tags, linepos
  • incorporating xpath
  • building your xml views/templates with lxml (this bullet is optional: may not have time but would love to hear if folks might find this useful)
  • learning how to build a good JSON API handler: what you can learn from some amazing API handlers when you have to build your own
  • feedparser, HTMLParser, re: the quick & dirty ways to parse when lxml isn't fast enough
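
As a rough illustration of the "quick & dirty" end of the outline, here is a minimal sketch of link extraction using only the standard library. It is not taken from the talk: the names `LinkCollector`, `extract_links`, `extract_hrefs_with_re` and `SAMPLE_HTML` are made up for this example, and it assumes Python 3, where the `HTMLParser` class lives in `html.parser` (in 2011 it was the Python 2 `HTMLParser` module).

```python
from html.parser import HTMLParser
import re


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical input, standing in for a fetched page.
SAMPLE_HTML = """
<ul>
  <li><a href="/posts/1/">First <b>post</b></a></li>
  <li><a href="http://example.com/feed.rss">Feed</a></li>
</ul>
"""


def extract_links(html):
    """Event-driven parse: tolerant of nested tags inside the link text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def extract_hrefs_with_re(html):
    """Dirtier still: a regex that only understands double-quoted hrefs."""
    return re.findall(r'href="([^"]+)"', html)


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "->", text)
    print(extract_hrefs_with_re(SAMPLE_HTML))
```

The regex version is shorter but breaks on single-quoted or unquoted attributes, which is the usual trade-off between these approaches and a full parser such as lxml.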

About the speaker

Katharine Jarmul

Director of Technology at @hyfn. Lover of all things Unix and pythonista extraordinaire. (bio from Twitter)


When

Time 3:30pm to 4:10pm PST

Date Thu 8th September 2011

Short URL

lanyrd.com/shbrp

Official session page

djangocon.us/…sentations/35/
