Sessions at EuroPython 2011 about Scraping on Friday 24th June

  • Scraping Techniques to Extract Advertisements from Web Pages

    by Mirko Urru and Stefano Cotta Ramusino

    Online Advertising is an emerging research field at the intersection of Information Retrieval, Machine Learning, Optimization, and Microeconomics. Its main goal is to choose the right ads to present to a user engaged in a given task, as in Sponsored Search Advertising or Contextual Advertising. The former places ads on the results page returned by a Web search engine in response to a query. The latter places ads within the content of a generic third-party Web page. The ads themselves are selected and served by automated systems based on the content displayed to the user.

    Web scraping is the set of techniques used to extract information from a website automatically instead of copying it by hand. In particular, we're interested in studying and adopting scraping techniques for:
    i. accessing tags as object members
    ii. finding tags whose name, contents or attributes match selection criteria
    iii. accessing tag attributes by using a dictionary-like syntax.
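
    The three techniques listed above can be sketched with nothing but Python's standard library. This is only an illustration, not the speakers' code: the names Tag, TreeBuilder, parse and find_all, the sample markup and the ads.example.com URL are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become "current".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Tag:
    """A minimal element node (hypothetical helper, not from the talk)."""

    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.parent = parent
        self.children = []   # child Tag objects, in document order
        self.text = ""       # text found directly inside this tag

    def __getattr__(self, name):
        # (i) Tags as object members: node.body.a is the first descendant
        # tag with that name, or None. Python only calls this method when
        # normal attribute lookup fails, so node.name and node.text still
        # return the real fields.
        if name.startswith("__"):
            raise AttributeError(name)
        matches = self.find_all(name=name)
        return matches[0] if matches else None

    def __getitem__(self, key):
        # (iii) Dictionary-like attribute access: node["href"].
        return self.attrs[key]

    def find_all(self, name=None, text=None, **attrs):
        # (ii) Collect descendants whose name, contents and attributes
        # all match the given criteria (depth-first, document order).
        found = []
        for child in self.children:
            if ((name is None or child.name == name)
                    and (text is None or text in child.text)
                    and all(child.attrs.get(k) == v for k, v in attrs.items())):
                found.append(child)
            found.extend(child.find_all(name, text, **attrs))
        return found


class TreeBuilder(HTMLParser):
    """Builds a tree of Tag nodes from the parser's event stream."""

    def __init__(self):
        super().__init__()
        self.root = Tag("[document]")
        self.current = self.root

    def handle_starttag(self, name, attrs):
        node = Tag(name, attrs, parent=self.current)
        self.current.children.append(node)
        if name not in VOID:
            self.current = node

    def handle_endtag(self, name):
        # Close the nearest open tag with this name; ignore stray closers.
        node = self.current
        while node is not None and node.name != name:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


doc = parse(
    '<html><body><p>Intro</p>'
    '<a id="ad1" href="http://ads.example.com/1">Buy now</a>'
    '<a href="/about">About</a></body></html>'
)
first_link = doc.body.a                              # (i) object members
all_hrefs = [a["href"] for a in doc.find_all("a")]   # (ii) + (iii)
by_id = doc.find_all(id="ad1")                       # (ii) match on attribute
```

    A dedicated HTML parsing library would normally replace the hand-written tree builder; the sketch only shows what the three access styles look like from the caller's side.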

    In this talk, we focus on the adoption of scraping techniques in the contextual advertising field. In particular, we present a system aimed at finding the most relevant ads for a generic web page p. Starting from p, the system selects a set of its inlinks (i.e., the pages that link to p) and extracts the ads they contain. Selection is performed by querying the Google search engine, whereas extraction relies on suitable scraping techniques.
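
    The two stages described above (select the inlinks of p, then extract their ads) might be wired together roughly as follows. This is a sketch under stated assumptions, not the system presented in the talk: the search-engine query and the page download are passed in as hypothetical callables (search_inlinks, fetch), and an "ad" is assumed to be any link whose host appears in a hand-picked AD_HOSTS set. Ranking the collected ads by relevance is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: a link is an ad when it points at one of these hosts.
AD_HOSTS = {"ads.example.com"}


class AdLinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs for links that point at an ad host."""

    def __init__(self, ad_hosts):
        super().__init__()
        self.ad_hosts = ad_hosts
        self.ads = []
        self._href = None    # href of the ad link currently open, if any
        self._text = []      # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if urlparse(href).hostname in self.ad_hosts:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.ads.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_ads(html, ad_hosts=AD_HOSTS):
    """Extraction stage: scrape the ad links out of one page."""
    collector = AdLinkCollector(ad_hosts)
    collector.feed(html)
    collector.close()
    return collector.ads


def ads_for_page(page_url, search_inlinks, fetch, max_inlinks=10):
    """Selection + extraction: gather ads from pages that link to page_url.

    search_inlinks(url) -> iterable of inlink URLs (stands in for the
                           search-engine query; hypothetical)
    fetch(url)          -> HTML of that page as a string (hypothetical)
    """
    ads = []
    for inlink in list(search_inlinks(page_url))[:max_inlinks]:
        ads.extend(extract_ads(fetch(inlink)))
    return ads


# Stand-in data so the sketch runs without network access.
FAKE_WEB = {
    "http://blog.example.org/post": (
        '<p>See <a href="http://p.example.net/">this page</a>.</p>'
        '<a href="http://ads.example.com/42">Cheap flights</a>'
    ),
    "http://forum.example.org/t/1": '<a href="/login">Log in</a>',
}

found_ads = ads_for_page(
    "http://p.example.net/",
    search_inlinks=lambda url: sorted(FAKE_WEB),
    fetch=FAKE_WEB.__getitem__,
)
```

    Passing the search and download steps in as functions keeps the scraping logic testable offline; a real deployment would plug in an actual search client and HTTP fetcher.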

    From 2:30pm to 3:30pm, Friday 24th June
