code4lib 2012 schedule

Tuesday 7th February 2012

  • Keynote #1 - Dan Chudnov

    by Dan Chudnov

    At 9:15am to 10:00am, Tuesday 7th February

  • Beyond code: Versioning data with Git and Mercurial

    by Stephanie Collett and Martin Haye

    Stephanie Collett, California Digital Library, stephanie.collett@ucop.edu

    Martin Haye, California Digital Library, martin.haye@ucop.edu

    Within a relatively short time since their introduction, distributed version control systems (DVCS) like Git and Mercurial have enjoyed widespread adoption for versioning code. It didn’t take long for the library development community to start discussing the potential for using DVCS within our applications and repositories to version data. After all, many of the features that have made some of these systems popular in the open source community to version code (e.g. lightweight, file-based, compressed, reliable) also make them compelling options for versioning data. And why write an entire versioning system from scratch if a DVCS can be a drop-in solution? At the California Digital Library (CDL) we’ve started using Git and Mercurial in some of our applications to version data. This has proven effective in some situations and unworkable in others. This presentation will be a practical case study of CDL’s experiences with using DVCS to version data. We will explain how we’re incorporating Git and Mercurial in our applications, describe our successes and failures, and consider the issues involved in repurposing these systems for data versioning.

    At 10:20am to 10:40am, Tuesday 7th February

  • “Linked-Data-Ready” Software for Libraries

    by Jennifer Bowen

    Jennifer Bowen, University of Rochester River Campus Libraries, jbowen@library.rochester.edu

    Linked data is poised to replace MARC as the basis for the new library bibliographic framework. For libraries to benefit from linked data, they must learn about it, experiment with it, demonstrate its usefulness, and take a leadership role in its deployment.

    The eXtensible Catalog Organization (XCO) offers open-source software for libraries that is “linked-data-ready.” XC software prepares MARC and Dublin Core metadata for exposure to the semantic web, incorporating FRBR Group 1 entities and registered vocabularies for RDA elements and roles. This presentation will include a software demonstration, proposed software architecture for creation and management of linked data, a vision for how libraries can migrate from MARC to linked data, and an update on XCO progress toward linked data goals.

    At 10:40am to 11:00am, Tuesday 7th February

  • Your Catalog in Linked Data

    by Tom Johnson

    Tom Johnson, Oregon State University Libraries, thomas.johnson@oregonstate.edu

    Linked Library Data activity over the last year has seen bibliographic data sets and vocabularies proliferating from traditional library sources. We've reached a point where regular libraries don't have to go it alone to be on the Semantic Web. There is a quickly growing pool of things we can actually link to, and everyone's existing data can be immediately enriched by participating.
    This is a quick and dirty road to getting your catalog onto the Linked Data web. The talk will take you from start to finish, using Free Software tools to establish a namespace, put up a SPARQL endpoint, make a simple data model, convert MARC records to RDF, and link the results to major existing data sets (skipping conveniently over pesky processing time). A small amount of "why linked data?" content will be covered, but the primary goal is to leave you able to reproduce the process and start linking your catalog into the web of data. Appropriate documentation will be on the web.
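
    As a rough illustration of the "convert MARC records to RDF" step named above, here is a minimal Python sketch. It is not the speaker's toolchain: the record layout (a plain dict keyed by MARC tag and subfield code), the `FIELD_MAP` table, the `record_to_ntriples` helper, and the `http://example.org/catalog/` namespace are all assumptions made up for this example. Only the Dublin Core Terms property URIs are real.

```python
# Hypothetical mapping from (MARC tag, subfield code) to an RDF predicate.
DCTERMS = "http://purl.org/dc/terms/"
FIELD_MAP = {
    ("245", "a"): DCTERMS + "title",
    ("100", "a"): DCTERMS + "creator",
}


def literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_ntriples(record, base="http://example.org/catalog/"):
    """Turn one simplified MARC-like record into a list of N-Triples lines.

    `base` stands in for whatever namespace a library would establish.
    """
    subject = f"<{base}{record['id']}>"
    lines = []
    for (tag, code), predicate in FIELD_MAP.items():
        for value in record.get("fields", {}).get(tag, {}).get(code, []):
            lines.append(f"{subject} <{predicate}> {literal(value)} .")
    return lines


# Assumed record shape: tag -> subfield code -> list of values.
record = {
    "id": "b1001",
    "fields": {
        "245": {"a": ["Moby Dick"]},
        "100": {"a": ["Melville, Herman"]},
    },
}

for line in record_to_ntriples(record):
    print(line)
```

    The resulting lines could be loaded into a triple store behind a SPARQL endpoint; linking the subjects to external data sets would be a separate step that this sketch does not attempt.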

    At 11:00am to 11:20am, Tuesday 7th February

  • HTML5 Microdata and Schema.org

    by Jason Ronallo

    Jason Ronallo, North Carolina State University Libraries, jason_ronallo@ncsu.edu

    When the big search engines announced support for HTML5 microdata and the schema.org vocabularies, the balance of power for semantic markup in HTML shifted.

    What is microdata?
    Where does microdata fit with regards to other approaches like RDFa and microformats?
    Where do libraries stand in the worldview of Schema.org and what can they do about it?
    How can implementing microdata and schema.org optimize your sites for search engines?
    What tools are available?

    At 11:20am to 11:40am, Tuesday 7th February

  • ALL TEH METADATAS! or How we use RDF to keep all of the digital object metadata formats thrown at us

    by Declan Fleming

    Declan Fleming, University of California, San Diego, dfleming AT ucsd DING edu

    What's the right metadata standard to use for a digital repository? There isn't just one standard that fits documents, videos, newspapers, audio files, local data, etc. And there is no standard to rule them all. So what do you do? At UC San Diego Libraries, we went down a conceptual level and attempted to hold every piece of metadata and give each holding place some context, hopefully in a common namespace. RDF has proven to be the ideal solution, and allows us to work with MODS, PREMIS, MIX, and just about anything else we've tried. It also opens up the potential for data re-use and authority control as other metadata owners start thinking about and expressing their data in the same way. I'll talk about our workflow, which takes metadata from a stew of various sources (CSV dumps, spreadsheet data of varying richness, MARC data, and MODS data), has our Metadata Specialists create an assembly plan and normalize it into METS, and then ingests it into our digital asset management system. The result is a beautiful graph of RDF triples, with metadata poised to be expressed as HTML, RSS, METS, and XML, that opens linked data possibilities we are just starting to explore.

    At 11:40am to 12:00pm, Tuesday 7th February

  • HathiTrust Large Scale Search: Scalability meets Usability

    by Tom Burton-West

    Tom Burton-West, DLPS, University of Michigan Library, tburtonw AT umich edu

    HathiTrust Large-Scale search provides full-text search services over nearly 10 million full-text books using Solr for the back-end. Our index is around 5-6 TB in size and each shard contains over 3 billion unique terms due to content in over 400 languages and dirty OCR.

    Searching the full-text of 10 million books often results in very large result sets. By conference time a number of features designed to help users narrow down large result sets and to do exploratory searching will either be in production or in preparation for release. There are often trade-offs between implementing desirable user features and keeping response time reasonable in addition to the traditional search trade-offs of precision versus recall.

    We will discuss various scalability and usability issues including:

    Trade-offs between desirable user features and keeping response time reasonable and scalable
    Our solution to providing the ability to search within the 10 million books and also search within each book
    Migrating the personal collection builder application from a separate Solr instance to an app which uses the same back-end as full-text search.
    Design of a scalable multilingual spelling suggester
    Providing advanced search features combining MARC metadata with OCR
    The dismax mm and tie parameters
    Weighting issues and tuning relevance ranking
    Displaying only the most "relevant" facets
    Dirty OCR issues
    CJK tokenizing and other multilingual issues.
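
    For readers unfamiliar with the two dismax parameters named in the list above, the following Python sketch shows the arithmetic behind them. It is an illustration only, not Solr code and not HathiTrust's configuration. `tie` controls how much the lower-scoring fields contribute on top of the best-scoring field, and `mm` ("minimum should match") sets how many optional query clauses a document must match. The function names are made up for this example, and the `mm` handling is deliberately simplified to a positive integer or a positive percentage; Solr itself accepts more forms.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores for one query clause, disjunction-max style.

    The best field score counts in full; every other field score is
    scaled by `tie` (0.0 = pure max, 1.0 = plain sum).
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def min_should_match(optional_clauses, mm):
    """Simplified `mm`: how many optional clauses must match.

    Only two forms are handled here (an assumption of this sketch):
    a positive integer, or a positive percentage such as "75%",
    which is rounded down.
    """
    if isinstance(mm, str) and mm.endswith("%"):
        return optional_clauses * int(mm[:-1]) // 100
    return min(int(mm), optional_clauses)


# A term that scores 2.0 in the OCR field, 1.0 in title, 0.5 in author.
scores = [2.0, 1.0, 0.5]
print(dismax_score(scores, tie=0.0))
print(dismax_score(scores, tie=1.0))
print(min_should_match(4, "75%"))
```

    With `tie=0.0` only the best field counts, while `tie=1.0` sums all fields. Values in between are a trade-off between the two.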

    At 1:00pm to 1:20pm, Tuesday 7th February

  • Relevance Ranking in the Scholarly Domain

    by Tamar Sadeh

    Tamar Sadeh, PhD, Ex Libris Group, tamar.sadeh@exlibrisgroup.com

    The greatest challenge for discovery systems is how to provide users with the most relevant search results, given the immense landscape of available content. In a manner that is similar to human interaction between two parties, in which each person adjusts to the other in tone, language, and subject matter, discovery systems would ideally be sophisticated and flexible enough to adjust their algorithms to individual users and each user’s information needs.

    When evaluating the relevance of an item to a specific user in a specific context, relevance-ranking algorithms need to take into account, in addition to the degree to which the item matches the query, information that is not embodied in the item itself. Such information, which includes the item’s scholarly value, the type of search that the user is conducting (e.g., an exploratory search or a known-item search), and other factors, enables a discovery system to fulfill user expectations that have been shaped by experience with Web search engines.

    The session will focus on the challenges of developing and evaluating relevance-ranking algorithms for the scholarly domain. Examples will be drawn mainly from the relevance-ranking technology deployed by the Ex Libris Primo discovery solution.

    At 1:20pm to 1:40pm, Tuesday 7th February

  • Kill the search button II - the handheld devices are coming

    by Michael Poltorak

    Jørn Thøgersen, Statsbiblioteket/State and University Library, Aarhus, Denmark. jt@statsbiblioteket.dk
    Michael Poltorak Nielsen, Statsbiblioteket/State and University Library, Aarhus, Denmark. mn@statsbiblioteket.dk, (aka the Danes - some of them).

    Web based library search engines are traditionally operated using keys, input fields, buttons, and links. Being equipped with touch screens, accelerometers, GPS's, and cameras, smartphones and tablets offer a whole new range of input options.
    In this talk we'll demonstrate some of our ideas of how to utilise these new input options interacting with a search engine. The basic idea is to have no traditional GUI input elements, but only use touch interactions (pinch, zoom, swipe, long-press, etc) and gestures (shake, tilt, turn, etc.). Using these interactions, we’ll demonstrate how to:

    do searches
    toggle search result views
    switch pages
    request materials, add to favourites
    interact with your stuff, renew items

    We'll also show you some (conceptual) ideas about using the device camera for locating and checking out materials.

    On a general level, what we are trying to achieve is a move away from a web based paradigm and establish new ways of interaction better suited to the new devices and on their own terms. The demonstration will feature working mobile prototypes including both native apps (iPhone) and web apps. In both cases they will run on live data from our OPAC on www.statsbiblioteket.dk/search/

    This talk is also a continuation of our Code4Lib 2010 talk called "Kill The Search Button" (http://code4lib.org/conference/2...), which we unfortunately never got to give, due to a Danish blizzard.

    At 1:40pm to 2:00pm, Tuesday 7th February

  • Design for Developers

    by Lisa Kurt

    Lisa Kurt, University of Nevada, Reno, lkurt@unr.edu

    Users expect good design. This talk will delve into what makes really great design, what to look for, and how to do it. Learn the principles of great design to take your applications, user interfaces, and projects to a higher level. With years of experience in graphic design and illustration, Lisa will discuss design principles, trends, process, tools, and development. Design examples will be from her own projects as well as a variety from industry. You’ll walk away with design knowledge that you can apply immediately to a variety of applications and a number of top notch go-to resources to get you up and running.

    At 2:00pm to 2:20pm, Tuesday 7th February

  • The Golden Road (To Unlimited Devotion): Building a Socially Constructed Archive of Grateful Dead Artifacts

    by Robin Chandler, Susan Chesley Perry and Kevin S. Clarke

    Robin Chandler, University of California (Santa Cruz), chandler [at] ucsc [dot] edu
    Susan Chesley Perry, University of California (Santa Cruz), chesley [at] ucsc [dot] edu
    Kevin S. Clarke, University of California (Santa Cruz), ksclarke [at] ucsc [dot] edu

    The Grateful Dead Archive at the University of California (Santa Cruz) is a collection of over 600 linear feet of material, including: business records, photographs, posters, fan envelopes, tickets, video, audio (oral histories, interviews and music) and 3-d objects such as stage props and band merchandise. In addition, with the release of the Grateful Dead Archive Online website in 2012, the Archive will start actively collecting artifacts from an enthusiastic community of Grateful Dead fans.

    This talk will discuss the challenges of merging a traditional archive with a socially constructed one. We will also present the first round of development and explain how we're using tools like Omeka, ContentDM, UC3 Merritt, djatoka, Kaltura, Google Maps, and Solr to lay the foundation for a robust and engaging site. Future directions, like the integration/development of better curation tools and what we hope to learn from opening the archive to contributions from a large community of fans, will also be discussed.

    At 2:20pm to 2:40pm, Tuesday 7th February

Wednesday 8th February 2012

  • Discovering Digital Library User Behavior with Google Analytics

    by Kirk Hess

    Kirk Hess, Digital Humanities Specialist, University of Illinois Urbana-Champaign, kirkhess@illinois.edu

    Digital library administrators are frequently asked questions like "How many times was that document downloaded?" or "What’s the most popular book in our collection?" Conventional web logging software, such as AWStats, can only answer those questions some of the time, and there’s always the question of whether or not the data is polluted by non-users, such as spiders and crawlers. Google Analytics (http://google.com/analytics/), a JavaScript-based solution that excludes most crawlers and bots, shows how users found your site and how they explored it.

    The presentation will review tracking search queries, adding events such as clicking external links or downloading files, and custom variables, to track user behavior that is normally difficult to track. We'll also discuss using jQuery scripts to add tracking code to the page without having to modify the underlying web application. Once you've collected data, you may use the Google Analytics API to extract data and integrate it with data from your digital repository to show granular data about individual items in your Digital Library. Finally, we'll discuss how this information allows you to improve the user experience, and summarize some of the research we are doing with our digital repository and the data gathered from Google Analytics.

    At 9:15am to 9:35am, Wednesday 8th February

  • How people search the library from a single search box

    by Cory Lown

    Cory Lown, North Carolina State University Libraries, cory_lown@ncsu.edu

    Searching the library is complex. There's the catalog, article databases, journal title and database title look-ups, the library website, finding aids, knowledge bases, etc. How would users search if they could get to all of these resources from a single search box? I'll share what we've learned about single search at NCSU Libraries by tracking use of QuickSearch (http://www.lib.ncsu.edu/search/i...), our home-grown unified search application. As part of this talk I will suggest low-cost ways to collect real world use data that can be applied to improve search. I will try to convince you that data collection must be carefully planned and designed to be an effective tool to help you understand what your users are telling you through their behavior. I will talk about how the fragmented library resource environment challenges us to provide useful and understandable search environments. Finally, I will share findings from analyzing millions of user transactions about how people search the library from a production single search box at a large university library.

    At 9:35am to 9:55am, Wednesday 8th February

  • Building research applications with Mendeley

    by William Gunn

    William Gunn, Mendeley william.gunn@mendeley.com (@mrgunn)

    This is partly a tool talk and partly a big idea one.

    Mendeley has built the world's largest open database of research and we've now begun to collect some interesting social metadata around the document metadata. I would like to share with the Code4Lib attendees information about using this resource to do things within your application that have previously been impossible for the library community, or in some cases impossible without expensive database subscriptions. One thing that's now possible is to augment catalog search by surfacing information about content usage, allowing people to not only find things matching a query, but popular things or things read by their colleagues. In addition to augmenting search, you can also use this information to augment discovery. Imagine an online exhibit of artifacts from a newly discovered dig not just linking to papers which discuss the artifact, but linking to really good interesting papers about the place and the people who made the artifacts. So the big idea is, "How will looking at the literature from a broader perspective than simple citation analysis change how research is done and communicated? How can we build tools that make this process easier and faster?" I can show some examples of applications that have been built using the Mendeley and PLoS APIs to begin to address this question, and I can also present results from Mendeley's developer challenge which shows what kinds of applications researchers are looking for, what kind of applications people are building, and illustrates some interesting places where the two don't overlap.

    At 9:55am to 10:15am, Wednesday 8th February

  • Stack View: A Library Browsing Tool

    by Annie Cain

    Annie Cain, Harvard Library Innovation Lab, acain@law.harvard.edu

    In an effort to recreate and build upon the traditional method of browsing a physical library, we used catalog data, including dimensions and page count, to create a virtual shelf.

    This CSS and JavaScript backed visualization allows items to sit on any number of different shelves, really taking advantage of its digital nature. See how we built Stack View on top of our data and learn how you can create shelves of your own using our open source code.
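
    To make the idea of sizing shelf items from catalog data concrete, here is a small Python sketch. It is not Stack View's actual code: the function name, the pixel scaling factors, and the clamping limits are all assumptions chosen for illustration. It maps a page count to a spine thickness and a physical height to an on-screen length.

```python
def shelf_item_size(pages, height_cm, min_px=10, max_px=60):
    """Derive display dimensions for one book from catalog measurements.

    Assumed scaling (hypothetical): 10 pages per pixel of thickness,
    clamped so very thin or very thick books stay readable, and
    10 pixels per centimetre of physical height.
    """
    thickness = max(min_px, min(max_px, round(pages / 10)))
    length = round(height_cm * 10)
    return {"thickness_px": thickness, "length_px": length}


print(shelf_item_size(pages=300, height_cm=24))
print(shelf_item_size(pages=20, height_cm=18))
```

    A front end could translate these numbers into CSS width and height for each item on a virtual shelf.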

    At 10:35am to 10:55am, Wednesday 8th February

  • NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis

    by Jeremy Nelson

    Jeremy Nelson, Colorado College, jeremy.nelson@coloradocollege.edu

    In October, the Library of Congress issued a news release, "A Bibliographic Framework for the Digital Age", outlining a list of requirements for a New Bibliographic Framework Environment. Responding to this challenge, this talk will demonstrate a Redis (http://redis.io) FRBR datastore proof-of-concept that, with a lightweight Python-based interface, can meet these requirements.

    Because FRBR is an Entity-Relationship model, it is easily implemented as key-value within the primitive data structures provided by Redis. Redis' flexibility makes it easy to associate arbitrary metadata and vocabularies, like MARC, METS, VRA or MODS, with FRBR entities and inter-operate with legacy and emerging standards and practices like RDA Vocabularies and LinkedData.
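
    One way to picture an entity-relationship model laid out as key-value data is sketched below in Python. This is not the speaker's datastore: the key naming scheme (`frbr:work:1`), the helper functions, and the `TinyStore` class are assumptions for illustration. `TinyStore` is an in-memory stand-in so the example runs without a server; its method names mirror the Redis hash and set commands HSET, HGETALL, SADD and SMEMBERS.

```python
from collections import defaultdict


class TinyStore:
    """In-memory stand-in for a few Redis hash and set commands."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, key, field, value):
        self.hashes[key][field] = value

    def hgetall(self, key):
        return dict(self.hashes[key])

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets[key])


def add_entity(store, kind, ident, **attrs):
    """Store one FRBR entity as a hash under a hypothetical key scheme."""
    key = f"frbr:{kind}:{ident}"
    for field, value in attrs.items():
        store.hset(key, field, value)
    return key


def relate(store, parent_key, relation, child_key):
    """Record a relationship as a set of child keys hanging off the parent."""
    store.sadd(f"{parent_key}:{relation}", child_key)


store = TinyStore()
work = add_entity(store, "work", 1, title="Moby Dick")
expression = add_entity(store, "expression", 1, language="eng")
relate(store, work, "expressions", expression)

print(store.hgetall(work))
print(store.smembers(f"{work}:expressions"))
```

    Attributes live in a hash per entity, and each relationship is a set of keys, so additional vocabularies can be attached by adding fields or further sets without changing a schema.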

    At 10:55am to 11:15am, Wednesday 8th February

  • Ask Anything!

    by Carmen Mitchell

    Carmen Mitchell, carmenmitchell at gmail (@carmendarlene)
    code4lib 2012, Wednesday, February 8 2012, 11:15-12:00

    a.k.a. "Human Search Engine". A chance for you to ask a roomful of code4libbers anything that's on your mind: questions seeking answers (short or long), requests for things (hardware, software, skills, or help), or offers of things. We'll keep the pace fast, and the answers faster. Come with questions and line up at the start of the session and we'll go through as many as we can; sometimes we'll stop at finding the right person or people to answer a query and it'll be up to you to find each other after the session. Third time at code4libcon! (Thanks to Ka-Ping Yee and Dan Chudnov for the inspiration/explanation, reused here in part.)

    At 11:15am to 12:00pm, Wednesday 8th February

  • Indexing big data with Tika, Solr & map-reduce

    by Scott Fisher and Erik Hetzner

    Scott Fisher, California Digital Library, scott.fisher AT ucop BORK edu
    Erik Hetzner, California Digital Library, erik.hetzner AT ucop BORK edu

    The Web Archiving Service at the California Digital Library has crawled a large amount of data, in every format found on the web: 30 TB, comprising about 600 million fetched URLs. In this talk we will discuss how we parsed this data using Tika and map-reduce, and how we indexed this data with Solr, tweaked the relevance ranking, and were able to provide our users with a better search experience.

    At 1:00pm to 1:20pm, Wednesday 8th February

  • In-browser data storage and me

    by Jason Casden

    Jason Casden, North Carolina State University Libraries, jason_casden@ncsu.edu

    When it comes to storing data in web browsers on a semi-persistent basis, there are several partially-adopted, semi-deprecated, product-specific, or even universally accepted options. These include models such as key-value stores, relational databases, and object stores. I will present some of these options and discuss possible applications of these technologies in library services. In addition to quoting heavily from Mark Pilgrim's excellent chapter on this topic, I will weave in my own experience utilizing in-browser data storage in an iPad-based data collection tool to successfully improve performance and data stability while reducing network dependence. See also: HTML5.

    At 1:20pm to 1:40pm, Wednesday 8th February

  • Lies, Damned Lies, and Lines of Code Per Day

    by James Stuart

    James Stuart, Columbia University, james.stuart@columbia.edu

    We've all heard about that one study that showed that Pair Programming was 20% more efficient than working alone. Or maybe you saw on a blog that study that showed that programmers who write fewer lines of code per day are more efficient...or was it less efficient? And of course, we all know that programmers who work in (Ruby|Python|Java|C|Erlang) have been shown to be more efficient.

    A quick examination of some of the research surrounding programming efficiency and methodology, with a focus on personal productivity, and how to incorporate the more believable research into your own team's workflow.

    At 1:40pm to 2:00pm, Wednesday 8th February

  • Practical Agile: What's Working for Stanford, Blacklight, and Hydra

    by Naomi Dushay

    Naomi Dushay, Stanford University Libraries, ndushay@stanford.edu
    code4lib 2012, Wednesday, February 8 2012, 14:00-14:20

    Agile development techniques can be difficult to adopt in the context of library software development. Maybe your shop has only one or two developers, or you always have too many simultaneous projects. Maybe your new projects can’t be started until 27 librarians reach consensus on the specifications.

    This talk will present successful Agile- and Silicon-Valley-inspired practices we’ve adopted at Stanford and/or in the Blacklight and Hydra projects. We’ve targeted developer happiness as well as improved productivity with our recent changes. User stories, dead week, sight lines … it’ll be a grab bag of goodies to bring back to your institution, including some ideas on how to adopt these practices without overt management buy in.

    At 2:00pm to 2:20pm, Wednesday 8th February

Thursday 9th February 2012

  • Your UI can make or break the application (to the user, anyway)

    by Robin Schaaf

    Robin Schaaf, University of Notre Dame, schaaf.4@nd.edu

    UI development is hard and too often ends up as an after-thought to computer programmers - if you were a CS major in college I'll bet you didn't have many, if any, design courses. I'll talk about how to involve the users upfront with design and some common pitfalls of this approach. I'll also make a case for why you should do the screen design before a single line of code is written. And I'll throw in some ideas for increasing usability and attractiveness of your web applications. I'd like to make a case study of the UI development of our open source ERMS.

    At 11:00am to 11:20am, Thursday 9th February

  • Quick and Dirty Clean Usability: Rapid Prototyping with Bootstrap

    Shaun Ellis, Princeton University Libraries, shaune@princeton.edu

    "The code itself is unimportant; a project is only as useful as people actually find it." - Linus Torvalds [1]

    Usability has been a buzzword for some time now, but what is the process for making the transition toward a better user experience, and hence, better designed library sites? I will discuss one facet of the process my team is using to redesign the Finding Aids site for Princeton University Libraries (still in development). The approach involves the use of rapid prototyping, with Bootstrap [2], to make sure we are on track with what users and stakeholders expect up front, and throughout the development process.

    Because Bootstrap allows for early and iterative user feedback, it is more effective than the historic Photoshop mockups/wireframe technique. The Photoshop approach allows stakeholders to test the look, but not the feel -- and often leaves developers scratching their heads. Being a CSS/HTML/JavaScript grid-based framework, Bootstrap makes it easy for anyone with a bit of HTML/CSS chops to quickly build slick, interactive prototypes right in the browser -- tangible solutions which can be shared, evaluated, revised, and followed by all stakeholders (see Minimum Viable Products [3]). Efficiency is multiplied because the customized prototypes can flow directly into production use, as is the goal with iterative development approaches, such as the Agile methodology.

    While Bootstrap is not the only framework that offers grid-based layout, development is expedited and usability is enhanced by Bootstrap's use of "prefabbed" conventional UI patterns, clean typography, and lean JavaScript for interactivity. Furthermore, out-of-the-box Bootstrap comes in a fairly neutral palette, so focus remains on usability, and does not devolve into premature discussions of color or branding choices. Finally, Less can be a powerful tool in conjunction with Bootstrap, but is not necessary. I will discuss the pros and cons, and offer examples of how to get up and running with or without Less.

    At 11:20am to 11:40am, Thursday 9th February

  • Search Engine Relevancy Tuning - A Static Rank Framework for Solr/Lucene

    Mike Schultz, formerly Summon Search Architect, mike.schultz@gmail.com

    Solr/Lucene provides a lot of flexibility for adjusting relevancy scoring and improving search results. Roughly speaking there are two areas of concern: Firstly, a 'dynamic rank' calculation that is a function of the user query and document text fields. And secondly, a 'static rank' which is independent of the query and generally is a function of non-text document metadata. In this talk I will outline an easily understood, hand-tunable static rank system with a minimal number of parameters.

    The obvious major feature of a search engine is to return results relevant to a user query. Perhaps less obvious is the huge role query independent document features play in achieving that. Google's PageRank is an example of a static ranking of web pages based on links and other secret sauce. In the Summon service, our 800 million documents have features like publication date, document type, citation count and Boolean features like the-article-is-peer-reviewed. These fields aren't textual and remain 'static' from query to query, but need to influence a document's relevancy score. In our search results, with all query related features being equal, we'd rather have more recent documents above older ones, Journals above Newspapers, and articles that are peer reviewed above those that are not. The static rank system I will describe achieves this and has the following features:

    Query-time only calculation - nothing is baked into the index - with parameters adjustable at query time
    The system is based on a signal metaphor where components are 'wired' together. System components allow multiplexing, amplifying, summing, tunable band-pass filtering, string-to-value-mapping all with a bare minimum of parameters.
    An intuitive approach for mixing dynamic and static rank that is more effective than simple adding or multiplying.
    A way of equating disparate static metadata types that leads to understandable results ordering.

    At 11:40am to 12:00pm, Thursday 9th February
