Code4Lib2010 Notes from Day 1, morning

#1 keynoter: Cathy Marshall:

A web page weighs 80 micrograms?

15 years ago (New Yorker article): well, anyone can have a web page, although most of them are commercial enterprises.  All you need is to digitize your stuff.  Born-digital items (pictures), even in TIFF, were low resolution.

Today 4.3 billion personal photos on Flickr alone.

Cathy Marshall:  guilty of “feral ethnography” (studying what others do)

Benign neglect:  de facto archiving/collection policy?  Is Facebook millennials’ scrapbook?  Should it be?  Whose job is it to “remember”?

Digital stuff accumulates at a faster rate than physical space (out of sight & easier to “neglect”?)  CS view:  how is this a problem?  Storage is getting cheaper, why not just keep everything?

Why keep everything? It is difficult to predict an item’s future worth (true for physical items as well); deleting is hard, thankless work; filtering and searching make it easy to locate things. (Easier to keep than to cull.)

Losing stuff becomes a means of culling collections (realizing afterwards that losing the stuff wasn’t such a big deal after all).

Attitude that “I don’t really need this stuff, but it’s nice to be able to keep it around.”

It’s easier to lose things than to maintain them.  The more something is used, the more likely it is to be preserved.

No single preservation technology will win the battle for saving your stuff, and people put/save stuff all over the place (Flickr, YouTube, blogs, Facebook...)

People lose stuff for non-technology reasons (they don’t understand EULAs, lose passwords, die, etc.)  For scholars, the key vulnerability is changing organizations (more so than technology failures).

Digital originals vs. reference copies:  highest fidelity (e.g., photos) is the one closest to the source (local computer).  Remote service, e.g., flickr, has the metadata, though, which becomes important for locating, filtering (save everything).  Where are the tools for gathering the metadata and finding the copies with the metadata?

Searching for tomorrow: Re-encountering techniques?  But some encounters are with stuff that was supposed to disappear.

Bottom-up efforts taking place (personal & small orgs); new institutions showing up to digitize collections.  New opportunistic uses of massed data.

Power of benign neglect vs. power of forgetting: some things you want to make sure are gone.  How sure are you that you really want to forget (data)?

Cloud computing talks:

Cloud4lib.

Coolest glue in the galaxy?  Is it even possible to have a centralized repository of development activities, especially among disparate libraries?

What is the base level, policy, that needs to exist in order to make it all work? (what exactly is the glue)  What is sticky enough to create critical mass?  Base services:  repository, metadata

Breakout session: brainstorming to figure out oversight & governance.  I see “possible” but not “probable” here.

Linked Library Data Cloud:

From Tim Berners-Lee:  bottom line:  make stuff discoverable

Concept of triples in RDF:  Subject, Predicate, Object.
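A minimal sketch of the triple idea, not anything shown in the talk: each statement is a (subject, predicate, object) tuple, and an object that is itself a URI can be followed to more statements. The example.org URIs, the sample values, and the match() helper are made up for illustration; the Dublin Core and FOAF predicate URIs are real vocabularies.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: a book, its title, its creator, and the creator's name.
triples = [
    Triple("http://example.org/book/1", DCTERMS + "title", "An Example Title"),
    Triple("http://example.org/book/1", DCTERMS + "creator", "http://example.org/person/42"),
    Triple("http://example.org/person/42", FOAF + "name", "A. N. Author"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Follow a link: find the book's creator URI, then look up that URI's name.
creator_uri = match(triples, subject="http://example.org/book/1",
                    predicate=DCTERMS + "creator")[0].obj
creator_name = match(triples, subject=creator_uri, predicate=FOAF + "name")[0].obj
print(creator_name)
```

The "linking" is nothing more than the object of one statement showing up as the subject of another.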

Linking by assertion, using a central index (e.g. id.loc.gov), which is itself linked data.  But how to make bib data RDF:  LCCN.  Resources (linked, verified data as URIs).

If you have an LCCN in your MARC record, you already have what you need for Linked Data.  If you know the LCCN, you can grab all the linked stuff and make it part of your data.
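A rough sketch of turning an LCCN into a link, assuming the LCCN string has already been pulled out of the MARC record (field 010, subfield a). It follows the Library of Congress normalization steps (strip blanks, drop any slash suffix, zero-pad the serial after a hyphen to six digits) and builds an LCCN Permalink URL on lccn.loc.gov. Using the permalink service rather than id.loc.gov is my assumption, not something from the talk, and the function names and sample values are only illustrative.

```python
import re


def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN as found in MARC 010 $a."""
    # Remove all blanks.
    lccn = re.sub(r"\s+", "", raw)
    # Drop a forward slash and everything to its right (revision suffixes).
    lccn = lccn.split("/", 1)[0]
    # Remove a hyphen, left-padding the serial part to six digits.
    if "-" in lccn:
        prefix, serial = lccn.split("-", 1)
        lccn = prefix + serial.zfill(6)
    return lccn


def lccn_permalink(raw: str) -> str:
    """Build an LCCN Permalink URL from a raw LCCN string."""
    return "https://lccn.loc.gov/" + normalize_lccn(raw)


for sample in ["85-2", " 79139101 /AC/r932", "n78-89035"]:
    print(repr(sample), "->", lccn_permalink(sample))
```

Nothing here fetches anything; it only shows that the identifier already sitting in the record is enough to form a stable URI to link against.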

OCLC’s Virtual International Authority File (VIAF):  another source for linked data.

No standard data model, but more linkages to resources are outside the library domain:  how to get them? And what about sustainability and preservation?  If a linked resource goes away, what do you do?

Do it Yourself Cloud Computing (with R and Apache)

How to use data and data analysis in libraries.

1.  What is The Cloud?  Replacement for the desktop: globally accessible

2. What is R? Free and open source software for data analysis and visualization, in the same space as SAS, SPSS, Stata.  It has a console!  Eventually a GUI port? Cons:  learning curve, problems with very large datasets.  Pros:  de facto standard, huge user community, extensible.

3.  What is rApache?  R + Apache: an Apache module developed at Vanderbilt.  It puts an instance of R in each Apache process, similar to PHP.  You can embed R scripts in web pages.

4.  Relevance to libraries – keep the slide! keep the slide!

Public Datasets in the Cloud

IaaS (Infrastructure as a service)

PaaS (Platform as a service)

SaaS (Software as a service)

In this case: raw data that can be used elsewhere, not what can be downloaded from a web site.

Demo:  Public datasets in the cloud (in this case, from Amazon EC2): get data from/onto remote site, retrieve via ssh

Using Google Fusion Tables:  you can comment, slice, dice.  You can embed the data from Google Fusion Tables in a web site.

7+ Ways to improve library UIs with OCLC web services

“7 really easy ways to improve your web services”

Crosslisting print and electronic materials: use WorldCat Search API to check/add link to main record

Link to libraries nearby that have a book that is out (mashup query by oclc number and zip code)

Providing journal TOCs: use xISSN to see if a feed of recent article TOCs is available, then embed a link in the catalog UI that opens a dialog with the feed items.

Peer review indicators:  use data from xISSN to add peer review info to (appropriate) screens

Providing info about the author: use Identities and Wikipedia API to insert author info into a dialog box within UI

Providing links to free full text:  use xOCLCNum to check for free full text scanning projects like Open Content Alliance and HathiTrust and link to full text where available.

Add similar items? (without the current one also listed)

Creating an m-catalog: put all our holdings in WorldCat and build a mobile site using the WorldCat Search API

3 Comments

  1. Laura says:

    Careful with your acronyms if you want to discuss the Linked Data Library Cloud with catalogers. I suspect by LCCN you mean Library of Congress Classification number but catalogers will read that as Library of Congress Control Number. Could you clarify what you were referring to? BTW, catalogers use LCC for LC classification, LCCN for LC control number (the unique identifier for LC bib and authority records within their ILS).

    Also – OCLC VIAF = Virtual International Authority File
    http://www.oclc.org/research/activities/viaf/default.htm

    Although given OCLC’s history with changing the meaning of their very own acronym from Ohio to Online etc. , it would surprise me if VIAF was changing its name too. 🙂

    I’m not a nitpicky person, but catalogers are. Using their terms will help techies help them to get on board with Linked Data.

  2. clbean says:

    Thanks, Laura.

    I was taking notes pretty fast, and edited them later. I should have caught that. In this case LCCN is the Library of Congress Control Number.

    Carol