Archive for the ‘Libraries’ Category.

Social Media ROI for libraries

We have all been indoctrinated in the importance of incorporating social media into our libraries’ outreach and marketing strategies, to the point that one almost has to apologize if one’s library isn’t on social media. I am wondering, however: where is the evidence? How do we know that social media has any effect on our institutional bottom line, other than from social media supporters connecting dots (e.g., surveys indicating social media use, and references to other supporters) and saying, “of course it does!”? Where’s the data?

I am still exploring, so help me out if you actually have the evidence documented somewhere (e.g., in an article or data set). But mind you, we are talking about libraries and library-like institutions here, not commercial operations for which the conversion metrics of Google Analytics (or other tools) work nicely. Hopefully we already know that libraries don’t fit that commercial model so neatly. Also, considering the sources used to justify the swooning over social media, the share of a national or global population that uses social media does not necessarily translate to a library’s base (i.e., the community that pays for and uses it).

Focusing just on the U.S., because I work in a region within the U.S., I did find an interesting data set about what libraries are doing with social media and how they are handling it (with a shoutout to the authors/librarians involved, who evidently believe in open data and data sharing!). It’s an appropriately complex set of data, but a quick scan through the survey, especially the write-in responses, indicates that social media integration is pretty hodgepodge, as if validating a feeling of distrust (as in, “what is this really going to do for us?”).

Doing a cursory literature search (limited to the last few years, because this is such a quickly changing landscape), I came across one helpful article, Marketing Finding Aids on Social Media: What Worked and What Didn’t Work, in which a research team selected ten social media sites to promote content, used email lists as well, and tracked and evaluated the click-throughs using Google Analytics. Although it presumes that social media marketing is needed (and I don’t dispute that position), the authors actually have the data to demonstrate (1) its effectiveness, and (2) which sites give the best results. Why can’t we have more like this?
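For anyone who wants to try this kind of tracking, one common approach (not necessarily the one that article’s team used) is to tag each shared link with Google Analytics campaign parameters (`utm_source`, `utm_medium`, `utm_campaign`), so that click-throughs from each site show up separately in the reports. Here is a minimal Python sketch; the finding-aid URL, channel names, and campaign name are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(base_url: str, source: str, campaign: str, medium: str = "social") -> str:
    """Return base_url with Google Analytics campaign parameters appended,
    keeping any query parameters the URL already has."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query.update({"utm_source": source, "utm_medium": medium, "utm_campaign": campaign})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical finding aid and channels, for illustration only.
FINDING_AID = "https://archives.example.org/findingaids/smith-papers"
CHANNELS = [("facebook", "social"), ("twitter", "social"), ("listserv", "email")]

if __name__ == "__main__":
    for source, medium in CHANNELS:
        # One distinct link per channel, so each channel's clicks can be counted separately.
        print(tag_link(FINDING_AID, source, "finding-aids-2012", medium))
```

Clicks from each tagged link are then reported under their own source and medium, which gives the kind of per-site comparison that article describes.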

Why can’t we have fewer “jump on the bandwagon” programs and courses and classes, and more instruction on assessment of need and measurement of impact? How about classes that teach what data can be gathered, how to gather it, and how to use it? Because maybe we should distrust social media’s usefulness. What is it going to do for us? Is there really a social media ROI for library-type institutions?

Automation and Small Libraries, and CornerThing

The situation hasn’t really changed in the world of library automation since last year’s post. Libraries find what works for them, given their economic and human resources. What is different is a new tool, developed with some virtual interns. I call it CornerThing, because I’m not very creative with names. 🙂

I’ve got these small libraries (American Corners) where, for some of them, “automation” consists of massive spreadsheets. And LibraryThing. Checkouts are still done by hand on cards. Each month, they compile reports by hand, going through the cards, to send to me or one of my colleagues. It seemed like there must be an app that used LibraryThing to do more than just display a collection. I searched and checked as only a Reference Librarian would. 🙂 Nothing was out there. So how hard could it be to make an app that could capture checkout statistics (the part I was interested in)?

I originally wanted an iPad app, but rather than spend my precious little free time on it myself, I decided to get a couple of interns who were willing to learn some new skills while creating a simple app. It was an interesting experience. I didn’t get an iPad app, because no one applied for that project. Several applied for the Android app project I added almost as an afterthought (why not? More options!). So I got an Android app, now in beta, which the roughly one third of those small libraries that already have Android tablets can use.

CornerThing:  it syncs with a LibraryThing collection, downloading the metadata to the device, into a lightweight searchable database.  Subsequent “syncs” only add changes.  It’s possible to add an item in the app, but the syncing is not two-way.  Then there’s a searchable database for borrowers, entered on the fly, or by uploading a spreadsheet file (via computer connection).  Items from the collection can be checked out to borrowers, with a due date, and checked back in.  When an item is checked out, the data is captured on the item record and preserved.  Once the item is checked back in, the connection between the borrower and item is erased, but the numerical data on checkouts is retained on the item record, so reports can be generated by selected metadata (e.g., author, title, keyword).
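To make the data model concrete, here is a minimal Python sketch of the circulation logic just described. It is not CornerThing’s actual code (the app runs on Android); the class names, field names, and the 14-day loan period are hypothetical. It shows the key design choice: the borrower link lives on the item only while it is checked out, but the checkout count stays on the item record, so reports can still be run by author, title, or keyword.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Item:
    item_id: str
    title: str
    author: str
    tags: list[str] = field(default_factory=list)
    checkout_count: int = 0            # kept permanently, for reports
    borrower_id: Optional[str] = None  # set only while the item is out
    due: Optional[date] = None


class Circulation:
    def __init__(self) -> None:
        self.items: dict[str, Item] = {}
        self.borrowers: dict[str, str] = {}  # borrower_id -> name

    def sync(self, records: list[dict]) -> int:
        """One-way sync from the catalog: add new items and update changed
        metadata, never touching local circulation data. Returns the number
        of items added or changed."""
        changed = 0
        for rec in records:
            tags = list(rec.get("tags", []))
            item = self.items.get(rec["item_id"])
            if item is None:
                self.items[rec["item_id"]] = Item(rec["item_id"], rec["title"], rec["author"], tags)
                changed += 1
            elif (item.title, item.author, item.tags) != (rec["title"], rec["author"], tags):
                item.title, item.author, item.tags = rec["title"], rec["author"], tags
                changed += 1
        return changed

    def add_borrower(self, borrower_id: str, name: str) -> None:
        self.borrowers[borrower_id] = name

    def check_out(self, item_id: str, borrower_id: str,
                  loan_days: int = 14, today: Optional[date] = None) -> date:
        item = self.items[item_id]
        if item.borrower_id is not None:
            raise ValueError(f"{item_id} is already checked out")
        if borrower_id not in self.borrowers:
            raise KeyError(f"unknown borrower {borrower_id}")
        item.borrower_id = borrower_id
        item.due = (today or date.today()) + timedelta(days=loan_days)
        item.checkout_count += 1
        return item.due

    def check_in(self, item_id: str) -> None:
        item = self.items[item_id]
        item.borrower_id = None  # erase the borrower-item link
        item.due = None          # checkout_count is left as is

    def report(self, field_name: str, value: str) -> int:
        """Total checkouts for items whose metadata field matches value
        (list fields such as tags match if they contain value)."""
        total = 0
        for item in self.items.values():
            attr = getattr(item, field_name)
            matches = (value in attr) if isinstance(attr, list) else (attr == value)
            if matches:
                total += item.checkout_count
        return total
```

Because `check_in` clears `borrower_id` and `due`, nothing ties a borrower to past loans, yet `report("author", ...)` still counts every checkout.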

CornerThing: a simple circulation app for small libraries (like American Corners) to take advantage of their LibraryThing collections.  I’m pretty sure it would work for other small libraries with limited resources. 🙂  It’s also open source. If you’re interested, send me a message.

Automation and small libraries – first look

It’s kind of amazing to me that after more than fifteen years in this business, I’m looking at a situation that pretty much hasn’t changed for small libraries looking for an automation system. There wasn’t much available for them at a reasonable cost back then, and there is even less today. Go ahead. Show me where I’m wrong.

Seriously, a small library, a small public library, one that is supported, sometimes begrudgingly, by (too often non-existent) local public funds, does not have a lot to spend on annual fees for a library automation system.  They have even less to spend on a tech person who could install and maintain one of the FLOSS options.  And even if they do, there’s still the recurring cost of a web server to put it online.  O.K., maybe they could tie into the local government’s web server, assuming the local government entity has one (probably a “yes” in the U.S., but not in other parts of the world).

I actually did a complete retrospective conversion at a library years ago.  It was a special library that had the funds to support a quality system.  I was shocked and horrified at the “system” (basically a giant word processing document) that was being used to track titles.  There was no mechanism to track costs.  I have since come to appreciate the efforts of librarians in small libraries doing the best they can with what they have, and with the skill sets they have, to manage their collections.  Hello, World: librarians are amazing, and you should throw money at them, because they do amazing things with your resources.

So, in the history of library systems, the first stop is Marshall Breeding’s excellent record of what was, what is, and who ate up whom. Although this doesn’t cover everything being used out there, it does give an interesting picture of the developing landscape. I was surprised and amazed that the Georgia Public Library Service, back in 2004, chose to build its own automation system (Evergreen) for libraries in the state rather than use one of the existing systems. There was, of course, Koha, out of New Zealand. But that one got mired in nasty disputes over code and trademarks. So far, so good with Evergreen (fingers crossed).

Next stop is Wikipedia’s list of Next-Generation library systems, especially the Comparison chart, but you might want to scan through the brief explanation of what Next-Generation means here.  The list is notable, because it includes both large systems and systems smaller libraries can use.  But note that these are all web-based systems.  Some of them, however, can function on a stand-alone basis, on the desktop.  This is important, because the most basic need of libraries is library management software.  Getting it on the web is secondary to librarians (although not to their patrons, of course).

So let’s take a stab at what some of the possibilities are for small (mostly) underfunded libraries today.  There are two perspectives to consider here:  libraries with staff that are systems-capable, and libraries with limited or no staff capable of managing the backend of a system.

In the first category (libraries with systems-capable staff), we have, first, systems that have a stand-alone option,

and second, systems that are add-ons to content management systems a library may be using, or have access to, for its web site (so far, I’m only seeing Drupal modules; are there any others out there?).

In the second category, it’s pretty bleak without (1) hiring someone, or a service, to set up a system, or (2) a funding stream to pay annual fees.  In the first case I refer back to one of the four above that can be installed on a stand-alone basis.  After installation, training would probably be required as well, despite the documentation.   In the second case, LibraryWorld seems to have a loyal following around the world.  I haven’t had an opportunity to look at it recently, so I don’t really have anything to say about it (yet).  Feel free to add comments below about your experience with it.

But LibraryWorld is a closed system, and if you are looking for something open source, there are

  • PMB Services (uses phpMyBibli)
  • OPALS  (they say they are open source, but I don’t see a download link – the acid test of open source)

There are, of course, larger open source systems, which may work for a consortium of libraries: one of the Koha versions and Evergreen come to mind. Both have companies that will install and customize the system.

Finally, there is LibraryThing, which is oddly ignored by everyone except those who want their collections online for their patrons and have no other way to do that. Granted, it is limited in terms of collection management: checkouts? reports? But it can work, because it is, actually, a next-generation cataloging system. It’s online, it’s searchable, it’s fairly easy to add resources, and the cataloging options are wide open (if that’s how you want to characterize keywords). And even though the number of items that can be entered free of charge is limited (with more space available for a small fee), most small libraries’ collections are pretty small. Best of all, it’s accessible from different devices. All we need are apps that extend the basic functionality offered with a LibraryThing account.

So here I am, looking for viable library management software options for small libraries outside of the U.S., and this is what I’ve come up with. Give me some feedback, library world. Which of the options above are worth a longer look?


Peer Review and Relevancy

The Code4Lib Journal exists to foster community and share information among those interested in the intersection of libraries, technology, and the future.  —Mission Statement, Code4Lib Journal

A colleague on the Code4Lib Journal’s editorial committee has posted a defense of the Journal’s position on peer review, or more specifically, double blind refereeing. I was tempted, several months ago, to address the topic in the opening editorial for Issue 16, but was too preoccupied with food. 🙂 I don’t always see things the way Jonathan does, although I’ve learned over the years that he gets it right a lot more often than I do. In this case, we both agree that the Journal is, in fact, peer reviewed, but not double blind refereed, and we both agree this is a good thing.

Jonathan has, from my perspective, a rather entertaining analogy comparing the process to open source software development. He also argues that the Journal’s purpose is to provide value to our audience, and, just to make sure the elephant in the room is not ignored, stresses in very blunt terms that the Journal is not here to validate librarians’ or library technologists’ work for purposes of tenure or advancement. I’m not going to disagree, but I am going to address the elephant differently.

I understand how “peer review” has been a good thing.  It has provided outside confirmation of the relevancy and quality of scholars’ work.  This was not the reason we started the Code4Lib Journal, however.  We were looking for a way to encourage and facilitate the (relatively) rapid dissemination of helpful information to our audience (which we did not see as being limited to people self-identified as members of  the Code4Lib community).

Because of this goal, we do things a little differently at the Code4Lib Journal.  We don’t require entire drafts be submitted, but we do have a rather tight timeline to publication, so it is obviously a plus when we get a draft up front, or when the author is already working on a draft and can get it to us quickly if the proposal is accepted.  All of us on the editorial committee are library technologists of some kind, although not all the same kind.  We all consider each submission, sometimes engaging in lively and lengthy discussions about a proposal.  In that regard, I would argue that the Journal is not just peer reviewed, it is uber-peer reviewed, because it is not one or two “peers” who are making the call on whether to accept a proposed article, it is typically 7-12 peers who are making the call.

Again, because of the goal to encourage and facilitate rapid dissemination of information, we are committed to working with authors to get their articles ready for publication. This typically takes the form of (electronically) marking up a draft with specific comments and suggestions that we, as peers, think would make the article more useful to our audience. The process sometimes goes through several iterations. It doesn’t always work. We have had some fall off the table, so to speak. But the consistently high-quality issues we have published are a testament to the validity of the process. Double-blind refereeing here would be an inherently inferior process for achieving our mission and goal.

But back to the elephant in the room. Why, today, given the current state of disruptive technology in the publishing industry, are we even talking about refereed journals? I’m waiting for the other shoe to drop regarding that process. Why are libraries still hyper-focused on double blind refereeing and peer review as validating mechanisms in tenure? Isn’t it time to rethink that? After all, libraries have already registered their disgust at being held hostage by journal publishers: universities require that their scholars publish, and their libraries have to pay whatever the publisher demands for a license to see that scholarship.

Double-blind refereeing is time consuming. So is what we do at the Code4Lib Journal. But I would posit that our way is more effective in identifying relevant information and ensuring its quality. How much ROI is there, really, in sticking with the old vetting process for validating tenure candidates? May I suggest letting that other shoe drop and cutting to the core of these questions:

  • Why is tenure needed in your institution?
  • How effective is the current tenuring system in supporting your institution’s mission and goals?
  • What value does scholarly publishing bring to your institution?
  • What are you willing to do to ensure continued scholarly publication?

The Code4Lib Journal was started 5 years ago by a message from Jonathan Rochkind to the Code4Lib mailing list asking, basically,  “who’s in?”  Change can be done.  It just takes someone willing to voice the call.  If publishing is important to tenure, send out a call to your colleagues to start a Code4Lib type journal.  If it looks too scary, ask.  I’m willing to help.  I’m sure there are others out there as well.


Drupal and other distractions

What started out as a 3-4 month hiatus to do an intranet site redesign is now winding down after 6 months. After the first couple of months, when it became apparent no progress was going to be made, I regrouped and put together a different team, motivated but novice. It’s been a little over three months since that team started on the project, and the results are impressive. Although we could go live with it, I decided to do some “beta testing” on our unsuspecting end-users. That has been enlightening: there may be some revisions in store before we finally get this baby to bed.

The term usually associated with Drupal is “steep learning curve.” I was the only one in my organization who even knew what Drupal is, although some seemed to have a vague concept of “content management system.” But I recommended we go with Drupal over other options because of (1) its potential, (2) the growing and active group of libraries using Drupal, and (3) the fact that it’s the CMS I was most familiar with. Looking back, I’d have done some things differently (isn’t that always the case?), but I would still choose Drupal. We haven’t fully taken advantage of all Drupal’s potential, but that’s only because I decided to hold off development of more advanced features until after we completed the initial project.

I was fortunate to have 2 others who were eager to learn and undaunted by Drupal’s complexity. In a little over two months, with 1 1/2 days of one-on-one training and many, many hours of phone conferences with them, they understand Drupal better than I did after two years of playing with it. This is a good thing, since they will likely be the ones left with the task of maintaining the site over the long run. But we needed more than the three of us to migrate the content from the previous site, so I recruited 4 others, 3 of whom were apprehensive about approaching technology at this level. One had a Technical Services background and provided us with the taxonomy structure we needed. Two added content directly into special content types I set up, and one tracked down copyright-free pictures we needed. It was an interesting exercise in project management: finding the team members we needed by dividing the tasks by the skill level required, configuring Drupal to be easier to use for technophobes, and approaching prospects individually to ask for help on a limited scale.

About halfway through the project, as I struggled to get the site to display the same in IE7 and Firefox, I shifted gears and decided to do the layout completely in CSS. Actually, I was shamed into it after a query to the Drupal library group. I finished those changes at just about the same time everything else fell into place. And it works just fine in both browsers, thank you! But we had been designing with the assumption that most end users would be using a set screen size and resolution. This week we discovered those assumptions were way off. The good news is that we have found a lot more real estate to work with. The bad news is that while things aren’t broken, the site doesn’t look quite the way we envisioned.

There may be some more tweaking involved, but there are now two others who have enough experience to do the tweaking.  Life is good.  Now to get back to the digitization project.

I’d be happy to take that off your hands

In the not too distant past, I was manning the reference desk, listening to a man say he had to come to the library to use the computers because his laptop was so badly infested with viruses that he had to throw it away.

“You threw it away?” I asked, incredulously.

“Yeah, it’s worthless now.  I can’t use it.  I’m just going to throw it away.”

Realizing he hadn’t actually thrown it away yet, but was willing to, I glibly asked if he’d throw it my way.  He looked at me incredulously at the same time I realized there was probably some intervening ethics involved.  So I said, “Or, I could show you how to make it usable again so there will never be another virus on it.”

He was still incredulous.  I assured him it can be done.  He wanted to know what he could do for me.  I told him “Never tell anyone about this,” forming a mental image of what would happen if he went out and told all his friends, or worse, wrote to the director about what I’d done for him.

He came back a couple days later, but didn’t have the laptop with him.  I hooked him up with a copy of Keir Thomas’ Beginning Ubuntu Linux, and a newer version of the CD included in the book.  He was still somewhat incredulous.  He left the book, but promised to come back the next day with the laptop.  Unfortunately, I didn’t see him again after that. I’m still wondering whether the original story was true, or if my comments prompted him to find someone to clean up the laptop for him.

I’ve since left that job.  Sometimes I miss the interesting world of public libraries.

Administrators vs. Technology

Somehow this post got lost in the drafts folder.  But since it’s an enduring topic, it’s still current. 🙂

A friend has some advice for library administrators:  The Top Ten Things Library Administrators Should Know About Technology.  It’s not a new subject, but it’s a topic that is being discussed openly more and more. 🙂  One gets the impression administrators are actually beginning to realize computer technology is not only not going to stand still, it is moving on at a dizzying pace that demands attention.

Now Roy Tennant is one of those icons in the library technology world who is worth listening to.  But technology geeks sometimes write in a language which makes the eyes of library administrators glaze over (been there, done that, got the T-shirt).  So I offer here a translation service for the first four items in Roy’s excellent post.

1. Technology isn’t as hard as you think it is.

The tools available for getting websites up and running are much easier to use than they were a few years ago, and they’re getting better each day. Some things are still complicated (like writing software), but basic services don’t require that knowledge.

2. Technology gets easier all the time.

Installing special software used to be hard. Today, even complex software often comes pre-packaged, which makes installation a snap.

3. Technology gets cheaper all the time.

Even if you pay a third party to store your web site and make it available on the Internet, the cost of what you can get today is much less than it was even a few years ago, and it keeps getting cheaper.

4. Maximize the effectiveness of your most costly technology investment — your people.

Hardware is cheap (all of it).  The expensive part of technology is knowledgeable staff.  Don’t make it harder for your expensive staff when the tools are so cheap by comparison.

The rest don’t need translating. 🙂

These really are points that need to be made again and again until administrators start feeling more comfortable with the technology side of library services. The problem is, are any administrators listening? Really listening? Roy has a larger library audience than I have. 🙂 Maybe there will be a few who will read and take heart, especially since LISnews posted it as news.

Creating a Comparison Matrix

Charles Bailey has published a very helpful bibliography (Digital Curation and Preservation Bibliography, v.1), from which the resources below were gleaned.  In addition, I have been adding resources to Mendeley, a research management tool: Digital Curation, Digital Library Best Practices & Guidelines, Digital Library Systems, and Metadata.

I have added a few more open source items, and a lot of proprietary systems I discovered thanks to Mr. Bailey’s rich resource.  I am constructing a matrix of features for comparison, borrowing from the reports above and my initial chart, based mainly on features that are most important for our needs:

  • Product
  • URL
  • Owned by/Maintained by
  • License type
  • Runs on (OS)
  • Database
  • Server Software
  • Interoperability with Digital Repository Systems
  • Works with (what other software)
  • Programming language
  • Additional hardware or software required
  • Hosting available
  • OAI-PMH?
  • Rights management
  • Manage Restricted Materials
  • User submission
  • Set processing priorities
  • Manage processing status
  • Localization options
  • Formats supported
  • Image file import (TIFF, JPEG, etc.)
  • A/V file import
  • Text file import (TEI, PDF, etc.)
  • Image file management w/ associated metadata
  • A/V file management w/ associated metadata
  • Text file management w/ associated metadata
  • Batch edit
  • DC type
  • METS
  • MODS
  • MARC
  • Imports (MARC, EAD, Tab Delimited/CSV)
  • Batch Import (MARC, EAD, CSV)
  • Exports (MARC, EAD, MADS, MODS, METS, Dublin Core, EAC, Tab Delimited)
  • Batch Exports (MARC, EAD, MADS, MODS, METS, Dublin Core, EAC, Tab Delimited)
  • Easy Data Entry
  • Spell Check
  • Other Schemas
  • Create description record from existing record and automatically populate fields
  • Item-level Description
  • Link accession and description records
  • Link accession record to multiple description records
  • Link description record to multiple accession records
  • Hierarchical – fonds, collection, sous-fonds, series, sub-series, files, items and link with its parts in the hierarchy.
  • Ability to reorganize hierarchies
  • Flexibility of Data Model
  • Templating/default fields
  • Controlled vocabularies
  • Authority Records
  • Link authority record to unlimited description records
  • Link description record to unlimited authority records
  • Compliance to Archival Standards
  • Data validation
  • Backup/Restore utility
  • Integrated Web Publication
  • Public search interface
  • Advanced search (by field)
  • Faceted Search
  • Browse levels
  • Search results clearly indicate hierarchical relationships of records
  • Records linked to other parts of hierarchy
  • User Access and Data Security Function
  • Control who can delete records
  • User permissions management
  • Control when record becomes publicly accessible
  • Feeds
  • Install Notes
  • Forum/List URL
  • Bug tracker URL
  • Feature request URL
  • Trial/demo/sandbox
  • Training available
  • Technical support provided by developers
  • User Manuals (user, admin)
  • Context-specific help
  • Page turning
  • Developer customization available
  • User customization permitted
  • Reports available
  • Customize reports
  • Repository statistics
  • Plugins
  • UTF
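Only the yes/no features in this list lend themselves to scoring; descriptive fields like URL or License type just get recorded. As a hedged sketch of how the matrix could be turned into a ranking, here is a weighted score in Python. The weights and the two systems are hypothetical placeholders, not evaluations of any real product.

```python
# Hypothetical weights: 3 = must-have, 2 = important, 1 = nice to have.
FEATURE_WEIGHTS: dict[str, int] = {
    "OAI-PMH?": 3,
    "Rights management": 3,
    "Backup/Restore utility": 3,
    "Batch Import (MARC, EAD, CSV)": 2,
    "Faceted Search": 1,
}

# Placeholder systems; fill these in from your own testing of each product.
MATRIX: dict[str, set[str]] = {
    "System A": {"OAI-PMH?", "Rights management", "Faceted Search"},
    "System B": {"OAI-PMH?", "Rights management", "Backup/Restore utility",
                 "Batch Import (MARC, EAD, CSV)"},
}


def score(supported: set[str], weights: dict[str, int]) -> float:
    """Share of the total feature weight that a system covers (0.0 to 1.0)."""
    total = sum(weights.values())
    earned = sum(w for feature, w in weights.items() if feature in supported)
    return earned / total


def gaps(supported: set[str], weights: dict[str, int], must_have: int = 3) -> list[str]:
    """Must-have features a system lacks; a single gap may rule it out."""
    return [f for f, w in weights.items() if w >= must_have and f not in supported]


def rank(matrix: dict[str, set[str]], weights: dict[str, int]) -> list[tuple[str, float]]:
    """Systems sorted from best to worst weighted score."""
    scored = [(name, score(features, weights)) for name, features in matrix.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, s in rank(MATRIX, FEATURE_WEIGHTS):
        print(f"{name}: {s:.2f}, missing must-haves: {gaps(MATRIX[name], FEATURE_WEIGHTS)}")
```

Keeping `gaps` separate from the score matters: a system can score well overall and still lack one feature that is a deal-breaker for our needs.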

Comparing Digital Library Systems

I am currently evaluating options for implementing a digital library.  It’s an ongoing process. :o)  Since there are probably more proprietary systems out there, I’m hoping people will leave comments letting me know about them (same thing for open source).  I’ll post the charted results when I’m done (hopefully in the near future).

There are several digital asset management systems for digital libraries. On the proprietary side (closed source) there are (this is not an exhaustive list):

  • CONTENTdm (OCLC): software that handles the storage, management and delivery of library digital collections to the Web
  • DigiTool (ExLibris)
  • Archivalware (PTFS): a web-based, full-text search and retrieval content management system.
  • SKCA (CuadraStar):  Star Knowledge Center for Archives
  • Eloquent: a suite of applications, Librarian (ILS), Archives (software for physical archives management), Records (records management), and Museum, which can be purchased individually or combined into a complete content management system (Museum+Librarian+Archives).
  • Mint: a “cultural asset management system” combining their individual products M2A (archives), M2L (libraries), and M3 (museums). Based in Canada (Link updated).
  • PastPerfect: primarily for museums, includes library integration.
  • Proficio: collections management system from Re:discovery.
  • Gallery Systems: a suite of software products for management and web publishing
  • Questor Argus: Collection management and portal software (Link updated).
  • Mimsy XG: collection management and web publishing software (Link updated).
  • IDEA: content management and web publishing software, with modules for libraries, archives, and museums
  • EMu: Museum and Archive management software from KEsoft, (includes web publishing)
  • Digital Commons: A repository system developed by Berkeley Electronic Press.  They set up and maintain a hosted site.
  • SimpleDL: options for hosted library or licensed software on a local server.  Unfortunately, there is not much information on who, what, or how within the site.
  • AdLib: Library, archival, and museum software systems from Adlib Information Systems.  There is a free “lite” version of the Library and Museum software (requires registration).

On the open source side, there are (also not an exhaustive list):

  • CollectiveAccess: a highly configurable cataloguing tool and web-based application for museums, archives and digital collections. There is a demo to try it out. (Link updated).
  • Greenstone: a suite of software for building and distributing digital library collections. Greenstone is produced by the New Zealand Digital Library Project.
  • Omeka: a free, flexible, and open source web-publishing platform for the display of library, museum, archives, and scholarly collections and exhibitions.  There is a sandbox to try it out.
  • DSpace: software to host and manage subject based repositories, dataset repositories or media based repositories
  • ResourceSpace: a web-based, open source digital asset management system designed to give your content creators easy and fast access to print- and web-ready assets.
  • CDS Invenio:  a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. (Link updated).
  • Islandora: A project combining Fedora and Drupal (web content management system).  It has a VirtualBox demo download available. (Link updated).
  • Razuna: an open source digital asset management with hosting options and consulting services to set up and deploy the system.
  • Digital Collection Builder (DCB): a software distribution built from the Qubit Toolkit for Libraries & Museums. (Updated URL goes to tools)
  • ICA-AtoM Project: (“International Council on Archives – Access to Memory”): a software distribution built from the Qubit Toolkit, for Archives.  An online demo is available, as well as a downloadable version (update: see this site for currently supported version).
  • CollectionSpace: a collections management system and collection information system platform, primarily for museums. Current version is 0.6
  • NotreDAM: Open source system developed in Italy by Sardegna Ricerche. A demo (updated URL; software on GitHub) is available, as well as documentation (update: see GitHub project page). It is not a trivial install, requiring two instances of Ubuntu 9.10, but there is a VirtualBox (update: see GitHub location) instance for evaluation purposes. (Link updated).

There is also repository software, like Fedora, which can be used with a discovery interface such as Blacklight, or Islandora.

The main difference between proprietary systems and the open source systems listed above is economics. While the argument in the past has been that open source systems are not as developed and require more in-house expertise to implement, that is no longer the case. For one thing, even proprietary systems require varying levels of in-house expertise to realize the full functionality of their features (see, e.g., Creating an Institutional Repository for State Government Digital Publications). For another, as the number of libraries implementing digital libraries with resource discovery has increased, development of digital asset management systems has matured beyond the alpha, and sometimes even beta, stage. Open source systems that did not reach critical mass have quietly died or been absorbed into better supported products. In the proprietary field, systems are typically developed within a parent organization that also produces other software, such as an Integrated Library System, whose profits support R&D for the DAM.

So, while economics should broadly encompass all aspects of implementation, including time and asset costs, in this case the economics is primarily about the money involved, since the differences in the other factors have largely leveled out. With any system, you will be involved in user forums, bug fix requests, creating (or updating) documentation, training, and local tweaking, with or without outside help. From what I have seen and heard, proprietary systems are currently asking between $10,000 and $20,000 per year for a (relatively) small archive.

Another issue which may come up is “Cloud Computing.”  Proprietary vendors (and even some open source systems) offer the option of hosting your digital library repository (where all the digital objects live) on their servers.  The issue with remote hosting, of course, is control.  Who has ultimate control and responsibility for the items in the repository?  If the archive is intended to be open and public, the issue is more one of accountability and curation:  how securely is the data being backed up, and what is/will be done to ensure long term viable access?

If the archive is intended to be for local use only (for example, on an intranet), the issues around remote hosting by an outside vendor change dramatically. It is no longer just a matter of secure backups, but of the security of the system itself. Who can access the repository? How secure is the repository from outside crackers? With even Google admitting to a breach of its network security, how much security can be expected from a vendor?

In some cases, we may want both public and private (local) access to archive materials.  While originally my thinking was to simply control access using the metadata for each object, others more experienced than I am recommend creating separate repositories for public and private archives, which adds another layer of complexity.
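A minimal sketch of the single-repository approach, assuming each object record carries an access value of "public" or "private" (the `ArchiveObject` class and that vocabulary are hypothetical, not taken from any particular DAM). It shows how little stands between a private object and the public: one filter on one metadata field, so a single mislabeled record is exposed. That is one reason a separate private repository can be the safer design.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class ArchiveObject:
    identifier: str
    title: str
    access: str  # hypothetical vocabulary: "public" or "private"

def visible_objects(objects: Iterable[ArchiveObject], on_intranet: bool) -> List[ArchiveObject]:
    """Return the objects a request may see, based only on the access field."""
    if on_intranet:
        # Local (intranet) users see everything in the single repository.
        return list(objects)
    # Outside users see only objects whose metadata marks them public.
    return [obj for obj in objects if obj.access == "public"]

records = [
    ArchiveObject("a1", "Board minutes", "public"),
    ArchiveObject("a2", "Personnel file", "private"),
]
print([obj.identifier for obj in visible_objects(records, on_intranet=False)])
```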

UPDATE: Added Digital Collection Builder (DCB) and ICA-AtoM (2010/5/5)

UPDATE: Added CollectionSpace, Eloquent, Mint, PastPerfect, Proficio, Gallery Systems, Questor Argus, Mimsy XG, IDEA, EMu, and Digital Commons (2010/5/21)

UPDATE: Added SimpleDL, AdLib, and NotreDAM (2010/6/10)

UPDATE:  Fixed broken links:  Mint, DSpace, Islandora demo, removed reference to online demo for Digital Collection Builder (2011/4/11)

UPDATE: Multiple broken links updated (2016/08/07)

Code4Lib2010 Notes from Day 3

Keynote #2:  Paul Jones

Catfish, Cthulhu, code, clouds, and Levenshtein cloud (what is the Levenshtein distance between Cathy Marshall and Paul Jones?)
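For anyone who hasn't met it: Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Here is a short Python sketch of the standard dynamic-programming calculation (not something shown in the keynote) that answers the joke question:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b (the DP table, kept one row at a time).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("Cathy Marshall", "Paul Jones"))
```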

Brains: create the code using certain assumptions about the recipient, which may or may not be accurate

Brains map based on how they are used.

(Images showing the difference in brain usage between Internet-naive and Internet-savvy users: for the Internet-naive, the “reading text” image closely matches the “Internet searching” image; for the Internet-savvy, the two are very different.)

Robin Dunbar: anthropologist who studies apes and monkeys. Grooming, Gossip, and the Evolution of Language (book). Neocortex ratio (ability to retain social relationship memory). “Grooming” in groups maintains relationship memory. In humans, the relationship can be maintained at a distance through communication (“gossip”). Dunbar circles of intimacy: the lower the number, the more intimate the relationship.

Who do you trust and how do you trust them?

Attributed source matters (S.S. Sundar & C. Nass, “Source effects in users’ perception of online news”). Preference for source: 1. others, 2. computer, 3. self, 4. editor; quality preference: 1. others, 2. computer, 3. editors, 4. self (others = random preset; computer = according to a computer-generated profile; self = according to a psychological profile).

Significance:  small talk leads to big talk, which leads to trust.

Small talk “helps” big talk when there is: 1. likeness (homophily) 2. grooming and gossip 3. peer to peer in an informal setting 4. narrative over instruction

Perceived problems of American nerds: 1. ADD, 2. Asperger's (inability to read visual emotional cues), 3. hyperliterality and the jargon of tasks and games, 4. friendlessness, 5. idiocentric humor (but this gets engineered away?)

But they: 1. multitask, 2. use text-based interactions (visual cues become emoticons), 3. mainstream jargon into slang, 4. redefine friendship, 5. use the power of Internet memes and shared mindspace

The gossipy part of social interactions around information is what makes it accessible, memorable and actionable.  But it’s the part we strip out.

Lightning Talks

Batch OCR with open source tools

Tesseract (no layout analysis) and OCRopus (includes layout analysis): both on Google Code. A Python script builds a PDF file from an image (code available).
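A minimal Python sketch of batch OCR that drives the `tesseract` command-line tool over a directory of page images. It assumes a recent Tesseract release whose `pdf` output config writes a searchable PDF directly (older releases did not have this). The function names and the list of image suffixes are hypothetical, and OCRopus layout analysis is not covered.

```python
import shutil
import subprocess
from pathlib import Path

# Image types to OCR; adjust for your scans (assumption).
IMAGE_SUFFIXES = {".png", ".tif", ".tiff", ".jpg", ".jpeg"}

def build_commands(image_paths, out_dir):
    """Return one tesseract command per image file; other files are skipped."""
    commands = []
    for path in map(Path, image_paths):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # "tesseract <image> <outbase> pdf" writes <outbase>.pdf
            out_base = Path(out_dir) / path.stem
            commands.append(["tesseract", str(path), str(out_base), "pdf"])
    return commands

def run_batch(image_dir, out_dir):
    """OCR every image in image_dir, writing one searchable PDF per image."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on the PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    images = sorted(Path(image_dir).iterdir())
    for cmd in build_commands(images, out_dir):
        subprocess.run(cmd, check=True)  # stop on the first failure

# Example: run_batch("scans", "ocr_pdfs")
```

Building the command list separately from running it makes it easy to preview (or log) what a large batch will do before committing to hours of OCR.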

VuFind at Western Mich.U

Uses the MARC 005 field to identify new books (now -5 days, now -14 days, etc.)
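Here is a minimal Python sketch of the new-book test, assuming the 005 value follows the standard yyyymmddhhmmss.f pattern. Note that 005 records the date and time of the latest transaction on the record, so an edited old record can also look "new". How VuFind itself applies the cutoff (e.g., in its index) is not shown here.

```python
from datetime import datetime, timedelta
from typing import Optional

def parse_005(value: str) -> datetime:
    """Parse a MARC 005 value (yyyymmddhhmmss.f), e.g. '20100220151047.0'."""
    return datetime.strptime(value.strip(), "%Y%m%d%H%M%S.%f")

def is_new(field_005: str, days: int, now: Optional[datetime] = None) -> bool:
    """True if the record's latest transaction falls within the last `days` days."""
    now = now or datetime.now()
    return parse_005(field_005) >= now - timedelta(days=days)

# "now -5 days" with a fixed reference date
print(is_new("20100220151047.0", days=5, now=datetime(2010, 2, 24)))  # True
```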

Please clean my data

Cleaning harvested metadata and cleaning OCR text

Transformation and translation steps were added to the harvesting process (all metadata records are in XML); regexes are part of the step templates; Velocity template variables store each regex (giving direct access to the XML elements), and the Java dom4j API is then used to do effectively whatever we wish.
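The talk's pipeline used Velocity template variables and Java's dom4j. As a rough Python analogue, swapping in the standard library's ElementTree for dom4j, the rule table below applies each regex to the text of matching elements. The rules themselves are hypothetical examples, and the sketch assumes un-namespaced element names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical cleanup rules: (element tag or None for "any element", pattern, replacement).
RULES = [
    ("title", re.compile(r"\s*/\s*$"), ""),                # drop a trailing " /"
    ("date", re.compile(r"^\[?c?(\d{4})\]?\.?$"), r"\1"),  # "[c1923]." -> "1923"
    (None, re.compile(r"\s{2,}"), " "),                    # collapse runs of whitespace
]

def clean_record(xml_text: str) -> str:
    """Apply every matching rule, in order, to the text of each element."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is None:
            continue
        text = elem.text
        for tag, pattern, replacement in RULES:
            if tag is None or elem.tag == tag:
                text = pattern.sub(replacement, text)
        elem.text = text
    return ET.tostring(root, encoding="unicode")

print(clean_record("<record><title>Main  Street /</title><date>[c1923].</date></record>"))
# <record><title>Main Street</title><date>1923</date></record>
```

Keeping the rules in a table, rather than hard-coding them, mirrors the idea of storing each regex in a template variable: new fixes can be added without touching the traversal code.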

Using newspapers: an example of “fix this text” (within the browser). Cool.

Who the heck uses Fedora disseminators anyway?

Fedora content models are just content streams.

(They put their stuff in Drupal.)

The disseminator lives in Fedora and is extended with PHP to display (and edit) content in Drupal.

Library a la carte Update

Ajax tool to create library guides, open source style

Built on building blocks (reuse, copy, share)

Every guide type has a portal page (e.g., subject guide, tutorial (new), quiz)

Local install or hosted; tech stack: Ruby, gems, Rails, database, web server

Open source evangelism:  open source instances in the cloud!

Digital Video Made Easier

Small shop needed to set up video services quickly.  Solution: use online video services for ingest, data store, metadata, file conversion, distribution, video player.

Put 3 demos in place (to be posted on the Twitter stream): api, YouTube API

Disadvantages: data in the cloud, terms of service, API lag, varying support (YouTube supportive, blip not so much)


Tool to help students find physical space more easily.

Launched Oct. 2009; approx. 65 posts per week.

PHP + MySQL + jQuery

Creative Commons licensed

EAD, APIs & Cooliris

Limited by CONTENTdm (sympathy from audience), but tricked out to integrate Cooliris.

(Pretty slick)


You Either Surf or You Fight: Integrating Library Services With Google Wave

Why? Go where your users are.

Wave apps: gadgets & robots (real time interaction)


Google has libraries in Java and Python, deployed using Google App Engine (Google's cloud computing platform; free for up to 1.3 million requests per day)

Create an app at; get the App Engine SDK, which includes the App Engine Launcher (creates a skeleton application and deploys it into App Engine)

Set up app.yaml (give the app a name and a version number [really important]). api_version is the API version of App Engine. The handler URL is /_wave/.*

Get the Wave Robot API (not really stable yet, but shouldn't be a problem) and drop it into the app directory

Wave concepts: a wavelet is the current conversation taking place within Google Wave; a blip is each message within the wavelet (blips are hierarchical); each wavelet and each blip has a unique identifier, so it can be addressed programmatically.
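To make the wavelet/blip hierarchy concrete, here is a toy Python model: every blip gets a unique ID that can be looked up directly, and replies nest under their parent blip. The classes and the ID format are hypothetical illustrations, not the Wave Robot API's own classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

@dataclass
class Blip:
    blip_id: str
    text: str
    children: List["Blip"] = field(default_factory=list)  # replies to this blip

@dataclass
class Wavelet:
    wavelet_id: str
    blips: Dict[str, Blip] = field(default_factory=dict)  # every blip addressable by ID
    roots: List[Blip] = field(default_factory=list)       # top-level messages

    def add_blip(self, text: str, parent_id: Optional[str] = None) -> Blip:
        """Create a blip with a unique ID; nest it under parent_id if given."""
        blip = Blip(f"{self.wavelet_id}/b{len(self.blips) + 1}", text)
        self.blips[blip.blip_id] = blip
        if parent_id is None:
            self.roots.append(blip)
        else:
            self.blips[parent_id].children.append(blip)
        return blip

    def thread(self, blip: Blip, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, text) pairs, walking the reply hierarchy depth-first."""
        yield depth, blip.text
        for child in blip.children:
            yield from self.thread(child, depth + 1)

wave = Wavelet("w1")
question = wave.add_blip("Do you have Moby Dick?")
wave.add_blip("Yes, 3 copies on the shelf.", parent_id=question.blip_id)
for depth, text in wave.thread(question):
    print("  " * depth + text)
```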

Code avail on

External libs are OK; just drop them into the project directory. Using BeautifulSoup to scrape the OPAC.
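The talk used BeautifulSoup. To keep this sketch dependency-free, it uses Python's standard-library `html.parser` instead. It pulls result titles from an OPAC results page, assuming (hypothetically) that each title sits in an element whose class list includes `title`; real OPAC markup will differ.

```python
from html.parser import HTMLParser
from typing import List

class TitleScraper(HTMLParser):
    """Collect the text of each element whose class list contains css_class."""

    def __init__(self, css_class: str = "title"):
        super().__init__()
        self.css_class = css_class
        self.titles: List[str] = []
        self._open_tag = None  # tag name of the title element we are inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and self.css_class in classes:
            self._open_tag = tag
            self.titles.append("")

    def handle_data(self, data):
        if self._open_tag is not None:
            self.titles[-1] += data  # includes text of nested tags such as <a>

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.titles[-1] = " ".join(self.titles[-1].split())  # tidy whitespace
            self._open_tag = None

def scrape_titles(html: str) -> List[str]:
    parser = TitleScraper()
    parser.feed(html)
    parser.close()
    return parser.titles

page = '<table><tr><td class="title"><a href="/r/1">Moby Dick</a></td><td>1851</td></tr></table>'
print(scrape_titles(page))
```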

Google Wave will do HTML content, sort of. Use OpBuilder instead (but it is not well documented). Doesn't like CSS.

For debugging, use the logging library (one of the better parts)
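Here is a minimal sketch of that debugging approach using Python's standard `logging` module: log at different levels inside the robot's event handler and read the output while debugging. The handler name `on_blip_submitted` and the logger name are hypothetical, not part of the Wave API.

```python
import logging

logger = logging.getLogger("opac_robot")  # hypothetical robot name

def on_blip_submitted(text):
    """Hypothetical event handler: turn a submitted blip into an OPAC query."""
    logger.debug("blip received: %r", text)
    query = text.strip()
    if not query:
        logger.warning("empty query; nothing to search")
        return None
    logger.info("searching OPAC for %s", query)
    return query

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)  # show debug messages locally
    on_blip_submitted("  moby dick ")
```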