Archive for the “Libraries” Category

Somehow this post got lost in the drafts folder.  But since it’s an enduring topic, it’s still current. :-)

A friend has some advice for library administrators:  The Top Ten Things Library Administrators Should Know About Technology.  It’s not a new subject, but it’s a topic that is being discussed openly more and more. :)   One gets the impression administrators are actually beginning to realize computer technology is not only not going to stand still, it is moving on at a dizzying pace that demands attention.

Now Roy Tennant is one of those icons in the library technology world who is worth listening to.  But technology geeks sometimes write in a language which makes the eyes of library administrators glaze over (been there, done that, got the T-shirt).  So I offer here a translation service for the first four items in Roy’s excellent post.

1. Technology isn’t as hard as you think it is.

The tools available for getting websites up and running are much easier to use than they were a few years ago, and they are getting better each day.  Some things are still complicated (like writing software), but basic services don’t require that knowledge.

2. Technology gets easier all the time.

Installing special software used to be hard.  Today there are pre-packaged installers for complex software that make installation a snap.

3. Technology gets cheaper all the time.

Even if you pay a third party to store your web site and make it available on the Internet, the cost of what you can get today is much less than it was even a few years ago, and it keeps getting cheaper.

4. Maximize the effectiveness of your most costly technology investment — your people.

Hardware is cheap (all of it).  The expensive part of technology is knowledgeable staff.  Don’t make it harder for your expensive staff when the tools are so cheap by comparison.

The rest don’t need translating. :)

These really are points that need to be made again and again until administrators start feeling more comfortable with the technology side of library services.  The problem is, are any administrators listening?  Really listening?  Roy has a larger library audience than I have :)   Maybe there will be a few who will read and take heart, especially since LISnews posted it as news.


Charles Bailey has published a very helpful bibliography (Digital Curation and Preservation Bibliography, v.1), from which the resources below were gleaned.  In addition, I have been adding resources to Mendeley, a research management tool: Digital Curation, Digital Library Best Practices & Guidelines, Digital Library Systems, and Metadata.

I have added a few more open source items, and a lot of proprietary systems I discovered thanks to Mr. Bailey’s rich resource.  I am constructing a matrix of features for comparison, borrowing from the reports above and my initial chart, based mainly on features that are most important for our needs:

  • Product
  • URL
  • Owned by/Maintained by
  • License type
  • Runs on (OS)
  • Database
  • Server Software
  • Interoperability with Digital Repository Systems
  • Works with (what other software)
  • Programming Lang
  • Additional hardware or software required
  • Hosting available
  • OAI-PMH?
  • Rights management
  • Manage Restricted Materials
  • User submission
  • Set processing priorities
  • Manage processing status
  • Localization options
  • Formats supported
  • Image file import (TIFF, JPEG, etc.)
  • A/V file import
  • Text file import (TEI, PDF, etc.)
  • Image file management w/ associated metadata
  • A/V file management w/ associated metadata
  • Text file management w/ associated metadata
  • Batch edit
  • DC type
  • METS
  • MODS
  • MARC
  • Imports (MARC, EAD, Tab Delimited/CSV)
  • Batch Import (MARC, EAD, CSV)
  • Exports (MARC, EAD, MADS, MODS, METS, Dublin Core, EAC, Tab Delimited)
  • Batch Exports (MARC, EAD, MADS, MODS, METS, Dublin Core, EAC, Tab Delimited)
  • Easy Data Entry
  • Spell Check
  • PREMIS?
  • Other Schemas
  • Create description record from existing record and automatically populate fields
  • Item-level Description
  • Link accession and description records
  • Link accession record to multiple description records
  • Link description record to multiple accession records
  • Hierarchical – fonds, collection, sous-fonds, series, sub-series, files, items and link with its parts in the hierarchy.
  • Ability to reorganize hierarchies
  • Flexibility of Data Model
  • Templating/default fields
  • Controlled vocabularies
  • Authority Records
  • Link authority record to unlimited description records
  • Link description record to unlimited authority records
  • Compliance to Archival Standards
  • Data validation
  • Backup/Restore utility
  • Integrated Web Publication
  • Public search interface
  • Advanced search (by field)
  • Faceted Search
  • Browse levels
  • Search results clearly indicate hierarchical relationships of records
  • Records linked to other parts of hierarchy
  • User Access and Data Security Function
  • Control who can delete records
  • User permissions management
  • Control when record becomes publicly accessible
  • Feeds
  • Install Notes
  • Forum/List URL
  • Bug tracker URL
  • Feature Req URL
  • Trial/demo/sandbox
  • Training available
  • Technical support provided by developers
  • User Manuals (user, admin)
  • Context-specific help
  • Page turning
  • Developer customization available
  • User customization permitted
  • What reports
  • Customize reports
  • Repository statistics
  • Plugins
  • UTF


I am currently evaluating options for implementing a digital library.  It’s an ongoing process. :o )  Since there are probably more proprietary systems out there, I’m hoping people will leave comments letting me know about them (same thing for open source).  I’ll post the charted results when I’m done (hopefully in the near future).

There are several digital asset management systems for digital libraries. On the proprietary side (closed source) there are (this is not an exhaustive list):

  • ContentDM (OCLC): software that handles the storage, management and delivery of library digital collections to the Web
  • DigiTool (ExLibris)
  • Archivalware (PTFS): a web-based, full-text search and retrieval content management system.
  • SKCA (CuadraStar):  Star Knowledge Center for Archives
  • Eloquent: A suite of applications, Librarian (ILS), Archives (software for physical archives management), Records (records management), and Museum, which can be purchased individually or combined for a complete content management system (Museum+Librarian+Archives).
  • Mint: a “cultural asset management system” mix of their individual products M2A (archives), M2L (libraries), and M3 (museums).  Based in Canada.
  • PastPerfect: primarily for museums, includes library integration.
  • Proficio: collections management system from Re:discovery.
  • Gallery Systems: a suite of software products for management and web publishing
  • Questor Argus: Collection management and portal software
  • Mimsy XG: collection management and web publishing software
  • IDEA: content management and web publishing software, with modules for libraries, archives, and museums
  • EMu: Museum and Archive management software from KEsoft (includes web publishing)
  • Digital Commons: A repository system developed by Berkeley Electronic Press.  They set up and maintain a hosted site.
  • SimpleDL: options for hosted library or licensed software on a local server.  Unfortunately, there is not much information on who, what, or how within the site.
  • AdLib: Library, archival, and museum software systems from Adlib Information Systems.  There is a free “lite” version of the Library and Museum software (requires registration).

On the open source side, there are (also not an exhaustive list):

  • CollectiveAccess: a highly configurable cataloguing tool and web-based application for museums, archives and digital collections. There is a demo to try it out.
  • Greenstone: a suite of software for building and distributing digital library collections.  Greenstone is produced by the New Zealand Digital Library Project.
  • Omeka: a free, flexible, and open source web-publishing platform for the display of library, museum, archives, and scholarly collections and exhibitions.  There is a sandbox to try it out.
  • DSpace: software to host and manage subject based repositories, dataset repositories or media based repositories
  • ResourceSpace: a web-based, open source digital asset management system which has been designed to give your content creators easy and fast access to print and web ready assets.
  • CDS Invenio:  a suite of applications which provides the framework and tools for building and managing an autonomous digital library server.  A demo is available.
  • Islandora: A project combining Fedora and Drupal (web content management system).  It has a VirtualBox demo download available.
  • Razuna: an open source digital asset management with hosting options and consulting services to set up and deploy the system.
  • Digital Collection Builder (DCB):  from Canadiana.org, a software distribution built from the Qubit Toolkit for Libraries & Museums.  There is an online demo available.
  • ICA-AtoM Project (“International Council on Archives – Access to Memory”): a software distribution built from the Qubit Toolkit, for Archives.  An online demo is available, as well as a downloadable version.
  • CollectionSpace: a collections management system and collection information system platform, primarily for museums. Current version is 0.6
  • NotreDAM: Open source system developed in Italy by Sardegna Ricerche.  A demo is available, as well as documentation.  It is not a trivial install, requiring two instances of Ubuntu 9.10, but there is a VirtualBox instance for evaluation purposes.

There is also repository software, like Fedora, which can be used with a discovery interface such as Blacklight, or Islandora.

The main difference between proprietary systems and the open source systems listed above is economics.  While the argument in the past has been that open source systems are not as developed and require more in-house expertise to implement, that is not the case any more.  For one thing, even proprietary systems require in-house expertise in varying levels in order to realize full functionality of their features (see, e.g., Creating an Institutional Repository for State Government Digital Publications).  For another, as the number of libraries implementing Digital Libraries with resource discovery has increased, development of Digital Asset Management Systems has matured beyond the Alpha, and sometimes even Beta, stage.  Open source systems which did not reach critical mass have quietly died or been absorbed into better supported products.  In the proprietary field, systems typically are developed within a parent organization that includes other software, such as an Integrated Library System, whose profits support R&D for the DAM.

So, while economics should broadly encompass all aspects of  implementation, including time and asset costs, in this case the economics is primarily the money involved, since the difference in the other factors has pretty much been leveled.  With any system, you will be involved in user forums, in bug fix requests, in creating (or updating) documentation, in training, in local tweaking, with or without outside help.  Proprietary systems are currently asking between $10,000 and $20,000 per year for a (relatively) small archive, from what I have seen and heard.

Another issue which may come up is “Cloud Computing.”  Proprietary vendors (and even some open source systems) offer the option of hosting your digital library repository (where all the digital objects live) on their servers.  The issue with remote hosting, of course, is control.  Who has ultimate control and responsibility for the items in the repository?  If the archive is intended to be open and public, the issue is more one of accountability and curation:  how securely is the data being backed up, and what is/will be done to ensure long term viable access?

If the archive is intended to be for local use only (for example, on an intranet), the issues change dramatically regarding remote hosting by an outside vendor.  It is no longer just a matter of secure backups, but the security of the system itself.  Who can access the repository?  How secure is the repository from outside crackers?  With even Google admitting to a breach of their network security, how much security can be expected from a vendor?

In some cases, we may want both public and private (local) access to archive materials.  While originally my thinking was to simply control access using the metadata for each object, others more experienced than I am recommend creating separate repositories for public and private archives, which adds another layer of complexity.

UPDATE:  Added Digital Collection Builder (DCB) and ICA-AtoM (2010/5/5)

UPDATE: Added CollectionSpace, Eloquent, Mint, PastPerfect, Proficio, Gallery Systems, Questor Argus, Mimsy XG, IDEA, EMu, and Digital Commons (2010/5/21)

UPDATE: Added SimpleDL, AdLib, and NotreDAM (2010/6/10)


Keynote #2:  Paul Jones

catfish, cthulhu, code, clouds and levenshtein cloud (what is the levenshtein distance between Cathy Marshall and Paul Jones?)
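For anyone who wants to answer the question in the parenthesis: Levenshtein distance is the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another.  The sketch below is my own illustration (not something shown in the keynote), using the usual row-by-row dynamic programming approach.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] = distance between the characters of a seen so far
    # (before the current one) and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("Cathy Marshall", "Paul Jones"))
```

The comparison is case-sensitive; lowercase both names first if that matters to you.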

Brains: create the code using certain assumptions about the recipient, which may or may not be accurate

Brains map based on how they are used.

(images showing difference in brain usage between users who are Internet Naive, and Internet Savvy: for Internet Naive, “reading text” image closely matches “Internet searching” image, but is very different for the Internet Savvy image)

Robin Dunbar: anthropologist who studies apes & monkeys.  Grooming, gossip, and the evolution of language (book).  Neocortex ratio (ability to retain social relationship memory).  “Grooming” in groups maintains relationship memory.  In humans, the relationship can be maintained long distance via communication (“gossip”).  Dunbar circles of intimacy; the lower the number, the higher the “intimacy” of the relationship.

Who do you trust and how do you trust them?

Attributed source matters (S.S. Sundar, & C. Nass: “Source effects in users’ perception of online news”).  Preference for source: 1.others, 2.computer, 3.self, 4.news editor; quality preference: 1.others, 2.computer, 3.news editors, 4.self (others=random preset; computer=according to computer generated profile; self=according to psychological profile)

Significance:  small talk leads to big talk, which leads to trust.

Small talk “helps” big talk when there is: 1. likeness (homophily) 2. grooming and gossip 3. peer to peer in an informal setting 4. narrative over instruction

Perceived problems of American nerds: 1. ADD 2. Asperger’s (inability to get visual emotional cues) 3. hyperliterality & jargon of the tasks and games 4. friendlessness 5. idiocentric humor (but this gets engineered away?)

but they: 1. multitask 2. use text-based interactions (visual cues become emoticons) 3. mainstream jargon into slang 4. redefine friendship 5. use the power of Internet memes, shared mindspace

The gossipy part of social interactions around information is what makes it accessible, memorable and actionable.  But it’s the part we strip out.

Lightning Talks

Batch OCR with open source tools

Tesseract (no layout analysis) & OCRopus (includes layout analysis): both hosted on Google Code

HocrConverter.py (python script builds a PDF file from an image)

xplus3.net for code

VuFind at Western Mich.U

Uses marc 005 field to determine new book (now -5 days, now -14 days, etc.)
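A rough sketch of how such a check might look, assuming the 005 value follows the standard MARC pattern yyyymmddhhmmss.f.  The function names are mine, not VuFind’s, and note that 005 records the latest transaction on the record, so an edited old record will also look “new.”

```python
from datetime import datetime, timedelta


def parse_marc_005(value: str) -> datetime:
    """Parse a MARC 005 value such as '20100222153045.0'.

    Only the first 14 characters (yyyymmddhhmmss) are used; the
    trailing tenth-of-a-second digit is ignored.
    """
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


def is_new(value: str, days: int, now: datetime = None) -> bool:
    """True if the 005 timestamp falls within the last `days` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return parse_marc_005(value) >= cutoff


if __name__ == "__main__":
    # Hypothetical record timestamp, checked against the two windows
    # mentioned in the talk (now -5 days, now -14 days).
    stamp = "20100222153045.0"
    print(is_new(stamp, 5), is_new(stamp, 14))
```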

Please clean my data

Cleaning harvested metadata & cleaning ocr text

Transformation and translation steps added to the harvesting process (all metadata records are in xml); regex as part of step templates; velocity template variables store each regex (gives direct access to the xml elements, then use the java dom4j api to do effectively whatever we wish)
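The presenters did this in Java with Velocity templates and dom4j.  The sketch below only illustrates the general idea (a list of regex steps, each applied to a named XML element) using Python’s standard library instead.  The element names and patterns are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical cleaning steps: (element tag, regex pattern, replacement).
# They are applied in order, so later steps see the output of earlier ones.
STEPS = [
    ("title", r"\s+", " "),             # collapse runs of whitespace
    ("title", r"\s*[/:;,]\s*$", ""),    # drop trailing ISBD-style punctuation
    ("date", r"^\[(\d{4})\]$", r"\1"),  # [1851] -> 1851
]


def clean_record(xml_text: str, steps=STEPS) -> str:
    """Apply each regex step to the text of every matching element."""
    root = ET.fromstring(xml_text)
    for tag, pattern, replacement in steps:
        regex = re.compile(pattern)
        for element in root.iter(tag):
            if element.text:
                element.text = regex.sub(replacement, element.text).strip()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    harvested = "<record><title>  Moby   Dick /</title><date>[1851]</date></record>"
    print(clean_record(harvested))
```

Keeping the steps as data rather than code is what makes them easy to store in templates and reuse across harvests.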

Using newspapers:  example of “fix this text” (within browser).  Cool

Who the heck uses Fedora disseminators anyway?

Fedora content models are just content streams.

(They put their stuff in Drupal.)

Disseminator lives in Fedora, extended with PHP to display in Drupal, & edit

Library a la carte Update

Ajax tool to create library guides, open source style

Built on building blocks (reuse, copy, share)

Every guide type has a portal page (e.g., subject guide, tutorial (new), quiz)

Local install or hosted: tech stack: ruby, gems, rails, database, web server

Open source evangelism:  open source instances in the cloud!

Digital Video Made Easier

Small shop needed to set up video services quickly.  Solution: use online video services for ingest, data store, metadata, file conversion, distribution, video player.

Put 3 demos in place (will be posted on twitter stream): blip.tv api, youtube api

Disadvantages: data in the cloud, Terms of service, api lag, varying support (youtube supportive, blip not so much)

GroupFinder

Tool to help students find physical space more easily.

Launched oct. 2009, approx 65 posts per week.

php + MySql + jQuery

creativecommons licensed

EAD, Apis & Cooliris

Limited by contentdm (sympathy from audience), but tricked out to integrate cooliris.

(Pretty slick)

Talks

You Either Surf or You Fight: Integrating Library Services With Google Wave

Why?  go where your users are

Wave apps: gadgets & robots (real time interaction)


Google has libraries in java and python, deployed using google app engine (google’s cloud computing platform.  Free for up to 1.3 million requests per day)

Create an app at appengine.google.com; Get app engine SDK, which includes the app engine launcher (creates skeleton application & deploys into app engine)

Set up the app.yaml (give app a name, version number [really important]).  api version is the api version of app engine.  Handler url is /_wave/.*

Get the wave robot api (not really stable yet, but shouldn’t be a problem) & drop it into the app directory

Wave concepts:  wavelet is the current conversation taking place within google wave; blip is each message part of the wavelet (hierarchical); each (wavelet & blip) have unique identifiers, which can be programmatically addressed.

Code avail on github.com/MrDys/

External libs are ok – just drop in the proj directory.  Using beautifulsoup to scrape the OPAC

Google wave will do html content, sort of.  Use OpBuilder instead (but it is not well documented).  Doesn’t like CSS

For debugging, use logging library (one of the better parts)


A better advanced search

http://searchworks.stanford.edu

How to filter multiple similar titles by the same author, or multiple author instances (artist as author, as subject, as added author), or combine multiple facet values

At start: no drop down boxes, only titled text boxes, based on above.  Keyword (& Item Description) 3rd on the list; “Subject Terms” instead of just “Subject”

Dismax & Solr local params:  nested query with local param syntax: _query_:"{!dismax qf=…}"
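To make the nested-query idea concrete, here is a small sketch of how one titled text box per field could be turned into a single Solr query string.  It is my illustration, not SearchWorks code: it assumes the request handler defines parameters named qf_title, qf_author, and so on, which the local params dereference with the $ prefix.

```python
def nested_dismax(field_terms: dict, operator: str = "AND") -> str:
    """Build one Solr 'q' string from several per-field search boxes.

    Each non-empty box becomes a nested dismax query, written as
    _query_:"{!dismax qf=$qf_<field>}<terms>", and the clauses are
    joined with the given boolean operator.
    """
    clauses = []
    for field, terms in field_terms.items():
        terms = terms.strip()
        if not terms:
            continue  # skip boxes the user left empty
        # Backslashes and double quotes must be escaped because the
        # nested query sits inside a double-quoted string.
        escaped = terms.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('_query_:"{!dismax qf=$qf_%s}%s"' % (field, escaped))
    return (" %s " % operator).join(clauses)


if __name__ == "__main__":
    # Hypothetical form input: title and author filled in, subject empty.
    print(nested_dismax({"title": "moby dick", "author": "melville", "subject": ""}))
```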

jQuery functions added to multi-facet search boxes; also added faceting to results (actionable facets)

The search breadcrumbs got really complex.

Drupal 7: A more powerful platform for building library applications

Has a new Information Architecture, writes things into “contexts” (attempt to make it easier for end user)

Users can cancel their own account

New admin theme, toolbars & shortcuts (taken from admin menu toolbar module); dashboard (add what you want)

Uses overlays (rather than changing page)

Module selection screen changed to landscape table view.

Permission screen: allows admin role (same as 1st user)

Install options are default, or minimal profile.

Minimum Software Requirements: PHP 5.2, MySQL 5.0 (or PostgreSQL 8.3)

File System changes: separate public and private paths

Has native image handling out of the box

email security notifications set automatically with install; php filter module now global.

cron.php requires key in url to run

Field UI (included in core) draws from the CCK module in Drupal 6.  Types: boolean, decimal, file, image, list, text, taxonomy, etc.; can apply to almost anything

Update manager: upload and install a module/theme from drupal

Page elements are assignable; templating system changed (more consistent?)

The base theme is based on Zen: Stark (naked)

Theming of content is now granular (content can be pulled from container for theming)

Javascript uses jQuery 1.3, jQuery Forms 2.2 & jQuery UI 1.7; ajax framework from cTools

Really backend stuff: 5.0 database abstraction layer can utilize PHP Data Objects; dynamic select queries; stream wrapper: URIs can be referenced; field API not node specific & any element can be fieldable

Enhancing Discoverability With Virtual Shelf Browse

Displays book covers, with mouseovers; scrolls right and left

Not everything has a cover image; uses “faux covers” à la Google Books

Goal: browse arbitrary number of titles around a known item in call number order, including online & all locations

Daily output from ILS to delimited text, then db ingest with python, to call number index in mysql; call number is in alternate formats w/in table records.
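The pipeline described above (delimited text in, call number index out, browse around a known item) can be sketched in a few lines.  This is only an illustration under my own assumptions: it uses an in-memory SQLite database from Python’s standard library instead of MySQL, a pipe-delimited file, and a hypothetical pre-normalized sort_key column standing in for the “alternate formats” of the call number.

```python
import csv
import io
import sqlite3


def load_shelf(delimited_text: str, delimiter: str = "|") -> sqlite3.Connection:
    """Ingest rows of sort_key|call_number|title into an indexed table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE shelf (sort_key TEXT, call_number TEXT, title TEXT)")
    db.execute("CREATE INDEX shelf_sort ON shelf (sort_key)")
    reader = csv.reader(io.StringIO(delimited_text), delimiter=delimiter)
    db.executemany("INSERT INTO shelf VALUES (?, ?, ?)",
                   (row for row in reader if row))  # skip blank lines
    db.commit()
    return db


def browse(db: sqlite3.Connection, sort_key: str, before: int = 2, after: int = 2):
    """Return (call_number, title) rows around a known item, in shelf order."""
    left = db.execute(
        "SELECT call_number, title FROM shelf WHERE sort_key < ? "
        "ORDER BY sort_key DESC LIMIT ?", (sort_key, before)).fetchall()
    # The right-hand query includes the known item itself, hence after + 1.
    right = db.execute(
        "SELECT call_number, title FROM shelf WHERE sort_key >= ? "
        "ORDER BY sort_key ASC LIMIT ?", (sort_key, after + 1)).fetchall()
    return left[::-1] + right


if __name__ == "__main__":
    # Made-up sample export; zero-padded sort keys keep text order = shelf order.
    export = ("PS 0001 .A1|PS1 .A1|Alpha\n"
              "PS 0020 .B2|PS20 .B2|Beta\n"
              "PS 0100 .C3|PS100 .C3|Gamma\n"
              "PS 0300 .D4|PS300 .D4|Delta\n")
    for row in browse(load_shelf(export), "PS 0100 .C3", before=1, after=1):
        print(row)
```

Two small indexed range queries (one descending, one ascending) avoid having to know the item’s position in the whole collection, which is what lets the window size stay arbitrary.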

Front end challenges: DOM=SLOW; multiple plugins = headache; remote servers = latency issues; too much ajax = browser issues; IE not a friend (doh!)

How to Implement A Virtual Bookshelf With Solr

ILS is Sirsi.

Problems with dirty data, & no standard call no (incl. sudocs & theses/dissertation numbering schemes)
