OAI Metadata Harvesting with Theses and Dissertations
Thomas Hickey, OCLC
Access 2001
Background
OCLC Office of Research
http://www.oclc.org/research/
NDLTD
Networked Digital Library of Theses and Dissertations
http://www.ndltd.org/
OAI
Open Archives Initiative
http://www.openarchives.org/
ALCME Project
Exploring
Open source platforms
Web-accessible services
OAI Metadata Harvesting
Using metadata for theses and dissertations
NDLTD
Networked Digital Library of Theses and Dissertations
Run out of Virginia Tech
More than 100 members interested in improving access
OAI servers seem a natural direction
NDLTD (cont.)
Released a metadata standard
ETDMS
Dublin Core-like with own namespace
Currently XML based
RDF a possibility
Incorporates linking to a distributed name authority database
OAIMH protocol
Open Archives Initiative Metadata Harvesting protocol (http://www.openarchives.org/OAI/openarchivesprotocol.htm)
Allows efficient harvesting of metadata
The Vision
Make information widely available
Allow layering of systems
OAI Protocol
Uses HTTP
Fairly simple URLs to:
Identify the server
List formats, record sets
Get records (can specify date modified)
Has flow control so that large sets can be managed
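The requests listed above can be sketched as plain HTTP GET URLs. This is a minimal illustration, not OCLC's implementation: the verb and argument names (Identify, ListMetadataFormats, ListSets, ListRecords, metadataPrefix, from, resumptionToken) follow the published OAI protocol, while the base URL, the token value, and the helper names oai_url, harvest, and fetch_page are assumptions made for the example.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical server, not a real endpoint


def oai_url(base_url, verb, **args):
    """Build a request URL from a verb and optional arguments."""
    params = {"verb": verb}
    for key, value in args.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass from_ instead.
            params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Identify the server, then list its metadata formats and record sets.
identify = oai_url(BASE_URL, "Identify")
formats = oai_url(BASE_URL, "ListMetadataFormats")
sets = oai_url(BASE_URL, "ListSets")

# Get records modified on or after a given date.
records = oai_url(BASE_URL, "ListRecords",
                  metadataPrefix="oai_dc", from_="2001-01-01")


def harvest(fetch_page, base_url, metadata_prefix):
    """Follow resumption tokens until the server reports no more pages.

    fetch_page is a hypothetical callable: given a URL it returns
    (list_of_records, resumption_token_or_None).
    """
    url = oai_url(base_url, "ListRecords", metadataPrefix=metadata_prefix)
    while True:
        page, token = fetch_page(url)
        yield from page
        if not token:
            break
        # A follow-up request carries only the verb and the token.
        url = oai_url(base_url, "ListRecords", resumptionToken=token)
```

Passing fetch_page in as a parameter keeps the flow-control loop separate from the HTTP and XML handling, so it can be exercised without a live server.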
OAI Protocol continued
XML version of Dublin Core is required
Other metadata formats possible (wrapped in XML)
Typical uses:
Publish metadata for a special collection
Use to keep two catalogs synchronized
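To make the Dublin Core requirement concrete, here is a small sketch of pulling Dublin Core fields out of an XML record. The Dublin Core element namespace is the standard one; the sample record, its unqualified record wrapper element, and the helper name dc_fields are invented for illustration and do not reproduce the exact envelope an OAI server returns.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Invented sample record; the <record> wrapper is a stand-in for whatever
# envelope a real server places around the Dublin Core elements.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Thesis</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2001</dc:date>
  <dc:type>Thesis</dc:type>
</record>"""


def dc_fields(xml_text):
    """Return a dict mapping each Dublin Core element name to its values."""
    root = ET.fromstring(xml_text)
    fields = {}
    prefix = "{" + DC_NS + "}"
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            # Dublin Core elements are repeatable, so collect values in lists.
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


fields = dc_fields(SAMPLE)
```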
OAI at OCLC
Currently in Office of Research
Publishing
Harvesting
Building services
Open source
Publishing
Catalog of WorldCat theses and dissertations records
Currently have 100,000 available
Plan to have all 3,000,000 up
Starting to embed services in the records
Authority links
OCLC Open Access links
Harvesting
Harvesting a variety of OAI servers
Making them available in a single catalog
Most theses are already in WorldCat
May be able to get more foreign theses
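One way to picture combining several harvested servers into a single catalog is sketched below. This is an assumed merge policy, not a description of the OCLC system: records are keyed by identifier (assumed unique across servers), a record seen again with a later datestamp replaces the older copy, and the server names, field names, and function name merge_into_catalog are hypothetical.

```python
def merge_into_catalog(harvests):
    """Merge records from several servers into one dict keyed by identifier.

    harvests maps a server name to a list of record dicts, each with
    "identifier" and "datestamp" (ISO dates, so string comparison works).
    """
    catalog = {}
    for server, records in harvests.items():
        for rec in records:
            key = rec["identifier"]
            current = catalog.get(key)
            # Keep the newest copy of each record and note where it came from.
            if current is None or rec["datestamp"] > current["datestamp"]:
                catalog[key] = dict(rec, source=server)
    return catalog


# Invented sample data for illustration only.
harvests = {
    "server-a": [
        {"identifier": "oai:a:1", "datestamp": "2001-03-01", "title": "Thesis One"},
        {"identifier": "oai:a:2", "datestamp": "2001-04-15", "title": "Thesis Two"},
    ],
    "server-b": [
        {"identifier": "oai:b:7", "datestamp": "2001-05-20", "title": "Thesis Seven"},
        {"identifier": "oai:a:1", "datestamp": "2001-06-01", "title": "Thesis One (revised)"},
    ],
}
catalog = merge_into_catalog(harvests)
```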
Harvesting Example
Services
Links into WorldCat information (planned)
Associated searches
Holdings
ILL?
Name authority links
Name Authorities on OAI
Name Authorities (cont.)
Layered on top of OAI
URL associated with each entry
Mechanisms for synchronization, publication
http://alcme.oclc.org/ndltd/AuthLink.html
Open Source
Gwen/Pears
database and search engine
Scorpion
RDF toolkits (EOR, Perl based)
http://www.oclc.org/research/software/
Future Work
Loading more records
Supporting NDLTD ETDMS
Adding test services to records
Incorporating Authority files into ACE
Analysis of harvested metadata
ZNG SOAP server
ACE
New research project
Advanced Collection Environment
ASP model for managing collections
Starting with personal collections
Simplifies
Allows more experimentation
Expect much to cross over to libraries
ACE (cont.)
Dublin Core based
First few hundred records are free
Tries to be a complete service
Targeted for the serious collector
Emphasis on management, not commerce
Expect testing within OCLC Research in October, with outside testing late this year