Implementing FRBR on Large Databases
Thomas Hickey
Diane Vizine-Goetz
OCLC Research
What is FRBR?
IFLA study group report: Functional Requirements for Bibliographic Records
Bibliographic model independent of cataloging rules
Clusters bibliographic items into a four-level structure
Work
Expression
Manifestation
Item
Control of Entities in FRBR
Why FRBR?
Potential to improve:
Cataloging
Discovery
Delivery
By
Bringing versions of works together
Showing relationships of various kinds
Enabling users to navigate to the level of interest
Research on FRBR & WorldCat
Subsets
By library, region
Example/problem sets
Shakespeare, the Bible
Humphry Clinker
1,000 random works
By genre
Dissertations
Fiction
Whole file, 47 million bibliographic records
Our Approach
Concentrating on the work level
Problems with expression-level clusters
Efficient, maintainable, understandable
Few, if any, false matches with correct cataloging
Err on the side of missed matches
Some accommodation of frequent variants
Compare with manually clustered sets
The Algorithm
A key is generated for each record
Extract author, title
Look up in NACO authority file
Added entry information as needed
Form a key from bibliographic record
Author, title, added entry information
These can be sorted, compared
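The key-building steps can be sketched roughly as follows. This is a minimal sketch, not OCLC's code. `naco_like_normalize` and `work_key` are hypothetical names. The normalization only approximates the NACO rules: it lowercases, strips diacritics, deletes apostrophes, blanks out other punctuation, and keeps the first comma of a name. The authority-file lookup is skipped by assuming the headings are already in authorized form. Name subfields are joined with a backslash, and name and title with a slash, following the shape of the keys in the cluster tables.

```python
import unicodedata


def naco_like_normalize(text: str, keep_first_comma: bool = False) -> str:
    """Approximate NACO-style normalization (simplified assumption, not the full rules)."""
    # Decompose accented letters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and delete apostrophes outright ("Alice's" -> "alices").
    base = base.lower().replace("'", "").replace("\u2019", "")
    out = []
    comma_kept = False
    for ch in base:
        if ch.isalnum() or ch.isspace():
            out.append(ch)
        elif ch == "," and keep_first_comma and not comma_kept:
            out.append(ch)  # retain only the first comma of a name heading
            comma_kept = True
        else:
            out.append(" ")  # other punctuation becomes a blank
    # Collapse runs of blanks and trim the ends.
    return " ".join("".join(out).split())


def work_key(name_subfields: list[str], title: str) -> str:
    """Build an author/title key (hypothetical layout modeled on the tables)."""
    parts = [
        naco_like_normalize(value, keep_first_comma=(i == 0))
        for i, value in enumerate(name_subfields)
    ]
    author = "\\".join(p for p in parts if p)  # subfields separated by a backslash
    normalized_title = naco_like_normalize(title)
    # Records without a name heading are keyed on the title alone.
    return f"{author}/{normalized_title}" if author else normalized_title


print(work_key(["Defoe, Daniel,", "1661-1731."], "Robinson Crusoe"))
```

Here the call prints defoe, daniel\1661 1731/robinson crusoe. Keeping keys as plain strings is what makes them cheap to sort and compare.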
Problems
Many records (17%) do not have
An author main entry
A uniform title
In general, these cannot be matched
Look at added entries
Information at the expression and manifestation levels
Handled separately
180,000 clusters involving ~400,000 records
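The slides do not say exactly how added entries are used. One possible reading is sketched below: when a record has no author main entry, fall back to the first name added entry. The record layout (a dict mapping MARC tags to lists of heading strings), the function names, and the tag choices and fallback order are all assumptions for illustration, not OCLC's rules.

```python
from typing import Optional, Tuple

# Hypothetical record layout: MARC tag -> list of heading strings.
NAME_MAIN_ENTRY_TAGS = ("100", "110", "111")
NAME_ADDED_ENTRY_TAGS = ("700", "710", "711")
UNIFORM_TITLE_TAGS = ("130", "240")


def first_heading(record: dict, tags) -> Optional[str]:
    """Return the first heading found under any of the given tags, else None."""
    for tag in tags:
        values = record.get(tag)
        if values:
            return values[0]
    return None


def choose_key_parts(record: dict) -> Tuple[Optional[str], Optional[str]]:
    """Pick (name, title) for a work key; the fallback order is an assumption."""
    # Prefer the author main entry; otherwise try a name added entry.
    name = first_heading(record, NAME_MAIN_ENTRY_TAGS) or first_heading(
        record, NAME_ADDED_ENTRY_TAGS
    )
    # Prefer a uniform title; otherwise use the title proper (245).
    title = first_heading(record, UNIFORM_TITLE_TAGS) or first_heading(record, ("245",))
    return name, title
```

Any such fallback risks false matches, which the approach aims to avoid, so it would need checking against manually clustered sets.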
Top 10 WorldCat Clusters
# Recs    Author/Title Key
8,383     bible\n t
8,055     bible
6,174     bible\authorized
4,033     bible\o t\psalms
3,964     haggadah
3,477     great britain/treaties etc
2,402     bible\o t
2,248     koran
2,153     arabian nights
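Because keys are plain strings, sorting brings the records of one work next to each other. A minimal sketch follows, assuming the keys have already been generated. The function names are hypothetical. The code groups adjacent equal keys into clusters and ranks the clusters by size, which is how lists like these can be produced.

```python
from itertools import groupby


def clusters_by_sorting(keys):
    """Sort keys so equal keys are adjacent, then count each run as one cluster."""
    return [(key, len(list(run))) for key, run in groupby(sorted(keys))]


def top_clusters(keys, n=10):
    """Return the n largest clusters as (key, record count) pairs."""
    sizes = clusters_by_sorting(keys)
    # Largest clusters first; ties broken alphabetically by key.
    sizes.sort(key=lambda pair: (-pair[1], pair[0]))
    return sizes[:n]


print(top_clusters(["koran", "bible", "koran", "haggadah", "bible", "bible"], n=2))
```

This prints [('bible', 3), ('koran', 2)]. Sorting mirrors the "sorted, compared" step. A hash-based `collections.Counter(keys).most_common(n)` would give the same counts without a full sort.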
Top 10 from a Public Library
# Recs    Author/Title Key
89        bible\authorized
85        mother goose
84        chopin, frederic\1810 1849/piano music
81        schulz, charles m/peanuts
63        davis, jim/garfield
61        moore, clement clarke\1779 1863/night before christmas
60        mozart, wolfgang amadeus\1756 1791/instrumental music
58        bach, johann sebastian\1685 1750/cantatas
57        beethoven, ludwig van\1770 1827/sonatas
56        twain, mark\1835 1910/adventures of huckleberry finn
Results
Manual estimate: 1.5 manifestations/work in WorldCat
Algorithm: ~1.3
25,844 clusters have 20 or more records
401,659 clusters have 5 or more records
Preliminary Plans
Build structures for FRBR into new catalog
Expose FRBR clustering for searching
Make visible in cataloging
As consensus on implementation is developed
As cataloging rules accommodate FRBR
Spin-offs
NACO normalization code
Testbed
Server
Authority work
ePrints UK
FRBR in other projects
FictionFinder
NDLTD union catalog
Fiction Subset
2,665,662 WorldCat records
1,758,479 work clusters
1.5 records/cluster
3,866 clusters have 20 or more records
50,540 clusters have 5 or more records
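The records/cluster figure is total records divided by the number of clusters. The small sketch below computes that figure and the threshold counts from a list of per-cluster sizes; the function name is hypothetical. It also checks the fiction ratio from the totals given here.

```python
def cluster_stats(sizes):
    """Summarize a list of cluster sizes (records in each work cluster)."""
    if not sizes:
        raise ValueError("no clusters to summarize")
    total = sum(sizes)
    return {
        "records": total,
        "clusters": len(sizes),
        "records_per_cluster": total / len(sizes),
        "clusters_5_plus": sum(1 for s in sizes if s >= 5),
        "clusters_20_plus": sum(1 for s in sizes if s >= 20),
    }


# Check the fiction-subset ratio from the published totals.
fiction_ratio = 2_665_662 / 1_758_479
print(f"{fiction_ratio:.2f} records/cluster")  # about 1.52
```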
Top 10 Clusters for Fiction
# Recs    Author/Title Key
1,296     defoe, daniel\1661 1731/robinson crusoe
1,248     carroll, lewis\1832 1898/alices adventures in wonderland
971       cervantes saavedra, miguel de\1547 1616/don quixote
828       stevenson, robert louis\1850 1894/treasure island
689       twain, mark\1835 1910/adventures of huckleberry finn
624       twain, mark\1835 1910/adventures of tom sawyer
618       swift, jonathan\1667 1745/gullivers travels
600       andersen, h c\hans christian\1805 1875/tales
581       stowe, harriet beecher\1811 1896/uncle toms cabin
570       arabian nights
FictionFinder
Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction
Indexes records at the work level and organizes displays by work and expression (primarily language)
Includes records for textual items; additional modes of expression (moving image, sound) to be added later
395 records for author “crichton, michael\1942” clustered into 17 entries
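FictionFinder's own code is not shown here. The sketch below gives a rough idea of how records could be indexed by work and organized for display by work and then by expression. It nests records by work key and then by language, with language standing in for expression as the slide describes. The record fields (`work_key`, `language`, `title`) and the function name are hypothetical.

```python
from collections import defaultdict


def organize_for_display(records):
    """Group records by work key, then by language (hypothetical fields)."""
    display = defaultdict(lambda: defaultdict(list))
    for rec in records:
        display[rec["work_key"]][rec["language"]].append(rec["title"])
    # Convert to plain dicts, with languages sorted for a stable display order.
    return {work: dict(sorted(langs.items())) for work, langs in display.items()}
```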
Typical Results Set Display
Typical Work-level Display
Typical Results Set Display
Typical Work-level Display
Benefits
Aggregated displays for works and expressions
Enhancement of (fiction) records at work level
with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers)
with external data (e.g., literary prizes, prequels/sequels, reviews)
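One simple way to enhance a work-level record with elements from its cluster is to merge those elements: keep one summary and take the union of genre terms and subject headings. The sketch below uses hypothetical field names (`summary`, `genres`, `subjects`) and covers only the in-cluster case, not external data.

```python
def enrich_work(cluster):
    """Merge selected elements from all records in one work cluster."""
    work = {"summary": None, "genres": [], "subjects": []}
    for rec in cluster:
        # Keep the first non-empty summary encountered.
        if work["summary"] is None and rec.get("summary"):
            work["summary"] = rec["summary"]
        # Union of terms in first-seen order, without duplicates.
        for field in ("genres", "subjects"):
            for term in rec.get(field, []):
                if term not in work[field]:
                    work[field].append(term)
    return work
```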
Challenges
Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions
Works
Changes in genre (novel vs. graphic novel)
Changes in mode of expression (audiobook vs. radio play)
Degree of modification (abridgement of a juvenile work vs. an adaptation for young children)
Expressions
Translators, illustrators, editors
Next Steps
FRBR algorithm
Explore applications
Refine algorithm as needed
FictionFinder
Add records for sound and image
Conduct user studies
Links
Functional Requirements for Bibliographic Records - Final Report
http://www.ifla.org/VII/s13/frbr/frbr.htm
Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR)
http://www.dlib.org/dlib/september02/hickey/09hickey.html
OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records
http://www.oclc.org/research/projects/frbr/index.shtm
Implementing FRBR on Large Databases
http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm