Content Modeling in Fedora 4:
Newspaper Case Study
Joshua Westgard
September 21, 2016
I. Repository Status
- Standalone Fcrepo
- WebAC with CAS integration
- Went into production in July
- First major collection will be Diamondback
- Blacklight (discovery) and Mirador (image viewer) for access
II. Diamondback Student Newspaper
- 1920s-1970s
- Digitized to NDNP standards from microfilm
- ALTO XML with article segmentation
- Likely the first of several newspapers
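The sketch below extracts per-block text from an ALTO page file. It uses the standard library's `xml.etree.ElementTree`, and `lxml.etree` exposes the same `fromstring`, `iter`, `findall`, and `get` calls. The ALTO v2 namespace is an assumption, because the version declared in a given NDNP batch may differ. Keying the output by block ID is also only one possible way to leave room for an article-segmentation layer.

```python
import xml.etree.ElementTree as ET

# Assumed default: the ALTO v2 namespace; pass another via `ns` if the files differ
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"


def text_blocks(alto_xml, ns=ALTO_NS):
    """Map each TextBlock ID to its text, one output line per TextLine."""
    root = ET.fromstring(alto_xml)
    q = {"a": ns}
    blocks = {}
    for block in root.iter(f"{{{ns}}}TextBlock"):
        lines = []
        for line in block.findall("a:TextLine", q):
            # Each String element carries one word in its CONTENT attribute
            words = [s.get("CONTENT", "") for s in line.findall("a:String", q)]
            lines.append(" ".join(words))
        blocks[block.get("ID")] = "\n".join(lines)
    return blocks
```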
III. Content Modeling
- PCDM seems sufficient
- Collection -- Issue -- Page -- File
- Reel can be treated as another item, with page objects as frames
- Article segmentation remains more of an open question
IV. Batch Loader
- Loader script for config and argument parsing
- Python classes for PCDM
- Data handler module for project-specific logic
- Data handler passes back an item-centric resource set
- rdflib for metadata and content modeling
- requests for HTTP interaction
- Interacts with Fedora primarily via HTTP POST and PATCH
- csv module and lxml for data handling
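The sketch below covers the data-handling and HTTP side of such a loader, assuming a CSV with hypothetical `id` and `title` columns. Requests are prepared but not sent, so their bodies can be inspected first. The POST carries Turtle to create a child resource. The PATCH carries a SPARQL Update, which is how the Fedora 4 REST API accepts property changes. The helper names and URLs are illustrative.

```python
import csv
import io

import requests


def read_issues(csv_text):
    """Hypothetical data handler: one dict per CSV row, keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def create_request(container_url, turtle):
    """Prepare a POST that creates a new child resource from a Turtle body."""
    return requests.Request(
        "POST", container_url, data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
    ).prepare()


def update_request(resource_url, title):
    """Prepare a PATCH that adds a dcterms:title via SPARQL Update."""
    # Escape backslashes first, then double quotes, for a SPARQL string literal
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    sparql = (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        f'INSERT DATA {{ <> dcterms:title "{escaped}" . }}'
    )
    return requests.Request(
        "PATCH", resource_url, data=sparql.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
    ).prepare()
```

To send a request, pass it to `requests.Session().send(prepared)`. On a successful POST, Fedora 4 returns 201 Created with the new resource's URI in the `Location` header.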
Thank you!
Joshua Westgard (westgard@umd.edu)