Metadata for ETDs v1
This is Version 1 of this Guidance Document. A Version 2 of this Guidance Document is also available.
About this Document (Summary)
Another issue revealed in the needs assessment process was that most institutions do not have workflows and systems in place to capture the levels of metadata needed to manage ETDs over their entire lifecycle, often because of a lack of awareness of the issues and of the consequences of not maintaining such information. A concise overview of this topic will be prepared to inform stakeholders and decision makers about the critical issues in gathering and maintaining preservation metadata for ETDs, not just at the point of ingestion but afterwards as well, since ETDs often undergo transitional events in their lifecycle (embargo releases, redactions, etc.). This overview will be developed in concert with the related metadata tools.
Metadata for ETDs
Digital collection development has moved from being an additional activity to a core service in many libraries. Since the late 1990s, ETDs have played significant roles, not only as new forms of scholarly communication but also as drivers for the development of institutional repositories and digital libraries in general. Digital libraries and their supporting technologies have now matured to the point where their contents include complex and dynamic resources and services. The emphasis has now shifted to the challenges of ensuring long-term access to digital collections. Most academic institutions have identified digital preservation as a priority that they wish to address collectively. As many digital preservation researchers have stated, preservation activities require standards, best practices, and sustainable funding models to support a long-term commitment to digital resources. This document provides an overview of PREMIS metadata and lifecycle event recordkeeping for ETDs. Because so many institutions are still seeking feasible strategies and action plans that lay the foundations for long-term access, we believe it is important to take a broader view of digital resource management as well. In this regard, ETDs can once again serve as drivers for the development of digital preservation strategies.
Metadata Schemes for ETDs
ETD-Specific Metadata Elements
PREMIS and ETDs
Recognizing the critical role of metadata in any successful digital preservation strategy, the Preservation Metadata: Implementation Strategies (PREMIS) initiative has been influential in providing a "core" set of preservation metadata elements that support the digital preservation process. In light of PREMIS requirements, this document specifies the metadata needed to ensure long-term access to and preservation of ETDs.
The PREMIS data model consists of five interrelated entities: Intellectual Entity, Object, Event, Agent, and Rights, with each semantic unit mapped to one of these entities. Although the five entities are interrelated, they can be implemented independently of one another, so different institutions may adopt different subsets of them. For example, the University of North Texas (UNT) Libraries have so far implemented the Object, Event, and Agent entities. The UNT implementation experience will be described further in the use case sections.
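To make the linking between entities concrete, here is a minimal sketch, assuming a repository that (like the UNT example) implements only Object, Event, and Agent. The class names, field names, and identifiers below are hypothetical simplifications loosely modeled on PREMIS semantic units, not the official schema; the "embargo release" event type is illustrative and is not claimed to be a term in the Library of Congress preservation events vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical, simplified records loosely modeled on the PREMIS Object,
# Event, and Agent entities; real PREMIS records carry many more units.

@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str  # e.g. "organization", "person", "software"

@dataclass
class Event:
    identifier: str
    event_type: str            # e.g. "ingestion", "fixity check"
    date_time: str             # ISO 8601 timestamp
    outcome: str
    linking_agent_ids: list[str]
    linking_object_ids: list[str]

@dataclass
class PremisObject:
    identifier: str
    object_category: str       # "file", "representation", or "bitstream"
    linking_event_ids: list[str] = field(default_factory=list)

def record_event(obj, agent, event_id, event_type, outcome, when=None):
    """Create an Event linked to obj and agent, and back-link it from obj."""
    when = when or datetime.now(timezone.utc)
    event = Event(
        identifier=event_id,
        event_type=event_type,
        date_time=when.isoformat(),
        outcome=outcome,
        linking_agent_ids=[agent.identifier],
        linking_object_ids=[obj.identifier],
    )
    obj.linking_event_ids.append(event.identifier)
    return event

# Usage: record ingestion and a later lifecycle event for one ETD.
repo = Agent("agent-001", "Example University Libraries", "organization")
etd = PremisObject("etd-0001", "representation")
events = [
    record_event(etd, repo, "evt-1", "ingestion", "success"),
    record_event(etd, repo, "evt-2", "embargo release", "success"),
]
print(etd.linking_event_ids)
```

Keeping links in both directions (events point to objects and agents, objects list their events) is a design choice for easy lookup; an implementation could instead store links on one side only and derive the other.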
The Rights entity is of particular interest to the ETD community. For the purposes of the PREMIS Data Dictionary, statements of rights and permissions are both treated as constructs described by the Rights entity:
- Rights are entitlements allowed to agents by copyright or other intellectual property law.
- Permissions are powers or privileges granted by agreement between a rights holder and another party or parties.
The minimum core rights information that a preservation repository must know, however, is what rights or permissions it has to carry out actions on objects within the repository. These may be granted generally by copyright law, by statute, or by a license agreement with the rights holder.
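The minimum question a repository must be able to answer ("may I perform this action on this object, and on what basis?") can be sketched as follows. This is a hedged illustration, assuming rights statements are stored as simple records with a basis (copyright, license, or statute) and a set of granted acts; the class, function, and identifier names are hypothetical and only loosely follow the PREMIS rightsBasis and act units.

```python
from dataclasses import dataclass

# Hypothetical, simplified rights statement loosely modeled on the PREMIS
# Rights entity: a basis for the rights plus the acts the repository may perform.

@dataclass(frozen=True)
class RightsStatement:
    rights_basis: str             # "copyright", "license", or "statute"
    acts_granted: frozenset[str]  # e.g. {"replicate", "migrate", "disseminate"}
    linked_object_id: str

def may_perform(statements, object_id, act):
    """Return the bases under which `act` is permitted on `object_id` (empty if none)."""
    return [
        s.rights_basis
        for s in statements
        if s.linked_object_id == object_id and act in s.acts_granted
    ]

statements = [
    # e.g. a deposit license signed by the ETD author (hypothetical)
    RightsStatement("license", frozenset({"replicate", "migrate", "disseminate"}), "etd-0001"),
    # e.g. a statutory permission to make preservation copies (hypothetical)
    RightsStatement("statute", frozenset({"replicate"}), "etd-0002"),
]
print(may_perform(statements, "etd-0001", "migrate"))      # ['license']
print(may_perform(statements, "etd-0002", "disseminate"))  # []
```

Returning the list of bases rather than a plain yes/no keeps the justification available, which is useful when recording why a preservation action was taken.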
A number of institutions (including the University of North Texas, Harvard University, and other early PREMIS adopters) found that the Rights entity lacked the robustness required by various types of digital objects such as ETDs. To address these limitations, the PREMIS Editorial Committee has been working on changes and enhancements to the Rights entity semantic units, for example to allow for terms of restriction (e.g., for embargo periods). Currently there is only a "term of grant" unit; a "term of restriction" unit is being proposed under the same container.
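The practical effect of adding a term of restriction alongside the term of grant can be sketched as an embargo check. This is a minimal sketch, assuming each term has a start date and an optional end date, that end dates are inclusive, and that an act is allowed only when the grant covers the date and no restriction does; the class and field names are hypothetical and are not the proposed PREMIS semantic unit names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical sketch: a granted act with a "term of grant" and an optional
# "term of restriction" (the proposed addition, e.g. for embargo periods).

@dataclass
class Term:
    start: date
    end: Optional[date] = None  # None means open-ended

    def covers(self, day: date) -> bool:
        # Assumption: both start and end dates are inclusive.
        return self.start <= day and (self.end is None or day <= self.end)

@dataclass
class RightsGranted:
    act: str
    term_of_grant: Term
    term_of_restriction: Optional[Term] = None

    def is_allowed(self, day: date) -> bool:
        """Allowed when the grant covers `day` and no restriction covers it."""
        if not self.term_of_grant.covers(day):
            return False
        if self.term_of_restriction is not None and self.term_of_restriction.covers(day):
            return False
        return True

# Hypothetical ETD deposited 2011-05-15 with a two-year embargo on dissemination.
grant = RightsGranted(
    act="disseminate",
    term_of_grant=Term(date(2011, 5, 15)),
    term_of_restriction=Term(date(2011, 5, 15), date(2013, 5, 14)),
)
print(grant.is_allowed(date(2012, 1, 1)))   # False (under embargo)
print(grant.is_allowed(date(2013, 5, 15)))  # True (embargo has ended)
```

With only a term of grant, the same embargo would have to be encoded indirectly, for example by starting the grant at the embargo end date, which loses the fact that the repository holds the right but is temporarily restricted from exercising it.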
The PREMIS Editorial Committee is also working on a major revision, version 3.0. This will include changes to the data model: Intellectual Entity will become another level of Object instead of a separate entity, and "Environment" will become an entity in its own right rather than being tied to an Object. This new version will not be available until at least April 2012. Because the Committee has received several requests to implement the rights changes earlier, it is proposing to prepare a version 2.2 early in 2012 that includes them. Among other possible changes that are not compatible with the existing data dictionary is renaming licenseIdentifier and its components to licenseDocumentationIdentifier to better convey its use. A revision of the Rights entity is available on the PIGPEN wiki page; we hope it will be included in the next major revision, 3.0.
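Because the proposed renaming is not backward compatible, repositories that already store license information may need to migrate existing records. The helper below is a hedged sketch, assuming records are held as nested dictionaries keyed by semantic-unit name and that the component units follow the pattern licenseIdentifierType and licenseIdentifierValue; the function name and sample values are hypothetical, and the final names should be checked against the published data dictionary.

```python
# Hypothetical migration helper: renames licenseIdentifier and its components
# (e.g. licenseIdentifierType, licenseIdentifierValue) to the proposed
# licenseDocumentationIdentifier names, leaving all other keys untouched.

OLD_PREFIX = "licenseIdentifier"
NEW_PREFIX = "licenseDocumentationIdentifier"

def rename_license_units(node):
    """Return a renamed copy of `node`; the input record is not modified."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if key.startswith(OLD_PREFIX):
                key = NEW_PREFIX + key[len(OLD_PREFIX):]
            renamed[key] = rename_license_units(value)
        return renamed
    if isinstance(node, list):
        return [rename_license_units(item) for item in node]
    return node

# Hypothetical stored rights record for an ETD deposit agreement.
record = {
    "rightsStatement": {
        "rightsBasis": "license",
        "licenseInformation": {
            "licenseIdentifier": {
                "licenseIdentifierType": "local",
                "licenseIdentifierValue": "etd-deposit-agreement-v3",
            },
            "licenseTerms": "Non-exclusive distribution license",
        },
    }
}
print(rename_license_units(record)["rightsStatement"]["licenseInformation"])
```

Returning a copy rather than editing in place makes it easy to validate the migrated record before replacing the stored one.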
Issues and Considerations
Keep in mind that PREMIS is designed as a preservation metadata format, so at the archiving level ETDs are no different from other digital objects. Different repositories and systems implement different workflow and submission models, and some of the identified metadata may or may not already be collected by a given repository. Regardless, decisions on how to implement the recommended additions or enhancements rest entirely with the repositories themselves, depending on various internal and external factors.
Other Institutions' Experiences
Metadata Approach to ETDs' Lifecycle Management
Why Metadata Quality?
Factors affecting metadata quality
Various (local, collaborators, system, ...) requirements
Resource (human, financial, time, ...) availability
Metadata Quality Assurance Mechanisms
Metadata Analysis tools
Standards and Metadata
- PREMIS Preservation Metadata Maintenance Activity (Library of Congress): http://www.loc.gov/standards/premis/
- PREMIS Tutorials and Implementation Presentation Materials: http://www.loc.gov/standards/premis/tutorials.html
- PREMIS OWL ontology (wiki page): http://premisontologypublic.pbworks.com/w/page/45987067/FrontPage
- Hastings, Samantha and Alemneh, Daniel (2010). Exploration of Adoption of Preservation Metadata in Cultural Heritage Institutions. Proceedings of the 73rd American Society for Information Science & Technology (ASIS&T) Annual Meeting, Pittsburgh, Pennsylvania: http://digital.library.unt.edu/ark:/67531/metadc29321
- "Implementing PREMIS: A Case Study of the Florida Digital Archive" by Devan Ray Donaldson and Paul Conway, in Library Hi-Tech, 2010: http://www.emeraldinsight.com/journals.htm?issn=0737-8831&volume=28&issue=2&articleid=1864754&show=pdf
- "Conformant Implementation of the PREMIS Data Dictionary" by PREMIS Editorial Committee, October 2010: http://www.loc.gov/standards/premis/premis-conformance-oct2010.pdf
- "A Checklist for Documenting PREMIS-METS Decisions in a METS Profile" by Sally Vermaaten, May 2010: http://www.loc.gov/standards/premis/premis_mets_checklist.pdf
- Alemneh, Daniel (2009). An Examination of the Adoption of Preservation Metadata in Cultural Heritage Institutions: An Exploratory Study Using Diffusion of Innovations Theory. Doctoral Dissertation: http://digital.library.unt.edu/ark:/67531/metadc9937/
- PREMIS Editorial Committee. (2008). PREMIS Data Dictionary for Preservation Metadata, Version 2.0. http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
- Caplan, P. & Guenther, R. (2005). Practical preservation: The PREMIS experience. Library Trends, 54, (1), 111-124. http://www.loc.gov/standards/premis/caplan_guenther-librarytrends.pdf
- "Digital Preservation Metadata Standards" by Angela Dappert and Markus Enders, in Information Standards Quarterly, Spring 2010: http://www.loc.gov/standards/premis/FE_Dappert_Enders_MetadataStds_isqv22no2.pdf
- Dublin Core: The Road from Metadata to Linked Data. NISO/DCMI Webinar, August 25, 2010: http://www.niso.org/apps/group_public/download.php/4766/nisodcmi_aug10web.pdf
- Enders, Markus (2010). A METS based information package for long term preservation of web archives: http://www.ifs.tuwien.ac.at/dp/ipres2010/papers/enders-70.pdf
- Toshiyuki G. (2007). “Preservation metadata: current situation and future perspectives.” Joho Kanri 50(2), 74-86: http://www.jstage.jst.go.jp/article/johokanri/50/2/50_74/_article
- NISO. Understanding Metadata. National Information Standards Organization, 2004. Online: http://www.niso.org/publications/press/UnderstandingMetadata.pdf
- Alemneh, D., Hastings, S., Hartman, C. (2002). “A metadata approach to preservation of digital resources: the University of North Texas Libraries’ experience.” First Monday. 7(8), 25p. : http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/981/902
Authorities and Vocabularies
- Preservation Events: http://id.loc.gov/vocabulary/preservationEvents.html
- Preservation Level Role: http://id.loc.gov/vocabulary/preservationLevelRole.html
- Cryptographic Hash Functions: http://id.loc.gov/vocabulary/cryptographicHashFunctions.html
- Phillips, Mark and Alemneh, Daniel (2011) Data Desiccation: Facilitating Long-Term Access, Use, and Reuse of ETDs. Proceedings of the 14th International Symposium on Electronic Theses and Dissertations, 2011, Cape Town, South Africa: http://digital.library.unt.edu/ark:/67531/metadc67625/
- von Suchodoletz, Dirk; Rechert, Klaus; and Welte, Randolph (2011) Automation of Flexible Migration Workflows. In the International Journal of Digital Curation, Vol. 6(1): http://www.ijdc.net/index.php/ijdc/article/viewFile/172/240
- von Suchodoletz, Dirk (2010). A Future Emulation and Automation Research Agenda: http://drops.dagstuhl.de/opus/volltexte/2010/2771/pdf/10291.vonSuchodoletzDirk.Paper.2771.pdf
- Schultz, Matt and Skinner, Katherine (2010) A Guide to Distributed Digital Preservation: http://www.metaarchive.org/sites/default/files/GDDP_Educopia.pdf
- Rumsey, S., O’Steen, B. (2008). OAI-ORE, PRESERV2 and Digital Preservation. Ariadne, 57, 8p. : http://www.ariadne.ac.uk/issue57/rumsey-osteen/
Standards and Best Practices Resources
- COUNTER (Counting Online Usage of Networked Electronic Resources): http://www.projectcounter.org/about.html
- NISO (2012). Making Good on the Promise of ERM: A Standards and Best Practices Discussion Paper: http://www.niso.org/apps/group_public/download.php/7946/Making_Good_on_the_Promise_of_ERM.pdf
- NISO Framework Working Group. (2007, December). A framework of guidance for building good digital collections, 3rd ed. Retrieved January 26, 2012 from http://www.niso.org/publications/rp/framework3.pdf
- ONIX for Licensing Terms (OLT): http://www.editeur.org/85/Overview/
- OLT is a family of messaging formats for communicating rights information. It is built on a consistent underlying model of rights and usages.
- Shibboleth: http://shibboleth.internet2.edu/
- The Shibboleth System is a standards-based, open source software package for web single sign-on across or within organizational boundaries. It allows sites to make informed authorization decisions for individual access to protected online resources in a privacy-preserving manner.
- WorldCat Registry: http://www.worldcat.org/registry/Institutions
- The WorldCat Registry is a Web-based directory for libraries and library consortia. It is an authoritative single source for information that defines institutional identity, services, relationships, contacts, and other key data often shared with third parties.
Preservation Related Web Sites
- Digital Library Federation: http://www.diglib.org/
- Internet Archive: http://www.archive.org/index.php
- International Internet Preservation Consortium: http://netpreserve.org/about/index.php
- National Archives of Canada: Preservation Activities: http://www.collectionscanada.gc.ca/preservation/index-e.html
- National Digital Information Infrastructure Preservation Program (NDIIPP): http://www.digitalpreservation.gov/
- The National Digital Stewardship Alliance (NDSA): http://www.digitalpreservation.gov/ndsa/
- National Library of Australia: PANDORA: http://pandora.nla.gov.au/
- National Library of Sweden: Kulturarw3 – The Swedish Web Archive: http://www.kb.se/english/find/internet/websites/
Tools and Software
- PREMIS in METS Toolbox: http://pim.fcla.edu/
- DAITSS (Dark Archive in the Sunshine State): http://daitss.fcla.edu/
- DAITSS is a digital preservation repository application developed by the Florida Center for Library Automation (FCLA) with some support from the IMLS.
Metadata Elements for ETDs
ETD Metadata Crosswalks (UNT, NDLTD, TDL, ...)
Controlled Vocabularies for ETDs
PREMIS and ETDs
Timeline of Major PREMIS Development Activities
The Portal to Texas History in PREMIS Implementation Registry: Sample Document
There are currently 45 projects in the PREMIS Implementation Registry