Overview of PREMIS Metadata and Recordkeeping for ETDs

This is a placeholder workspace for draft and final documents related to this project deliverable.

This is Version 2. Version 1 is here.


About this Document (Summary)

The needs assessment process also revealed that most institutions do not have workflows and systems in place to capture the levels of metadata needed to manage ETDs over their entire lifecycle, often because they are unaware of the issues involved and of the ramifications of not maintaining such information. A concise overview of this topic will be prepared to inform stakeholders and decision makers about the critical issues in gathering and maintaining preservation metadata for ETDs, not just at the point of ingestion but also afterward, because ETDs often undergo transitional events in their lifecycle (embargo releases, redactions, etc.). This overview will be developed in concert with the related metadata tools.

A final working outline for this Guidance document is available to all project steering committee authors for review and comment at Google Docs.

Metadata for ETDs


Digital collection development has moved from being an additional activity to a core service in many libraries. Since the late 1990s, ETDs have played a significant role, not just as a new form of scholarly communication, but as drivers for the development of institutional repositories and digital libraries in general. Digital libraries and their supporting technologies have now matured to the point where their contents include complex and dynamic resources and services. The emphasis has therefore moved to the challenges inherent in ensuring long-term access to digital collections. Most academic institutions have identified digital preservation as a priority that they wish to address collectively. As many digital preservation researchers have stated, preservation activities require standards, best practices, and sustainable funding models to support a long-term commitment to digital resources. Because so many institutions are still seeking feasible strategies and action plans that lay the foundations for long-term access, we think it is important to take a broader view of digital resource management as well. In this regard, ETDs can once again serve as drivers, this time for the development of digital preservation strategies. This document provides an overview of metadata-based lifecycle management and the succession of recordkeeping strategies for ETDs.

ETD-Specific Metadata Profiles and Practices




Introduction

Recognizing the critical role of metadata in any successful digital preservation strategy, the Preservation Metadata: Implementation Strategies (PREMIS) initiative has been influential in providing a "core" set of preservation metadata elements that support the digital preservation process. In light of the PREMIS requirements, this document will specify the metadata needed to ensure long-term access to, and preservation of, ETDs.

The PREMIS data model consists of five interrelated entities: Intellectual Entity, Object, Event, Agent, and Rights. Each semantic unit is mapped to one of these entities. Although the five entities are interrelated, they can be implemented independently of each other, so different institutions adopt different subsets of them. For example, the University of North Texas (UNT) Libraries have implemented the Object, Event, and Agent entities at this point. The UNT implementation experience will be described further in the use case sections.

  • Intellectual Entity
  • Object Entity
  • Event Entity
  • Agent Entity
  • Rights Entity
    • The Rights entity is of particular interest to the ETD community. For the purpose of the PREMIS Data Dictionary, statements of rights and permissions are taken to be constructs that can be described as the Rights entity.
      • Rights are entitlements allowed to agents by copyright or other intellectual property law.
      • Permissions are powers or privileges granted by agreement between a rights holder and another party or parties.
    • The minimum core rights information that a preservation repository must know, however, is what rights or permissions it has to carry out actions related to objects within the repository. These may be granted by copyright law, by statute, or by a license agreement with the rights holder.
    • A number of early PREMIS adopters found that the Rights entity lacked the robustness required by various types of digital objects, such as ETDs. To address these limitations, the PREMIS Editorial Committee has been working on changes and enhancements to the Rights entity semantic units, for example to allow for terms of restriction (e.g., embargo periods). Currently there is only "term of grant"; "term of restriction" is being proposed under the same container.
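
The relationship between "term of grant" and the proposed "term of restriction" can be illustrated with a minimal Python sketch. This is not PREMIS syntax or an official schema: the class names, field names, identifiers, and dates below are hypothetical, chosen only to loosely mirror the semantic units discussed above. The sketch assumes that an act is permitted on a given day if that day falls inside the term of grant and outside any term of restriction (such as an embargo).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Term:
    """A date range; an end_date of None means the range is open-ended."""
    start_date: date
    end_date: Optional[date] = None

    def covers(self, day: date) -> bool:
        if day < self.start_date:
            return False
        return self.end_date is None or day <= self.end_date


@dataclass
class RightsGranted:
    """One permitted act (hypothetical model, loosely following PREMIS)."""
    act: str                                     # e.g. "disseminate", "replicate"
    term_of_grant: Optional[Term] = None         # when the permission applies
    term_of_restriction: Optional[Term] = None   # proposed unit, e.g. an embargo


@dataclass
class RightsStatement:
    basis: str       # "copyright", "license", or "statute"
    object_id: str   # hypothetical identifier of the Object concerned
    agent_id: str    # hypothetical identifier of the Agent holding the permission
    granted: List[RightsGranted] = field(default_factory=list)

    def is_permitted(self, act: str, day: date) -> bool:
        """True if some grant for `act` is in force and not restricted on `day`."""
        for g in self.granted:
            if g.act != act:
                continue
            if g.term_of_grant is not None and not g.term_of_grant.covers(day):
                continue
            if g.term_of_restriction is not None and g.term_of_restriction.covers(day):
                continue
            return True
        return False


# Illustrative data only: an ETD that may be replicated immediately
# but is embargoed from dissemination through the end of 2013.
etd_rights = RightsStatement(
    basis="license",
    object_id="etd-0001",
    agent_id="repository",
    granted=[
        RightsGranted("replicate", term_of_grant=Term(date(2012, 1, 1))),
        RightsGranted(
            "disseminate",
            term_of_grant=Term(date(2012, 1, 1)),
            term_of_restriction=Term(date(2012, 1, 1), date(2013, 12, 31)),
        ),
    ],
)

print(etd_rights.is_permitted("replicate", date(2012, 6, 1)))    # True
print(etd_rights.is_permitted("disseminate", date(2012, 6, 1)))  # False (embargoed)
print(etd_rights.is_permitted("disseminate", date(2014, 1, 1)))  # True
```

Keeping the restriction as a separate date range, rather than delaying the start of the grant, records both that the permission exists and that it is temporarily withheld, which is the distinction an embargo needs.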

The PREMIS Editorial Committee is also working on a major revision, version 3.0. It will change the data model: Intellectual Entity will become another level of Object instead of a separate entity, and "Environment" will become an entity in its own right rather than being tied to an Object. This version will not be available until at least April 2012. Because the committee has received several requests to implement the rights changes sooner, it is proposing to prepare a version 2.2 early in 2012 that includes them. Other possible changes are not compatible with the existing data dictionary, for example renaming licenseIdentifier and its components to licenseDocumentationIdentifier to better convey its use. A revision of the Rights entity is available on the PIGPEN wiki page. We hope it will be included in the next major revision (PREMIS 3.0), which is expected in April 2012.

One thing to keep in mind is that PREMIS is designed as a preservation metadata format, which means that ETDs are no different from other digital objects at the archiving level. Different repositories and systems implement different workflow and submission models, so some of the metadata identified here are already being collected by repositories and some are not. In either case, decisions on how to implement the recommended additions or enhancements rest entirely with the repositories themselves and will depend on various internal and external factors.


Other Institutions' Practices

Metadata Approach to ETDs' Lifecycle Management

Supporting Submission System Requirements

  • Restrictions / Embargoes
  • Copyright

Facilitating Access

  • Multiple-format access
  • Integrating parts

Facilitating Preservation Activities

  • Providing documentation

Issues and Considerations

Who creates metadata

  • Authors (students)
  • Graduate School / Graduate Readers
  • ProQuest/Libraries/... (Librarians/ Information professionals)
  • Auto (extraction)
  • Users (post-ingest)

Metadata Quality

  • Local vs. Global
  • Pre- vs. Post-ingest

Managing metadata as a resource

  • Indexing/abstracting services
  • Metadata Analysis tools

Other Factors/Issues to be considered

Master Metadata Element List for ETDs

  • The profile will serve as a set of goals or benchmarks (based on best practices) and will include details such as:
    • Definition and notes about elements
    • The source or form of content
      • use of authority lists





Authorities and Vocabularies

Digital Preservation

Best Practices and Guidelines Resources

Other Relevant Links

Tools and Software

  • PREMIS in METS Toolbox: http://pim.fcla.edu/
  • DAITSS (Dark Archive in the Sunshine State): http://daitss.fcla.edu/
    • DAITSS is a digital preservation repository application developed by the Florida Center for Library Automation (FCLA) with some support from the IMLS.
  • TIPR Repository Exchange Package: wiki.fcla.edu:8000/tiPr


Controlled Vocabularies for ETDs
