Talk:Lifecycle Management Tools
This is a workspace for draft and final documents related to this project deliverable.
About the Lifecycle Management Tools
The project has aimed to develop and disseminate a set of software tools that address specific needs in managing ETDs throughout their lifecycle. These tools were intended to be completely modular micro-services, i.e. single-function standalone services that can be used alone or incorporated into larger repository systems. Micro-services for digital curation functions are a growing approach to system integration pioneered by the California Digital Library and the Library of Congress, and subsequently adopted by the University of North Texas, Chronopolis, MetaArchive, and other digital preservation repositories.
The micro-services listed below have been viewed as relatively simple to construct, because they work primarily by calling existing open source software tools.
- ETD Format Recognition
- PREMIS Metadata Event Record-keeping
- Virus Checking
- Digital Drop Box with Metadata Submission Functionality
Over the course of the research for this project it was determined that two of the above micro-services (ETD Format Recognition & Virus Checking) needed no additional scripting to integrate with the repository systems in use, and would work best as standalone utilities coupled with project-produced documentation on applying them to ETDs. See below for further details.
The PREMIS Metadata Event Record-keeping micro-service is undergoing improvements to its API at UNT and being packaged and documented for deployment as a standalone micro-service.
The Digital Drop Box with Metadata Submission Functionality is being researched in the context of two existing open-source applications (Archivematica & DataStage).
All Lifecycle Management Tools are scheduled for dissemination and assessment beginning in Spring 2014 according to the Revised Project Workplan.
Functional Requirements for ETD Micro-Services
Each of the Lifecycle Management Tools aims to serve as a standalone micro-service that can be called via command line or script interfaces, so that it can be easily integrated into existing environments in a modular way. Each micro-service will have clear documentation that enables implementers to deploy the tool in their own setting. The intent of researching, developing and documenting these four micro-services is to enhance existing repository systems being used for ETDs, which often lack simple mechanisms for these functions.
The micro-service packages produced in the course of this project will include the following tools:
ETD Format Recognition Service
Accurate identification of ETD component format types is an important step in the ingestion process, especially as ETDs become more complex. This micro-service should:
- enable batch identification of ETD files through integration of function calls from tools like JHOVE2, DROID, and other format identification toolkits; and
- structure micro-service output in ad hoc tabular formats for importation into repository systems used for ETDs such as DSpace and the ETD-db software, as well as preservation repository software such as iRODS and DAITSS and preservation network software such as LOCKSS.
The project has successfully researched all of the primary format recognition utilities in current use, including JHOVE/2, DROID, FITS and even the UNIX file command. Based on our research of these tools, a thorough analysis of integrations with existing repository systems, and interviews with numerous and diverse ETD programs, project work has shifted to documenting the proper usage of these tools in an ETD Program context.
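As an illustration of the batch identification and tabular output described above, the following is a minimal sketch that wraps the UNIX file command. It is not project code: the function names (identify_with_file, identify_batch, rows_to_csv) and the two-column CSV layout are assumptions made for this example, and the identify parameter is there so that a wrapper around DROID, FITS or JHOVE2 could be substituted.

```python
import csv
import io
import subprocess
from pathlib import Path


def identify_with_file(path):
    """Ask the UNIX `file` command for a MIME type (assumes `file` is on PATH)."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def identify_batch(paths, identify=identify_with_file):
    """Identify every file in `paths`; `identify` can be swapped for another tool."""
    rows = []
    for path in paths:
        try:
            fmt = identify(Path(path))
        except (OSError, subprocess.CalledProcessError) as exc:
            # Keep going so one unreadable file does not abort the whole batch.
            fmt = f"ERROR: {exc}"
        rows.append({"path": str(path), "format": fmt})
    return rows


def rows_to_csv(rows):
    """Serialize the results as a simple two-column table (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["path", "format"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A call such as rows_to_csv(identify_batch(Path("etd-package").rglob("*.*"))) would produce a table for one ETD package; the directory name is only a placeholder, and the columns would need to be mapped to whatever import format the target repository expects.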
PREMIS Metadata Event Record-Keeping Service
One gap highlighted in the needs analysis was the lack of simple PREMIS metadata and event record keeping tools for ETDs. This micro-service needs to:
- generate PREMIS Event semantic units to track a set of transitions in the lifecycle of particular ETDs using parameter calls to the micro-service; and
- provide profile conformance options and documentation on how to use the metadata in different ETD repository systems.
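To make the idea of generating PREMIS Event semantic units from parameter calls more concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the UNT service or its API: the function names, the "UUID" and "local" identifier types, and the choice of the PREMIS 2.x namespace are all assumptions for this example, and the output is not validated against the PREMIS schema.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: PREMIS 2.x namespace; adjust if a different version is targeted.
PREMIS_NS = "info:lc/xmlschema/premis-v2"


def _child(parent, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        element.text = text
    return element


def build_premis_event(event_type, object_id, outcome, detail="", when=None, event_id=None):
    """Build one PREMIS event element describing a lifecycle transition."""
    ET.register_namespace("premis", PREMIS_NS)
    event = ET.Element(f"{{{PREMIS_NS}}}event")

    identifier = _child(event, "eventIdentifier")
    _child(identifier, "eventIdentifierType", "UUID")
    _child(identifier, "eventIdentifierValue", event_id or str(uuid.uuid4()))

    _child(event, "eventType", event_type)
    when = when or datetime.now(timezone.utc)
    _child(event, "eventDateTime", when.isoformat())
    if detail:
        _child(event, "eventDetail", detail)

    outcome_info = _child(event, "eventOutcomeInformation")
    _child(outcome_info, "eventOutcome", outcome)

    # Link the event to the ETD file it describes.
    linking = _child(event, "linkingObjectIdentifier")
    _child(linking, "linkingObjectIdentifierType", "local")
    _child(linking, "linkingObjectIdentifierValue", object_id)
    return event


def event_to_xml(event):
    """Serialize the event element to an XML string."""
    return ET.tostring(event, encoding="unicode")
```

For example, build_premis_event("virus check", "etd-0001/thesis.pdf", "pass") would record that a hypothetical file passed a scan; the event type vocabulary and identifier scheme would in practice come from the profile an ETD program adopts.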
Virus Checking Service
Virus checking is an obvious service needed in ETD programs, as students’ work is often infected unintentionally with computer viruses. This micro-service should:
- provide the capability to check ETD component files using the ClamAV open source email gateway virus checking software;
- record results of scans using the PREMIS metadata event tracking service; and
- be designed such that other anti-virus tools can be called with it.
ClamAV was closely researched for any needed scripting and improvements, along with investigations into its potential for systematic integration with various repository systems. No special scripting is needed, and this utility is best documented as a standalone service used in conjunction with various ETD workflows prior to deposit in an IR. The pass/fail result of this service can be recorded and used by the PREMIS Event Service. No other open-source anti-virus service was found to be as robust as, or preferable to, ClamAV.
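The pass/fail recording mentioned above could look roughly like the following sketch, which runs ClamAV's clamscan command on one file and maps its exit code to an outcome string. The function names and the result dictionary are assumptions made for this illustration. The mapping relies on clamscan's documented exit codes (0 for no virus found, 1 for virus found, 2 for errors); another scanner passed in through the scanner parameter would need its own mapping.

```python
import subprocess

# clamscan exit codes: 0 = no virus found, 1 = virus(es) found, 2 = error.
CLAMSCAN_OUTCOMES = {0: "pass", 1: "fail"}


def outcome_from_exit_code(code):
    """Translate a scanner exit code into pass / fail / error."""
    return CLAMSCAN_OUTCOMES.get(code, "error")


def scan_file(path, scanner=("clamscan", "--no-summary")):
    """Scan one file and return a small record suitable for event logging.

    `scanner` is the command prefix; the file path is appended as the last
    argument. Assumes the scanner executable is installed and on PATH.
    """
    completed = subprocess.run(
        [*scanner, str(path)],
        capture_output=True,
        text=True,
    )
    return {
        "path": str(path),
        "outcome": outcome_from_exit_code(completed.returncode),
        "detail": completed.stdout.strip() or completed.stderr.strip(),
    }
```

The returned outcome value is the piece that would be handed to whatever records the PREMIS event for the scan.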
Digital Drop Box (with metadata submission functionality)
This micro-service addresses a frequently sought function: a simple way for users to deposit ETDs into a remote location via a web form that gathers the submission information requested by the ETD program. The micro-service should:
- generate PREMIS metadata for the ETD files deposited;
- have the capacity to replicate the deposited content securely upon ingest into additional locations by calling other Unix tools such as rsync; and
- record this replication in the PREMIS metadata.
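The replication and record-keeping steps in the list above can be sketched as follows. This is a hypothetical illustration, not a description of Archivematica, DataStage or any project code: the function names, the record fields and the example paths are assumptions, and the rsync options shown (--archive, --checksum) are one reasonable choice among several. Secure transfer is assumed to come from using remote-shell destinations of the form host:/path.

```python
import subprocess


def build_rsync_command(source_dir, destination):
    """Build the rsync invocation for one replica.

    The trailing slash on the source copies the directory's contents
    rather than the directory itself.
    """
    return [
        "rsync",
        "--archive",
        "--checksum",
        source_dir.rstrip("/") + "/",
        destination,
    ]


def replicate(source_dir, destinations, run=subprocess.run):
    """Copy a deposit to each destination and return one record per copy.

    `run` is injectable so the function can be exercised without rsync;
    by default it executes the command (assumes rsync is on PATH).
    """
    records = []
    for destination in destinations:
        completed = run(build_rsync_command(source_dir, destination))
        records.append(
            {
                "eventType": "replication",
                "destination": destination,
                # rsync exits with 0 on success and non-zero on any failure.
                "outcome": "success" if completed.returncode == 0 else "failure",
            }
        )
    return records
```

Each returned record holds the information that would go into a PREMIS replication event for the deposited ETD.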
Use cases for a simple submission drop box for ETDs have proven difficult to research in the project due to the large number of deposits that originate with ProQuest's UMI. In the course of our research we have been directed by institutions to spend time with two existing technologies that would fulfill the needs for this service and which have potential for adoption by various ETD programs. These two technologies are:
- Archivematica: https://www.archivematica.org/wiki/Main_Page
- DataStage
The project team will have more to report on our experiments with modeling ETD workflows using these two applications during the Spring/Summer of 2014.