Virus Checking Documentation for ETDs
This is a placeholder page for instructional documentation on how ETD programs can apply virus-checking services such as ClamAV within standard workflows that lead to an ETD's deposit in an institutional repository. Per the Revised Project Workplan, dissemination and assessment of this virus-checking micro-service are scheduled for Spring 2014.
ClamAV is an open-source software suite for detecting viruses and other malicious threats in files.
The package includes a code library (for software integrations) and a background daemon capable of monitoring a system, but the simplest approach is to use the included clamscan command-line tool. The clamscan tool can scan individual files or archives, and it prints a simple, easy-to-parse result line for each file passed to it.
- Platforms: Linux/BSD, Windows
- Project Homepage
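Besides its printed message, clamscan reports its result through its exit status: 0 when no virus is found, 1 when one or more viruses are found, and 2 when an error occurs. A script can therefore turn a scan into a simple pass/fail decision. The following is a minimal sketch; the function name `scan_status` and the PASS/FAIL/ERROR labels are illustrative choices, not part of ClamAV.

```shell
# scan_status FILE
# Hypothetical helper: runs clamscan on one file and prints PASS, FAIL, or
# ERROR according to clamscan's exit status
# (0 = no virus found, 1 = virus found, 2 = error).
scan_status() {
    status=0
    # --no-summary suppresses the trailing SCAN SUMMARY block; the per-file
    # line is discarded here because only the exit status is needed.
    clamscan --no-summary "$1" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo PASS ;;
        1) echo FAIL ;;
        *) echo ERROR ;;
    esac
}
```

For example, `scan_status thesis.pdf` prints PASS for a clean file. Relying on the exit status rather than parsing text keeps the check independent of how signature names are worded.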
Use Case for ETDs
Because ETDs and their supplemental files pass through a number of electronic processes and production workspaces before submission and deposit, virus scanning is an important step in the overall curation lifecycle. Infected files can damage their own contents as well as any computing systems used to store, maintain, or disseminate them. Identifying and removing viruses and other malware from these files helps keep the ETD repository stable and reputable.
ClamAV is one of the most widely used open-source virus scanning tools. It is best used as a standalone utility at or near the point of initial submission and, where possible, should be incorporated into the online system that handles author submissions. ClamAV also provides a C library, libclamav, that can be used to add scanning programmatically to an open-source ETD submission system.
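Where a submission system can run an external command on each upload, a thin wrapper around clamscan is one way to act on the result before a file reaches staff. The sketch below assumes such a hook exists; the function name `accept_submission` and the queue and quarantine directories are hypothetical. Scanning in-process through the C library would instead depend on the submission system's own language and architecture.

```shell
# accept_submission FILE QUEUE_DIR QUARANTINE_DIR
# Hypothetical upload hook: scans one submitted ETD file with clamscan.
#   clean (exit 0)    -> moved into QUEUE_DIR for staff review
#   infected (exit 1) -> moved into QUARANTINE_DIR, kept out of the workflow
#   error (exit 2)    -> left in place and reported on stderr
# Returns clamscan's exit status so the caller can reject the upload.
accept_submission() {
    file=$1
    queue_dir=$2
    quarantine_dir=$3
    status=0
    clamscan --no-summary "$file" || status=$?
    case $status in
        0) mv "$file" "$queue_dir"/ ;;
        1) mv "$file" "$quarantine_dir"/ ;;
        *) echo "clamscan could not scan $file; left in place" >&2 ;;
    esac
    return "$status"
}
```

Quarantining rather than deleting keeps the infected file available for staff to contact the author or inspect the problem.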
In many cases, graduate school staff and their dedicated campus IT and technical services are the first line of defense against infected files entering the deeper stages of an ETD program's workflow. For that reason, this project is prioritizing these stakeholders as it prepares instructional documentation for applying ClamAV to ETDs as they enter the approval queue. The documentation will cover scanning both batches of ETD files and single files.
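For batch scenarios, such as a directory holding one semester's submissions, clamscan can scan recursively (`-r`) and print only infected files (`-i`). A minimal sketch, assuming the batch sits in a single directory tree; `scan_batch` is a hypothetical name, and the parsing assumes the `path: Signature FOUND` line format shown in the command-line examples on this page.

```shell
# scan_batch DIR
# Hypothetical helper: recursively scans a directory of ETD files and prints
# the path of each infected file, one per line. Returns clamscan's exit
# status: 0 = all clean, 1 = something was found, 2 = an error occurred.
scan_batch() {
    status=0
    # -r recurses into subdirectories, -i prints only infected files,
    # --no-summary drops the trailing SCAN SUMMARY block.
    output=$(clamscan -r -i --no-summary "$1") || status=$?
    # Each hit looks like "path: Signature FOUND"; keep only the path.
    printf '%s\n' "$output" | sed -n 's/: [^ ]* FOUND$//p'
    return "$status"
}
```

A single file is simply a batch of one, so the same exit-status convention applies to both scenarios.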
This project will also document how graduate school staff and their IT or technical services can make ClamAV's results for ETD files available to stakeholders later in the curation workflow, namely the library, which is typically responsible for packaging ETD files and metadata for archival storage and access. In particular, the documentation will describe how to record ClamAV as an agent, and the pass/fail result of a virus check as an event, in a preservation metadata schema such as PREMIS. These agent and event records could then be queried through tools such as the PREMIS Event Service.
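To hand results on to the library, a virus check can be serialized as a PREMIS event. The sketch below prints a standalone PREMIS 3 `event` element with ClamAV as the linked agent and `pass` or `fail` as the outcome. The function names, the `local` identifier types, the event identifier scheme, and the `executing program` agent role are placeholder assumptions; a repository would substitute its own identifiers and controlled vocabularies.

```shell
# Hypothetical helpers for recording a ClamAV virus check as PREMIS metadata.

# xml_escape TEXT: escape &, <, > so TEXT can be placed inside an XML element.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# premis_virus_event OBJECT_ID OUTCOME
# Prints a standalone PREMIS 3 <event> for one virus check. OUTCOME must be
# "pass" or "fail". Identifier types ("local"), the event identifier scheme,
# and the agent role are placeholders for a repository's own conventions.
premis_virus_event() {
    case $2 in
        pass|fail) outcome=$2 ;;
        *) echo "premis_virus_event: outcome must be pass or fail" >&2; return 2 ;;
    esac
    object_id=$(xml_escape "$1")
    when=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    # "clamscan --version" prints "ClamAV <engine>/<signatures>/<date>";
    # the part before the first "/" identifies the agent.
    agent=$(xml_escape "$(clamscan --version | cut -d/ -f1)")
    cat <<EOF
<premis:event xmlns:premis="http://www.loc.gov/premis/v3">
  <premis:eventIdentifier>
    <premis:eventIdentifierType>local</premis:eventIdentifierType>
    <premis:eventIdentifierValue>virus-check-$when</premis:eventIdentifierValue>
  </premis:eventIdentifier>
  <premis:eventType>virus check</premis:eventType>
  <premis:eventDateTime>$when</premis:eventDateTime>
  <premis:eventOutcomeInformation>
    <premis:eventOutcome>$outcome</premis:eventOutcome>
  </premis:eventOutcomeInformation>
  <premis:linkingAgentIdentifier>
    <premis:linkingAgentIdentifierType>local</premis:linkingAgentIdentifierType>
    <premis:linkingAgentIdentifierValue>$agent</premis:linkingAgentIdentifierValue>
    <premis:linkingAgentRole>executing program</premis:linkingAgentRole>
  </premis:linkingAgentIdentifier>
  <premis:linkingObjectIdentifier>
    <premis:linkingObjectIdentifierType>local</premis:linkingObjectIdentifierType>
    <premis:linkingObjectIdentifierValue>$object_id</premis:linkingObjectIdentifierValue>
  </premis:linkingObjectIdentifier>
</premis:event>
EOF
}
```

Recording the engine version as the agent lets later stakeholders tell which ClamAV release performed a given check. In practice the timestamp-based event identifier would likely be replaced by a UUID or a repository-assigned identifier.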
There are also less automated, less metadata-centric ways to document that a virus check was performed on an ETD and that the file passed. These approaches will also be covered in this instruction set.
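A lighter-weight alternative is a plain-text log that staff can read or attach to an ETD record. This sketch appends one tab-separated line per check (UTC time, ClamAV version, and clamscan's result line). The function name and log layout are assumptions for illustration, not a ClamAV feature; clamscan's own `--log=FILE` option, which saves its scan report to a file, is another possibility.

```shell
# log_virus_check FILE LOGFILE
# Hypothetical helper: scans FILE with clamscan and appends one tab-separated
# line to LOGFILE: UTC timestamp, ClamAV engine version, and clamscan's result
# line (e.g. "thesis.pdf: OK"). Returns clamscan's exit status.
log_virus_check() {
    when=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    # Keep only the part of "clamscan --version" before the first "/".
    version=$(clamscan --version | cut -d/ -f1)
    status=0
    result=$(clamscan --no-summary "$1") || status=$?
    printf '%s\t%s\t%s\n' "$when" "$version" "$result" >> "$2"
    return "$status"
}
```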
Download and Install
Using the Command Line Application
Here is an example with a single archive:
$ clamscan malware.zip
malware.zip: Worm.Mydoom.U FOUND
And an example with multiple files:
$ clamscan /tmp/test/*
/tmp/test/removal-tool.exe: Worm.Sober FOUND
/tmp/test/md5.o: OK
/tmp/test/blob.c: OK
/tmp/test/message.c: OK
/tmp/test/error.hta: VBS.Inor.D FOUND
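Because clamscan prints one result line per file, its output is easy to post-process. A minimal sketch that tallies clean and infected files from clamscan output on standard input; the function name `summarize_scan` is hypothetical, and any line that is not an `: OK` or ` FOUND` result (for example the default SCAN SUMMARY block) is ignored.

```shell
# summarize_scan
# Hypothetical helper: reads clamscan's per-file output on stdin and prints
# "<clean> clean, <infected> infected". Lines ending in ": OK" count as clean,
# lines ending in " FOUND" count as infected; all other lines are ignored.
summarize_scan() {
    awk '
        / FOUND$/ { infected++ }
        /: OK$/   { clean++ }
        END { printf "%d clean, %d infected\n", clean, infected }
    '
}
```

Given the multi-file output above, `clamscan /tmp/test/* | summarize_scan` would print `3 clean, 2 infected`.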