The XML Family
of Technologies
Responses received indicate that such a service would be very attractive. To develop a
viable COVAX service, therefore, it is felt that the best solution would be the creation of
a scalable consortium, consisting of representatives from the institutions and organisations
using the service, plus partners delivering and developing technical solutions as well as
local training and administration. This scenario will be even more attractive if, in the
future, it is possible to populate the technological solution produced by COVAX with
more archival content and modules containing, for example, e-learning, inter-institution
loans, and administration information.
COVAX ran for twenty-four months, from January 2000 to December 2001, at a total
cost of
The project Coordinator was Residencia de Estudiantes (Spain),
together with eight partner organisations from five EU member states.
An Archive – XML for Document Storage
A public sector archive wishes to store internal documents in a database, allowing
them to be reformatted and printed or displayed in different ways for different audiences.
The archive staff are concerned about the permanence and authenticity of the documents,
and need a cheap and effective means of ensuring both as part of an ongoing
records management strategy.
The increase in internal communications has made it urgent for the archive to
hold its documents in a structured fashion. Automatically generated metadata is stored
together with the documents in a database. Since it is created at the same time as the
document to which it corresponds, this metadata is as full and as accurate as can reasonably
be expected. The initial documents are created in Word format and are then converted to
XML, from which they can be converted to other formats, including HTML for Web
delivery. The XML version is maintained as the definitive copy for archival purposes,
although the original Word documents are also stored as a backup. This security measure,
it should be stressed, is by no means a permanent solution, as the Word files must be
converted into the latest version repeatedly as Microsoft updates its product.
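The idea of keeping automatically generated metadata alongside each XML document can be illustrated with a minimal sketch. It assumes an SQLite database and a hypothetical `documents` table; the column names and the `store_document` helper are illustrative only and are not taken from the archive described here.

```python
import sqlite3
from datetime import datetime, timezone


def store_document(conn, title, author, xml_text):
    """Store an XML document together with metadata generated at ingest time.

    The table layout is a hypothetical example, not the archive's schema.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               author TEXT NOT NULL,
               created_utc TEXT NOT NULL,
               byte_length INTEGER NOT NULL,
               xml TEXT NOT NULL
           )"""
    )
    # Metadata is produced at the same moment the document is stored,
    # so it cannot drift away from the content it describes.
    created = datetime.now(timezone.utc).isoformat()
    byte_length = len(xml_text.encode("utf-8"))
    cur = conn.execute(
        "INSERT INTO documents (title, author, created_utc, byte_length, xml) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, author, created, byte_length, xml_text),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    doc_id = store_document(conn, "Budget memo", "J. Smith",
                            "<memo><title>Budget memo</title></memo>")
    print(conn.execute("SELECT id, title, created_utc FROM documents").fetchall())
```

A production archive would more likely use a dedicated XML or records-management database, but the principle of writing document and metadata in one step is the same.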
Converting the RTF data (Word documents saved as Rich Text Format) into XML can be
done using a program such as MajiX, an open-source package developed by TetraSix for
converting documents between Microsoft Word and XML.
If the archive wishes to disseminate its documents externally,
a utility such as XMLMill can be used to convert the stored XML into the non-editable
PDF format.
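To show how a single stored XML copy can be repurposed for Web delivery, the following sketch converts a small XML memo into HTML using only the Python standard library. The `memo`, `title`, and `para` element names and the `memo_to_html` function are assumptions made for illustration; the output of MajiX or the archive's own schema may look quite different.

```python
import xml.etree.ElementTree as ET


def memo_to_html(xml_text):
    """Convert a simple XML memo into an HTML page.

    Assumes a hypothetical structure: <memo><title/><para/>...</memo>.
    """
    src = ET.fromstring(xml_text)
    title = src.findtext("title", default="Untitled")

    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title

    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    # Each source paragraph becomes one HTML paragraph.
    for para in src.findall("para"):
        ET.SubElement(body, "p").text = para.text or ""

    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    sample = ("<memo><title>Budget</title>"
              "<para>First paragraph.</para>"
              "<para>Second paragraph.</para></memo>")
    print(memo_to_html(sample))
```

Because the XML copy stays untouched, the same source could feed other renderings (print layouts, PDF) through separate converters.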
The documents can therefore be ingested, stored, delivered, and repurposed
from a central data storage space. An XML-compatible format should ensure the data's
longevity as well as any rival solution. Of course, XML is not a complete preservation
strategy in its own right, so the archive staff will still have to cope with such issues as
ensuring the authenticity, security, and integrity of the documents.
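One common way to address the integrity part of this problem is to record a cryptographic checksum when a document is ingested and to recompute it later. The sketch below uses SHA-256 from the Python standard library; the `fingerprint` and `is_unchanged` names are hypothetical, and the technique is offered as an illustration rather than as the archive's actual practice. A checksum can show that a file has changed, but it does not by itself establish who created the file or whether it is authentic.

```python
import hashlib


def fingerprint(xml_text):
    """Return the SHA-256 digest of the document's UTF-8 bytes as hex."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def is_unchanged(xml_text, recorded_digest):
    """Check a document against the digest recorded when it was ingested."""
    return fingerprint(xml_text) == recorded_digest


if __name__ == "__main__":
    original = "<memo><title>Budget</title></memo>"
    digest = fingerprint(original)  # stored alongside the document at ingest
    print(is_unchanged(original, digest))
    print(is_unchanged(original.replace("Budget", "Budgets"), digest))
```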
Partners were Angewandte Informationstechnik Forschungsgesellschaft mbH and Salzburg
(Austria), Blekinge Tekniska Högskola (Blekinge Institute of Technology, Sweden), Software
AG España, S.A., Universitat Oberta de Catalunya and Biblioteca de Menéndez Pelayo (Spain),
(London and South Eastern Library Region, UK), and ENEA (Italian National Agency for New
Technology, Energy and the Environment, Italy).
87 More detailed information on the process can be found at