DigiCULT.Info 32
long-term project, it is important to ensure that the comments provide a suitable explanation of the code. Could a programmer without prior knowledge of the project pick up the work in five years' time and continue working on it? A primary complication in digital preservation is a lack of documentation, for instance for format specifications and ambiguous code. A preservation tool must not be allowed to contain or produce similar difficulties.
The XML structure is, very generally, defined as follows. A <package> comprises <meta-data> and <content> fields; alternatively, it can contain further <package> entities. Zero or more <meta-data> fields can be included, and entries can conform to many metadata schemas. A <content> field contains the primary information to be preserved. In some cases, the <content> will contain a <base64>-encoded version of the binary file. Base64 is a content transfer encoding standard, used here to allow the object to be stored in a non-binary form. In other situations, a normalised hierarchical XML file will be stored. For example, a <database> may contain multiple <tables>, each of which adheres to a certain <dataset>.
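
The package structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names <package>, <meta-data>, <content> and <base64> come from the description above, but the metadata field name (title), the helper functions and the sample values are hypothetical and are not taken from the XENA schema.

```python
import base64
import xml.etree.ElementTree as ET


def build_package(metadata_blocks=(), payload=None, children=()):
    """Build a <package> element (illustrative layout, not the real XENA schema).

    metadata_blocks: iterable of dicts; each dict becomes one <meta-data> field.
    payload:         raw bytes to preserve, stored as <content><base64>.
    children:        nested <package> elements.
    """
    package = ET.Element("package")
    for block in metadata_blocks:
        meta = ET.SubElement(package, "meta-data")
        for name, value in block.items():
            ET.SubElement(meta, name).text = value
    if payload is not None:
        content = ET.SubElement(package, "content")
        encoded = ET.SubElement(content, "base64")
        # Base64 turns arbitrary bytes into plain ASCII text.
        encoded.text = base64.b64encode(payload).decode("ascii")
    package.extend(children)
    return package


def extract_payload(package):
    """Return the original bytes of a package's own <content>, or None."""
    node = package.find("content/base64")
    return None if node is None else base64.b64decode(node.text)


# A record holding a binary object, wrapped in an outer package.
record = build_package([{"title": "Annual report"}], payload=b"\x00\x01binary bytes")
outer = build_package([{"title": "Transfer 2003"}], children=[record])
xml_text = ET.tostring(outer, encoding="unicode")
```

The outer package here carries only metadata and a nested package, while the inner one carries the encoded object, which mirrors the two alternatives the description allows.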
At the present time XENA normalises eight file formats. It is expected that over time additional modules will be added to enable a wider variety of formats to be handled. Currently, the NAA expect that 80% of their records will be contained within these eight formats. If additional institutions were to incorporate the tool in their own preservation strategy, the Open Source development and documentation methodologies would assist the production of the required additional normalising and transformation modules.
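
One way to picture how additional format modules might slot in is a small registry that maps a file type to a normalising function. The sketch below is hypothetical throughout: the registry, the decorator, the .txt module and the <text> element are assumptions made for illustration and do not describe how XENA itself organises its modules.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical registry: lower-case file extension -> normaliser function.
NORMALISERS = {}


def normaliser(*extensions):
    """Register the decorated function as the module for the given extensions."""
    def register(func):
        for ext in extensions:
            NORMALISERS[ext.lower()] = func
        return func
    return register


@normaliser(".txt")
def normalise_text(data):
    """Example module: wrap UTF-8 text in a <content> element."""
    content = ET.Element("content")
    ET.SubElement(content, "text").text = data.decode("utf-8")
    return content


def normalise(filename, data):
    """Dispatch to the registered module, or report the format as unsupported."""
    ext = os.path.splitext(filename)[1].lower()
    func = NORMALISERS.get(ext)
    if func is None:
        raise ValueError(f"no normaliser registered for {filename!r}")
    return func(data)
```

Under this arrangement, an institution adding support for a new format would only need to write and register one more function.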
Converting into institution-specific formats is of concern, as this may force a persistent reliance on XENA. The current implementation of XENA also relies on OpenOffice (http://www.openoffice.org/) to normalise binary-object documents. Should an external program be used over which the developers maintain no control? The XML-based format and open XML schemas used will hopefully allow the development of alternative software utilities to interpret them. The cynic in me does not believe that one software package will be maintainable indefinitely, and therefore this approach will be beneficial in the long term. However, if a software package will itself become obsolete, the question is raised: why perform this normalising work at all? The NAA say that, rather than providing indefinite access, they aim to maintain access to the records for twenty years. This is a goal which I believe is feasible and allows appropriate initial solutions to the problem which can be extended as our understanding of the issues develops.
XENA is implemented in Java, with comments included within the code. As with any
There are currently few practical solutions to the problems of format obsolescence faced by the digital preservation community. As discussed in Issue 4 of DigiCULT.Info with reference to the BBC Domesday project, inaccessible information represents not only a great waste of time and money, but the potentially permanent loss of our cultural heritage.[14] To help safeguard against this danger, the National Archives of Australia (http://www.naa.gov.au/) have produced XENA (XML Electronic Normalising of Archives),[15] an Open Source tool designed to respond to the specific requirements of the institution. This approach is attractive: although research into the broader topics in this field is required and under way, smaller, manageable and practicable institutional solutions will not only inform this research but will take steps now to help ensure the adequate preservation of our digital heritage.
The Open Source development of the tool allows and assists a wider development community base, which produces an increased number of ideas and methods, and enables a variety of institutions to freely develop additional modules for their specific institutional needs. The approach taken by the NAA is to normalise a record, capturing its inherent `essence'. Normalisation is the conversion of the source object from the original format into an XML-based archival format.[16] Upon retrieval, a transformation converts the XML into an appropriate format for viewing. This one-step procedure prevents the repetitive cycle of migration.[17]
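
The two stages, normalising once on ingest and transforming on retrieval, can be illustrated with a minimal Python sketch. It assumes a CSV file as the source object and plain text as the viewing format; the element names (<table>, <row>, <field>) and both function names are hypothetical and are not drawn from XENA.

```python
import csv
import io
import xml.etree.ElementTree as ET


def normalise_csv(raw_bytes, table_name):
    """Normalisation: convert the source object (CSV bytes) to archival XML, once."""
    rows = list(csv.reader(io.StringIO(raw_bytes.decode("utf-8"))))
    header, records = rows[0], rows[1:]
    table = ET.Element("table", name=table_name)
    for record in records:
        row = ET.SubElement(table, "row")
        for column, value in zip(header, record):
            ET.SubElement(row, "field", name=column).text = value
    return ET.tostring(table, encoding="unicode")


def transform_for_viewing(archival_xml):
    """Transformation: render the archival XML as plain text at retrieval time."""
    table = ET.fromstring(archival_xml)
    lines = []
    for row in table.findall("row"):
        fields = row.findall("field")
        lines.append(", ".join(f"{f.get('name')}={f.text or ''}" for f in fields))
    return "\n".join(lines)


source = b"id,title\n1,Annual report\n2,Minutes\n"
archived = normalise_csv(source, "records")   # done once, when the record is ingested
view = transform_for_viewing(archived)        # done whenever the record is retrieved
```

Only the transformation step would need to change if the preferred viewing format changed; the archived XML itself stays untouched, which is the sense in which the repeated cycle of migration is avoided.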
XENA: Electronic Normalising Tool
Adam Rusbridge, ERPANET Digital Preservation Technical Analyst
[14] See `Overcoming the dangers of technological obsolescence: Rescuing the BBC Domesday project' and `Techniques and Challenges of emulating obsolete software'. Interview with Paul Wheatley in DigiCULT.Info, Issue 4, August 2003, http://www.digicult.info/downloads/digicult_newsletter_issue4_lowres.pdf
[15] To view project files, visit Project:XENA at http://sourceforge.net/projects/xena/
[16] Cf. http://www.naa.gov.au/recordkeeping/preservation/digital/xml_data_formats.html
[17] These ideas are discussed in more detail in `An Approach to the Preservation of Digital Records', an NAA Green Paper at http://www.naa.gov.au/recordkeeping/er/digital_preservation/Green_Paper.pdf