DigiCULT.Info 27
FUTURE DIRECTIONS
The selective approach to archiving is very labour-intensive. While PANDAS has enabled us to realise some efficiencies and to increase productivity, it still requires manual entry of metadata and processing of titles one by one.
To deal with the increasing volume of online publishing, particularly in the government sector, the Library is undertaking a pilot project to explore the possibility of automating or semi-automating the processes of identifying, selecting, describing and archiving Commonwealth government publications. To automate identification, selection and description, it is working with seven government agencies to examine the metadata they produce and to refine workflows for making it available to the National Library for inclusion in the National Bibliographic Database (NBD).[34] The aim is then to distil from this a small number of best practice models that all government agencies could use to contribute metadata to the NBD. In a second phase of the pilot project, metadata would be extracted from the NBD and batch loaded into PANDAS for archiving. Before this can be done, PANDAS will need to be further developed, and this will be undertaken as part of the development of version 3.
The main driving forces for the future development of PANDAS are:
- the need to be able to ingest a much larger volume of online publications and their associated metadata, automatically or semi-automatically;
- the need to improve discovery of and access to items in the burgeoning Archive in a variety of ways;
- the need for PANDORA to be able to interface with data acquired other than by the current selective harvesting method. This may include large amounts of publisher-supplied data, partial or whole domain harvests, and publications
JavaScript not working correctly, missing files, or broken links due to coding errors or case sensitivity.
After a site has been gathered and checked for completeness, it is prepared for public access in the Archive. A script is used to disable external links, forms and mailto addresses automatically. When users click on one of these links in the archived copy, a screen will appear giving them the option to leave the archived version of the site and access the link to the live site (if the latter still exists).
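The link-disabling step can be illustrated with a minimal Python sketch. It is not the script the Library uses, which the article does not describe in detail. The archive host name, the exit-page path and the function names are hypothetical, and a simple regular expression stands in for a real HTML parser.

```python
import re
from urllib.parse import quote, urlparse

# Hypothetical values: the article does not give the archive host or exit-page path.
ARCHIVE_HOST = "archive.example.org"
EXIT_PAGE = "/external.html?url="

# Matches double-quoted href and action attributes only (a simplification).
ATTR = re.compile(r'\b(href|action)="([^"]*)"', re.IGNORECASE)

def _rewrite(match):
    attr, target = match.group(1), match.group(2)
    parsed = urlparse(target)
    # Forms and mailto addresses are switched off entirely.
    if attr.lower() == "action" or parsed.scheme == "mailto":
        return '%s="#"' % attr
    # External links are routed through a page offering to leave the archive.
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() != ARCHIVE_HOST:
        return '%s="%s%s"' % (attr, EXIT_PAGE, quote(target, safe=""))
    # Relative links inside the archived copy are left untouched.
    return match.group(0)

def disable_external_links(html):
    """Return the page with external links, forms and mailto links neutralised."""
    return ATTR.sub(_rewrite, html)
```

Routing external links through an intermediate page, rather than deleting them, matches the behaviour described above: the user is told they are leaving the archived copy and can still choose to visit the live site.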
A public 'title entry page' is generated for new titles, or is updated when a new instance of a title is gathered. The title entry page indicates whether the title is being gathered regularly, provides a link to the publisher's live site if it still exists, lists all the archived instances of the title, provides information about software required to view it, and indicates any access restrictions. It also provides a link to the publisher's copyright statement if there is one, and to the National Library's own generic copyright warning. Subject and alphabetic title listings are automatically updated when new titles are added to the Archive.
The PANDAS software manages access restrictions on the titles that require them. Restrictions may be applied for commercial reasons, for privacy or cultural reasons, or as part of a policy decision for certain categories of material. Items can be restricted for a specified time period to use within the Library building only, or can be password protected so that designated users can obtain access. PANDAS manages these restrictions and allows only those researchers in designated locations (based on IP address) or with the required password to access the archived versions. As time periods for restrictions expire, the system automatically updates the title entry pages to indicate the changed access conditions.
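The access rules just described (a time-limited restriction, lifted for designated IP addresses or for holders of a password) can be sketched as follows. This is only an illustration under stated assumptions: the `Restriction` class, its field names and the `may_access` function are hypothetical and do not come from PANDAS, and the password is compared in plain text for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from ipaddress import ip_address, ip_network
from typing import List, Optional

@dataclass
class Restriction:
    expires: date                                   # restriction lapses on this date
    allowed_networks: List[str] = field(default_factory=list)  # designated locations
    password: Optional[str] = None                  # plain text only for illustration

def may_access(r, today, client_ip, password=None):
    """Decide whether a request may see the restricted archived copy."""
    if today >= r.expires:
        return True                                 # restriction period has ended
    addr = ip_address(client_ip)
    if any(addr in ip_network(net) for net in r.allowed_networks):
        return True                                 # request comes from a designated location
    return r.password is not None and password == r.password
```

For example, a title restricted until 1 January 2005 to the (documentation-only) network `192.0.2.0/24` would be visible to a client at `192.0.2.17` at any time, and to everyone once the expiry date is reached.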
Library's server. The software runs on a standard PC with Internet Explorer 5.5 or above.
The collection managers use PANDAS to store descriptive, administrative and preservation metadata about the Web site being archived, such as:
- descriptive information (title, URL, persistent identifier, publisher details);
- permission status (whether permission to archive was granted or denied by the publisher);
- type of resource;
- status (selected, rejected, still being monitored);
- subjects (selected from a list of broad categories);
- collection name (if the title is being included as part of a collection of sites on a particular event or topic, such as an election campaign);
- restriction information (e.g. access in-house only for a specified period);
- gathering schedule.
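As a rough illustration of how the metadata listed above might be held together for one title, here is a minimal Python sketch. The class name, field names, default values and the extra "pending" permission value are assumptions made for the example and do not reflect the actual PANDAS data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TitleRecord:
    # Descriptive information
    title: str
    url: str
    persistent_identifier: str
    publisher: str
    # Administrative information (value names are hypothetical)
    permission: str = "pending"        # "granted", "denied" or "pending"
    resource_type: str = ""
    status: str = "monitored"          # "selected", "rejected" or "monitored"
    subjects: List[str] = field(default_factory=list)
    collection: Optional[str] = None   # e.g. an election campaign collection
    restriction: Optional[str] = None  # e.g. "in-house only for a specified period"
    gathering_schedule: str = "once"   # e.g. "once", "monthly", "quarterly"

record = TitleRecord(
    title="Example Annual Report",
    url="http://www.example.org/report/",
    persistent_identifier="example-id-0001",   # placeholder, not a real identifier
    publisher="Example Agency",
    subjects=["Government"],
)
```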
The gathering schedule can specify that a site should be archived once only, or it can specify a gathering frequency (monthly, quarterly, annually, etc.). It is also possible to enter a date list so that an item is gathered on specific dates.
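The three scheduling options (once only, a fixed frequency, or an explicit date list) could be evaluated along the following lines. This is a sketch, not the PANDAS scheduler: the function names and the frequency table are hypothetical, and days are clamped to the 28th so that every computed date exists.

```python
from datetime import date

# Hypothetical frequency names mapped to a gap in months.
MONTHS = {"monthly": 1, "quarterly": 3, "annually": 12}

def add_months(d, months):
    """Shift a date forward by whole months, clamping the day to at most 28."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    return date(year, month + 1, min(d.day, 28))

def next_gather(last, frequency=None, date_list=()):
    """Return the next gathering date after `last`, or None if nothing is due."""
    if date_list:
        # An explicit date list takes precedence: pick the earliest future date.
        upcoming = sorted(d for d in date_list if d > last)
        return upcoming[0] if upcoming else None
    if frequency in MONTHS:
        return add_months(last, MONTHS[frequency])
    return None  # no frequency and no date list: the site is gathered once only
```

Giving the date list precedence over the frequency is a design choice made for this example; the article does not say how the two options interact.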
The PANDAS software allows a variety of Web site copying software to be 'plugged in' and allows the collection manager to control the scope and timing of the gathering process. As part of the gathering process, technical metadata about the file types captured is recorded by the system. This information is used to automatically generate public notes listing any plug-ins required for viewing the site.
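Generating public notes from the recorded file types might look something like the sketch below. It assumes, purely for illustration, that file types are identified by file extension and that a small lookup table maps each type to a viewer; the table contents and the wording of the note are hypothetical.

```python
import os

# Hypothetical mapping from file extension to the viewer named in the public note.
VIEWERS = {
    ".pdf": "Adobe Acrobat Reader",
    ".swf": "Macromedia Flash Player",
    ".rm": "RealPlayer",
}

def plugin_notes(file_names):
    """Return one public note per distinct viewer needed by the gathered files."""
    needed = set()
    for name in file_names:
        ext = os.path.splitext(name)[1].lower()   # compare extensions case-insensitively
        if ext in VIEWERS:
            needed.add(VIEWERS[ext])
    return ["Requires %s to view." % viewer for viewer in sorted(needed)]
```

Collecting the viewers in a set means that a site with many PDF files still yields a single note for that format.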
PANDAS currently has no automatic quality assurance feature, such as link checking. Collection managers must check the whole site manually to identify problems, such as
34 The National Bibliographic Database is a database to which approximately 1,100 Australian libraries contribute catalogue records and library holdings; it is made available through the Kinetica service (http://www.nla.gov.au/kinetica/).