DigiCULT.Info 39
ble to all metadata as well as to preview-
quality images. Access to higher resolution
images requires a licence. For digitised
maps, the MrSID technology will be
implemented by the end of 2003.
The periodicals (and, in the future, other types of documents) will
be made available in the Kramerius application, which is currently
being developed and tested by the Czech Qbizm Corporation
(http://www.qbizm.com/). Activities include migration of metadata as
well as the optimisation of images for access. In this case, the
access image format is DjVu (http://www.djvuzone.org/), into which
all the images are being converted. The application is expected to be
launched for users by the end of 2003. All the copyright-free titles
(from the eighteenth century up to 1880) will be accessible free of
charge. More recent titles will be accessible on site, where the
originals are stored.
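The access rule for periodicals can be illustrated with a minimal sketch. It assumes that "up to 1880" includes the year 1880 and that every earlier title counts as copyright-free; the names `Access`, `COPYRIGHT_FREE_UNTIL` and `access_for_title` are hypothetical and are not part of the Kramerius application.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical access modes for a digitised periodical title."""
    FREE_ONLINE = "free online"
    ON_SITE_ONLY = "on site only"


# Assumption: the cut-off year quoted in the text is treated as inclusive.
COPYRIGHT_FREE_UNTIL = 1880


def access_for_title(publication_year: int) -> Access:
    """Return the access mode for a title, based only on its publication year."""
    if publication_year <= COPYRIGHT_FREE_UNTIL:
        return Access.FREE_ONLINE
    return Access.ON_SITE_ONLY


if __name__ == "__main__":
    for year in (1795, 1880, 1925):
        print(year, access_for_title(year).value)
```

A real system would more likely decide this from rights metadata recorded for each title than from the year alone; the year-based rule only mirrors the description given here.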
These two access systems will allow us to concentrate more on digital
archival storage, which will become physically independent. The
archival system used today is the Czech AIP Safe document system, a
product of AIP Safe Ltd (http://www.aipsafe.com/).
Our digitisation programmes have grown on the basis of substantial
research and development activities. These concern both metadata and
data, and include large-scale tests of newly emerging solutions. Our
long-term goal is to create a virtual research environment based on
historical documents. For this, other types of documents and data
objects will be added, as well as TEI-processed texts of selected
manuscripts.
Up-to-date information about Czech digitisation activities is
available in English and Czech from the server at
http://digit.nkp.cz.
(http://digit.nkp.cz) for manuscripts and old printed books,
periodicals and museum objects. The DTD for digitised monographs is
becoming available and the DTD for sound documents is in preparation.
The digitised documents are archived on compact discs (Memoria
programme) and in a robotic library on magnetic tapes (Kramerius
programme), and fast access services to all of them will soon be
available via special applications.
Manuscripts and similar materials will be accessed through the
Memoria database (http://www.manuscriptorium.com). We intend the
database to become a shared catalogue for historical collections. The
bibliographic descriptions in its records should be in MASTER format;
from the beginning of 2004 UNIMARC records will also be accepted,
because a tool supporting conversion between the two formats will have
been developed by then. Interest in uploading records into this
database is international: the first international contract was signed
with the University of Wroclaw Library, Poland. Other institutions in
Poland, Slovakia and other countries have also indicated a wish to
join. The database software is based on the reliable Czech National
Bibliography Tornado application, developed and maintained by AIP
Beroun Ltd.[43]
Tornado has been extended to operate on MASTER records and to handle
the complex metadata structures of the complete digitised documents.
Furthermore, additional tools for work with images have been added.
The access images will be optimised JPEGs in preview, low and normal
quality, and 1-bit (black and white) GIFs. The normal quality JPEG is
carefully watermarked with visible letters on the image background.
JPEG 2000 (JP2) was considered for high-quality access but, because
the cleaning of background structure was too visible at certain
compression ratios, its application
was discontinued. Free access will be possi-
The current average annual production is roughly 100,000 pages of
manuscripts and 400,000 pages of periodicals. The data format used for
storage is now exclusively JPEG with varying compression qualities
(256 shades of grey in the case of periodicals). Since routine
production began, a lot of attention has been paid to the metadata
container comprising the whole digitised document. The lack of
available good practice at that time led to the creation of a
proprietary SGML-based standard, called DOBM, in 1996. The format was
in fact an enlargement of the HTML DTD with elements enabling the
description of the contents of objects, a rudimentary solution that is
now better expressed by XML. The DOBM files were HTML files containing
formatting in parallel with internal mark-up of rich content elements.
The format was very flexible and enabled the needed mark-up on all
structural levels of digitised documents, including pages, annual
volumes, periodical numbers and even articles.
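The idea of a single container that marks up every structural level can be sketched as follows. This is an illustration only: all element and attribute names (`periodical`, `volume`, `issue`, `article`, `page`, `image`) are hypothetical and are taken neither from DOBM nor from any DTD mentioned in this article.

```python
import xml.etree.ElementTree as ET


def build_container(title, year, issue_no, articles, pages):
    """Build a nested metadata container for one periodical issue.

    articles: list of (article_title, first_page, last_page) tuples
    pages:    list of image file names, in page order
    All element and attribute names are hypothetical.
    """
    periodical = ET.Element("periodical", {"title": title})
    volume = ET.SubElement(periodical, "volume", {"year": str(year)})
    issue = ET.SubElement(volume, "issue", {"number": str(issue_no)})

    # Article level: each article records the page range it occupies.
    for article_title, first, last in articles:
        ET.SubElement(issue, "article", {
            "title": article_title,
            "firstPage": str(first),
            "lastPage": str(last),
        })

    # Page level: each page points to its image file.
    for number, image_file in enumerate(pages, start=1):
        page = ET.SubElement(issue, "page", {"n": str(number)})
        ET.SubElement(page, "image", {"href": image_file})

    return periodical


container = build_container(
    "Example Gazette", 1875, 3,
    articles=[("Local news", 1, 2)],
    pages=["0001.jpg", "0002.jpg"],
)
print(ET.tostring(container, encoding="unicode"))
```

Keeping description and structure in one tree means a single file can answer questions at any level, such as which images make up a given article.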
Nowadays, new XML-based formats are available. The move to XML was
prompted not so much by the availability and universality of this
platform as by changes in content description rules since 1996,
especially the appearance of the MASTER DTD
(http://www.tei-c.org.uk/Master/Reference/) for bibliographic
description of manuscripts (in the definition of which we also
participated).
New XML Document Type Definitions have been prepared for manuscripts,
as well as other types of documents. Today, metadata migration into
new XML containers is coming to an end for manuscripts and is in full
swing for digitised periodicals. We used the redefinition of the
metadata containers to facilitate a large qualitative improvement of
metadata structures on all levels of digitised documents. The new XML
DTDs are available from our digitisation server
43 The User Manual can be found at http://www.aipberoun.cz/docs/CNB/manual_en.pdf.