32 DigiCULT
collections database were required, using terms such as Astrolabe and Octant, from different AAT hierarchies. Revisiting the issue, FACET's semantic expansion on navigation instruments was able to short-cut this process. Items from the collection indexed by sextants, astrolabes, etc. were now returned by a single query.
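As a rough illustration of the idea, the following minimal sketch expands a query term by following narrower-term (NT) and related-term (RT) links before matching index terms. The thesaurus fragment, the item identifiers and the function names are all invented for this example; they do not reproduce the AAT structure or FACET's actual algorithm.

```python
from collections import deque

# Toy thesaurus fragment: term -> list of (relationship, target term).
# Illustrative only; this is NOT the real AAT structure.
LINKS = {
    "navigation instruments": [("NT", "sextants"), ("RT", "astrolabes")],
    "sextants": [("NT", "box sextants")],
}

# Hypothetical collection: item id -> set of controlled index terms.
COLLECTION = {
    "item-1": {"astrolabes"},
    "item-2": {"sextants"},
    "item-3": {"locomotives"},
}


def expand(term, links=LINKS, follow=("NT", "RT"), max_depth=2):
    """Return the term plus all terms reachable within max_depth links."""
    expanded = [term]
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not traverse beyond the depth limit
        for relation, target in links.get(current, []):
            if relation in follow and target not in expanded:
                expanded.append(target)
                queue.append((target, depth + 1))
    return expanded


def search(term, collection=COLLECTION):
    """Return ids of items indexed by the term or any expanded term."""
    wanted = set(expand(term))
    return sorted(item for item, terms in collection.items() if terms & wanted)


print(search("navigation instruments"))
```

A single query on the broader concept then retrieves items indexed under either of the more specific terms, which is the short-cut described above.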
Our initial research aims focused on the potential of thesauri as semantic retrieval tools. However, we came to realise that many of the techniques were equally applicable as tools to assist the indexing or cataloguing process. The indexer also needs to map from the terminology employed in an object description to the controlled terminology used by the thesaurus. The indexer may wish to browse in order to explore the precise local context of a term in the thesaurus. The indexer may even wish for automatic suggestions by a thesaurus-based indexing system of likely terms to use. We reported on initial investigations that suggested some potential for semi-automatic approaches at the 2002 mda conference,[22] in the same session where Helen Ashby reported on the development of the Railway Object Names Thesaurus.[23]
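One very simple form of automatic suggestion can be sketched as a lookup from words in an object description to preferred thesaurus terms. The entry vocabulary and the function name below are hypothetical, and whole-phrase matching is an assumption made for brevity; it is not the semi-automatic approach reported at the conference.

```python
import re

# Hypothetical entry vocabulary: entry term -> preferred (controlled) term.
ENTRY_TERMS = {
    "sextant": "sextants",
    "sextants": "sextants",
    "astrolabe": "astrolabes",
    "astrolabes": "astrolabes",
}


def suggest_terms(description, entry_terms=ENTRY_TERMS):
    """Suggest preferred terms whose entry terms occur as whole words."""
    text = description.lower()
    suggestions = []
    for entry, preferred in entry_terms.items():
        pattern = r"\b" + re.escape(entry) + r"\b"
        if re.search(pattern, text) and preferred not in suggestions:
            suggestions.append(preferred)
    return suggestions


print(suggest_terms("Brass sextant with a small astrolabe, c. 1780"))
```

The indexer would remain free to accept, reject or refine each suggestion, for instance by browsing the local context of a suggested term.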
THESAURUS REPRESENTATION AND ACCESS PROTOCOLS
With the advent of a viable distributed information architecture in the form of the Web comes the prospect of a global online museum. The use of common standard representations potentially enables searching across multiple collections, effectively blurring the physical boundaries between institutions.[24] A prerequisite for this is the use of standardised protocols for query and access to content. The adoption of common standards has the benefit of enabling a logical division of effort. KOS resources, search interfaces, cataloguing/indexing/mapping tools and indexed collections using common thesaurus protocols may all be developed by separate institutions, and may be physically hosted in separate locations. This would have the effect of allowing the institutions to concentrate on their core area of specialisation, tapping into the domain knowledge of others as and when appropriate. It also reduces the potential for duplication of effort and the duplication of data.
Linda Hill and colleagues have argued for 'a general KOS service protocol from which protocols for specific types of KOS can be derived'.[25] The idea is to provide programmatic access to KOS content through the various types of (Web) services mentioned above, as opposed to thinking only of interactive human interfaces. Thus, in future a combination of thesaurus and query protocols might permit a thesaurus to be used with a choice of search tools on various kinds of database. This includes not only controlled vocabulary search applications but also collections without controlled metadata. For example, semantic query expansion services could be used with both free text and controlled vocabulary indexed collections.
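The separation of thesaurus service from search tool might be sketched as follows. The `ThesaurusService` interface, its single `expand` method and the in-memory implementation are assumptions made for illustration; they are not taken from the protocol proposed by Hill and colleagues or from any existing KOS protocol. The point is only that one expansion service can sit behind two different kinds of search.

```python
from typing import Dict, List, Protocol, Set


class ThesaurusService(Protocol):
    """Hypothetical minimal interface a thesaurus service might expose."""

    def expand(self, term: str) -> List[str]:
        ...


class InMemoryThesaurus:
    """Stand-in for a remote service, backed by a toy narrower-term table."""

    def __init__(self, narrower: Dict[str, List[str]]) -> None:
        self._narrower = narrower

    def expand(self, term: str) -> List[str]:
        # The term itself plus its direct narrower terms.
        return [term] + list(self._narrower.get(term, []))


def search_controlled(service: ThesaurusService,
                      index: Dict[str, Set[str]], term: str) -> List[str]:
    """Match expanded terms against controlled index terms."""
    wanted = set(service.expand(term))
    return sorted(item for item, terms in index.items() if terms & wanted)


def search_free_text(service: ThesaurusService,
                     documents: Dict[str, str], term: str) -> List[str]:
    """Match expanded terms as substrings of uncontrolled descriptions."""
    wanted = service.expand(term)
    return sorted(doc for doc, text in documents.items()
                  if any(t in text.lower() for t in wanted))


service = InMemoryThesaurus({"navigation instruments": ["sextants", "astrolabes"]})
index = {"obj-1": {"sextants"}, "obj-2": {"clocks"}}
documents = {"doc-1": "A case of astrolabes from the workshop",
             "doc-2": "A pocket watch"}

print(search_controlled(service, index, "navigation instruments"))
print(search_free_text(service, documents, "navigation instruments"))
```

In a distributed setting the in-memory class would be replaced by a client for a remotely hosted thesaurus, while the two search functions would stay unchanged.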
A variety of interchange format specifications for the representation and dissemination of thesaurus data have been developed and are in use today. These include tagged text formats such as the MARC21 Authority format as used by the J. Paul Getty Trust, XML-based formats such as the Zthes DTD,[26] and RDF representations such as SWAD-Europe's SKOS-Core schema.[27] In order to facilitate distributed thesaurus access, a platform-neutral access protocol should be used to manipulate thesaurus data. Protocols for retrieving thesaurus data are closely linked to thesaurus representation formats. The CERES,[28] Zthes and ADL[29] protocols are reviewed in a recent paper, which also reports on the FACET Web demonstrator.[30] The Simple Knowledge Organisation System (SKOS) API is a more recent development, which defines a core set of methods for programmatically accessing and querying a thesaurus based on the SWAD-Europe project's SKOS-Core schema.[31] The NKOS Website[32] and discussion list are good sources of information on new developments. NKOS Workshops at ECDL conferences allow discussion on KOS data exchange and standards issues. The workshop at ECDL 2003[33] included reports from the US NISO and UK BSI Thesaurus Standards Groups, which are currently considering revisions to the existing standards.
FUTURE ISSUES FOR NETWORKING KOS IN DIGITAL HERITAGE
The Web demonstrator was our first step in exploring issues underlying networked access to KOS, something we intend to pursue in future work. Results from FACET show that best-match (ranked result) approaches can be applied to KOS-based queries via semantic expansion of query terms. The Web interface also showed that semantic expansion may be employed as a browsing tool when wishing to hide some of the complexity of hierarchical structures.
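To make the link between semantic expansion and ranked results concrete, the sketch below assigns a traversal cost to each relationship type, finds the cheapest path from the query term to every term within a cost budget, and ranks items by the cost of their closest index term. The cost values, the budget, the thesaurus fragment and the collection are all assumptions chosen for illustration; FACET's own closeness measure is not specified in this section and may differ.

```python
import heapq

# Hypothetical traversal costs per relationship type (lower = closer).
COST = {"NT": 1.0, "BT": 1.5, "RT": 2.0}

# Toy thesaurus fragment; illustrative only, not the real AAT structure.
LINKS = {
    "navigation instruments": [("NT", "sextants"), ("RT", "astrolabes")],
    "sextants": [("BT", "navigation instruments"), ("NT", "box sextants")],
}

# Hypothetical collection: item id -> set of controlled index terms.
COLLECTION = {
    "item-1": {"astrolabes"},
    "item-2": {"sextants"},
    "item-3": {"box sextants"},
    "item-4": {"clocks"},
}


def expansion_costs(term, links=LINKS, costs=COST, budget=4.0):
    """Cheapest traversal cost from term to each term within the budget."""
    best = {term: 0.0}
    heap = [(0.0, term)]
    while heap:
        cost, current = heapq.heappop(heap)
        if cost > best.get(current, float("inf")):
            continue  # stale heap entry
        for relation, target in links.get(current, []):
            new_cost = cost + costs[relation]
            if new_cost <= budget and new_cost < best.get(target, float("inf")):
                best[target] = new_cost
                heapq.heappush(heap, (new_cost, target))
    return best


def rank_items(term, collection=COLLECTION, budget=4.0):
    """Rank items by the cost of their closest index term (best first)."""
    best = expansion_costs(term, budget=budget)
    scored = []
    for item, terms in collection.items():
        matched = [best[t] for t in terms if t in best]
        if matched:
            scored.append((min(matched), item))
    return [(item, cost) for cost, item in sorted(scored)]


print(rank_items("sextants"))
```

Exact matches come first, followed by items indexed with progressively more distant terms; items whose terms lie outside the budget are not returned at all.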
Existing KOS already have rich resources to offer
digital heritage Web applications, notwithstanding
[22] Tudhope, D., Binding, C., Blocks, D. and Cunliffe, D. 2002: Thesaurus based indexing and retrieval, http://www.mda.org.uk/conference2002/paper15.htm
[23] Ashby, H. 2002: Cocks and Blowers - decoding Railspeak, http://www.mda.org.uk/conference2002/paper13.htm
[24] For a brief review of pioneering interoperability projects, such as Aquarelle and CIMI: Dawson, D. 1998. Aquarelle to Z39.50: the A to Z of access to cultural heritage information. New Review of Hypermedia and Multimedia, 4, 245-254.
[25] Hill, L., Buchel, O. and Janée, G. 2002: Integration of Knowledge Organization Systems into Digital Library Architectures (Position Paper for 13th ASIS&T SIG/CR Workshop, 17 November 2002), http://alexandria.sdc.ucsb.edu/~lhill/paper_drafts/KOSpaper7-2-final.doc
[26] Zthes: A Z39.50 Profile for Thesaurus Navigation, http://zthes.z3950.org/profile/current.html
[27] SWAD-Europe Thesaurus Activity, http://www.w3.org/2001/sw/Europe/reports/thes/