8 DigiCULT
Miller makes it clear that the related problems ‘are bigger than any one organisation or sector, and that the best solutions will be collaborative and cross-cutting; that they will be common and shared’. Initiatives such as the Common Information Environment (CIE), which serves as an umbrella under which a growing number of organisations are working towards a shared understanding and shared solutions, are a major step in addressing these challenges.
Today, heritage institutions invest much effort in providing online access to their descriptive metadata on collection objects and, increasingly, to digital surrogates of these objects. To support resource discovery, they need to expose metadata about their resources so that it can be used by other applications and services, such as harvesters, distributed search mechanisms or alerting services. To facilitate exchange and interoperability between services, the institutions will need to provide item-level metadata conforming to a widely shared standard such as Dublin Core or, if the target is to serve learning and teaching communities, Learning Object Metadata.
How rich is the ‘deep Web’?
Although search engines may boast about their ability to index the Web, their searching spiders (robot programs that crawl the Web) can harvest only a small amount of information from the ‘deep Web’, which makes up by far the larger portion of the Web. The search company Bright Planet states that the deep Web is ‘approximately 500 times bigger than the surface Web’. A study conducted by Bright Planet in 2000 also found that the deep Web contained nearly 550 billion individual documents, compared with the one billion documents of the surface Web; that there were more than 200,000 deep Web sites; that ninety-five per cent of the deep Web is publicly accessible information (i.e. not subject to fees or subscriptions); and that it represents the largest growing category of new information on the Internet.
The main reason why this information remains hidden from searching spiders is that accessing it requires typing a query into a search form and/or scanning a page and selecting a combination of options. And, well, the searching spiders lack not only fingers for typing but also a brain capable of judgement.
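To illustrate why form-backed content stays invisible, here is a minimal sketch of a link-following spider in Python. The site, its pages and the `fetch` and `crawl` functions are hypothetical, invented for illustration; real crawlers also respect robots.txt, pace their requests and much more. The limitation shown is the one described above: the spider only discovers URLs that appear as links, so result pages that exist only after someone types a query into a form are never requested.

```python
import re
from collections import deque
from typing import Optional

# Hypothetical in-memory site. Static pages link to each other, but the
# database records behind /search only exist as result pages generated
# when someone types a query into the form.
STATIC_PAGES = {
    "/": '<a href="/about">About</a> <a href="/search">Search the collection</a>',
    "/about": '<a href="/">Home</a>',
    "/search": '<form action="/results"><input name="q"></form>',
}
DATABASE = {
    "titanic": "<p>Record: photographs relating to the Titanic</p>",
    "lusitania": "<p>Record: model of the Lusitania</p>",
}


def fetch(url: str) -> Optional[str]:
    """Return the HTML for a URL, building result pages from the database."""
    if url in STATIC_PAGES:
        return STATIC_PAGES[url]
    if url.startswith("/results?q="):
        term = url.split("=", 1)[1]
        return DATABASE.get(term)
    return None


def crawl(start: str) -> set[str]:
    """Breadth-first crawl that, like a search spider, only follows href links."""
    seen = {start}
    queue = deque([start])
    while queue:
        html = fetch(queue.popleft())
        if html is None:
            continue
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


print(sorted(crawl("/")))  # → ['/', '/about', '/search']
# A person who types "titanic" into the form reaches a page the spider never sees:
print(fetch("/results?q=titanic"))
```

The crawl stops at `/search` because the form's `action` is not a link and the spider has no query to type, so every `/results?q=...` page stays in the "deep Web" of this toy site.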
This includes awareness of the very fact that these resources exist, of how rich they are, and that most of them can be accessed free of charge. If the institutions do not succeed in this, then to most people outside the heritage communities these resources might as well not be there.
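As a minimal sketch of what exposing item-level metadata in a widely shared standard such as Dublin Core might look like, the Python below serialises one object record as simple (unqualified) Dublin Core XML using the standard library's `xml.etree.ElementTree`. Wrapping the elements in an `oai_dc:dc` container, as the OAI Protocol for Metadata Harvesting does, is an assumption made here; so are the function name `to_dublin_core` and the sample record, which are hypothetical. A production record would normally also declare its XML Schema location and be served through a harvesting interface.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set (v1.1) and of the
# oai_dc container used by OAI-PMH (using that container is an assumption).
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Use the conventional "dc" and "oai_dc" prefixes in the serialised output.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def to_dublin_core(record: dict[str, list[str]]) -> str:
    """Serialise one object's metadata as simple Dublin Core XML.

    `record` maps element names to lists of values, because every Dublin
    Core element is optional and repeatable.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record for a photograph in a heritage collection.
item = {
    "title": ["Photograph of the Titanic at Southampton"],
    "type": ["Image"],
    "date": ["1912"],
    "identifier": ["https://example.org/collection/item/123"],
}
print(to_dublin_core(item))
```

Rejecting element names outside the fifteen-element set keeps each record interoperable: a harvester that understands only simple Dublin Core can read every field it receives.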
Many people are unaware that most of the rich and authoritative information accessible over the Internet is invisible to general-use search engines like Google. This information resides on the ‘hidden’ Web, which consists largely of content-rich databases from universities, associations, businesses, government agencies and memory institutions. In fact, the information in these often publicly funded databases is not being used to its full potential; in particular, it could be opened up more effectively to non-specialist information seekers.
Paul Miller, Director of the UK Common Information Environment initiative, notes that Google and other powerful Internet search engines manage to retrieve very little of the richly structured information in heritage databases, and asks: ‘How can the creators and curators of high value, authoritative, information ensure that their work is found and used? More than that, though: how can we work to build associations and relationships between these aggregations of world-class content such that the user with an interest in “The Titanic”, say, can move in a structured fashion from photographs and documents in The National Archives to objects in museums, to learning resources from Curriculum Online, to multiple media context from the BBC? How do we achieve all of this in a manner that is adaptable to the varying needs of our users, whether they are school children, undergraduates, teachers, researchers, or members of the general public with an ongoing thirst for knowledge? And how do we maximise the value of the current and ongoing investment in creation, curation and delivery of this information and the range of services built around it?’ These are many questions and challenges.
Paul Miller: Towards the Digital Aquifer: introducing the Common Information Environment. In: Ariadne, Issue 39, April 2004, http://
One working prototype designed to demonstrate the CIE concept to users, content providers and policy makers is My Historic Environment, described by Kate Fernie et al. at
Michael K. Bergman: The Deep Web: Surfacing Hidden Value (August 2001). In: The Journal of Electronic Publishing, Vol. 7, Issue 1, http://www. 01/bergman.html; also accessible at: http://www.
Cf. the University of California's tutorial ‘Finding Information on the Internet’, which gives a detailed explanation of why some resources remain invisible to spiders:
Organisations currently working to shape the CIE include Becta, The British Library, Culture Online, the Department for Education & Skills' (DfES) e-Learning Strategy Unit, the e-Science Core Programme, JISC, the Museums, Libraries and Archives Council (MLA, formerly Resource), The National Archives, the NHS' National Electronic Library for Health (NeLH), the Office of the e-Envoy and UKOLN.
Dublin Core Metadata
Element Set, http://dublin