38 DigiCULT
He added, `one could argue, that virtually all infor-
mation access systems for cultural data are naively
built, at best related to evaluating user satisfaction,
rather than controlling their fitness against real
research scenarios'.
`(b) Underestimation of the general complexity of cultur-
al contents and overestimation of the application specificity':
Doerr observed that `this leads to the political con-
viction that IST should not deal with too applica-
tion specific issues, which in turn leads to a situation
where the commonalities of wider domains are never
understood due to lack of evidence from the specific
case'. For real progress in applications, he suggested
concentrating on cases of high intellectual complexi-
ty from the `soft sciences', `so that research challenges
are not defined by computer science theoreticians but
are founded on reality'.
`(c) Systematic investment in joining different applications
that can benefit from an integrated approach, such as in
food-safety and agriculture, biodiversity and natural
history, archaeology and ethnology, history and soci-
ology, etc.'
`(d) Preparing ground for collaboration on standards across
disciplines and applications.'
`(e) Further, systematic investment in top-ontologies of rela-
tionships. The whole ontology research has so far
widely ignored the role of understanding relation-
ships for information integration. The CIDOC CRM
is a rare example.'
Last but not least, Doerr added, `(f) Lack of truth
warranty in integrated systems' (which, among other
issues, he explained further; see below).
Asked what would represent major steps or
breakthroughs in the RTD needed to achieve the
vision, and how these should be addressed, Doerr
outlined the following points, which we reproduce
for the most part in full:
`1. Specific mapping technology: Domain experts must
have tools to annotate how the structure of their
dedicated databases and legacy systems relate to a
generic ontology, so that automatic and semiauto-
matic data transformation and mediation can be driv-
en. IT experts are unable to comprehend application
semantics and domain experts are unable to define
parsing rules.'
`2. Analysis of real user questions, research situations and
scientific discourse in terms of formal top-ontologies (such
as ISO/CD 21127). Case studies of real research
achievements in terms of KR [Knowledge Repre-
sentation]. Engineering of compact top-level ontol-
ogies with very wide applicability. Harmonisation of
ontologies.'
`3. "Open World" query languages and systems: A que-
ry should return: Objects known to fulfil the query,
objects known as not fulfilling the query, objects that
may fulfil the query, but the system has not enough
knowledge to decide. Metadata, database designs and
distributed architectures of databases with schemata
of different levels of detail that make the unknown
explicit to the query system and user.'
`4. Massive investment in data cleaning (duplicate remov-
al) techniques and architectures and algorithms to negoti-
ate and preserve the notion of identity of objects between
semiautonomous sources. Standards, algorithms and stra-
tegic Knowledge organisation systems need to be devel-
oped. Information integration is normally perceived
as a "one shot" action. In reality, it must be a con-
stant element of federated systems. Theory needs to
be developed into how a global index to a federa-
tion of continuously changing resources can converge
to better and better stages of knowledge rather than
degrade over time.'
`5. "Intuitive User Interfaces" that render to the user the
notion of "what the source knows" rather than trying to
answer the user question directly. In normal cultural
databases the search space is almost empty. 99% of
all legal combinations of parameters return an emp-
ty answer. The user rapidly gives up. The source does
not inform sufficiently what it is about, which terms
it understands, where information is rich, if the user
chooses the wrong term or if the source has no data.'
`6. Relate factual and categorical data: Databases are
mostly geared to describing facts. Science is most-
ly about concepts. Manufacturers produce objects but
describe product models.' This point would, as Doerr
explained further, require `coherent storage and
querying across instantiation levels to allow for
queries such as... "In which kinds of biotop do monkeys
live?", "Which observations support this?", "What
was the first camera?", "When was this technology
replaced and why?", "Who may have influenced
Einstein on his theory?".' Furthermore, effectively
relating factual and categorical data would require
`coherent theories and ontologies about metaclasses
and meta-relationships, and relations between classes
and metaclasses (...), databases, maintenance algo-
rithms and query systems.'
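To make point 3 above more concrete, the following is a minimal
sketch of an "Open World" query that sorts objects into the three
groups Doerr names: known to fulfil the query, known not to fulfil
it, and undecidable from what the source knows. It is an
illustration under stated assumptions, not a system Doerr describes:
the records, the field name `material` and the functions `matches`
and `open_world_query` are all hypothetical, and a missing value is
simply modelled as `None`.

```python
from enum import Enum
from typing import Optional


class Answer(Enum):
    YES = "known to fulfil the query"
    NO = "known not to fulfil the query"
    UNKNOWN = "cannot be decided from what the source knows"


# Hypothetical records; None marks a value the source does not know.
RECORDS = [
    {"id": "vase-1", "material": "bronze"},
    {"id": "vase-2", "material": "clay"},
    {"id": "vase-3", "material": None},
]


def matches(record: dict, field: str, wanted: str) -> Answer:
    """Three-valued test: an unknown value is never treated as a 'no'."""
    value: Optional[str] = record.get(field)
    if value is None:
        return Answer.UNKNOWN
    return Answer.YES if value == wanted else Answer.NO


def open_world_query(records: list, field: str, wanted: str) -> dict:
    """Group record ids by the three possible answers."""
    result = {answer: [] for answer in Answer}
    for record in records:
        result[matches(record, field, wanted)].append(record["id"])
    return result


result = open_world_query(RECORDS, "material", "bronze")
for answer, ids in result.items():
    print(f"{answer.value}: {ids}")
```

A closed-world query would silently drop `vase-3`; keeping it in a
separate group is one simple way of making the unknown explicit to
the user.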
Finally, representing the top level at which
intelligent cultural and scientific heritage would
integrate with, or relate to, other domains of
knowledge and disciplinary practices:
`7. Unified or harmonized top ontologies of relationships
(and classes) for historical descriptions of artefacts, geohistory,
palaeontology, archive material and literature reference; IPR
management, planning, social roles, norms and norm viola-