several, often closely related, sources of information jointly. Our libraries are already available on
the Internet through separate interfaces.
To increase their usefulness, users should be able to access all available corpora through a
single interface. Much effort has gone into database federation in general,⁴ and some works
focus specifically on the federation of document databases.⁵
However, these proposals lack some aspects that we consider crucial. As they
federate only databases, they do not consider the user interface, whereas it is important for us to
provide a single access point on the Web. Specific aspects of document databases, such as the
exploitation of the text retrieval techniques available in each database, are also usually ignored.⁶
We have built a federated system that considers all these aspects, focusing specifically on
federating digital libraries, maintaining their text retrieval capabilities, and assigning special
importance to the user interface used to access them. Our system integrates our two digital libraries
(and their underlying document databases), plus a new one containing significant non-Spanish
Emblem Books that were translated into Spanish in this period because of their importance. These
translated emblem books were very popular at the time, and their study is interesting because
they inspired other Spanish authors to write their own emblem books. This system is already
functional, but it will be made available on the Internet only once the database of Translated Emblem
Books is completed.
The federated system offers a single access point to browse through the documents and
perform searches. This interface has been developed following guidelines that make it not only
intuitive and easy to use, but also powerful and flexible enough to respond to queries across any
(or all) databases in the system. The federated system was initially built ad hoc for these databases;
however, the design of its architecture allows us to generalize the ideas we used to build it, so it
can be used to federate any set of document databases. These ideas are based on the requirements
that any federated system must meet (scalability, easy adaptation to changes, and user-friendliness)
and on others that we consider specific to the federation of digital libraries. These
requirements are the following:
· Scalability: increasing the number of databases in the federation should not decrease the
overall system efficiency.
· Easy adaptation to changes: adding, dropping, or modifying databases must be easy and
quasi-automatic.
· User-friendliness.
· Preservation of the text retrieval capabilities: as noted above, we strongly believe that the
federation must exploit any text retrieval technique that already exists in the databases.
· Identification of the sources: it is important to let the users know the source of each
document, thereby preserving the considerable effort dedicated to the design of each individual
Digital Library. Therefore, although the search is performed through a single entry point,
the answer leads to the interface of the Digital Library where the documents were found.
· Preservation of duplicates: even when a work, or even a copy of it, is found in two different
Digital Libraries, it is essential to be able to retrieve it from both of them, because the copies
may have very different states of preservation, ex libris, etc.
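Three of these requirements (preservation of the text retrieval capabilities, identification of the sources, and preservation of duplicates) can be illustrated with a minimal Python sketch. It is not the system described in this paper: the names Hit, Federation, and make_toy_library, as well as the example.org URLs, are hypothetical and chosen only for illustration. Each library is represented by its own search function, standing in for its native text retrieval engine, and the federation simply delegates to it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Hit:
    source: str  # digital library that returned the document
    title: str
    url: str     # link into that library's own interface


# Each library contributes its own search function, so the federation
# reuses whatever text retrieval technique the library already has.
SearchFn = Callable[[str], Iterable[Hit]]


class Federation:
    def __init__(self) -> None:
        self._libraries: Dict[str, SearchFn] = {}

    def register(self, name: str, search: SearchFn) -> None:
        """Add (or replace) a library without touching the others."""
        self._libraries[name] = search

    def unregister(self, name: str) -> None:
        self._libraries.pop(name, None)

    def search(self, query: str, sources: Optional[List[str]] = None) -> List[Hit]:
        """Query the chosen libraries (all of them by default).

        Hits are concatenated, never merged: the same work found in two
        libraries yields two hits, each tagged with its own source.
        """
        names = list(self._libraries) if sources is None else sources
        hits: List[Hit] = []
        for name in names:
            hits.extend(self._libraries[name](query))
        return hits


def make_toy_library(name: str, titles: List[str]) -> SearchFn:
    """Hypothetical stand-in for a real library: substring match on titles."""
    def search(query: str) -> List[Hit]:
        q = query.lower()
        return [
            Hit(name, title, f"https://example.org/{name}/{i}")
            for i, title in enumerate(titles)
            if q in title.lower()
        ]
    return search


federation = Federation()
federation.register("library_a", make_toy_library("library_a", ["Sample work A", "Sample work B"]))
federation.register("library_b", make_toy_library("library_b", ["Sample work A"]))

for hit in federation.search("work a"):
    print(hit.source, hit.title, hit.url)
```

Because the hits are concatenated rather than deduplicated, "Sample work A" is reported once per library, and each hit carries a link back to the library that holds it.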
To build a system that meets these requirements, we based our work on a layered structure
and on XML trees, which are used to reconcile the schemas of the databases
and to help build the user interface.
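The XML trees themselves are described in a later section, so their actual format is not reproduced here. Purely as an assumption-laden sketch of the general idea, the Python fragment below uses a hypothetical XML tree in which each federated concept lists the field that represents it in each database. The element and attribute names (federation, concept, database, field) and the database and field names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical mapping tree: one <concept> per federated search field,
# with the equivalent field name in each participating database.
MAPPING_XML = """
<federation>
  <concept name="author">
    <database name="library_a" field="autor"/>
    <database name="library_b" field="writer_name"/>
  </concept>
  <concept name="title">
    <database name="library_a" field="titulo"/>
    <database name="library_b" field="work_title"/>
  </concept>
</federation>
"""


def load_mapping(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the tree into {concept: {database: field}}."""
    root = ET.fromstring(xml_text.strip())
    mapping: Dict[str, Dict[str, str]] = {}
    for concept in root.findall("concept"):
        mapping[concept.get("name")] = {
            db.get("name"): db.get("field") for db in concept.findall("database")
        }
    return mapping


def search_fields(mapping: Dict[str, Dict[str, str]]) -> List[str]:
    """Concepts a user interface could offer as search fields."""
    return sorted(mapping)


def translate(mapping: Dict[str, Dict[str, str]], concept: str, database: str) -> Optional[str]:
    """Field name for a concept in one database, or None if it has none."""
    return mapping[concept].get(database)


mapping = load_mapping(MAPPING_XML)
print(search_fields(mapping))
print(translate(mapping, "author", "library_b"))
```

In a design along these lines, the same tree serves both purposes mentioned above: its concepts give the interface its list of search fields, and its per-database entries tell the federation how to phrase a query for each source. Adding a database then amounts to adding entries to the tree.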
The rest of this paper is organized as follows. The next section describes the three databases
included in our federation, followed by a description of the XML trees and how we use them to
integrate the different databases and to build the user interface; an overview of the architecture
⁴ See, for example, S. Busse, R.-D. Kutsche, and U. Leser, "Strategies for the Conceptual Design
of Federated Information Systems," Engineering Federated Information Systems, ed. M. Roantree,
W. Hasselbring, and S. Conrad. Proceedings of the 3rd Workshop EFIS 2000, Amsterdam/Berlin:
Akademische Verlagsgesellschaft, 2000, 23-32; J. Samos, A. Abelló, M. Oliva, E. Rodríguez,
F. Saltor, J. Sistac, F. Araque, C. Desgado, E. Garví, and E. Ruiz, "Sistema Cooperativo para la
Integración de fuentes Heterogéneas de Información y Almacenes de Datos," Novatica 142 (1999):
42-49; and S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman,
and J. Widom, "The TSIMMIS Project: Integration of Heterogeneous Information Sources,"
Proceedings of the Tenth Conference of the Information Processing Society of Japan, Tokyo 1994,
7-18. Also available at http://www.db.ucsd.edu/publications/tsimmis-overview.pdf or
http://www-db.stanford.edu/tsimmis/publications.html.
⁵ M. A. Gonçalves, R. K. France, E. A. Fox, and T. E. Doszkocs, "MARIAN Searching and Querying
across Heterogeneous Federated Digital Libraries," DELOS Workshop: Information Seeking,
Searching and Querying in Digital Libraries, 2000. Full text available at
http://www.ercim.org/publication/ws-proceedings/DelNoe01/11_Fox.pdf.
⁶ See Ricardo Baeza-Yates, Modern Information Retrieval (New York: ACM Press; Harlow, England:
Addison-Wesley, 1999) and also N. R. Brisaboa, M. R. Penabad, A. S. Places, and F. J. Rodríguez,
"Problems and Solutions to Federate Digital Libraries," Poster at 5th European Conference on
Research and Advanced Technology for Digital Libraries (ECDL 2001). Darmstadt, Germany.
Conference report, http://www.dlib.org/dlib/october01/cunningham/10cunningham.html.