as a member-funded consortium, to which the Association for Computers and the Humanities
(ACH), the Association for Computational Linguistics (ACL), and the Association for Literary
and Linguistic Computing (ALLC) belong. "The Guidelines were delivered in 1994 and have
become the de facto standard for encoding literary and linguistics texts, corpora, and the like."
It is further stated: "However, the guidelines are based on the assumption that there is a common core of
textual features shared by virtually all texts and virtually all serious work on texts. In brief, the
TEI Guidelines define a general-purpose encoding scheme which makes it possible to encode
different views of text, possibly intended for different applications, serving the majority of schol-
arly purposes of text studies in the humanities. However, no predefined encoding scheme can
serve all research purposes. Therefore, the TEI also provides means of modifying and extending
the encoding scheme defined by the Guidelines."
Although the TEI was first developed in SGML, an XML version has since become available.
Providing a "fully XML-compliant version of the TEI Guidelines is a significant advance, placing
TEI firmly in the mainstream of current digital library and World Wide Web developments. In sim-
ple terms, the TEI Guidelines define a language for describing how texts are constructed and pro-
pose names for their components. By defining a standard set of names the Guidelines make it
possible for different computer representations of texts to be combined into databases, and they
also provide a common language for scholars wishing to work collaboratively. There are many such
standard vocabularies in the industrial world - in banking, or in chemical modelling, for example.
The TEI's achievement has been to try to do the same thing for textual and linguistic data."
Although the TEI was designed primarily for marking up text, the TEI Guidelines also allow
images to be described and indexed. This is extremely important for the indexing of rare
books, because what we usually have is not machine-readable digital text but sequences of
page images that are put on the Web.
The TEI Document Type Definition
If we decide in favor of XML as our mark-up language, we must then ask which DTD to use.
A DTD describes the logical structure of a particular type of document in formal notation
and thus supplies the rules for XML structures. Homogeneity is guaranteed only by such DTDs,
which in turn facilitate standardized retrieval and Web-based work. DTDs make possible the
automatic validation of documents, the development of standards, uniform retrieval, and even
the use of decentrally captured data on the Internet. With the TEI DTD, an open standard
has been established for the humanities which permits a wide range of scaling options.
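The idea of validating a document against formally stated structural rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard library does not validate against a real DTD, so the RULES table and the check function below are hypothetical stand-ins that test nothing more than the sequence of child elements. The two rules are drastic simplifications and are not taken from the TEI DTD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified rules: each element name maps to the exact
# sequence of child elements it must contain. A real DTD states such rules as
# element declarations, along the lines of <!ELEMENT TEI.2 (teiHeader, text)>.
RULES = {
    "TEI.2": ["teiHeader", "text"],
    "text": ["body"],
}

def check(element):
    """Return a list of rule violations found in element and its descendants."""
    problems = []
    expected = RULES.get(element.tag)
    if expected is not None:
        actual = [child.tag for child in element]
        if actual != expected:
            problems.append(f"<{element.tag}>: expected {expected}, found {actual}")
    for child in element:
        problems.extend(check(child))
    return problems

valid = ET.fromstring("<TEI.2><teiHeader/><text><body/></text></TEI.2>")
invalid = ET.fromstring("<TEI.2><text><body/></text></TEI.2>")

print(check(valid))    # no violations reported
print(check(invalid))  # reports that teiHeader is missing
```

Because every document is checked against the same rules, documents captured in different places can be relied on to share one structure, which is what makes uniform retrieval possible.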
"The TEI DTD was designed for marking up any existing document in any language in order
to facilitate their analysis. For this reason it has a very loose structure, and uses generic section
dividers to separate out sections of documents."
Its openness regarding the recording of document structures is an important feature of the
usability of the TEI DTD, and makes it possible to accommodate the variety among different
documents. The TEI DTD was therefore extremely useful to our project, even though the TEI
does not explicitly take the text form "emblem" into account.
The full TEI DTD is huge, and the official documentation runs to over 1400 pages. However,
it is possible to do useful work with a much smaller subset of the DTD. We use a reduced set
of elements and attributes, which I describe in more detail below.
A TEI document is composed of a header and a body of text. In simple terms, the header
contains all the meta-information pertaining to the title, while the resource itself is
encoded in the body. Bibliographical information can be recorded for both the electronic
document and the printed source. The TEI shows its greatest benefit in the indexing of source
material with regard to content below the title level. The range extends from rough mark-up
to a full edition.
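As a rough illustration of the header and body division, the following Python sketch parses a minimal TEI-style document and reads information from both parts. The sample is an assumption made for this example and is not the project's actual encoding: the element names follow the SGML/XML-era TEI conventions (TEI.2, teiHeader, text, body, div), while the title, the source description, and the type values on the generic div elements are invented placeholders. The document is parsed only for well-formedness here, not validated against the TEI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: header with metadata, body with the resource.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample emblem book: an electronic edition</title></titleStmt>
      <publicationStmt><p>Hypothetical digital library</p></publicationStmt>
      <sourceDesc><p>Description of the printed source goes here</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="emblem" n="1">
        <div type="motto"><p>Motto text</p></div>
        <div type="pictura"><p>Description of the image</p></div>
        <div type="subscriptio"><p>Epigram text</p></div>
      </div>
    </body>
  </text>
</TEI.2>"""

root = ET.fromstring(SAMPLE)

# Header: bibliographic information on the electronic document and its source.
title = root.findtext("teiHeader/fileDesc/titleStmt/title")
source = root.findtext("teiHeader/fileDesc/sourceDesc/p")

# Body: content below the title level, marked up with generic section dividers.
emblem_parts = [d.get("type") for d in root.findall("text/body/div/div")]

print(title)
print(source)
print(emblem_parts)
```

The nested div elements show how a loose, generic structure can carry a text form that the TEI does not name explicitly: the distinction is expressed through attribute values rather than through dedicated elements.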
