DigiCULT.Info 22
Collective Annotations
Some handwritten archive documents are too difficult to recognise
fully with automatic methods. We therefore propose to supplement
automatic annotations with manual annotations made by readers as
they read. All the annotations are then pooled and made available to
other readers, improving content-based access to the documents. As
the number of readers is significant, the number of annotations can
grow very fast, provided that a tool exists to manage them and that
the process is initiated with automatic annotations.
A PLATFORM FOR IMAGE DOCUMENT ANNOTATIONS
We propose to build a platform for archive document retrieval that
handles textual and geometric annotations at the same level. We will
develop the platform with a classical architecture: a Web server
(Apache) with a servlet container (Tomcat); Java servlets that access
a relational database (PostgreSQL) to store annotations and send them
to the client, a Java applet running in a Web browser (see Figure 2).
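To make the idea of storing textual and geometric annotations "at the same level" more concrete, here is a minimal sketch of one possible table layout. It is not the platform's actual schema, which the article does not describe: the table name, column names, helper functions and sample values are all hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so that the example runs without a server.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs without a server.
# Table and column names are hypothetical, not the platform's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE annotation (
        id     INTEGER PRIMARY KEY,
        image  TEXT NOT NULL,
        kind   TEXT NOT NULL CHECK (kind IN ('textual', 'geometric')),
        field  TEXT NOT NULL,
        value  TEXT,
        x INTEGER, y INTEGER, w INTEGER, h INTEGER,
        source TEXT NOT NULL CHECK (source IN ('automatic', 'manual'))
    )
    """
)


def add_annotation(image, kind, field, source, value=None, box=None):
    """Store one annotation; box is (x, y, w, h) in image pixels or None."""
    x, y, w, h = box if box is not None else (None, None, None, None)
    conn.execute(
        "INSERT INTO annotation (image, kind, field, value, x, y, w, h, source) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (image, kind, field, value, x, y, w, h, source),
    )


def annotations_for(image):
    """Return every annotation stored for one image, in insertion order."""
    return conn.execute(
        "SELECT kind, field, value, x, y, w, h, source "
        "FROM annotation WHERE image = ? ORDER BY id",
        (image,),
    ).fetchall()


# Made-up sample data: one automatic page zone and one manual text field.
add_annotation("register_0001.png", "geometric", "page", "automatic",
               box=(40, 30, 1200, 1800))
add_annotation("register_0001.png", "textual", "name", "manual", value="Marie")
```

Keeping both kinds of annotation in a single table means that one query returns everything the client needs to draw zones on the image and fill the text fields.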
Using this platform through a Web browser, a user can leaf through
images of documents, with all associated annotations presented on the
interface. As shown in Figure 2, geometric annotations are drawn on
the image; textual annotations are presented in tabs for the
structural nodes and in field boxes (e.g. name or date). The reader
can consult annotations and, given the necessary rights, add or
modify an annotation, but is limited by the annotation structure that
the configuration file allows for the kind of document. The system
can also store several interpretations if readers do not agree.
Structured search or full-text search is possible on all the
annotations, whether they have been produced automatically or
manually.
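The behaviour described above (fields restricted by a configuration, several interpretations kept side by side, structured or full-text search) can be sketched as follows. This is only an illustration under stated assumptions: the class `AnnotationStore`, the `ALLOWED_FIELDS` dictionary standing in for the configuration file, the zone identifiers and the sample names are all hypothetical, and user rights are left out.

```python
from collections import defaultdict

# Hypothetical stand-in for the configuration file:
# the fields allowed for each kind of document.
ALLOWED_FIELDS = {
    "civil_register": {"certificate_type", "name", "surname", "date"},
}


class AnnotationStore:
    def __init__(self, document_kind):
        self.allowed = ALLOWED_FIELDS[document_kind]
        # (zone_id, field) -> list of (reader, value).
        # Several entries for one key are competing interpretations.
        self.interpretations = defaultdict(list)

    def add(self, zone_id, field, value, reader):
        """Record one interpretation, rejecting fields outside the structure."""
        if field not in self.allowed:
            raise ValueError(f"field {field!r} is not allowed for this document kind")
        self.interpretations[(zone_id, field)].append((reader, value))

    def search(self, text, field=None):
        """Structured search when field is given, full-text search otherwise."""
        needle = text.lower()
        hits = set()
        for (zone_id, f), values in self.interpretations.items():
            if field is not None and f != field:
                continue
            if any(needle in value.lower() for _, value in values):
                hits.add(zone_id)
        return sorted(hits)


store = AnnotationStore("civil_register")
store.add("z1", "surname", "Martin", reader="automatic")
store.add("z1", "surname", "Marlin", reader="reader_7")  # a second reading is kept too
store.add("z2", "name", "Martine", reader="reader_3")
```

Because every interpretation is kept, a search matches a zone as soon as any one reading of it contains the query, regardless of whether that reading was produced automatically or by a reader.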
A pen-based interface has been defined on this platform. Using
specific gestures and online handwriting recognition, it has been
possible to design a new way to interact with digital documents and
paper documents (see Figure 3).
EXAMPLES OF ARCHIVE DOCUMENTS
REGISTER OF BIRTHS, MARRIAGES AND DEATHS
Automatic Annotations
Documents like these are difficult to annotate automatically owing to
their weak structure and the poor quality of the handwritten text.
The documents are scanned as double pages, and we defined a grammar
in EPF describing the page. With the DMOS method we have been able to
produce a recognition system that detects the position of each page
and produces automatic annotations. A test was performed on over 4000
double pages: 99.3% were correctly detected. With these page
annotations, a reader can leaf through a register page by page with
an automatic zoom function for ease of use.
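The article does not say how the automatic zoom is computed. As a rough sketch of how a detected page position could drive it, the hypothetical function below takes a page rectangle in image pixels and a viewport size in screen pixels, and returns the scale and offsets that fit and centre the page. The function name, argument layout and sample numbers are assumptions made for illustration.

```python
def zoom_to_page(page_box, viewport):
    """Fit a detected page into the viewport.

    page_box is (x, y, width, height) in image pixels (hypothetical layout);
    viewport is (width, height) in screen pixels.
    Returns (scale, offset_x, offset_y) such that an image point p is drawn
    at p * scale + offset on screen.
    """
    x, y, w, h = page_box
    view_w, view_h = viewport
    # Use the smaller ratio so the whole page stays visible.
    scale = min(view_w / w, view_h / h)
    # Centre the scaled page inside the viewport.
    offset_x = (view_w - w * scale) / 2 - x * scale
    offset_y = (view_h - h * scale) / 2 - y * scale
    return scale, offset_x, offset_y


# Made-up example: the right-hand page of a scanned double page.
scale, offset_x, offset_y = zoom_to_page((2000, 100, 1000, 2000), (800, 1000))
```

With these sample numbers the page is limited by the viewport height, so it is scaled by one half and centred horizontally.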
Collective Annotations
Collective annotations can be added, for example the type of
certificate (birth, marriage, death) and details such as the name and
surname of the child, the place of birth, and the name of the mother.
The position of the certificate can be defined by the reader or taken
from the automatic page annotation. There is no obligation for a
reader to fill in all the fields.
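A certificate annotation of this kind can be pictured as a record in which every field is optional. The sketch below is hypothetical: the class name, field names and helper function are illustrative choices based on the fields listed above, not the platform's data model.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in image pixels


@dataclass
class CertificateAnnotation:
    # Every field defaults to None: a reader may fill in only some of them.
    certificate_type: Optional[str] = None  # 'birth', 'marriage' or 'death'
    name: Optional[str] = None
    surname: Optional[str] = None
    place_of_birth: Optional[str] = None
    mother_name: Optional[str] = None
    position: Optional[Box] = None

    def filled_fields(self):
        """Return only the fields the reader actually filled in."""
        return {k: v for k, v in asdict(self).items() if v is not None}


def certificate_position(reader_box, page_box):
    """Prefer the zone drawn by the reader, else fall back to the detected page."""
    return reader_box if reader_box is not None else page_box


# Made-up example: a reader records only the type and the surname.
cert = CertificateAnnotation(certificate_type="birth", surname="Martin")
cert.position = certificate_position(None, (40, 30, 1200, 1800))
```

Leaving unfilled fields as `None` keeps partial contributions valid, so a later reader can complete the same record.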
Applications
A demonstration of the platform on civil status registers is
available online at http://imadoc-ar.irisa.fr/index.html. It shows
materials from two districts with automatic annotations on the
position of pages and
Figure 2: One interface to consult, annotate and retrieve archive documents.
Figure 3: A pen-based interface on the platform for annotations

IMADOC, 2004
