Mayenne). Our platform for searching hand-
written last names on these military forms
will be available in the reading room at the
Archives départementales des Yvelines as well as
on the Internet in 2004 (when ready, a link
will be placed on
Initially 28,000 pages will be open to the
public; this will increase to 250,000
pages during 2004.
Automatic Annotations
These documents are from the end of the
nineteenth and the beginning of the twen-
tieth centuries. Documents can be fully
hand- or type-written and are organised in
two columns, with each paragraph concern-
ing one person, whose name is usually the
first name in the paragraph. Retrieving a
decree for one person is very tedious: the
reader needs to leaf through all pages of all
Compared with the military forms, the
structure is very weak: merely para-
graphs of handwritten text. Owing to the
generic nature of the DMOS method, we
have been able to define an EPF gram-
mar describing the organisation of decrees
in handwritten text, in paragraphs and col-
umns, using only the connected compo-
nents detected in the image. From this
description, by compilation, a recognition
system has been produced which is able to
detect the position of the name and the file
number. We tested 1126 images and 99.43%
of positions were correctly detected. These
positions are the automatic annotations
which are added in the platform.
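
To illustrate the kind of layout reasoning involved, here is a minimal Python sketch that starts from connected-component bounding boxes, splits them into two columns and then into paragraphs, and reports the leftmost component on the first line of each paragraph as the candidate name position. It is a simple geometric heuristic written for illustration, not the DMOS method or an EPF grammar. The names Box, split_paragraphs, first_box and name_positions, the equal-width column assumption and the gap threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component, in pixels."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def split_paragraphs(column: List[Box], gap: int) -> List[List[Box]]:
    """Cut a column into paragraphs wherever the vertical gap exceeds `gap`."""
    paragraphs: List[List[Box]] = []
    current: List[Box] = []
    bottom = 0
    for b in sorted(column, key=lambda b: b.y):
        if current and b.y - bottom > gap:
            paragraphs.append(current)
            current = []
        # Track the lowest point reached so far in the current paragraph.
        bottom = b.y + b.h if not current else max(bottom, b.y + b.h)
        current.append(b)
    if current:
        paragraphs.append(current)
    return paragraphs


def first_box(paragraph: List[Box]) -> Box:
    """Leftmost box among those that start within the topmost box's height."""
    top = min(paragraph, key=lambda b: b.y)
    first_line = [b for b in paragraph if b.y < top.y + top.h]
    return min(first_line, key=lambda b: b.x)


def name_positions(boxes: List[Box], page_width: int, gap: int) -> List[Box]:
    """Candidate name positions, left column first, top to bottom."""
    middle = page_width / 2  # assumption: two columns of equal width
    left = [b for b in boxes if b.x + b.w / 2 < middle]
    right = [b for b in boxes if b.x + b.w / 2 >= middle]
    positions = []
    for column in (left, right):
        for paragraph in split_paragraphs(column, gap):
            positions.append(first_box(paragraph))
    return positions


# Invented example: two paragraphs in the left column, one in the right.
boxes = [
    Box(10, 10, 30, 10), Box(50, 12, 30, 10), Box(10, 25, 40, 10),
    Box(12, 60, 30, 10), Box(50, 58, 20, 10),
    Box(210, 10, 30, 10),
]
positions = name_positions(boxes, page_width=400, gap=15)
```

Each returned box could then be stored as a geometric annotation. A fixed gap threshold would be fragile on real handwriting, which is one reason a grammar-based description is attractive.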
Collective Annotations
When a reader finds the name he was
looking for, the platform presents the origi-
nal page with all the existing annotations.
The reader can then leaf through the orig-
inal pages and, if he wants, add further
annotations.
This work on naturalisation decrees has
been carried out in co-operation with the
French National Archives: Centre Historique
des Archives Nationales (http://www.archive-
A fast leaf-through capability using automatic
annotations has also been included in the
platform. A reader can then select the name
he is looking for and has direct access to
the full document (see Figure 6).
We presented a platform to improve
access to archival documents con-
taining handwritten text. For fast access,
annotations are needed. We consider that
annotations for archived documents can
be geometric or textual. The platform we
propose to manage annotations deals with
annotations produced in two complemen-
tary ways: automatically with document
recognition and collectively with the help
of the readers during their reading. The dif-
ferent documents (civil status registers, mil-
itary forms and naturalisation decrees) on
which we present the annotation platform
show the importance of a generic system
for document recognition. With the DMOS
method we have been able to produce new
recognition systems with a minimum of
development work. The DMOS method
has been tested on 95,000 pages and can
be applied to highly structured and also less
structured documents.
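
As a rough illustration of how geometric and textual annotations from both sources could sit in one structure, the following Python sketch stores each annotation as a record and builds a name index for direct access to pages. The field names, the page identifiers, the sample data and the build_name_index helper are assumptions made for this example, not the platform's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Annotation:
    page: str                        # hypothetical page image identifier
    box: Tuple[int, int, int, int]   # geometric part: x, y, width, height
    text: Optional[str] = None       # textual part, e.g. a transcribed name
    source: str = "automatic"        # "automatic" or "collective"


def build_name_index(annotations: List[Annotation]) -> Dict[str, List[Annotation]]:
    """Map each lower-cased annotation text to the annotations carrying it."""
    index: Dict[str, List[Annotation]] = {}
    for a in annotations:
        if a.text:  # purely geometric annotations are not searchable by text
            index.setdefault(a.text.strip().lower(), []).append(a)
    return index


# Invented sample: one automatic position, then two reader transcriptions.
annotations = [
    Annotation("page_0001", (120, 340, 200, 40)),
    Annotation("page_0001", (120, 340, 200, 40), "Martin", "collective"),
    Annotation("page_0007", (95, 810, 180, 38), "martin ", "collective"),
]
index = build_name_index(annotations)
pages = [a.page for a in index["martin"]]
```

Keeping the source of each annotation in the record makes it possible to display or filter automatic and reader-supplied annotations separately.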
Both the platform and the methodology
are important to facilitate access to the
content of handwritten documents even
if they are difficult to read. The automatic
annotations become more or less important
depending on how accurately the docu-
ment can be recognised. The platform and
the automatic production of annotations by
document image analysis will be validated
at a very large scale for the Archives dépar-
tementales des Yvelines. In 2004, 28,000
pages of military forms will be automati-
cally accessible on the Internet. 1,200,000
images of civil status registers will also
be available on the Internet for collective
annotation, and 250,000 pages of military
forms will be introduced.
This work has been carried out in co-
operation with the Archives dépar-
tementales de la Mayenne, the Archives
départementales d'Ille-et-Vilaine, the Archives
départementales des Yvelines and the Centre
Historique des Archives Nationales in France,
with the support of the French Ministry of
Culture, the regions of Brittany and Pays de
la Loire, and the Conseil Général des Yvelines.
Figure 6: Fast leaf-through using automatic annotations on naturalisation
decrees. Names are presented in a table at the left. The reader can then access
the complete page and annotate it if desired.