Most of them used the text as an anchor
from which to make judgements about the
search results and the images as confirmatory
evidence for their selections.
Our current work is more focused on
the second phase of the user interface
research: to create `Agile View' interfaces. We
aim to give people several classes of views:
overviews of collections of video segments;
previews of specific video objects; reviews
or history views of past searches or commu-
nity activities; peripheral views of related
objects; and shared views instantiated by
others active in the sharium environment
(using recommendations based on user pat-
terns captured in the server logs).
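One simple way such log-based recommendations could be derived is by counting which videos co-occur in the same viewing sessions. The following is only a hedged sketch of that general idea, not the recommendation method actually used in the sharium environment; the function name and data shapes are illustrative.

```python
from collections import defaultdict

def cooccurrence_recommendations(sessions, target, top_n=3):
    """Recommend videos that co-occur with `target` across user sessions.

    `sessions` is a list of sets of video ids, one set per session,
    as might be reconstructed from server logs. (Illustrative only;
    not the Open Video system's actual recommendation logic.)
    """
    counts = defaultdict(int)
    for viewed in sessions:
        if target in viewed:
            for video in viewed:
                if video != target:
                    counts[video] += 1
    # Rank by co-occurrence count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:top_n]]

# Example: videos "b" and "c" are each watched alongside "a" twice.
sessions = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"a", "c"}]
recommended = cooccurrence_recommendations(sessions, "a")
```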
There are a lot of challenges in building
a digital video library, such as creating
video surrogates, cataloguing and indexing
videos at different levels, and sharing videos
among different users. As we gain more
experience, the backend processes of manag-
ing a video digital library become more
routinised. We have developed a variety of
tools and scripts in this regard. Three tools
have been developed or modified to help to
generate video metadata. We modified the
MERIT system [Kobla et al., 1998] to
extract keyframes on our Linux platform.
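MERIT itself operates on features of the compressed video stream; purely as an illustration of what keyframe extraction involves, a shot cut can be flagged wherever consecutive frame histograms differ sharply, with the first frame of each shot kept as a keyframe. The sketch below is not MERIT's algorithm, just a minimal stand-in for the idea.

```python
def select_keyframes(histograms, threshold=0.5):
    """Pick keyframe indices from a sequence of per-frame colour histograms.

    A new keyframe is declared whenever the L1 distance between
    consecutive (normalised) histograms exceeds `threshold`, i.e. at a
    likely shot cut. Frame 0 is always a keyframe. (Illustrative
    sketch only; MERIT uses compressed-domain features instead.)
    """
    if not histograms:
        return []
    keyframes = [0]
    for i in range(1, len(histograms)):
        distance = sum(abs(a - b)
                       for a, b in zip(histograms[i - 1], histograms[i]))
        if distance > threshold:
            keyframes.append(i)
    return keyframes

# Two-bin toy histograms: a sharp change occurs between frames 1 and 2.
hists = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
keys = select_keyframes(hists)
```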
VAST [Mu et al., 2003b] is a semi-automatic
tool to generate storyboards and fast-forward
surrogates. VIVO [Yang et al., 2003a] is
a tool to help digital video librarians to
input, edit and manage video metadata
elements on different levels. We have also
developed some video sharing application tools.
ISEE [Mu et al., 2003a] is a real-time collab-
orative multimedia distance learning tool.
Brief descriptions of these three tools follow.
VAST (Video Annotation and
Summarization Tool) [Mu et al.,
2003b] is a novel video metadata authoring
system that integrates both semantic
and visual metadata (see Figure 2).
A slide show [Tse et al., 1998] is a
dynamic video surrogate, which presents
individual key frames consecutively. A fast
forward is a new kind of surrogate we have
been developing and investigating
[Wildemuth et al., 2003]; it mimics the
fast-forward function of a VCR. A fast for-
ward plays the whole video content at a
much faster speed than a VCR and, like the
analogue fast forward, no audio is provided.
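In implementation terms, a fast forward amounts to retaining roughly one frame in N and replaying the result, silent, at the normal frame rate. A minimal sketch of the frame selection follows; it is illustrative only, not our production pipeline (the default 1:64 ratio matches the setting discussed below).

```python
def fast_forward_indices(total_frames, ratio=64):
    """Indices of the frames retained for a fast-forward surrogate.

    Keeping every `ratio`-th frame and replaying at the original frame
    rate compresses viewing time by `ratio`; audio is simply dropped.
    (Illustrative sketch, not the Open Video production tooling.)
    """
    return list(range(0, total_frames, ratio))
```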
The fast-forward surrogate of this video
clip and the full video represented by this
storyboard can be viewed in the Open Video
archive (QuickTime required).
Several user studies have been conducted
to evaluate the effectiveness of video
surrogates, based on user performance (i.e.
object and action recognition, gist determination,
and visual gist). These studies [Yang et al.,
2003b; Wildemuth et al., 2002] compared five
multi-modal video surrogates in relation to
their usefulness and usability in accomplishing
specific tasks. Those surrogates included
storyboards with text or audio keywords,
slide shows with text or audio keywords
and fast-forward. The results showed that no
surrogate was universally judged `best', but
the fast-forward surrogate garnered strong
support, particularly from experienced
video users. Additionally, users suggested
that the interaction between textual and
visual information was quite important to
understand the videos. In a subsequent
study, we evaluated four speeds of fast-for-
ward surrogate [Wildemuth et al., 2003]
and, based upon users' performance, we set
the default speed for fast-forward to be 1:64
in the Open Video archive. These performance
studies were augmented with an eye-tracking
study to see "where, when, and
how long users looked at text and image
surrogates" [Hughes et al., 2003, p. 1]. The
results showed that the participants looked
at textual information first and fixed their
gaze on it much longer than on the images.
Our second goal is to create and evaluate
highly interactive user interfaces
that allow people to select representations,
and control them quickly and easily to
achieve their information-seeking needs
[Marchionini et al., 2002]. We are working
towards this goal in two phases. In the first
phase, we evaluated various kinds of video
representations (known as surrogates) indi-
vidually, trying to see how much people
could understand from watching only
these surrogates [see Wildemuth et al.,
2002, and Wildemuth et al., 2003] and
how they interacted with textual and visu-
al information [Hughes et al., 2003]. In the
second phase, we have integrated these sur-
rogates together to create `Agile View'
interfaces for users to explore video con-
tent [see Marchionini et al., 2000; Geisler,
2002]. At present, we are creating and eval-
uating previews, overviews, reviews (or his-
tory views), peripheral views and shared
views, and are initiating user studies
focused on their combination.
Due to the complex characteristics of
the video content (e.g. multiple
channels, long download times), various
kinds of video surrogates need to be creat-
ed for users to search and browse before
they decide to download and use the
whole video. O'Connor [1985] first sug-
gested that key frames - still images repre-
sentative of scenes extracted from the
video itself - could be used to construct
video abstracts or `contour maps'. Several
kinds of video surrogates (e.g. poster frame,
storyboard, slideshow and fast forward) as
well as the traditional textual surrogates
(e.g. title, keywords and descriptions) have
been applied in the design of Web inter-
faces of the Open Video archive. A poster
frame [Christel et al., 1997] is a single
`thumbnail' image representing a video
clip. A storyboard (see Figure 1) displays
multiple key frames simultaneously and has
been widely used as a video surrogate in
video research projects [Christel et al.,
1997, Tse et al., 1998, Lee et al., 2002, etc.].
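The textual and visual surrogates described above can be pictured as fields of a single metadata record per clip. The field names below are illustrative only, not the Open Video archive's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoSurrogates:
    """Illustrative record bundling textual and visual surrogates
    for one video clip (hypothetical field names, not the real schema)."""
    title: str
    keywords: List[str]
    description: str
    poster_frame: str                    # single `thumbnail' image
    # Key frames shown simultaneously (storyboard) or consecutively
    # (slide show), plus an optional silent sped-up clip.
    storyboard: List[str] = field(default_factory=list)
    slide_show: List[str] = field(default_factory=list)
    fast_forward: str = ""

clip = VideoSurrogates(
    title="Sample clip",
    keywords=["example"],
    description="Illustrative record only.",
    poster_frame="frames/poster.jpg",
    storyboard=["frames/kf1.jpg", "frames/kf2.jpg"],
)
```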