22 DigiCULT
EPSRC project: POW! Personalisation of Web Searches through Ostension & Summarisation, http://
By Joemon M. Jose
Searching for information in digital archives is part of our day-to-day life,
and formulating a good query is the first step in this process. It is widely
acknowledged that query formulation, the transformation of a user's information
need into a list of query terms, is a difficult task. Users do not find this
process of serialising their thoughts intuitive, which often leads to poor
queries and widens the gap between the actual need and the stated request. The
problem can be circumvented once users learn how documents are indexed and gain
a basic idea of how the retrieval engine judges relevance. However, this
knowledge is not something we can take for granted.
This problem is magnified when the information need is extremely vague ('I
don't know what I'm looking for, but I'll know it when I find it'). This common
scenario typically results in short queries of one to three non-discriminatory
terms. It also leads to the associated problem of information overload, where
millions of documents are returned for a broad query, for example in Web search.
Some users may trawl through page after page of results, but in reality most
view only the first few result pages before accepting them as the best possible
results, in most cases mistakenly. This typically leads to extended or
unproductive search sessions in which any initial time constraints are ignored
and users become dissatisfied. In addition, users unknowingly miss relevant
documents.
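To make 'non-discriminatory' concrete: retrieval engines commonly weight terms by inverse document frequency (IDF), so a term that occurs in many documents does little to separate relevant documents from non-relevant ones, and a broad query built from such terms matches most of the collection. The Python sketch below is illustrative only. The toy collection and the helper names `idf` and `matching_docs` are hypothetical, and the IDF formula used (log N/df) is one standard variant, not necessarily the one any particular engine applies.

```python
import math

# Hypothetical toy collection, for illustration only.
DOCS = [
    "museum archive of digital images",
    "digital library search interface",
    "search engine for digital archives",
    "medieval manuscript conservation notes",
]

def idf(term: str, docs: list[str]) -> float:
    """Inverse document frequency, log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df) if df else 0.0

def matching_docs(query: str, docs: list[str]) -> list[str]:
    """Documents sharing at least one term with the query (OR semantics)."""
    terms = set(query.split())
    return [d for d in docs if terms & set(d.split())]

for term in ["digital", "search", "manuscript"]:
    print(term, round(idf(term, DOCS), 3))
print(len(matching_docs("digital search", DOCS)))
```

Here 'digital' appears in three of the four documents and receives the lowest weight, while the two-term query 'digital search' already matches three of the four documents: a small-scale version of the overload described above.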
During the search process, a user's information need is constantly developing,
and so, therefore, are the query or queries associated with the search session.
As users think, digest, interpret and solve problems, the information need can be
influenced by any number of factors, for example the quality of the search
results or the content of a particular document. Current search techniques do
not adequately address these fundamental problems.
As part of a recently concluded project funded by the Engineering and Physical
Sciences Research Council (EPSRC), the Information Retrieval Group of the
University of Glasgow's Department of Computing Science worked on developing
prototypes that model users' current and long-term information needs.
Three systems based on implicit feedback techniques are described below; they
were evaluated using a task-oriented, user-centred approach to the evaluation of
personalised IR systems. Building on commonly applied relevance feedback systems,
the implicit technique uses relevance indicators gathered unobtrusively from
searcher interaction to modify the initial query. Although implicit feedback is
not as accurate as explicit feedback, it has been shown to be an effective
substitute for it in interactive information seeking environments.
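As a rough illustration of modifying a query from unobtrusively gathered indicators, the Python sketch below expands a query with terms from documents the searcher spent time reading. It assumes dwell time as the implicit relevance indicator, a minimum-dwell cut-off and a simple dwell-weighted term count; the function name `implicit_expand` and all parameters are hypothetical and do not describe the POW! prototypes' actual method.

```python
from collections import Counter

def implicit_expand(query: list[str],
                    viewed: list[tuple[str, float]],
                    k: int = 2,
                    min_dwell: float = 5.0) -> list[str]:
    """Expand `query` (lower-case terms) with the k highest-scoring terms from
    documents viewed for at least `min_dwell` seconds.

    `viewed` holds (document_text, dwell_seconds) pairs. Using dwell time as
    the relevance indicator is an illustrative assumption."""
    scores = Counter()
    for text, dwell in viewed:
        if dwell < min_dwell:
            continue  # a brief visit is treated as no evidence of relevance
        for term in text.lower().split():
            if term not in query:
                scores[term] += dwell  # longer reading adds more weight
    expansion = [term for term, _ in scores.most_common(k)]
    return query + expansion

viewed = [
    ("digital archive metadata standards metadata", 30.0),
    ("cookie policy page", 2.0),
    ("metadata harvesting protocol", 10.0),
]
print(implicit_expand(["digital", "archive"], viewed))
```

No document is marked by hand: the briefly visited page is ignored, and terms from the pages read longest are added to the query.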
In the 'fetch' approach, the system models users' long-term information needs
and proactively fetches relevant documents for them. The system uses a bundling
technique that lets users develop strategies for coping with the loss of context
that occurs when a variety of independent sources are viewed together. Over
time, by observing this bundling, the agent can build an accurate profile of the
multifaceted information need and eventually recommend relevant Web pages
without requiring users to mark documents explicitly.
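One simple way to picture a profile built from bundling is sketched below in Python. It assumes each bundle stands for one facet of the information need, represents each facet as the summed term-frequency vector of its documents, and recommends candidate pages whose cosine similarity to their best-matching facet reaches a threshold. The names `bundle_profiles` and `recommend`, the threshold value and the vector representation are all hypothetical choices for illustration, not the project's published design.

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector for whitespace-tokenised text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def bundle_profiles(bundles: dict[str, list[str]]) -> dict[str, Counter]:
    """One profile per bundle (facet): the summed term vector of its documents."""
    profiles = {}
    for name, docs in bundles.items():
        profile = Counter()
        for doc in docs:
            profile.update(tf_vector(doc))
        profiles[name] = profile
    return profiles

def recommend(candidates: list[str], profiles: dict[str, Counter],
              threshold: float = 0.3) -> list[tuple[str, str, float]]:
    """(page, best_facet, score) for pages whose best facet similarity
    reaches `threshold`, highest score first."""
    if not profiles:
        return []
    results = []
    for page in candidates:
        vec = tf_vector(page)
        facet, score = max(((name, cosine(vec, p)) for name, p in profiles.items()),
                           key=lambda pair: pair[1])
        if score >= threshold:
            results.append((page, facet, score))
    return sorted(results, key=lambda r: r[2], reverse=True)

bundles = {
    "textiles": ["medieval tapestry conservation", "tapestry dye analysis"],
    "imaging": ["multispectral imaging of manuscripts"],
}
candidates = [
    "tapestry conservation workshop",
    "football results",
    "imaging manuscripts with multispectral cameras",
]
for page, facet, score in recommend(candidates, bundle_profiles(bundles)):
    print(f"{facet:8s} {score:.3f} {page}")
```

Keeping one profile per bundle, rather than merging everything into a single vector, is what lets the sketch respect a multifaceted need: a page only has to match one facet well to be recommended.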
Relevance feedback (RF) is the main post-query method for automatically
improving a system's representation of a searcher's information need. However,
the technique relies on explicit relevance assessments provided by the searcher
(i.e. indications of which documents contain relevant information) and creates a
revised query attuned to the documents marked.
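A classic textbook way of creating such a revised query is the Rocchio formula, which moves the query vector towards the centroid of the documents marked relevant and away from those marked non-relevant: q' = αq + β·mean(relevant) − γ·mean(non-relevant). The Python sketch below implements that formula on term-weight dictionaries; the α, β and γ values are illustrative defaults, clipping negative weights to zero is a common convention, and nothing here implies this is the exact update used in the systems described.

```python
from collections import Counter

def rocchio(query: Counter,
            relevant: list[Counter],
            non_relevant: list[Counter],
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.15) -> dict[str, float]:
    """Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(non_relevant).
    Terms whose final weight is not positive are dropped."""
    new_q: dict[str, float] = {}
    for term, w in query.items():
        new_q[term] = new_q.get(term, 0.0) + alpha * w
    for docs, coef in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # no judgements of this kind: contributes nothing
        for doc in docs:
            for term, w in doc.items():
                # dividing by len(docs) turns the sum into a centroid
                new_q[term] = new_q.get(term, 0.0) + coef * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}

q = rocchio(Counter({"jaguar": 1}),
            relevant=[Counter({"jaguar": 2, "cat": 1}),
                      Counter({"jaguar": 1, "habitat": 2})],
            non_relevant=[Counter({"jaguar": 1, "car": 3})])
print(q)
```

The searcher's marks are the only input beyond the original query, which is precisely the dependence on explicit assessments that the next paragraph identifies as a weakness.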
RF systems suffer from a number of problems that hinder their practical use. RF
depends on a series of relevance assessments made explicitly by the searcher.
The nature of the process is such that searchers must visit a number of
documents and explicitly
mark each as either relevant or non-relevant. This is