Friday, October 3, 2008

Rosie Jones - Thursday October 9th, 2008

3002 Newell-Simon Hall
Thursday, October 9, 2008

Speaker: Rosie Jones

Title: Web Search Sessions

Abstract: Traditionally, information retrieval examines the search query in isolation: a query is used to retrieve documents, and the relevance of the documents returned is evaluated in relation to that query. However, users typically conduct web and other types of searches in sessions, issuing a query, examining results, and then re-issuing a modified query to improve the results. We describe the properties of real web search sessions, and show that users conduct searches for both broad and finer-grained tasks, which can be both interleaved and nested. We show that user search reformulations can be mined to identify related terms, and that we can identify the boundaries between tasks with greater accuracy than previous methods.

Rosie Jones is a Senior Research Scientist at Yahoo!. Her research interests include web search, geographic information retrieval, and natural language processing. She received her PhD from the Language Technologies Institute at Carnegie Mellon University under the supervision of Tom Mitchell, where her doctoral thesis was titled Learning to Extract Entities from Labeled and Unlabeled Text. She is co-organizing the WSDM 2009 Workshop on Web Search Click Data (WSCD09). She served on the Senior PC for SIGIR in 2007 and 2008, and is a Senior Member of the ACM.

Thursday, October 2, 2008

Le Zhao - Thursday 2nd Oct 2008

Le Zhao will give our first IR talk of the
semester. A reception will be provided by Yahoo!. Here is the talk information:

Date: Thursday 2nd Oct 2008
Time: 2pm
Place: Wean Hall 7220

Speaker: Le Zhao
Title: A Generative Retrieval Model for Structured Documents

Structured documents contain elements defined by the author(s) and annotations assigned by other people or processes. Structured documents pose challenges for probabilistic retrieval models when there are mismatches between the structured query and the actual structure in a relevant document, or when an annotator introduces erroneous structure. This paper makes three contributions. First, a new generative retrieval model is proposed to deal with the mismatch problem. This new model extends the basic keyword language model by treating structure as a hidden variable during the generation process. Second, variations of the model are compared. Third, term-level and structure-level smoothing strategies are studied. Evaluation was conducted with INEX XML retrieval and question-answering retrieval tasks. Experimental results indicate that the optimal structured retrieval model is task dependent, that two-level Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer smoothing, and that, with accurate structured queries, the proposed structured retrieval model significantly outperforms keyword retrieval on both the QA and INEX datasets.

Based on work accepted at CIKM'08.