CMU Information Retrieval Discussion Series

Monday, May 11, 2009

Hua Ai - Friday May 15, 2009, noon

Speaker: Hua Ai (Intelligent Systems Program at University of Pittsburgh)

Date/Time: Friday May 15, 2009, noon
Location: 3002 Newell-Simon Hall (NSH)

User Simulation for Spoken Dialog System Development

Abstract:
In this talk, I will present my thesis work on how to evaluate and how to
build user simulations that support dialog system development. When
evaluating user simulations, I use both human judges and automatic
evaluation measures to assess the quality of the simulation models.
When building user simulations, I examine three factors that impact
simulation models in the tasks of dialog strategy learning and dialog
system development.

The talk is based on the author's ACL 2009 paper.

Friday, October 3, 2008

Rosie Jones - Thursday October 9th, 2008

3002 Newell-Simon Hall
Thursday, October 9, 2008
11:00am-12pm

Speaker: Rosie Jones

Title: Web Search Sessions

Abstract: Traditionally, information retrieval examines the search query in isolation: a query is used to retrieve documents, and the relevance of the returned documents is evaluated in relation to that query. However, users typically conduct web and other types of searches in sessions, issuing a query, examining results, and then re-issuing a modified query to improve the results. We describe the properties of real web search sessions, and show that users conduct searches for both broad and finer-grained tasks, which can be both interleaved and nested. We show that user search reformulations can be mined to identify related terms, and that we can identify the boundaries between tasks with greater accuracy than previous methods.

Bio:
Rosie Jones is a Senior Research Scientist at Yahoo!. Her research interests include web search, geographic information retrieval, and natural language processing. She received her PhD from the Language Technologies Institute at Carnegie Mellon University under the supervision of Tom Mitchell, where her doctoral thesis was titled Learning to Extract Entities from Labeled and Unlabeled Text. She is co-organizing the WSDM 2009 Workshop on Web Search Click Data (WSCD09). She served on the Senior PC for SIGIR in 2007 and 2008, and is a Senior Member of the ACM.

Thursday, October 2, 2008

Le Zhao - Thursday 2nd Oct 2008

Le Zhao will give our first IR talk of this semester. A reception
will be provided by Yahoo! Here is the talk information:

Date: Thursday 2nd Oct 2008
Time: 2pm
Place: Wean Hall 7220

Speaker: Le Zhao
Title: A Generative Retrieval Model for Structured Documents

Abstract
Structured documents contain elements defined by the author(s) and annotations assigned by other people or processes. Structured documents pose challenges for probabilistic retrieval models when there are mismatches between the structured query and the actual structure in a relevant document, or when an annotator introduces erroneous structure. This paper makes three contributions. First, a new generative retrieval model is proposed to deal with the mismatch problem. This new model extends the basic keyword language model by treating structure as a hidden variable during the generation process. Second, variations of the model are compared. Third, term-level and structure-level smoothing strategies are studied. Evaluation was conducted on INEX XML retrieval and question-answering retrieval tasks. Experimental results indicate that the optimal structured retrieval model is task dependent, that two-level Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer smoothing, and that with accurate structured queries, the proposed structured retrieval model significantly outperforms keyword retrieval on both the QA and INEX datasets.

Based on work accepted at CIKM'08.
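As background for the smoothing comparison above, here is a minimal sketch of the standard single-level Jelinek-Mercer and Dirichlet smoothing formulas for query-likelihood keyword retrieval. It does not reproduce the two-level (term- and structure-level) variants from the talk; the function names, the toy data, and the default parameter values (lam=0.5, mu=2000) are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def jelinek_mercer(term: str, doc: Sequence[str], collection: Sequence[str],
                   lam: float = 0.5) -> float:
    """p(term | doc): linear interpolation of document and collection models.
    Here lam is the weight on the collection model (an assumed convention)."""
    p_doc = Counter(doc)[term] / len(doc)
    p_coll = Counter(collection)[term] / len(collection)
    return (1.0 - lam) * p_doc + lam * p_coll


def dirichlet(term: str, doc: Sequence[str], collection: Sequence[str],
              mu: float = 2000.0) -> float:
    """p(term | doc): Dirichlet-prior smoothing, adding mu pseudo-counts
    distributed according to the collection model."""
    p_coll = Counter(collection)[term] / len(collection)
    return (Counter(doc)[term] + mu * p_coll) / (len(doc) + mu)


def query_log_likelihood(query, doc, collection, smoother, **params) -> float:
    """Sum of log p(t | doc) over query terms. Every query term must occur
    somewhere in the collection, otherwise its probability is 0 and log fails."""
    return sum(math.log(smoother(t, doc, collection, **params)) for t in query)


# Hypothetical toy data: one document and a tiny collection that contains it.
doc = "xml retrieval model".split()
collection = doc + "keyword retrieval baseline".split()

p_jm = jelinek_mercer("xml", doc, collection)          # 0.5*(1/3) + 0.5*(1/6), about 0.25
p_dir = dirichlet("keyword", doc, collection, mu=3.0)  # (0 + 3*(1/6)) / (3 + 3), about 0.083
```

The practical difference is that Dirichlet smoothing adapts to document length (short documents lean more heavily on the collection model), while Jelinek-Mercer applies the same interpolation weight to every document.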

Friday, May 16, 2008

Justin Betteridge - Friday May 23rd

Please join us for an upcoming talk.

Lunch will be provided by Yahoo!

Title:
Linguistic Pattern Learning for Web Information Extraction

Who: Justin Betteridge
When: Friday, May 23rd, 12:00pm
Where: NSH 3002

Abstract:
Most approaches to automatically extracting structured information from the web
rely on surface text patterns. However, the manner in which such patterns are
defined, learned, and employed in the larger system varies with each case. In
this talk, I will outline the spectrum of previous work in this area and argue
for a linguistically-motivated definition, a hybrid heuristic/classifier-based
assessment, and a multi-purpose employment of textual patterns in the context of
Web Information Extraction (WIE). I will also give preliminary results from
adopting such an approach in our WIE system.

Wednesday, May 7, 2008

Grace, Hui Yang - Friday May 16th

Please join us for an upcoming talk.

Lunch will be provided by Yahoo!

Title:
Ontology Learning by Supervised Hierarchical Clustering

Who: Grace, Hui Yang
When: Friday, May 16th, 12:00pm
Where: NSH 3002

Abstract:
This work makes novel use of supervised clustering as the basic
framework to construct concept ontologies interactively or
automatically. Supervised hierarchical clustering is used to
organize ontology fragments, which are identified by techniques in
natural language processing and information retrieval, into
hierarchies. At each clustering iteration, a distance metric is
learned from the clustering given by either pseudo or real
feedback. K-medoids clustering with sampling is then used to group
the concepts at the higher level. A web-based cluster naming
algorithm is also presented. A user evaluation shows that the
system effectively reduces human effort in the
interactive runs. Both automatic and interactive runs of the
experiments show that the approach is effective.
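The abstract mentions grouping concepts with K-medoids clustering under a learned distance metric. Below is a minimal sketch of plain alternating K-medoids (assign each point to its nearest medoid, then re-pick each cluster's medoid) with a caller-supplied distance function standing in for the learned metric. The sampling step and the metric learning from feedback are omitted, and all names here are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_medoids(points: Sequence[T], k: int, dist: Callable[[T, T], float],
              init: Optional[List[int]] = None,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids. Returns (medoid indices, nearest medoid per point)."""
    medoids = list(init) if init is not None else list(range(k))

    def nearest(i: int) -> int:
        # Index of the current medoid closest to point i.
        return min(medoids, key=lambda m: dist(points[i], points[m]))

    for _ in range(max_iter):
        # Assignment step: attach every point to its closest current medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(points)):
            clusters[nearest(i)].append(i)

        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members as the new medoid.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # possible with duplicate points; keep the old medoid
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members, key=lambda c: sum(dist(points[c], points[j]) for j in members)))

        if sorted(new_medoids) == sorted(medoids):
            break  # converged: the set of medoids did not change
        medoids = new_medoids

    return medoids, [nearest(i) for i in range(len(points))]


# Usage on hypothetical 1-D data with absolute difference as the distance.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
medoids, labels = k_medoids(points, 2, lambda a, b: abs(a - b), init=[0, 1])
```

Because medoids are always actual data points, the method only needs pairwise distances, which is why it pairs naturally with a learned metric over concepts that have no coordinates of their own.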

Friday, March 28, 2008

Nico Schlaefer - Friday, April 4, 12:00pm, NSH 3002

Please join us for an upcoming talk from Nico Schlaefer.

Lunch will be provided!

Title:
The Ephyra Question Answering System: Recent Results and Current Directions

Who: Nico Schlaefer
When: Friday, April 4, 12:00pm
Where: NSH 3002

Abstract:
This talk gives an overview of recent work on English question answering (QA) at CMU and our participation in last year’s TREC evaluation. QA is the task of retrieving accurate answers to natural language questions from a knowledge source such as the Web. The presentation includes a brief introduction to QA and the TREC competition, so prior knowledge of QA is helpful but not required.

The talk focuses on the challenges that an end-to-end QA system needs to address, and the architectural and algorithmic solutions implemented in Ephyra, our English QA system. Ephyra is a modular and extensible framework that facilitates the integration of different QA techniques. The system is organized as a pipeline of reusable standard components for question analysis, query generation, search, answer extraction, and answer selection. The most recent setup combines a syntactic pattern learning and matching approach with answer-type based extraction techniques and a semantic answer extractor that is based on semantic role labeling.

We have recently released the Ephyra QA system as open source, making most of our code available to the research community. I will discuss why we took this step, and how you may benefit from our open source system, OpenEphyra, in your own research.
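To illustrate the pipeline organization described above (question analysis, query generation, search, answer extraction, answer selection), here is a minimal Python sketch of five interchangeable stages passing a shared state object. This is not OpenEphyra's API (OpenEphyra is written in Java), and the stage implementations, toy corpus, and stopword list are hypothetical stand-ins; the pattern-learning, answer-type, and semantic-role-labeling extractors from the talk are not reproduced.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List

STOPWORDS = {"where", "what", "who", "when", "is", "the", "a", "of", "to", "in", "on"}

# Hypothetical toy knowledge source standing in for the Web.
CORPUS = [
    "Carnegie Mellon University is located in Pittsburgh.",
    "The Newell-Simon Hall building is on the Carnegie Mellon campus in Pittsburgh.",
    "Lunch for the talk was provided by Yahoo!",
]


@dataclass
class QAState:
    """Data handed from one pipeline stage to the next."""
    question: str
    keywords: List[str] = field(default_factory=list)
    query: str = ""
    passages: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)
    answer: str = ""


def tokens(text: str) -> List[str]:
    # Whitespace tokenization with surrounding punctuation stripped.
    return [w.strip("?.,!") for w in text.split() if w.strip("?.,!")]


def analyze_question(s: QAState) -> QAState:
    s.keywords = [w.lower() for w in tokens(s.question) if w.lower() not in STOPWORDS]
    return s


def generate_query(s: QAState) -> QAState:
    s.query = " ".join(s.keywords)
    return s


def search(s: QAState) -> QAState:
    # Keep passages sharing at least one query term.
    terms = set(s.query.split())
    s.passages = [p for p in CORPUS if terms & {w.lower() for w in tokens(p)}]
    return s


def extract_answers(s: QAState) -> QAState:
    # Naive answer-type heuristic: capitalized tokens not already in the question.
    s.candidates = [w for p in s.passages for w in tokens(p)
                    if w[0].isupper() and w.lower() not in s.keywords
                    and w.lower() not in STOPWORDS]
    return s


def select_answer(s: QAState) -> QAState:
    # Redundancy-based selection: the most frequent candidate wins.
    s.answer = Counter(s.candidates).most_common(1)[0][0] if s.candidates else ""
    return s


Stage = Callable[[QAState], QAState]
PIPELINE: List[Stage] = [analyze_question, generate_query, search,
                         extract_answers, select_answer]


def answer(question: str, stages: List[Stage] = PIPELINE) -> str:
    state = QAState(question)
    for stage in stages:
        state = stage(state)
    return state.answer


print(answer("Where is Carnegie Mellon University?"))  # prints Pittsburgh
```

Since every stage has the same signature, swapping in a different extractor or adding a second one only means editing the stage list, which is the kind of modularity the abstract attributes to Ephyra.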

Wednesday, February 20, 2008

Upcoming IR Talk at CMU: John Tait

The upcoming IR-related LTI Seminar talk is by John Tait, on patent retrieval.