Friday, October 3, 2008
Thursday, October 9, 2008
Speaker : Rosie Jones
Title: Web Search Sessions
Abstract: Traditionally, information retrieval examines the search query in isolation: a query is used to retrieve documents, and the relevance of the documents returned is evaluated in relation to that query. However, users typically conduct web and other types of searches in sessions, issuing a query, examining results, and then re-issuing a modified query to improve the results. We describe the properties of real web search sessions, and show that users conduct searches for both broad and finer grained tasks, which can be both interleaved and nested. We show that user search reformulations can be mined to identify related terms, and that we can identify the boundaries between tasks with greater accuracy than previous methods.
Rosie Jones is a Senior Research Scientist at Yahoo!. Her research interests include web search, geographic information retrieval, and natural language processing. She received her PhD from the Language Technologies Institute at Carnegie Mellon University under the supervision of Tom Mitchell, where her doctoral thesis was titled Learning to Extract Entities from Labeled and Unlabeled Text. She is co-organizing the WSDM 2009 Workshop on Web Search Click Data (WSCD09). She served on the Senior PC for SIGIR in 2007 and 2008, and is a Senior Member of the ACM.
Thursday, October 2, 2008
semester. Reception will be provided by Yahoo!. Here is the talk information:
Date: Thursday 2nd Oct 2008
Place: Wean Hall 7220
Speaker: Le Zhao
Title: A Generative Retrieval Model for Structured Documents
Structured documents contain elements defined by the author(s) and annotations assigned by other people or processes. Structured documents pose challenges for probabilistic retrieval models when there are mismatches between the structured query and the actual structure in a relevant document, or when an annotator introduces erroneous structure. This paper makes three contributions. First, a new generative retrieval model is proposed to deal with the mismatch problem. This new model extends the basic keyword language model by treating structure as a hidden variable during the generation process. Second, variations of the model are compared. Third, term-level and structure-level smoothing strategies are studied. Evaluation was conducted with INEX XML retrieval and question-answering retrieval tasks. Experimental results indicate that the optimal structured retrieval model is task dependent, that two-level Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer smoothing, and that, with accurate structured queries, the proposed structured retrieval model significantly outperforms keyword retrieval on both QA and INEX datasets.
Based on work accepted at CIKM'08.
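For readers unfamiliar with the two smoothing families named in the abstract, the sketch below shows the standard single-level forms of Jelinek-Mercer and Dirichlet smoothing in a plain keyword query-likelihood model. It is not the structured model or the two-level smoothing from the talk. The function names, the toy token lists, and the convention that `lam` weights the collection model are assumptions made for illustration.

```python
import math
from collections import Counter


def jelinek_mercer(tf, doc_len, p_coll, lam=0.5):
    """Fixed linear interpolation; lam is the weight on the collection model."""
    p_doc = tf / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_doc + lam * p_coll


def dirichlet(tf, doc_len, p_coll, mu=2000.0):
    """Dirichlet prior smoothing; longer documents rely less on the collection."""
    return (tf + mu * p_coll) / (doc_len + mu)


def query_log_likelihood(query, doc, collection, smooth):
    """Sum of log P(term | doc) over query terms, using the given smoother.

    query, doc, and collection are lists of tokens (hypothetical toy input).
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        p = smooth(doc_tf[term], len(doc), p_coll)
        if p == 0.0:
            # Term never occurs in the collection: no smoothing can rescue it.
            return float("-inf")
        score += math.log(p)
    return score
```

The practical difference is that Jelinek-Mercer mixes in the collection model with the same weight for every document, while the effective weight under Dirichlet smoothing, mu / (doc_len + mu), shrinks as the document gets longer.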
Friday, May 16, 2008
Lunch will be provided by Yahoo!
Linguistic Pattern Learning for Web Information Extraction
Who: Justin Betteridge
When: Friday, May 23rd, 12:00pm
Where: NSH 3002
Most approaches to automatically extracting structured information from the web
rely on surface text patterns. However, the manner in which such patterns are
defined, learned, and employed in the larger system varies with each case. In
this talk, I will outline the spectrum of previous work in this area and argue
for a linguistically-motivated definition, a hybrid heuristic/classifier-based
assessment, and a multi-purpose employment of textual patterns in the context of
Web Information Extraction (WIE). I will also give preliminary results from
adopting such an approach in our WIE system.
Wednesday, May 7, 2008
Lunch will be provided by Yahoo!
Ontology Learning by Supervised Hierarchical Clustering
Who: Grace, Hui Yang
When: Friday, May 16th, 12:00pm
Where: NSH 3002
This work makes novel use of supervised clustering as the basic
framework for constructing concept ontologies interactively or
automatically. Supervised hierarchical clustering is used to
organize ontology fragments, which are identified by techniques in
natural language processing and information retrieval, into
hierarchies. At each clustering iteration, a distance metric is
learned from the clustering given by either pseudo or real
feedback. K-medoids clustering with sampling is then used to group
the concepts at the higher level. A web-based cluster naming
algorithm is also presented. A user evaluation shows that the
system is effective at saving human effort in the
interactive runs. Both automatic and interactive runs of the
experiments show that the approach is effective.
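As background on the clustering step mentioned in the abstract, the following is a minimal sketch of plain k-medoids over a precomputed distance matrix. It is not the system from the talk: it uses a fixed distance matrix rather than a learned metric, it omits the sampling step, and the function name and the deterministic initialization (the first k points) are assumptions made for illustration.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternate between assigning points to the nearest medoid and
    re-choosing each cluster's medoid until the medoid set stops changing.

    dist is a square list-of-lists of pairwise distances.
    Returns (sorted medoid indices, medoid index assigned to each point).
    """
    n = len(dist)
    medoids = list(range(k))  # assumption: start from the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance to its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep a medoid whose cluster is empty
                continue
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels
```

Because medoids are always actual data points, the procedure needs only pairwise distances, which is what makes it compatible with a distance metric learned elsewhere.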
Friday, March 28, 2008
Lunch will be provided!
The Ephyra Question Answering System: Recent Results and Current Directions
Who: Nico Schlaefer
When: Friday, April 4, 12:00pm
Where: NSH 3002
This talk gives an overview of recent work on English question answering (QA) at CMU and our participation in last year’s TREC evaluation. QA is the task of retrieving accurate answers to natural language questions from a knowledge source such as the Web. The presentation includes a brief introduction to QA and the TREC competition, so prior knowledge of QA is helpful but not required.
The talk focuses on the challenges that an end-to-end QA system needs to address, and the architectural and algorithmic solutions implemented in Ephyra, our English QA system. Ephyra is a modular and extensible framework that facilitates the integration of different QA techniques. The system is organized as a pipeline of reusable standard components for question analysis, query generation, search, answer extraction, and answer selection. The most recent setup combines a syntactic pattern learning and matching approach with answer-type based extraction techniques and a semantic answer extractor that is based on semantic role labeling.
We recently released the Ephyra QA system as open source, making most of our code available to the research community. I will discuss why we took this step, and how you may benefit from our open source system, OpenEphyra, for your own research.
Wednesday, February 20, 2008
Monday, February 18, 2008
Lunch will be provided by Yahoo!
Speaker: Jan Wiebe
Professor, Department of Computer Science
Director, Intelligent Systems Program
University of Pittsburgh
Date/Time: Friday, February 22nd, 12:00 pm (noon)
Location: 3002 Newell-Simon Hall (NSH)
Title: Subjectivity Analysis
Abstract: A growing area of research, "subjectivity analysis", is the computational study of affect, opinions, and sentiments expressed in text. Blogs, editorials, reviews (of products, movies, books, etc.), and even "objective" newspaper articles (which include many opinions and sentiments) are just some of the genres for which accurate identification and interpretation of opinions is critical for full text understanding. Subjectivity analysis will support developing tools for information analysts in governmental, commercial, and political domains who want to automatically track attitudes and feelings in the news and on-line forums. How do people feel about the latest iPod? Is there a change in the support for the new Medicare bill? A system able to automatically identify and extract opinions and sentiments from text would be an enormous help to someone sifting through the vast amounts of news and web data, trying to answer these kinds of questions. In this talk, I will first give an overview of our work in subjectivity analysis, and then will focus on experiments exploring interactions between subjectivity and word sense, showing that subjectivity is a property that can be associated with word meanings and that subjectivity classification can be beneficial for word sense disambiguation.
Bio: My research areas are artificial intelligence and natural language processing (NLP). My work with students and colleagues has been in discourse processing, pragmatics, word-sense disambiguation, and probabilistic classification in NLP. Our most recent work investigates automatically recognizing and interpreting expressions of opinions and sentiments in text, to support NLP applications such as question answering, information extraction, text categorization, and summarization.
Tuesday, January 1, 2008
Title: CMU at TREC 2007
Speakers: Jonathan Elsas, Le Zhao and Yangbo Zhu (CMU)
Friday, October 5, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Estimating and Exploiting Uncertainty in Pseudo-Relevance Feedback
Speakers: Kevyn Collins-Thompson (CMU)
Friday, July 13, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Utility-based Information Distillation Over Temporally Sequenced Documents
Speakers: Yiming Yang (CMU)
Friday, May 18, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Collaborative Web Search - Exploiting User Activity for User Benefit
Speaker: Jill Freyne (University College Dublin)
Friday, January 19, 2007 - 12:00 NSH 3002
Title: Using Graphs and Random Walks to Discover Latent Similarities in Text
Speaker: Gunes Erkan
Friday, November 10, 2006 - 12:00 NSH 3002
Title: Personal Metasearch
Speaker: Paul Thomas
Friday, May 19, 2006 - 12:00 NSH 3002
Title: Collaborative Adaptive User Profile with Implicit and Explicit User Feedback
Speaker: Yi Zhang
Wednesday, April 19, 2006 - 12:00, NSH 3002
Title: Deriving Marketing Intelligence from Online Discussion
Speaker: Matthew Hurst and Natalie Glance
Wednesday, April 5, 2006 - 12:00, NSH 3002
Title: A Graphical Framework for Contextual Search and Name Disambiguation in Email
Speaker: Einat Minkov
Wednesday, March 8, 2006 - 12:00, NSH 3002
Title: Structured and Dynamic Topic Models
Speaker: John Lafferty
Wednesday, February 22, 2006 - 12:00, NSH 3002
Title: Automatically Labeling Hierarchical Clusters
Speaker: Pucktada (Puck) Treeratpituk
Title: PageRank without Hyperlinks: Structural Re-ranking using Links Induced by Language Models
Speaker: Oren Kurland
Wednesday, April 27, 2005 - 4:30, WeH 4601
Title: Dynamic Construction of Content-Based Topologies in Hierarchical Peer-to-Peer Networks
Speaker: Jie Lu
Wednesday, March 16, 2005 - 4:30, WeH 4601
Title: Modeling Search Engine Effectiveness for Federated Search
Speaker: Luo Si
Wednesday, March 2, 2005 - 4:30, WeH 4623
Title: What is the matter? Explorations in text categorization
Speaker: Lillian Lee
Wednesday, January 19th, 2005 - 4:30, WeH 4601
Title: Detecting Action-Items in E-mail
Speaker: Paul N. Bennett
Wednesday, December 1, 2004 - 3:00, WeH 4625
Title: Probabilistic Models of Text and Images
Speaker: David Blei
Wednesday, November 17, 2004 - 3:00, WeH 4625
Title: Merging Rank Lists from Multiple Sources in Video Classification
Speaker: Wei-Hao Lin
Wednesday, November 10, 2004 - 3:00, WeH 4625
Title: Associating Names with Persons in Broadcast News Video
Speaker: Jun Yang
Wednesday, October 20, 2004 - 3:00, WeH 4625
Title: Graph Mining
Speaker: Christos Faloutsos
Wednesday, October 6, 2004 - 3:00, WeH 4625
Topic: Review of the SIGIR 2004 Best Paper, “A Formal Study of Information Retrieval Heuristics” by Hui Fang, Tao Tao, and ChengXiang Zhai
Speaker: Kevyn Collins-Thompson
Friday, October 1, 2004 - 1:30, NSH 4513
Title: Combining Language Modeling Approach with String-matching in Near-Duplicate Detection in E-Rulemaking
Speaker: Puck Treeratpituk
Wednesday, September 22, 2004 - 2:30, NSH 4632
Learning to Summarize Interviews for Project Reports
Thursday, August 26, 2004 - 3:30, WeH 4625
Analyzing Time Series Gene Expression Data
Tuesday, August 17, 2004 - 2:00, WeH 4625
Learning Table Extraction from Examples
Thursday, August 12, 2004 - 3:30, WeH 4625
Learning to Classify Email into "Speech Acts"
Thursday, July 8, 2004 - 3:30, WeH 4625
Resource Selection for Domain-Specific Cross-Lingual IR
Tuesday, January 22, 2004 - 12:00, NSH 4513
Dynamic Recommender System on User Taste Tendency Model
Thursday, December 4, 2003 - 12:00, NSH 4513
The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks
M. Elena Renda
Thursday, October 30, 2003 - 12:00, NSH 4632
Boosting Support Vector Machines for Text Classification through Parameter-free Threshold Relaxation
Dr. James G. Shanahan
Thursday, October 23, 2003 - 12:00, NSH 4632
Content-Based Retrieval in Hybrid Peer-to-Peer Networks
Thursday, October 16, 2003 - 12:00, NSH 4632
The Utility of Question Analysis in an Open-Domain Question Answering System
Thursday, August 28, 2003 - 3:30, NSH 4632
Searching Peer-to-Peer Networks
Dr. Bin Yu
Thursday, August 14, 2003 - 3:30, NSH 3001
Flexible Mixture Model for Collaborative Filtering
Modified Logistic Regression: An Approximation to SVM and its Applications in Large-Scale Text Categorization
Thursday, June 19, 2003 - 3:30, NSH 3001
Improving Text Classifier Probability Estimates
Thursday, June 5, 2003 - 3:30, NSH 3002
Radio Station Playlist Generation
Andrew P. Widdowson
Thursday, May 22, 2003 - 3:30, NSH 3001
Discussion on Secondary Structure Prediction for Protein Sequences
Thursday, May 8, 2003 - 3:30, NSH 3001
Negative Pseudo Relevance Feedback for Multimedia Retrieval
Thursday, April 10, 2003 - 3:30, NSH 3001
Web Image Retrieval Re-Ranking with Relevance Model
Thursday, March 27, 2003 - 3:30, NSH 3001
Thursday, March 13, 2003 - 3:30, NSH 3001
Exploration and Exploitation in Adaptive Filtering Based on Bayesian Active Learning
Thursday, February 27, 2003 - 3:30, NSH 3001
Beyond Independent Topical Relevance: Evaluation Metrics and Methods for Aspect Retrieval
Dr. William Cohen
Thursday, February 13, 2003 - 3:30, NSH 3001
Overview of Database Selection Methods
Topics and Techniques in (Structured) Document Retrieval
- When preparing your presentation, view this as a normal conference talk and prepare accordingly.
- Please prepare a short (20-30 minute) talk or a long (45 minute) talk according to the time slot the organizer has reserved.
- You should assume that the audience is knowledgeable in IR and many of the techniques commonly used in the field. Unless the purpose of your talk is a general overview of a research problem, you should assume that the related research can be covered very briefly (one or two slides).
- Focus on presenting your thoughts, issues, and contributions to the problem at hand.
- If you have extra material that won't fit in the talk, prepare slides for it, as it is very likely that we will be willing to hear more about the subject after the main talk is over.
- Please don't be afraid to present work in progress. Even with the change of presentation format to conference talk style, we are still driven by our original goals of learning about current research and fostering collaboration on work in progress.
Thanks are due to Yiming Yang for some helpful suggestions.
- learning about each other's research,
- discussing the big (and little) problems of a research area, and
- fostering collaboration across groups.
One of the goals of the series is to strike a nice balance between area overview presentations and technical presentations on specific approaches. Because many of us work on quite different areas of Information Retrieval, we often find it beneficial to have discussions that focus on the important problems in our respective research areas and the techniques that have been found to be broadly useful (and occasionally the spectacular failures). In order to keep grounded, we also have some technical discussions on specific techniques and approaches.
In an effort to make this series a valuable resource for others, we plan on posting the authors' slides (with permission). We also ask that authors provide a short reading list of articles (preferably online) for people who want to learn about the topics in more depth.
For LTI students: a presentation in the IR Discussion Series can fulfill your annual LTI talk requirement. If you wish to do this, let us know more than a week in advance so that we can advertise the talk according to policy. You will still be required to make sure two faculty are present and that they fill out the form after your presentation.