Friday, May 16, 2008

Justin Betteridge - Friday May 23rd

Please join us for an upcoming talk.

Lunch will be provided by Yahoo!

Title:
Linguistic Pattern Learning for Web Information Extraction

Who: Justin Betteridge
When: Friday, May 23rd, 12:00pm
Where: NSH 3002

Abstract:
Most approaches to automatically extracting structured information from the web
rely on surface text patterns. However, the manner in which such patterns are
defined, learned, and employed in the larger system varies with each case. In
this talk, I will outline the spectrum of previous work in this area and argue
for a linguistically-motivated definition, a hybrid heuristic/classifier-based
assessment, and a multi-purpose employment of textual patterns in the context of
Web Information Extraction (WIE). I will also give preliminary results from
adopting such an approach in our WIE system.

Wednesday, May 7, 2008

Grace, Hui Yang - Friday May 16th

Please join us for an upcoming talk.

Lunch will be provided by Yahoo!

Title:
Ontology Learning by Supervised Hierarchical Clustering

Who: Grace, Hui Yang
When: Friday, May 16th, 12:00pm
Where: NSH 3002

Abstract:
This work makes novel use of supervised clustering as the basic
framework for constructing concept ontologies interactively or
automatically. Supervised hierarchical clustering is used to
organize ontology fragments, which are identified by techniques in
natural language processing and information retrieval, into
hierarchies. At each clustering iteration, a distance metric is
learned from the clustering given by either pseudo or real
feedback. K-medoids clustering with sampling is then used to group
the concepts at the higher level. A web-based cluster naming
algorithm is also presented. A user evaluation shows that the
system is effective at saving human effort in the
interactive runs. Both automatic and interactive runs of the
experiments show that the approach is effective.

Friday, March 28, 2008

Nico Schlaefer - Friday, April 4, 12:00pm, NSH 3002

Please join us for an upcoming talk from Nico Schlaefer.

Lunch will be provided!

Title:
The Ephyra Question Answering System: Recent Results and Current Directions

Who: Nico Schlaefer
When: Friday, April 4, 12:00pm
Where: NSH 3002

Abstract:
This talk gives an overview of recent work on English question answering (QA) at CMU and our participation in last year’s TREC evaluation. QA is the task of retrieving accurate answers to natural language questions from a knowledge source such as the Web. The presentation includes a brief introduction to QA and the TREC competition, so prior knowledge of QA is helpful but not required.

The talk focuses on the challenges that an end-to-end QA system needs to address, and the architectural and algorithmic solutions implemented in Ephyra, our English QA system. Ephyra is a modular and extensible framework that facilitates the integration of different QA techniques. The system is organized as a pipeline of reusable standard components for question analysis, query generation, search, answer extraction, and answer selection. The most recent setup combines a syntactic pattern learning and matching approach with answer-type based extraction techniques and a semantic answer extractor that is based on semantic role labeling.

Recently we released the Ephyra QA system as open source, making most of our code available to the research community. I will discuss why we took this step, and how you may benefit from our open source system - OpenEphyra - for your own research.

Wednesday, February 20, 2008

Upcoming IR Talk at CMU: John Tait

The upcoming IR-related LTI Seminar talk is John Tait on Patent Retrieval.

Monday, February 18, 2008

Jan Wiebe -- Subjectivity Analysis -- Friday, February 22nd 2008, 12:00 pm (noon)

Please join us for our first IR Series talk this spring!

Lunch will be provided by Yahoo!

Speaker: Jan Wiebe
Professor, Department of Computer Science
Director, Intelligent Systems Program
University of Pittsburgh

Date/Time: Friday, February 22nd, 12:00 pm (noon)

Location: 3002 Newell-Simon Hall (NSH)

Title: Subjectivity Analysis

Abstract: A growing area of research, "subjectivity analysis", is the computational study of affect, opinions, and sentiments expressed in text. Blogs, editorials, reviews (of products, movies, books, etc.), and even "objective" newspaper articles (which include many opinions and sentiments) are just some of the genres for which accurate identification and interpretation of opinions is critical for full text understanding. Subjectivity analysis will support developing tools for information analysts in governmental, commercial, and political domains who want to automatically track attitudes and feelings in the news and on-line forums. How do people feel about the latest iPod? Is there a change in the support for the new Medicare bill? A system able to automatically identify and extract opinions and sentiments from text would be an enormous help to someone sifting through the vast amounts of news and web data, trying to answer these kinds of questions. In this talk, I will first give an overview of our work in subjectivity analysis, and then will focus on experiments exploring interactions between subjectivity and word sense, showing that subjectivity is a property that can be associated with word meanings and that subjectivity classification can be beneficial for word sense disambiguation.

Bio: My research areas are artificial intelligence and natural language processing (NLP). My work with students and colleagues has been in discourse processing, pragmatics, word-sense disambiguation, and probabilistic classification in NLP. Our most recent work investigates automatically recognizing and interpreting expressions of opinions and sentiments in text, to support NLP applications such as question answering, information extraction, text categorization, and summarization.

New Home

Welcome to the new home of the CMU Information Retrieval Discussion Series. We're in the process of moving the old site here, so please be patient.

Tuesday, January 1, 2008

Past IR-Series Presentations

Friday, November 2, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: CMU at TREC 2007
Speakers: Jonathan Elsas, Le Zhao and Yangbo Zhu (CMU)

Friday, October 5, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Estimating and Exploiting Uncertainty in Pseudo-Relevance Feedback
Speaker: Kevyn Collins-Thompson (CMU)

Friday, July 13, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Utility-based Information Distillation Over Temporally Sequenced Documents
Speaker: Yiming Yang (CMU)

Friday, May 18, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Collaborative Web Search - Exploiting User Activity for User Benefit
Speaker: Jill Freyne (University College Dublin)
Details

Friday, January 19, 2007 - 12:00 NSH 3002
Title: Using Graphs and Random Walks to Discover Latent Similarities in Text
Speaker: Gunes Erkan
Details

Friday, November 10, 2006 - 12:00 NSH 3002
Title: Personal Metasearch
Speaker: Paul Thomas
Details

Friday, May 19, 2006 - 12:00 NSH 3002
Title: Collaborative Adaptive User Profile with Implicit and Explicit User Feedback
Speaker: Yi Zhang
Details

Wednesday, April 19, 2006 - 12:00, NSH 3002
Title: Deriving Marketing Intelligence from Online Discussion
Speakers: Matthew Hurst and Natalie Glance
Details

Wednesday, April 5, 2006 - 12:00, NSH 3002
Title: A Graphical Framework for Contextual Search and Name Disambiguation in Email
Speaker: Einat Minkov
Details

Wednesday, March 8, 2006 - 12:00, NSH 3002
Title: Structured and Dynamic Topic Models
Speaker: John Lafferty
Details

Wednesday, February 22, 2006 - 12:00, NSH 3002
Title: Automatically Labeling Hierarchical Clusters
Speaker: Pucktada (Puck) Treeratpituk
Details

Friday, June 3, 2005 - 3:30, WeH 5409
Title: PageRank without Hyperlinks: Structural Re-ranking using Links Induced by Language Models
Speaker: Oren Kurland
Details

Wednesday, April 27, 2005 - 4:30, WeH 4601
Title: Dynamic Construction of Content-Based Topologies in Hierarchical Peer-to-Peer Networks
Speaker: Jie Lu
Details

Wednesday, March 16, 2005 - 4:30, WeH 4601
Title: Modeling Search Engine Effectiveness for Federated Search
Speaker: Luo Si
Details

Wednesday, March 2, 2005 - 4:30, WeH 4623
Title: What is the matter? Explorations in text categorization
Speaker: Lillian Lee
Details

Wednesday, January 19th, 2005 - 4:30, WeH 4601
Title: Detecting Action-Items in E-mail
Speaker: Paul N. Bennett
Details

Wednesday, December 1, 2004 - 3:00, WeH 4625
Title: Probabilistic Models of Text and Images
Speaker: David Blei
Details

Wednesday, November 17, 2004 - 3:00, WeH 4625
Title: Merging Rank Lists from Multiple Sources in Video Classification
Speaker: Wei-Hao Lin
Details

Wednesday, November 10, 2004 - 3:00, WeH 4625
Title: Associating Names with Persons in Broadcast News Video
Speaker: Jun Yang
Details

Wednesday, October 20, 2004 - 3:00, WeH 4625
Title: Graph Mining
Speaker: Christos Faloutsos
Details

Wednesday, October 6, 2004 - 3:00, WeH 4625
Topic: Review of the SIGIR 2004 Best Paper, “A Formal Study of Information Retrieval Heuristics” by Hui Fang, Tao Tao, and ChengXiang Zhai
Speaker: Kevyn Collins-Thompson
Details

Friday, October 1, 2004 - 1:30, NSH 4513
Title: Combining Language Modeling Approach with String-matching in Near-Duplicate Detection in E-Rulemaking
Speaker: Puck Treeratpituk
Details

Wednesday, September 22, 2004 - 2:30, NSH 4632
Learning to Summarize Interviews for Project Reports
Nikesh Garera
Details

Thursday, August 26, 2004 - 3:30, WeH 4625
Analyzing Time Series Gene Expression Data
Jason Ernst
Details

Tuesday, August 17, 2004 - 2:00, WeH 4625
Learning Table Extraction from Examples
Ashwin Tengli
Details

Thursday, August 12, 2004 - 3:30, WeH 4625
Learning to Classify Email into "Speech Acts"
Vitor Carvalho
Details

Thursday, July 8, 2004 - 3:30, WeH 4625
Resource Selection for Domain-Specific Cross-Lingual IR
Monica Rogati
Details

Tuesday, January 22, 2004 - 12:00, NSH 4513
Dynamic Recommender System on User Taste Tendency Model
Soojung Lee
Details

Thursday, December 4, 2003 - 12:00, NSH 4513
The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks
M. Elena Renda
Details

Thursday, October 30, 2003 - 12:00, NSH 4632
Boosting Support Vector Machines for Text Classification through Parameter-free Threshold Relaxation
Dr. James G. Shanahan
Details

Thursday, October 23, 2003 - 12:00, NSH 4632
Content-Based Retrieval in Hybrid Peer-to-Peer Networks
Jie Lu
Details

Thursday, October 16, 2003 - 12:00, NSH 4632
The Utility of Question Analysis in an Open-Domain Question Answering System
Yifen Huang
Details

Thursday, August 28, 2003 - 3:30, NSH 4632
Searching Peer-to-Peer Networks
Dr. Bin Yu
Details

Thursday, August 14, 2003 - 3:30, NSH 3001
Flexible Mixture Model for Collaborative Filtering
Luo Si

Modified Logistic Regression: An Approximation to SVM and its Applications in Large-Scale Text Categorization
Jian Zhang
Details

Thursday, June 19, 2003 - 3:30, NSH 3001
Improving Text Classifier Probability Estimates
Paul Bennett
Details

Thursday, June 5, 2003 - 3:30, NSH 3002
Radio Station Playlist Generation
Andrew P. Widdowson
Details

Thursday, May 22, 2003 - 3:30, NSH 3001
Discussion on Secondary Structure Prediction for Protein Sequences
Yan Liu
Details

Thursday, May 8, 2003 - 3:30, NSH 3001
Negative Pseudo Relevance Feedback for Multimedia Retrieval
Rong Yan
Details

Thursday, April 10, 2003 - 3:30, NSH 3001
Web Image Retrieval Re-Ranking with Relevance Model
Wei-Hao Lin
Details

Thursday, March 27, 2003 - 3:30, NSH 3001
Clustering Genes
Fan Li
Details

Thursday, March 13, 2003 - 3:30, NSH 3001
Exploration and Exploitation in Adaptive Filtering Based on Bayesian Active Learning
Yi Zhang
Details

Thursday, February 27, 2003 - 3:30, NSH 3001
Beyond Independent Topical Relevance: Evaluation Metrics and Methods for Aspect Retrieval
Dr. William Cohen
Details

Thursday, February 13, 2003 - 3:30, NSH 3001
Overview of Database Selection Methods
Luo Si
Details

Tuesday, January 14, 2003 - 11:00-12:30, Wean 4632
Topics and Techniques in (Structured) Document Retrieval
Paul Ogilvie
Details