Time: Noon
Location: TBA
Speaker: Jaime Arguello, Language Technologies Institute, School of Computer Science, CMU
Title: Vertical Selection in the Presence of Unlabeled Verticals.
Abstract:
Vertical aggregation is the task of incorporating results from specialized search engines or verticals (e.g., images, video, news) into Web search results. Vertical selection is the subtask of deciding, given a query, which verticals, if any, are relevant. State of the art approaches use machine learned models to predict which verticals are relevant to a query. When trained using a large set of labeled data, a machine learned vertical selection model outperforms baselines which require no training data. Unfortunately, whenever a new vertical is introduced, a costly new set of editorial data must be gathered. In this paper, we propose methods for reusing training data from a set of existing (source) verticals to learn a predictive model for a new (target) vertical. We study methods for learning robust, portable, and adaptive cross-vertical models. Experiments show the need to focus on different types of features when maximizing portability (the ability for a single model to make accurate predictions across multiple verticals) than when maximizing adaptability (the ability for a single model to make accurate predictions for a specific vertical). We demonstrate the efficacy of our methods through extensive experimentation for 11 verticals.
This is joint work with Fernando Diaz and Jean-Francois Paiement from Yahoo! Labs and will be presented at SIGIR 2010.