Speaker: Lee Jensen (Ancestry.com)
Date/Time: Thursday, June 10, 2010, noon
Place: GHC 6115
Title: Exploiting Sequential Relationships for Familial Classification
Information hidden in sequences of related data is a significant, exploitable source for information extraction. In this work we demonstrate techniques for strengthening the classification signal obtained from sequences of dependent data. These include using algorithms, as well as generating features, that take advantage of the sequence-oriented nature of the data. The efficacy of these techniques is evaluated on the classification of familial relationships within United States census data. Census data proves to be an interesting corpus for this work because its instances are highly sequential and because each instance carries an explicit classification relationship to an instance found earlier in the sequence.
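
As a rough illustration of what sequence-oriented feature generation could look like, the sketch below builds features for each person in a household listing by looking back at earlier rows. It is not the speaker's method: the function name `sequence_features`, the record fields (`surname`, `age`, `sex`), and the assumption that the first listed person is the head of household are all hypothetical choices made for this example.

```python
from typing import Dict, List


def sequence_features(household: List[Dict]) -> List[Dict]:
    """Build per-person features that look back at earlier rows in the listing.

    Assumption (hypothetical): the first record in the list is the head of
    household, and every record has 'surname', 'age', and 'sex' fields.
    """
    head = household[0]
    rows = []
    for i, person in enumerate(household):
        # The previous row in the sequence, if there is one.
        prev = household[i - 1] if i > 0 else None
        rows.append({
            "position": i,  # order of appearance in the listing
            "same_surname_as_head": person["surname"] == head["surname"],
            "age_gap_to_head": head["age"] - person["age"],
            "age_gap_to_previous": (prev["age"] - person["age"]) if prev else 0,
            "sex": person["sex"],
        })
    return rows


# Toy records invented for illustration only; not real census data.
example_household = [
    {"surname": "Smith", "age": 40, "sex": "M"},
    {"surname": "Smith", "age": 38, "sex": "F"},
    {"surname": "Smith", "age": 10, "sex": "M"},
]
example_features = sequence_features(example_household)
```

Feature rows like these could then be passed to any classifier. The point of the sketch is only that each row's features depend on rows that came before it in the sequence.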