Data Science Seminar, Fall 2015

Past Seminars: Spring 2014, Fall 2014
Date & Time Location Speaker/Title
E1-103 The seminar usually meets on Tuesdays at 11:25 am.

Tuesday, Sep 8
11:25 am--12:40 pm
E1-103 Dr. Jeffrey Larson, Argonne National Lab

Title: Exploiting problem-specific knowledge and computational resources in derivative-free optimization

Abstract: This talk begins with a comparison of methods for optimizing computationally expensive functions that lack reliable gradient information. We highlight recently developed algorithms that utilize the structure of common problems, and demonstrate their efficacy on relevant applications. We then show how such algorithms can be incorporated into an asynchronous, multi-start framework. Theoretical results and practical performance of such a framework conclude the talk.

Tuesday, Nov 17
11:25 am--12:40 pm
E1-103 Dr. Sou-Cheng Choi Larson, Senior Statistician at NORC at the University of Chicago and Research Assistant Professor in the Department of Applied Math at IIT

Title: Probabilistic Record Linkage and Address Standardization

Abstract: Probabilistic record linkage (PRL) refers to the process of matching records from different data sources, such as database tables with missing data in the primary key. It can be applied to join or deduplicate records or to impute missing data, resulting in better data quality in any case. An important subproblem in PRL is to parse or standardize a text field such as an address into its component fields, e.g., street number, street name, city, state, zip code, and country. Often, modern data analysis techniques such as natural language processing and machine learning methods are gainfully employed in both PRL and address standardization to achieve high accuracy in linking or prediction. In a study, we compare the performance of a few widely used open source PRL packages, namely FRIL, Link Plus, R RecordLinkage, and SERF. In addition, we evaluate the baseline performance and sensitivity of a number of address-parsing web services, including the U.S. address parser, Google Maps APIs, and Data Science Toolkit. We will present strengths and limitations of the software and services we have evaluated. This is joint work with Edward Mulrow, NORC at the University of Chicago.
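
Illustration (not part of the talk): a minimal Python sketch of how probabilistic record linkage can score a pair of records when no reliable primary key is available. It assumes a Fellegi-Sunter style formulation, which is one common approach and not necessarily what the packages named above implement. The field names, the m/u probabilities, the similarity threshold, and the sample records are all hypothetical.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (illustrative values, not from the talk):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "street": (0.90, 0.05),
    "zip": (0.98, 0.10),
}


def fields_agree(a, b, threshold=0.85):
    """Treat two strings as agreeing when their similarity ratio meets the threshold."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # a missing value contributes no evidence either way
        if fields_agree(a, b):
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


if __name__ == "__main__":
    r1 = {"name": "Jon Smith", "street": "10 W 35th St", "zip": "60616"}
    r2 = {"name": "John Smith", "street": "10 W 35th St", "zip": None}
    r3 = {"name": "Mary Jones", "street": "500 Main Ave", "zip": "60601"}
    print(match_weight(r1, r2))  # positive weight: likely the same entity
    print(match_weight(r1, r3))  # negative weight: likely different entities
```

In practice the pair would be declared a link, a non-link, or a candidate for manual review by comparing the weight against two thresholds, and the m and u values would be estimated from data rather than fixed by hand.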

For more information contact Lulu Kang.