Data Science Seminar, Fall 2015

Past Seminars: Spring 2014, Fall 2014
Date & Time Location Speaker/Title
SB-220 The seminar usually meets on Tuesdays at 11:25 am.

Tuesday, Sep 8
11:25 am--12:40 pm
SB-220 Dr. Jeffrey Larson, Argonne National Lab

Title: Exploiting problem-specific knowledge and computational resources in derivative-free optimization

Abstract: This talk begins with a comparison of methods for optimizing computationally expensive functions that lack reliable gradient information. We highlight recently developed algorithms that utilize the structure of common problems, and demonstrate their efficacy on relevant applications. We then show how such algorithms can be incorporated into an asynchronous, multi-start framework. Theoretical results and practical performance of such a framework conclude the talk.

Tuesday, Oct 13
11:25 am--12:40 pm
SB-220 Dr. Pan Chen, Senior Director, Business Analytics at HAVI Global Solutions.

Title: A practitioner’s perspective on Big Data analysis

Abstract: In this talk, the speaker shares his observations on analytics business trends, some of the current gaps between promise and reality, and how analytics professionals and business professionals can work together to bridge these gaps. Lastly, the speaker offers his opinions on what this means for schools that produce analytics talent.

Tuesday, Oct 20
11:25 am--12:40 pm
SB-220 Dr. Sydeaka Watson, Research Associate (Assistant Professor), Department of Public Health Science, University of Chicago.

Title: TBA

Abstract: TBA

Tuesday, Nov 17
11:25 am--12:40 pm
SB-220 Dr. Sou-Cheng Choi, Senior Statistician in NORC at the University of Chicago, and Research Assistant Professor in the Department of Applied Math at IIT.

Title: Probabilistic Record Linkage and Address Standardization

Abstract: Probabilistic record linkage (PRL) refers to the process of matching records from different data sources, such as database tables with missing data in the primary key. It can be applied to join or deduplicate records or to impute missing data, resulting in better data quality in each case. An important subproblem in PRL is to parse or standardize a text field such as an address into its component fields, e.g., street number, street name, city, state, zip code, and country. Modern data analysis techniques such as natural language processing and machine learning are often employed in both PRL and address standardization to achieve high linking or prediction accuracy. In a study, we compare the performance of a few widely used open source PRL packages, namely FRIL, Link Plus, R RecordLinkage, and SERF. In addition, we evaluate the baseline performance and sensitivity of a number of address-parsing web services, including the U.S. address parser, Google Maps APIs, and Data Science Toolkit. We will present the strengths and limitations of the software and services we have evaluated. This is joint work with Edward Mulrow, NORC at the University of Chicago.

Tuesday, Nov 24
11:25 am--12:40 pm
SB-220 Prof. William S. Cleveland, Shanti S. Gupta Professor of Statistics, Purdue University.

Title: Divide & Recombine (D&R) with Tessera: High Performance Computing for Deep Analysis of Big Data and Small

Abstract: TBA

For more information, contact Lulu Kang.