Data Science Seminar, Spring 2014
Date & Time Location Speaker/Title
The seminar usually meets on Wednesdays, 11:25 am-12:40 pm, in SB 107.
The Spring 2014 seminar will begin the week of January 20th.
Jan 29,
11:25 am-12:40 pm
SB 107 Dr. Boris Glavic, Assistant Professor of Computer Science, IIT

Title: Computing Provenance for Database Updates and Transactions

Abstract: Data provenance, information about the origin and creation process of data, has been used to debug queries and clean data in data warehouses, to understand and correct complex data integration transformations, for auditing, and to understand the quality of data in Big Data analytics and Data Science. Automatic provenance generation is of immense importance in Big Data and data science, where the data size, its heterogeneity, and the time requirements for analysis results to be available make it infeasible to generate provenance information manually. Most of the literature on database provenance has focused on tracing the provenance of queries, i.e., mapping each output row of a query to all rows from the query's input that were used to compute this output row. However, use cases such as auditing need to be able to trace the origin of a row through database updates, which are usually executed as part of a transaction to preserve consistency under concurrent access and to allow recovery from failures. In this talk I give an overview of my group's research on computing provenance for updates and transactions. Similar to most approaches for computing the provenance of queries, we use query rewrite techniques to generate queries that compute provenance as a side effect. Our approach is based on transaction time histories for tables and an encoding of updates as queries over past states of tables. This work is partially supported by Oracle.

Feb 12,
11:25 am-12:40 pm
SB 107 Dr. Shlomo Engelson Argamon, Professor of Computer Science, IIT

Title: A Conversation with the DS Students.

Abstract: In this seminar, Prof. Argamon will discuss in detail the goals of the DS program, and how to graduate successfully and find a job as a data scientist. He will also advise students on course selection. Students' questions, feedback, and comments are welcome.

Feb 19,
11:25 am-12:40 pm
SB 107 Celestine McGee, Assistant Director, Career Management Center, IIT

Title: Career Development for Data Science Students.

Abstract: In this talk, the speaker will provide guidance to students on how to prepare themselves for future career development, including details on how to search for jobs, how to prepare application documents, and how to prepare for interviews. A Q&A session will follow.

Feb 26,
11:25 am-12:40 pm
SB 107 Dr. Aron Culotta, Assistant Professor of Computer Science, IIT

Title: Understanding Public Health using Twitter

Abstract: Twitter and other online social networks provide unprecedented, real-time insight into the state of the world. My research investigates machine learning and natural language processing algorithms that use this data to inform public health applications. I will review a few of our recent projects using Twitter data to (1) track the national flu rate; (2) track alcohol consumption; (3) infer user attributes such as location, race, and age; and (4) infer health statistics (obesity, diabetes rates) of a community. I will then propose a framework for, and outline the challenges of, conducting Web-scale observational studies of health to answer epidemiological questions such as how health is affected by proximity to a landfill or contaminated water source.

Mar 12,
11:25 am-12:40 pm
SB 107 Dr. Mustafa Bilgic, Assistant Professor of Computer Science, IIT

Title: Rich and Transparent Active Learning

Abstract: A fundamental task of machine learning is prediction. Applications include detecting spam, recommending products, and diagnosing patients. Machine learning algorithms need to be trained on exemplars that are annotated by humans. The accuracy of the models often improves as the amount of annotated data increases. Yet, annotating data takes time and effort. Active learning aims to minimize the annotation effort by enabling the algorithms to direct human attention to the most informative exemplars. In traditional active learning approaches, algorithms are limited in the types of information they can acquire, and they often do not provide any rationale to the user as to why a particular exemplar is chosen for annotation. In this talk, I will describe our research on enriching the interaction between the algorithms and users for more effective training of predictive models.

Mar 26,
11:25 am-12:40 pm
SB 107 Dr. Megan Fulton, Data Analyst, Data and Analytics Technology

Title: The Science of Influence

Abstract: Dr. Fulton will speak about the soft skills around technology jobs: building a network inside and outside of the company, understanding team dynamics, and communicating your ideas to executives.

April 2,
11:25 am-12:40 pm
SB 107 Mr. Huayin Wang, VP Data Science, Accuen

Title: The challenge of attribution modeling

Abstract: Attribution modeling is the process of allocating credit for marketing success to individual campaigns based on their contributions. With our growing ability to collect marketing touch point data, we have a new challenge: how should we partition the credit for a success among multiple marketing touch points? This talk will discuss the analytical challenge and the theories and practices of attribution modeling within the context of advertising.

April 9,
11:25 am-12:40 pm
SB 107 Dr. Sonja Petrovic, Assistant Professor of Applied Math, IIT

Title: Goodness-of-fit for log-linear network models: Dynamic Markov bases using hypergraphs

Abstract: Social networks and other large sparse data sets pose significant challenges for statistical inference, as many standard statistical methods for testing model fit are not applicable in such settings. Algebraic statistics offers a theoretically justified approach to goodness-of-fit testing that relies on the theory of Markov bases and is intimately connected with the geometry of the model as described by its fibers. Most current practices require the computation of the entire basis, which is infeasible in many practical settings. We present a dynamic approach to explore the fiber of a model, which bypasses this issue, and is based on the combinatorics of hypergraphs arising from the toric algebra structure of log-linear models. We demonstrate the approach on the Holland-Leinhardt p1 model for random directed graphs that allows for reciprocated edges.

April 23,
11:25 am-12:40 pm
SB 107 Dr. Larry Birnbaum, Professor of Electrical Engineering and Computer Science, Northwestern University

Title: Scaling Human Editorial Judgment

Abstract: Systems that present people with information inescapably make editorial judgments in determining what information to show and how to show it. However, the editorial values used to make these judgments are generally invisible to users and in many cases even to the engineers who design the systems. This work is aimed at developing news and media information technologies that provide explicit and visible editorial control, at scale. Some of Dr. Birnbaum’s most exciting work in this area is aimed at automatically generating stories from data. A system based on this technology is already generating more than 10 thousand stories weekly in areas ranging from sports, to business, to politics. This system is the nation’s most prolific and published author of, among other things, women’s collegiate softball stories. The stories compare favorably to those written by human beings. Dr. Birnbaum will also present some more recent work on news and media technology developed in the Knight Lab, a joint initiative of the Schools of Engineering and Journalism at Northwestern.

Bio: Larry Birnbaum is Professor of Electrical Engineering and Computer Science, and of Journalism, at Northwestern University. He is a founder and PI of the Knight Lab, an interdisciplinary center for innovation in news and media technology at Northwestern, as well as co-Director of the Intelligent Information Laboratory there. Larry is also a Founder and Chief Scientific Advisor of Narrative Science Inc. His research encompasses artificial intelligence, natural language processing, machine learning, human-computer interaction, and intelligent information systems. He has authored or coauthored more than 130 articles and holds 17 patents. Larry received his B.S. and Ph.D. degrees in Computer Science from Yale University (the latter in 1986) and joined the Northwestern faculty in 1989.

For more information, contact Lulu Kang.