Fridays, 10:30-11:30 AM in Hill 206
Tuesdays, 2:30-3:30 PM in Hill 206
Our modern world is increasingly driven by data. We see data determining which companies succeed, who wins elections, and even who marries whom. In this course, we will cover fundamental techniques in the emerging field of Data Science. The course is aimed at computer science students, so we will focus in particular on important computational aspects such as working with massive amounts of data ("Big Data") and learning from data ("machine learning").
|01-19 Fri||Data Representation, Preprocessing|
|01-23 Tue||Data Preprocessing ("Data Wrangling")|
|01-26 Fri||Data Wrangling and Data Management|
|01-30 Tue||Exploratory Data Analysis|
|02-02 Fri||Data Visualization (Guest Lecture by Professor James Abello)|
|02-06 Tue||No class|
|02-09 Fri||Exploratory Data Analysis / Big Data (Hadoop)|
|02-13 Tue||Big Data (Hadoop/Spark)|
|02-16 Fri||Big Data (Spark)|
|02-20 Tue||Data Streams and Big Data Algorithms|
|02-23 Fri||Learning from Data: Basics, Evaluation|
|02-27 Tue||Learning from Data: Algorithms|
|03-02 Fri||Learning from Data: Algorithms|
|03-06 Tue||Learning from Data: Algorithms|
|03-09 Fri||Learning from Data: Learning Representations|
|03-13 Tue||Spring Recess|
|03-16 Fri||Spring Recess|
|03-20 Tue||In-Class Mid-Term Exam|
|03-23 Fri||Data Mining Algorithms|
|03-27 Tue||Data Mining Algorithms|
|03-30 Fri||Social Networks, Link Analysis, Graph Data Mining|
|04-10 Tue||Dimensionality Reduction|
|04-13 Fri||Practical Issues: Data Integration|
|04-17 Tue||Practical Issues: Ethics and Data Science|
|04-24 Tue||Project Presentations|
|04-27 Fri||Project Presentations|
See also: Rutgers Academic Calendar.
Sakai is used to host the slides and to provide a forum for discussions.
The grades will be determined as follows:
Since we focus on the latest developments, this course does not strictly follow any designated textbook. Instead, specific references for further reading will be posted at the end of each unit's slides (typically on the last slide). Still, the following (optional) books may be useful.
Note: The book is helpful but not required, especially since this is a fast-paced field and some of the latest changes to Spark are not yet covered in the book.
Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets
Note: Available for free online.