Office hours: Fridays, 11am-12pm in CoRE 246 (by appointment)
Office hours: Mondays, 1-2:30pm at CBIM (by appointment)
Wednesdays, 10:20am-1:20pm
SEC 117 (Pond Science & Eng. Resource Center), Busch Campus
The massive amounts of data available today call for new methods and techniques. In this course, we discuss methods to dive deeper into such data. We cover storage and retrieval, but also go beyond them to consider recent advances at the intersection of Big Data and Artificial Intelligence. In terms of areas, the course focuses on techniques for 1) Big Data processing (MapReduce and Spark), 2) Information Retrieval and indexing, and 3) Natural Language Processing (and some Computer Vision) to enable retrieval based on content and semantics.
In terms of methods, much of the second half of the course will focus on Deep Learning and neural network approaches for these areas.
The course will include hands-on practical work on real data sets, based on the Apache Spark platform as well as on deep learning frameworks.
Basic familiarity with data structures (from introductory computer science classes), as well as with basic mathematics and probability theory.
Basic programming ability. Some of our examples will be based on Apache Spark. Prior knowledge of Spark (especially using the Scala programming language) is not required, but will certainly help. Other examples will use Deep Learning tools, most of which require knowledge of at least one of Python, C++, Java, or Scala.
|Date|Topic|
|---|---|
|09-04|Logistics, Introduction to Massive Data|
|09-11|Big Data Processing with MapReduce and Spark|
|09-18|Big Data Processing with Spark|
|09-25|No class (due to conference presentation)|
|10-02|Big Data Processing: Spark, Data Streams, Data Storage|
|10-09|Information Retrieval: Models, Storage, and Indexing|
|10-16|Vector-based Storage and Retrieval|
|10-23|Vector-based Representation Learning|
|10-30|Representation/Deep Learning: Gradient-based Optimization|
|11-06|Representation/Deep Learning: Network Architectures|
|11-13|Representation/Deep Learning: Sequence Modeling|
|11-20|Semantic Content Analysis|
|11-27|No class (due to Thanksgiving Recess)|
|12-04|Information Retrieval: Question Answering, Recap|
|12-11|Short Project Presentations|
See also: Rutgers Academic Calendar.
Sakai is used to host slides, as well as to provide a forum for discussions.
The grades will be determined as follows:
Since we focus on the latest research and technology, this course does not follow a designated textbook. However, the following optional books may be useful.
Note: This book is helpful but not required, especially since this is a fast-moving field and some of the latest changes to Spark are not yet covered in it.
Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets
Note: Available for free online.
For problems or questions about this site, please contact Gerard de Melo. Rutgers is an equal access/equal opportunity institution. Individuals with disabilities are encouraged to direct suggestions, comments, or complaints concerning any accessibility issues with Rutgers web sites to the instructor or to firstname.lastname@example.org, or complete the Report Accessibility Barrier / Provide Feedback form.