Gerard de Melo
CBIM 8, Dept. of Computer Science
Office hours: Wednesdays, 5–6 PM
(Note: Office hours on Nov. 14 are moved to Nov. 13, 5–6 PM.)

Teaching Assistants

Abu Shoeb
Email: as2352@scarletmail.rutg...
Office hours: Thursdays, 10–11 AM in Hill 264A

Sepehr Janghorbani
Email: sj620@scarletmail.rutg...
Office hours: Mondays, 2–3:30 PM in CBIM 17


Course Description

In today's world of massive amounts of data, new methods and techniques are needed. In this course, we discuss methods to dive deeper into such data, covering storage and retrieval but also going beyond them to consider recent advances at the intersection of Big Data and Artificial Intelligence. In terms of areas, the course will focus on techniques for 1) Big Data processing (MapReduce and Spark), 2) Information Retrieval and indexing, and 3) Natural Language Processing (and some Computer Vision) to enable retrieval based on content and semantics.

In terms of methods, much of the second half of the course will focus on Deep Learning and neural network methods for these areas.

The course will include hands-on practical work on real data sets, based on the Apache Spark platform as well as on deep learning frameworks.

Prerequisites

Basic familiarity with data structures (from introductory computer science classes) and basic mathematics and probability theory.

Basic programming ability. Some of our examples will be based on Apache Spark. Prior knowledge of Spark (especially with the Scala programming language) is not required, but it will certainly help. Other examples will use Deep Learning tools, most of which require knowledge of Python, C++, Java, or Scala.

Schedule

09-05  Logistics, Introduction to Massive Data
09-12  No class
09-19  Big Data Processing with MapReduce and Spark
09-26  Big Data Processing with Spark
10-03  Big Data Processing: Spark, Data Streams, Data Storage
10-10  Information Retrieval: Models, Storage, and Indexing
10-17  Vector-based Storage and Retrieval
10-24  Vector-based Representation Learning
10-31  Representation/Deep Learning: Gradient-based Optimization
11-07  Representation/Deep Learning: Network Architectures
11-14  Representation/Deep Learning: Sequence Modeling
11-21  No class (Thanksgiving Recess)
11-28  Semantic Content Analysis
12-05  Information Retrieval: Question Answering, Recap
12-12  Short Project Presentations

See also: Rutgers Academic Calendar.

Slides, Discussion Forum

Sakai hosts the slides and also provides a forum for discussions.

Grading and Course Project

The grades will be determined as follows:

The main course requirement is a semester-long course project involving Apache Spark and/or Deep Learning. See the introduction and course project slides on Sakai for further details. Additionally, there will be graded in-class quizzes, which will be announced in advance. Graded homework assignments will be announced on Sakai, so make sure to enable e-mail notifications.


Books

Since we focus on the latest research and technology, this course does not strictly follow any designated textbook. However, the following (optional) books may be useful.


For problems or questions about this site, please contact Gerard de Melo. Rutgers is an equal access/equal opportunity institution. Individuals with disabilities are encouraged to direct suggestions, comments, or complaints concerning any accessibility issues with Rutgers web sites to: or complete the Report Accessibility Barrier / Provide Feedback form.