Instructor

Gerard de Melo
CBIM 8, Dept. of Computer Science
Office hours: Wednesdays, 5-6 PM
(Note: no office hours on Nov. 6)

Teaching Assistants

Abu Shoeb
Email: as2352@scarletmail.rutg...
Office hours: Fridays, 11am-12pm in CoRE 246 (by appointment)

Sepehr Janghorbani
Email: sj620@scarletmail.rutg...
Office hours: Mondays 1-2:30pm at CBIM (by appointment)

Announcements

Overview

In today's world of massive amounts of data, new methods and techniques are needed. In this course, we discuss methods to dive deeper into such data, covering storage and retrieval but also going beyond them to consider recent advances at the intersection of Big Data and Artificial Intelligence. In terms of areas, the course will focus on techniques for 1) Big Data processing (MapReduce and Spark), 2) Information Retrieval and indexing, and 3) Natural Language Processing (and some Computer Vision) to enable retrieval based on content and semantics.

In terms of methods, much of the second half of the course will focus on Deep Learning and neural network approaches to these areas.

The course will include hands-on practical work on real data sets, based on the Apache Spark platform as well as on deep learning frameworks.

Prerequisites

Basic familiarity with data structures (from introductory computer science classes) and basic mathematics and probability theory.


Basic programming ability. Some of our examples will be based on Apache Spark. Prior knowledge of Spark (especially using the Scala programming language) is not required, but certainly won't hurt. Other examples will use Deep Learning tools, most of which require knowledge of Python, C++, Java, or Scala.

Topics

Date   Topics
09-04  Logistics, Introduction to Massive Data
09-11  Big Data Processing with MapReduce and Spark
09-18  Big Data Processing with Spark
09-25  No class (due to conference presentation)
10-02  Big Data Processing: Spark, Data Streams, Data Storage
10-09  Information Retrieval: Models, Storage, and Indexing
10-16  Vector-based Storage and Retrieval
10-23  Vector-based Representation Learning
10-30  Representation/Deep Learning: Gradient-based Optimization
11-06  Representation/Deep Learning: Network Architectures
11-13  Representation/Deep Learning: Sequence Modeling
11-20  Semantic Content Analysis
11-27  No class (Thanksgiving Recess)
12-04  Information Retrieval: Question Answering, Recap
12-11  Short Project Presentations

See also: Rutgers Academic Calendar.

Slides, Discussion Forum

Sakai is used to host slides, as well as to provide a forum for discussions.

Grading and Course Project

The grades will be determined as follows:

The main course requirement will be a semester-long course project, involving Apache Spark and/or Deep Learning. See the introduction and course project slides on Sakai for further details. Additionally, there will be graded in-class quizzes. These will be announced in advance. Graded homework assignments will be announced on Sakai. Make sure to enable e-mail notifications.

Policies:

References

Since we are focusing on the latest research and technology, this course does not strictly follow any designated coursebook. However, the following (optional) books may be useful.

Contact

For problems or questions about this site, please contact Gerard de Melo. Rutgers is an equal access/equal opportunity institution. Individuals with disabilities are encouraged to direct suggestions, comments, or complaints concerning any accessibility issues with Rutgers web sites to the instructor or to accessibility@rutgers.edu, or complete the Report Accessibility Barrier / Provide Feedback form.