Time

Monday, January 28, 2019
8:30 am – 12:30 pm
(with a 10:30 – 11:00 am break)

Location

Hilton Hawaiian Village
Honolulu, Hawaii, USA

Overview

While word embeddings such as those produced by word2vec and GloVe are widely used as a simple means of working with textual data, there has recently been substantial progress on methods that yield better embeddings. In particular, one may wish to induce neural vector representations not only of individual words but also of longer units of language, including 1) multi-word phrases, 2) entire sentences, or even 3) complete documents.

Algorithms for these settings can draw on large corpora, but may also exploit supervision from other kinds of data, such as document labels, lexical resources, or natural language inference datasets. Sentence embeddings are of particular interest, as they need to account for quite subtle distinctions between otherwise similar sentences. Moreover, new techniques have emerged for inducing embeddings in multilingual and cross-lingual settings.
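As a small point of reference for the sentence-embedding discussion, one of the simplest baselines aggregates (e.g., averages) pretrained word vectors. The sketch below assumes the gensim library and its downloadable "glove-wiki-gigaword-100" vectors; these are illustrative choices, not part of the tutorial materials.

    # Illustrative sketch only (not tutorial material): a sentence embedding
    # obtained by averaging pretrained word vectors. The gensim downloader
    # and the "glove-wiki-gigaword-100" model name are assumptions.
    import numpy as np
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")  # pretrained GloVe vectors

    def sentence_embedding(sentence):
        # Average the vectors of all in-vocabulary tokens; fall back to a
        # zero vector if no token is covered by the pretrained vocabulary.
        tokens = [t for t in sentence.lower().split() if t in vectors]
        if not tokens:
            return np.zeros(vectors.vector_size)
        return np.mean([vectors[t] for t in tokens], axis=0)

    print(sentence_embedding("phrase and sentence embeddings").shape)  # (100,)

Averaging discards word order and subtler distinctions, which is one reason more expressive sentence encoders are of interest.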

This tutorial will thus provide an overview of recent state-of-the-art methods that go beyond word2vec and better model the semantics of longer units such as sentences and documents, both monolingually and cross-lingually. The tutorial will start with a brief refresher of word2vec and how it relates to classic methods for distributional semantics, so no prior knowledge is required.
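For those who would like to try the word2vec refresher hands-on, the following minimal sketch trains a skip-gram model with gensim (version 4 or later assumed); the toy corpus and hyperparameters are assumptions chosen purely for illustration.

    # Minimal word2vec (skip-gram) sketch using gensim >= 4; the toy corpus
    # and hyperparameters below are illustrative assumptions.
    from gensim.models import Word2Vec

    corpus = [
        ["word", "embeddings", "capture", "distributional", "semantics"],
        ["sentence", "embeddings", "extend", "word", "embeddings"],
        ["documents", "contain", "many", "sentences"],
    ]

    # sg=1 selects the skip-gram objective; vector_size is the dimensionality.
    model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

    # Nearest neighbors in the learned vector space
    print(model.wv.most_similar("embeddings", topn=3))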

Topics and Slides

Section 1: Introduction, Words
  Motivation
  History, Distributional vs. Distributed Semantics
  Refresher: word2vec
  Coping with rare words

Section 2: Phrase Vectors
  Phrase Detection in word2vec (see the sketch below)
  External Supervision

Section 3: Sentence Vectors
  word2vec-inspired Approaches
  Supervision from Various Sources
  Simple Aggregation

Section 4: Document Vectors
  Word Vector-based
  Deep IR Methods

Section 5: Applications, Conclusion
  Applications, e.g. Matching, IR, Unsupervised NMT
  Sentiment Embeddings, Visual Font Embeddings, Graph Embeddings
  Common Sense
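The phrase detection step listed under Section 2 can be illustrated with gensim's Phrases module, which scores collocations in the spirit of word2vec-style phrase detection; the toy corpus, min_count, and threshold below are assumptions for illustration only.

    # Sketch of word2vec-style phrase detection via collocation statistics,
    # using gensim's Phrases; corpus and thresholds are assumptions.
    from gensim.models.phrases import Phrases

    sentences = [
        ["new", "york", "is", "a", "city"],
        ["she", "moved", "to", "new", "york"],
        ["new", "york", "has", "many", "museums"],
    ]

    # Frequently co-occurring bigrams whose score exceeds the threshold are
    # merged into single tokens such as "new_york".
    phrases = Phrases(sentences, min_count=1, threshold=0.1)

    print(phrases[["i", "love", "new", "york"]])
    # e.g. ['i', 'love', 'new_york'] once the bigram score passes the threshold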

Contact

For problems or questions about this site, please contact Gerard de Melo.