Instructor
Gerard de Melo
Rutgers University, NJ, USA
Monday, January 28, 2019
8:30 AM – 12:30 PM
(with a break from 10:30 to 11:00 AM)
Hilton Hawaiian Village
Honolulu, Hawaii, USA
While word embeddings such as those produced by word2vec and GloVe are widely known as a simple means of working with textual data, recent work has made substantial progress on methods that yield better embeddings. In particular, one may wish to induce neural vector representations not just of individual words but also of longer units of language, including 1) multi-word phrases, 2) entire sentences, or even 3) complete documents.
Algorithms for these settings can draw on large corpora, but may also exploit supervision from other kinds of data such as document labels, lexical resources, or natural language inference datasets. Sentence embeddings are of particular interest, because they often need to capture quite subtle distinctions between sentences that are otherwise very similar. Moreover, new techniques have been developed to induce embeddings for multilingual and cross-lingual settings.
This tutorial will thus provide an overview of recent state-of-the-art methods that go beyond word2vec and better model the semantics of longer units such as sentences and documents, both monolingually and cross-lingually. The tutorial will start with a brief refresher on word2vec and how it relates to classic methods for distributional semantics, so no prior knowledge is required.
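To illustrate why sentence vectors call for more than word vectors alone, here is a minimal sketch of the simplest aggregation strategy, averaging word vectors. The toy vectors and the helper names `average_embedding` and `cosine` are assumptions made up for this example; they are not taken from the tutorial materials or from any trained model.

```python
import math

# Toy 3-dimensional word vectors. The values are invented for illustration
# and do not come from a trained word2vec or GloVe model.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "man": [0.1, 0.9, 0.0],
    "bites": [0.0, 0.2, 0.9],
}


def average_embedding(tokens, vectors):
    """Average the vectors of all known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in input")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two sentences with the same words in a different order.
s1 = average_embedding("dog bites man".split(), TOY_VECTORS)
s2 = average_embedding("man bites dog".split(), TOY_VECTORS)
similarity = cosine(s1, s2)
```

Because averaging ignores word order, `s1` and `s2` coincide and `similarity` is 1.0 up to floating-point rounding, even though the two sentences mean different things. This is the kind of subtle distinction that dedicated sentence embedding methods aim to capture.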
Section | Topics
---|---
1. Introduction, Words | Motivation; History, Distributional vs. Distributed Semantics; Refresher: word2vec; Coping with rare words
2. Phrase Vectors | Phrase Detection in word2vec; External Supervision
3. Sentence Vectors | word2vec-inspired Approaches; Supervision from Various Sources; Simple Aggregation
4. Document Vectors | Word Vector-based Deep IR methods
5. Applications, Conclusion | Applications, e.g. Matching, IR, Unsupervised NMT; Sentiment Embeddings, Visual Font Embeddings, Graph Embeddings; Common Sense