Lecture 6. Synonyms and near-synonyms detection: differences between revisions

Revision as of 03:23, 3 September 2015

Contents

1 Examples
2 Approaches to synonyms and near-synonyms detection

Examples

Synonyms: Netherlands and Holland; buy and purchase
Near-synonyms: pants, trousers and slacks; mistake and error

Approaches to synonyms and near-synonyms detection

Thesaurus-based approach
Distributional semantics
Context-based approach
word2vec
Web search-based approach

Synonyms in WordNet

Given a word, look for synonyms in every synset.

WordNet NLTK interface

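from nltk.corpus import wordnet as wn  # requires the WordNet corpus: nltk.download('wordnet')

# Print every sense (synset) of 'error' with its definition and synonyms.
for i, j in enumerate(wn.synsets('error')):
    print("Meaning", i, "NLTK ID:", j.name())
    print("Definition:", j.definition())
    print("Synonyms:", ", ".join(j.lemma_names()))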

WordNet web interface: [1]

Distributional semantics

Exercise 6.1

Calculate PPMI for Table 1.
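Table 1 is not reproduced in this text, so the following is only a minimal sketch of the computation on hypothetical counts. It assumes the usual definition PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))), with probabilities estimated from the co-occurrence table; the names `ppmi_matrix` and `toy_counts` are illustrative and do not come from the lecture.

```python
import math

def ppmi_matrix(counts):
    """Return PPMI scores for a dict-of-dicts co-occurrence table.

    counts[word][context] is a raw co-occurrence count.
    PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c)))).
    """
    total = sum(sum(row.values()) for row in counts.values())
    word_totals = {w: sum(row.values()) for w, row in counts.items()}
    context_totals = {}
    for row in counts.values():
        for c, n in row.items():
            context_totals[c] = context_totals.get(c, 0) + n

    ppmi = {}
    for w, row in counts.items():
        ppmi[w] = {}
        for c, n in row.items():
            if n == 0:
                # log(0) is undefined; PPMI assigns 0 to unseen pairs
                ppmi[w][c] = 0.0
                continue
            # P(w,c) / (P(w) P(c)) simplifies to n * total / (row sum * column sum)
            pmi = math.log2((n * total) / (word_totals[w] * context_totals[c]))
            ppmi[w][c] = max(0.0, pmi)  # clip negative PMI to zero
    return ppmi

# Hypothetical toy counts (NOT Table 1 from the lecture).
toy_counts = {
    "apple": {"eat": 3, "drive": 1},
    "car": {"eat": 1, "drive": 3},
}
toy_ppmi = ppmi_matrix(toy_counts)
print(toy_ppmi["apple"]["eat"])    # log2(1.5), about 0.585
print(toy_ppmi["apple"]["drive"])  # PMI is -1, clipped to 0.0
```

For Table 1 itself, the same function could be applied once the counts are entered in the `counts[word][context]` layout.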

Exercise 6.2

Input: def.txt or your own text

Output 1: term-context matrix

Output 2: term-term similarity matrix (use cosine similarity)

Output 3: 2D visualization by means of LSA

Hint: use cfd = nltk.ConditionalFreqDist((term, context) for ...) to compute the conditional frequency distribution

Hint: use R for SVD and visualization
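As a starting point for Outputs 1 and 2, here is a minimal sketch in plain Python. It uses `collections.Counter` in place of `nltk.ConditionalFreqDist`, treats the context of a term as the words within a fixed window (an assumption; the exercise does not fix a definition of context), and leaves out the LSA step. The token list and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def term_context_counts(tokens, window=2):
    """For each term, count the words occurring within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[term][tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy text; the exercise itself uses def.txt or your own text.
toy_tokens = "i buy a car you purchase a car i eat an apple".split()
matrix = term_context_counts(toy_tokens, window=2)

# Terms that share contexts get a higher cosine similarity.
sim_buy_purchase = cosine(matrix["buy"], matrix["purchase"])
sim_buy_eat = cosine(matrix["buy"], matrix["eat"])
print(sim_buy_purchase > sim_buy_eat)  # True
```

Each row `matrix[term]` is one row of the term-context matrix; computing `cosine` for every pair of rows gives the term-term similarity matrix.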

word2vec [Mikolov, Chen, Corrado, Dean, 2013]

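A neural-network model (shallow, although often loosely described as deep learning) that learns dense word vectors from the same word-context co-occurrence information that term-context matrices record.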

There are two model architectures:

CBOW predicts the current word based on the context
Skip-gram predicts surrounding words given the current word
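The difference between the two is easiest to see in the training examples each one is built from. The sketch below only generates those examples for a toy sentence and does not train a model; the function names and the window size are illustrative assumptions, not part of the word2vec tool.

```python
def cbow_examples(tokens, window=1):
    """Yield (context_words, target): CBOW predicts the target from its context."""
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            yield context, target

def skipgram_examples(tokens, window=1):
    """Yield (current, neighbour): skip-gram predicts each neighbour from the current word."""
    for i, current in enumerate(tokens):
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        for neighbour in neighbours:
            yield current, neighbour

toy_sentence = ["buy", "new", "pants"]
cbow = list(cbow_examples(toy_sentence))
skipgram = list(skipgram_examples(toy_sentence))

print(cbow)      # [(['new'], 'buy'), (['buy', 'pants'], 'new'), (['new'], 'pants')]
print(skipgram)  # [('buy', 'new'), ('new', 'buy'), ('new', 'pants'), ('pants', 'new')]
```

CBOW produces one example per position with the whole context as input, while skip-gram produces one example per (word, neighbour) pair.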

word2vec project page: [2]; demo: [3]

Example: vec(Madrid) - vec(Spain) + vec(France) ≈ vec(Paris), i.e. the word vector nearest to the result is that of Paris.
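The arithmetic behind such analogies can be sketched with hand-made vectors. The four-dimensional vectors below are hypothetical and chosen so that the analogy holds exactly; real word2vec vectors are learned, have hundreds of dimensions, and satisfy the relation only approximately. The `analogy` helper is an illustrative name, not a word2vec function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(vectors, a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded, as is common practice for analogy queries.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical toy vectors: dimensions are [spain, france, germany, is_capital].
toy_vectors = {
    "Spain":   [1, 0, 0, 0],
    "Madrid":  [1, 0, 0, 1],
    "France":  [0, 1, 0, 0],
    "Paris":   [0, 1, 0, 1],
    "Germany": [0, 0, 1, 0],
    "Berlin":  [0, 0, 1, 1],
}
answer = analogy(toy_vectors, "Madrid", "Spain", "France")
print(answer)  # Paris
```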

@@ Line 31: / Line 31: @@
 === Distributional semantics ===
+[[Файл:L6p1.jpg|500px|слева]]
+==== Exercise 6.1 ====
+Calculate PPMI for Table 1.
+==== Exercise 6.2 ====
+Input: def.txt or your own text
+Output 1: term-context matrix
+Output 2: term-term similarity matrix (use cosine similarity)
+Output 3: 2D visualization by means of LSA
+Hint: use cfd = nltk.ConditionalFreqDist((term, context) for ...) to compute the conditional frequency distribution
+Hint: use R for SVD and visualization
 === word2vec [Mikolov, Chen, Corrado, Dean, 2013] ===

Contents

Examples

Approaches to synonyms and near-synonyms detection

Synonyms in WordNet

WordNet NLTK interface

Distributional semantics

Exercise 6.1

Exercise 6.2

word2vec [Mikolov, Chen, Corrado, Dean, 2013]

Context-based approach (1) [Lin, 1998]

Web or corpus search approach
