Lecture 6. Synonyms and near-synonyms detection

From Wiki - Faculty of Computer Science

Revision as of 01:49, 24 August 2015

Examples

  • Synonyms: Netherlands and Holland, buy and purchase
  • Near synonyms: pants, trousers and slacks, mistake and error

Approaches to synonyms and near-synonyms detection

  • Thesaurus-based approach
  • Distributional semantics
  • Context-based approach
  • word2vec
  • Web search-based approach

Synonyms in WordNet

Given a word, take every synset that contains it; the other lemmas in those synsets are its synonyms, grouped by meaning.
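
The lookup can be sketched without WordNet itself. The snippet below is a minimal illustration under stated assumptions: `TOY_SYNSETS` and the `synonyms` helper are hypothetical names, and the synsets are hand-written for the example rather than taken from WordNet.

```python
# Toy thesaurus: each synset is a set of lemmas sharing one meaning.
# These synsets are hand-written for illustration; they are NOT WordNet data.
TOY_SYNSETS = [
    {"error", "mistake", "fault"},
    {"error", "erroneousness"},
    {"buy", "purchase"},
    {"Netherlands", "Holland"},
]


def synonyms(word, synsets=TOY_SYNSETS):
    """Return every lemma that shares at least one synset with `word`."""
    found = set()
    for synset in synsets:
        if word in synset:
            found |= synset  # collect all lemmas of this meaning
    found.discard(word)  # a word is not listed as its own synonym
    return sorted(found)


print(synonyms("error"))
print(synonyms("buy"))
```

A word that appears in several synsets (here "error") gets the union of their lemmas, which is why a thesaurus-based approach tends to mix synonyms from different senses unless the senses are kept apart.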

WordNet NLTK interface

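    from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet') once

    # Print every meaning (synset) of 'error' with its definition and lemmas
    for i, synset in enumerate(wn.synsets('error')):
        print("Meaning", i, "NLTK ID:", synset.name())
        print("Definition:", synset.definition())
        print("Synonyms:", ", ".join(synset.lemma_names()))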

WordNet web interface: [1]

Distributional semantics

word2vec [Mikolov, Chen, Corrado, Dean, 2013]

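A shallow neural network (one hidden layer, not a deep model) that learns dense word vectors by predicting words from their contexts in a large corpus, rather than by counting an explicit term-context matrix.

There are two model architectures: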

  • CBOW predicts the current word based on the context
  • Skip-gram predicts surrounding words given the current word
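
The difference between the two shows up in how training examples are cut out of a sentence. The sketch below only builds the (input, target) examples and does no training; the function names and the `window` parameter are assumptions made for this illustration, not part of the word2vec tool.

```python
def context_windows(tokens, window=2):
    """Yield (current word, list of context words) for every position."""
    for i, word in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield word, left + right


def cbow_examples(tokens, window=2):
    """CBOW: the input is the whole context, the target is the current word."""
    return [(context, word) for word, context in context_windows(tokens, window)]


def skipgram_examples(tokens, window=2):
    """Skip-gram: the input is the current word, each context word is a target."""
    return [(word, c) for word, context in context_windows(tokens, window)
            for c in context]


sentence = "the quick brown fox".split()
print(cbow_examples(sentence, window=1))
print(skipgram_examples(sentence, window=1))
```

CBOW produces one example per position, while skip-gram produces one example per (word, context word) pair, so the same sentence yields more skip-gram examples.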

word2vec project page: [2] demo: [3]

Example: vec(Madrid) - vec(Spain) + vec(France) ≈ vec(Paris), i.e. the word vector nearest to the result is vec(Paris)
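
The arithmetic can be sketched as a nearest-neighbour search by cosine similarity. The vectors below are hand-made toy values chosen so that the analogy works, not trained word2vec embeddings; `TOY_VECTORS`, `cosine` and `analogy` are hypothetical names used only for this illustration.

```python
import math

# Hand-made toy vectors, NOT trained embeddings.
# Dimensions: [Spain-ness, France-ness, Italy-ness, capital-ness]
TOY_VECTORS = {
    "Spain":  [1.0, 0.0, 0.0, 0.0],
    "France": [0.0, 1.0, 0.0, 0.0],
    "Italy":  [0.0, 0.0, 1.0, 0.0],
    "Madrid": [1.0, 0.0, 0.0, 1.0],
    "Paris":  [0.0, 1.0, 0.0, 1.0],
    "Rome":   [0.0, 0.0, 1.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidates
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("Madrid", "Spain", "France"))
```

Excluding the query words from the candidates matters in practice: with real embeddings the nearest vector to the result is often one of the inputs.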

Context-based approach (1) [Lin, 1998]

Web or corpus search approach