Lecture 1. Introduction

From Wiki - Faculty of Computer Science

Current version as of 22:48, 5 November 2016

Ekaterina Chernyak, Dmitry Ilvovsky

Brief history of NLP

  • January 7, 1954: Georgetown experiment, Russian-to-English machine translation;
  • 1957: Noam Chomsky's "Syntactic Structures" introduced generative grammar;
  • since 1961: Brown Corpus;
  • mid-1960s: ELIZA, a program simulating a psychotherapist;
  • 1975: Vector Space Model by Salton;
  • up to the early 1980s: rule-based approaches;
  • after the early 1980s: machine learning, corpus linguistics;
  • 1998: the language modeling approach to information retrieval by Ponte and Croft;
  • since 1999: probabilistic topic modeling (pLSI, LDA, etc.), following LSI (1990);
  • 1999: "Foundations of Statistical Natural Language Processing" by Manning and Schütze;
  • 2009: "Natural Language Processing with Python" by Bird, Klein, and Loper.

Major tasks of NLP

  • Machine translation
  • Text classification
    • Sentiment analysis
    • Spam filtering
    • Classification by topic or by genre
  • Text clustering
  • Named entity recognition
  • Question answering
  • Automatic summarization
  • Natural language generation
  • Speech recognition
  • Spell checking
  • User study design and evaluation


NLP techniques

  • The level of characters:
    • Word segmentation
    • Sentence breaking
  • The level of words (morphology):
    • Part-of-speech (POS) tagging
    • Word sense disambiguation
  • The level of sentences (syntax):
    • Parsing
  • The level of meaning (semantics):
    • Coreference resolution
    • Discourse analysis
    • Semantic role labeling
    • Synonymy detection
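The two character-level steps, sentence breaking and word segmentation, can be sketched with regular expressions. This is a minimal rule-based illustration using only Python's standard `re` module; the boundary and token rules (and the names `split_sentences` and `tokenize`) are assumptions chosen for English text, not the course's method. Real tokenizers, such as those in NLTK, also handle abbreviations, URLs, numbers and other cases these rules get wrong.

```python
import re

# Assumed sentence-boundary rule: split after ".", "!" or "?" when the
# punctuation is followed by whitespace and an uppercase letter.
# It mis-splits abbreviations such as "Dr. Smith".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Assumed token rule: a run of word characters, optionally joined by internal
# apostrophes or hyphens ("Don't", "New-York"), or a single punctuation mark.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def split_sentences(text: str) -> list[str]:
    """Break raw text into sentences using the boundary rule above."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Segment one sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


if __name__ == "__main__":
    text = "Time flies like an arrow. Fruit flies like a banana! Don't they?"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
    # ['Time', 'flies', 'like', 'an', 'arrow', '.']
    # ['Fruit', 'flies', 'like', 'a', 'banana', '!']
    # ["Don't", 'they', '?']
```

Keeping punctuation as separate tokens lets later steps (POS tagging, parsing) see sentence-final marks explicitly.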

Main problems

  • Ambiguity
    • Lexical ambiguity:
      • Time flies like an arrow; fruit flies like a banana.
    • Syntactic ambiguity:
      • Police help dog bite victim.
      • Wanted: a nurse for a baby about twenty years old.
  • Neologisms: unfriend, retweet, instagram
  • Spelling variation: NY, New York City, New-York
  • Non-standard language: HIIII, how are u? miss u SOOOO much:((((
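Spelling variation and non-standard language are commonly handled by normalizing text before further processing. The sketch below is a hypothetical illustration, not a method from the course. It lowercases the text, collapses runs of three or more identical characters, maps the New York variants to one canonical token, and replaces a few informal spellings. The `SLANG` table, both regular expressions and the `new_york` token are all assumptions made for this example.

```python
import re

# Hypothetical lookup table: informal spelling -> standard form.
SLANG = {"u": "you", "r": "are", "pls": "please"}

# Assumed pattern covering the variants "NY", "NYC", "New York (City)",
# "New-York" (applied after lowercasing).
NEW_YORK = re.compile(r"\b(?:new[- ]york(?: city)?|nyc?)\b")

# Three or more repetitions of the same character ("SOOOO", "((((").
REPEATS = re.compile(r"(.)\1{2,}")


def normalize(text: str) -> str:
    text = text.lower()
    # Collapse long runs to a single character. This is lossy: a word with a
    # genuine double letter written as "goooood" becomes "god".
    text = REPEATS.sub(r"\1", text)
    # Map every spelling variant of the name to one canonical token.
    text = NEW_YORK.sub("new_york", text)
    # Replace whole-word slang items; punctuation is left untouched.
    return re.sub(r"\w+", lambda m: SLANG.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(normalize("HIIII, how are u? miss u SOOOO much:(((("))
    # hi, how are you? miss you so much:(
    print(normalize("NY, New York City, New-York"))
    # new_york, new_york, new_york
```

Hand-made tables like this do not scale to new slang or neologisms, which is one reason these remain open problems.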

About this course

This course covers the following topics:

  • Tokenization
  • POS tagging
  • Keyword and keyphrase extraction
  • Parsing
  • Synonym detection
  • Language resources
  • Topic modeling
  • Text visualisation

Python and R can both be used for the practical tasks.
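For the keyword extraction topic, one common baseline scores each term of a document by TF-IDF (term frequency times inverse document frequency) and keeps the highest-scoring terms. The function `tf_idf_keywords` below is a hypothetical, dependency-free sketch of that baseline. It assumes the documents are already tokenized and uses the plain variant idf = log(N / df), which is not necessarily the weighting used in the course.

```python
import math
from collections import Counter


def tf_idf_keywords(docs: list[list[str]], top_n: int = 3) -> list[list[str]]:
    """Return the top_n highest-scoring TF-IDF terms of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_n])
    return result


if __name__ == "__main__":
    texts = ["the cat sat on the mat", "the dog sat on the log", "the cat chased the dog"]
    docs = [t.split() for t in texts]
    print(tf_idf_keywords(docs, top_n=2))
    # [['mat', 'cat'], ['log', 'dog'], ['chased', 'cat']]
```

A term that occurs in every document, like "the" here, gets idf = log(1) = 0 and never ranks as a keyword. Keyphrases need an extra step, such as scoring n-grams the same way.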