Lecture 1. Introduction
From Wiki - Faculty of Computer Science
Current revision as of 22:48, 5 November 2016
Ekaterina Chernyak, Dmitry Ilvovsky
Brief history of NLP
- January 7, 1954: the Georgetown experiment, a demonstration of Russian-to-English machine translation;
- 1957: Noam Chomsky published "Syntactic Structures", which introduced generative grammar;
- since 1961: the Brown Corpus;
- the mid-1960s: ELIZA, a simulation of a psychotherapist;
- 1975: the Vector Space Model by Salton;
- up to the early 1980s: rule-based approaches;
- after the early 1980s: machine learning, corpus linguistics;
- 1998: the language modeling approach to information retrieval by Ponte and Croft;
- since 1999: topic modeling (LSI, pLSI, LDA, etc.);
- 1999: "Foundations of Statistical Natural Language Processing" by Manning and Schütze;
- 2009: "Natural Language Processing with Python" by Bird, Klein, and Loper.
Major tasks of NLP
- Machine Translation
- Text classification
  - Sentiment analysis
  - Spam filtering
  - Classification by topic or by genre
- Text clustering
- Named entity recognition
- Question answering
- Automatic summarization
- Natural language generation
- Speech recognition
- Spell checking
- User study design and evaluation
NLP techniques
- The level of characters:
  - Word segmentation
  - Sentence breaking
- The level of words (morphology):
  - Part of speech (POS) tagging
  - Word sense disambiguation
- The level of sentences (syntax):
  - Parsing
- The level of meaning (semantics):
  - Coreference resolution
  - Discourse analysis
  - Semantic role labeling
  - Synonymy detection
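The two character-level techniques listed above can be illustrated with a minimal sketch in Python. It is an assumption-laden toy, not a method prescribed by the course: the function names `split_sentences` and `tokenize` and both regular expressions are chosen here for illustration, and the rules only work for simple English text.

```python
import re

# Naive sentence breaking: split on whitespace that follows ., ! or ?
# and precedes an uppercase letter. Abbreviations such as "Dr. Smith"
# are split incorrectly by this rule.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Naive word segmentation: a token is either a run of word characters
# (optionally joined by an apostrophe or hyphen) or one punctuation mark.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def split_sentences(text):
    """Return the sentences found in text as a list of strings."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Return the word and punctuation tokens of one sentence."""
    return TOKEN.findall(sentence)


if __name__ == "__main__":
    text = "Time flies like an arrow. Fruit flies like a banana!"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

Real text needs more careful handling of abbreviations, numbers, URLs and emoticons, which is why dedicated tokenizers are normally used instead of two regular expressions.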
Main problems
- Ambiguity
  - Lexical ambiguity:
    - Time flies like an arrow; fruit flies like a banana.
  - Syntactic ambiguity:
    - Police help dog bite victim.
    - Wanted: a nurse for a baby about twenty years old.
- Neologisms: unfriend, retweet, instagram
- Spelling variants: NY, New York City, New-York
- Non-standard language: HIIII, how are u? miss u SOOOO much:((((
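The last two problems are often reduced by normalizing text before further processing. The sketch below shows one possible approach under stated assumptions: the lookup table `CANONICAL_FORMS` and the helper names `squeeze_repeats` and `canonical_name` are hypothetical and were made up for this example, and the rules are deliberately crude.

```python
import re

# Hypothetical lookup table: known spelling variants -> one canonical form.
CANONICAL_FORMS = {
    "ny": "new york",
    "new york city": "new york",
    "new-york": "new york",
}

# Three or more repetitions of the same character, e.g. "IIII" or "((((".
REPEATED_CHAR = re.compile(r"(.)\1{2,}")


def squeeze_repeats(text):
    """Collapse each run of three or more identical characters to one."""
    return REPEATED_CHAR.sub(r"\1", text)


def canonical_name(name):
    """Map a known spelling variant to its canonical form.

    Unknown names are returned lowercased and stripped of outer spaces.
    """
    key = name.strip().lower()
    return CANONICAL_FORMS.get(key, key)


if __name__ == "__main__":
    print(squeeze_repeats("HIIII, how are u? miss u SOOOO much:(((("))
    print([canonical_name(n) for n in ["NY", "New York City", "New-York"]])
```

Collapsing runs to a single character is lossy: "cooool" becomes "col" rather than "cool", so a practical system would combine such a rule with a dictionary check. Abbreviations such as "u" for "you" are not handled here at all.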
About this course
The course covers the following topics:
- Tokenization
- POS tagging
- Keyword and key phrase extraction
- Parsing
- Synonym detection
- Language resources
- Topic modeling
- Text visualisation
Python and R can both be used for the practical tasks.