Lecture 1. Introduction

Материал из Wiki - Факультет компьютерных наук
Перейти к: навигация, поиск

Ekaterina Chernyak, Dmitry Ilvovsky

Brief history of NLP

  • January 7, 1954 — Georgetown experiment. Russian to English machine translation;
  • 1957 — Noam Chomsky introduced "universal grammar";
  • since 1961 — Brown Corpus;
  • the late 1960's — ELIZA, a simulation of a psychotherapist;
  • 1975 — Vector Space Model by Salton;
  • up to the early 1980's — rule based approaches;
  • after the early 1980's — machine learning, corpus linguistics;
  • 1998 — Language Model by Ponte and Croft;
  • since 1999 — topic modeling (LSI, pLSI, LDA, etc);
  • 1999 — "Foundations of Statistical Natural Language Processing" by Manning and Shuetze;
  • 2009 — "Natural Language Processing with Python" by Bird, Klein, and Loper.

Major tasks of NLP

  • Machine Translation
  • Text classification
    • Sentiment analysis
    • Spam filtering
    • Classification by topic or by genre
  • Text clustering
  • Named entity recognition
  • Question answering
  • Automatic summarization
  • Natural language generation
  • Speech recognition
  • Spell checking
  • User study design and evaluation


NLP techniques

  • The level of characters:
    • Word segmentation
    • Sentence breaking
  • The level of words — morphology:
    • Part of speech (POS) tagging
    • Word sense disambiguation
  • The level of sentences — syntax:
    • Parsing
  • The level of senses — semantics:
    • Coreference resolution
    • Discourse analysis
    • Semantic role labeling
    • Synonymy detection

Main problems

  • Ambiguity
    • Lexical ambiguity:
    • Time flies like an arrow; fruit flies like a banana.
  • Syntactic ambiguity
    • Police help dog bite victim.
    • Wanted: a nurse for a baby about twenty years old.
  • Neologism: unfriend, retweet, instagram
  • Different spelling: NY, New York City, New-York
  • Non-standard language: HIIII, how are u? miss u SOOOO much:((((

About this course

It covers the following topics:

  • Tokenization
  • POS tagging
  • Key word and phrase extraction
  • Parsing
  • Synonyms detection
  • Language sources
  • Topic modeling
  • Text visualisation

You can try to use Python and R for various tasks.