Lecture 3. POS tagging. Key word and phrase extraction
From Wiki - Faculty of Computer Science
Revision as of 00:46, 24 August 2015; Polidson
Contents
- 1 Part of speech (POS)
- 2 POS ambiguation
- 3 POS taggers
- 4 Exercise 3.1 Genre comparison
- 5 Key word and phrase extraction
- 6 Supervised methods for key word and phrase extraction
- 7 Unsupervised methods for key word and phrase extraction from a single text
- 8 Bigram association measures
- 9 TextRank: using graph centrality measures for key word and phrase extraction (1) [Mihalcea, Tarau, 2004]
- 10 Unsupervised methods for key word and phrase selection from a text in a collection
- 11 Variants of TF and IDF weights
- 12 TF-IDF in NLTK
- 13 TF-IDF alternatives
- 14 Using TF-IDF to measure text similarity
Part of speech (POS)
Part of speech [Manning, Schütze, 1999]
Words of a language are grouped into classes which show similar syntactic behavior. These word classes are called parts of speech (POS). Three important parts of speech are noun, verb, and adjective. The major types of morphological process are inflection, derivation, and compounding.
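Depending on the linguistic school, around nine parts of speech are distinguished. They are listed below with their tags and the grammatical categories they are marked for: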
- Nouns (NN, NP), pronouns (PN, PRP), adjectives (JJ): number, gender, case
- Adjectives (JJ), additionally: comparative, superlative, short form
- Verbs (VB): subject number, subject person, tense, aspect, modality, participles, voice
- Adverbs (RB), prepositions (IN), conjunctions (CC, CS), articles (AT), and particles (RP): no grammatical categories
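The list above can be read as a lookup table from tag to word class and grammatical categories. The following is a minimal sketch of that reading, not part of the lecture: the names `POS_CATEGORIES`, `describe`, and `is_inflected` are hypothetical, the toy sentence is tagged by hand, and verb forms are collapsed to the coarse tag `VB` for simplicity.

```python
# Tags mapped to (word class, grammatical categories), transcribed from
# the list above. The dictionary layout itself is only an illustration.
NOMINAL = ["number", "gender", "case"]
VERBAL = ["subject number", "subject person", "tense", "aspect",
          "modality", "participles", "voice"]

POS_CATEGORIES = {
    "NN": ("noun", NOMINAL),
    "NP": ("noun", NOMINAL),
    "PN": ("pronoun", NOMINAL),
    "PRP": ("pronoun", NOMINAL),
    "JJ": ("adjective", NOMINAL + ["comparative", "superlative", "short form"]),
    "VB": ("verb", VERBAL),
    "RB": ("adverb", []),
    "IN": ("preposition", []),
    "CS": ("conjunction", []),
    "AT": ("article", []),
    "RP": ("particle", []),
}


def describe(tag):
    """Return (word class, categories) for a tag, or ("unknown", []) if absent."""
    return POS_CATEGORIES.get(tag, ("unknown", []))


def is_inflected(tag):
    """True if the word class behind the tag carries any grammatical category."""
    return bool(describe(tag)[1])


# Hand-tagged toy sentence (an assumption for illustration, not tagger output).
tagged = [("the", "AT"), ("old", "JJ"), ("dog", "NN"),
          ("barks", "VB"), ("loudly", "RB")]

# Keep only the words whose class is marked for some category.
inflected_words = [word for word, tag in tagged if is_inflected(tag)]
print(inflected_words)
```

Keeping the categories in a plain dictionary makes it easy to swap in another tagset: only the keys change, while the split between inflected and uninflected classes stays the same.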