Lecture 5. Language sources: difference between revisions
From Wiki - Faculty of Computer Science
Polidson (talk | contribs) (New page: «==Types of language sources == * Word list * Dictionary: definitions for words * Thesaurus: words grouped together according to similarity of meaning * Ontology:…»)
Polidson (talk | contribs)
Line 11:
  * Wikipedia (DBpedia)
  * Test datasets
+
+ === Word lists ===
+ * List of stopwords (also available in NLTK)
+ * Moby words [http://icon.shef.ac.uk/Moby/mwords.html]
+ * List of Wikipedia articles
+ * Lists of words for language learners
+ * Lists of German compounds
+ * Lists of common spam words [http://emailmarketing.comm100.com/email-marketing-ebook/spam-words.aspx]
Revision as of 01:25, 24 August 2015
Types of language sources
- Word list
- Dictionary: definitions for words
- Thesaurus: words grouped together according to similarity of meaning
- Ontology: formal naming and definitions of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse
- Corpus
  - Text corpus: a large and structured set of texts
  - Speech corpus: a large set of speech audio files
  - Web corpus: a text corpus collected from the Web
- Wikipedia (DBpedia)
- Test datasets
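
As an illustration of how the simplest source type, a word list, is typically used, the sketch below filters stopwords out of a sentence. The tiny `STOPWORDS` set and the `remove_stopwords` helper are hypothetical names introduced here for illustration, not part of the lecture; a fuller list could be loaded from NLTK, as noted in the comments.

```python
# Hypothetical, deliberately tiny stopword list (illustration only).
# A fuller list is available from NLTK via
# nltk.corpus.stopwords.words("english"),
# which requires a one-time nltk.download("stopwords").
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "is", "in", "to"})


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return the lowercased tokens of `text` that are not in `stopwords`."""
    tokens = text.lower().split()  # naive whitespace tokenization
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The ontology is a formal naming of the entities"))
```

Storing the list as a `frozenset` keeps each membership check constant-time on average, which matters once the word list grows to thousands of entries.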