Lecture 5. Language sources: Difference between revisions

Revision as of 01:30, 24 August 2015

Word list
Dictionary: definitions for words
Thesaurus: words grouped together according to similarity of meaning
Ontology: formal naming and definitions of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse
Corpus
- Text corpus: a large and structured set of texts
- Speech corpus: a large set of speech audio files
- Web corpus: a text corpus collected from the Web
Wikipedia (DBpedia)
Test datasets
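
The first few resource types above can be pictured as simple data structures. The following is a minimal Python sketch, not part of the lecture: the toy corpus, the dictionary entries, the synonym group, and the helper names `tokenize` and `synonyms` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Toy text corpus (hypothetical sentences, for illustration only).
corpus = [
    "The cat sat on the mat.",
    "A big cat chased a large dog.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Word list: the sorted vocabulary of the corpus, with token frequencies.
frequencies = Counter(token for text in corpus for token in tokenize(text))
word_list = sorted(frequencies)

# Dictionary: word -> definition (hypothetical entries).
dictionary = {
    "cat": "a small domesticated feline",
    "mat": "a piece of floor covering",
}

# Thesaurus: groups of words with similar meaning (hypothetical group).
thesaurus = [{"big", "large"}]


def synonyms(word):
    """Return the other words that share a thesaurus group with `word`."""
    return sorted(
        other
        for group in thesaurus
        if word in group
        for other in group
        if other != word
    )
```

Here `word_list` is derived from the corpus, while `dictionary` and `thesaurus` are hand-written; real resources such as Wiktionary are far larger and richer, but the lookups have the same basic shape.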

Wiktionary: collaborative project to produce a free-content multilingual dictionary. It aims to describe all words of all languages using definitions and descriptions in English. [3]

- Wiktionary as a source for automatic pronunciation extraction
- Extracting lexical semantic knowledge from Wikipedia and Wiktionary
- Using Wikipedia and Wiktionary in domain-specific information retrieval
- Wiktionary and NLP: Improving synonymy networks
FreeLing dictionaries [4]

@@ Line 33: / Line 33: @@
 * English-Spanish large statistical dictionary of inflectional forms
 * Exploiting web-based collective knowledge for micropost normalisation
+== Thesaurus ==
+== Ontology ==
+== Text corpus ==
+== Speech corpus ==
+== Web corpus ==