Glossary
A collection of technical terms commonly used in corpus research
Annotation: the process of adding (interpretative) linguistic information to corpus data
Frequency: the number of occurrences of a linguistic feature in a corpus
Concordance: a list of the occurrences of a word or expression in a corpus, usually shown with their immediate context
Lemma: keyword or entry in a dictionary; it is generally assumed that a lemma encompasses all forms of a word paradigm, with further distinctions according to PoS (part of speech)
Node: the central word or search term in a collocation or concordance
Parallel corpus: a corpus consisting of source texts and their translations
Parsing: a process that analyses the sentences in a corpus into their constituent parts, also known as treebanking or bracketing
PoS: part of speech, or morphosyntactic category
Tagging: an alternative term for annotation, as in PoS tagging and semantic tagging
Token: the actual occurrence of a particular word form, as opposed to its type
Type: a word form, as opposed to its individual occurrences (tokens) in a text
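The difference between counting occurrences and counting distinct word forms, and the idea of listing a search term in its context, can be made concrete with a small sketch. It is illustrative only: the sample sentence, the function names `tokenize` and `kwic`, and the deliberately naive lowercase tokenizer are assumptions made for this example and do not come from the source glossary.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a naive, illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, node, window=2):
    """Return one keyword-in-context line per occurrence of the search term."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # The search term is bracketed so it stands out from its context.
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical one-line "corpus" used only for demonstration.
text = "The cat sat on the mat. The dog saw the cat."
tokens = tokenize(text)
freq = Counter(tokens)

print("tokens:", len(tokens))       # every occurrence counts
print("types:", len(freq))          # each distinct word form counts once
print("frequency of 'the':", freq["the"])
for line in kwic(tokens, "cat"):
    print(line)
```

With this sample the eleven running words reduce to seven distinct forms, and the search term "cat" yields two context lines. A real corpus tool would use a more careful tokenizer and would usually align the search term in a central column.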
Sources
Glossary adapted from:
Weisser, Martin (2016). Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis. Hoboken: John Wiley & Sons.