maxLength |
100 |
|
enumeration |
alignment |
Establishment of translational equivalences between structural units (words, sentences
etc.) of a text in a given language and a text with similar meaning in other language(s)
|
enumeration |
phraseAlignment |
Alignment at phrase level
|
enumeration |
sentenceAlignment |
Alignment at sentence level
|
enumeration |
wordAlignment |
Alignment at word level
|
enumeration |
webCrawling |
The use of bots that crawl the web (crawlers) in order to spot pages that match user-set
criteria and download them to create large datasets
|
enumeration |
languageIdentification |
The task/process of guessing what natural language a text or text segment is written
in.
|
enumeration |
termExtraction |
The act/process of identifying and extracting candidate terms from a domain-specific
corpus
|
enumeration |
lexiconAcquisitionFromCorpora |
The task/process of constructing lexical resources from corpora
|
enumeration |
lexiconExtractionFromLexica |
The task/process of constructing lexical resources based on the restructuring of lexical
information contained in lexica (e.g. by parsing definitions or using syntactic information
attached to lemmas)
|
enumeration |
bilingualLexiconInduction |
The task/process of inducing word translations from monolingual or comparable corpora
in two languages
|
enumeration |
spellChecking |
The task/process of checking the accuracy of spelling of a word in a text and correcting
it according to the accepted form
|
enumeration |
languageModelling |
The construction of statistical or machine learning language models
|
enumeration |
trainingOfLanguageModels |
The task/process of training (statistical) language models that can estimate
the distribution of natural language as accurately as possible.
|
enumeration |
annotation |
The process/task of adding annotations (annotation types) to an item.
|
enumeration |
annotationOfDocumentStructure |
The task/process of annotating the internal structure of a document (e.g. book chapters,
sections in a journal article, title, preface etc.)
|
enumeration |
structuralAnnotation |
The task/process of segmenting a text and recognizing textual structural units (paragraphs,
sentences, words etc.)
|
enumeration |
sentenceSplitting |
The task/process of recognizing and tagging sentence boundaries in a text
|
enumeration |
paragraphSplitting |
The task/process of segmenting a text into paragraphs and marking their boundaries
|
enumeration |
tokenization |
The task/process of recognizing and tagging tokens (words, punctuation marks, digits
etc.) in a text
|
enumeration |
lemmatization |
Lemmatisation (or lemmatization) in linguistics is the process of grouping together
the inflected forms of a word so they can be analysed as a single item, identified
by the word's lemma, or dictionary form. [Wikipedia]
|
enumeration |
stemming |
The task/process of cutting off the ends of words (mainly inflectional affixes but
sometimes also derivational affixes) aiming to relate words to a base form.
|
enumeration |
poSTagging |
The task/process of marking words with the part of speech (word category, e.g. noun,
verb etc.) to which they belong
|
enumeration |
belowPoSTagging |
The annotation of words with morphological information besides the part of speech
and dependent upon it (e.g. for nouns: gender, number and case; for verbs: tense,
number, person etc.)
|
enumeration |
wordSegmentation |
The task/process of segmenting (cutting) a word into root and affixes
|
enumeration |
annotationOfCompounds |
The task/process of marking compounds (multi-word units considered as a whole) and
their parts
|
enumeration |
annotationOfDerivationalFeatures |
The task/process of adding annotations relevant to the derivational level of analysis
(e.g. recognizing derivational affixes, tagging their meaning etc.)
|
enumeration |
chunking |
The task/process of dividing a sentence into chunks (non-overlapping text segments
consisting of a head and preceding function words and/or modifiers)
|
enumeration |
parsing |
The task/process of recognizing and marking the syntactic structure of a text or text
segment
|
enumeration |
constituencyParsing |
The task/process of identifying and marking constituents (phrases, governed by a head
and including function words and/or modifiers) in a text or text segment
|
enumeration |
dependencyConversion |
The task/process of converting constituency structures to dependency trees
|
enumeration |
dependencyParsing |
The task/process of identifying and marking the grammatical structure of a sentence,
establishing relationships between "head" words and words which modify those heads
|
enumeration |
namedEntityRecognition |
A subtask of information extraction that seeks to locate and classify named entities
in text into pre-defined categories such as the names of persons, organizations, locations,
expressions of times, quantities, monetary values, percentages, etc.
|
enumeration |
semanticAnnotation |
The task/process of marking text units with semantic types (e.g. semantic classes,
emotions etc.)
|
enumeration |
semanticClassLabelling |
The task/process of classifying words in a text according to a set of semantic classes
(types).
|
enumeration |
semanticRelationLabelling |
The task/process of attaching tags indicating the semantic relation that holds between
units of a text.
|
enumeration |
semanticRoleLabelling |
The task/process of attaching labels that correspond to the roles that the arguments
of a predicate take in an event
|
enumeration |
frameSemanticParsing |
The task/process of recognising and labelling predicate-argument structures in a text,
along with the semantic roles of the constituents, in accordance with frame semantics theory.
|
enumeration |
coReferenceAnnotation |
The task/process of attaching tags to a text unit and linking it to other text units
that refer to the same entity.
|
enumeration |
formatConversion |
The task/process of converting (changing) the format of a resource into another (e.g.
PDF to TXT or XML)
|
enumeration |
evaluation |
The task/process of assessing the quality of a resource, e.g. based on the contents
(for a dataset) or performance (for a tool or service)
|
enumeration |
textCategorization |
The process/task of assigning documents to classes or categories.
|
enumeration |
topicDetection |
The task/process of identifying the topic of a text or dataset (e.g. by clustering
keywords or using topic models)
|
enumeration |
validation |
The task/process of confirming that a system/data resource meets the specifications
and fulfills its intended purpose
|
enumeration |
corpusViewing |
The task/process of viewing the contents of a corpus as performed by human beings
|
enumeration |
machineTranslation |
The task/process of translating text or speech from one language to another
|
enumeration |
other |
|
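The table above pairs a maxLength facet of 100 with a closed list of enumerated values. As a minimal sketch of how a consumer might check a value against both constraints, the Python snippet below uses a hypothetical helper, `is_valid_value`, and a hypothetical constant set, `ALLOWED_VALUES`, that holds only a subset of the values listed above. Neither name comes from the schema itself, and the comparison is assumed to be case-sensitive.

```python
# Assumption: the maxLength facet at the top of the table applies to every value.
MAX_LENGTH = 100

# Assumption: an illustrative subset of the enumerated values, copied from the
# table above. A real check would list all of them.
ALLOWED_VALUES = frozenset({
    "alignment",
    "phraseAlignment",
    "sentenceAlignment",
    "wordAlignment",
    "tokenization",
    "lemmatization",
    "poSTagging",
    "machineTranslation",
    "other",
})


def is_valid_value(value: str) -> bool:
    """Return True if value respects the length limit and is an enumerated value."""
    return len(value) <= MAX_LENGTH and value in ALLOWED_VALUES


if __name__ == "__main__":
    for candidate in ("poSTagging", "postagging", "other"):
        print(candidate, is_valid_value(candidate))
```

Because the match is exact, a value such as `postagging` is rejected even though it differs from `poSTagging` only in case.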