Croatian Lemmatization Server

Croatian Lemmatization Server is a unique web service for retrieving lexical entries from the Croatian Morphological Lexicon and using them in two computational-linguistic processes:
1) generation of all Croatian word-forms (all cases in singular and plural for nouns, all persons and all tenses for verbs, all cases of all genders for adjectives, etc.)
2) analysis of all Croatian word-forms, i.e. converting them to a base form (lemma). For now, lemmatization is done at the unigram level, without reference to the left or right context. As a result, every lemma that a token could belong to is retrieved for that token.
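
The two processes above can be illustrated with a minimal Python sketch. It does not call the actual service: the lexicon is a tiny hand-made sample, and the names TOY_LEXICON, build_form_index, generate and lemmatize are hypothetical, chosen only for illustration. It shows why context-free (unigram) analysis may return several lemmas for one token.

```python
from collections import defaultdict

# Toy sample, NOT the real Croatian Morphological Lexicon:
# each lemma maps to a few of its inflected word-forms.
TOY_LEXICON = {
    "glava": ["glava", "glave", "glavi", "glavu", "glavo", "glavom", "glavama"],
    "more": ["more", "mora", "moru", "morem", "morima"],
    "morati": ["morati", "moram", "moraš", "mora", "moramo", "morate", "moraju"],
}


def build_form_index(lexicon):
    """Invert lemma -> forms into form -> set of candidate lemmas."""
    index = defaultdict(set)
    for lemma, forms in lexicon.items():
        for form in forms:
            index[form.lower()].add(lemma)
    return index


FORM_INDEX = build_form_index(TOY_LEXICON)


def generate(lemma):
    """Generation: return the word-forms listed for a lemma."""
    return list(TOY_LEXICON.get(lemma, []))


def lemmatize(token):
    """Analysis at unigram level: return every lemma the token could
    belong to, with no use of left or right context."""
    return sorted(FORM_INDEX.get(token.lower(), set()))


if __name__ == "__main__":
    print(generate("glava"))
    print(lemmatize("glavu"))  # a single candidate lemma
    print(lemmatize("mora"))   # ambiguous: form of both "more" and "morati"
```

In this sample the token "mora" maps to both "more" (sea) and "morati" (must), which is the kind of ambiguity that unigram lemmatization leaves unresolved.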

Since Croatian is a highly inflected language, retrieving web pages using only the base word-form (lemma) or using wildcard characters (e.g. glav* for glava) gives inadequate results: the lemma alone misses the inflected forms, while a wildcard also matches unrelated words that share the same prefix. Croatian Lemmatization Server enables automatic generation of queries that contain all word-forms, and only the word-forms, of a Croatian word, thus serving as a starting point for precise and thorough retrieval of Croatian web pages with Google.
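
As a rough illustration of this idea, the following Python sketch builds an OR query from a hand-written list of forms of "glava" and compares exact form matching with the glav* wildcard. The list GLAVA_FORMS, the function build_or_query and the candidate words are assumptions made for the example; the query format actually produced by the server is not described here.

```python
from fnmatch import fnmatch

# Hand-written sample of word-forms of "glava" (assumed, not fetched from the server).
GLAVA_FORMS = ["glava", "glave", "glavi", "glavu", "glavo", "glavom", "glavama"]


def build_or_query(forms):
    """Join distinct word-forms into one query using the OR operator."""
    unique = list(dict.fromkeys(forms))  # drop duplicates, keep order
    return " OR ".join(unique)


query = build_or_query(GLAVA_FORMS)

# Words that might occur on a page: one true form of "glava" and
# two different words that merely share the prefix "glav".
candidates = ["glavom", "glavni", "glavobolja"]

# The wildcard accepts every candidate, including the unrelated words.
wildcard_hits = [w for w in candidates if fnmatch(w, "glav*")]

# Matching against the explicit form list accepts only real forms of "glava".
form_set = set(GLAVA_FORMS)
exact_hits = [w for w in candidates if w in form_set]

if __name__ == "__main__":
    print(query)
    print(wildcard_hits)
    print(exact_hits)
```

The explicit list keeps "glavom" and rejects "glavni" (main) and "glavobolja" (headache), whereas the wildcard pattern accepts all three.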


Languages: Croatian (hr)