Croatian Lemmatization Server
Croatian Lemmatization Server is a unique web service for retrieving lexical entries from the Croatian Morphological Lexicon and using them in the following computational linguistic processes:
1) generation of all Croatian word forms (all cases in the singular and plural for nouns, all persons and tenses for verbs, all cases of all genders for adjectives, etc.);
2) analysis of all Croatian word forms, i.e. converting them to their base form (lemma). For now, lemmatization is done at the unigram level, without reference to the left or right context, so all possible lemmas are retrieved for each token.
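The two processes above can be pictured as lookups in opposite directions over one lexicon. The sketch below is a minimal, hypothetical illustration and not the server's actual API: the names LEXICON, FORM_INDEX, generate and lemmatize are invented, and the toy lexicon holds only a few hand-picked forms. Because lookup is unigram-based, an ambiguous form such as ženi (dative/locative singular of žena, and third person singular present of ženiti) returns every candidate lemma.

```python
from collections import defaultdict

# Hypothetical toy lexicon: lemma -> set of its word forms.
# Illustration only; the real Croatian Morphological Lexicon is far larger.
LEXICON: dict[str, set[str]] = {
    "glava": {"glava", "glave", "glavi", "glavu", "glavo", "glavom", "glavama"},
    "žena": {"žena", "žene", "ženi", "ženu", "ženo", "ženom", "ženama"},
    "ženiti": {"ženiti", "ženim", "ženiš", "ženi"},  # partial: a few present-tense forms
}

# Inverse index: word form -> set of lemmas it may belong to.
FORM_INDEX: dict[str, set[str]] = defaultdict(set)
for lemma, forms in LEXICON.items():
    for form in forms:
        FORM_INDEX[form].add(lemma)


def generate(lemma: str) -> list[str]:
    """Generation: return all known word forms of a lemma, sorted."""
    return sorted(LEXICON.get(lemma, set()))


def lemmatize(token: str) -> list[str]:
    """Analysis: return every lemma the token could belong to.

    Unigram lookup only: no left or right context is used,
    so an ambiguous form yields several candidate lemmas.
    """
    return sorted(FORM_INDEX.get(token.lower(), set()))


if __name__ == "__main__":
    print(generate("glava"))
    print(lemmatize("ženi"))
    print(lemmatize("Glavom"))
```

Resolving which candidate lemma is meant would require looking at neighbouring tokens, which the unigram approach described above deliberately does not do.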
Since Croatian is a highly inflectional language, retrieving web pages using only the base form (lemma) or wildcard characters (e.g. glav* for glava) gives inadequate results. Croatian Lemmatization Server enables the automatic generation of queries covering all, and only, the word forms of Croatian words, thus serving as a starting point for precise and thorough retrieval of Croatian web pages with Google.
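For the retrieval use case, here is a minimal sketch of turning a word's full list of forms into a single search query, assuming the search engine accepts quoted exact terms joined by an uppercase OR (as Google's query syntax does). The function names and the hard-coded form list are hypothetical; the exact query format produced by the server is not documented here. The prefix comparison shows why a wildcard such as glav* overgenerates: it also matches glavni ("main"), which is not a form of glava.

```python
# Hypothetical hard-coded forms of "glava" (head), one per case;
# repeated forms (e.g. nominative singular = genitive plural) collapse below.
GLAVA_FORMS = [
    "glava", "glave", "glavi", "glavu", "glavo", "glavi", "glavom",         # singular
    "glave", "glava", "glavama", "glave", "glave", "glavama", "glavama",    # plural
]


def build_or_query(forms: list[str]) -> str:
    """Quote each distinct word form and join them with OR."""
    return " OR ".join(f'"{form}"' for form in sorted(set(forms)))


def prefix_matches(prefix: str, words: list[str]) -> list[str]:
    """Wildcard-style matching: every word that starts with the prefix."""
    return [w for w in words if w.startswith(prefix)]


def exact_form_matches(forms: list[str], words: list[str]) -> list[str]:
    """Match only words that are genuine forms of the lemma."""
    allowed = set(forms)
    return [w for w in words if w in allowed]


if __name__ == "__main__":
    print(build_or_query(GLAVA_FORMS))
    sample = ["glava", "glavni", "glavom", "noga"]
    print(prefix_matches("glav", sample))  # includes the false hit "glavni"
    print(exact_form_matches(GLAVA_FORMS, sample))
```

Matching against the explicit form list keeps recall (every case form is included) while avoiding the false positives that prefix matching lets through.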
Languages: Croatian (hr)