magyarlanc: a toolkit for linguistic processing of Hungarian

magyarlanc

The magyarlanc toolkit performs basic linguistic processing of Hungarian texts. It consists solely of Java modules (there are no wrappers for other programming languages), which makes it platform independent and easy to integrate into larger systems (e.g. web servers).

The modules of magyarlanc 3.0 are:

Sentence splitter
Tokenizer
POS tagger and lemmatizer
    A modified version of the purePOS tagger.
    The morphological parser is based on finite-state automata code written by György Gyepesi, which in turn was built on the morphdb.hu resource.
    The output of the morphological parser (KR code) is converted to the Universal Morphology format.
    The model was trained on the Szeged Treebank, converted to Universal Morphology.
Stopword filtering
Dependency parser (a version of the Bohnet parser adapted to Hungarian)
Constituency parser (a version of the Berkeley parser adapted to Hungarian)

magyarlanc 3.0 runs under Java 8. The toolkit is fully compatible with previous versions, i.e. the API has not changed. No external resources are needed: the downloaded jar file can be used as is.
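
To make the module order above more concrete, the following is a minimal, self-contained Java sketch of how the first stages of such a pipeline (sentence splitting, tokenization, stopword filtering) might be chained. It is not the magyarlanc API: the class PipelineSketch, its method names, the regular expressions and the three-word stopword list are all hypothetical and deliberately naive, and the tagging and parsing stages are left out because they need trained models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, NOT the magyarlanc API.
public class PipelineSketch {

    // Runs of letters/digits form one token; any other non-space character is its own token.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Naive sentence splitter: break after '.', '!' or '?' when followed by whitespace.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Naive tokenizer built on the TOKEN pattern above.
    public static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Stopword filtering: drop tokens whose lowercase form is in the stopword set.
    public static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy stopword list (an assumption for this sketch, not the toolkit's list).
        Set<String> stopwords = new HashSet<>(Arrays.asList("a", "az", "egy"));
        String text = "A kutya ugat. Az alma piros!";
        for (String sentence : splitSentences(text)) {
            List<String> tokens = tokenize(sentence);
            System.out.println(tokens + " -> " + removeStopwords(tokens, stopwords));
        }
    }
}
```

Each stage consumes the output of the previous one, which mirrors the order of the module list; in the real toolkit the POS tagger, lemmatizer and parsers would operate on the tokenized sentences.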


Languages: Hungarian (hu)