magyarlanc: a toolkit for linguistic processing of Hungarian


Zsibrita, János; Vincze, Veronika; Farkas, Richárd 2013: magyarlanc: A Toolkit for Morphological and Dependency Parsing of Hungarian. In: Proceedings of RANLP 2013, pp. 763-771

The magyarlanc toolkit performs the basic linguistic processing of Hungarian texts. It consists solely of Java modules (there are no wrappers for other programming languages), which makes it platform-independent and easy to integrate into larger systems (e.g. web servers).

The modules of magyarlanc 3.0 are:
- Sentence splitter
- Tokenizer
- POS tagger and lemmatizer, a modified version of the purePOS tagger:
  - The morphological parser is based on the finite-state automata written by György Gyepesi, which was built on the resource
  - The output of the morphological parser (KR code) is converted to the Universal Morphology format.
  - The model was trained on the Szeged Treebank, converted to Universal Morphology.
- Stopword filtering
- Dependency parser (a version of the Bohnet parser adapted to Hungarian)
- Constituency parser (a version of the Berkeley parser adapted to Hungarian)
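The first stages listed above form a pipeline: the sentence splitter feeds the tokenizer, and stopword filtering runs over the resulting tokens. The sketch below only illustrates that data flow. It is not magyarlanc's API: the class `PipelineSketch`, its methods and its tiny stopword list are hypothetical, and the naive regex rules merely stand in for the toolkit's real sentence splitter and tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of how the splitter, tokenizer and stopword
 * filter feed into each other. NOT the magyarlanc API.
 */
public class PipelineSketch {

    // Hypothetical stopword list; magyarlanc ships its own resources.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("a", "az", "egy", "is"));

    // A token is a run of Unicode letters/digits or a single punctuation mark.
    static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Naive tokenizer: collect every TOKEN match in order.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Stopword filtering, case-insensitive.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "A kutya ugat. Egy macska alszik!";
        for (String sentence : splitSentences(text)) {
            List<String> tokens = tokenize(sentence);
            System.out.println(tokens + " -> " + removeStopwords(tokens));
        }
    }
}
```

Running `main` prints `[A, kutya, ugat, .] -> [kutya, ugat, .]` and `[Egy, macska, alszik, !] -> [macska, alszik, !]`. In the real toolkit, the later stages (POS tagging, lemmatization, dependency and constituency parsing) would consume these sentence-level token lists.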

magyarlanc 3.0 runs under Java 8. The toolkit is fully compatible with previous versions: the API has not changed. No external resources are needed; the downloaded jar file can be used as is.
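Because the jar is used as is, the only runtime prerequisite is a suitable JVM. The minimal sketch below checks the running Java version before the toolkit is loaded. It assumes that Java 8 is the minimum required version; the class name `JavaVersionCheck` is hypothetical and not part of magyarlanc. It relies only on the standard `java.specification.version` system property, which reads `1.8` on Java 8 and `9`, `11`, `17`, and so on for later releases.

```java
/**
 * Hypothetical pre-flight check: is the running JVM Java 8 or later?
 * Treating Java 8 as the minimum is an assumption, not a documented rule.
 */
public class JavaVersionCheck {

    // Parses "1.8" (Java 8 and earlier scheme) as well as "9", "11", "17.0" (newer scheme).
    static int majorVersion(String spec) {
        if (spec.startsWith("1.")) {
            return Integer.parseInt(spec.substring(2));
        }
        int dot = spec.indexOf('.');
        return Integer.parseInt(dot < 0 ? spec : spec.substring(0, dot));
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major < 8) {
            System.err.println("Java 8 or later expected, found " + spec);
            System.exit(1);
        }
        System.out.println("Java " + major + " detected");
    }
}
```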

Languages: Hungarian (hu)