magyarlanc: a toolkit for linguistic processing of Hungarian 

Zsibrita, János; Vincze, Veronika; Farkas, Richárd 2013: magyarlanc: A Toolkit for Morphological and Dependency Parsing of Hungarian. In: Proceedings of RANLP 2013, pp. 763-771.
The magyarlanc toolkit performs basic linguistic processing of Hungarian texts. It consists solely of Java modules (there are no wrappers for other programming languages), which makes it platform independent and easy to integrate into larger systems (e.g. web servers).
The modules of magyarlanc 3.0 are:
- Sentence splitter
- Tokenizer
- POS tagger and lemmatizer
  - A modified version of the purePOS tagger is used.
  - The morphological parser is based on the finite-state automata written by György Gyepesi and built on the morphdb.hu resource.
  - The output of the morphological parser (KR code) is converted to the Universal Morphology format.
  - The model was trained on the Szeged Treebank, converted to Universal Morphology.
- Stopword filtering
- Dependency parser (a version of the Bohnet parser adapted to Hungarian)
- Constituency parser (a version of the Berkeley parser adapted to Hungarian)
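To make the order of the first processing stages concrete, the following is a minimal sketch of a sentence splitter, tokenizer and stopword filter chained together. It uses only the standard Java library and does not call magyarlanc: the class name `PipelineSketch`, its methods, the regular expressions and the four-word stopword list are all assumptions made for illustration, and the real modules are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: none of these names belong to the magyarlanc API.
public class PipelineSketch {

    // A token is a run of letters/digits or a single non-space punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.trim().split("(?<=[.!?])\\s+")) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }

    // Split one sentence into word and punctuation tokens.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sentence);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Drop tokens whose lowercased form is in the stopword set.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical, tiny stopword list (Hungarian articles and demonstratives).
        Set<String> stopwords = new HashSet<>(Arrays.asList("a", "az", "egy", "ez"));
        String text = "Ez egy mondat. A kutya ugat!";
        for (String sentence : splitSentences(text)) {
            List<String> tokens = tokenize(sentence);
            System.out.println(tokens + " -> " + removeStopwords(tokens, stopwords));
        }
    }
}
```

The sketch stops before tagging and parsing, since those stages depend on trained models and cannot be reproduced in a few lines.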
magyarlanc 3.0 runs under Java 8. The toolkit is fully compatible with previous versions, i.e. the API has not changed. No external resources are needed: the downloaded jar file can be used as is.