TreeTagger - a part-of-speech tagger for many languages


Helmut Schmid (1995): Improvements in Part-of-Speech Tagging with an Application to German. Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland.

The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Danish, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician, Greek, Chinese, Swahili, Slovak, Slovenian, Latin, Estonian, Polish, Romanian, Czech, Coptic and Old French texts. It can be adapted to other languages if a lexicon and a manually tagged training corpus are available.
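As an illustration of the annotation described above, the following is a minimal sketch of how tagged output could be read in Python. It assumes the tagger writes one token per line with the token, part-of-speech tag and lemma separated by tabs. The names TaggedToken and parse_tagged_lines are hypothetical helpers introduced here, not part of the TreeTagger distribution, and the sample text is illustrative only.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One annotated token (hypothetical helper type, not part of TreeTagger)."""
    token: str
    pos: str
    lemma: str


def parse_tagged_lines(text: str) -> List[TaggedToken]:
    """Parse tagger output, assuming one 'token<TAB>tag<TAB>lemma' entry per line.

    Blank lines and lines that do not have exactly three tab-separated
    fields are skipped rather than guessed at.
    """
    tokens = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        tokens.append(TaggedToken(*fields))
    return tokens


# Illustrative sample in the assumed format (English, Penn-Treebank-style tags).
sample = "The\tDT\tthe\nTreeTagger\tNP\tTreeTagger\nis\tVBZ\tbe\neasy\tJJ\teasy\n"

tagged = parse_tagged_lines(sample)
for item in tagged:
    print(item.token, item.pos, item.lemma)
```

Keeping the three fields in a named tuple makes it straightforward to filter by tag or to collect lemmas for later processing.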

Languages: Romanian; Moldavian; Moldovan (ro), Slovak (sk), Polish (pl), Portuguese (pt), Modern Greek (1453-) (el), Italian (it), Slovenian (sl), Spanish; Castilian (es), Czech (cs), Danish (da), Dutch; Flemish (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Bulgarian (bg),