CURLICAT Hungarian corpus

This is the Hungarian-language subcorpus of the collection of curated and analysed language data compiled by the CURLICAT project. It consists of over 2.75 million sentences and 61.2 million tokens, linguistically analysed and enriched with IATE and domain-specific terminology extracted from the subcorpus. In terms of sources, the corpus is dominated by longer texts from book publications covering the following domains: culture, economy, science and social issues. For more information, see the delivery reports D1.1 and D2 on the CURLICAT website (

