CURLICAT Romanian corpus

The Romanian corpus contains 26,477 files and represents our contribution to the CURLICAT project. It contains texts from 7 domains: science, politics, culture, economy, health, education, and nature. Each file carries multiple levels of annotation: the text is tokenized, lemmatized, morphologically annotated, and dependency parsed, and named entities, nominal phrases, IATE terms, and automatically identified domain-specific terms are marked as well. All processing tools are available within the RELATE platform.
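
To illustrate how token-level annotations of this kind might be read, here is a minimal sketch. The description above does not state the file format, so the sketch assumes a CoNLL-U Plus style layout (tab-separated columns, optionally declared by a `# global.columns` header); the function name `read_sentences` and the extra column `EXAMPLE:NE` are hypothetical and should be replaced with whatever the distributed files actually use.

```python
from typing import Dict, Iterable, Iterator, List

# Standard CoNLL-U columns, used when no "# global.columns" header is present.
DEFAULT_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                   "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of {column name: value} token dicts.

    ASSUMPTION: the input follows a CoNLL-U Plus style layout; this is not
    confirmed by the corpus description.
    """
    columns = DEFAULT_COLUMNS
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns"):
            # Header form: "# global.columns = ID FORM LEMMA ..."
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (sentence id, raw text, ...)
        elif not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence


# Hypothetical sample; "EXAMPLE:NE" stands in for a named-entity column.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC EXAMPLE:NE\n"
    "# text = Ana citește.\n"
    "1\tAna\tAna\tPROPN\t_\t_\t2\tnsubj\t_\t_\tB-PER\n"
    "2\tcitește\tciti\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\tO\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\tO\n"
    "\n"
)

sentences = list(read_sentences(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["FORM"], token["LEMMA"], token["UPOS"], token["EXAMPLE:NE"])
```

Reading the column names from the header rather than hard-coding them keeps the sketch usable if the files declare additional annotation columns (for example for nominal phrases or terms).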