CEF Data Marketplace multilingual benchmark for the evaluation of cleaning and clustering tools

CEF-DM Multilingual Benchmark

Five parallel corpora (En-Cs, En-De, En-It, En-Lv, De-It) manually annotated by professional translators. Each translation unit (TU) in the datasets is annotated for two properties: (i) whether it is clean, i.e. the translation is correct and fully equivalent to its source text, and (ii) whether it belongs to the Legal domain. The resulting gold standards were used to evaluate the Cleaning and Clustering services offered by the CEF Data Marketplace platform.
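As a rough illustration of how a gold standard of this kind can be used, the sketch below scores a tool's binary decisions against gold labels with precision, recall and F1. It is not the evaluation procedure used for the CEF Data Marketplace services, which this description does not specify. The `GoldTU` record, its field names and the `precision_recall_f1` helper are hypothetical, and the example labels are invented for demonstration only.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GoldTU:
    """Hypothetical in-memory form of one annotated translation unit."""
    source: str
    target: str
    is_clean: bool   # annotation (i): translation is correct and equivalent
    is_legal: bool   # annotation (ii): TU belongs to the Legal domain


def precision_recall_f1(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score binary predictions against gold labels (True is the positive class)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy labels, not taken from the benchmark.
gold_clean = [True, True, False, False]
tool_clean = [True, False, True, False]
print(precision_recall_f1(gold_clean, tool_clean))
```

The same function could be applied to the Legal-domain annotation by passing the `is_legal` labels and a clustering tool's in-domain decisions instead.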