CEF Data Marketplace multilingual benchmark for the evaluation of cleaning and clustering tools

CEF-DM Multilingual Benchmark

Five parallel corpora (En-Cs, En-De, En-It, En-Lv, De-It) manually annotated by professional translators. Each translation unit (TU) in the datasets is annotated for whether (i) it is clean, i.e. the translation is correct and fully equivalent to its source text, and (ii) it belongs to the Legal domain. The resulting gold standards were used to evaluate the Cleaning and Clustering services offered by the CEF Data Marketplace platform.
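
As an illustration of how a gold standard of this kind can be used, the sketch below scores a tool's per-TU decisions against the manual annotations. It is only a minimal example under stated assumptions: the record does not say which metrics or file format the CEF Data Marketplace evaluation used, so the choice of precision, recall and F1, the `GoldTU` class and its field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class GoldTU:
    """Hypothetical in-memory form of one annotated translation unit."""
    source: str
    target: str
    is_clean: bool  # annotation (i): translation correct and fully equivalent
    is_legal: bool  # annotation (ii): TU belongs to the Legal domain


def precision_recall_f1(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Compare binary tool decisions with gold labels (True = positive class)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)       # correctly flagged positive
    fp = sum(1 for g, p in pairs if not g and p)   # flagged positive, gold negative
    fn = sum(1 for g, p in pairs if g and not p)   # missed positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1
```

The same function could be applied to either annotation: pass the `is_clean` labels to score a cleaning tool that keeps or discards TUs, or the `is_legal` labels to score a tool that assigns TUs to the Legal domain.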