CEF Data Marketplace second multilingual benchmark for the evaluation of cleaning tools 
This resource comprises five parallel corpora (En-Bg, En-Da, En-El, En-Hu, En-Ro) from the legal domain, manually annotated by professional translators. Each translation unit (TU) in the datasets is labelled as "clean" (the translation is correct and fully equivalent to its source text), "partially clean" or "not clean". The resulting gold standards were used in the second evaluation cycle of the CEF project to evaluate the Cleaning service offered by the CEF Data Marketplace platform.
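As an illustration of how a three-way gold standard of this kind might be used to score a cleaning tool, the sketch below compares a tool's keep/discard decisions with the gold labels and reports precision and recall. The description does not say how the evaluation was actually scored, so everything here is an assumption: the label strings, the function name `score_cleaning`, and the option that decides whether "partially clean" TUs count as acceptable are all hypothetical.

```python
from collections import Counter

# Hypothetical label strings; the actual encoding in the datasets may differ.
GOLD_LABELS = ("clean", "partially clean", "not clean")


def score_cleaning(gold, kept, partial_counts_as_clean=False):
    """Compare a cleaning tool's keep/discard decisions with gold labels.

    gold: list of gold labels, one per translation unit (TU).
    kept: list of booleans, True where the tool kept the TU.
    partial_counts_as_clean: assumption knob; if True, "partially clean"
        TUs are treated as acceptable to keep.
    """
    if len(gold) != len(kept):
        raise ValueError("gold and kept must have the same length")

    acceptable = {"clean"}
    if partial_counts_as_clean:
        acceptable.add("partially clean")

    counts = Counter()
    for label, was_kept in zip(gold, kept):
        if label not in GOLD_LABELS:
            raise ValueError(f"unknown gold label: {label!r}")
        is_acceptable = label in acceptable
        if was_kept and is_acceptable:
            counts["tp"] += 1  # correctly kept
        elif was_kept and not is_acceptable:
            counts["fp"] += 1  # kept but should have been discarded
        elif not was_kept and is_acceptable:
            counts["fn"] += 1  # discarded but should have been kept
        else:
            counts["tn"] += 1  # correctly discarded

    kept_total = counts["tp"] + counts["fp"]
    acceptable_total = counts["tp"] + counts["fn"]
    precision = counts["tp"] / kept_total if kept_total else 0.0
    recall = counts["tp"] / acceptable_total if acceptable_total else 0.0
    return {"precision": precision, "recall": recall, "counts": dict(counts)}


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the corpora.
    gold = ["clean", "partially clean", "not clean", "clean"]
    kept = [True, True, False, False]
    print(score_cleaning(gold, kept))
    print(score_cleaning(gold, kept, partial_counts_as_clean=True))
```

How "partially clean" TUs are treated changes the scores noticeably, so reporting results under both the strict and the lenient setting is one reasonable way to make comparisons between tools transparent.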