CEF Data Marketplace second multilingual benchmark for the evaluation of cleaning tools 
Five parallel corpora (En-Bg, En-Da, En-El, En-Hu, En-Ro) in the legal domain, manually annotated by professional translators. Each translation unit (TU) in the datasets is labelled as "clean" (the translation is correct and fully equivalent to its source text), "partially clean" or "not clean". The resulting gold standards were used in the second evaluation cycle of the CEF project to evaluate the Cleaning service offered by the CEF Data Marketplace platform.
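The three-way annotation scheme lends itself to a straightforward comparison between a cleaning tool's output and the gold standard. The sketch below is a minimal illustration under stated assumptions: the `TranslationUnit` class, the `evaluate` function and the idea that the tool emits one of the same three labels per TU are all hypothetical. The distribution format of the corpora and the metrics actually used in the CEF evaluation cycle are not described here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The three annotation values used in the benchmark."""
    CLEAN = "clean"
    PARTIALLY_CLEAN = "partially clean"
    NOT_CLEAN = "not clean"


@dataclass
class TranslationUnit:
    # Hypothetical in-memory representation of one annotated TU;
    # the real file format of the corpora is not assumed here.
    source: str
    target: str
    gold: Label


def evaluate(units, predictions):
    """Compare a cleaning tool's predicted labels with the gold labels.

    Assumes (hypothetically) one predicted Label per TU, in the same order.
    Returns overall accuracy and a Counter of (gold, predicted) pairs,
    i.e. a sparse confusion matrix.
    """
    if len(units) != len(predictions):
        raise ValueError("one prediction per translation unit is required")
    confusion = Counter((tu.gold, pred) for tu, pred in zip(units, predictions))
    # Correct predictions are the diagonal of the confusion matrix.
    correct = sum(n for (gold, pred), n in confusion.items() if gold == pred)
    accuracy = correct / len(units) if units else 0.0
    return accuracy, confusion
```

A confusion matrix is returned rather than accuracy alone because the costs of errors are asymmetric in this setting: keeping a "not clean" TU is usually worse for downstream MT training than discarding a "partially clean" one, and the per-pair counts make that distinction visible.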
People who looked at this resource also viewed the following:
- Compilation of English-Romanian parallel corpora resources used for training of NTEU Machine Translation engines.
- Compilation of Bulgarian-Croatian parallel corpora resources used for training of NTEU Machine Translation engines.
- PRINCIPLE SDURDD Croatian-English Parallel Corpus in the legal domain
- Compilation of Czech-Estonian parallel corpora resources used for training of NTEU Machine Translation engines. Tier 3.
People who downloaded this resource also downloaded the following:
- COVID-19 Parallel Global Voices dataset. Bilingual (EN-RO)
- Compilation of English-Romanian; Moldavian; Moldovan parallel corpora resources used for training of NTEU Machine Translation engines. Tier 3.
- ELRC3.0 Multilingual corpus made out of PDF documents from the European Medicines Agency (EMEA), https://www.ema.europa.eu, (February 2020).
- Letter of rights for persons arrested and or detained (Processed)