CEF Data Marketplace second multilingual benchmark for the evaluation of cleaning tools
Five parallel corpora (En-Bg, En-Da, En-El, En-Hu, En-Ro) from the legal domain, manually annotated by professional translators. Each translation unit (TU) in the datasets is labelled as "clean" (the translation is correct and fully equivalent to its source text), "partially clean" or "not clean". The resulting gold standards were used in the second evaluation cycle of the CEF project to evaluate the Cleaning service offered by the CEF Data Marketplace platform.
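As an illustration of how a gold standard of this kind might be used, the sketch below compares a cleaning tool's labels against the annotators' labels and reports accuracy plus per-label precision and recall. It is only a sketch under stated assumptions: the function name `score_cleaning_tool`, the in-memory list representation, and the choice of metrics are hypothetical and do not describe the file format of the corpora or the evaluation procedure actually used in the CEF project.

```python
from collections import Counter

# The three annotation labels named in the resource description.
LABELS = ("clean", "partially clean", "not clean")


def score_cleaning_tool(gold, predicted):
    """Compare predicted TU labels with gold labels (hypothetical helper).

    Both arguments are equal-length lists of label strings, one per TU.
    Returns (accuracy, per-label dict with precision and recall).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count every (gold label, predicted label) pair.
    confusion = Counter(zip(gold, predicted))

    report = {}
    for label in LABELS:
        true_pos = confusion[(label, label)]
        # All TUs the tool assigned to this label.
        predicted_total = sum(confusion[(g, label)] for g in LABELS)
        # All TUs the annotators assigned to this label.
        gold_total = sum(confusion[(label, p)] for p in LABELS)
        report[label] = {
            "precision": true_pos / predicted_total if predicted_total else 0.0,
            "recall": true_pos / gold_total if gold_total else 0.0,
        }

    correct = sum(confusion[(label, label)] for label in LABELS)
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, report


if __name__ == "__main__":
    # Toy labels invented for the example, not taken from the corpora.
    gold = ["clean", "clean", "not clean", "partially clean"]
    predicted = ["clean", "not clean", "not clean", "partially clean"]
    accuracy, report = score_cleaning_tool(gold, predicted)
    print(accuracy)
    print(report)
```

Reporting precision and recall per label, rather than accuracy alone, shows whether a tool tends to discard good TUs or to keep bad ones, which matters when the three labels are unevenly distributed.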
People who looked at this resource also viewed the following:
- Compilation of Bulgarian-Greek parallel corpora resources used for training of NTEU Machine Translation engines.
- Monolingual Icelandic corpus from the official journal Stjórnartíðindi
- Compilation of Bulgarian-Polish parallel corpora resources used for training of NTEU Machine Translation engines.
- Compilation of English-Lithuanian parallel corpora resources used for training of NTEU Machine Translation engines.
People who downloaded this resource also downloaded the following:
- Competition Economics for Judges (Processed)
- Competition Economics for Judges
- Bilingual resource with Bulgarian strategic documents in the field of telecommunications and broadband (Bulgarian - English) (Processed)
- Compilation of Bulgarian-English parallel corpora resources used for training of NTEU Machine Translation engines.