CEF Data Marketplace multilingual benchmark for the evaluation of cleaning and clustering tools
CEF-DM Multilingual Benchmark
Five parallel corpora (En-Cs, En-De, En-It, En-Lv, De-It), manually annotated by professional translators. Each translation unit (TU) in the datasets carries two annotations: (i) whether it is clean, i.e. whether the translation is correct and fully equivalent to its source text, and (ii) whether it belongs to the Legal domain. The resulting gold standards were used to evaluate the Cleaning and Clustering services offered by the CEF Data Marketplace platform.
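The following minimal Python sketch shows how TU-level gold labels of this kind could score a tool's output with precision, recall and F1. It assumes the annotations are already loaded in memory as one record per TU with two boolean fields. The record type `TranslationUnit`, the function `precision_recall_f1`, the toy sentences and the tool predictions are all hypothetical. They do not reflect the benchmark's actual file format or the metrics used in the original evaluation.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical in-memory record for one annotated TU (not the benchmark's real format).
@dataclass
class TranslationUnit:
    source: str
    target: str
    is_clean: bool  # gold label (i): correct and fully equivalent translation
    is_legal: bool  # gold label (ii): belongs to the Legal domain


def precision_recall_f1(gold: List[bool], predicted: List[bool]) -> Tuple[float, float, float]:
    """Score binary predictions against gold labels; True is the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold standard (hypothetical En-De examples).
tus = [
    TranslationUnit("The court ruled.", "Das Gericht entschied.", True, True),
    TranslationUnit("Hello", "Tschüss", False, False),
    TranslationUnit("Article 5 applies.", "Artikel 5 gilt.", True, True),
    TranslationUnit("Good morning", "Guten Morgen", True, False),
]

# Hypothetical outputs of a cleaning tool (True = kept as clean)
# and of a clustering tool (True = assigned to the Legal cluster).
predicted_clean = [True, True, True, False]
predicted_legal = [True, False, False, False]

clean_scores = precision_recall_f1([tu.is_clean for tu in tus], predicted_clean)
legal_scores = precision_recall_f1([tu.is_legal for tu in tus], predicted_legal)
print("cleaning  P/R/F1:", clean_scores)
print("clustering P/R/F1:", legal_scores)
```

Treating "clean" and "Legal" as the positive classes is a design choice for this sketch. A cleaning evaluation could equally treat noisy TUs as the positive class, which changes the precision and recall figures.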
People who looked at this resource also viewed the following:
- Bilingual corpus from the Publications Office of the EU on the medical domain v.2 (EN-NL)
- Bilingual corpus from the Publications Office of the EU on the medical domain (EN-NL)
- Bilingual corpus made out of PDF documents from the European Medicines Agency, (EMEA), https://www.ema.europa.eu, (February 2020) (EN-NL).
- Bilingual corpus from the European Vaccination Portal (IT-EN)