CEF Data Marketplace multilingual benchmark for the evaluation of cleaning and clustering tools
CEF-DM Multilingual Benchmark
Five parallel corpora (En-Cs, En-De, En-It, En-Lv, De-It) manually annotated by professional translators. Each translation unit (TU) in the datasets is annotated to indicate whether (i) it is clean, i.e. the translation is correct and fully equivalent to its source text, and (ii) it belongs to the Legal domain. The resulting gold standards were used to evaluate the Cleaning and Clustering services offered by the CEF Data Marketplace platform.
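As an illustration of how gold-standard annotations of this kind could be used to score a tool, the sketch below compares a tool's binary decisions against the gold labels and reports precision, recall and F1. It is only a minimal sketch under stated assumptions: the `TranslationUnit` class, its field names and the `score` function are hypothetical, and neither the benchmark's actual file format nor the metrics used in the CEF Data Marketplace evaluation are specified in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationUnit:
    """Hypothetical in-memory view of one annotated TU (field names are assumptions)."""
    source: str
    target: str
    is_clean: bool  # annotation (i): translation correct and fully equivalent
    is_legal: bool  # annotation (ii): TU belongs to the Legal domain


def score(gold, predicted):
    """Return (precision, recall, f1) for binary predictions against gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data invented for this example; not taken from the benchmark.
units = [
    TranslationUnit("The contract is void.", "Der Vertrag ist nichtig.", True, True),
    TranslationUnit("Good morning.", "Guten Morgen.", True, False),
    TranslationUnit("See Article 5.", "Guten Abend.", False, True),
    TranslationUnit("Click here.", "Der Vertrag.", False, False),
]
tool_says_clean = [True, False, True, False]  # hypothetical cleaning-tool output

precision, recall, f1 = score([u.is_clean for u in units], tool_says_clean)
print(f"cleaning: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

The same function could be applied to the Legal-domain annotation by passing `[u.is_legal for u in units]` as the gold labels together with a clustering tool's domain decisions.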