CEF Data Marketplace second multilingual benchmark for the evaluation of cleaning tools
Five parallel corpora (En-Bg, En-Da, En-El, En-Hu, En-Ro) belonging to the Legal domain and manually annotated by professional translators. Each translation unit (TU) in the datasets is labelled as "clean" (the translation is correct and fully equivalent to its source text), "partially clean" or "not clean". The resulting gold standards were used in the second evaluation cycle of the CEF project to evaluate the Cleaning service offered by the CEF Data Marketplace platform.