Translation Memory Cleaner – ELRC-SHARE

42 Last view: 2024-05-30

3 Last update: 2021-12-21

Translation Memory Cleaner

TM-Cleaner

https://github.com/hlt-mt/TM-Cleaner

Translation Memory (TM) cleaning service offered within the CEF Data Marketplace platform, aimed at removing wrong or dirty translation units (TUs) from the TMs uploaded to the Marketplace. The TM-Cleaner is based on the sentence embeddings provided by the LASER suite. Given a TU, sentence embeddings are extracted for both the source and the target sentence, each in its own language. The cosine similarity between the source embedding and the target embedding is then computed: if it reaches a given threshold, the TU is labeled as clean; otherwise it is labeled as dirty.
It is worth noting that LASER can handle at least 93 languages, which gives the tool multilingual support.
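
The labeling rule described above can be illustrated with a minimal sketch. It is not the TM-Cleaner code: the function names, the toy vectors, and the default threshold of 0.7 are assumptions made for illustration, and the embeddings are taken as already computed (in the real tool they would come from LASER).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine similarity between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector carries no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def label_tu(src_emb: Sequence[float], tgt_emb: Sequence[float],
             threshold: float = 0.7) -> str:
    """Label a translation unit as 'clean' or 'dirty'.

    The threshold value is hypothetical; the source does not state
    the value used by the service.
    """
    score = cosine_similarity(src_emb, tgt_emb)
    return "clean" if score >= threshold else "dirty"


# Toy 2-dimensional vectors standing in for sentence embeddings.
print(label_tu([1.0, 0.0], [1.0, 0.0]))  # same direction
print(label_tu([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

In this sketch a TU whose score equals the threshold counts as clean, matching the "reaches a given threshold" wording of the description.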

Distribution

Availability: Under Review

Licences

Distribution Details

Download location: https://github.com/h...

Distribution Medium: Data Downloadable

Contact Persons

Roldano Cattoni

toolService

Tool (Other)

Language Dependent

Evaluation

Evaluated: True

Evaluation level: Diagnostic

Evaluation criteria: Extrinsic, Intrinsic

Evaluation measure: Automatic, Human

Resource Creation

Resource Creator

Fondazione Bruno Kessler

Funding Project

CEF Data Marketplace (CEF-Data-Marketplace)

URL: https://ec.europa.eu...

Funding Type: EU Funds

Metadata

Created: 29/10/2020

Last Updated: 29/10/2020

Metadata Language: English (en)

Metadata Creator

Luisa Bentivogli

Version

Version: 1.0
