Domain Adaptation Filter for parallel corpora – ELRC-SHARE


4 Last update: 2020-04-13


https://github.com/paracrawl/Domain_Adaptation

Filters parallel corpora by language-model similarity to in-domain text. The tools in this ParaCrawl sub-project extract domain-specific parallel corpora from a large body of corpora of unknown domain, using a monolingual corpus as the filtering and scoring mechanism. They do not assess the quality of the translations in the parallel corpora; that is a separate task, addressed by a number of sister technologies within the ParaCrawl project. The approach operates on only one side of a parallel corpus to determine whether it is in a domain similar to that of the provided monolingual corpus.
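The idea of scoring one side of a parallel corpus against an in-domain language model can be illustrated with a minimal sketch. This is not the ParaCrawl implementation: the add-one-smoothed unigram model, the function names (train_unigram, cross_entropy, filter_pairs), the toy sentences, and the threshold of 3.5 bits per token are all assumptions chosen for illustration, and the actual tools may use a different language model and scoring scheme.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count tokens in the in-domain monolingual corpus (hypothetical helper)."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(sentence, counts, total):
    """Per-token negative log2 probability with add-one smoothing.

    Lower values mean the sentence looks more like the in-domain text.
    """
    tokens = sentence.lower().split()
    if not tokens:
        return float("inf")
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens
    log_prob = sum(math.log2((counts[t] + 1) / (total + vocab)) for t in tokens)
    return -log_prob / len(tokens)


def filter_pairs(pairs, counts, total, threshold):
    """Keep (source, target) pairs whose source side scores below the threshold.

    Only the source side is scored; the target side is carried along untouched.
    """
    return [(src, tgt) for src, tgt in pairs
            if cross_entropy(src, counts, total) < threshold]


# Toy in-domain monolingual corpus (illustrative data only).
in_domain = ["the patient received a dose of the drug",
             "the drug reduced the symptoms of the patient"]

# Toy parallel corpus of unknown domain (illustrative data only).
pairs = [("the patient received the drug", "le patient a pris le traitement"),
         ("football fans celebrated downtown", "les supporters ont envahi la ville")]

counts, total = train_unigram(in_domain)
kept = filter_pairs(pairs, counts, total, threshold=3.5)  # threshold is an assumed value
```

In this sketch the medical sentence pair is kept and the football pair is dropped, because only the former shares vocabulary with the in-domain text. Translation quality plays no part in the score, which matches the division of labour described above.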

Distribution

Availability: Under Review

Licences

Distribution Details

Download location : https://github.com/p...

Distribution Medium: Data Downloadable

IPR Holders

University of Edinburgh

Omniscien Technologies

Contact Person

toolService

Tool (Other)

Language Independent

Resource Creation

Funding Project

Broader Web-Scale Provision of Parallel Corpora for European Languages (ParaCrawl)

URL: http://paracrawl.eu/

Funding Type: EU Funds

Funder: European Commission

Metadata

Created: 23/05/2019

Last Updated: 23/05/2019

Metadata Language: English (en)
