Polish Ministry of Foreign Affairs Regional Dataset (Processed)
A collection of Polish-English whitepapers published by the Polish Ministry of Foreign Affairs, including "Eastern Partnership" (10K words in 492 segments) and "Poland's 10 years in the EU" (129K words in 3146 segments). The translations were manually aligned at the sentence level and encoded in the XLIFF format. The collection was then converted into an English-Polish resource of 3653 translation units (TUs) in TMX format.
Creation mode details: The dataset was provided as two XLIFF (.xlf) files, which were merged into a single TMX file. In post-processing, several filters were applied to discard or annotate alignments that might be incorrect or of limited use for training MT systems.
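The merge-and-filter step described above can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used for this resource: it assumes the input files follow the XLIFF 1.2 namespace, the two inline sample documents and their sentences are invented for the example, and the three filters shown (empty segments, identical source and target, duplicate pairs) are hypothetical stand-ins, since the record does not list the real ones. The language direction is a parameter because the record does not say which language is the XLIFF source.

```python
import xml.etree.ElementTree as ET

# Assumption: the inputs use the XLIFF 1.2 namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_xliff_pairs(xliff_text):
    """Return (source, target) text pairs from an XLIFF 1.2 string."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        src = "".join(source.itertext()).strip() if source is not None else ""
        tgt = "".join(target.itertext()).strip() if target is not None else ""
        pairs.append((src, tgt))
    return pairs


def filter_pairs(pairs):
    """Hypothetical filters: drop empty, untranslated, and duplicate pairs."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        if not src or not tgt:
            continue  # one side is missing
        if src == tgt:
            continue  # segment was probably left untranslated
        if (src, tgt) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def build_tmx(pairs, src_lang, tgt_lang):
    """Build a TMX 1.4 element tree with one <tu> per aligned pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # placeholder value
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "XLIFF",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


# Invented sample inputs standing in for the two .xlf files.
FILE_A = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file source-language="pl" target-language="en" datatype="plaintext" original="a">
<body>
<trans-unit id="1"><source>Dzień dobry.</source><target>Good morning.</target></trans-unit>
<trans-unit id="2"><source></source><target>Orphan target.</target></trans-unit>
</body></file></xliff>"""

FILE_B = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file source-language="pl" target-language="en" datatype="plaintext" original="b">
<body>
<trans-unit id="1"><source>Dzień dobry.</source><target>Good morning.</target></trans-unit>
<trans-unit id="2"><source>Dziękuję.</source><target>Thank you.</target></trans-unit>
</body></file></xliff>"""

merged = read_xliff_pairs(FILE_A) + read_xliff_pairs(FILE_B)
kept = filter_pairs(merged)
tmx = build_tmx(kept, src_lang="pl", tgt_lang="en")
tmx_text = ET.tostring(tmx, encoding="unicode")
```

In this sketch the filters only discard pairs. A pipeline that annotates doubtful alignments instead, as the description also mentions, could attach a TMX `<prop>` element to the affected `<tu>` rather than dropping it.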
Creation mode: Automatic
Resource Creation
Funding Project
European Language Resource Coordination LOT3 (ELRC Data - Tools and Resources for CEF Automated Translation-LOT3 (SMART 2015/1091-30-CE-0816766/00-92))