The dataset is a 274K-token Polish-English parallel resource in XLIFF format, created on the basis of the "Diagnosis-Related Groups in Europe" publication of the Polish National Health Fund. It was converted into a Polish-English parallel resource in TMX format containing 5,208 translation units (TUs).
Creation mode details: The dataset was provided as a single XLIFF (.xlf) file. It was converted into TMX, with invalid XML characters replaced or removed. As a post-processing step, several filters were applied to discard or annotate alignments that might be incorrect or of limited use for training MT systems.
Creation mode: Automatic
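The conversion described under the creation mode details can be illustrated with a minimal Python sketch. It is not the tooling actually used for this resource. It assumes an XLIFF 1.2 input, removes invalid XML characters rather than replacing them, and applies one hypothetical filter (dropping pairs with an empty side or identical source and target), since the record does not list the actual filters. The function names and TMX header values are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # assumption: XLIFF 1.2 input
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Matches any character outside the XML 1.0 "Char" production.
INVALID_XML = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml(text):
    """Remove characters that are not allowed in XML 1.0 documents."""
    return INVALID_XML.sub("", text)


def keep_pair(source, target):
    """Hypothetical filter: drop empty sides and untranslated (identical) pairs."""
    return bool(source) and bool(target) and source != target


def xliff_to_tmx(xliff_text, src_lang="pl", tgt_lang="en"):
    """Convert XLIFF text to a TMX string, one <tu> per kept <trans-unit>."""
    # Clean the raw text first, otherwise the parser rejects the input.
    root = ET.fromstring(strip_invalid_xml(xliff_text))

    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header values below are placeholders, not those of the real resource.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-converter",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xliff",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        src = unit.find(f"{{{XLIFF_NS}}}source")
        tgt = unit.find(f"{{{XLIFF_NS}}}target")
        if src is None or tgt is None:
            continue
        source = "".join(src.itertext()).strip()
        target = "".join(tgt.itertext()).strip()
        if not keep_pair(source, target):
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text

    return ET.tostring(tmx, encoding="unicode")
```

Calling `xliff_to_tmx(open("input.xlf", encoding="utf-8").read())` on a hypothetical `input.xlf` returns the TMX document as a string. A real pipeline would likely annotate questionable pairs instead of only discarding them, as the record indicates.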
Resource Creation
Funding Project
European Language Resource Coordination LOT3 (ELRC Data - Tools and Resources for CEF Automated Translation-LOT3 (SMART 2015/1091-30-CE-0816766/00-92))