hrenWaC 2.0 Croatian-English Parallel Corpus
hrenWaC 2.0 Hrvatsko-engleski paralelni korpus
hrenWaC 2.0 Croatian-English Parallel Corpus by Nikola Ljubešić, made available to DGT for eTranslation development with the permission of the corpus author.
The hrenWaC 2.0 Croatian-English Parallel Corpus contains general-domain documents totaling 1,554,912 sentence pairs. The texts were crawled from .hr, the top-level domain of Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor); the accuracy of the extracted bitext at the segment level is around 80%. A manual content and alignment check was performed on a sample. The corpus comprises 6228 TMX files. The data are contributed exclusively for use by DGT for eTranslation development.
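Since the corpus is distributed as TMX files, a minimal sketch of reading sentence pairs from one such file might look like the following. It uses only the Python standard library and assumes the translation units carry `xml:lang` values of `hr` and `en`; the actual language codes in the corpus files are not stated here, and the function name `read_tmx_pairs` and the inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src="hr", tgt="en"):
    """Return (source, target) segment pairs from a TMX document string.

    The language codes "hr" and "en" are assumptions, not confirmed
    for the hrenWaC files.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang".
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Reduce e.g. "en-GB" to "en" and flatten inline markup.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


# Hypothetical one-unit sample, not taken from the corpus.
SAMPLE = """<tmx version="1.4">
  <header srclang="hr" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example"
          creationtoolversion="0" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="hr"><seg>Dobar dan.</seg></tuv>
      <tuv xml:lang="en"><seg>Good day.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = read_tmx_pairs(SAMPLE)
```

Given the reported segment-level accuracy of around 80%, downstream users would likely want to filter or score the extracted pairs before training on them.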