Bilingual (EN-SQ) corpus from websites of government of North Macedonia v.1.0
Bilingual dataset (EN, SQ) based on the content of the websites of the government of North Macedonia. It was generated by crawling the websites in February 2021, detecting pairs of parallel documents, identifying parallel sentence pairs, and filtering the results.
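The description does not say which filters were applied in the final step. The Python sketch below is purely illustrative and assumes heuristics that are common for web-crawled parallel data: a token-length ratio limit, plus removal of empty, untranslated (identical), and duplicate pairs. The function `filter_pairs` and its thresholds are hypothetical and are not the tool used to build this corpus.

```python
from typing import Iterable, List, Tuple


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    max_ratio: float = 2.0,  # hypothetical threshold
    min_tokens: int = 1,     # hypothetical threshold
) -> List[Tuple[str, str]]:
    """Keep EN-SQ sentence pairs that pass simple heuristic checks (illustrative only)."""
    kept: List[Tuple[str, str]] = []
    seen = set()
    for en, sq in pairs:
        en, sq = en.strip(), sq.strip()
        en_len, sq_len = len(en.split()), len(sq.split())
        # Drop pairs where either side is too short (e.g. empty after stripping).
        if en_len < min_tokens or sq_len < min_tokens:
            continue
        # Drop pairs whose token-length ratio suggests a misalignment.
        if max(en_len, sq_len) / min(en_len, sq_len) > max_ratio:
            continue
        # Drop untranslated copies where both sides are identical.
        if en == sq:
            continue
        # Drop exact duplicate pairs, keeping the first occurrence.
        if (en, sq) in seen:
            continue
        seen.add((en, sq))
        kept.append((en, sq))
    return kept


if __name__ == "__main__":
    sample = [
        ("The government adopted the decision.", "Qeveria miratoi vendimin."),
        ("The government adopted the decision.", "Qeveria miratoi vendimin."),
        ("", "Qeveria"),
        ("Hello", "Hello"),
        ("Yes", "Po, ky është një fjali shumë e gjatë"),
    ]
    print(filter_pairs(sample))
```

In practice, crawled-corpus pipelines often add language identification and alignment-score thresholds; those are omitted here to keep the sketch self-contained.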
People who looked at this resource also viewed the following:
- October Newsletter intro 2021
- COVID-19 Health Service Executive of Ireland dataset v1. Bilingual (EN-PT)
- COVID-19 landlaeknir dataset v2. Multilingual (EN, IS, PL, DE, ES, FR, LT)
- Compilation of Modern Greek (1453-)-Romanian; Moldavian; Moldovan parallel corpora resources used for training of NTEU Machine Translation engines. Tier 3.