MARCELL Croatian-English Parallel Corpus of Legislative Texts 
The MARCELL Croatian-English Parallel Corpus of Legislative Texts contains the entire body of Croatian legislative documents that have been translated into English (1,563 documents) and a set of Croatia’s international treaties (253 documents), 1,816 documents in total. Its size is 14,379,657 tokens in Croatian and 17,673,788 tokens in English. The corpus has been split into paragraphs and sentences and aligned at segment level, and each of the 396,984 translation units (TUs) was manually checked for alignment. The file format is TMX (v1.4); the header stores additional metadata on document type, year of production, assigned EUROVOC descriptor or descriptors, and domain.
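Because the corpus is distributed as TMX, its translation units can be read with a standard XML parser. The following is a minimal sketch rather than an official loader: it assumes that each `<tu>` holds one `<tuv>` per language, tagged with `xml:lang` values such as `hr` and `en`, as is usual in TMX 1.4. The function name `iter_translation_units` and the inline sample document are hypothetical and are not taken from the corpus. The header metadata is not parsed here because its exact field names are not given in this description.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_translation_units(source):
    """Yield one dict per <tu>, mapping a language code to its segment text.

    `source` is a file path or a binary file object containing TMX.
    The file is streamed, so large corpora need not fit in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            # Assumption: language codes are stored in xml:lang (TMX 1.4);
            # the plain "lang" attribute is only a fallback for older files.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            unit[lang] = "".join(seg.itertext()).strip()
        yield unit
        elem.clear()  # release the processed <tu> before moving on


# Invented two-segment sample, used only to demonstrate the function.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="hr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="hr"><seg>Ovaj Zakon stupa na snagu.</seg></tuv>
      <tuv xml:lang="en"><seg>This Act shall enter into force.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

units = list(iter_translation_units(io.BytesIO(SAMPLE)))
for unit in units:
    print(unit.get("hr"), "|", unit.get("en"))
```

To read the actual corpus, pass the path of the downloaded TMX file instead of the in-memory sample.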