MARCELL Croatian-English Parallel Corpus of Legislative Texts 
The MARCELL Croatian-English Parallel Corpus of Legislative Texts contains the entire body of Croatian legislative documents that have been translated into English (1,563 documents) and a set of Croatia’s international treaties (253 documents), for a total of 1,816 documents. It comprises 14,379,657 tokens in Croatian and 17,673,788 tokens in English. The corpus has been split into paragraphs and sentences and aligned at the segment level, and the alignment of each of its 396,984 translation units (TUs) was checked manually. The file format is TMX (v1.4); the header stores additional metadata: document type, year of production, assigned EUROVOC descriptor or descriptors, and domain.
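As a rough illustration of how a TMX (v1.4) file of this kind might be read, the Python sketch below collects header metadata and the Croatian-English segment pairs using only the standard library. It assumes the metadata is stored as `<prop>` elements in the header; the property names `docType` and `year`, the function name `read_tmx`, and the sample sentence pair are all hypothetical and are not taken from the corpus itself.

```python
import io
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented miniature TMX document for illustration only; the <prop> names
# and the sentence pair are assumptions, not content from the corpus.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="hr" datatype="plaintext">
    <prop type="docType">legislation</prop>
    <prop type="year">2015</prop>
  </header>
  <body>
    <tu>
      <tuv xml:lang="hr"><seg>Ovaj Zakon stupa na snagu.</seg></tuv>
      <tuv xml:lang="en"><seg>This Act enters into force.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(source):
    """Return (header_props, units) from a TMX file path or binary file object.

    header_props maps each <prop type="..."> to a list of its values
    (a list, because a type such as a descriptor may occur more than once).
    units is a list of dicts mapping a language code to the segment text.
    """
    root = ET.parse(source).getroot()

    props = {}
    header = root.find("header")
    if header is not None:
        for prop in header.findall("prop"):
            props.setdefault(prop.get("type"), []).append((prop.text or "").strip())

    units = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; fall back to the older plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also picks up text nested inside inline markup.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        units.append(segments)
    return props, units


if __name__ == "__main__":
    props, units = read_tmx(io.BytesIO(SAMPLE_TMX.encode("utf-8")))
    print(props)
    for unit in units:
        print(unit.get("hr"), "|", unit.get("en"))
```

For the real files, the actual property names and language codes should be checked in the TMX header before relying on them. A file path can be passed to `read_tmx` in place of the in-memory sample.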