Web-acquired data related to culture (Part I). Multilingual (BG, CS, DA, DE, EL, EN, ET, FI, FR, HR, IS, IT, LT, LV, MK, MT, RU, SK, SV) collection of files in TMX format.

Multilingual (BG, CS, DA, DE, EL, EN, ET, FI, FR, HR, IS, IT, LT, LV, MK, MT, RU, SK, SV) corpus based on the content of culture-related websites. The total number of translation units (TUs) is 186,584, distributed across language pairs as follows:
de-fr 39853
de-it 35028
en-bg 259
en-cs 4709
en-da 5315
en-de 20998
en-el 150
en-et 2686
en-fi 2673
en-fr 5759
en-hr 350
en-is 2778
en-it 3664
en-lt 3513
en-lv 5563
en-mk 2726
en-mt 71
en-ru 1296
en-sk 3488
en-sv 3799
et-ru 1139
fi-sv 9772
fr-it 30995
DSI Relevance: Europeana
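
The per-pair TU counts listed above could be recomputed from the TMX files with a short script. The following is a minimal sketch in Python, assuming each file follows the usual TMX layout (`tu` elements containing two `tuv` elements that carry an `xml:lang` or legacy `lang` attribute). The function name `count_tus_by_pair` and the inline sample are hypothetical and are not part of the resource itself.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample for illustration only; it is not taken from the corpus.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Museum</seg></tuv>
      <tuv xml:lang="de"><seg>Museum</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Library</seg></tuv>
      <tuv xml:lang="de"><seg>Bibliothek</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="fr"><seg>Theatre</seg></tuv>
      <tuv xml:lang="it"><seg>Teatro</seg></tuv>
    </tu>
  </body>
</tmx>"""


def count_tus_by_pair(source):
    """Count translation units per language pair in one TMX file.

    `source` may be a file path or a binary file object. Pairs are keyed
    as "xx-yy" in the order the two languages appear inside each <tu>.
    """
    counts = Counter()
    # iterparse streams the file, so large TMX files need not fit in memory.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        langs = []
        for tuv in elem.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            if lang:
                # Keep only the primary subtag, e.g. "en-GB" becomes "en".
                langs.append(lang.split("-")[0].lower())
        if len(langs) == 2:
            counts["-".join(langs)] += 1
        elem.clear()  # release the processed <tu> subtree
    return counts


if __name__ == "__main__":
    for pair, n in sorted(count_tus_by_pair(io.BytesIO(SAMPLE_TMX)).items()):
        print(pair, n)
```

Summing the counters returned for every file in the collection should reproduce the stated total if the files match the assumed layout; a TU with more or fewer than two language variants is skipped by this sketch.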