SciPar: A collection of parallel corpora from scientific abstracts (v. 2021) in MOSES format. 
A collection of 31 pairs of MOSES-like files. Most are EN-X language pairs, where X is BG, CS, DE, EL, ES, ET, FI, FR, HR, HU, IS, IT, LT, LV, MK, NB, NN, PL, PT, RU, SK, SL, SQ, SV. It also contains small collections for a few more language combinations (DE-ES, DE-FR, DE-RU, ES-FR, ES-RU, FR-RU, MK-SQ). It was generated by processing abstracts of Bachelor's, Master's and PhD theses available at academic repositories and archives. The total number of translation units (TUs) is 9172462, distributed across the language pairs as follows:
de-es 268
de-fr 281
de-ru 198
en-bg 2301
en-cs 1064384
en-de 890184
en-el 742986
en-es 354459
en-et 83478
en-fi 457341
en-fr 1123121
en-hr 806580
en-hu 27421
en-is 110830
en-it 31279
en-lt 177436
en-lv 347472
en-mk 4940
en-nb 56055
en-nn 2380
en-pl 862075
en-pt 974167
en-ru 3063
en-sk 60467
en-sl 300016
en-sq 7779
en-sv 670815
es-fr 4915
es-ru 728
fr-ru 1333
mk-sq 3710
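
The description does not spell out the file layout, so the following is only a minimal sketch of how one language pair might be read. It assumes the usual Moses convention of two plain-text files in which line i of the source file is aligned with line i of the target file, and it assumes UTF-8 encoding. The function name `read_moses_pair` and any file names passed to it are hypothetical and are not part of the SciPar distribution.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_moses_pair(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segments from two line-aligned text files.

    Assumption: Moses-style layout, one segment per line, UTF-8 encoded.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which lets us detect
        # a length mismatch instead of silently truncating the longer file.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files are not line-aligned: one ends before line {line_no}"
                )
            yield s.rstrip("\n"), t.rstrip("\n")


def count_units(src_path: str, tgt_path: str) -> int:
    """Count the aligned segments (translation units) in one file pair."""
    return sum(1 for _ in read_moses_pair(src_path, tgt_path))
```

Under these assumptions, calling `count_units` on the two files of a given pair would give a figure that can be compared with the per-pair count listed above. Raising an error on a length mismatch is a deliberate choice here, since misaligned files would otherwise produce wrong translation pairs without any warning.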