SciPar: A collection of parallel corpora from scientific abstracts (v. 2021) in MOSES format.
A collection of 31 pairs of MOSES-like files, mainly for EN-X language pairs, where X is BG, CS, DE, EL, ES, ET, FI, FR, HR, HU, IS, IT, LT, LV, MK, NB, NN, PL, PT, RU, SK, SL, SQ, or SV. It also contains small collections for a few other language combinations (DE-ES, DE-FR, DE-RU, ES-FR, ES-RU, FR-RU, MK-SQ). It was generated by processing abstracts of Bachelor's, Master's, and PhD theses available in academic repositories and archives. The total number of translation units (TUs) is 9172462; the number of TUs per language pair is as follows:
de-es 268
de-fr 281
de-ru 198
en-bg 2301
en-cs 1064384
en-de 890184
en-el 742986
en-es 354459
en-et 83478
en-fi 457341
en-fr 1123121
en-hr 806580
en-hu 27421
en-is 110830
en-it 31279
en-lt 177436
en-lv 347472
en-mk 4940
en-nb 56055
en-nn 2380
en-pl 862075
en-pt 974167
en-ru 3063
en-sk 60467
en-sl 300016
en-sq 7779
en-sv 670815
es-fr 4915
es-ru 728
fr-ru 1333
mk-sq 3710
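Since the corpora are distributed as pairs of MOSES-like files, a TU can be recovered by reading the two files of a pair in parallel, line by line. The following is a minimal sketch, assuming each pair consists of two UTF-8 plain-text files with one segment per line, where line n of one file is aligned with line n of the other. The function names and the file names in the usage note are hypothetical and not taken from the resource itself.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_moses_pair(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segments from two line-aligned text files.

    Assumption: one segment per line, UTF-8, same number of lines in both files.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")


def count_tus(src_path: str, tgt_path: str) -> int:
    """Count the aligned segment pairs (TUs) in one pair of files."""
    return sum(1 for _ in read_moses_pair(src_path, tgt_path))
```

For example, with hypothetical file names, `count_tus("en-de.en", "en-de.de")` would be expected to match the en-de figure listed above if the files follow the assumed layout.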