SciPar: A collection of parallel corpora from scientific abstracts (v. 2021) in MOSES format.
Collection of 31 pairs of MOSES-like files, mainly for EN-X language pairs, where X is BG, CS, DE, EL, ES, ET, FI, FR, HR, HU, IS, IT, LT, LV, MK, NB, NN, PL, PT, RU, SK, SL, SQ, SV. It also contains small collections for a few more language combinations. It was generated by processing abstracts of Bachelor's, Master's and PhD theses available at academic repositories and archives. The total number of translation units (TUs) is 9172462.
Number of TUs per language pair:
de-es 268
de-fr 281
de-ru 198
en-bg 2301
en-cs 1064384
en-de 890184
en-el 742986
en-es 354459
en-et 83478
en-fi 457341
en-fr 1123121
en-hr 806580
en-hu 27421
en-is 110830
en-it 31279
en-lt 177436
en-lv 347472
en-mk 4940
en-nb 56055
en-nn 2380
en-pl 862075
en-pt 974167
en-ru 3063
en-sk 60467
en-sl 300016
en-sq 7779
en-sv 670815
es-fr 4915
es-ru 728
fr-ru 1333
mk-sq 3710
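For readers unfamiliar with the MOSES format, a language pair is typically stored as two plain-text files with one segment per line, where line N of one file is the translation of line N of the other. The following is a minimal sketch of how such a pair could be read and its TUs counted. The UTF-8 encoding and the file names in the usage note are assumptions for illustration, not taken from the resource itself.

```python
from itertools import zip_longest
from typing import Iterator, Tuple

# Sentinel used to detect that one file ended before the other.
_MISSING = object()


def read_moses_pair(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segments from two line-aligned text files.

    Assumes UTF-8 encoding (an assumption, not stated by the resource).
    Raises ValueError if the two files do not have the same number of lines.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        pairs = zip_longest(src, tgt, fillvalue=_MISSING)
        for line_no, (s, t) in enumerate(pairs, start=1):
            if s is _MISSING or t is _MISSING:
                raise ValueError(f"files differ in length at line {line_no}")
            # Strip only the trailing newline; keep other whitespace intact.
            yield s.rstrip("\n"), t.rstrip("\n")


def count_tus(src_path: str, tgt_path: str) -> int:
    """Count translation units (aligned line pairs) in a file pair."""
    return sum(1 for _ in read_moses_pair(src_path, tgt_path))
```

With hypothetical file names such as corpus.en and corpus.cs, count_tus("corpus.en", "corpus.cs") would be expected to return the en-cs figure listed above, provided the files follow the line-aligned layout assumed here.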
Related Resource:
Abstracts of scientific works (PARTII). Multilingual (BG, CS, DE, EL, EN, ES, ET, FI, FR, HR, HU, IS, IT, LT, LV, MK, NB, NN, PL, PT, RU, SK, SL, SQ, SV) collection of Moses like format files
Relation Type: Has Version
Related Resource:
Abstracts of scientific works. Multilingual (CS, DE, EL, EN, ES, ET, FI, FR, HU, IS, IT, LV, NB, PL, PT, RU, SL, SV) collection of files in Moses-like format.