Bilinguis Free Books v.1.04. Multilingual (CS, DE, EN, ES, FI, FR, IT, NL, PL, PT) corpus from the http://bilinguis.com/ website.
Multilingual (CS, DE, EN, ES, FI, FR, IT, NL, PL, PT) dataset based on the content of the http://bilinguis.com/ website. It contains 151927 translation units (TUs) in total, spread over 41 language pairs. It was generated by harvesting the website in October 2021, identifying parallel sentence pairs and filtering the results. The number of TUs per language pair is:
cs-de 1322
cs-es 1247
cs-fr 1207
cs-it 1225
cs-nl 939
cs-pl 975
cs-pt 1386
de-es 11912
de-fi 3290
de-fr 12159
de-it 5088
de-nl 955
de-pl 947
de-pt 1884
en-cs 1377
en-de 12378
en-es 12748
en-fi 3228
en-fr 16261
en-it 5167
en-nl 4575
en-pl 947
en-pt 2185
es-fi 3197
es-fr 12455
es-it 5210
es-nl 901
es-pl 1127
es-pt 1956
fi-fr 2776
fi-it 3169
fr-it 4691
fr-nl 3826
fr-pl 1031
fr-pt 1807
it-nl 879
it-pl 958
it-pt 1866
nl-pl 692
nl-pt 993
pl-pt 991
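The per-pair listing above can be read programmatically. The following is a minimal sketch, assuming the listing is available as plain text with one language pair and one count per line; the function name `parse_tu_counts` and the variable names are hypothetical and not part of the dataset or its tooling.

```python
def parse_tu_counts(listing: str) -> dict[str, int]:
    """Parse lines of the form '<pair> <count>' into a dict (hypothetical helper)."""
    counts: dict[str, int] = {}
    for line in listing.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not exactly 'pair count'.
        if len(parts) != 2:
            continue
        pair, count = parts
        counts[pair] = int(count)
    return counts


# Three lines copied from the listing above, used only as a small sample.
sample = """cs-de 1322
en-fr 16261
nl-pl 692"""

counts = parse_tu_counts(sample)
total = sum(counts.values())           # total TUs in the sample
largest = max(counts, key=counts.get)  # pair with the most TUs in the sample
print(total, largest)
```

Applied to the full listing, the sum of the counts should equal the stated total of 151927 TUs.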
DSI Relevance: Europeana