11 Language Resources
COVID-19 Government of Canada dataset v2. Multilingual (EN, FR, DE, ES, EL, IT, PL, PT, RO, KO, RU, ZH, UK, VI, TA, TL)
17
37
- Chinese
- English
- French
- German
- Italian
- Korean
- Modern Greek (1453-)
- Philippine languages
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Spanish; Castilian
- Tamil
- Ukrainian
- Vietnamese
- CC-BY-NC-4.0
COVID-19 - HEALTH Wikipedia dataset. Multilingual (52 EN-X language pairs)
108
205
- Afrikaans
- Albanian
- Arabic
- Azerbaijani
- Basque
- Belarusian
- Bengali
- Bosnian
- Bulgarian
- Catalan; Valencian
- Chinese
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Esperanto
- Estonian
- Finnish
- French
- Galician
- German
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Korean
- Latvian
- Lithuanian
- Macedonian
- Malay (macrolanguage)
- Malayalam
- Modern Greek (1453-)
- Norwegian
- Persian
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Serbian
- Slovak
- Slovenian
- Spanish; Castilian
- Swahili (macrolanguage)
- Swedish
- Tagalog
- Tamil
- Telugu
- Thai
- Turkish
- Ukrainian
- Vietnamese
- CC-BY-SA-3.0
COVID-19 Parallel Global Voices dataset. Multilingual (EN, ES, FR, IT, EL, RU, AR, MG, NL, SR, BN, PT, PL, DE, RO, CS)
23
103
- Arabic
- Bengali
- Czech
- Dutch; Flemish
- English
- French
- German
- Italian
- Malagasy
- Modern Greek (1453-)
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Serbian
- Spanish; Castilian
- CC-BY-3.0
COVID-19 Voltaire dataset v1. Multilingual (EN, AR, CS, DE, EL, ES, FA, FR, IT, NB, NL, NN, PL, PT, RO, RU, TR)
2
16
- Arabic
- Czech
- Dutch; Flemish
- English
- French
- German
- Italian
- Modern Greek (1453-)
- Norwegian Bokmål
- Norwegian Nynorsk
- Persian
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Spanish; Castilian
- Turkish
- CC-BY-NC-ND-4.0
COVID-19 Voltaire dataset v2. Multilingual (EN, AR, CS, DE, EL, ES, FA, FR, IT, NB, NL, NN, PL, PT, RO, RU, TR)
12
35
- Arabic
- Czech
- Dutch; Flemish
- English
- French
- German
- Italian
- Modern Greek (1453-)
- Norwegian Bokmål
- Norwegian Nynorsk
- Persian
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Spanish; Castilian
- Turkish
- CC-BY-NC-ND-4.0
HRW dataset v1. Multilingual (EN, AR, BG, BN, CS, DA, DE, EL, ES, FA, FI, FR, HR, HU, IN, IT, KO, LV, NB, NL, PL, PT, RU, SK, SQ, SV, TH, TL, TR, UK, UR, VI, ZH)
21
59
- Albanian
- Bengali
- Bulgarian
- Chinese
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Filipino; Pilipino
- Finnish
- French
- German
- Hungarian
- Indonesian
- Italian
- Korean
- Latvian
- Modern Greek (1453-)
- Norwegian Bokmål
- Persian
- Polish
- Portuguese
- Russian
- Slovak
- Spanish; Castilian
- Swedish
- Tai languages
- Turkish
- Ukrainian
- Urdu
- Vietnamese
- CC-BY-NC-ND-3.0
OpenEdition culture-related publications. Multilingual (AR, DE, EL, EN, ES, FR, HR, IT, NL, PL, PT, RO, RU, SL, SV) collection of TMX files.
7
44
- Arabic
- Croatian
- Dutch; Flemish
- English
- French
- German
- Italian
- Modern Greek (1453-)
- Polish
- Portuguese
- Romanian; Moldavian; Moldovan
- Russian
- Slovenian
- Spanish; Castilian
- Swedish
- CC-BY-NC-ND-4.0
SciPar: A collection of parallel corpora from scientific abstracts (v. 2021) in MOSES format.
5
25
- Albanian
- Bulgarian
- Croatian
- Czech
- English
- Estonian
- Finnish
- French
- German
- Hungarian
- Icelandic
- Italian
- Latvian
- Lithuanian
- Macedonian
- Modern Greek (1453-)
- Norwegian Bokmål
- Norwegian Nynorsk
- Polish
- Portuguese
- Russian
- Slovak
- Slovenian
- Spanish; Castilian
- Swedish
- CC-BY-NC-SA-4.0
SciPar: A collection of parallel corpora from scientific abstracts (v. 2021) in TMX format.
44
111
- Albanian
- Bulgarian
- Croatian
- Czech
- English
- Estonian
- Finnish
- French
- German
- Hungarian
- Icelandic
- Italian
- Latvian
- Lithuanian
- Macedonian
- Modern Greek (1453-)
- Norwegian Bokmål
- Norwegian Nynorsk
- Polish
- Portuguese
- Russian
- Slovak
- Slovenian
- Spanish; Castilian
- Swedish
- CC-BY-NC-SA-4.0
Web-acquired data related to Scientific research (Part I). Multilingual (BG, CS, DA, DE, EN, ES, ET, FR, GA, HR, IT, LT, LV, NB, NL, PL, PT, RU, SK, SV, UK) collection of files in Moses format.
8
21
- Bulgarian
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Estonian
- French
- German
- Irish
- Italian
- Latvian
- Lithuanian
- Norwegian Bokmål
- Polish
- Portuguese
- Russian
- Slovak
- Spanish; Castilian
- Swedish
- Ukrainian
- Open Under-PSI
Web-acquired data related to Scientific research (Part I). Multilingual (BG, CS, DA, DE, EN, ES, ET, FR, GA, HR, IT, LT, LV, NB, NL, PL, PT, RU, SK, SV, UK) collection of files in TMX format.
14
51
- Bulgarian
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Estonian
- French
- German
- Irish
- Italian
- Latvian
- Lithuanian
- Norwegian Bokmål
- Polish
- Portuguese
- Russian
- Slovak
- Spanish; Castilian
- Swedish
- Ukrainian
- Open Under-PSI