COVID-19 Parallel Global Voices dataset. Multilingual (EN, ES, FR, IT, EL, RU, AR, MG, NL, SR, BN, PT, PL, DE, RO, CS)

122 Last view: 2025-09-21

1 Last update: 2020-06-02

25 Last download: 2024-12-19

Attribution details: "Covid Parallel Global Voices" dataset was created for the European Language Resources Coordination Action (ELRC) (http://lr-coordination.eu/) by researchers at the NLP group of the Institute for Language and Speech Processing (http://www.ilsp.gr/) with primary data copyrighted by Global Voices (https://globalvoices.org/) and is licensed under "CC-BY 3.0" (https://creativecommons.org/licenses/by/3.0/).

Multilingual (EN, ES, FR, IT, EL, RU, AR, MG, NL, SR, BN, PT, PL, DE, RO, CS) COVID-19-related parallel corpus acquired from the Global Voices website (https://globalvoices.org/) on 28 April 2020. It contains 25,755 translation units (TUs) in total, distributed across the following language pairs:
5459 EN-ES
4840 EN-FR
4056 EN-IT
3204 EN-EL
3127 EN-RU
1779 EN-AR
1045 EN-MG
675 EN-NL
434 EN-SR
384 EN-BN
276 EN-PT
193 EN-PL
178 EN-DE
66 EN-RO
39 EN-CS
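
As a quick consistency check, the per-pair counts listed above can be summed and compared with the stated total. The snippet below is only a sketch; the dictionary name pair_counts is chosen for illustration and is not part of the dataset.

```python
# Per-pair translation-unit counts, copied from the list above.
# The variable name `pair_counts` is illustrative, not part of the dataset.
pair_counts = {
    "EN-ES": 5459, "EN-FR": 4840, "EN-IT": 4056, "EN-EL": 3204,
    "EN-RU": 3127, "EN-AR": 1779, "EN-MG": 1045, "EN-NL": 675,
    "EN-SR": 434, "EN-BN": 384, "EN-PT": 276, "EN-PL": 193,
    "EN-DE": 178, "EN-RO": 66, "EN-CS": 39,
}

total = sum(pair_counts.values())
print(total)  # 25755, matching the stated total

# Share of each pair in the whole corpus, largest first.
for pair, count in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    print(f"{pair}: {count / total:.1%}")
```

The counts are heavily skewed: the five largest pairs (EN-ES, EN-FR, EN-IT, EN-EL, EN-RU) account for most of the translation units.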

DSI Relevance: eHealth

Distribution

Availability: Available

Licences

CC-BY-3.0

Conditions: Attribution

IPR Holders

Global Voices

Contact Person

Prokopis Prokopidis

text

Multilingual text corpus

Languages

English (en)

Russian (ru)

Modern Greek (1453-) (el)

Dutch; Flemish (nl)

Arabic (ar)

Spanish; Castilian (es)

Italian (it)

French (fr)

Serbian (sr)

Malagasy (mg)

Portuguese (pt)

Bengali (bn)

German (de)

Polish (pl)

Czech (cs)

Romanian; Moldavian; Moldovan (ro)

Linguality

Linguality type: Multilingual

Multi-linguality type: Parallel

Text Format

TMX

Size

25,755 Translation Units

Character encoding

UTF-8
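
Since the corpus is distributed as UTF-8 TMX, each translation unit is a tu element holding one tuv per language, with the text in a seg child. The following is a minimal sketch of reading such a file with Python's standard library. The function name read_tmx and the in-memory sample are assumptions made for illustration; the sample sentence pair is invented and is not taken from the dataset, and the actual file names in the distribution are not stated in this record.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source):
    """Yield one dict per <tu>, mapping language code to segment text.

    `source` may be a file path or a binary file object.
    (Hypothetical helper, not part of any ELRC tooling.)
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit
        elem.clear()  # release memory for units already processed


# Invented sample for demonstration only; not content from the corpus.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<tmx version="1.4"><header/><body>'
    '<tu><tuv xml:lang="en"><seg>Stay home.</seg></tuv>'
    '<tuv xml:lang="es"><seg>Quédate en casa.</seg></tuv></tu>'
    '</body></tmx>'
).encode("utf-8")

units = list(read_tmx(io.BytesIO(sample)))
print(units)
```

Streaming with iterparse avoids loading the whole file at once, which is a reasonable choice for a corpus of this size but is not required.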

Domains

SOCIAL QUESTIONS Health (Eurovoc 2841)

Resource Creation

Created using ELRC Services

Funding Project

COVID-19 Initiative (COVID-19)

Funding Type: Other

Funding Country: European Union (EU)

European Language Resource Coordination 3.0 (ELRC3.0 - SMART 2019/1083 LC-01325001)

URL: http://www.lr-coordi...

Funding Type: EU Funds

Funder: European Commission

Funding Country: European Union (EU)

Metadata

Created: 06/11/2019

Last Updated: 28/04/2020

Metadata Language: English (en)

Metadata Creator

Prokopis Prokopidis

Version

Version: 1.0

Last Updated: 28/04/2020

Relations

Relation Type: Has Part
