COVID-19-related multilingual corpus from EU press Corner 2020 v.0.9 in TMX format
Multilingual dataset (CEF languages) based on the press releases published on the ec.europa.eu portal during 2020. For example, https://ec.europa.eu/commission/presscorner/detail/en/ip_20_1680 and https://ec.europa.eu/commission/presscorner/detail/el/ip_20_1680 are the EN and EL versions of one press release. The corpus contains 276 TMX files with 2514613 translation units in total.
DSI Relevance: eHealth
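Since the corpus is distributed as TMX files, a minimal Python sketch of reading aligned segments may help. It assumes the files follow the usual TMX layout (`tu` elements holding one `tuv` per language, each with an `xml:lang` attribute and a `seg` child); the function name `iter_translation_units` and the sample sentences are invented for illustration and are not taken from the corpus.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_translation_units(source):
    """Yield one {language: text} dict per <tu> element.

    `source` is a file path or a binary file object. Parsing is
    incremental, so a large TMX file is not loaded in full.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.iter("tuv"):
            # Older TMX versions used a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested in inline markup.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit
        elem.clear()  # release the processed subtree


# Invented sample, only to show the assumed structure.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example"
          creationtoolversion="0" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Example sentence.</seg></tuv>
      <tuv xml:lang="el"><seg>Παράδειγμα πρότασης.</seg></tuv>
    </tu>
  </body>
</tmx>""".encode("utf-8")

units = list(iter_translation_units(io.BytesIO(SAMPLE_TMX)))
for unit in units:
    print(unit)
```

To read a file from the corpus, pass its path instead of the in-memory sample. The actual file names inside the download are not stated here, so check the archive contents first.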
People who looked at this resource also viewed the following:
- COVID-19 OSHA-EUROPA dataset v1. Multilingual (CEF languages plus IS and NB but not Irish)
- COVID-19-related multilingual corpus from EU press Corner 2020 v.0.9 in Moses-like format
- COVID-19 Line 1177 of Sweden dataset v1. Multilingual (EN, BG, DE, ES, FI, FR, PL, RO, RU, SV, TR)
- COVID-19-related multilingual corpus from EU press Corner 2020 v.0.9 in TSV format
People who downloaded this resource also downloaded the following:
- COVID-19 OSHA-EUROPA dataset v1. Multilingual (CEF languages plus IS and NB but not Irish)
- COVID-19 - HEALTH Wikipedia dataset. Multilingual (52 EN-X language pairs)
- ELRC3.0 Multilingual corpus made out of PDF documents from the European Medicines Agency (EMEA), https://www.ema.europa.eu, (February 2020).
- Multilingual corpus from the European Vaccination Information Portal