Multilingual corpus in HEALTH (COVID-19) domain part_1a (v.1.05) in TMX format. 
This dataset was generated from public content available on several websites of national agencies (https://www.ecdc.europa.eu/en/COVID-19/national-sources) and on selected broadcast websites (Global Voices, Voxeurop, voltairenet, etc.).
The dataset contains 327 X-Y TMX files, where X and Y belong to the set {CEF languages plus IS and NO} (3044961 TUs in total). Data acquisition (from multilingual/bilingual websites), normalization, cleaning, deduplication, and identification of parallel documents were performed with the ILSP-FC tool. Multilingual embeddings (LASER) were used to align segments. Merging and filtering of segment pairs were also applied.
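As a rough illustration of how one of the X-Y TMX files might be consumed, the sketch below streams translation units (TUs) from a TMX document using only the Python standard library. It assumes the files follow the usual TMX layout (`tu` elements containing `tuv` elements with an `xml:lang` attribute and a `seg` child), which this description does not confirm in detail. The function name `iter_translation_units` and the sample content are hypothetical and are not taken from the dataset.

```python
import io
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_translation_units(source):
    """Yield one dict per <tu>, mapping language code -> segment text.

    `source` is a file path or a binary file-like object holding TMX data.
    (Hypothetical helper; assumes the standard tu/tuv/seg layout.)
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; fall back to a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                # itertext() also collects text nested inside inline markup.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit
        elem.clear()  # release the processed unit's children to limit memory use


# Invented sample standing in for a real file from the dataset.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands regularly.</seg></tuv>
      <tuv xml:lang="de"><seg>Waschen Sie sich regelmäßig die Hände.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

if __name__ == "__main__":
    stream = io.BytesIO(SAMPLE_TMX.encode("utf-8"))
    for unit in iter_translation_units(stream):
        print(unit)
```

Streaming with `iterparse` rather than loading the whole tree is a design choice for large files; for a real file, a path can be passed in place of the in-memory stream.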
DSI Relevance: eHealth
People who looked at this resource also viewed the following:
- Web-acquired data related to health/covid-19 (Part I). Multilingual (BG, CS, DA, DE, EL, EN, ET, ES, FI, FR, GA, HR, HU, IS, IT, LT, LV, MK, MT, NL, NB, NN, NO, PL, PT, RO, SK, SL, SQ, SV) collection of files in TMX format.
- ELRC3.0 Multilingual corpus made out of PDF documents from the European Medicines Agency (EMEA), https://www.ema.europa.eu, (February 2020).
- COVID-19-related multilingual corpus from EU press Corner 2020 v.0.9 in TMX format
- Multilingual corpus in HEALTH (COVID-19) domain part_1a (v.1.0) in TMX format.
People who downloaded this resource also downloaded the following:
- COVID-19 Federal Ministry of Social Affairs, Health, Care and Consumer Protection of Republic of Austria dataset v2. Multilingual (EN, DE, RO, HR, CS, TR, ES)
- COVID-19 Voltaire dataset v2. Multilingual (EN, AR, CS, DE, EL, ES, FA, FR, IT, NB, NL, NN, PL, PT, RO, RU, TR)
- COVID-19 Line 1177 of Sweden dataset v1. Multilingual (EN, BG, DE, ES, FI, FR, PL, RO, RU, SV, TR)
- Multilingual corpus in HEALTH (COVID-19) domain part_1a (v.1.0) in TMX format.