Multilingual corpus in HEALTH (COVID-19) domain part_1a (v.1.0) in TMX format. 
This dataset was generated from public content available through the websites of several national agencies (https://www.ecdc.europa.eu/en/COVID-19/national-sources) and selected broadcast websites such as Global Voices, Voxeurop and voltairenet.
The dataset contains 327 X-Y TMX files, where X and Y belong to the set {CEF languages plus IS and NO} (3905604 translation units (TUs) in total). Data acquisition (from multilingual and bilingual websites), normalization, cleaning, deduplication and identification of parallel documents were performed with the ILSP-FC tool. Multilingual embeddings (LASER) were used to align segments. Segment pairs were then merged and filtered.
DSI Relevance: eHealth
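Since the corpus is distributed as bilingual TMX files, a reader may want to iterate over the translation units of one file. The following is a minimal sketch, assuming the files follow the usual TMX layout (tu elements containing tuv elements with an xml:lang attribute and a seg child). The helper name iter_translation_units and the inline sample are hypothetical and invented for illustration; they are not taken from the dataset or from the ILSP-FC tool.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_translation_units(source):
    """Yield one {language: segment text} dict per <tu> element.

    `source` is a file path or a binary file-like object holding TMX data.
    (Hypothetical helper, not part of any tool named in this description.)
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested inside inline markup.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit
        elem.clear()  # drop the processed <tu> so large files stay memory-friendly


# Tiny inline sample, invented for illustration (not corpus content).
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Please keep your distance.</seg></tuv>
      <tuv xml:lang="de"><seg>Bitte Abstand halten.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

units = list(iter_translation_units(io.BytesIO(SAMPLE_TMX)))
print(units)
```

Passing a file path instead of the in-memory sample should work the same way. Summing the number of yielded units over all downloaded files would give a figure to compare against the TU total stated above.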