Multilingual corpus in HEALTH (COVID-19) domain part_1a (v.1.05) in TMX format. 
This dataset has been generated from public content available through several websites of national agencies (https://www.ecdc.europa.eu/en/COVID-19/national-sources) and selected broadcast websites such as Global Voices, Voxeurop and voltairenet.
The dataset contains 327 X-Y TMX files, where X and Y belong to the set {CEF languages plus IS and NO} (3044961 TUs in total). Acquisition of data (from multilingual and bilingual websites), normalization, cleaning, deduplication and identification of parallel documents were performed with the ILSP-FC tool. Multilingual embeddings (LASER) were used to align segments. Merging and filtering of segment pairs were also applied.
DSI Relevance: eHealth
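As an illustration of how the X-Y TMX files described above might be consumed, the following is a minimal sketch that streams translation units (TUs) out of a TMX document using only the Python standard library. It assumes the common TMX layout (`<tu>` elements containing `<tuv xml:lang="...">` children, each with a `<seg>`); the function name `iter_translation_units` and the embedded sample are hypothetical and are not taken from the dataset itself.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up two-unit sample for illustration only; not content from the corpus.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="0"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands often.</seg></tuv>
      <tuv xml:lang="fr"><seg>Lavez-vous souvent les mains.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Stay at home.</seg></tuv>
      <tuv xml:lang="fr"><seg>Restez chez vous.</seg></tuv>
    </tu>
  </body>
</tmx>
""".encode("utf-8")


def iter_translation_units(source):
    """Yield one dict per <tu>, mapping lowercased language code to segment text.

    `source` may be a file path or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested inside inline markup.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit
        # Drop the processed <tu> children so large files do not fill memory.
        elem.clear()


if __name__ == "__main__":
    for tu in iter_translation_units(io.BytesIO(SAMPLE_TMX)):
        print(tu)
```

Streaming with `iterparse` rather than loading the whole tree is a design choice that suits files with many TUs; for a real file, a path can be passed in place of the `io.BytesIO` wrapper.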