Multilingual corpus in the HEALTH (COVID-19) domain, part_1a (v.1.0), in TSV/MOSES-like format.
This dataset was generated from public content available on the websites of several national agencies (https://www.ecdc.europa.eu/en/COVID-19/national-sources) and on selected broadcast websites such as Global Voices, Voxeurop, voltairenet, etc.
The dataset contains 327 X-Y TSV/MOSES-like (pairs of) files, where X and Y belong to the set {CEF languages plus IS and NO}, for a total of 3905604 translation units (TUs). Data acquisition (from multilingual and bilingual websites), normalization, cleaning, deduplication and identification of parallel documents were performed with the ILSP-FC tool. Multilingual embeddings (LASER) were used to align segments. Segment pairs were then merged and filtered.
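As an illustration of how one X-Y file in the TSV variant might be consumed, here is a minimal Python sketch. It assumes each line holds a source segment and a target segment separated by a tab, encoded in UTF-8 and without a header row; the actual column layout of the distributed files is not stated here and should be checked before use. The function name `read_translation_units` and the sample sentences are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_translation_units(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from a tab-separated text stream.

    Assumption: the first two tab-separated fields are the source and
    target segments. Lines with fewer than two fields, or with an empty
    segment, are skipped.
    """
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        source, target = fields[0].strip(), fields[1].strip()
        if source and target:
            yield source, target


# Hypothetical in-memory sample standing in for one EN-FR file.
sample = io.StringIO(
    "Wash your hands.\tLavez-vous les mains.\n"
    "\n"
    "line without a tab\n"
)
pairs = list(read_translation_units(sample))
```

For a real file, the same function can be given a handle opened with `open(path, encoding="utf-8")`. Any extra columns after the first two are ignored by this sketch.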
DSI Relevance: eHealth