Multilingual corpus from the Publications Office of the EU on the medical domain v.2
This dataset was generated from public content available through the Publications Office of the European Union (OP Portal), https://op.europa.eu/en/home, in April 2020.
277780 sentence pairs in total, across 23 EN-X language pairs, extracted from medical-domain content of the Publications Office of the EU. The source documents include laws, studies and EC announcements, labelled with concepts such as epidemiology, epidemic, disease surveillance, health control, public hygiene, freedom of movement and distance learning. Sentence pairs per language pair:
13149 en-bg
13160 en-cs
13242 en-da
13291 en-de
13091 en-el
13195 en-es
13016 en-et
12942 en-fi
13149 en-fr
412 en-ga
12836 en-hr
13025 en-hu
13059 en-it
12580 en-lt
13044 en-lv
3093 en-mt
13191 en-nl
12761 en-pl
13148 en-pt
13163 en-ro
12926 en-sk
13208 en-sl
13099 en-sv
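As a quick consistency check, the per-pair counts listed above can be summed and compared against the stated total of 277780 sentence pairs. The following is a minimal sketch; the variable names are illustrative and are not part of the dataset distribution.

```python
# Sentence-pair counts per language pair, copied from the list above.
counts = {
    "en-bg": 13149, "en-cs": 13160, "en-da": 13242, "en-de": 13291,
    "en-el": 13091, "en-es": 13195, "en-et": 13016, "en-fi": 12942,
    "en-fr": 13149, "en-ga": 412,   "en-hr": 12836, "en-hu": 13025,
    "en-it": 13059, "en-lt": 12580, "en-lv": 13044, "en-mt": 3093,
    "en-nl": 13191, "en-pl": 12761, "en-pt": 13148, "en-ro": 13163,
    "en-sk": 12926, "en-sl": 13208, "en-sv": 13099,
}

total = sum(counts.values())             # overall number of sentence pairs
num_pairs = len(counts)                  # number of EN-X language pairs
smallest = min(counts, key=counts.get)   # language pair with the fewest sentences

print(f"{num_pairs} language pairs, {total} sentence pairs in total")
print(f"smallest pair: {smallest} ({counts[smallest]} sentence pairs)")
```

The counts add up to the stated total. Irish (en-ga) and Maltese (en-mt) are far smaller than the other pairs, which is worth keeping in mind when sampling or balancing training data.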
DSI Relevance: eHealth