ELRC3.0 Multilingual corpus made out of PDF documents from the European Medicines Agency (EMEA), https://www.ema.europa.eu, (February 2020). 
This dataset was generated from public content available through the European Medicines Agency website (https://www.ema.europa.eu/) in February 2020.
The dataset contains 300 X-Y TMX files, where X and Y are CEF languages (180,312,670 translation units, or TUs, in total). New methods for text extraction from PDF, sentence splitting, sentence alignment, and parallel corpus filtering were applied. The number of TUs per language pair is listed below:
bg-…
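In TMX, each translation unit is a <tu> element holding one <tuv> per language, so the TU count for a downloaded X-Y file can be checked with a short streaming parser. The sketch below is a minimal illustration, assuming the files follow the usual TMX 1.4 layout with xml:lang on each <tuv>; the function names count_tus and language_pairs and the two sample segments are hypothetical and not taken from the corpus. It also checks the arithmetic that 300 unordered X-Y pairs correspond to 25 languages, assuming each file covers one distinct pair and every pair is present.

```python
import io
import math
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml:lang attribute to this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_tus(source):
    """Count <tu> elements in a TMX file and collect the <tuv> languages.

    `source` may be a file path or a binary file-like object. iterparse
    streams the document, so a large TMX file need not fit in memory.
    (Hypothetical helper for illustration.)
    """
    count = 0
    langs = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tuv":
            # TMX 1.4 uses xml:lang; older TMX versions used a plain `lang`.
            lang = elem.get(XML_LANG) or elem.get("lang")
            if lang:
                langs.add(lang.lower())
        elif elem.tag == "tu":
            count += 1
            elem.clear()  # drop the finished unit's children to save memory
    return count, langs


def language_pairs(n_languages):
    """Number of unordered X-Y pairs among n languages, i.e. C(n, 2)."""
    return math.comb(n_languages, 2)


# Invented two-unit EN-BG sample in TMX 1.4 layout (not corpus data).
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="demo" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Take one tablet daily.</seg></tuv>
      <tuv xml:lang="bg"><seg>Приемайте по една таблетка дневно.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Store below 25 °C.</seg></tuv>
      <tuv xml:lang="bg"><seg>Да се съхранява под 25 °C.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    n_tus, langs = count_tus(io.BytesIO(SAMPLE_TMX.encode("utf-8")))
    print(n_tus, sorted(langs))  # 2 ['bg', 'en']
    print(language_pairs(25))    # 300
```

To check a real file, pass its path instead of the in-memory buffer, e.g. count_tus("en-bg.tmx") (hypothetical filename).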
DSI Relevance: eHealth
People who looked at this resource also viewed the following:
- Bilingual corpus made out of PDF documents from the European Medicines Agency, (EMEA), https://www.ema.europa.eu, (February 2020) (EN-RO).
- Bilingual corpus made out of PDF documents from the European Medicines Agency, (EMEA), https://www.ema.europa.eu, (February 2020) (EN-DE).
- Multilingual corpus in HEALTH (COVID-19) domain part_1a (v.1.05) in TMX format.
- Bilingual corpus made out of PDF documents from the European Medicines Agency, (EMEA), https://www.ema.europa.eu, (February 2020) (EN-FI).
People who downloaded this resource also downloaded the following: