Multilingual corpus built from PDF documents published by the European Medicines Agency (EMEA), https://www.ema.europa.eu (February 2020), provided in Moses format.
This dataset was generated from public content available through the European Medicines Agency (https://www.ema.europa.eu/) in February 2020.
The dataset contains 24 EN-X Moses file pairs, where X is a CEF language (17617914 translation units (TUs) in total). New methods for text extraction from PDF, sentence splitting, sentence alignment, and parallel corpus filtering have been applied. The following list holds the number of TUs per EN-X language…
DSI Relevance: eHealth
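
For readers unfamiliar with Moses format: each EN-X pair is conventionally two plain-text files with one segment per line, where line i of the English file is aligned with line i of the X file. The sketch below shows one way such a pair could be read in Python. It assumes UTF-8 encoding, and the function name `read_moses_pair` is hypothetical, not part of this dataset's distribution.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_moses_pair(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) translation units from two line-aligned files.

    Assumption: both files are UTF-8 and hold one segment per line,
    as is conventional for Moses-format parallel corpora.
    """
    with Path(src_path).open(encoding="utf-8") as src, \
         Path(tgt_path).open(encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which lets us
        # detect a length mismatch instead of silently truncating.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

With a hypothetical pair named `EMEA.en-de.en` and `EMEA.en-de.de`, `sum(1 for _ in read_moses_pair("EMEA.en-de.en", "EMEA.en-de.de"))` would count the TUs for that language pair. The actual file names in the distribution may differ.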