Manufactured data based on ParaCrawl release 8 Italian-English, it terms
Manufactured data from ParaCrawl 8 it-en
Italian-English manufactured parallel data, with it terms, from release 8 of the ParaCrawl project ("Broader Web-Scale Provision of Parallel Corpora for European Languages"). This data was created by taking existing parallel sentences and replacing aligned words with different words. It is intended to increase vocabulary coverage, at the risk that a substituted sentence may not make much logical sense. The underlying corpus, also available on ELRC-SHARE as ParaCrawl release 8, has been filtered with BiCleaner using a threshold of 0.5. Data was crawled from the web following robots.txt, as is standard practice. The crawl is not targeted at a particular domain, so as to provide broad coverage.
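The substitution step described above can be illustrated with a minimal sketch. It assumes a word alignment given as (source index, target index) pairs and a replacement (Italian, English) word pair chosen by the caller. The function name, the alignment format and the one-to-one restriction are hypothetical choices for illustration, not the actual tooling used to build this resource.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Word alignment as (source index, target index) pairs. This format is an
# assumption for illustration; it is not taken from the ParaCrawl pipeline.
Alignment = Sequence[Tuple[int, int]]


def substitute_aligned_word(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    alignment: Alignment,
    link: Tuple[int, int],
    new_pair: Tuple[str, str],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: replace one aligned word pair with a new (it, en) pair."""
    i, j = link
    if link not in alignment:
        raise ValueError(f"{link} is not an alignment link")
    # Only substitute one-to-one links, so each side changes exactly one word.
    src_counts = Counter(s for s, _ in alignment)
    tgt_counts = Counter(t for _, t in alignment)
    if src_counts[i] != 1 or tgt_counts[j] != 1:
        raise ValueError(f"{link} is not a one-to-one link")
    new_src, new_tgt = list(src_tokens), list(tgt_tokens)
    new_src[i], new_tgt[j] = new_pair
    return new_src, new_tgt


src = "il gatto dorme".split()
tgt = "the cat sleeps".split()
links = [(0, 0), (1, 1), (2, 2)]

print(substitute_aligned_word(src, tgt, links, (1, 1), ("cane", "dog")))
# (['il', 'cane', 'dorme'], ['the', 'dog', 'sleeps'])
```

Nothing in this sketch checks meaning: substituting ("tavolo", "table") at the same link yields "il tavolo dorme" / "the table sleeps", which is exactly the kind of illogical but still parallel sentence the description warns about.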
DSI Relevance: BusinessRegistersInterconnectionSystem, Cybersecurity, ElectronicExchangeOfSocialSecurityInformation, Europeana, OnlineDisputeResolution, OpenDataPortal, eHealth, eJustice, eProcurement, saferInternet
People who looked at this resource also viewed the following:
- English-Italian parallel corpus from CORDIS Project Results in Brief
- Documents from the Ministry of Environment of the Slovak Republic (EN-SK) (Processed)
- Bilingual corpus made out of PDF documents from the European Medicines Agency, (EMEA), https://www.ema.europa.eu, (February 2020) (EN-CS).
- Bilingual corpus from the Publications Office of the EU on the medical domain v.2 (EN-SL)