ParaCrawl release 9 English-Spanish; Castilian - deferred files 
ParaCrawl 9 en-es

This file contains the URLs and hashes of text needed to form a parallel corpus, but not the sentences themselves. You probably want the actual parallel data; see the version without "deferred files" in the title. To reconstruct a parallel corpus, use the deferred crawling tool at https://github.com/bitextor/deferred-crawling, which downloads the pages and produces a corpus. That corpus will probably be smaller than the original because of link rot. This format is intended to support parties whose lawyers believe it is acceptable to scrape websites directly but not to copy the same content from a third party. The data is based on the English-Spanish; Castilian parallel corpus from release 9 of the ParaCrawl project, specifically "Continued Web-Scale Provision of Parallel Corpora for European Languages". This version is filtered with BiCleaner AI. Data was crawled from the web following robots.txt, as is standard practice. The crawl is not targeted at a particular domain and is intended to provide broad coverage.
DSI Relevance: BusinessRegistersInterconnectionSystem, Cybersecurity, ElectronicExchangeOfSocialSecurityInformation, Europeana, OnlineDisputeResolution, OpenDataPortal, eHealth, eJustice, eProcurement, saferInternet
People who looked at this resource also viewed the following:
- Monolingual corpus from Minutes of the Sittings of the Chamber of Deputies of Romania (2016-2018) (Processed)
- 2017 Activity Report Hohe Tauern National Park (Processed)
- ParaCrawl release 9 English-Romanian; Moldavian; Moldovan - deferred files
- Bilingual English-Norwegian (Nynorsk) parallel corpus from the Courts of Norway website