Manufactured data based on ParaCrawl release 7 Bulgarian-English
Manufactured data from ParaCrawl 7 bg-en
This data was created by taking existing parallel sentences and replacing aligned words with different words. The aim is to increase vocabulary coverage, at the risk that a substitution may not make logical sense. The underlying corpus, ParaCrawl release 7 (also available on ELRC-SHARE), was filtered with Bicleaner at a threshold of 0.5. The data was crawled from the web in compliance with robots.txt, as is standard practice. The crawl does not target a particular domain and is intended to provide broad coverage.
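The substitution step described above can be illustrated with a minimal sketch. It is not the procedure actually used to build this resource: the function name `substitute_aligned`, the format of the word alignment (index pairs), and the `replacements` table are all hypothetical and chosen only for illustration.

```python
import random


def substitute_aligned(src_tokens, tgt_tokens, alignment, replacements, rng):
    """Replace one aligned word pair in a sentence pair.

    alignment    -- list of (source_index, target_index) links (assumed format)
    replacements -- hypothetical table mapping an aligned (source_word, target_word)
                    pair to a list of alternative (source_word, target_word) pairs
    rng          -- a random.Random instance, so that runs are reproducible
    """
    # Keep only the links whose word pair has a known alternative.
    candidates = [
        (i, j)
        for i, j in alignment
        if (src_tokens[i], tgt_tokens[j]) in replacements
    ]
    new_src, new_tgt = list(src_tokens), list(tgt_tokens)
    if not candidates:
        return new_src, new_tgt  # nothing to substitute; return unchanged copies

    # Pick one link and one alternative pair, then swap both sides together
    # so the sentence pair stays parallel.
    i, j = rng.choice(candidates)
    new_src[i], new_tgt[j] = rng.choice(replacements[(src_tokens[i], tgt_tokens[j])])
    return new_src, new_tgt


# Illustrative Bulgarian-English pair (invented example, not taken from the corpus).
src = ["Аз", "виждам", "куче"]
tgt = ["I", "see", "a", "dog"]
links = [(0, 0), (1, 1), (2, 3)]
table = {("куче", "dog"): [("котка", "cat")]}

print(substitute_aligned(src, tgt, links, table, random.Random(0)))
```

Because both sides of the aligned pair are replaced together, the result remains a translation pair, but nothing checks that the new word fits its context, which is the risk noted in the description.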
DSI Relevance: BusinessRegistersInterconnectionSystem, Cybersecurity, ElectronicExchangeOfSocialSecurityInformation, Europeana, OnlineDisputeResolution, OpenDataPortal, eHealth, eJustice, eProcurement, saferInternet