GEnCaTA: a parallel Catalan-English corpus
GEnCaTA is a Catalan↔English parallel corpus of 38,595 segments. It was compiled by crawling the gencat.cat domain and its subdomains, which belong to the Catalan Government, and aligning the content published there in both English and Catalan.
The file urls.txt lists the source URL for each aligned sentence.
The file scores.txt lists the alignment score that vecalign assigned to each aligned sentence.
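As a rough illustration of how these files might be used together, the sketch below joins the segments with their URLs and scores. It assumes that all files are line-aligned (line N of each file describes segment N) and that scores.txt holds one numeric value per line; neither point is confirmed by this description. The function names and the segment file paths passed to load_segments are hypothetical, since the names of the Catalan and English text files are not given here.

```python
def read_lines(path):
    """Return the lines of a UTF-8 text file without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_segments(ca_path, en_path, urls_path, scores_path):
    """Combine four files into one list of records.

    Assumption: the files are line-aligned, and each line of the
    scores file parses as a single float.
    """
    columns = [read_lines(p) for p in (ca_path, en_path, urls_path, scores_path)]
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        raise ValueError(f"files are not line-aligned: {sorted(lengths)}")
    return [
        {"ca": ca, "en": en, "url": url, "score": float(score)}
        for ca, en, url, score in zip(*columns)
    ]
```

If the line-alignment assumption holds, the returned list should contain 38,595 records. Before filtering on the score field, check the vecalign documentation for how its scores are oriented.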