PRINCIPLE Foras na Gaeilge parallel translation memory dataset (evaluated)

Aligned parallel corpus based on translation memory data from Foras na Gaeilge. The data was supplied in an aligned format and has since been normalized and cleaned. The cleaned content was then checked automatically for obvious errors and spot-checked manually for quality. The dataset has been used to develop an MT system for the EN-GA language pair and is therefore considered to be of high quality.
Languages: English-Irish
Domain: mixed (general-purpose with some eProcurement)
Size: 54141 translation units