PRINCIPLE English-Irish eJustice Corpus (Evaluated)

- Houses of the Oireachtas for original material
- Dublin City University/ADAPT Centre for alignment and cleaning

Corpus of aligned parallel eJustice data translated by Rannóg an Aistriúcháin. The content consists of Acts of the Oireachtas, Statutory Instruments, order papers and related material, and annual reports. The data originally came in various formats and was mostly unaligned. The following processing was performed: automatic text extraction from the raw documents, normalization, translation unit (TU) alignment, cleaning, automated error detection, and manual spot-checks for quality. This dataset has been used in the development of an MT system in the eJustice domain for the EN-GA language pair, and so is considered to be of high quality.
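As a rough illustration of the normalization and cleaning steps listed above, the following Python sketch filters a list of already aligned English-Irish segment pairs. It is not the pipeline used to build this corpus: the function names `normalize` and `clean_pairs`, the length-ratio heuristic, and the `max_ratio` threshold of 3.0 are all assumptions chosen for the example.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize the text and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_pairs(pairs, max_ratio=3.0):
    """Keep unique, non-empty pairs whose character-length ratio is plausible.

    `max_ratio` is a hypothetical threshold, not a value from the corpus
    documentation.
    """
    seen = set()
    kept = []
    for en, ga in pairs:
        en, ga = normalize(en), normalize(ga)
        if not en or not ga:
            continue  # drop pairs with an empty side
        ratio = max(len(en), len(ga)) / min(len(en), len(ga))
        if ratio > max_ratio:
            continue  # likely misaligned: one side is far longer
        if (en, ga) in seen:
            continue  # drop exact duplicates after normalization
        seen.add((en, ga))
        kept.append((en, ga))
    return kept
```

Unicode normalization matters for Irish in particular, since vowels with a síneadh fada (á, é, í, ó, ú) can be encoded either as a single code point or as a base letter plus a combining accent; normalizing to NFC makes duplicate detection treat both encodings as the same string.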

DSI Relevance: eJustice