Romanian - English news corpus (Processed)

The Romanian – English news corpus was created for the European Language Resources Coordination Action (ELRC) by Tufis Dan, Institutul de Cercetari pentru Inteligenta Artificiala ”Mihai Draganescu”, Academia Romana, with primary data copyrighted by SouthEast European Times, and is licensed under "CC-BY 4.0".

Bilingual Romanian – English news corpus built from SouthEast European Times (2008 dump). The texts are positionally aligned, i.e. the sentence on line i in the English text is aligned with the sentence on line i in the Romanian text. The alignment was manually validated.
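Because the alignment is positional, sentence pairs can be recovered by reading the two texts line by line in parallel. The following is a minimal sketch, assuming the corpus is available as two UTF-8 plain-text files with one sentence per line; the function name `read_aligned_pairs` and the file names in the usage note are hypothetical and not part of the distribution.

```python
from itertools import zip_longest


def read_aligned_pairs(en_path, ro_path, encoding="utf-8"):
    """Return (english, romanian) sentence pairs from two line-aligned files.

    Assumes one sentence per line in each file (hypothetical layout).
    Raises ValueError if the files do not have the same number of lines,
    since that would break the positional alignment.
    """
    pairs = []
    with open(en_path, encoding=encoding) as en_file, \
            open(ro_path, encoding=encoding) as ro_file:
        # zip_longest pads the shorter file with None, which exposes a mismatch.
        for line_no, (en, ro) in enumerate(zip_longest(en_file, ro_file), start=1):
            if en is None or ro is None:
                raise ValueError(f"line counts differ at line {line_no}")
            pairs.append((en.rstrip("\n"), ro.rstrip("\n")))
    return pairs
```

For example, `read_aligned_pairs("corpus.en", "corpus.ro")` (hypothetical file names) would return a list in which element i - 1 holds the English and Romanian sentences from line i of each file.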