Romanian - English news corpus (Processed)

Attribution details: Romanian – English news corpus was created for the European Language Resources Coordination Action (ELRC) (http://lr-coordination.eu/) by Tufis Dan, Institutul de Cercetari pentru Inteligenta Artificiala "Mihai Draganescu", Academia Romana (www.racai.ro/) with primary data copyrighted by SouthEast European Times and is licensed under "CC-BY 4.0" (https://creativecommons.org/licenses/by/4.0/).

Bilingual Romanian – English news corpus built from SouthEast European Times (2008 dump). The texts are positionally aligned, i.e., the sentence on line i of the English text is aligned with the sentence on line i of the Romanian text. The alignment was manually validated.
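
The positional alignment described above can be illustrated with a minimal Python sketch. It assumes the two sides are available as equal-length sequences of lines; the function name `align_by_position` and the sample sentences are hypothetical and are not part of the resource.

```python
from typing import List, Sequence, Tuple


def align_by_position(
    english_lines: Sequence[str], romanian_lines: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair line i of the English text with line i of the Romanian text."""
    # Positional alignment only makes sense if both sides have the same length.
    if len(english_lines) != len(romanian_lines):
        raise ValueError(
            f"line counts differ: {len(english_lines)} vs {len(romanian_lines)}"
        )
    # Strip trailing newlines so the pairs hold the bare sentences.
    return [
        (en.rstrip("\n"), ro.rstrip("\n"))
        for en, ro in zip(english_lines, romanian_lines)
    ]


# Illustrative sentences only, not taken from the corpus.
pairs = align_by_position(
    ["Good morning.\n", "Thank you.\n"],
    ["Bună dimineața.\n", "Mulțumesc.\n"],
)
```

Raising an error on a length mismatch, rather than silently truncating as `zip` would, is a design choice made here so that a misaligned pair of files is caught early.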

Distribution

Availability: Available

Licences

CC-BY-4.0

Conditions: Attribution

Distribution Details

Distribution Medium: Data Downloadable

IPR Holders

Southeast European Times

Contact Person

Dan Tufis

text

Bilingual text corpus

Languages

Romanian; Moldavian; Moldovan (ro) (2,525,423 Words)

English (en) (2,382,849 Words)

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Text Format

TMX

Size

4,908,272 Words

98,098 Translation Units

Character encoding

UTF-8

Domains

EDUCATION & COMMUNICATIONS Documentation (Eurovoc 3221)

EDUCATION & COMMUNICATIONS Communications (Eurovoc 3226)

Annotation

Alignment

Segmentation level: Sentence

Creation

Creation mode details: Conversion from Moses-like format to TMX. As a post-processing step, several filters were applied to discard or annotate alignments that might be incorrect.
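
A conversion of this kind might look like the following Python sketch, which uses only the standard library. It is not the tool used to build this resource: the function name `moses_to_tmx`, the header attribute values, and the single filter shown (dropping pairs with an empty side) are assumptions made for illustration, since the record does not specify which filters were applied.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def moses_to_tmx(en_lines, ro_lines):
    """Convert two line-aligned lists of sentences into a TMX 1.4 string."""
    if len(en_lines) != len(ro_lines):
        raise ValueError("line counts differ")

    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header values are placeholders chosen for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-converter",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "moses",
        "adminlang": "en",
        "srclang": "en",
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    for en, ro in zip(en_lines, ro_lines):
        en, ro = en.strip(), ro.strip()
        # Hypothetical filter: discard pairs where either side is empty.
        if not en or not ro:
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in (("en", en), ("ro", ro)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text

    return ET.tostring(tmx, encoding="unicode")


# Illustrative input only; the second pair is dropped by the filter.
tmx_xml = moses_to_tmx(["Hello.", " "], ["Salut.", "Ceva."])
```

Each surviving sentence pair becomes one translation unit (`tu`) holding one `tuv` element per language, which corresponds to the translation units counted in the Size section.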

Creation mode: Automatic

Resource Creation

Funding Project

European Language Resource Coordination LOT3 (ELRC Data - Tools and Resources for CEF Automated Translation-LOT3 (SMART 2015/1091-30-CE-0816766/00-92))

URL: http://www.lr-coordi...

Funding Type: Service Contract

Funder: European Commission

Funding Country: European Union (EU)

Project duration: 13/12/2016 - 12/02/2020

Metadata

Created: 24/11/2016

Last Updated: 20/12/2016

Metadata Language: English (en)

Metadata Creator

Dan Tufis