English translations of Europeana metadata

The resource includes a selection of bilingual metadata sourced from the Europeana platform. Each bilingual pair goes from one of 21 European languages to English. For each language, training sets with bilingual segments in TSV format are provided, along with test sets with monolingual segments. For language pairs with fewer than 1,000 segments, a single TSV file containing all segments is provided instead. In the TSV files, the first column is in English and the second is in the other language.
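As a rough illustration of how such a file might be consumed, the sketch below reads tab-separated segment pairs in Python. It assumes each line holds the English segment and its counterpart in the other language, separated by a single tab, with no header line and no quoting; these details are assumptions and should be checked against the actual files. The function name `read_pairs` and the sample text are hypothetical.

```python
import csv
import io


def read_pairs(handle):
    """Yield (english, other) tuples from a two-column TSV stream.

    Assumption: English is the first field and the other language the second.
    """
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip lines that do not contain a full pair
        english, other = row[0].strip(), row[1].strip()
        if english and other:
            yield english, other


# Hypothetical sample standing in for a real training file.
sample = io.StringIO(
    "Portrait of a woman\tPortrait d'une femme\n"
    "broken line\n"
    "View of the harbour\tVue du port\n"
)
pairs = list(read_pairs(sample))
```

For a file on disk, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` points to one of the provided TSV files.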

The segments have been extracted from different metadata properties of the Europeana Data Model that capture aspects of a cultural heritage (CH) item, such as the title of a painting or its description. The textual values have been selected based on the language tags declared in the metadata and have then undergone segmentation and cleaning. Incorrect language tags have been corrected automatically using a language detector, and the values have then been split into sentences with an automatic segmenter. Further filtering has been applied to remove low-quality pairs.

DSI Relevance: Europeana