Translation Memories from Semantix AS

This corpus contains translation memories provided to the National Library of Norway by Semantix AS. The translations have been carried out on behalf of various public agencies and institutions.

The corpus is composed of texts originally written in English or Norwegian Bokmål, aligned with their translations into the other language. The material contains a very small number of translations into Norwegian Nynorsk, but for simplicity these have been classified as Norwegian Bokmål. The material has been formatted as valid XML, properly encoded and lightly proofread.

All translations from English to Norwegian Bokmål are collected in one file, and all translations from Norwegian Bokmål to English in another. The files are in TMX 1.4 format (an XML-based format). In the files, every translation unit (TU) is marked with the institution for which the translation was carried out. A TU corresponds (more or less) to a meaningful linguistic unit, typically a sentence or a heading. A TU may also consist of a single word or of several clauses.
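
As an illustration of how files of this kind might be read, the following is a minimal Python sketch that streams the TUs of a TMX 1.4 file and counts them per institution. It relies only on the standard TMX elements (tu, tuv, seg, prop) and the xml:lang attribute. How the institution is actually recorded in these files is not stated here, so the property name "institution", the sample data and the language codes "en" and "nb" are all assumptions made for the example and should be checked against the real files.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# xml:lang belongs to the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source, inst_prop="institution"):
    """Yield (institution, {language: segment text}) for each <tu>.

    `inst_prop` is a hypothetical <prop type="..."> name; adjust it to
    whatever the real files use to mark the institution.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        institution = None
        for prop in elem.findall("prop"):
            if prop.get("type") == inst_prop:
                institution = (prop.text or "").strip()
        segments = {}
        for tuv in elem.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text inside inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        yield institution, segments
        elem.clear()  # release the unit's children; matters for large files


# Invented sample data, not taken from the corpus.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <prop type="institution">Example Agency</prop>
      <tuv xml:lang="en"><seg>Annual report</seg></tuv>
      <tuv xml:lang="nb"><seg>Årsrapport</seg></tuv>
    </tu>
    <tu>
      <prop type="institution">Another Agency</prop>
      <tuv xml:lang="en"><seg>Contract</seg></tuv>
      <tuv xml:lang="nb"><seg>Kontrakt</seg></tuv>
    </tu>
  </body>
</tmx>
"""

units = list(read_tmx(io.BytesIO(SAMPLE.encode("utf-8"))))
counts = Counter(institution for institution, _segments in units)
print(counts)
```

For a real file, a file name or an open binary file object can be passed to read_tmx in place of the in-memory sample. Streaming with iterparse avoids building the whole tree, which is the main reason for choosing it over ET.parse for files with hundreds of thousands of TUs.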

The corpus contains a total of 1,325,013 TUs, distributed as follows:

- English > Norwegian Bokmål: 250,053 TUs
- Norwegian Bokmål > English: 1,074,960 TUs

The documentation file contains an overview of the agencies and institutions, and the number of TUs belonging to each institution.

DSI Relevance: eJustice, eProcurement