General Norwegian Crawling 
The General Norwegian Crawling corpus is a general-domain Norwegian corpus of roughly 43 million tokens, built from the web by crawling a wide and diverse set of URLs. It consists of 43,424,915 tokens, 2,692,915 sentences and 108,470 documents.
Documents are separated by a single newline.
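As an illustration of the newline-separated layout, the following minimal sketch reads the corpus one document at a time and counts documents and tokens. It rests on assumptions not stated in this description: that each non-empty line holds exactly one document, that the text is UTF-8 encoded, and that splitting on whitespace is an acceptable stand-in for tokenization. The file name `corpus.txt` and the helper names `iter_documents` and `corpus_stats` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def iter_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield one document per non-empty line (assumed layout)."""
    for line in lines:
        text = line.strip()
        if text:  # skip empty lines, if any
            yield text


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (document count, whitespace-token count)."""
    n_docs = 0
    n_tokens = 0
    for doc in iter_documents(lines):
        n_docs += 1
        # Whitespace splitting is an assumption; the tokenizer used
        # to produce the published counts is not specified here.
        n_tokens += len(doc.split())
    return n_docs, n_tokens


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real file.
    sample = [
        "Dette er et dokument.\n",
        "Her er et annet dokument med flere ord.\n",
    ]
    print(corpus_stats(sample))

    # For the real corpus (hypothetical file name), stream from disk:
    # with open("corpus.txt", encoding="utf-8") as f:
    #     print(corpus_stats(f))
```

Because the file is streamed line by line, the full corpus never has to fit in memory. Counts obtained this way may differ from the published figures, since the tokenization behind those figures is not described.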
The corpus was developed within the framework of the CEF project MT4ALL (http://ixa2.si.ehu.eus/mt4all/project).
We license the actual packaging of this data under a CC0 1.0 Universal License.