General German Crawling 
General German Crawling is a general-domain German corpus of about 67 million tokens, built by crawling a wide and diverse set of URLs. It consists of 67,637,441 tokens, 4,880,000 sentences and 344,712 documents.
Documents are separated by single new lines.
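As an illustration of the layout described above, the following is a minimal Python sketch for iterating over documents and counting them. It rests on several assumptions not stated in this description: that each document occupies exactly one line, that the file is UTF-8 encoded, and that a hypothetical file name such as general_german_crawling.txt is used. Tokens are counted by whitespace splitting, which will not necessarily reproduce the token count reported above, since the tokenizer used to build the corpus is not specified here.

```python
from typing import Iterable, Iterator, Tuple


def iter_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield one document per non-empty line (assumed layout)."""
    for line in lines:
        text = line.rstrip("\n")
        if text.strip():  # skip empty or whitespace-only lines
            yield text


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (document count, whitespace-token count)."""
    n_docs = 0
    n_tokens = 0
    for doc in iter_documents(lines):
        n_docs += 1
        # Whitespace splitting is an assumption, not the corpus tokenizer.
        n_tokens += len(doc.split())
    return n_docs, n_tokens


def stats_from_file(path: str) -> Tuple[int, int]:
    """Stream a corpus file line by line; UTF-8 encoding is assumed."""
    with open(path, encoding="utf-8") as handle:
        return corpus_stats(handle)
```

Calling stats_from_file("general_german_crawling.txt") (hypothetical path) streams the file without loading it into memory. If the distributed file turns out to separate documents with blank lines instead, iter_documents would need to group lines between blank lines rather than treat each line as a document.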
The corpus was developed within the framework of the CEF project MT4ALL (http://ixa2.si.ehu.eus/mt4all/project).
We license the actual packaging of this data under a CC0 1.0 Universal License.