Legal Kazakh Crawling 
The Legal Kazakh Crawling is a Kazakh corpus of more than one million tokens. It was built from the web by targeting specific in-domain URLs from the legal sector, such as legislation websites, governmental sites, and the domains of the Court and the Parliament. It consists of 1,862,857 tokens, 119,711 sentences and 4,485 documents.
Documents are separated by a single newline, so each document occupies one line.
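Because documents are newline-separated, the corpus can be split into documents line by line. The sketch below assumes the data is distributed as a UTF-8 plain-text file; the function names and the example filename `kk_legal.txt` are hypothetical and not part of the distribution. If those assumptions hold, the number of documents returned should correspond to the document count reported above.

```python
from pathlib import Path


def split_documents(text: str) -> list[str]:
    """Split newline-separated corpus text into documents (one per line).

    Blank lines (e.g. a trailing newline at end of file) are skipped,
    and a trailing carriage return is removed in case of CRLF line endings.
    """
    docs = []
    for line in text.split("\n"):
        line = line.rstrip("\r")
        if line.strip():
            docs.append(line)
    return docs


def load_documents(path: str) -> list[str]:
    """Read a UTF-8 corpus file (assumed layout) and return its documents."""
    return split_documents(Path(path).read_text(encoding="utf-8"))
```

For example, `docs = load_documents("kk_legal.txt")` followed by `len(docs)` gives the number of documents in the file. Splitting on `"\n"` instead of using `str.splitlines()` is a deliberate choice. `splitlines()` also breaks on other Unicode line separators, which could split a single document in two.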
The corpus was developed within the framework of the CEF project MT4ALL (http://ixa2.si.ehu.eus/mt4all/project).
We license the packaging of this data under a CC0 1.0 Universal License.