Legal Ukrainian Crawling

Legal Ukrainian Crawling is a 69-million-token Ukrainian corpus built from the web by targeting specific in-domain URLs that belong to the legal sector, such as legislation websites, governmental sites, and domains from the Court and the Parliament. It consists of 69,128,091 tokens, 7,544,396 sentences, and 23,850 documents.
Documents are separated by single new lines.
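
As an illustration of the layout described above, the following is a minimal Python sketch for iterating over documents. It assumes that "separated by single new lines" means each document sits on its own line of a UTF-8 text file; if the distributed file uses a different layout (for example, blank lines between multi-line documents), the splitting logic would need to be adapted. The function name iter_documents is hypothetical and not part of the corpus distribution.

```python
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield one document per non-empty line.

    Assumption: every document occupies exactly one line, so a single
    newline character marks the boundary between two documents.
    """
    for line in lines:
        doc = line.rstrip("\r\n")  # drop the trailing line terminator
        if doc.strip():            # skip empty or whitespace-only lines
            yield doc


# Small in-memory example standing in for the real corpus file.
sample = ["Перший документ.\n", "\n", "Другий документ.\n"]
documents = list(iter_documents(sample))
```

Because an open text file is itself an iterable of lines, the same function can be applied to a file object opened with encoding="utf-8", which streams the corpus without loading it all into memory.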
The corpus has been developed in the framework of the CEF project MT4ALL.
We license the actual packaging of this data under a CC0 1.0 Universal License.