Financial Latvian Crawling

Financial Latvian Crawling is a Latvian corpus of roughly 8.8 million tokens, built from the web by targeting in-domain URLs from the finance sector, such as bank websites, finance resource sites, and finance blogs and forums on banking and economy-related issues. It consists of 8,827,703 tokens, 485,845 sentences and 15,930 documents. Documents are separated by single newlines.
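As a rough illustration of the layout described above, the following Python sketch counts documents and whitespace-delimited tokens. It assumes that each non-empty line holds exactly one document, which is only one reading of "separated by single newlines"; the function names and the sample sentences are hypothetical and are not part of the corpus distribution.

```python
from typing import Dict, Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield one document per non-empty line (assumed layout)."""
    for line in lines:
        text = line.strip()
        if text:
            yield text


def corpus_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count documents and whitespace-delimited tokens."""
    documents = 0
    tokens = 0
    for doc in iter_documents(lines):
        documents += 1
        # Whitespace splitting is an assumption; the corpus authors'
        # tokenizer is not specified here and may count differently.
        tokens += len(doc.split())
    return {"documents": documents, "tokens": tokens}


# Made-up sample lines standing in for the real corpus file.
sample = ["Banka piedāvā kredītus.\n", "\n", "Procentu likmes pieaug.\n"]
stats = corpus_stats(sample)
print(stats)
```

To run this on the real data, an open file handle (opened with `encoding="utf-8"`) can be passed in place of `sample`. Whitespace tokenization will probably not reproduce the reported token count exactly.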
The corpus has been developed in the framework of the CEF project MT4ALL (
We license the actual packaging of this data under a CC0 1.0 Universal License.