Customer Support Spanish Crawling

Customer Support Spanish Crawling is a Spanish corpus of roughly 1 million tokens, built from the web by crawling in-domain URLs from the customer support domain, such as FAQ and help websites, as well as community sites and forums. It consists of 1,054,268 tokens, 58,490 sentences, and 2,725 documents.
Documents are separated by single new lines.
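As a rough illustration of how this layout could be read, the Python sketch below splits raw text into documents. It rests on an assumption that the description does not confirm: that "single new lines" means one empty line between documents, with each document spanning one or more lines. The function names and the sample text are hypothetical, and the whitespace token count is only a crude approximation that need not match the tokenization behind the reported figures.

```python
import re
from typing import List


def split_documents(text: str) -> List[str]:
    """Split raw corpus text into documents.

    Assumption (not confirmed by the corpus description): documents
    are separated by one empty line.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


def count_tokens(document: str) -> int:
    """Crude whitespace token count; the corpus figures may use another tokenizer."""
    return len(document.split())


# Hypothetical sample text, not taken from the corpus.
sample = (
    "¿Cómo restablezco mi contraseña?\n"
    "Vaya a Ajustes.\n"
    "\n"
    "Horario de atención: lunes a viernes."
)

docs = split_documents(sample)
print(len(docs), [count_tokens(d) for d in docs])
```

If the corpus instead stores one document per line, `text.splitlines()` would be the simpler choice; checking the document count against the reported 2,725 is one way to tell which reading applies.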
The corpus has been developed in the framework of the CEF project MT4ALL (
We license the actual packaging of this data under the CC0 1.0 Universal license.