Creation mode details: The ILSP Focused Crawler was used to acquire monolingual data from websites and to normalize, clean, and deduplicate it. Paragraphs were first split into sentences. Sentences shorter than 5 characters (after removing non-letters) were discarded, as were duplicate sentences (compared after removing non-letters).
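
The sentence-level filtering described above can be sketched roughly as follows. This is a minimal illustration, not the ILSP Focused Crawler's own code: the function names, the regex-based sentence splitter, and the choice to keep the first occurrence of each duplicate and to compare case-sensitively are all assumptions made for the example.

```python
import re

MIN_LETTERS = 5  # threshold taken from the description above


def split_sentences(paragraph):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [p.strip() for p in parts if p.strip()]


def letters_only(sentence):
    # Drop everything that is not a Unicode letter; case is left unchanged (assumption).
    return "".join(ch for ch in sentence if ch.isalpha())


def filter_sentences(paragraphs):
    seen = set()  # letters-only keys of sentences already kept
    kept = []
    for paragraph in paragraphs:
        for sentence in split_sentences(paragraph):
            key = letters_only(sentence)
            if len(key) < MIN_LETTERS:
                continue  # too short once non-letters are removed
            if key in seen:
                continue  # duplicate once non-letters are removed
            seen.add(key)
            kept.append(sentence)
    return kept
```

For example, given the paragraphs "Hello world. Hi! Hello, world." and "12345 6789. Another sentence here.", the sketch keeps "Hello world." and "Another sentence here.": "Hi!" and "12345 6789." fall below the length threshold, and "Hello, world." matches an earlier sentence once punctuation and spaces are removed.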