Creation mode details: The ILSP Focused Crawler was used to acquire bilingual data from multilingual websites and to normalize, clean, (near-)deduplicate and identify parallel documents. The Maligna sentence aligner was used to extract segment alignments from the crawled parallel documents. As a post-processing step, the alignments were merged into a single TMX file. The following filters were applied:
- TMX files generated from document pairs identified by non-aupdih methods were discarded.
- TMX files with a zeroToOne_alignments/total_alignments ratio larger than 0.16 were discarded.
- Alignments of non-[1:1] type(s) were discarded.
- Alignments with a TUV (after normalization) containing fewer than 3 tokens were discarded/annotated.
- Alignments with an l1/l2 TUV length ratio smaller than 0.6 or larger than 1.6 were discarded/annotated.
- Alignments in which different digits appear in each TUV were kept and annotated.
- Alignments with identical TUVs (after normalization) were discarded/annotated.
- Alignments with only non-letters in at least one of their TUVs were discarded/annotated.
- Duplicate alignments were kept and were discarded/annotated.

There are 3786 TUs with no annotation, containing 75520 words and 13439 lexical types in el (Greek) and 79567 words and 8142 lexical types in en (English).
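The filter cascade above can be sketched roughly as follows. This is not the ILSP Focused Crawler or Maligna code; it is a minimal illustration under stated assumptions. All names (`Alignment`, `keep_file`, `annotate`, `stats`) are hypothetical, normalization is assumed to be lowercasing plus whitespace collapsing, tokens are assumed to be whitespace-separated, TUV length is assumed to be measured in characters, "zeroToOne" is assumed to cover both 0:1 and 1:0 alignments, and "different digits" is assumed to mean differing digit sequences. Non-1:1 alignments are dropped; every other filter only attaches an annotation, so a caller can decide whether to discard. The pairing-method filter (non-aupdih) is not modelled.

```python
import re
from dataclasses import dataclass, field

# Thresholds taken from the description; everything else is an assumption.
MAX_ZERO_TO_ONE_RATIO = 0.16
MIN_TOKENS = 3
MIN_LEN_RATIO, MAX_LEN_RATIO = 0.6, 1.6


@dataclass
class Alignment:
    l1: str                    # TUV in the first language (e.g. el)
    l2: str                    # TUV in the second language (e.g. en)
    kind: str = "1:1"          # alignment type, e.g. "1:1", "0:1", "1:0"
    annotations: list = field(default_factory=list)


def normalize(text: str) -> str:
    # Assumption: normalization = lowercasing and collapsing whitespace.
    return " ".join(text.lower().split())


def keep_file(alignments: list) -> bool:
    """File-level filter: reject a file whose 0:1/1:0 share exceeds 0.16."""
    if not alignments:
        return False
    # Assumption: "zeroToOne" counts both 0:1 and 1:0 alignments.
    zero_to_one = sum(1 for a in alignments if a.kind in ("0:1", "1:0"))
    return zero_to_one / len(alignments) <= MAX_ZERO_TO_ONE_RATIO


def has_letter(text: str) -> bool:
    # [^\W\d_] matches any Unicode letter (Greek included).
    return re.search(r"[^\W\d_]", text) is not None


def annotate(alignments: list) -> list:
    """Drop non-1:1 alignments; return new Alignment objects with annotations."""
    seen = set()
    result = []
    for a in alignments:
        if a.kind != "1:1":
            continue  # non-[1:1] alignments are discarded outright
        n1, n2 = normalize(a.l1), normalize(a.l2)
        notes = []
        if min(len(n1.split()), len(n2.split())) < MIN_TOKENS:
            notes.append("short")
        ratio = len(n1) / len(n2) if n2 else float("inf")
        if not MIN_LEN_RATIO <= ratio <= MAX_LEN_RATIO:
            notes.append("length_ratio")
        if sorted(re.findall(r"\d+", n1)) != sorted(re.findall(r"\d+", n2)):
            notes.append("different_digits")
        if n1 == n2:
            notes.append("identical")
        if not has_letter(n1) or not has_letter(n2):
            notes.append("no_letters")
        if (n1, n2) in seen:
            notes.append("duplicate")
        seen.add((n1, n2))
        result.append(Alignment(a.l1, a.l2, a.kind, notes))
    return result


def stats(alignments: list):
    """Count unannotated TUs plus (words, lexical types) per language side."""
    clean = [a for a in alignments if not a.annotations]

    def words_and_types(texts):
        tokens = [t for s in texts for t in s.split()]
        # Assumption: a lexical type is a distinct lowercased whitespace token.
        return len(tokens), len({t.lower() for t in tokens})

    return (len(clean),
            words_and_types(a.l1 for a in clean),
            words_and_types(a.l2 for a in clean))
```

Under these assumptions, a pair such as "Κεφάλαιο 3" / "Chapter 4" is annotated as both `short` and `different_digits`, and the word and type counts it reports will not match the figures above, which come from the original pipeline's own tokenizer.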