LASER: Language-Agnostic SEntence Representations


LASER is a library to calculate and use multilingual sentence embeddings. The toolkit works with more than 90 languages, including low-resource languages, written in 28 different alphabets. It can be used to transfer natural language processing (NLP) applications originally developed for a single language to many more languages. It has been applied to a number of tasks, such as parallel corpus mining, cross-lingual document classification, multilingual similarity search and cross-lingual natural language inference.
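
As a rough illustration of the multilingual similarity search mentioned above, the sketch below matches sentences across languages by cosine similarity between their embeddings. It does not call LASER itself: the small 4-dimensional vectors are made-up stand-ins for real sentence embeddings, and the helper names `cosine_similarity_matrix` and `nearest_neighbors` are hypothetical, chosen only for this example.

```python
import numpy as np


def cosine_similarity_matrix(queries, candidates):
    """Return the matrix of cosine similarities between two sets of row vectors."""
    # Normalize each row to unit length so the dot product equals the cosine.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return q @ c.T


def nearest_neighbors(queries, candidates):
    """For each query vector, return the index of the most similar candidate."""
    return cosine_similarity_matrix(queries, candidates).argmax(axis=1)


# Toy vectors standing in for embeddings of two English sentences (assumption:
# real embeddings would come from an encoder and have many more dimensions).
english = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.0, 0.8, 0.5, 0.1],
])

# Toy vectors standing in for embeddings of three French sentences.
french = np.array([
    [0.1, 0.7, 0.6, 0.0],
    [0.8, 0.2, 0.1, 0.3],
    [0.0, 0.0, 0.1, 0.9],
])

sims = cosine_similarity_matrix(english, french)
matches = nearest_neighbors(english, french)
print(matches)
```

Because the embedding space is shared across languages, the same nearest-neighbor lookup could in principle serve both similarity search and parallel corpus mining; a production setup would likely use an approximate nearest-neighbor index rather than a dense similarity matrix.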


Languages: Portuguese (pt), Polish (pl), Persian (fa), Occitan (post 1500) (oc), Norwegian Bokmål (nb), Marathi (mr), Dhivehi; Divehi; Maldivian (dv), Malayalam (ml), Russian (ru), Romanian; Moldavian; Moldovan (ro), Slovenian (sl), Spanish; Castilian (es), Somali (so), Swahili (individual language); Kiswahili (swh), Serbian (sr), Sindhi (sd), Sinhala; Sinhalese (si), Slovak (sk), Swedish (sv), Tagalog (tl), Ido (io), Icelandic (is), Interlingua (International Auxiliary Language Association) (ia), Indonesian (id), Irish (ga), Interlingue; Occidental (ie), Japanese (ja), Italian (it), Kazakh (kk), Kabyle (kab), Malagasy (mg), Malay (individual language) (zlm), Low German; Low Saxon (nds), Macedonian (mk), Lingua Franca Nova (lfn), Lithuanian (lt), Latvian (lv), Latin (la), Korean (ko), Kurdish (ku), Afrikaans (af), Estonian (et), Esperanto (eo), Czech (cs), Croatian (hr), Cornish (kw), Coastal Kadazan (kzj), English (en), Eastern Mari (mhr), Dutch; Flemish (nl), Danish (da), Albanian (sq), Amharic (am), Arabic (ar), Armenian (hy), Aymara (ay), Azerbaijani (az), Basque (eu), Belarusian (be), Bengali (bn), Hindi (hi), Hungarian (hu), Finnish (fi), French (fr), Galician (gl), Georgian (ka), German (de), Modern Greek (1453-) (el), Hausa (ha), Hebrew (he), Uzbek (uz), Urdu (ur), Turkish (tr), Thai (th), Ukrainian (uk), Uighur; Uyghur (ug), Tamil (ta), Tajik (tg), Telugu (te), Tatar (tt), Breton (br), Bulgarian (bg), Berber languages (ber), Bosnian (bs), Central Dusun (dtp), Central Khmer (km), Burmese (my), Catalan; Valencian (ca), Chavacano (cbk), Chinese (zh), Yue Chinese (yue), Vietnamese (vi), Wu Chinese (wuu)