CURLICAT Bulgarian corpus

Last update: 2022-11-28

The Bulgarian CURLICAT corpus consists of texts from different sources, all provided with licences that permit distribution. For the purposes of metadata extraction, the sources fall into three general types: texts from the Bulgarian National Corpus (only those with redistributable licensing terms); public repositories with open, copyright-free data; and blogs with redistributable licences, open-content websites, and similar sources. The Bulgarian CURLICAT collection contains 113,087 documents, distributed across seven thematic domains: Culture, Education, European Union, Finance, Politics, Economics, and Science. For more information, see the CURLICAT website (http://curlicat-project.eu/deliverables).

Distribution

Availability: Under Review

Licences

CC-BY-SA-4.0

Conditions: Attribution, Share Alike

Distribution Details

Contact Person

Svetla Koeva

text

Monolingual text corpus

Languages

Bulgarian (bg)

Linguality

Linguality type: Monolingual

Text Format

text with tab-separated-values

Size

2,158,765 Sentences

35,319,695 Words

Monolingual text corpus

Languages

Bulgarian (bg)

Linguality

Linguality type: Monolingual

Text Format

Size

2,609,503 Sentences

Monolingual text corpus

Languages

Bulgarian (bg)

Linguality

Linguality type: Monolingual

Text Format

text with tab-separated-values

Size

113,399 Files

Resource Creation

Creation lasted: 01/06/2021 - 31/10/2022

Funding Project

Curated Multilingual Language Resources for CEF.AT (CURLICAT - INEA/CEF/ICT/A2019/1926831)

URL: https://curlicat-pro...

Funding Type: EU Funds

Funder: European Commission

Project duration: 30/11/2022 - 30/11/2022

Metadata

Created: 31/10/2022

Last Updated: 31/10/2022

Metadata Language: English (en)

Version

Version: 01