Croatian and English monolingual corpus from Croatian web resources

Last view: 2025-07-03

Last update: 2023-04-05


Attribution details: "Croatian and English monolingual corpus from Croatian web resources" compiled from corpora listed in ReadMe file by Consortium of National Language Technology Platform (NLTP) Project (Action number: 2018-EU-IA-0082). Published under CC-BY-SA-4.0 license.

https://elrc-share.eu/repository/browse/icelandic-and-english-monolingual-corpus-from-icelandic-web-resources/8dbc3e2ed38111eda54c00155d026706d250c51b34224f0e99091edfa01e0044/

Monolingual corpus of Croatian web resources collected during the NLTP project.
Resource size:
Croatian: 1,131,719 sentences, 24,303,220 words
English: 218,436 sentences, 5,711,041 words

Distribution

Availability: Under Review

Licences

CC-BY-SA-4.0

Conditions: Attribution, Share Alike


Contact Person

Roberts Rozis

text

Monolingual text corpus

Languages

English (en)

Linguality

Linguality type: Monolingual

Text Format

Plain Text

Size

5,711,041 Words

218,436 Sentences

Character encoding

UTF-8

Monolingual text corpus

Languages

Croatian (hr)

Linguality

Linguality type: Monolingual

Text Format

Plain Text

Size

24,303,220 Words

1,131,719 Sentences

Character encoding

UTF-8
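As a minimal sketch, the sentence and word counts listed above could be checked locally against a downloaded UTF-8 plain-text file. This assumes one sentence per line and whitespace-delimited words; neither convention is stated on this page, so the results may differ from the published figures if the repository counted differently. The function name `corpus_stats` is hypothetical.

```python
def corpus_stats(path):
    """Return (sentences, words) for a UTF-8 plain-text corpus file.

    Assumption (not documented by the repository): one sentence per
    non-blank line, words separated by whitespace.
    """
    sentences = 0
    words = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()  # split on any run of whitespace
            if tokens:  # blank lines are not counted as sentences
                sentences += 1
                words += len(tokens)
    return sentences, words


# Usage (hypothetical file name):
# s, w = corpus_stats("hr.txt")
# print(f"{s:,} sentences, {w:,} words")
```

Reading the file line by line keeps memory use flat, which matters for a file of roughly 24 million words.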

Resource Creation

Funding Project

National Language Technology Platform (NLTP - 2020-EU-IA-0084)

Funding Type: EU Funds

Funding Country: European Union (EU)

Metadata

Created: 04/04/2023

Last Updated: 04/04/2023

Metadata Language: English (en)

Metadata Creator

Roberts Rozis
