Views: 75 (last view: 2023-09-25)
Updates: 4 (last update: 2018-07-31)
Downloads: 13 (last download: 2023-02-28)
Monolingual Bulgarian corpus in the public administration domain
Monolingual Bulgarian corpus in the public administration domain, containing 27,028,434 tokens and 932,105 lexical types.
Distribution
Availability:
Available
Licences
Open Under-PSI
Used for resources that fall under the scope of PSI (Public Sector Information) regulations, and for which no further information is required or available. For more information on the EU legislation on the reuse of Public Sector Information, see here: https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information.
Distribution Details
Contact Person
Prokopis Prokopidis
http://nlp.ilsp.gr/~...
Institute for Language and Speech Processing / Athena Research Center
ILSP / ATHENA R.C.
Artemidos 6 & Epidavrou
GR-151 25 Maroussi
Greece
Tel.: +30 2106875432
http://www.ilsp.gr/, http://www.athenarc.gr
text
Monolingual text corpus
Languages
Bulgarian (bg)
Language Script:
Cyrillic
Linguality
Linguality type:
Monolingual
Text Format
XML
Size
932,105 Lexical Types
46,001 Files
27,028,434 Tokens
Character encoding
UTF-8
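The token and lexical type figures above can be approximated for any subset of the XML files with a short script. This is a minimal sketch, not the procedure used to produce the published counts: the record does not state the tokenization used or the XML schema, so the regex tokenizer, the lowercasing, and the helper name count_tokens_and_types are all assumptions made for illustration, and the results will not match the published figures exactly.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a token is a maximal run of word characters (letters, digits,
# underscore). The corpus creators may have tokenized differently.
TOKEN_RE = re.compile(r"\w+")


def count_tokens_and_types(xml_documents):
    """Return (token_count, type_count) over an iterable of XML strings.

    Hypothetical helper: all element text is concatenated, regardless of
    tag names, because the corpus schema is not described in the record.
    """
    counts = Counter()
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        text = " ".join(root.itertext())
        # Lowercasing merges case variants into one lexical type (assumption).
        counts.update(tok.lower() for tok in TOKEN_RE.findall(text))
    return sum(counts.values()), len(counts)


# Tiny made-up Bulgarian sample, not taken from the corpus.
sample = [
    "<doc><p>Министерство на финансите</p>"
    "<p>министерство на културата</p></doc>"
]
tokens, types = count_tokens_and_types(sample)
print(tokens, types)
```

To run it over downloaded files, read each file as bytes and pass the contents to the same function; ElementTree then honours the encoding declared in the XML header (UTF-8 for this resource).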
Domains
POLITICS
Executive Power And Public Service (Eurovoc 0436)
EUROVOC
Creation
Creation mode details:
The ILSP Focused Crawler was used to acquire monolingual data from websites and to normalize, clean, and (near-)deduplicate it at document level.
Creation mode:
Automatic
Creation Tools
http://nlp.ilsp.gr/r...
Resource Creation
Created using ELRC Services
Funding Project
Connecting Europe Facility-European Language Resource Coordination
(CEF-ELRC - LANGUAGE RESOURCE COORDINATION-SMART 2014/1074-30-CE-0696785/00-64)
URL:
http://www.lr-coordi...
Funding Type:
Service Contract
Funder:
European Commission
Funding Country:
European Union (EU)
Project duration:
29/03/2015 - 16/04/2017
Metadata
Created:
22/09/2016
Last Updated:
22/09/2016
Metadata Language:
English (en)
Metadata Creator
Maria Giagkou
Institute for Language and Speech Processing / Athena Research Center
ILSP / ATHENA R.C.
Greece (GR)
http://www.ilsp.gr, http://www.athenarc.gr
Version
Version:
1.0
Relations
Related Resource:
Monolingual Bulgarian corpus in the public administration domain (Processed)
Relation Type:
Has Version