Last update: 2020-02-19
Monolingual Romanian corpus in the culture domain
Monolingual Romanian corpus in the culture domain, containing 2,922,196 tokens and 413,847 lexical types.
DSI Relevance:
Europeana
Distribution
Availability:
Available
Licences
Open Under-PSI
Used for resources that fall under the scope of PSI (Public Sector Information) regulations and for which no further licensing information is required or available. For more information on the EU legislation on the reuse of Public Sector Information, see https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information.
Distribution Details
IPR Holders
Institutul Național al Patrimoniului (INP)
http://cimec.ro/
Romania (RO)
Contact Person
Prokopis Prokopidis
http://nlp.ilsp.gr/~...
Institute for Language and Speech Processing / Athena Research Center
ILSP / ATHENA R.C.
Artemidos 6 & Epidavrou
GR-151 25 Maroussi
Greece
Tel.: +30 2106875432
http://www.ilsp.gr/, http://www.athenarc.gr
text
Monolingual text corpus
Languages
Romanian; Moldavian; Moldovan (ro)
Language Script:
Latin
Linguality
Linguality type:
Monolingual
Text Format
XML
Size
413,847 Lexical Types
6,384 Files
2,922,196 Tokens
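The token and lexical-type figures above can be related through a type/token ratio. The following is a minimal sketch, not the method used to produce this record: the function name `token_and_type_counts`, the whitespace tokenization, and the sample sentences are all assumptions made for illustration, and the real corpus may have been tokenized differently.

```python
from collections import Counter


def token_and_type_counts(texts):
    """Return (token_count, type_count) for an iterable of strings.

    Assumption: tokens are whitespace-separated and case-sensitive.
    The tokenizer actually used for this corpus is not documented here.
    """
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return sum(counts.values()), len(counts)


# Hypothetical sample sentences, not taken from the corpus.
sample = ["muzeul national de arta", "muzeul de istorie"]
tokens, types = token_and_type_counts(sample)

# Type/token ratio implied by the sizes reported in this record.
reported_ratio = 413847 / 2922196
```

With the reported sizes, the ratio comes to roughly 0.14, that is, about one distinct lexical type for every seven tokens.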
Character encoding
UTF-8
Domains
SOCIAL QUESTIONS
Culture And Religion (Eurovoc 2831)
EUROVOC
Creation
Creation mode details:
The ILSP Focused Crawler was used to acquire monolingual data from websites and to perform normalization, cleaning, and (near-)deduplication at the document level.
Creation mode:
Automatic
Creation Tools
http://nlp.ilsp.gr/r...
Resource Creation
Created using ELRC Services
Funding Project
Connecting Europe Facility-European Language Resource Coordination
(CEF-ELRC - LANGUAGE RESOURCE COORDINATION-SMART 2014/1074-30-CE-0696785/00-64)
URL:
http://www.lr-coordi...
Funding Type:
Service Contract
Funder:
European Commission
Funding Country:
European Union (EU)
Project duration:
29/03/2015 - 16/04/2017
Metadata
Created:
22/09/2016
Last Updated:
22/09/2016
Metadata Language:
English (en)
Metadata Creator
Maria Giagkou
Institute for Language and Speech Processing / Athena Research Center
ILSP / ATHENA R.C.
Greece (GR)
http://www.ilsp.gr, http://www.athenarc.gr
Relations
Related Resource:
Monolingual Romanian corpus in the culture domain (Processed)
Relation Type:
Has Part