70
Last view: 2024-10-16
4
Last update: 2020-02-19
9
Last download: 2023-01-14
Monolingual Polish corpus in the culture domain
Monolingual Polish corpus in the culture domain, containing 10,245,866 tokens and 987,947 lexical types.
DSI Relevance:
Europeana
Distribution
Availability:
Available
Licences
Open Under-PSI
Used for resources that fall under the scope of PSI (Public Sector Information) regulations, and for which no further information is required or available. For more information on the EU legislation on the reuse of Public Sector Information, see here: https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information.
Distribution Details
IPR Holders
Warsaw Tourist Office
https://warsawtour.p...
Contact Person
Prokopis Prokopidis
http://nlp.ilsp.gr/~...
Institute for Language and Speech Processing / Athena Research Center
ILSP / ATHENA R.C.
Artemidos 6 & Epidavrou
GR-151 25 Maroussi
Greece
Tel.: +30 2106875432
http://www.ilsp.gr/, http://www.athenarc.gr
text
Monolingual text corpus
Languages
Polish (pl)
Language Script:
Latin
Linguality
Linguality type:
Monolingual
Text Format
XML
Size
987,947 Lexical Types
27,532 Files
10,245,866 Tokens
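For readers unfamiliar with the two size units, the following minimal sketch shows how token and lexical type counts relate: tokens are all word occurrences, types are the distinct word forms. The record does not state how the corpus was tokenized, so the regular-expression tokenizer, the lowercasing step, the function name `count_tokens_and_types`, and the two sample strings are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumption: a token is a maximal run of word characters. The record does
# not describe the tokenizer actually used for the reported figures.
TOKEN_RE = re.compile(r"\w+")


def count_tokens_and_types(texts):
    """Return (token_count, type_count) for an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Assumption: case is folded before counting distinct forms.
        counts.update(TOKEN_RE.findall(text.lower()))
    return sum(counts.values()), len(counts)


# Made-up sample sentences, not taken from the corpus.
sample = ["Muzeum Narodowe w Warszawie", "Muzeum Historii w Warszawie"]
tokens, types = count_tokens_and_types(sample)
print(tokens, types)
```

A different tokenizer or case policy would give different counts, so figures produced this way need not match the ones listed above.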
Character encoding
UTF-8
Domains
SOCIAL QUESTIONS
Culture And Religion (Eurovoc 2831)
EUROVOC
Creation
Creation mode details:
The ILSP Focused Crawler was used to acquire monolingual data from websites and to perform normalization, cleaning, and (near-)deduplication at the document level.
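As a rough illustration of what document-level near-deduplication can look like, here is a minimal sketch based on word shingles and Jaccard similarity. This is not the ILSP Focused Crawler's implementation, which the record does not describe; the shingle size, the 0.8 threshold, and the function names `shingles`, `jaccard`, and `deduplicate` are hypothetical choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a document (k is an assumed value)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs, threshold=0.8):
    """Keep a document only if it is not too similar to any document kept so far."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = ["a b c d e f g h i j", "a b c d e f g h i j k", "x y z w v"]
unique_docs = deduplicate(docs)
```

The pairwise comparison is quadratic in the number of documents, so a collection of this size would normally call for an approximate scheme such as MinHash; the sketch only shows the idea.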
Creation mode:
Automatic
Creation Tools
http://nlp.ilsp.gr/r...
Resource Creation
Created using ELRC Services
Funding Project
Connecting Europe Facility-European Language Resource Coordination
(CEF-ELRC - LANGUAGE RESOURCE COORDINATION-SMART 2014/1074-30-CE-0696785/00-64)
URL:
http://www.lr-coordi...
Funding Type:
Service Contract
Funder:
European Commission
Funding Country:
European Union (EU)
Project duration:
29/03/2015 - 16/04/2017
Metadata
Created:
22/09/2016
Last Updated:
12/04/2017
Metadata Language:
English (en)
Metadata Creator
Kanella Pouli
Greece
Maria Giagkou
Institute for Language and Speech Processing / Athena Research Center
ILSP / ATHENA R.C.
Greece (GR)
http://www.ilsp.gr, http://www.athenarc.gr
Relations
Related Resource:
Monolingual Polish corpus in the culture domain (part1) (Processed)
Relation Type:
Has Part