A Web-Based Infrastructure for Bulgarian Data Processing
The Bulgarian Language Processing Chain provides the following types of text processing and linguistic annotation: sentence segmentation, tokenisation, POS tagging and grammatical annotation, and lemmatisation.
Three types of access to the system are provided:
online access – suitable for users who occasionally need to process relatively small amounts of data;
access via RESTful API – suitable for software developers who want to integrate the processing tools into high-level applications;
asynchronous access – suitable for time-consuming tasks such as processing large corpora: the user uploads the archived corpus, it is processed on the server, a notification email is sent upon completion of the task, and the annotated corpus can then be downloaded.
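The two programmatic access modes can be illustrated with a minimal Python sketch. This description does not give the service URL, the request parameters, or the expected archive format, so `API_URL`, the JSON field `text`, and the choice of ZIP are all assumptions made for illustration only. The helper names `build_request` and `archive_corpus` are hypothetical as well.

```python
import io
import json
import urllib.request
import zipfile

# Hypothetical endpoint: the real service URL and parameters are not stated
# in this description and must be taken from the service documentation.
API_URL = "https://example.org/bg-chain/api/annotate"


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying the text to annotate."""
    # The "text" field name is an assumption, not the documented API.
    payload = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def archive_corpus(documents: dict[str, str]) -> bytes:
    """Pack named documents into an in-memory ZIP archive for asynchronous upload."""
    # ZIP is assumed here; the service may expect a different archive format.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in documents.items():
            archive.writestr(name, content.encode("utf-8"))
    return buffer.getvalue()
```

Once the actual endpoint is known, the prepared request could be sent with `urllib.request.urlopen(build_request(...))`, and the bytes returned by `archive_corpus` could be uploaded through the asynchronous interface.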
The major advantages of the infrastructure are:
it affords high-quality linguistic processing of Bulgarian language resources;
it supplies complex and compatible multi-level annotations;
it is based on state-of-the-art technologies;
it provides different levels of access that cater for the particular needs of different types of users;
it is highly scalable and can be distributed across different machines.
Languages: Bulgarian (bg)