A Web-Based Infrastructure for Bulgarian Data Processing

The Bulgarian Language Processing Chain covers the following types of text processing and linguistic annotation: sentence segmentation, tokenisation, POS tagging and grammatical annotation, and lemmatisation.
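To illustrate how such a chain is layered, the sketch below runs two toy stages in sequence, with each stage consuming the output of the previous one. It is not the system's actual implementation: the regular-expression rules and the names `Token`, `segment_sentences`, `tokenise` and `run_chain` are assumptions made for this example, and the POS tagging and lemmatisation stages are represented only by empty fields.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    pos: Optional[str] = None    # would be filled by a POS tagger (not implemented here)
    lemma: Optional[str] = None  # would be filled by a lemmatiser (not implemented here)


def segment_sentences(text: str) -> List[str]:
    # Naive illustrative rule: split after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenise(sentence: str) -> List[Token]:
    # Naive illustrative rule: runs of word characters, or single punctuation marks.
    return [Token(m) for m in re.findall(r"\w+|[^\w\s]", sentence)]


def run_chain(text: str) -> List[List[Token]]:
    # Each stage works on the output of the previous stage.
    return [tokenise(s) for s in segment_sentences(text)]
```

Calling `run_chain` on a short Bulgarian text returns one list of `Token` objects per sentence. A real tagger and lemmatiser would then fill in the `pos` and `lemma` fields on top of this segmentation.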
Three types of access to the system are provided:

online access – suitable for users who occasionally need to process relatively small amounts of data;
access via RESTful API – suitable for software developers who want to integrate the processing tools into high-level applications;
asynchronous access – suitable for time-consuming tasks such as processing large corpora: the user uploads the archived corpus, it is processed on the server, a notification email is sent upon completion of the task, and the annotated corpus can then be downloaded.
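As a rough illustration of the RESTful access mode, the sketch below builds (but does not send) a JSON request for a piece of Bulgarian text. The endpoint URL, the payload fields `text` and `lang`, and the function name `build_request` are all hypothetical: the description does not specify the real API, so the service's own documentation should be consulted for the actual address and schema.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL and payload schema are not given in this description.
API_URL = "https://example.org/api/annotate"


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    # Encode the text as a JSON body; the field names are assumptions for this sketch.
    payload = json.dumps({"text": text, "lang": "bg"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. For large corpora, the asynchronous mode described above is the better fit, since the result is collected after the notification email rather than within a single request.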
The major advantages of the infrastructure are:

it affords high-quality linguistic processing of Bulgarian language resources;
it supplies complex and compatible multi-level annotations;
it is based on state-of-the-art technologies;
it provides different levels of access that cater for the particular needs of different types of users;
it is highly scalable and can be distributed across different machines.


Languages: Bulgarian (bg)