Shallow Processing with Unification and Typed Feature Structures


Last update: 2020-02-14

SProUT

http://sprout.dfki.de/

SProUT is DFKI LT Lab's linguistic Swiss army knife: a flexible, multi-purpose engine for domain-independent and domain-specific multilingual NLP tasks such as structured named entity recognition, information extraction, opinion mining, ontology extraction from text, and many more.
SProUT is also a platform for developing multilingual shallow text processing and information extraction systems.
It consists of several reusable, Unicode-capable online linguistic processing components for basic linguistic operations, ranging from tokenization to coreference matching. Since typed feature structures (TFS) serve as a uniform data structure for representing the input and output of each of these processing resources, the components can be flexibly combined into a pipeline that produces several streams of linguistically annotated structures. These streams serve as input to the shallow grammar interpreter applied at the next stage.
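To illustrate why a single data structure makes the components composable, here is a minimal Python sketch of typed feature structure unification. It is not SProUT's implementation or API: feature structures are assumed to be plain nested dicts with an optional `type` key, the type hierarchy `TYPE_PARENT` and its types (`token`, `noun`, `gazetteer`, ...) are hypothetical and kept tree-shaped for simplicity, and coreferences (structure sharing) are left out. Unifying the output of a hypothetical tokenizer with that of a hypothetical morphological analyser yields one combined, more specific structure, or fails if the two disagree.

```python
from typing import Optional

# Hypothetical toy type hierarchy (child -> parent); None marks the root type.
# Kept tree-shaped for simplicity.
TYPE_PARENT = {
    "top": None,
    "token": "top",
    "morph": "token",
    "noun": "morph",
    "gazetteer": "top",
}


def ancestors(t: str) -> list:
    """Return t followed by all of its supertypes up to the root."""
    chain = []
    while t is not None:
        chain.append(t)
        t = TYPE_PARENT[t]
    return chain


def unify_types(a: str, b: str) -> Optional[str]:
    """Return the more specific of two compatible types, or None if they are incompatible."""
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


def unify(fs1: dict, fs2: dict) -> Optional[dict]:
    """Unify two feature structures given as nested dicts with an optional 'type' key.

    Atomic values must be equal, nested dicts are unified recursively, and a feature
    present in only one structure is copied over. Returns None on failure.
    Coreferences (shared substructures) are not modelled.
    """
    t = unify_types(fs1.get("type", "top"), fs2.get("type", "top"))
    if t is None:
        return None
    result = {"type": t}
    feats = [f for f in fs1 if f != "type"] + [f for f in fs2 if f != "type" and f not in fs1]
    for feat in feats:
        if feat not in fs2:
            result[feat] = fs1[feat]
        elif feat not in fs1:
            result[feat] = fs2[feat]
        else:
            v1, v2 = fs1[feat], fs2[feat]
            if isinstance(v1, dict) and isinstance(v2, dict):
                sub = unify(v1, v2)
                if sub is None:
                    return None
                result[feat] = sub
            elif v1 == v2:
                result[feat] = v1
            else:
                return None
    return result


# Hypothetical outputs of a tokenizer and a morphological analyser for the same word.
tok = {"type": "token", "surface": "Berlin"}
morph = {"type": "noun", "surface": "Berlin", "agr": {"num": "sg"}}
print(unify(tok, morph))
# → {'type': 'noun', 'surface': 'Berlin', 'agr': {'num': 'sg'}}
```

Because unification either returns a single, more specific structure or fails, annotations from different components can be merged without component-specific glue code. A real TFS system would additionally handle structure sharing and a richer type hierarchy.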
The grammar formalism in SProUT, called XTDL, blends very efficient finite-state techniques with unification-based formalisms, which are known for their transparency and expressiveness. A grammar in SProUT consists of pattern/action rules: the left-hand side (LHS) of a rule is a regular expression over TFSs with functional operators and coreferences, representing the recognition pattern, and the right-hand side (RHS) is a TFS specification of the output structure.
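The pattern/action idea can be approximated in a few lines of Python. This is a simplified sketch, not XTDL syntax and not SProUT's grammar interpreter: the LHS is assumed to be a list of flat feature-structure patterns, each paired with a quantifier (`1` or `+`), so it covers only concatenation and one-or-more rather than full regular expressions; patterns are matched by feature equality instead of full unification; values written as `#name` act as coreference variables shared between LHS and RHS; and the RHS is a Python function rather than a TFS specification. The feature names, the types and the `title`/`ne-person` rule are hypothetical.

```python
from typing import Callable, Iterator, Optional

# Hypothetical flat feature structures: feature name -> atomic value.
Token = dict


def match_element(pattern: dict, token: Token, bindings: dict) -> Optional[dict]:
    """Return extended bindings if every feature in `pattern` agrees with `token`, else None.

    Values starting with '#' are coreference variables: the first occurrence binds
    the variable to the token's value, later occurrences must agree with it.
    """
    new = dict(bindings)
    for feat, want in pattern.items():
        if feat not in token:
            return None
        have = token[feat]
        if isinstance(want, str) and want.startswith("#"):
            if want in new and new[want] != have:
                return None
            new[want] = have
        elif want != have:
            return None
    return new


def match_seq(elems: list, tokens: list, i: int, bindings: dict) -> Iterator[tuple]:
    """Yield (end_index, bindings) for every way `elems` can match tokens starting at i."""
    if not elems:
        yield i, bindings
        return
    (pattern, quant), rest = elems[0], elems[1:]
    if quant == "1":
        if i < len(tokens):
            b = match_element(pattern, tokens[i], bindings)
            if b is not None:
                yield from match_seq(rest, tokens, i + 1, b)
    elif quant == "+":
        # Greedy: find the longest run of matching tokens, then back off one at a time.
        run, b, j = [], bindings, i
        while j < len(tokens):
            b = match_element(pattern, tokens[j], b)
            if b is None:
                break
            j += 1
            run.append((j, b))
        for end, b_end in reversed(run):
            yield from match_seq(rest, tokens, end, b_end)
    else:
        raise ValueError(f"unsupported quantifier: {quant!r}")


def apply_rule(lhs: list, rhs: Callable[[list, dict], dict], tokens: list) -> list:
    """Scan left to right, keep the longest match at each position and build its output."""
    out, i = [], 0
    while i < len(tokens):
        best = max(match_seq(lhs, tokens, i, {}), key=lambda m: m[0], default=None)
        if best is not None and best[0] > i:
            end, b = best
            out.append(rhs(tokens[i:end], b))
            i = end
        else:
            i += 1
    return out


# Hypothetical rule: a title followed by one or more capitalized tokens is a person.
lhs = [
    ({"type": "title", "surface": "#t"}, "1"),
    ({"type": "token", "shape": "Capitalized"}, "+"),
]


def rhs(span: list, b: dict) -> dict:
    # "#t" is shared with the LHS, so the title's surface string is copied into the output.
    return {"type": "ne-person", "title": b["#t"],
            "name": " ".join(t["surface"] for t in span[1:])}


tokens = [
    {"type": "title", "surface": "Dr."},
    {"type": "token", "surface": "Anna", "shape": "Capitalized"},
    {"type": "token", "surface": "Schmidt", "shape": "Capitalized"},
    {"type": "token", "surface": "arrived", "shape": "lowercase"},
]
print(apply_rule(lhs, rhs, tokens))
# → [{'type': 'ne-person', 'title': 'Dr.', 'name': 'Anna Schmidt'}]
```

Keeping the longest match at each position is a common convention in finite-state extraction; it is an assumption of this sketch, not a statement about how SProUT resolves overlapping matches.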
Furthermore, SProUT comes with an integrated grammar development and testing environment.
Currently, the platform provides linguistic processing resources for several languages, including, among others, English, German, French, Italian, Dutch, Spanish, Polish, Czech, Chinese, and Japanese.

Distribution

Availability: Available

Licences

Non-standard/Other Licence/Terms

http://sprout.dfki.de/Licencing.html

Distribution Details

IPR Holders

Deutsches Forschungszentrum für Künstliche Intelligenz

Contact Person

DFKI Language Technology Lab

Tool/Service

Suite Of Tools (Chunking, Dependency Parsing, Named Entity Recognition, Other, Semantic Class Labelling, Sentence Splitting, Tokenization)

Language Independent

Input

Media type: Text

Languages: German (de), French (fr), Italian (it), Dutch; Flemish (nl), Spanish; Castilian (es), Polish (pl), Czech (cs), Japanese (ja), English (en), Chinese (zh)

Resource Creation

Funding Project

Not Applicable (N/A)

Funding Type: Other

Metadata

Created: 15/04/2019

Last Updated: 15/04/2019

Metadata Language: English (en)

Metadata Creator

Kanella Pouli
