Collocation and Term Extractor – ELRC-SHARE


3 Last update: 2019-06-06

Collocation and Term Extractor

CollTerm

http://nlp.ffzg.hr/resources/tools/collterm/

CollTerm is a language-independent tool for collocation and term extraction. It collects candidates in two ways: for multiword units (MWUs, i.e. collocations) it uses five different co-occurrence measures, and for single-word units it applies the TF-IDF measure to capture distributional differences from a large representative corpus. The language-dependent part consists of a stop-word list and a list of MWU MSD (morphosyntactic description) patterns, which can also be coded as regular expressions. The application is described in the paper presented at TKE 2012: Pinnis, M., Ljubešić, N., Ştefănescu, D., Skadiņa, I., Tadić, M., Gornostay, T. Term Extraction, Tagging, and Mapping Tools for Under-Resourced Languages. The first version of the application is an integral part of the ACCURAT Toolkit, which is available under the Apache 2.0 licence (http://www.accurat-project.eu/index.php?p=accurat-toolkit). This version adds a calibration of MWU MSD patterns for Croatian, which improves the tool's usability for that language. Calibration for the other CESAR languages is planned.
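The two scoring strategies described above can be illustrated with a minimal Python sketch. It is not CollTerm's code or interface: the function names, the tiny stop-word list, the smoothed IDF formula, and the choice of the Dice coefficient as the co-occurrence measure are all assumptions made for illustration (the description does not name the five measures). MSD-pattern filtering is left out because it requires tagged input.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real one is language dependent.
STOP_WORDS = {"the", "of", "a", "is", "in", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def dice_bigrams(tokens, stop_words=STOP_WORDS):
    """Score adjacent word pairs with the Dice coefficient.

    Dice(x, y) = 2 * f(x, y) / (f(x) + f(y)); pairs containing a
    stop word are skipped.
    """
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigram.items():
        if x in stop_words or y in stop_words:
            continue
        scores[(x, y)] = 2 * f_xy / (unigram[x] + unigram[y])
    return scores


def tfidf_unigrams(domain_tokens, reference_docs, stop_words=STOP_WORDS):
    """Score single words by TF-IDF.

    TF is counted in the domain text; IDF is computed over the documents
    of a reference corpus, with add-one smoothing (an assumption) so that
    words absent from the reference corpus still get a finite score.
    """
    tf = Counter(t for t in domain_tokens if t not in stop_words)
    doc_sets = [set(doc) for doc in reference_docs]
    n_docs = len(doc_sets)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for doc in doc_sets if term in doc)
        scores[term] = freq * math.log((1 + n_docs) / (1 + df))
    return scores


# Toy data, invented for this example.
domain = tokenize("machine translation of the machine translation system")
reference = [
    tokenize("the system is a system"),
    tokenize("translation of the text"),
]

mwu_scores = dice_bigrams(domain)
swu_scores = tfidf_unigrams(domain, reference)
best_mwu = max(mwu_scores, key=mwu_scores.get)
best_swu = max(swu_scores, key=swu_scores.get)
```

On this toy input the pair ("machine", "translation") ranks highest among multiword candidates, and "machine" ranks highest among single words because it never occurs in the reference documents.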

Distribution

Availability: Available

Licences

Distribution Details

IPR Holders

University of Zagreb

Contact Person

Nikola Ljubešić

toolService

Tool (Term Extraction)

Language Independent

Input

Media type: Text

Resource type: Language Description

Output

Media type: Text

Resource type: Lexical Conceptual Resource

Resource Creation

Funding Project

Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation (ACCURAT)

Funding Types: EU Funds, National Funds

Central and South-East European Resources (CESAR)

Funding Types: EU Funds, National Funds

Metadata

Created: 16/04/2019

Last Updated: 16/04/2019

Metadata Language: English (en)

Metadata Creator
