hunalign

Last update: 2019-06-04

http://mokk.bme.hu/resources/hunalign/

hunalign aligns bilingual text at the sentence level. Its input is tokenized and sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs (bisentences). If a dictionary is available, hunalign uses it, combining the dictionary information with Gale-Church sentence-length information. If no dictionary is available, it first falls back to sentence-length information alone and builds an automatic dictionary from the resulting alignment. It then realigns the text in a second pass, using the automatic dictionary. Like most sentence aligners, hunalign does not deal with changes of sentence order: it cannot produce crossing alignments, i.e., segments A and B in one language corresponding to segments B’ A’ in the other language.
There is nothing Hungarian-specific in hunalign; the name simply reflects the fact that it is part of the hun* NLP toolchain. hunalign is written in portable C++ and can be built on practically any operating system.
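
To make the idea of length-based, order-preserving alignment concrete, here is a minimal Python sketch. It is not hunalign's code and does not reproduce the Gale-Church probabilistic model: the cost function, the skip penalty, the set of allowed moves, and the names align_by_length and length_cost are all assumptions chosen for illustration. The dynamic program only ever moves forward in both texts, which is why an aligner of this kind cannot produce crossing alignments.

```python
from typing import List, Tuple

# Allowed moves (source sentences consumed, target sentences consumed).
# This move set is an assumption for the sketch, not hunalign's actual inventory.
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

SKIP_PENALTY = 1.5  # hypothetical fixed cost for leaving a sentence unaligned


def length_cost(src_len: int, tgt_len: int) -> float:
    """Toy cost: relative character-length mismatch (a stand-in for Gale-Church)."""
    if src_len == 0 or tgt_len == 0:
        return SKIP_PENALTY
    return abs(src_len - tgt_len) / (src_len + tgt_len)


def align_by_length(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return a monotone alignment as (source indices, target indices) groups."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Fill the table; every move goes forward in both texts, so the
    # resulting path can never cross itself.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + length_cost(s_len, t_len)
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Trace the cheapest path back from the end of both texts.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    path.reverse()
    return path


if __name__ == "__main__":
    source = ["aaaa bbbb.", "cccc dddd."]
    target = ["aaaa bbbb cccc dddd."]
    for src_ids, tgt_ids in align_by_length(source, target):
        print(src_ids, tgt_ids)
```

In a dictionary-aware aligner such as the one described above, the cost of each move would additionally reflect how many words in the candidate pair are translations of each other; the sketch leaves that part out.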

Distribution

Availability: Available

Licences

LGPL-2.1

Distribution Details

Download locations: ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.1.tgz, ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.1-windows.zip

Distribution Medium: Data Downloadable

Contact Person

Daniel Varga

toolService

Tool (Alignment)

Language Independent

Input

Media type: Text

Resource type: Language Description

Output

Media type: Text

Resource type: Lexical Conceptual Resource

Operation

Operating system: Linux, Windows

Resource Creation

Funding Project

Not Applicable (N/A)

Funding Type: Other

Metadata

Created: 16/04/2019

Last Updated: 16/04/2019

Metadata Language: English (en)

Metadata Creator

Kanella Pouli
