Apache Tika - a content analysis toolkit

Last update: 2019-06-06

https://tika.apache.org/

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
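
The "single interface" idea can be illustrated with a small toy sketch. This is not Tika's API or implementation: the names `detect`, `parse`, and `PARSERS` are hypothetical, the sketch uses only the Python standard library, and it handles two formats with deliberately simplistic detection. It only shows the general pattern of detecting a media type first and then dispatching to a format-specific parser behind one entry point.

```python
# Toy illustration only: this is NOT Tika's API or implementation.
# All names here (detect, parse, PARSERS) are hypothetical.
import json
from typing import Callable, Dict, Tuple

ParseResult = Tuple[str, Dict[str, str]]  # (extracted text, metadata)


def detect(data: bytes) -> str:
    """Guess a media type from the leading bytes (greatly simplified)."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.lstrip()[:1] in (b"{", b"["):
        return "application/json"
    return "text/plain"


def parse_text(data: bytes) -> ParseResult:
    """Decode plain text as UTF-8, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace"), {"Content-Type": "text/plain"}


def parse_json(data: bytes) -> ParseResult:
    """Flatten a JSON document into 'key: value' lines of text."""
    obj = json.loads(data.decode("utf-8"))
    if isinstance(obj, dict):
        lines = [f"{key}: {value}" for key, value in obj.items()]
    else:
        lines = [str(obj)]
    return "\n".join(lines), {"Content-Type": "application/json"}


# Registry of format-specific parsers, keyed by detected media type.
PARSERS: Dict[str, Callable[[bytes], ParseResult]] = {
    "text/plain": parse_text,
    "application/json": parse_json,
}


def parse(data: bytes) -> ParseResult:
    """Single entry point: detect the type, then dispatch to its parser."""
    media_type = detect(data)
    parser = PARSERS.get(media_type)
    if parser is None:
        raise ValueError(f"no parser registered for {media_type}")
    return parser(data)


if __name__ == "__main__":
    text, metadata = parse(b'{"title": "Report", "pages": 3}')
    print(metadata["Content-Type"])
    print(text)
```

A caller uses `parse` the same way whatever the input format is, which is the property that makes such a toolkit convenient for indexing pipelines. Supporting a new format in this sketch means adding a detection rule and registering one more parser, without changing the calling code.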

Distribution

Availability: Available

Licences

Apache-2.0

Distribution Details

IPR Holders

Apache Software Foundation

Contact Person

Thierry Declerck

Tool/Service

Tool (Annotation Of Document Structure)

Language Independent

Input

Media type: Text

Languages: Portuguese (pt), French (fr), Finnish (fi), Italian (it), Dutch; Flemish (nl), Modern Greek (1453-) (el), English (en), Hungarian (hu), Norwegian Bokmål (nb), Swedish (sv), German (de), Spanish; Castilian (es), Icelandic (is), Polish (pl), Danish (da), Estonian (et)

Resource Creation

Funding Project

Not Applicable (N/A)

Funding Type: Other

Metadata

Created: 24/03/2019

Last Updated: 24/03/2019

Metadata Language: English (en)

Metadata Creator

Thierry Declerck
