Apache Tika - a content analysis toolkit
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
Languages:
Portuguese (pt),
French (fr),
Finnish (fi),
Italian (it),
Dutch; Flemish (nl),
Modern Greek (1453-) (el),
English (en),
Hungarian (hu),
Norwegian Bokmål (nb),
Swedish (sv),
German (de),
Spanish; Castilian (es),
Icelandic (is),
Polish (pl),
Danish (da),
Estonian (et)