Apache Tika
Developer(s) | Apache Software Foundation |
---|---|
Stable release | / 20 October 2023 |
License | Apache License 2.0 |
Website | tika |
Apache Tika is a content detection and
History
The project originated as part of the
Features
Tika can identify more than 1400 file types from the Internet Assigned Numbers Authority (IANA) taxonomy of MIME types. For most common formats,[3] Tika then provides content extraction, metadata extraction and language identification.
It can also extract text from images using the OCR engine Tesseract.[4]
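One common technique behind file-type identification is "magic-byte" matching: comparing a file's leading bytes against known signatures. The Java sketch below illustrates only that idea. It does not use Tika's API. Tika's own registry is far larger and also uses file-name patterns. The class name `MagicDetector`, its `detect` method and the four-entry signature table are hypothetical, chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of magic-byte detection; NOT Tika's API or implementation.
final class MagicDetector {

    // Tiny, hypothetical signature table: MIME type -> required leading bytes.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/jpeg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 0x03, 0x04});
    }

    private MagicDetector() {}

    /** Returns the first MIME type whose signature is a prefix of {@code head}. */
    static String detect(byte[] head) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(head, entry.getValue())) {
                return entry.getKey();
            }
        }
        // No signature matched: fall back to the generic binary type.
        return "application/octet-stream";
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A prefix match alone has clear limits. A .docx file, for example, is a ZIP archive, so this sketch would report it as `application/zip`. Telling such container formats apart requires looking inside the archive.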
While Tika is written in
Notable uses
Tika is used by financial institutions including the
On April 4, 2016, Forbes published an article[11] identifying Tika as one of the key technologies used by more than 400 journalists to analyze 11.5 million leaked documents that exposed an international scandal involving world leaders storing money in offshore shell corporations. The leaked documents, and the project to analyze them, are referred to as the Panama Papers.
References
- "Apache Tika". Retrieved 2016-04-15.
- "Tika Proposal". Retrieved 2016-04-15.
- "Apache Tika formats page". The Apache Software Foundation. Retrieved 2016-04-16.
- "TikaOCR". Apache Tika. 2019-03-26. Retrieved 2019-12-02.
- "API Bindings for Tika". Apache Tika. Retrieved 2016-04-17.
- "FICO to Engage Kaggle's Community of 180,000 Data Scientists to Drive Innovation in the FICO Analytic Cloud | FICO". FICO | Decisions. Archived from the original on 2016-06-03. Retrieved 2016-04-15.
- "Goldman Sachs Puts Elasticsearch To Work - InformationWeek". InformationWeek. Retrieved 2017-06-21.
- "Studying polar data with the help of Apache Tika". Opensource.com. Retrieved 2016-04-15.
- "Text Extract for Drupal using Tika | Drupal.org". www.drupal.org. 2012-07-30. Retrieved 2016-04-15.
- "Content Transformation and Metadata Extraction with Apache Tika - alfrescowiki". wiki.alfresco.com. 2015-06-05. Retrieved 2016-04-15.
- Fox-Brewster, Thomas. "From Encrypted Drives To Amazon's Cloud -- The Amazing Flight Of The Panama Papers". Forbes. Retrieved 2016-04-15.