Carrot2
Developer(s) | Carrot Search |
---|---|
Stable release | 4.5.2 / November 6, 2023 |
License | BSD license |
Website | search |
Carrot² is an open source search results clustering engine.
History
The initial version of Carrot² was implemented in 2001 by Dawid Weiss as part of his MSc thesis to validate the applicability of the STC clustering algorithm to clustering search results in Polish.
Release | Release Date | Major changes and new features |
---|---|---|
4.5.2 | November 2023 | Dependency updates, build system improvements. |
4.5.1 | May 2023 | Dependency updates, minor bug fixes. |
4.5.0 | November 2022 | Dependency updates, bug fixes. |
4.4.3 | August 2022 | Dependency updates, bug fixes to STC and stemming infrastructure. |
4.4.0, 4.4.1, 4.4.2 | December 2021 | Security fixes and dependency updates. |
4.3.0 | July 2021 | Minor API changes and bug fixes. Improvements to the workbench (DCS search frontend). |
4.2.0, 4.2.1 | March 2021 | Improvements to JSON dictionaries and the workbench. Bug fixes. |
4.1.0 | January 2021 | Web-based Workbench. JSON dictionaries and new filtering options. API polishing. |
4.0.0 | July 2020 | API changes and simplifications across the codebase. Removal of deprecated technologies and tools. New documentation and code cleanups. |
3.16.2 | September 2019 | Update third party libraries (security-related issues). |
3.16.1 | January 2019 | Update of JS visualizations. Migration of Microsoft Bing API v5 to v7. |
3.16.0 | May 2018 | An overhaul of Java 9+ compatibility issues. Workbench compatibility for Ubuntu distros. Document source updates and removals of non-functional document sources. |
3.15.1 | March 2017 | A bugfix for .NET release that could result in unchecked I/O exceptions on inaccessible current working directory. |
3.15.0 | October 2016 | Bing API V2 to V5 transition. Upgrade of third party dependencies. Internal cosmetics. |
3.14.0 | September 2016 | Workbench improvements (high DPI support, Mac OS X improvements, bug fixes). PubMed switching to HTTPS. Other minor improvements. |
3.13.0 | July 2016 | Servlet API bug fixes, Workbench bug fixes, removed Google document source, fixed language codes for a few languages. |
3.12.0 | February 2016 | Upgrade of Morfologik Polish dictionary, infrastructural changes and adjustments allowing Carrot² to operate under stricter security manager policies. |
3.11.0 | October 2015 | Upgrade of Apache Lucene, bug fixes and a rollup of changes from 3.10.x minors. |
3.10.4 | October 2015 | Upgrade of Morfologik library. |
3.10.3 | August 2015 | Repackaged Google Guava to avoid conflicts in Solr. |
3.10.2 | July 2015 | Minor fixes to the Workbench (Arabic cluster display). |
3.10.1 | May 2015 | Aduna visualization dropped from MacOS distribution. Minor fixes to the Workbench. |
3.10.0 | May 2015 | Visualization updates. Bug fixes. Library dependency updates. |
3.9.4 | November 2014 | FoamTree update. New attributes for multilingual clustering. Visualization fixes. |
3.9.3 | July 2014 | FoamTree update. Infrastructure fixes and tweaks (jflex, sonatype repository URLs). |
3.9.2 | April 2014 | Bug fix to FoamTree HTML5. |
3.9.1 | April 2014 | Bug fixes, upgrades of HTML5 visualizations. |
3.9.0 | February 2014 | HTML5 visualizations replacing Flash, library dependency updates, bug fixes. |
3.8.1 | October 2013 | Bug fixes, minor tweaks to functionality. |
3.8.0 | July 2013 | Bug fixes, library dependency updates. |
3.7.1 | May 2013 | Minor bug fixes (3.7.0 maintenance release). |
3.7.0 | April 2013 | Infrastructure changes to the core (string IDs), better Solr integration XSLT, Workbench tweaks for larger inputs, updated dependencies. |
3.6.3 | April 2013 | Minor bug fixes and improvements: customization of Solr adapter XSLT, Workbench tweaks for larger inputs, updated dependencies. |
3.6.2 | November 2012 | Minor bug fixes and improvements. |
3.6.1 | August 2012 | Minor bug fixes. |
3.6.0 | June 2012 | Infrastructural changes, refactorings and bug fixes. |
3.5.3 | December 2011 | Infrastructure updates resulting from migration to GitHub. Workbench update to SWT 3.7.1. |
3.5.2 | September 2011 | Ajax support in Document Clustering Server, Bing document source improved, Workbench improvements, bug fixes. |
3.5.1 | June 2011 | Bug fixes, visualization integration improvements, support for Yahoo BOSS API removed. |
3.5.0 | May 2011 | FoamTree visualization, bisecting k-means clustering, resource management improvements |
3.4.3 | March 2011 | Distribution to Maven central repository |
3.4.2 | October 2010 | Bug fixes |
3.4.1 | September 2010 | Solr 1.4.x compatibility package, bug fixes |
3.4.0 | August 2010 | .NET API for calling Carrot² clustering |
3.3.0 | April 2010 | Significant scalability improvements in the STC clustering algorithm |
3.2.0 | March 2010 | Experimental support for clustering Arabic and Korean content, command line application for clustering in batch mode, LGPL-licensed dependencies removed |
3.1.0 | September 2009 | Experimental support for clustering Chinese content, search results clustering plugin for Apache Solr |
3.0.1 | March 2009 | Document Clustering Workbench available for Mac OS X |
3.0.0 | January 2009 | Document Clustering Workbench added for easy experimenting with Carrot² clustering, radically simplified Java API, search results clustering web application re-implemented, user manual[5] available |
2.1.0 | August 2007 | Document Clustering Server added for exposing clustering as a REST service |
2.0.0 | September 2006 | New user interface of the search results clustering web application |
1.0.0 | January 2006 | First official release, binaries available on SourceForge |
0.0.0 | since 2002 | Incubation releases, source code available on SourceForge |
Architecture
Carrot² 4.0 is predominantly a Java programming library with public APIs for managing language-specific resources and for configuring and executing algorithms. An HTTP/REST component (the Document Clustering Server) is provided for interoperability with other languages.
Clustering algorithms
Carrot² offers a few document clustering algorithms that place emphasis on the quality of cluster labels:
- Lingo:[4] a clustering algorithm based on singular value decomposition
- STC:[6] Suffix Tree Clustering
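The idea behind STC, grouping documents that share a phrase, can be illustrated with a short sketch. This is not the Carrot² implementation and not the published STC algorithm: it replaces the suffix tree with a plain map from word n-grams to document indices, and it omits the scoring and merging of base clusters. The class name SharedPhraseSketch, the method baseClusters and its parameters are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a simplified stand-in for STC's "base cluster" step.
// All names here are hypothetical and do not come from the Carrot2 API.
class SharedPhraseSketch {

    /**
     * Maps every phrase of minWords..maxWords consecutive words to the indices
     * of the documents that contain it, then keeps only phrases that occur in
     * at least minDocs documents.
     */
    static Map<String, Set<Integer>> baseClusters(
            List<String> docs, int minWords, int maxWords, int minDocs) {
        Map<String, Set<Integer>> phraseToDocs = new TreeMap<>();
        for (int d = 0; d < docs.size(); d++) {
            // Naive tokenization: lower-case, split on non-word (ASCII) characters.
            String[] words = Arrays.stream(
                    docs.get(d).toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
            for (int start = 0; start < words.length; start++) {
                StringBuilder phrase = new StringBuilder();
                for (int len = 1; len <= maxWords && start + len <= words.length; len++) {
                    if (len > 1) {
                        phrase.append(' ');
                    }
                    phrase.append(words[start + len - 1]);
                    if (len >= minWords) {
                        phraseToDocs
                            .computeIfAbsent(phrase.toString(), k -> new TreeSet<>())
                            .add(d);
                    }
                }
            }
        }
        // Discard phrases that are not shared by enough documents.
        phraseToDocs.values().removeIf(ids -> ids.size() < minDocs);
        return phraseToDocs;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "Data mining software tools",
            "Open source data mining",
            "Coal mining in Poland");
        // prints {data mining=[0, 1]}
        System.out.println(baseClusters(docs, 2, 3, 2));
    }
}
```

Each surviving phrase doubles as a readable label for its group of documents, which is the property the algorithms listed above emphasize. Requiring at least two words per phrase is a crude substitute for stop-word handling in this sketch.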
Spin-offs
Carrot Search
Carrot Search,[7] a commercial spin-off of the Carrot² project, continues the development of Carrot². It also offers a real-time text clustering algorithm[8] compliant with the Carrot² framework, as well as text mining consulting services based on open source and proprietary software.
Carrot Search Labs
Carrot² gave rise to a number of independent open source projects released under the umbrella of Carrot Search Labs.[9] The following projects are or were published as part of this initiative:
- Randomized Testing: a JUnit test runner with built-in utilities that make every test run slightly different (randomized). It also includes an Ant task for running JUnit tests on parallel JVMs, with load balancing and other features.
- High Performance Primitive Collections for Java (HPPC): Lists, Sets, Maps and other collections of primitives for Java tuned for highest performance and memory efficiency.
- SmartSprites: fully automatic maintenance of CSS sprites; no tedious copying and pasting to the CSS when adding or changing sprited images.
Discontinued projects:
- jSuffixArrays: Several Java implementations of the Suffix Array data structure with different performance and memory characteristics.
- JUnitBenchmarks: A set of extensions for turning JUnit4 tests into performance micro-benchmarks with GC monitoring, time variance measurement and simple graphical visualizations.
References
- ^ Carrot2 Project, Stanislaw Osinski, Dawid Weiss. "Carrot2 - Open Source Search Results Clustering Engine".
- ^ Carrot2 search results clustering demo
- ^ Dawid Weiss: A Clustering Interface for Web Search Results in Polish and English. MSc thesis. Poznan University of Technology, Poznań, Poland, 2001
- ^ a b Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48–54.
- ^ "Carrot2".
- ^ Oren Zamir, Oren Etzioni: Web Document Clustering: A Feasibility Demonstration, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (1998), pp. 46–54
- ^ Carrot Search s.c. "Carrot Search: document clustering and visualization software".
- ^ Carrot Search s.c. "Carrot Search: Lingo3G: Text Document Clustering Engine".
- ^ Carrot Search s.c. "Carrot Search Labs".