CMU Sphinx
Stable release | 5-prealpha / August 3, 2015 |
---|---|
Written in | Java, C |
Operating system | Cross-platform |
Type | Speech recognition library |
License | BSD-style[1] |
Website | cmusphinx |
CMU Sphinx, also called Sphinx for short, is the general term to describe a group of speech recognition systems developed at Carnegie Mellon University.
In 2000, the Sphinx group at Carnegie Mellon committed to open source several speech recognizer components, including Sphinx 2 and later Sphinx 3 (in 2001). The speech decoders come with acoustic models and sample applications. The available resources include in addition software for acoustic model training, language model compilation and a public domain pronunciation dictionary, cmudict.
Sphinx encompasses a number of software systems, described below.
Sphinx
Sphinx is a continuous-speech, speaker-independent recognition system making use of hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-independent, large-vocabulary recognition, the possibility of which was in dispute at the time (1986). Sphinx is of historical interest only; it has been superseded in performance by subsequent versions. An archival article[2] describes the system in detail.
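The n-gram language model mentioned above assigns a probability to each word given the words preceding it, estimated from counts in a text corpus. As a rough illustration (not Sphinx's actual implementation, which uses large corpora, smoothing, and log-probabilities), a maximum-likelihood bigram model can be sketched as:

```python
from collections import Counter

# Toy corpus; a real recognizer trains its LM on large text collections.
corpus = "the cat sat on the mat the cat ran".split()

# Count single words and adjacent word pairs.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

# "the cat" occurs 2 times and "the" occurs 3 times in the toy corpus,
# so P(cat | the) = 2/3.
print(bigram_prob("the", "cat"))
```

In a decoder, these probabilities are combined (usually in log space) with the acoustic model scores to rank candidate word sequences.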
Sphinx 2
A fast, performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and released as open source with a BSD-style license in 2000.
Sphinx 3
Sphinx 2 used a semi-continuous representation for acoustic modeling (i.e., a single set of Gaussians is used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx 3 adopted the prevalent continuous HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Recent developments (in algorithms and in hardware) have made Sphinx 3 "near" real-time, although not yet suitable for critical interactive applications. Sphinx 3 is under active development and, in conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on speech recognition for descriptions of these techniques).
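The distinction between the two acoustic-model representations can be made concrete with a small sketch (illustrative only; the codebook sizes, weights, and one-dimensional Gaussians below are made up, and real systems work with multivariate features in log space):

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Semi-continuous (Sphinx 2 style): ONE shared codebook of Gaussians...
codebook = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]  # (mean, variance)

# ...and each HMM state is just a weight vector over that codebook.
state_weights = {"s1": [0.7, 0.2, 0.1], "s2": [0.1, 0.3, 0.6]}

def semi_continuous_likelihood(x, state):
    # The codebook densities are computed once and reused by every state.
    dens = [gaussian_pdf(x, m, v) for m, v in codebook]
    return sum(w * d for w, d in zip(state_weights[state], dens))

# Continuous (Sphinx 3 style): each state owns its own Gaussian mixture,
# with per-state means and variances. (weight, mean, variance)
state_mixtures = {"s1": [(0.7, 0.1, 1.0), (0.3, 1.5, 0.5)]}

def continuous_likelihood(x, state):
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in state_mixtures[state])
```

Sharing one codebook makes the semi-continuous model cheap to evaluate and compact to store, while per-state mixtures give the continuous model more freedom to fit the data, which is the accuracy/speed trade-off the paragraph describes.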
Sphinx 4
Sphinx 4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework for research in speech recognition, written entirely in the Java programming language. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included individuals at MERL, MIT and CMU. (Currently supported languages are C, C++, C#, Python, Ruby, Java, and JavaScript.)
Current development goals include:
- developing a new (acoustic model) trainer
- implementing speaker adaptation (e.g. MLLR)
- improving configuration management
- creating a graph-based UI for graphical system design
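MLLR, the speaker-adaptation technique named in the list above, adjusts a speaker-independent model to a new speaker by applying a shared affine transform to the Gaussian means of the acoustic model. A minimal sketch of applying such a transform follows (the matrix and bias values are hypothetical; estimating them from adaptation data is the substantive part of MLLR and is omitted here):

```python
# MLLR updates each Gaussian mean as mu' = A @ mu + b, where one (A, b)
# pair is shared across many Gaussians and estimated from a small amount
# of the new speaker's audio.

def apply_mllr(mean, A, b):
    """Transform one Gaussian mean vector: mu' = A @ mu + b."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]

A = [[1.0, 0.1],
     [0.0, 0.9]]        # illustrative 2x2 transform matrix
b = [0.5, -0.2]         # illustrative bias vector

adapted = apply_mllr([1.0, 2.0], A, b)  # one mean from the acoustic model
```

Because a single transform covers many Gaussians, MLLR can adapt a large model from only a few utterances of new-speaker data.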
PocketSphinx
A version of Sphinx that can be used in embedded systems (e.g., based on an ARM processor).
See also
References
External links
- The Sphinx developers now recommend Vosk
- CMU Sphinx homepage
- Sphinx's repository on GitHub, the definitive source for code
- SourceForge, which hosts older releases and files
- NeXT on Campus, Fall 1990, Carnegie Mellon University, "Breakthroughs in speech recognition and document management", pp. 12-13 (PostScript format, gzip-compressed)