	PocketSphinx 5.0.0
	==================

	This is PocketSphinx, one of Carnegie Mellon University's open source large
	vocabulary, speaker-independent continuous speech recognition engines.

	Although this was at one point a research system, active development
	has largely ceased and it has become very, very far from the state of
	the art. I am making a release, because people are nonetheless using
	it, and there are a number of historical errors in the build system
	and API which needed to be corrected.

	The version number is strangely large because there was a "release"
	that people are using called 5prealpha, and we will use proper
	[semantic versioning](https://semver.org/) from now on.

	Please see the LICENSE file for terms of use.

	Installation
	------------

	You should be able to install this with pip for recent platforms and
	versions of Python:

	    pip3 install pocketsphinx

	Alternatively, you can compile it from the source tree. I highly
	suggest doing this in a virtual environment, running the following
	from the top-level directory (replace `~/ve_pocketsphinx` with the
	virtual environment you wish to create):

	    python3 -m venv ~/ve_pocketsphinx
	    . ~/ve_pocketsphinx/bin/activate
	    pip3 install .

	On GNU/Linux and maybe other platforms, you must have
	[PortAudio](http://www.portaudio.com/) installed for the `LiveSpeech`
	class to work (we may add a fall-back to `sox` in the near future).
	On Debian-like systems this can be achieved by installing the
	`libportaudio2` package:

	    sudo apt-get install libportaudio2
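
To check that the installation steps above worked, something like the following may help. It is only a sketch using the Python standard library: `check_environment` is a hypothetical helper (not part of PocketSphinx), and the assumption that the PortAudio shared library is discoverable under the name `portaudio` may not hold on every platform.

```python
import ctypes.util
import importlib.util


def check_environment():
    """Report whether the Python module and the PortAudio library can be located."""
    return {
        # find_spec returns None when the module cannot be found
        "pocketsphinx": importlib.util.find_spec("pocketsphinx") is not None,
        # find_library returns None when no matching shared library is found
        # (assumption: the library is registered under the name "portaudio")
        "portaudio": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" result for `portaudio` only means the library was not found by name, so treat it as a hint rather than a definitive diagnosis.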

	Usage
	-----

	See the [examples directory](../examples/) for a number of examples of
	using the library from Python. You can also read the [documentation
	for the Python API](https://pocketsphinx.readthedocs.io) or [the C
	API](https://cmusphinx.github.io/doc/pocketsphinx/).

	The Python module also mostly supports the same APIs as the previous
	[pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)
	module, as described below.

	### LiveSpeech

	An iterator class for continuous recognition or keyword search from a
	microphone. For example, to do speech-to-text with the default (some
	kind of US English) model:

	```python
	from pocketsphinx import LiveSpeech
	for phrase in LiveSpeech(): print(phrase)
	```

	Or to do keyword search:

	```python
	from pocketsphinx import LiveSpeech

	speech = LiveSpeech(keyphrase='forward', kws_threshold=1e-20)
	for phrase in speech:
	    print(phrase.segments(detailed=True))
	```

	With your own model and dictionary:

	```python
	from pocketsphinx import LiveSpeech, get_model_path

	speech = LiveSpeech(
	    sampling_rate=16000,  # optional
	    hmm=get_model_path('en-us'),
	    lm=get_model_path('en-us.lm.bin'),
	    dic=get_model_path('cmudict-en-us.dict')
	)

	for phrase in speech:
	    print(phrase)
	```

	### AudioFile

	This is an iterator class for continuous recognition or keyword search
	from a file. Currently it supports only raw, single-channel, 16-bit
	PCM data in native byte order.

	```python
	from pocketsphinx import AudioFile
	for phrase in AudioFile("goforward.raw"): print(phrase) # => "go forward ten meters"
	```
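
Since `AudioFile` only reads headerless PCM, a WAV recording has to be stripped of its header first. Here is a minimal sketch of one way to do that with the standard library `wave` module. The helper name `wav_to_raw` is hypothetical and not part of PocketSphinx, and it assumes the input is already mono and 16-bit (it does no resampling or channel mixing).

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Write the samples of a mono 16-bit WAV file to a headerless raw file.

    Returns the sample rate so the caller can compare it with the rate
    the acoustic model expects.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return rate
```

The resulting file can then be passed to `AudioFile` in the same way as `goforward.raw` above.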

	An example of a keyword search:

	```python
	from pocketsphinx import AudioFile

	audio = AudioFile("goforward.raw", keyphrase='forward', kws_threshold=1e-20)
	for phrase in audio:
	    print(phrase.segments(detailed=True)) # => "[('forward', -617, 63, 121)]"
	```

	With your own model and dictionary:

	```python
	from pocketsphinx import AudioFile, get_model_path

	config = {
	    'verbose': False,
	    'audio_file': 'goforward.raw',
	    'hmm': get_model_path('en-us'),
	    'lm': get_model_path('en-us.lm.bin'),
	    'dict': get_model_path('cmudict-en-us.dict')
	}

	audio = AudioFile(**config)
	for phrase in audio:
	    print(phrase)
	```

	To convert frame indices into time coordinates:

	```python
	from pocketsphinx import AudioFile

	# Frames per Second
	fps = 100

	for phrase in AudioFile("goforward.raw", frate=fps):  # frate (default=100)
	    print('-' * 28)
	    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
	    print('-' * 28)
	    for s in phrase.seg():
	        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
	    print('-' * 28)

	# ----------------------------
	# | start |  end  |   word   |
	# ----------------------------
	# |  0.0s | 0.24s |      <s> |
	# | 0.25s | 0.45s |    <sil> |
	# | 0.46s | 0.63s |       go |
	# | 0.64s | 1.16s |  forward |
	# | 1.17s | 1.52s |      ten |
	# | 1.53s | 2.11s |   meters |
	# | 2.12s |  2.6s |     </s> |
	# ----------------------------
	```
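
The conversion itself is just a division by the frame rate, so it can be pulled out into a small helper and applied to any list of segments. This is a sketch only: `frame_to_seconds` and `segment_times` are hypothetical names, not part of the PocketSphinx API, and the input is assumed to be plain `(word, start_frame, end_frame)` tuples.

```python
def frame_to_seconds(frame, frate=100):
    """Convert a frame index to seconds, given the frame rate in frames per second."""
    return frame / frate


def segment_times(segments, frate=100):
    """Map (word, start_frame, end_frame) tuples to (word, start_s, end_s) tuples."""
    return [
        (word, frame_to_seconds(start, frate), frame_to_seconds(end, frate))
        for word, start, end in segments
    ]


# Frame indices taken from the keyword search output shown earlier
print(segment_times([("forward", 63, 121)]))  # => [('forward', 0.63, 1.21)]
```

If you change `frate` when creating the decoder, pass the same value here, otherwise the times will be off by a constant factor.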

	Authors
	-------

	PocketSphinx is ultimately based on `Sphinx-II` which in turn was
	based on some older systems at Carnegie Mellon University, which were
	released as free software under a BSD-like license thanks to the
	efforts of Kevin Lenzo. Much of the decoder in particular was written
	by Ravishankar Mosur (look for "rkm" in the comments), but various
	other people contributed as well; see [the AUTHORS file](./AUTHORS)
	for more details.

	David Huggins-Daines (the author of this document) is
	guilty^H^H^H^H^Hresponsible for creating `PocketSphinx`, which added
	various speed and memory optimizations, fixed-point computation, JSGF
	support, portability to various platforms, and a somewhat coherent
	API. He then disappeared for a while.

	Nickolay Shmyrev took over maintenance for quite a long time
	afterwards, and a lot of code was contributed by Alexander Solovets,
	Vyacheslav Klimkov, and others. The
	[pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)
	module was originally written by Dmitry Prazdnichnov.

	Currently this is maintained by David Huggins-Daines again.