.TH POCKETSPHINX 1 "2022-09-27" .SH NAME pocketsphinx \- Run speech recognition on audio data .SH SYNOPSIS .B pocketsphinx [ \fIoptions\fR... ] [ \fBlive\fR | \fBsingle\fR | \fBhelp\fR | \fBsoxflags\fR ] \fIINPUTS\fR... .SH DESCRIPTION .PP The ‘\f[CR]pocketsphinx\fP’ command-line program reads single-channel 16-bit PCM audio one or more input files (or ‘\f[CR]-\fP’ to read from standard input), and attemps to recognize speech in it using the default acoustic and language model. The input files can be raw audio, WAV, or NIST Sphere files, though some of these may not be recognized properly. It accepts a large number of options which you probably don't care about, and a \fIcommand\fP which defaults to ‘\f[CR]live\fP’. The commands are as follows: .TP .B help Print a long list of those options you don't care about. .TP .B config Dump configuration as JSON to standard output (can be loaded with the ‘\f[CR]-config\fP’ option). .TP .B live Detect speech segments in input files, run recognition on them (using those options you don't care about), and write the results to standard output in line-delimited JSON. I realize this isn't the prettiest format, but it sure beats XML. Each line contains a JSON object with these fields, which have short names to make the lines more readable: .IP "b": Start time in seconds, from the beginning of the stream .IP "d": Duration in seconds .IP "p": Estimated probability of the recognition result, i.e. a number between 0 and 1 which may be used as a confidence score .IP "t": Full text of recognition result .IP "w": List of segments (usually words), each of which in turn contains the ‘\f[CR]b\fP’, ‘\f[CR]d\fP’, ‘\f[CR]p\fP’, and ‘\f[CR]t\fP’ fields, for start, end, probability, and the text of the word. In the future we may also support hierarchical results in which case ‘\f[CR]w\fP’ could be present. .TP .B single Recognize the input as a single utterance, and write a JSON object in the same format described above. .TP .B align Align a single input file (or ‘\f[CR]-\fP’ for standard input) to a word sequence, and write a JSON object in the same format described above. The first positional argument is the input, and all subsequent ones are concatenated to make the text, to avoid surprises if you forget to quote it. You are responsible for normalizing the text to remove punctuation, uppercase, centipedes, etc. For example: .EX pocketsphinx align goforward.wav "go forward ten meters" .EE By default, only word-level alignment is done. To get phone alignments, pass `-phone_align yes` in the flags, e.g.: .EX pocketsphinx -phone_align yes align audio.wav $text .EE This will make not particularly readable output, but you can use .B jq (https://stedolan.github.io/jq/) to clean it up. For example, you can get just the word names and start times like this: .EX pocketsphinx align audio.wav $text | jq '.w[]|[.t,.b]' .EE Or you could get the phone names and durations like this: .EX pocketsphinx -phone_align yes align audio.wav $text | jq '.w[]|.w[]|[.t,.d]' .EE There are many, many other possibilities, of course. .TP .B help Print a usage and help text with a list of possible arguments. .TP .B soxflags Return arguments to ‘\f[CR]sox\fP’ which will create the appropriate input format. Note that because the ‘\f[CR]sox\fP’ command-line is slightly quirky these must always come \fIafter\fP the filename or ‘\f[CR]-d\fP’ (which tells ‘\f[CR]sox\fP’ to read from the microphone). You can run live recognition like this: .EX sox -d $(pocketsphinx soxflags) | pocketsphinx - .EE or decode from a file named "audio.mp3" like this: .EX sox audio.mp3 $(pocketsphinx soxflags) | pocketsphinx - .EE .PP By default only errors are printed to standard error, but if you want more information you can pass ‘\f[CR]-loglevel INFO\fP’. Partial results are not printed, maybe they will be in the future, but don't hold your breath. Force-alignment is likely to be supported soon, however. .SH OPTIONS .\" ### ARGUMENTS ### .SH AUTHOR Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines .SH COPYRIGHT Copyright \(co 1994-2016 Carnegie Mellon University. See the file \fILICENSE\fR included with this package for more information. .br .SH "SEE ALSO" .BR pocketsphinx_batch (1), .BR sphinx_fe (1). .br