	.TH POCKETSPHINX_BATCH 1 "2007-08-27"
	.SH NAME
	pocketsphinx_batch \- Run speech recognition in batch mode
	.SH SYNOPSIS
	.B pocketsphinx_batch
	.RI \fB\-ctl\fR
	\fIctlfile\fR
	\fB\-cepdir\fR
	\fIcepdir\fR
	\fB\-cepext\fR
	\fI.mfc\fR
	[\fI options \fR]...
	.SH DESCRIPTION
	.PP
	Run speech recognition over a list of utterances in batch mode. A list
	of arguments follows:
	.TP
	.B \-adchdr
	Size of audio file header in bytes (headers are ignored)
	.TP
	.B \-adcin
	Input is raw audio data
	.TP
	.B \-agc
	Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
	.TP
	.B \-agcthresh
	Initial threshold for automatic gain control
	.TP
	.B \-allphone
	Perform phoneme decoding with phonetic lm
	.TP
	.B \-allphone_ci
	Perform phoneme decoding with phonetic lm and context-independent units only
	.TP
	.B \-alpha
	Preemphasis parameter
	.TP
	.B \-argfile
	Argument file giving extra arguments.
	.TP
	.B \-ascale
	Inverse of acoustic model scale for confidence score calculation
	.TP
	.B \-aw
	Inverse weight applied to acoustic scores.
	.TP
	.B \-backtrace
	Print results and backtraces to log file.
	.TP
	.B \-beam
	Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
	.TP
	.B \-bestpath
	Run bestpath (Dijkstra) search over word lattice (3rd pass)
	.TP
	.B \-bestpathlw
	Language model probability weight for bestpath search
	.TP
	.B \-build_outdirs
	Create missing subdirectories in output directory
	.TP
	.B \-cepdir
	Input files directory (prefixed to filespecs in control file)
	.TP
	.B \-cepext
	Input files extension (suffixed to filespecs in control file)
	.TP
	.B \-ceplen
	Number of components in the input feature vector
	.TP
	.B \-cmn
	Cepstral mean normalization scheme ('current', 'prior', or 'none')
	.TP
	.B \-cmninit
	Initial values (comma-separated) for cepstral mean when 'prior' is used
	.TP
	.B \-compallsen
	Compute all senone scores in every frame (can be faster when there are many senones)
	.TP
	.B \-ctl
	Control file listing utterances to be processed
	.TP
	.B \-ctlcount
	No. of utterances to be processed (after skipping \fB\-ctloffset\fR entries)
	.TP
	.B \-ctlincr
	Do every Nth line in the control file
	.TP
	.B \-ctloffset
	No. of utterances at the beginning of \fB\-ctl\fR file to be skipped
	.TP
	.B \-ctm
	Recognition output in CTM file format (may require post-sorting)
	.TP
	.B \-debug
	Verbosity level for debugging messages
	.TP
	.B \-dict
	Main pronunciation dictionary (lexicon) input file
	.TP
	.B \-dictcase
	Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
	.TP
	.B \-dither
	Add 1/2-bit noise
	.TP
	.B \-doublebw
	Use double bandwidth filters (same center freq)
	.TP
	.B \-ds
	Frame GMM computation downsampling ratio
	.TP
	.B \-fdict
	Noise word pronunciation dictionary input file
	.TP
	.B \-feat
	Feature stream type, depends on the acoustic model
	.TP
	.B \-featparams
	File containing feature extraction parameters.
	.TP
	.B \-fillprob
	Filler word transition probability
	.TP
	.B \-frate
	Frame rate
	.TP
	.B \-fsg
	Sphinx format finite state grammar file
	.TP
	.B \-fsgctl
	Control file listing FSG file to use for each utterance
	.TP
	.B \-fsgdir
	Base directory for FSG files
	.TP
	.B \-fsgext
	File extension for FSG files (including leading dot)
	.TP
	.B \-fsgusealtpron
	Add alternate pronunciations to FSG
	.TP
	.B \-fsgusefiller
	Insert filler words at each state.
	.TP
	.B \-fwdflat
	Run forward flat-lexicon search over word lattice (2nd pass)
	.TP
	.B \-fwdflatbeam
	Beam width applied to every frame in second-pass flat search
	.TP
	.B \-fwdflatefwid
	Minimum number of end frames for a word to be searched in fwdflat search
	.TP
	.B \-fwdflatlw
	Language model probability weight for flat lexicon (2nd pass) decoding
	.TP
	.B \-fwdflatsfwin
	Window of frames in lattice to search for successor words in fwdflat search
	.TP
	.B \-fwdflatwbeam
	Beam width applied to word exits in second-pass flat search
	.TP
	.B \-fwdtree
	Run forward lexicon-tree search (1st pass)
	.TP
	.B \-hmm
	Directory containing acoustic model files.
	.TP
	.B \-hyp
	Recognition output file name
	.TP
	.B \-hypseg
	Recognition output with segmentation file name
	.TP
	.B \-input_endian
	Endianness of input data, big or little, ignored if NIST or MS Wav
	.TP
	.B \-jsgf
	JSGF grammar file
	.TP
	.B \-keyphrase
	Keyphrase to spot
	.TP
	.B \-kws
	File with keyphrases to spot, one per line
	.TP
	.B \-kws_delay
	Delay to wait for best detection score
	.TP
	.B \-kws_plp
	Phone loop probability for keyword spotting
	.TP
	.B \-kws_threshold
	Threshold for p(hyp)/p(alternatives) ratio
	.TP
	.B \-latsize
	Initial backpointer table size
	.TP
	.B \-lda
	File containing transformation matrix to be applied to features (single-stream features only)
	.TP
	.B \-ldadim
	Dimensionality of output of feature transformation (0 to use entire matrix)
	.TP
	.B \-lifter
	Length of sin-curve for liftering, or 0 for no liftering.
	.TP
	.B \-lm
	Word trigram language model input file
	.TP
	.B \-lmctl
	Specify a set of language models
	.TP
	.B \-lmname
	Which language model in \fB\-lmctl\fR to use by default
	.TP
	.B \-lmnamectl
	Control file listing LM name to use for each utterance
	.TP
	.B \-logbase
	Base in which all log-likelihoods are calculated
	.TP
	.B \-logfn
	File to write log messages in
	.TP
	.B \-logspec
	Write out logspectral files instead of cepstra
	.TP
	.B \-lowerf
	Lower edge of filters
	.TP
	.B \-lpbeam
	Beam width applied to last phone in words
	.TP
	.B \-lponlybeam
	Beam width applied to last phone in single-phone words
	.TP
	.B \-lw
	Language model probability weight
	.TP
	.B \-maxhmmpf
	Maximum number of active HMMs to maintain at each frame (or \fB\-1\fR for no pruning)
	.TP
	.B \-maxwpf
	Maximum number of distinct word exits at each frame (or \fB\-1\fR for no pruning)
	.TP
	.B \-mdef
	Model definition input file
	.TP
	.B \-mean
	Mixture gaussian means input file
	.TP
	.B \-mfclogdir
	Directory to log feature files to
	.TP
	.B \-min_endfr
	Nodes ignored in lattice construction if they persist for fewer than N frames
	.TP
	.B \-mixw
	Senone mixture weights input file (uncompressed)
	.TP
	.B \-mixwfloor
	Senone mixture weights floor (applied to data from \fB\-mixw\fR file)
	.TP
	.B \-mllr
	MLLR transformation to apply to means and variances
	.TP
	.B \-mllrctl
	Control file listing MLLR transforms to use for each utterance
	.TP
	.B \-mllrdir
	Base directory for MLLR transforms
	.TP
	.B \-mllrext
	File extension for MLLR transforms (including leading dot)
	.TP
	.B \-mmap
	Use memory-mapped I/O (if possible) for model files
	.TP
	.B \-nbest
	Number of N-best hypotheses to write to \fB\-nbestdir\fR (0 for no N-best)
	.TP
	.B \-nbestdir
	Directory for writing N-best hypothesis lists
	.TP
	.B \-nbestext
	Extension for N-best hypothesis list files
	.TP
	.B \-ncep
	Number of cep coefficients
	.TP
	.B \-nfft
	Size of FFT
	.TP
	.B \-nfilt
	Number of filter banks
	.TP
	.B \-nwpen
	New word transition penalty
	.TP
	.B \-outlatbeam
	Minimum posterior probability for output lattice nodes
	.TP
	.B \-outlatdir
	Directory for dumping word lattices
	.TP
	.B \-outlatext
	Filename extension for dumping word lattices
	.TP
	.B \-outlatfmt
	Format for dumping word lattices (s3 or htk)
	.TP
	.B \-pbeam
	Beam width applied to phone transitions
	.TP
	.B \-pip
	Phone insertion penalty
	.TP
	.B \-pl_beam
	Beam width applied to phone loop search for lookahead
	.TP
	.B \-pl_pbeam
	Beam width applied to phone loop transitions for lookahead
	.TP
	.B \-pl_pip
	Phone insertion penalty for phone loop
	.TP
	.B \-pl_weight
	Weight for phoneme lookahead penalties
	.TP
	.B \-pl_window
	Phoneme lookahead window size, in frames
	.TP
	.B \-rawlogdir
	Directory to log raw audio files to
	.TP
	.B \-remove_dc
	Remove DC offset from each frame
	.TP
	.B \-remove_noise
	Remove noise with spectral subtraction in mel-energies
	.TP
	.B \-round_filters
	Round mel filter frequencies to DFT points
	.TP
	.B \-samprate
	Sampling rate
	.TP
	.B \-seed
	Seed for random number generator; if less than zero, pick our own
	.TP
	.B \-sendump
	Senone dump (compressed mixture weights) input file
	.TP
	.B \-senin
	Input is senone score dump files
	.TP
	.B \-senlogdir
	Directory to log senone score files to
	.TP
	.B \-senmgau
	Senone to codebook mapping input file (usually not needed)
	.TP
	.B \-silprob
	Silence word transition probability
	.TP
	.B \-smoothspec
	Write out cepstral-smoothed logspectral files
	.TP
	.B \-svspec
	Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
	.TP
	.B \-tmat
	HMM state transition matrix input file
	.TP
	.B \-tmatfloor
	HMM state transition probability floor (applied to \fB\-tmat\fR file)
	.TP
	.B \-topn
	Maximum number of top Gaussians to use in scoring.
	.TP
	.B \-topn_beam
	Beam width used to determine top-N Gaussians (or a list, per-feature)
	.TP
	.B \-toprule
	Start rule for JSGF (first public rule is default)
	.TP
	.B \-transform
	Which type of transform to use to calculate cepstra (legacy, dct, or htk)
	.TP
	.B \-unit_area
	Normalize mel filters to unit area
	.TP
	.B \-upperf
	Upper edge of filters
	.TP
	.B \-uw
	Unigram weight
	.TP
	.B \-var
	Mixture gaussian variances input file
	.TP
	.B \-varfloor
	Mixture gaussian variance floor (applied to data from \fB\-var\fR file)
	.TP
	.B \-varnorm
	Variance normalize each utterance (only if CMN == current)
	.TP
	.B \-verbose
	Show input filenames
	.TP
	.B \-warp_params
	Parameters defining the warping function
	.TP
	.B \-warp_type
	Warping function type (or shape)
	.TP
	.B \-wbeam
	Beam width applied to word exits
	.TP
	.B \-wip
	Word insertion penalty
	.TP
	.B \-wlen
	Hamming window length
	.PP
	To do batch mode recognition, you
	will need to specify a control file, using
	.BR \-ctl .
	This is a simple text file containing one entry per line. Each entry
	is the name of an input file relative to the
	.B \-cepdir
	directory, and without the filename extension (which is given in the
	.B \-cepext
	argument).
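	.PP
	As an illustration, assuming hypothetical feature files stored under
	.IR /data/feat ,
	a control file might contain:
	.PP
	.RS
	.nf
	speaker1/utt001
	speaker1/utt002
	speaker2/utt001
	.fi
	.RE
	.PP
	Running with
	.B \-cepdir /data/feat \-cepext .mfc
	would then be expected to read
	.IR /data/feat/speaker1/utt001.mfc ,
	and so on for each entry. The directory and file names here are
	placeholders, not files shipped with this package.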
	.PP
	If you are using acoustic feature files as input (see
	.BR sphinx_fe (1)
	for information on how to generate these), you can also specify a subpart
	of a file, using the following format:
	.PP
	.RS
	.B FILENAME START\-FRAME END\-FRAME UTTERANCE\-ID
	.RE
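	.PP
	For example, control file entries of this form might look like the
	following, where the file name, frame numbers, and utterance IDs are
	all hypothetical:
	.PP
	.RS
	.nf
	speaker1/session1 0 499 speaker1_utt001
	speaker1/session1 500 1249 speaker1_utt002
	.fi
	.RE
	.PP
	Each line selects a range of frames from the same feature file and
	gives that range its own utterance ID.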
	.SH AUTHOR
	Written by numerous people at CMU from 1994 onwards. This manual page
	by David Huggins-Daines <dhdaines@gmail.com>
	.SH COPYRIGHT
	Copyright \(co 1994-2016 Carnegie Mellon University. See the file
	\fILICENSE\fR included with this package for more information.
	.br
	.SH "SEE ALSO"
	.BR pocketsphinx_continuous (1),
	.BR sphinx_fe (1).
	.br