| .TH POCKETSPHINX_BATCH 1 "2007-08-27" | |
| .SH NAME | |
| pocketsphinx_batch \- Run speech recognition in batch mode | |
| .SH SYNOPSIS | |
| .B pocketsphinx_batch | |
| .RI \fB\-ctl\fR | |
| \fIctlfile\fR | |
| \fB\-cepdir\fR | |
| \fIcepdir\fR | |
| \fB\-cepext\fR | |
| \fI.mfc\fR | |
| [\fI options \fR]... | |
| .SH DESCRIPTION | |
| .PP | |
| Run speech recognition over a list of utterances in batch mode. A list | |
| of arguments follows: | |
| .TP | |
| .B \-adchdr | |
| Size of audio file header in bytes (headers are ignored) | |
| .TP | |
| .B \-adcin | |
| Input is raw audio data | |
| .TP | |
| .B \-agc | |
| Automatic gain control for c0 ('max', 'emax', 'noise', or 'none') | |
| .TP | |
| .B \-agcthresh | |
| Initial threshold for automatic gain control | |
| .TP | |
| .B \-allphone | |
| Perform phoneme decoding with phonetic lm | |
| .TP | |
| .B \-allphone_ci | |
| Perform phoneme decoding with phonetic lm and context-independent units only | |
| .TP | |
| .B \-alpha | |
| Preemphasis parameter | |
| .TP | |
| .B \-argfile | |
| Argument file giving extra arguments. | |
| .TP | |
| .B \-ascale | |
| Inverse of acoustic model scale for confidence score calculation | |
| .TP | |
| .B \-aw | |
| Inverse weight applied to acoustic scores. | |
| .TP | |
| .B \-backtrace | |
| Print results and backtraces to log file. | |
| .TP | |
| .B \-beam | |
| Beam width applied to every frame in Viterbi search (smaller values mean wider beam) | |
| .TP | |
| .B \-bestpath | |
| Run bestpath (Dijkstra) search over word lattice (3rd pass) | |
| .TP | |
| .B \-bestpathlw | |
| Language model probability weight for bestpath search | |
| .TP | |
| .B \-build_outdirs | |
| Create missing subdirectories in output directory | |
| .TP | |
| .B \-cepdir | |
| Input files directory (prefixed to filespecs in control file) | |
| .TP | |
| .B \-cepext | |
| Input files extension (suffixed to filespecs in control file) | |
| .TP | |
| .B \-ceplen | |
| Number of components in the input feature vector | |
| .TP | |
| .B \-cmn | |
| Cepstral mean normalization scheme ('current', 'prior', or 'none') | |
| .TP | |
| .B \-cmninit | |
| Initial values (comma-separated) for cepstral mean when 'prior' is used | |
| .TP | |
| .B \-compallsen | |
| Compute all senone scores in every frame (can be faster when there are many senones) | |
| .TP | |
| .B \-ctl | |
| Control file listing utterances to be processed | |
| .TP | |
| .B \-ctlcount | |
| Number of utterances to be processed (after skipping \fB\-ctloffset\fR entries) | |
| .TP | |
| .B \-ctlincr | |
| Do every Nth line in the control file | |
| .TP | |
| .B \-ctloffset | |
| Number of utterances at the beginning of \fB\-ctl\fR file to be skipped | |
| .TP | |
| .B \-ctm | |
| Recognition output in CTM file format (may require post-sorting) | |
| .TP | |
| .B \-debug | |
| Verbosity level for debugging messages | |
| .TP | |
| .B \-dict | |
| Main pronunciation dictionary (lexicon) input file | |
| .TP | |
| .B \-dictcase | |
| Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only) | |
| .TP | |
| .B \-dither | |
| Add 1/2-bit noise | |
| .TP | |
| .B \-doublebw | |
| Use double bandwidth filters (same center freq) | |
| .TP | |
| .B \-ds | |
| Frame GMM computation downsampling ratio | |
| .TP | |
| .B \-fdict | |
| Noise word pronunciation dictionary input file | |
| .TP | |
| .B \-feat | |
| Feature stream type, depends on the acoustic model | |
| .TP | |
| .B \-featparams | |
| File containing feature extraction parameters. | |
| .TP | |
| .B \-fillprob | |
| Filler word transition probability | |
| .TP | |
| .B \-frate | |
| Frame rate | |
| .TP | |
| .B \-fsg | |
| Sphinx format finite state grammar file | |
| .TP | |
| .B \-fsgctl | |
| Control file listing FSG file to use for each utterance | |
| .TP | |
| .B \-fsgdir | |
| Base directory for FSG files | |
| .TP | |
| .B \-fsgext | |
| File extension for FSG files (including leading dot) | |
| .TP | |
| .B \-fsgusealtpron | |
| Add alternate pronunciations to FSG | |
| .TP | |
| .B \-fsgusefiller | |
| Insert filler words at each state. | |
| .TP | |
| .B \-fwdflat | |
| Run forward flat-lexicon search over word lattice (2nd pass) | |
| .TP | |
| .B \-fwdflatbeam | |
| Beam width applied to every frame in second-pass flat search | |
| .TP | |
| .B \-fwdflatefwid | |
| Minimum number of end frames for a word to be searched in fwdflat search | |
| .TP | |
| .B \-fwdflatlw | |
| Language model probability weight for flat lexicon (2nd pass) decoding | |
| .TP | |
| .B \-fwdflatsfwin | |
| Window of frames in lattice to search for successor words in fwdflat search | |
| .TP | |
| .B \-fwdflatwbeam | |
| Beam width applied to word exits in second-pass flat search | |
| .TP | |
| .B \-fwdtree | |
| Run forward lexicon-tree search (1st pass) | |
| .TP | |
| .B \-hmm | |
| Directory containing acoustic model files. | |
| .TP | |
| .B \-hyp | |
| Recognition output file name | |
| .TP | |
| .B \-hypseg | |
| Recognition output with segmentation file name | |
| .TP | |
| .B \-input_endian | |
| Endianness of input data, big or little, ignored if NIST or MS Wav | |
| .TP | |
| .B \-jsgf | |
| JSGF grammar file | |
| .TP | |
| .B \-keyphrase | |
| Keyphrase to spot | |
| .TP | |
| .B \-kws | |
| File with keyphrases to spot, one per line | |
| .TP | |
| .B \-kws_delay | |
| Delay to wait for best detection score | |
| .TP | |
| .B \-kws_plp | |
| Phone loop probability for keyword spotting | |
| .TP | |
| .B \-kws_threshold | |
| Threshold for p(hyp)/p(alternatives) ratio | |
| .TP | |
| .B \-latsize | |
| Initial backpointer table size | |
| .TP | |
| .B \-lda | |
| File containing transformation matrix to be applied to features (single-stream features only) | |
| .TP | |
| .B \-ldadim | |
| Dimensionality of output of feature transformation (0 to use entire matrix) | |
| .TP | |
| .B \-lifter | |
| Length of sin-curve for liftering, or 0 for no liftering. | |
| .TP | |
| .B \-lm | |
| Word trigram language model input file | |
| .TP | |
| .B \-lmctl | |
| Specify a set of language models | |
| .TP | |
| .B \-lmname | |
| Which language model in \fB\-lmctl\fR to use by default | |
| .TP | |
| .B \-lmnamectl | |
| Control file listing LM name to use for each utterance | |
| .TP | |
| .B \-logbase | |
| Base in which all log-likelihoods are calculated | |
| .TP | |
| .B \-logfn | |
| File to write log messages in | |
| .TP | |
| .B \-logspec | |
| Write out logspectral files instead of cepstra | |
| .TP | |
| .B \-lowerf | |
| Lower edge of filters | |
| .TP | |
| .B \-lpbeam | |
| Beam width applied to last phone in words | |
| .TP | |
| .B \-lponlybeam | |
| Beam width applied to last phone in single-phone words | |
| .TP | |
| .B \-lw | |
| Language model probability weight | |
| .TP | |
| .B \-maxhmmpf | |
| Maximum number of active HMMs to maintain at each frame (or \fB\-1\fR for no pruning) | |
| .TP | |
| .B \-maxwpf | |
| Maximum number of distinct word exits at each frame (or \fB\-1\fR for no pruning) | |
| .TP | |
| .B \-mdef | |
| Model definition input file | |
| .TP | |
| .B \-mean | |
| Mixture gaussian means input file | |
| .TP | |
| .B \-mfclogdir | |
| Directory to log feature files to | |
| .TP | |
| .B \-min_endfr | |
| Nodes ignored in lattice construction if they persist for fewer than N frames | |
| .TP | |
| .B \-mixw | |
| Senone mixture weights input file (uncompressed) | |
| .TP | |
| .B \-mixwfloor | |
| Senone mixture weights floor (applied to data from \fB\-mixw\fR file) | |
| .TP | |
| .B \-mllr | |
| MLLR transformation to apply to means and variances | |
| .TP | |
| .B \-mllrctl | |
| Control file listing MLLR transforms to use for each utterance | |
| .TP | |
| .B \-mllrdir | |
| Base directory for MLLR transforms | |
| .TP | |
| .B \-mllrext | |
| File extension for MLLR transforms (including leading dot) | |
| .TP | |
| .B \-mmap | |
| Use memory-mapped I/O (if possible) for model files | |
| .TP | |
| .B \-nbest | |
| Number of N-best hypotheses to write to \fB\-nbestdir\fR (0 for no N-best) | |
| .TP | |
| .B \-nbestdir | |
| Directory for writing N-best hypothesis lists | |
| .TP | |
| .B \-nbestext | |
| Extension for N-best hypothesis list files | |
| .TP | |
| .B \-ncep | |
| Number of cep coefficients | |
| .TP | |
| .B \-nfft | |
| Size of FFT | |
| .TP | |
| .B \-nfilt | |
| Number of filter banks | |
| .TP | |
| .B \-nwpen | |
| New word transition penalty | |
| .TP | |
| .B \-outlatbeam | |
| Minimum posterior probability for output lattice nodes | |
| .TP | |
| .B \-outlatdir | |
| Directory for dumping word lattices | |
| .TP | |
| .B \-outlatext | |
| Filename extension for dumping word lattices | |
| .TP | |
| .B \-outlatfmt | |
| Format for dumping word lattices (s3 or htk) | |
| .TP | |
| .B \-pbeam | |
| Beam width applied to phone transitions | |
| .TP | |
| .B \-pip | |
| Phone insertion penalty | |
| .TP | |
| .B \-pl_beam | |
| Beam width applied to phone loop search for lookahead | |
| .TP | |
| .B \-pl_pbeam | |
| Beam width applied to phone loop transitions for lookahead | |
| .TP | |
| .B \-pl_pip | |
| Phone insertion penalty for phone loop | |
| .TP | |
| .B \-pl_weight | |
| Weight for phoneme lookahead penalties | |
| .TP | |
| .B \-pl_window | |
| Phoneme lookahead window size, in frames | |
| .TP | |
| .B \-rawlogdir | |
| Directory to log raw audio files to | |
| .TP | |
| .B \-remove_dc | |
| Remove DC offset from each frame | |
| .TP | |
| .B \-remove_noise | |
| Remove noise with spectral subtraction in mel-energies | |
| .TP | |
| .B \-round_filters | |
| Round mel filter frequencies to DFT points | |
| .TP | |
| .B \-samprate | |
| Sampling rate | |
| .TP | |
| .B \-seed | |
| Seed for random number generator; if less than zero, pick our own | |
| .TP | |
| .B \-sendump | |
| Senone dump (compressed mixture weights) input file | |
| .TP | |
| .B \-senin | |
| Input is senone score dump files | |
| .TP | |
| .B \-senlogdir | |
| Directory to log senone score files to | |
| .TP | |
| .B \-senmgau | |
| Senone to codebook mapping input file (usually not needed) | |
| .TP | |
| .B \-silprob | |
| Silence word transition probability | |
| .TP | |
| .B \-smoothspec | |
| Write out cepstral-smoothed logspectral files | |
| .TP | |
| .B \-svspec | |
| Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38) | |
| .TP | |
| .B \-tmat | |
| HMM state transition matrix input file | |
| .TP | |
| .B \-tmatfloor | |
| HMM state transition probability floor (applied to \fB\-tmat\fR file) | |
| .TP | |
| .B \-topn | |
| Maximum number of top Gaussians to use in scoring. | |
| .TP | |
| .B \-topn_beam | |
| Beam width used to determine top-N Gaussians (or a list, per-feature) | |
| .TP | |
| .B \-toprule | |
| Start rule for JSGF (first public rule is default) | |
| .TP | |
| .B \-transform | |
| Which type of transform to use to calculate cepstra (legacy, dct, or htk) | |
| .TP | |
| .B \-unit_area | |
| Normalize mel filters to unit area | |
| .TP | |
| .B \-upperf | |
| Upper edge of filters | |
| .TP | |
| .B \-uw | |
| Unigram weight | |
| .TP | |
| .B \-var | |
| Mixture gaussian variances input file | |
| .TP | |
| .B \-varfloor | |
| Mixture gaussian variance floor (applied to data from \fB\-var\fR file) | |
| .TP | |
| .B \-varnorm | |
| Variance normalize each utterance (only if CMN == current) | |
| .TP | |
| .B \-verbose | |
| Show input filenames | |
| .TP | |
| .B \-warp_params | |
| Parameters defining the warping function | |
| .TP | |
| .B \-warp_type | |
| Warping function type (or shape) | |
| .TP | |
| .B \-wbeam | |
| Beam width applied to word exits | |
| .TP | |
| .B \-wip | |
| Word insertion penalty | |
| .TP | |
| .B \-wlen | |
| Hamming window length | |
| .PP | |
| To do batch mode recognition, you | |
| will need to specify a control file, using | |
| .BR \-ctl . | |
| This is a simple text file containing one entry per line. Each entry | |
| is the name of an input file relative to the | |
| .B \-cepdir | |
| directory, and without the filename extension (which is given in the | |
| .B \-cepext | |
| argument). | |
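| .PP | |
| As an illustrative sketch (all file and directory names here are hypothetical), a control file named | |
| .I utts.ctl | |
| might contain: | |
| .PP | |
| .RS | |
| .nf | |
| utt001 | |
| utt002 | |
| subdir/utt003 | |
| .fi | |
| .RE | |
| .PP | |
| With feature files stored under a directory named | |
| .IR feats , | |
| a matching invocation could look like the following, where the model, dictionary, and output paths are placeholders: | |
| .PP | |
| .RS | |
| .nf | |
| pocketsphinx_batch \-ctl utts.ctl \-cepdir feats \-cepext .mfc \e | |
|     \-hmm model_dir \-lm model.lm \-dict model.dict \-hyp out.hyp | |
| .fi | |
| .RE | |
| .PP | |
| Under these assumptions, the first entry would refer to the file | |
| .IR feats/utt001.mfc . | |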
| .PP | |
| If you are using acoustic feature files as input (see | |
| .BR sphinx_fe (1) | |
| for information on how to generate these), you can also specify a subpart | |
| of a file, using the following format: | |
| .PP | |
| .RS | |
| .B FILENAME START\-FRAME END\-FRAME UTTERANCE-ID | |
| .RE | |
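| .PP | |
| For instance, assuming a hypothetical feature file | |
| .I rec01 | |
| under the | |
| .B \-cepdir | |
| directory, the following entries would split it into two utterances by frame range (the frame numbers and utterance IDs are made up for illustration): | |
| .PP | |
| .RS | |
| .nf | |
| rec01 0 499 rec01_part1 | |
| rec01 500 1199 rec01_part2 | |
| .fi | |
| .RE | |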
| .SH AUTHOR | |
| Written by numerous people at CMU from 1994 onwards. This manual page | |
| by David Huggins-Daines <dhdaines@gmail.com>. | |
| .SH COPYRIGHT | |
| Copyright \(co 1994-2016 Carnegie Mellon University. See the file | |
| \fILICENSE\fR included with this package for more information. | |
| .br | |
| .SH "SEE ALSO" | |
| .BR pocketsphinx_continuous (1), | |
| .BR sphinx_fe (1). | |
| .br | |