	.TH SPHINX_FE 1 "2007-08-27"
	.SH NAME
	sphinx_fe \- Convert audio files to acoustic feature files
	.SH SYNOPSIS
	.B sphinx_fe
	[\fI options \fR]...
	.SH DESCRIPTION
	.PP
	This program converts audio files (in Microsoft WAV, NIST Sphere, or
	raw format) to acoustic feature files for input to batch-mode speech
	recognition. The resulting feature files can also be used for other
	purposes. The options are as follows:
	.TP
	.B \-alpha
	Preemphasis parameter
	.TP
	.B \-argfile
	File (e.g. feat.params from an acoustic model) to read parameters from. These parameters override anything set in other command-line arguments.
	.TP
	.B \-blocksize
	Number of samples to read at a time.
	.TP
	.B \-build_outdirs
	Create missing subdirectories in output directory
	.TP
	.B \-c
	Control file for batch processing
	.TP
	.B \-cep2spec
	Input is cepstral files, output is log spectral files
	.TP
	.B \-di
	Input directory; input file names are relative to this, if defined
	.TP
	.B \-dither
	Add 1/2-bit noise
	.TP
	.B \-do
	Output directory; output file names are relative to this
	.TP
	.B \-doublebw
	Use double-bandwidth filters (same center frequency)
	.TP
	.B \-ei
	Extension to be applied to all input files
	.TP
	.B \-eo
	Extension to be applied to all output files
	.TP
	.B \-example
	Shows example of how to use the tool
	.TP
	.B \-frate
	Frame rate
	.TP
	.B \-help
	Shows the usage of the tool
	.TP
	.B \-i
	Single audio input file
	.TP
	.B \-input_endian
	Endianness of input data, big or little; ignored for NIST Sphere or Microsoft WAV input
	.TP
	.B \-lifter
	Length of sine curve for liftering, or 0 for no liftering
	.TP
	.B \-logspec
	Write out log spectral files instead of cepstra
	.TP
	.B \-lowerf
	Lower edge of filters
	.TP
	.B \-mach_endian
	Endianness of machine, big or little
	.TP
	.B \-mswav
	Defines input format as Microsoft WAV (RIFF)
	.TP
	.B \-ncep
	Number of cepstral coefficients
	.TP
	.B \-nchans
	Number of channels of data (interleaved samples assumed)
	.TP
	.B \-nfft
	Size of FFT
	.TP
	.B \-nfilt
	Number of filter banks
	.TP
	.B \-nist
	Defines input format as NIST Sphere
	.TP
	.B \-npart
	Number of parts to run in (supersedes \fB\-nskip\fR and \fB\-runlen\fR if non-zero)
	.TP
	.B \-nskip
	If a control file was specified, the number of utterances to skip at the head of the file
	.TP
	.B \-o
	Single cepstral output file
	.TP
	.B \-ofmt
	Format of output files: one of sphinx, htk, or text
	.TP
	.B \-part
	Index of the part to run (supersedes \fB\-nskip\fR and \fB\-runlen\fR if non-zero)
	.TP
	.B \-raw
	Defines input format as raw binary data
	.TP
	.B \-remove_dc
	Remove DC offset from each frame
	.TP
	.B \-remove_noise
	Remove noise with spectral subtraction in mel-energies
	.TP
	.B \-round_filters
	Round mel filter frequencies to DFT points
	.TP
	.B \-runlen
	If a control file was specified, the number of utterances to process, or \fB\-1\fR for all
	.TP
	.B \-samprate
	Sampling rate
	.TP
	.B \-seed
	Seed for random number generator; if less than zero, a seed is chosen automatically
	.TP
	.B \-smoothspec
	Write out cepstral-smoothed log spectral files
	.TP
	.B \-spec2cep
	Input is log spectral files, output is cepstral files
	.TP
	.B \-sph2pipe
	Input is NIST Sphere (possibly with Shorten compression); use sph2pipe to convert
	.TP
	.B \-transform
	Which type of transform to use to calculate cepstra (legacy, dct, or htk)
	.TP
	.B \-unit_area
	Normalize mel filters to unit area
	.TP
	.B \-upperf
	Upper edge of filters
	.TP
	.B \-verbose
	Show input filenames
	.TP
	.B \-warp_params
	Parameters defining the warping function
	.TP
	.B \-warp_type
	Warping function type (or shape)
	.TP
	.B \-whichchan
	Channel to process (numbered from 1), or 0 to mix all channels
	.TP
	.B \-wlen
	Hamming window length
	.PP
	Currently, MFCCs (mel-frequency cepstral coefficients) are the only
	kind of feature supported. Numerous options control the
	properties of the output features. It is \fBVERY\fR important that
	you document the specific set of flags used to create any given set of
	feature files, since this information is \fBNOT\fR recorded in the
	files themselves, and any mismatch between the parameters used to
	extract features for recognition and those used to extract features
	for training will cause recognition to fail.
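	.SH EXAMPLES
	.PP
	One way to keep extraction parameters consistent between training and
	recognition is to store them in a single parameter file and pass it
	with \fB\-argfile\fR every time features are extracted. The following
	batch invocation is a minimal sketch, not a definitive recipe. It
	assumes that boolean options take an explicit \fByes\fR or \fBno\fR
	value and that the control file lists one input file name per line,
	without extension. The names \fIfeat.params\fR, \fItrain.fileids\fR,
	\fIwav\fR, and \fIfeat\fR are hypothetical.
	.PP
	.nf
	sphinx_fe \-argfile feat.params \-c train.fileids \e
	    \-di wav \-do feat \-ei wav \-eo mfc \-mswav yes
	.fi
	.PP
	Keeping \fIfeat.params\fR alongside the resulting feature files gives
	a record of the flags used to create them.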
	.SH AUTHOR
	Written by numerous people at CMU from 1994 onwards. This manual page
	by David Huggins-Daines <dhdaines@gmail.com>
	.SH COPYRIGHT
	Copyright \(co 1994-2007 Carnegie Mellon University. See the file
	\fICOPYING\fR included with this package for more information.
	.br