	---
	library_name: pytorch
	license: mit
	pipeline_tag: automatic-speech-recognition
	language:
	- vi
	- en
	tags:
	- automatic-speech-recognition
	- invoice-extraction
	- speech
	---

	# ASR + Invoice Extraction Server

	A standalone package of `Server_conformer.py` that transcribes audio and extracts invoice JSON from the resulting transcript. For convenience, this folder includes a copy of the trained RNNT checkpoint.

	## What’s inside
	- `Server_conformer.py`, `Speech2text.py`, `InformationExtractor.py`
	- `chunkformer/` code
	- `chunkformer-model/` (trained RNNT checkpoint)
	- `requirements.txt`

	## Prerequisites
	- Python 3.9+ and a CUDA GPU (needed in practice for Qwen invoice extraction; on CPU it will be extremely slow)
	- Hugging Face token with access to the models you use (`HF_TOKEN`)
	- Chunkformer RNNT checkpoint available at `chunkformer-model` (copied into this folder). Update `CHUNKFORMER_MODEL_PATH` if you place it elsewhere.
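
The prerequisites above can be checked before starting the server. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes PyTorch is importable as `torch` once `requirements.txt` is installed.

```python
import os
import sys
from pathlib import Path


def check_prerequisites(env=None, min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)

    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")

    # The fallback mirrors the `chunkformer-model` folder named in this README.
    model_dir = Path(env.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model"))
    if not model_dir.is_dir():
        problems.append(f"checkpoint directory not found: {model_dir}")

    try:
        import torch  # assumed to be installed via requirements.txt

        if not torch.cuda.is_available():
            problems.append("no CUDA GPU detected; invoice extraction will be very slow")
    except ImportError:
        problems.append("PyTorch is not installed")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the script from the project folder prints one warning per unmet prerequisite and nothing when every check passes.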

	## Setup
	```bash
	cd Speech2Invoice
	python3 -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	```

	## Configure environment
	Create a `.env` file (or export the variables in your shell) with at least:
	```
	PORT=8000
	USE_NGROK=false
	HF_TOKEN=your_hf_token_here
	CHUNKFORMER_MODEL_PATH=chunkformer-model
	LOG_LEVEL=DEBUG
	DEBUG=true

	# Optional ngrok
	NGROK_AUTHTOKEN=
	NGROK_REGION=ap

	# Optional invoice LLM overrides (defaults are fast)
	IE_LLM_MODEL_ID=Qwen/Qwen1.5-7B-Chat
	IE_MAX_NEW_TOKENS=256
	IE_DO_SAMPLE=false
	IE_TEMPERATURE=0.0
	IE_TOP_P=0.8
	```

	If you move the model elsewhere, set `CHUNKFORMER_MODEL_PATH` to that directory.
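
To illustrate how these variables might be turned into typed settings, here is a minimal sketch. It is not the code in `Server_conformer.py`: `ServerConfig` and `load_config` are hypothetical names, and the fallback values simply mirror the example above, so the server's real defaults may differ. It reads from the process environment only; a `.env` file would have to be loaded into the environment first.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class ServerConfig:
    port: int
    use_ngrok: bool
    hf_token: str
    model_path: str
    log_level: str
    ie_model_id: str
    ie_max_new_tokens: int
    ie_do_sample: bool
    ie_temperature: float
    ie_top_p: float


def load_config(env=None):
    """Build a ServerConfig from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Fallbacks below are assumptions copied from the README example.
    return ServerConfig(
        port=int(env.get("PORT", "8000")),
        use_ngrok=_as_bool(env.get("USE_NGROK", "false")),
        hf_token=env.get("HF_TOKEN", ""),
        model_path=env.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model"),
        log_level=env.get("LOG_LEVEL", "DEBUG"),
        ie_model_id=env.get("IE_LLM_MODEL_ID", "Qwen/Qwen1.5-7B-Chat"),
        ie_max_new_tokens=int(env.get("IE_MAX_NEW_TOKENS", "256")),
        ie_do_sample=_as_bool(env.get("IE_DO_SAMPLE", "false")),
        ie_temperature=float(env.get("IE_TEMPERATURE", "0.0")),
        ie_top_p=float(env.get("IE_TOP_P", "0.8")),
    )
```

Passing a plain dictionary, for example `load_config({"PORT": "9000"})`, makes the parsing easy to try without touching the real environment.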

	## Run
	```bash
	python3 Server_conformer.py
	```

	## Endpoints
	- `POST /transcribe`: `multipart/form-data` upload containing an audio file (`wav`, `mp3`, `m4a`, `ogg`, `webm`). Returns JSON with `final_result` and `full_transcription`.
	- `POST /ticket`: JSON body `{"full_transcription": "<text>"}`. Returns the invoice JSON inferred by Qwen.
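
The two endpoints could be called from Python roughly as follows, using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the base URL assumes a local server with `PORT=8000`, and the multipart field name `file` is a guess, since the field the server expects is not documented here.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: local server started with PORT=8000


def build_ticket_request(transcript, base_url=BASE_URL):
    """Build a POST /ticket request carrying the transcript as JSON."""
    body = json.dumps({"full_transcription": transcript}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ticket",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_transcribe_request(audio_bytes, filename, base_url=BASE_URL, field_name="file"):
    """Build a POST /transcribe request with one multipart/form-data file part.

    `field_name` is an assumption; adjust it to whatever the server expects.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, the `full_transcription` value returned by `send(build_transcribe_request(audio_bytes, "sample.wav"))` can be passed to `build_ticket_request` to obtain the invoice JSON.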

	## Notes
	- The invoice extractor needs a GPU and downloads its model from Hugging Face on first run. For faster inference, point `IE_LLM_MODEL_ID` at a smaller model.
	- Model weights for the RNNT checkpoint are included in `chunkformer-model/`. Because these files are large, consider Git LFS if you plan to push this folder to a remote.

	## Contact

	For questions or controlled access requests to Speech2Invoice:

	* Duc Dat Pham
	* Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)