---
library_name: pytorch
license: mit
pipeline_tag: automatic-speech-recognition
language:
- vi
- en
tags:
- automatic-speech-recognition
- invoice-extraction
- speech
---
# ASR + Invoice Extraction Server
Standalone packaging of `Server_conformer.py`, which transcribes audio and extracts invoice JSON from the transcript text. A copy of the trained RNNT checkpoint is included in this folder for convenience.
## What’s inside
- `Server_conformer.py`, `Speech2text.py`, `InformationExtractor.py`
- `chunkformer/` code
- `chunkformer-model/`
- `requirements.txt`
## Prerequisites
- Python 3.9+ and a CUDA GPU. The GPU is needed for Qwen invoice extraction, which is extremely slow on CPU.
- Hugging Face token with access to the models you use (`HF_TOKEN`)
- Chunkformer RNNT checkpoint in `chunkformer-model/` (a copy ships in this folder). Set `CHUNKFORMER_MODEL_PATH` if you place it elsewhere.
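
The prerequisites above can be checked before starting the server. The following is a minimal sketch and not part of the repository: `check_prerequisites` and `cuda_available` are hypothetical helper names, and the fallback path `chunkformer-model` is taken from the sample configuration rather than from the server code.

```python
import os
import sys
from pathlib import Path
from typing import List, Mapping


def check_prerequisites(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9 or newer is required")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    # Assumption: the checkpoint lives in a directory, defaulting to ./chunkformer-model.
    model_dir = Path(env.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model"))
    if not model_dir.is_dir():
        problems.append(f"checkpoint directory not found: {model_dir}")
    return problems


def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA device (False if torch is missing)."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()
```

Calling `check_prerequisites()` with no arguments inspects the current process environment; passing a plain dictionary makes the check easy to try without exporting anything.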
## Setup
```bash
cd Speech2Invoice
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Configure environment
Create a `.env` file (or export the variables in your shell) with at least:
```
PORT=8000
USE_NGROK=false
HF_TOKEN=your_hf_token_here
CHUNKFORMER_MODEL_PATH=chunkformer-model
LOG_LEVEL=DEBUG
DEBUG=true
# Optional ngrok
NGROK_AUTHTOKEN=
NGROK_REGION=ap
# Optional invoice LLM overrides (defaults are fast)
IE_LLM_MODEL_ID=Qwen/Qwen1.5-7B-Chat
IE_MAX_NEW_TOKENS=256
IE_DO_SAMPLE=false
IE_TEMPERATURE=0.0
IE_TOP_P=0.8
```
If you move the model elsewhere, set `CHUNKFORMER_MODEL_PATH` to that directory.
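
Every value in the environment arrives as a string, so flags such as `USE_NGROK=false` and numbers such as `IE_TEMPERATURE=0.0` need explicit parsing. The sketch below shows one way to do that. It is an illustration, not the code in `Server_conformer.py`: `Settings`, `load_settings`, and `parse_bool` are hypothetical names, and the fallback values simply mirror the sample above. It reads the process environment only and does not parse a `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    port: int
    use_ngrok: bool
    hf_token: str
    model_path: str
    llm_model_id: str
    max_new_tokens: int
    do_sample: bool
    temperature: float
    top_p: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Assumption: fallbacks copy the sample .env; the server's real defaults may differ.
    return Settings(
        port=int(env.get("PORT", "8000")),
        use_ngrok=parse_bool(env.get("USE_NGROK", "false")),
        hf_token=env.get("HF_TOKEN", ""),
        model_path=env.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model"),
        llm_model_id=env.get("IE_LLM_MODEL_ID", "Qwen/Qwen1.5-7B-Chat"),
        max_new_tokens=int(env.get("IE_MAX_NEW_TOKENS", "256")),
        do_sample=parse_bool(env.get("IE_DO_SAMPLE", "false")),
        temperature=float(env.get("IE_TEMPERATURE", "0.0")),
        top_p=float(env.get("IE_TOP_P", "0.8")),
    )
```

Note that with `IE_DO_SAMPLE=false`, Hugging Face `generate` decodes greedily, so `IE_TEMPERATURE` and `IE_TOP_P` only matter once sampling is switched on.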
## Run
```bash
python3 Server_conformer.py
```
## Endpoints
- `POST /transcribe`: `multipart/form-data` request with an audio file (`wav`, `mp3`, `m4a`, `ogg`, or `webm`). Returns JSON with `final_result` and `full_transcription`.
- `POST /ticket`: JSON body `{"full_transcription": "<text>"}`. Returns the invoice JSON that Qwen infers from the transcript.
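
A client for the two endpoints above might look like the following sketch, which uses only the Python standard library. Several details are assumptions: the base URL presumes a local server on `PORT=8000`, the multipart field name `file` is a guess (check `Server_conformer.py` for the name the server actually expects), and the helper names are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumption: local server with PORT=8000


def build_ticket_request(transcript: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the JSON POST for /ticket."""
    body = json.dumps({"full_transcription": transcript}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ticket",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_transcribe_request(
    audio_path: str, field_name: str = "file", base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the multipart/form-data POST for /transcribe.

    field_name="file" is an assumption, not confirmed by the README.
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=head + path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Typical flow (requires a running server and a local audio file):
#   result = send(build_transcribe_request("sample.wav"))
#   invoice = send(build_ticket_request(result["full_transcription"]))
```

The two steps are kept separate because `/ticket` only needs the transcript text, so a corrected or hand-written transcript can be submitted without re-running speech recognition.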
## Notes
- The invoice extractor requires a GPU and downloads the LLM weights from Hugging Face on first run. For faster inference, set `IE_LLM_MODEL_ID` to a smaller model.
- Model weights for the RNNT checkpoint are included in `chunkformer-model/`. If you plan to push to a remote, consider Git LFS for the large files.
## Contact
For questions or controlled access requests to Speech2Invoice:
- Duc Dat Pham
- Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)