TeleAgent / README.md
S-K-yadav's picture
the model tries to cache on startup to boost performance
3e394c4
|
Raw
History Blame Contribute Delete
2.61 kB
---
title: TeleAgent
emoji: "๐Ÿ“ž"
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: "1.0"
app_file: app.py
pinned: false
---
# TeeleAgentHF
TeeleAgentHF is an AI-powered telecalling agent built for a Hugging Face competition. It captures live audio, transcribes speech, extracts scheduling intent, evaluates feasibility, and confirms bookings. Designed for low-VRAM deployment (4GB budget) and Hugging Face Spaces.
## Key Features
- Real-time microphone capture with Gradio UI
- ASR: Hugging Face Moonshine (streaming)
- Intent parsing: Qwen2.5-7B-Instruct (GGUF via llama-cpp-python)
- Evaluation: MiniCPM3-4B (int4 quantized evaluator)
- VAD: Silero VAD (ONNX)
- Persistent bookings in SQLite (`data/calls.db`)
- Scheduling rules and slot-checking logic
## Architecture
- `app.py`: Gradio front-end and session controls
- `pipeline/`: transcriber, intent parser, evaluator, orchestrator, VAD listener
- `config.py` & `hf_config.json`: model and inference configuration
- `data/calls.db` and `db.py`: call logging and booking persistence
## Requirements
- Python 3.10+ (3.11 recommended)
- CUDA-capable GPU for llama-cpp-python Qwen inference (recommended)
- Install dependencies: `pip install -r requirements.txt`
- Note: `llama-cpp-python` may require a CUDA-enabled build. Example:
```bash
CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUBLAS=on" pip install -U "llama-cpp-python"
```
## Running Locally
1. Create and activate a virtual environment
2. Install dependencies: `pip install -r requirements.txt`
3. Ensure models referenced in `hf_config.json` are available or accessible via Hugging Face
4. Start the app:
```bash
python app.py
```
5. Open http://127.0.0.1:7860 in a browser
## Deployment (Hugging Face Spaces)
- Ensure `app.py` listens on 0.0.0.0:7860 (config.py already uses these defaults)
- For Moonshine ASR, leave `TRANSCRIBE_LOCAL_ONLY` unset or set it to `0` in the environment so the model can be downloaded automatically on first run.
- Provide model files or configure download/autoload in `hf_config.json`
- Verify VRAM budget and use quantized GGUF models to fit resource limits
## Configuration
- Edit `config.py` and `hf_config.json` to tune models, quantization, batch sizes, and scheduling rules (working hours, slot lengths, etc.)
## Collaborators
- Saurav Kumar Yadav <sauravkumaryadav100@gmail.com>
## Contributing
- Open issues or PRs. For large model changes, include resource and runtime notes.
## License
See LICENSE in the repository root.
## Contact
For questions about this project, contact the repository owner or listed collaborators.