Spaces:

build-small-hackathon
/

TeleAgent

Running

App Files Files Community

TeleAgent / README.md

S-K-yadav

the model tries to cache on startup to boost performance

3e394c4 17 days ago

preview code

Raw

History Blame Contribute Delete

2.61 kB

metadata

title: TeleAgent
emoji: 📞
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: '1.0'
app_file: app.py
pinned: false

TeeleAgentHF

TeeleAgentHF is an AI-powered telecalling agent built for a Hugging Face competition. It captures live audio, transcribes speech, extracts scheduling intent, evaluates feasibility, and confirms bookings. Designed for low-VRAM deployment (4GB budget) and Hugging Face Spaces.

Key Features

Real-time microphone capture with Gradio UI
ASR: Hugging Face Moonshine (streaming)
Intent parsing: Qwen2.5-7B-Instruct (GGUF via llama-cpp-python)
Evaluation: MiniCPM3-4B (int4 quantized evaluator)
VAD: Silero VAD (ONNX)
Persistent bookings in SQLite (data/calls.db)
Scheduling rules and slot-checking logic

Architecture

app.py: Gradio front-end and session controls
pipeline/: transcriber, intent parser, evaluator, orchestrator, VAD listener
config.py & hf_config.json: model and inference configuration
data/calls.db and db.py: call logging and booking persistence

Requirements

Python 3.10+ (3.11 recommended)
CUDA-capable GPU for llama-cpp-python Qwen inference (recommended)
Install dependencies: pip install -r requirements.txt

Note: llama-cpp-python may require a CUDA-enabled build. Example:

CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUBLAS=on" pip install -U "llama-cpp-python"

Running Locally

Create and activate a virtual environment
Install dependencies: pip install -r requirements.txt
Ensure models referenced in hf_config.json are available or accessible via Hugging Face
Start the app:
```
python app.py
```
Open http://127.0.0.1:7860 in a browser

Deployment (Hugging Face Spaces)

Ensure app.py listens on 0.0.0.0:7860 (config.py already uses these defaults)
For Moonshine ASR, leave TRANSCRIBE_LOCAL_ONLY unset or set it to 0 in the environment so the model can be downloaded automatically on first run.
Provide model files or configure download/autoload in hf_config.json
Verify VRAM budget and use quantized GGUF models to fit resource limits

Configuration

Edit config.py and hf_config.json to tune models, quantization, batch sizes, and scheduling rules (working hours, slot lengths, etc.)

Collaborators

Saurav Kumar Yadav sauravkumaryadav100@gmail.com

Contributing

Open issues or PRs. For large model changes, include resource and runtime notes.

License

See LICENSE in the repository root.

Contact

For questions about this project, contact the repository owner or listed collaborators.