metadata
title: TeleAgent
emoji: 📞
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: '1.0'
app_file: app.py
pinned: false

TeleAgentHF

TeleAgentHF is an AI-powered telecalling agent built for a Hugging Face competition. It captures live audio, transcribes speech, extracts scheduling intent, evaluates feasibility, and confirms bookings. It is designed for low-VRAM deployment (4GB budget) and for hosting on Hugging Face Spaces.

Key Features

  • Real-time microphone capture with Gradio UI
  • ASR: Hugging Face Moonshine (streaming)
  • Intent parsing: Qwen2.5-7B-Instruct (GGUF via llama-cpp-python)
  • Evaluation: MiniCPM3-4B (int4 quantized evaluator)
  • VAD: Silero VAD (ONNX)
  • Persistent bookings in SQLite (data/calls.db)
  • Scheduling rules and slot-checking logic
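
As an illustration of the SQLite persistence feature, the following is a minimal sketch of how a booking could be stored and double-booking rejected. The table name, column names, and the helpers `open_db` and `book_slot` are hypothetical; the actual schema in db.py may differ.

```python
import sqlite3

# Hypothetical schema: the real table layout in db.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bookings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    caller TEXT NOT NULL,
    slot_start TEXT NOT NULL UNIQUE,  -- ISO 8601 timestamp
    status TEXT NOT NULL DEFAULT 'confirmed'
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def book_slot(conn, caller, slot_start):
    """Insert a booking; return False if the slot is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (caller, slot_start) VALUES (?, ?)",
                (caller, slot_start),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on slot_start rejected a duplicate slot.
        return False


conn = open_db()  # pass "data/calls.db" instead to persist to disk
first = book_slot(conn, "Alice", "2025-01-06T10:00")
second = book_slot(conn, "Bob", "2025-01-06T10:00")  # same slot, rejected
print(first, second)
```

Letting a UNIQUE constraint reject the duplicate keeps the check and the write in a single statement, so two concurrent calls cannot both claim the same slot.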

Architecture

  • app.py: Gradio front-end and session controls
  • pipeline/: transcriber, intent parser, evaluator, orchestrator, VAD listener
  • config.py & hf_config.json: model and inference configuration
  • data/calls.db and db.py: call logging and booking persistence
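
The flow through the pipeline stages listed above can be sketched roughly as follows. This is only an illustration of the transcribe, parse, evaluate, confirm sequence: `Intent`, `CallResult`, `run_turn`, and the stub functions are hypothetical names, and the stubs stand in for the real Moonshine, Qwen, and MiniCPM3 components.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    action: str           # e.g. "book"
    slot: Optional[str]   # requested time, or None if none was understood


@dataclass
class CallResult:
    transcript: str
    intent: Intent
    feasible: bool
    reply: str


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    parse_intent: Callable[[str], Intent],
    evaluate: Callable[[Intent], bool],
) -> CallResult:
    """Run one caller utterance through the three stages in order."""
    transcript = transcribe(audio)
    intent = parse_intent(transcript)
    # Only ask the evaluator when a slot was actually extracted.
    feasible = intent.slot is not None and evaluate(intent)
    if feasible:
        reply = f"Booked {intent.slot}."
    else:
        reply = "That time is not available."
    return CallResult(transcript, intent, feasible, reply)


# Stub stages (assumptions for the example, not the project's models).
def fake_transcribe(audio: bytes) -> str:
    return "book me for 10 am on Monday"


def fake_parse(text: str) -> Intent:
    slot = "Monday 10:00" if "10 am" in text else None
    return Intent(action="book", slot=slot)


def fake_evaluate(intent: Intent) -> bool:
    return intent.slot == "Monday 10:00"


result = run_turn(b"\x00\x01", fake_transcribe, fake_parse, fake_evaluate)
print(result.reply)
```

Passing the stages in as callables keeps the orchestration logic testable without loading any model weights.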

Requirements

  • Python 3.10+ (3.11 recommended)
  • CUDA-capable GPU for llama-cpp-python Qwen inference (recommended)
  • Install dependencies: pip install -r requirements.txt
  • Note: llama-cpp-python may require a CUDA-enabled build. Example:
    CMAKE_ARGS="-DGGML_CUDA=on" pip install -U llama-cpp-python

Running Locally

  1. Create and activate a virtual environment
  2. Install dependencies: pip install -r requirements.txt
  3. Ensure models referenced in hf_config.json are available or accessible via Hugging Face
  4. Start the app:
    python app.py
    
  5. Open http://127.0.0.1:7860 in a browser

Deployment (Hugging Face Spaces)

  • Ensure app.py listens on 0.0.0.0:7860 (config.py already uses these defaults)
  • For Moonshine ASR, leave TRANSCRIBE_LOCAL_ONLY unset or set it to 0 in the environment so the model can be downloaded automatically on first run.
  • Provide model files or configure download/autoload in hf_config.json
  • Verify VRAM budget and use quantized GGUF models to fit resource limits
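
For illustration, the deployment settings above could be read from the environment along these lines. This is a sketch, not the project's config.py: the `HOST` and `PORT` variable names and the `read_settings` helper are assumptions, and treating "1", "true", or "yes" as enabling `TRANSCRIBE_LOCAL_ONLY` is a guess at how the flag is parsed.

```python
import os


def read_settings(env=None):
    """Collect bind address and ASR download behaviour from the environment."""
    env = os.environ if env is None else env
    # HOST and PORT are hypothetical names; the defaults match the README.
    host = env.get("HOST", "0.0.0.0")
    port = int(env.get("PORT", "7860"))
    # Unset or "0" means the ASR model may be downloaded on first run.
    raw = env.get("TRANSCRIBE_LOCAL_ONLY", "0").strip().lower()
    local_only = raw in {"1", "true", "yes"}
    return {"host": host, "port": port, "local_only": local_only}


defaults = read_settings({})
offline = read_settings({"TRANSCRIBE_LOCAL_ONLY": "1"})
print(defaults)
print(offline["local_only"])
```

Accepting an explicit mapping instead of always reading `os.environ` makes the parsing easy to check without touching the real environment.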

Configuration

  • Edit config.py and hf_config.json to tune models, quantization, batch sizes, and scheduling rules (working hours, slot lengths, etc.)
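
To make the scheduling rules concrete, here is a minimal sketch of a slot feasibility check. The constants `WORK_START`, `WORK_END`, and `SLOT_MINUTES`, the weekday-only rule, and the function `is_slot_feasible` are illustrative assumptions; the actual rules live in config.py and may be structured differently.

```python
from datetime import datetime, time, timedelta

# Illustrative values only; the real ones are set in config.py.
WORK_START = time(9, 0)
WORK_END = time(17, 0)
SLOT_MINUTES = 30  # assumed to divide 60 evenly


def is_slot_feasible(start: datetime, booked: set) -> bool:
    """Return True if a slot starting at `start` can be booked."""
    end = start + timedelta(minutes=SLOT_MINUTES)
    if start.weekday() >= 5:  # Saturday or Sunday (assumed closed)
        return False
    # Slots must begin on a slot boundary, e.g. :00 or :30.
    if start.minute % SLOT_MINUTES or start.second or start.microsecond:
        return False
    # The whole slot must fall inside working hours on the same day.
    if start.time() < WORK_START or end.time() > WORK_END:
        return False
    if end.date() != start.date():
        return False
    return start not in booked


monday_10 = datetime(2025, 1, 6, 10, 0)  # a Monday
booked = {monday_10}
print(is_slot_feasible(monday_10, set()))                        # free slot
print(is_slot_feasible(monday_10, booked))                       # already taken
print(is_slot_feasible(datetime(2025, 1, 6, 17, 0), set()))      # after hours
```

The last bookable slot under these values starts at 16:30, since it ends exactly at the close of working hours.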

Collaborators

Contributing

  • Open issues or PRs. For large model changes, include resource and runtime notes.

License

See LICENSE in the repository root.

Contact

For questions about this project, contact the repository owner or listed collaborators.