---
title: TeleAgent
emoji: "📞"
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: "1.0"
app_file: app.py
pinned: false
---

# TeeleAgentHF

TeeleAgentHF is an AI-powered telecalling agent built for a Hugging Face competition. It captures live audio, transcribes speech, extracts scheduling intent, evaluates feasibility, and confirms bookings. Designed for low-VRAM deployment (4GB budget) and Hugging Face Spaces.

## Key Features

- Real-time microphone capture with Gradio UI
- ASR: Hugging Face Moonshine (streaming)
- Intent parsing: Qwen2.5-7B-Instruct (GGUF via llama-cpp-python)
- Evaluation: MiniCPM3-4B (int4 quantized evaluator)
- VAD: Silero VAD (ONNX)
- Persistent bookings in SQLite (`data/calls.db`)
- Scheduling rules and slot-checking logic

## Architecture

- `app.py`: Gradio front-end and session controls
- `pipeline/`: transcriber, intent parser, evaluator, orchestrator, VAD listener
- `config.py` & `hf_config.json`: model and inference configuration
- `data/calls.db` and `db.py`: call logging and booking persistence

## Requirements

- Python 3.10+ (3.11 recommended)
- CUDA-capable GPU for llama-cpp-python Qwen inference (recommended)
- Install dependencies: `pip install -r requirements.txt`
- Note: `llama-cpp-python` may require a CUDA-enabled build. Example:

  ```bash
  CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUBLAS=on" pip install -U "llama-cpp-python"
  ```

## Running Locally

1. Create and activate a virtual environment
2. Install dependencies: `pip install -r requirements.txt`
3. Ensure models referenced in `hf_config.json` are available or accessible via Hugging Face
4. Start the app:

   ```bash
   python app.py
   ```

5. Open http://127.0.0.1:7860 in a browser

## Deployment (Hugging Face Spaces)

- Ensure `app.py` listens on 0.0.0.0:7860 (config.py already uses these defaults)
- For Moonshine ASR, leave `TRANSCRIBE_LOCAL_ONLY` unset or set it to `0` in the environment so the model can be downloaded automatically on first run.
- Provide model files or configure download/autoload in `hf_config.json`
- Verify VRAM budget and use quantized GGUF models to fit resource limits

## Configuration

- Edit `config.py` and `hf_config.json` to tune models, quantization, batch sizes, and scheduling rules (working hours, slot lengths, etc.)

## Collaborators

- Saurav Kumar Yadav

## Contributing

- Open issues or PRs. For large model changes, include resource and runtime notes.

## License

See LICENSE in the repository root.

## Contact

For questions about this project, contact the repository owner or listed collaborators.
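
## Example: Runtime Settings Sketch

The deployment notes mention two runtime settings: the 0.0.0.0:7860 binding and the `TRANSCRIBE_LOCAL_ONLY` flag. The sketch below shows one way such settings could be resolved from the environment. It is not the project's actual `config.py`. The `RuntimeSettings` class, the `load_runtime_settings` helper, and the `HOST` and `PORT` variable names are hypothetical. Treating any value other than unset, empty, or `0` as "local only" is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:
    """Hypothetical container for the settings discussed in this README."""
    host: str
    port: int
    transcribe_local_only: bool


def load_runtime_settings(env: Optional[Mapping[str, str]] = None) -> RuntimeSettings:
    """Resolve server and ASR settings from environment variables."""
    env = os.environ if env is None else env
    # HOST and PORT are assumed names; the defaults mirror the
    # 0.0.0.0:7860 binding described under Deployment.
    host = env.get("HOST", "0.0.0.0")
    port = int(env.get("PORT", "7860"))
    # Unset, empty, or "0" leaves automatic model download enabled.
    # Any other value is treated as "local only" (an assumption).
    raw = env.get("TRANSCRIBE_LOCAL_ONLY", "0").strip()
    return RuntimeSettings(host, port, raw not in ("", "0"))


if __name__ == "__main__":
    print(load_runtime_settings())
```

In a Gradio app, the resolved host and port could then be passed to `launch(server_name=..., server_port=...)`.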