---
title: Samantic Audio
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.18.0
python_version: '3.10'
app_file: app.py
pinned: false
tags:
  - track:wood
  - sponsor:openbmb
  - sponsor:openai
  - achievement:offgrid
  - bonus:tiny-titan
---

# Samantic Audio

A semantic-to-audio communication system featuring a local robotic AI. The application uses an LLM to generate a response, converts that response to speech with the `bark-small` TTS model, and passes the audio through a custom synthesizer to create a unique "droid-style" voice.

## Links

| Resource | Link |
| :--- | :--- |
| **Demo Video** | [YouTube](https://youtu.be/ID0IG2BIBRo) |
| **Social Post** | [X Post](https://x.com/GovindLokam/status/2066625722891026578?s=20) |
| **Social Post** | [LinkedIn Post](https://www.linkedin.com/posts/govind-lokam-335382230_ai-llm-generativeai-share-7472390184009895936-6YOz/?utm_source=share&utm_medium=member_desktop&rcm=ACoAADm4O1kBjwPZCkmLC-YSFR8At-SNQhkj4XY) |
| **GitHub Repository** | [GovIndLok/Voinal](https://github.com/GovIndLok/Voinal) |

## Hackathon Badges Claimed

| Category | Badge Name (Tag) | Description / Justification |
| :--- | :--- | :--- |
| **Track** | Thousand Token Wood (`track:wood`) | Whimsical, delightful, AI-native app |
| **Sponsor** | MiniCPM Build (`sponsor:openbmb`) | Uses MiniCPM5-1B as the LLM |
| **Sponsor** | Codex (`sponsor:openai`) | Codex-attributed commits in the connected GitHub repo |
| **Achievement** | Off the Grid (`achievement:offgrid`) | Both models run in-Space on ZeroGPU; no external AI API is called. The app can also be run locally, as shown in the demo |
| **Bonus** | Tiny-titan (`bonus:tiny-titan`) | Models must be ≤ 4B parameters: 1B (MiniCPM5) + ~240M (bark-small) ≤ 4B |

## Features

- **Local LLM Integration**: Uses `MiniCPM5-1B` to generate responses locally.
- **Text-to-Speech**: Uses the `bark-small` TTS model for voice generation.
- **Custom Audio Synthesis**: The `synth.py` pipeline transforms audio into a droid-style output.
- **Web Interface**: A Gradio web UI for chatting with the robot.

## Requirements

- Python >= 3.12

## Setup & Installation

Activate your virtual environment and install the required packages:

```bash
source .venv/bin/activate
# install dependencies from pyproject.toml
# (requires uv; plain pip's -r option does not accept pyproject.toml)
uv pip install -r pyproject.toml
# or install the dependencies manually:
pip install gradio numpy scipy torch soundfile
```

## Running the Application

To start the Gradio UI locally:

```bash
python app.py
```

The application will be served at `http://127.0.0.1:7860`.

## Project Structure

- `app.py`: Gradio entrypoint that serves the UI and runs the full pipeline.
- `synth.py`: Audio-processing pipeline that transforms standard WAVs into droid-style output.
- `tts_model.py`: `bark-small` TTS integration.
- `pyproject.toml`: Project metadata and dependencies.
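
## Example: Droid-Style Effect (Sketch)

This README does not describe what `synth.py` does internally, so the following is only a minimal sketch of one common way to give speech a robotic timbre: ring modulation followed by bit crushing. The function names (`ring_modulate`, `bit_crush`, `droidify`) and all parameter values are hypothetical and are not taken from the repository. The code uses only the standard library so it runs without the project's dependencies; a real pipeline would more likely operate on NumPy arrays.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz=60.0, mix=1.0):
    """Multiply each sample by a sine carrier (a classic robot-voice effect).

    samples: sequence of floats in [-1.0, 1.0]
    mix: 0.0 returns the dry signal, 1.0 the fully modulated signal
    """
    out = []
    for n, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        out.append((1.0 - mix) * x + mix * x * carrier)
    return out


def bit_crush(samples, bits=6):
    """Quantize samples onto a coarse grid for a gritty, lo-fi texture."""
    levels = 2 ** (bits - 1)
    return [round(x * levels) / levels for x in samples]


def droidify(samples, sample_rate):
    """Hypothetical stand-in for synth.py: ring modulation, then bit crushing."""
    return bit_crush(ring_modulate(samples, sample_rate))


if __name__ == "__main__":
    sr = 8000
    # One second of a 220 Hz tone standing in for TTS output.
    tone = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / sr) for n in range(sr)]
    droid = droidify(tone, sr)
    print(len(droid), max(abs(v) for v in droid))
```

In this sketch, a low carrier frequency gives a slow metallic warble, while a higher one pushes the voice toward a buzzy, inharmonic tone. The `mix` parameter keeps some of the dry signal so the speech stays intelligible.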