---
title: Samantic Audio
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.18.0
python_version: '3.10'
app_file: app.py
pinned: false
tags:
  - track:wood
  - sponsor:openbmb
  - sponsor:openai
  - achievement:offgrid
  - bonus:tiny-titan
---

# Samantic Audio

Samantic Audio is a semantic-to-audio communication system built around a locally run robotic AI.

The application uses an LLM to generate a response, converts that response to speech with the `bark-small` TTS model, and passes the audio through a custom synthesizer to create a distinctive "droid-style" voice.
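
The three stages described above (LLM reply, text-to-speech, droid effect) can be pictured as a simple chain of functions. The sketch below is only an illustration of that flow, not the code in `app.py`: `DroidPipeline`, its field names, and the stub stages in the demo are hypothetical, and the real model calls are replaced by placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Audio is represented here as (sample_rate, samples); this shape is an assumption.
Audio = Tuple[int, List[float]]


@dataclass
class DroidPipeline:
    """Hypothetical wiring of the three stages; each stage is injected as a callable."""

    generate_reply: Callable[[str], str]    # stands in for the LLM
    text_to_speech: Callable[[str], Audio]  # stands in for the TTS model
    droid_effect: Callable[[Audio], Audio]  # stands in for the synthesizer

    def run(self, user_message: str) -> Tuple[str, Audio]:
        reply = self.generate_reply(user_message)  # 1. text response
        speech = self.text_to_speech(reply)        # 2. plain speech audio
        return reply, self.droid_effect(speech)    # 3. droid-style audio


if __name__ == "__main__":
    # Stub stages so the sketch runs without downloading any model.
    pipeline = DroidPipeline(
        generate_reply=lambda msg: f"Beep. You said: {msg}",
        text_to_speech=lambda text: (24_000, [0.0] * len(text)),
        droid_effect=lambda audio: (audio[0], [s * 0.5 for s in audio[1]]),
    )
    reply, (rate, samples) = pipeline.run("hello")
    print(reply, rate, len(samples))
```

Keeping each stage behind a plain callable makes it possible to swap a model or test the flow with stubs, which is the only design point this sketch is meant to show.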

## Links

| Resource | Link |
| :--- | :--- |
| **Demo Video** | [YouTube](https://youtu.be/ID0IG2BIBRo) |
| **Social Post** | [X Post](https://x.com/GovindLokam/status/2066625722891026578?s=20) |
| **Social Post** | [LinkedIn Post](https://www.linkedin.com/posts/govind-lokam-335382230_ai-llm-generativeai-share-7472390184009895936-6YOz/?utm_source=share&utm_medium=member_desktop&rcm=ACoAADm4O1kBjwPZCkmLC-YSFR8At-SNQhkj4XY) |
| **GitHub Repository** | [GovIndLok/Voinal](https://github.com/GovIndLok/Voinal) |

## Hackathon Badges Claimed

| Category | Badge Name (Tag) | Description / Justification |
| :--- | :--- | :--- |
| **Track** | Thousand Token Wood (`track:wood`) | Whimsical, delightful, AI-native app |
| **Sponsor** | MiniCPM Build (`sponsor:openbmb`) | Uses MiniCPM5-1B as the LLM |
| **Sponsor** | Codex (`sponsor:openai`) | Codex-attributed commits in the connected GitHub repo |
| **Achievement** | Off the Grid (`achievement:offgrid`) | Both models run in-Space on ZeroGPU; no external AI API is called. The app can also run locally, as shown in the demo video |
| **Bonus** | Tiny-titan (`bonus:tiny-titan`) | Models must total ≤ 4B parameters: 1B (MiniCPM5-1B) + ~240M (bark-small) ≈ 1.24B |

## Features

- **Local LLM Integration**: Uses `MiniCPM5-1B` to generate responses locally.
- **Text-to-Speech**: Uses the `bark-small` TTS model for voice generation.
- **Custom Audio Synthesis**: The `synth.py` pipeline transforms audio into a droid-style output.
- **Web Interface**: A Gradio web UI to easily chat with the robot.
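
To give a feel for what a "droid-style" transformation can involve, here is a minimal sketch using ring modulation and coarse amplitude quantization, two common robot-voice techniques. This is not the contents of `synth.py`, which may work quite differently: the function names, the 50 Hz carrier, the 16 quantization levels, and the 0.9 peak are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def ring_modulate(samples: Sequence[float], sample_rate: int,
                  carrier_hz: float = 50.0) -> List[float]:
    """Multiply the signal by a sine carrier, giving a metallic, buzzy timbre."""
    step = 2.0 * math.pi * carrier_hz / sample_rate
    return [s * math.sin(step * n) for n, s in enumerate(samples)]


def bit_crush(samples: Sequence[float], levels: int = 16) -> List[float]:
    """Snap each sample to a coarse amplitude grid for a lo-fi, digital edge."""
    half = levels / 2
    return [round(s * half) / half for s in samples]


def normalize(samples: Sequence[float], peak: float = 0.9) -> List[float]:
    """Scale so the loudest sample sits at `peak`; silence is returned unchanged."""
    current = max((abs(s) for s in samples), default=0.0)
    if current == 0.0:
        return list(samples)
    gain = peak / current
    return [s * gain for s in samples]


def droidify(samples: Sequence[float], sample_rate: int) -> List[float]:
    """Hypothetical effect chain: ring modulation, then bit crush, then normalize."""
    return normalize(bit_crush(ring_modulate(samples, sample_rate)))


if __name__ == "__main__":
    rate = 8_000  # example rate chosen for the demo only
    tone = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(rate)]
    out = droidify(tone, rate)
    print(len(out), max(abs(x) for x in out))
```

The sketch uses plain Python lists to stay dependency-free; with the project's `numpy` dependency the same steps would normally be vectorized over an array.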

## Requirements

- Python >= 3.12

## Setup & Installation

Activate your virtual environment and install the required packages:

```bash
source .venv/bin/activate
# pip's -r flag expects a requirements file, not pyproject.toml, so install the
# dependencies directly (pyproject.toml holds the authoritative list):
pip install gradio numpy scipy torch soundfile
```

## Running the Application

To start the Gradio UI locally:

```bash
python app.py
```

The application will be served at `http://127.0.0.1:7860`.

## Project Structure

- `app.py`: Gradio entrypoint serving the UI and handling the full pipeline.
- `synth.py`: Audio-processing pipeline to transform standard WAVs into droid-style output.
- `tts_model.py`: `bark-small` TTS integration.
- `pyproject.toml`: Project metadata and dependencies.