---
title: Samantic Audio
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.18.0
python_version: '3.10'
app_file: app.py
pinned: false
tags:
- track:wood
- sponsor:openbmb
- sponsor:openai
- achievement:offgrid
- bonus:tiny-titan
---
# Samantic Audio
A semantic-to-audio communication system built around a local robotic AI.
The application generates a response with an LLM, converts that response to speech with the `bark-small` TTS model, and passes the audio through a custom synthesizer to produce a distinctive "droid-style" voice.
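The three stages above can be sketched as a simple function chain. This is only an illustration of the data flow, not the code in `app.py`: the function name `run_pipeline`, the `Audio` alias, and the stand-in stages are all hypothetical, and the real stages would call `MiniCPM5-1B`, `bark-small`, and `synth.py` instead.

```python
from typing import Callable, List, Tuple

# Hypothetical audio representation: (sample_rate, samples in [-1.0, 1.0]).
Audio = Tuple[int, List[float]]


def run_pipeline(
    prompt: str,
    generate_reply: Callable[[str], str],
    text_to_speech: Callable[[str], Audio],
    droidify: Callable[[Audio], Audio],
) -> Tuple[str, Audio]:
    """Chain the stages: prompt -> reply text -> speech -> droid-style audio."""
    reply = generate_reply(prompt)
    speech = text_to_speech(reply)
    return reply, droidify(speech)


# Stand-in stages so the sketch runs without any models installed.
def fake_llm(prompt: str) -> str:
    return f"Beep. You said: {prompt}"


def fake_tts(text: str) -> Audio:
    # One silent sample per character, at a placeholder sample rate.
    return 24000, [0.0] * len(text)


def fake_droid(audio: Audio) -> Audio:
    # Placeholder effect: halve the amplitude of every sample.
    rate, samples = audio
    return rate, [0.5 * s for s in samples]


reply, (rate, samples) = run_pipeline("hello", fake_llm, fake_tts, fake_droid)
```

Passing the stages in as callables keeps each one replaceable, which makes it easy to test the flow without loading a model.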
## Links
| Resource | Link |
| :--- | :--- |
| **Demo Video** | [YouTube](https://youtu.be/ID0IG2BIBRo) |
| **Social Post** | [X Post](https://x.com/GovindLokam/status/2066625722891026578?s=20) |
| **Social Post** | [LinkedIn Post](https://www.linkedin.com/posts/govind-lokam-335382230_ai-llm-generativeai-share-7472390184009895936-6YOz/?utm_source=share&utm_medium=member_desktop&rcm=ACoAADm4O1kBjwPZCkmLC-YSFR8At-SNQhkj4XY) |
| **GitHub Repository** | [GovIndLok/Voinal](https://github.com/GovIndLok/Voinal) |
## Hackathon Badges Claimed
| Category | Badge Name (Tag) | Description / Justification |
| :--- | :--- | :--- |
| **Track** | Thousand Token Wood (`track:wood`) | Whimsical, delightful, AI-native app |
| **Sponsor** | MiniCPM Build (`sponsor:openbmb`) | Uses MiniCPM5-1B as the LLM |
| **Sponsor** | Codex (`sponsor:openai`) | Codex-attributed commits in the connected GitHub repo |
| **Achievement** | Off the Grid (`achievement:offgrid`) | Both models run in-Space on ZeroGPU; no external AI API is called. The app can also be run locally, as shown in the demo |
| **Bonus** | Tiny-titan (`bonus:tiny-titan`) | Models must total ≤ 4B parameters: 1B (MiniCPM5-1B) + ~240M (bark-small) ≈ 1.24B ≤ 4B |
## Features
- **Local LLM Integration**: Uses `MiniCPM5-1B` to generate responses locally.
- **Text-to-Speech**: Uses the `bark-small` TTS model for voice generation.
- **Custom Audio Synthesis**: The `synth.py` pipeline transforms audio into a droid-style output.
- **Web Interface**: A Gradio web UI to easily chat with the robot.
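As an illustration of the kind of processing a droid-style voice can involve, here is a minimal ring-modulation sketch. It is an assumption made for illustration: this README does not describe which effects `synth.py` actually applies, and the function name `ring_modulate` and the 30 Hz carrier are hypothetical. Ring modulation multiplies the signal by a low-frequency sine wave, which gives speech a metallic, robotic timbre.

```python
import math
from typing import List


def ring_modulate(
    samples: List[float], sample_rate: int, carrier_hz: float = 30.0
) -> List[float]:
    """Multiply each sample by a sine carrier (classic robot-voice effect)."""
    # Phase advance of the carrier per sample, in radians.
    step = 2.0 * math.pi * carrier_hz / sample_rate
    return [s * math.sin(step * n) for n, s in enumerate(samples)]


# Example input: one second of a 220 Hz test tone at 16 kHz.
sample_rate = 16000
tone = [
    math.sin(2.0 * math.pi * 220.0 * n / sample_rate)
    for n in range(sample_rate)
]
droid = ring_modulate(tone, sample_rate)
```

The sketch uses only the standard library to stay self-contained; in a real pipeline the same multiplication would normally be vectorized with `numpy`.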
## Requirements
- Python >= 3.12
## Setup & Installation
Activate your virtual environment and install the required packages:
```bash
source .venv/bin/activate
# pip's -r flag expects a requirements file, not pyproject.toml, so install the dependencies directly:
pip install gradio numpy scipy torch soundfile
```
## Running the Application
To start the Gradio UI locally:
```bash
python app.py
```
The application will be served at `http://127.0.0.1:7860`.
## Project Structure
- `app.py`: Gradio entrypoint serving the UI and handling the full pipeline.
- `synth.py`: Audio-processing pipeline to transform standard WAVs into droid-style output.
- `tts_model.py`: `bark-small` TTS integration.
- `pyproject.toml`: Project metadata and dependencies.