GovIndLok

feat: update TTS to bark-small and other updates in associated project documentation, added links required for submission

2864ffb 18 days ago

metadata

title: Samantic Audio
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.18.0
python_version: '3.10'
app_file: app.py
pinned: false
tags:
  - track:wood
  - sponsor:openbmb
  - sponsor:openai
  - achievement:offgrid
  - bonus:tiny-titan

Samantic Audio

Semantic-to-audio communication system that features a local robotic AI.

The application uses an LLM to generate a response, converts that response to speech with the bark-small TTS model, and passes the audio through a custom synthesizer to produce a distinctive "droid-style" voice.
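The three stages described above can be sketched as a simple function chain. This is only an illustration of the data flow, not the project's actual code: `run_pipeline`, the `Audio` alias, and the stand-in stage functions are all hypothetical names, and the real stages would call MiniCPM5-1B, bark-small, and `synth.py` instead.

```python
from typing import Callable, List, Tuple

# Hypothetical type alias: mono audio as float samples plus a sample rate.
Audio = Tuple[List[float], int]


def run_pipeline(
    prompt: str,
    generate_text: Callable[[str], str],
    synthesize_speech: Callable[[str], Audio],
    droidify: Callable[[Audio], Audio],
) -> Tuple[str, Audio]:
    """Chain the three stages: LLM reply -> TTS audio -> droid-style audio."""
    reply = generate_text(prompt)        # stage 1: LLM response
    speech = synthesize_speech(reply)    # stage 2: text-to-speech
    return reply, droidify(speech)       # stage 3: voice effect


# Stand-in stages so the sketch runs without any model downloads.
def fake_llm(prompt: str) -> str:
    return f"Beep boop: {prompt}"


def fake_tts(text: str) -> Audio:
    # One silent sample per character at a nominal 24 kHz rate.
    return [0.0] * len(text), 24_000


def fake_synth(audio: Audio) -> Audio:
    samples, rate = audio
    return [s * 0.5 for s in samples], rate


reply, (samples, rate) = run_pipeline("hello", fake_llm, fake_tts, fake_synth)
```

Passing the stages in as callables keeps each one replaceable, which makes it easy to test the flow without loading any model.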

Links

Resource	Link
Demo Video	YouTube
Social Post	X Post
Social Post	LinkedIn Post
GitHub Repository	GovIndLok/Voinal

Hackathon Badges Claimed

Category	Badge Name (Tag)	Description / Justification
Track	Thousand Token Wood (`track:wood`)	Whimsical, delightful, AI-native app
Sponsor	MiniCPM Build (`sponsor:openbmb`)	Uses MiniCPM5-1B as the LLM
Sponsor	Codex (`sponsor:openai`)	Codex-attributed commits in the connected GitHub repo
Achievement	Off the Grid (`achievement:offgrid`)	Both models run in-Space on ZeroGPU; no external AI API is called. The app can also run locally, as shown in the demo
Bonus	Tiny Titan (`bonus:tiny-titan`)	Models must be ≤ 4B parameters: ~1B (MiniCPM5-1B) + ~240M (bark-small) ≈ 1.24B ≤ 4B
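The parameter budget in the last row can be checked with a few lines of arithmetic. The counts below are the approximate figures quoted in the table, not values read from the model checkpoints, and the constant names are made up for this sketch.

```python
# Approximate parameter counts taken from the badge table above.
PARAM_COUNTS = {
    "MiniCPM5-1B": 1_000_000_000,   # ~1B
    "bark-small": 240_000_000,      # ~240M
}
TINY_TITAN_LIMIT = 4_000_000_000    # 4B-parameter ceiling

total_params = sum(PARAM_COUNTS.values())
within_budget = total_params <= TINY_TITAN_LIMIT
print(f"{total_params / 1e9:.2f}B parameters, within budget: {within_budget}")
```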

Features

Local LLM Integration: Uses MiniCPM5-1B to generate responses locally.
Text-to-Speech: Uses the bark-small TTS model for voice generation.
Custom Audio Synthesis: The synth.py pipeline transforms audio into a droid-style output.
Web Interface: A Gradio web UI to easily chat with the robot.
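This README does not describe how `synth.py` produces the droid-style voice. As an illustration only, one common way to get a robotic timbre is ring modulation, where each sample is multiplied by a low-frequency sine carrier. The sketch below assumes that technique; `ring_modulate` and its `carrier_hz` default are hypothetical and are not taken from the project.

```python
import math
from typing import List


def ring_modulate(samples: List[float], sample_rate: int,
                  carrier_hz: float = 50.0) -> List[float]:
    """Multiply each sample by a sine carrier (a classic 'robot voice' effect)."""
    step = 2.0 * math.pi * carrier_hz / sample_rate
    return [s * math.sin(step * n) for n, s in enumerate(samples)]


# Demo input: 10 ms of a 440 Hz tone at 8 kHz (80 samples).
rate = 8_000
tone = [math.sin(2.0 * math.pi * 440.0 * n / rate) for n in range(80)]
droid = ring_modulate(tone, rate)
```

The output has the same length as the input and never exceeds the input's amplitude, so it can be written back out without rescaling.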

Requirements

Python >= 3.12

Setup & Installation

Activate your virtual environment and install the dependencies declared in pyproject.toml:

source .venv/bin/activate
pip install gradio numpy scipy torch soundfile # pyproject.toml is not a requirements file, so `pip install -r` cannot read it

Running the Application

To start the Gradio UI locally:

python app.py

The application will be served at http://127.0.0.1:7860.

Project Structure

app.py: Gradio entrypoint serving the UI and handling the full pipeline.
synth.py: Audio-processing pipeline to transform standard WAVs into droid-style output.
tts_model.py: bark-small TTS integration.
pyproject.toml: Project metadata and dependencies.
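Since `synth.py` is described as transforming standard WAV files, a minimal read-process-write loop might look like the sketch below. It uses only the standard library and assumes 16-bit mono PCM input. `process_wav` and the `Effect` signature are hypothetical and do not reflect the module's real interface, which may well use soundfile or scipy instead.

```python
import array
import sys
import wave
from typing import Callable, List

# Hypothetical effect signature: (float samples, sample rate) -> float samples.
Effect = Callable[[List[float], int], List[float]]


def process_wav(src_path: str, dst_path: str, effect: Effect) -> None:
    """Read a 16-bit mono PCM WAV, apply `effect`, and write the result."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = src.getframerate()
        pcm = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # Convert to floats in [-1, 1), run the effect, then clip back to int16.
    floats = [s / 32768.0 for s in pcm]
    out = array.array(
        "h",
        (max(-32768, min(32767, round(x * 32768.0))) for x in effect(floats, rate)),
    )
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(out.tobytes())
```

Any function matching the `Effect` signature can be passed in, for example `lambda xs, rate: [x * 0.5 for x in xs]` to halve the volume.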