---
title: Samantic Audio
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.18.0
python_version: '3.10'
app_file: app.py
pinned: false
tags:
  - track:wood
  - sponsor:openbmb
  - sponsor:openai
  - achievement:offgrid
  - bonus:tiny-titan
---

Samantic Audio

A semantic-to-audio communication system featuring a local robotic AI.

The application uses an LLM to generate responses, converts the response to speech using bark-small TTS, and processes the output through a custom synthesizer to create a unique "droid-style" voice.
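The three-stage flow described above (LLM, then TTS, then synthesizer) can be sketched as a small composition of callables. This is a minimal sketch, not the code in app.py: the class name VoicePipeline, the stage names generate, speak and droidify, and the (sample_rate, samples) audio tuple are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical audio representation: (sample_rate, samples).
Audio = Tuple[int, Sequence[float]]


@dataclass
class VoicePipeline:
    """Hypothetical LLM -> TTS -> synth chain; the real app.py may be organized differently."""

    generate: Callable[[str], str]      # LLM stage: user prompt -> reply text
    speak: Callable[[str], Audio]       # TTS stage: reply text -> speech audio
    droidify: Callable[[Audio], Audio]  # synth stage: speech audio -> droid-style audio

    def __call__(self, prompt: str) -> Tuple[str, Audio]:
        reply = self.generate(prompt)   # 1. generate the text response
        speech = self.speak(reply)      # 2. convert it to speech
        droid = self.droidify(speech)   # 3. post-process into the droid voice
        return reply, droid


if __name__ == "__main__":
    # Stub stages so the sketch runs without any models installed.
    pipeline = VoicePipeline(
        generate=lambda prompt: f"Beep. You said: {prompt}",
        speak=lambda text: (24_000, [0.0] * len(text)),
        droidify=lambda audio: (audio[0], [s * 0.5 for s in audio[1]]),
    )
    reply, (rate, samples) = pipeline("hello")
    print(reply, rate, len(samples))
```

Passing the stages in as callables keeps the orchestration testable with stubs and lets you swap the LLM or TTS model without touching the UI code.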

Links

| Resource | Link |
|---|---|
| Demo Video | YouTube |
| Social Post | X Post |
| Social Post | LinkedIn Post |
| GitHub Repository | GovIndLok/Voinal |

Hackathon Badges Claimed

| Category | Badge Name (Tag) | Description / Justification |
|---|---|---|
| Track | Thousand Token Wood (track:wood) | Whimsical, delightful, AI-native app |
| Sponsor | MiniCPM Build (sponsor:openbmb) | Uses MiniCPM5-1B as the LLM |
| Sponsor | Codex (sponsor:openai) | Codex-attributed commits in the connected GitHub repo |
| Achievement | Off the Grid (achievement:offgrid) | Both models run inside the Space on ZeroGPU; no external AI API is called. They can also run fully locally, as shown in the demo video. |
| Bonus | Tiny-titan (bonus:tiny-titan) | Requires ≤ 4B total parameters: ~1B (MiniCPM5-1B) + ~240M (bark-small) ≈ 1.24B ≤ 4B |
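The Tiny-titan parameter budget can be checked with a few lines of Python. The counts below are the approximate figures quoted in the badge table, not measured values; with PyTorch you could instead measure a loaded model with sum(p.numel() for p in model.parameters()). The names PARAM_COUNTS, total_params and within_limit are hypothetical.

```python
# Approximate parameter counts as quoted in the badge table (not measured here).
PARAM_COUNTS = {
    "MiniCPM5-1B": 1_000_000_000,
    "bark-small": 240_000_000,
}
TINY_TITAN_LIMIT = 4_000_000_000  # bonus:tiny-titan requires <= 4B parameters in total


def total_params(counts: dict) -> int:
    """Sum the parameter counts of every model the app loads."""
    return sum(counts.values())


def within_limit(counts: dict, limit: int = TINY_TITAN_LIMIT) -> bool:
    """Return True if the combined parameter count stays at or under the limit."""
    return total_params(counts) <= limit


if __name__ == "__main__":
    total = total_params(PARAM_COUNTS)
    print(f"total ≈ {total / 1e9:.2f}B, within limit: {within_limit(PARAM_COUNTS)}")
```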

Features

  • Local LLM Integration: Uses MiniCPM5-1B to generate responses locally.
  • Text-to-Speech: Uses the bark-small TTS model for voice generation.
  • Custom Audio Synthesis: The synth.py pipeline transforms audio into a droid-style output.
  • Web Interface: A Gradio web UI to easily chat with the robot.
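The droid-style transform in synth.py is not documented here, so the following is only a hypothetical sketch of one common way to get a robotic voice with NumPy: ring modulation (multiplying the speech by a low-frequency sine carrier) followed by bit crushing (coarse amplitude quantization). The function name droidify and its carrier_hz and bits parameters are illustrative assumptions, not the project's API.

```python
import numpy as np


def droidify(samples: np.ndarray, sample_rate: int, carrier_hz: float = 60.0, bits: int = 6) -> np.ndarray:
    """Hypothetical robot-voice effect for mono float audio in [-1, 1]."""
    x = np.asarray(samples, dtype=np.float64)
    t = np.arange(x.shape[0]) / sample_rate

    # Ring modulation: multiplying by a sine carrier produces sum and difference
    # frequencies, giving the metallic timbre associated with robot voices.
    ringed = x * np.sin(2.0 * np.pi * carrier_hz * t)

    # Bit crushing: snap amplitudes to multiples of 1 / 2**(bits - 1) for a lo-fi digital edge.
    step_count = 2 ** (bits - 1)
    crushed = np.round(ringed * step_count) / step_count

    # Keep the result in the valid float audio range and use a compact dtype.
    return np.clip(crushed, -1.0, 1.0).astype(np.float32)


if __name__ == "__main__":
    sr = 24_000  # assumed sample rate for the example tone
    tone = 0.5 * np.sin(2.0 * np.pi * 220.0 * np.arange(sr) / sr)  # 1 s test tone
    out = droidify(tone, sr)
    print(out.shape, out.dtype, float(np.abs(out).max()))
```

In practice the mono speech would come from the TTS stage (for example read with soundfile.read) and the result could be written back with soundfile.write; lowering bits or raising carrier_hz makes the voice harsher.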

Requirements

  • Python >= 3.12

Setup & Installation

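Activate your virtual environment and install the dependencies declared in pyproject.toml. Note that pip install -r expects a requirements file and cannot read pyproject.toml; if you manage the project with uv, uv sync installs the declared dependencies, otherwise install them manually:

source .venv/bin/activate
pip install gradio numpy scipy torch soundfile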

Running the Application

To start the Gradio UI locally:

python app.py

The application will be served at http://127.0.0.1:7860.
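A minimal Gradio entrypoint in the spirit of app.py might look like the sketch below. It is hypothetical: respond is a placeholder that returns a short tone instead of running the real LLM, TTS and synth stages, and the 24 kHz rate is an assumption chosen to match Bark's output. Gradio is imported inside build_demo so the placeholder can be exercised without Gradio installed.

```python
import numpy as np

SAMPLE_RATE = 24_000  # assumption: matches Bark's 24 kHz output


def respond(message: str) -> tuple:
    """Placeholder for the LLM -> TTS -> synth pipeline.

    Returns (sample_rate, int16 samples): a 440 Hz tone whose length grows with
    the message, capped at 3 seconds, so the UI has something to play.
    """
    duration = min(0.05 * max(len(message), 1), 3.0)
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    tone = 0.3 * np.sin(2.0 * np.pi * 440.0 * t)
    return SAMPLE_RATE, (tone * 32767).astype(np.int16)


def build_demo():
    import gradio as gr  # imported lazily so respond() can be used without Gradio

    return gr.Interface(
        fn=respond,
        inputs=gr.Textbox(label="Message"),
        outputs=gr.Audio(label="Droid reply"),
    )
```

Calling build_demo().launch() starts the server, which Gradio binds to http://127.0.0.1:7860 by default. Returning a (sample_rate, array) tuple is one of the formats gr.Audio accepts as output; int16 samples avoid Gradio's automatic float-to-int conversion.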

Project Structure

  • app.py: Gradio entrypoint serving the UI and handling the full pipeline.
  • synth.py: Audio-processing pipeline to transform standard WAVs into droid-style output.
  • tts_model.py: bark-small TTS integration.
  • pyproject.toml: Project metadata and dependencies.