Voinal / README.md

GovIndLok

feat: update TTS to bark-small and other updates in associated project documentation, added links required for submission

2864ffb 18 days ago

	---
	title: Samantic Audio
	emoji: 🤖
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: 6.18.0
	python_version: '3.10'
	app_file: app.py
	pinned: false
	tags:
	- track:wood
	- sponsor:openbmb
	- sponsor:openai
	- achievement:offgrid
	- bonus:tiny-titan
	---

	# Samantic Audio

	Samantic Audio is a semantic-to-audio communication system featuring a robotic AI that runs locally.

	The application generates a reply with an LLM, converts it to speech with the `bark-small` TTS model, and passes the audio through a custom synthesizer to produce a distinctive "droid-style" voice.
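The three stages described above can be sketched as a simple function composition. This is a minimal illustration, not the code in `app.py`: `run_pipeline` and the three stage callables are hypothetical names, and the stand-in stages only mimic the data flow (text in, audio samples out) without loading any model.

```python
from typing import Callable, List

# Hypothetical stage signatures; the real app.py may be organized differently.
GenerateFn = Callable[[str], str]                   # prompt -> reply text (LLM)
SpeakFn = Callable[[str], List[float]]              # reply text -> speech samples (TTS)
DroidifyFn = Callable[[List[float]], List[float]]   # speech -> droid-style samples


def run_pipeline(prompt: str, generate: GenerateFn, speak: SpeakFn,
                 droidify: DroidifyFn) -> List[float]:
    """Run the LLM -> TTS -> synthesizer chain and return the final samples."""
    reply = generate(prompt)
    speech = speak(reply)
    return droidify(speech)


# Stand-in stages so the sketch runs without any model downloads.
def fake_llm(prompt: str) -> str:
    return f"beep {prompt}"


def fake_tts(text: str) -> List[float]:
    # One dummy sample per character, scaled into [0, 1).
    return [(ord(ch) % 32) / 32.0 for ch in text]


def fake_synth(samples: List[float]) -> List[float]:
    # Placeholder "effect": invert the polarity of every sample.
    return [-s for s in samples]


audio = run_pipeline("hi", fake_llm, fake_tts, fake_synth)
print(len(audio))
```

Passing the stages in as callables keeps each one replaceable, so a real model can be swapped in for its stand-in without touching the chain itself.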

	## Links

	| Resource | Link |
	| :--- | :--- |
	| Demo Video | [YouTube](https://youtu.be/ID0IG2BIBRo) |
	| Social Post | [X Post](https://x.com/GovindLokam/status/2066625722891026578?s=20) |
	| Social Post | [LinkedIn Post](https://www.linkedin.com/posts/govind-lokam-335382230_ai-llm-generativeai-share-7472390184009895936-6YOz/?utm_source=share&utm_medium=member_desktop&rcm=ACoAADm4O1kBjwPZCkmLC-YSFR8At-SNQhkj4XY) |
	| GitHub Repository | [GovIndLok/Voinal](https://github.com/GovIndLok/Voinal) |

	## Hackathon Badges Claimed

	| Category | Badge Name (Tag) | Description / Justification |
	| :--- | :--- | :--- |
	| Track | Thousand Token Wood (`track:wood`) | Whimsical, delightful, AI-native app |
	| Sponsor | MiniCPM Build (`sponsor:openbmb`) | Uses MiniCPM5-1B as the LLM |
	| Sponsor | Codex (`sponsor:openai`) | Codex-attributed commits in the connected GitHub repo |
	| Achievement | Off the Grid (`achievement:offgrid`) | Both models run in-Space on ZeroGPU; no external AI API is called. The app can also run locally, as shown in the demo video |
	| Bonus | Tiny-titan (`bonus:tiny-titan`) | Models must total ≤ 4B parameters: ~1B (MiniCPM5-1B) + ~240M (bark-small) ≈ 1.24B ≤ 4B |
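The Tiny-titan arithmetic can be checked in a few lines. The parameter counts below are the approximate figures quoted in the table, not measured values.

```python
# Approximate parameter counts quoted in the badge table (not measured).
PARAMS = {
    "MiniCPM5-1B": 1_000_000_000,
    "bark-small": 240_000_000,
}
LIMIT = 4_000_000_000  # Tiny-titan cap: 4B parameters

total = sum(PARAMS.values())
print(f"total = {total / 1e9:.2f}B, within limit: {total <= LIMIT}")
```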

	## Features
	- Local LLM Integration: Uses `MiniCPM5-1B` to generate responses locally.
	- Text-to-Speech: Uses the `bark-small` TTS model for voice generation.
	- Custom Audio Synthesis: The `synth.py` pipeline transforms audio into a droid-style output.
	- Web Interface: A Gradio web UI to easily chat with the robot.
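This README does not describe what `synth.py` does internally. One common way to get a robotic timbre is ring modulation, where the speech signal is multiplied by a fixed sine carrier. The sketch below illustrates that technique only; `ring_modulate` and the 50 Hz carrier are assumptions chosen for illustration, not the project's actual processing.

```python
import math
from typing import List


def ring_modulate(samples: List[float], sample_rate: int,
                  carrier_hz: float = 50.0) -> List[float]:
    """Multiply each sample by a sine carrier (a classic robot-voice effect)."""
    step = 2.0 * math.pi * carrier_hz / sample_rate
    return [s * math.sin(step * n) for n, s in enumerate(samples)]


# Demo input: 0.1 s of a 220 Hz tone standing in for TTS output.
SAMPLE_RATE = 16_000
tone = [math.sin(2.0 * math.pi * 220.0 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 10)]

droid = ring_modulate(tone, SAMPLE_RATE)
print(len(droid), max(abs(s) for s in droid) <= 1.0)
```

Multiplying by the carrier replaces each input frequency with its sum and difference with the carrier frequency (here 170 Hz and 270 Hz), which is what gives ring-modulated speech its metallic quality.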

	## Requirements
	- Python >= 3.12 for local runs (note that the Space's front matter sets `python_version` to 3.10)

	## Setup & Installation
	Activate your virtual environment and install the required packages:

	```bash
	source .venv/bin/activate
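	# `pip install -r` expects a requirements file, not pyproject.toml, so install the dependencies directly.
	# See pyproject.toml for the authoritative dependency list.
	pip install gradio numpy scipy torch soundfile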
	```

	## Running the Application
	To start the Gradio UI locally:

	```bash
	python app.py
	```

	By default, Gradio serves the application at `http://127.0.0.1:7860`.
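For orientation, a text-in, audio-out Gradio app can be wired up roughly as follows. This is a sketch under assumptions, not the contents of `app.py`: `respond` and `build_demo` are hypothetical names, and the handler writes a short silent WAV as a placeholder for the real LLM, TTS, and synthesizer stages. Gradio is imported lazily so the handler can be exercised without it installed.

```python
import os
import struct
import tempfile
import wave


def respond(message: str) -> str:
    """Placeholder handler: return the path of a WAV file for the reply.

    The real app would generate text, synthesize speech, and apply the
    droid effect here; this stand-in writes 0.1 s of silence instead.
    """
    sample_rate = 16_000
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<h", 0) * (sample_rate // 10))
    return path


def build_demo():
    """Build a minimal text -> audio interface around `respond`."""
    import gradio as gr  # lazy import; requires `pip install gradio`
    return gr.Interface(fn=respond, inputs="text", outputs="audio")
```

Calling `build_demo().launch(server_name="127.0.0.1", server_port=7860)` would serve this interface at the address given above.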

	## Project Structure
	- `app.py`: Gradio entrypoint serving the UI and handling the full pipeline.
	- `synth.py`: Audio-processing pipeline to transform standard WAVs into droid-style output.
	- `tts_model.py`: `bark-small` TTS integration.
	- `pyproject.toml`: Project metadata and dependencies.