---
license: apache-2.0
tags:
- voice
- stt
- tts
- llm
- vox
- real-time
- edge-inference
library_name: generic
---
# Vox Models
This repository serves as the official model host for **Vox**, a real-time, local-first voice-to-voice system. It contains specialized models for Voice Activity Detection (VAD), Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS).
## Directory Structure
The structure of this repository exactly mirrors the runtime expectations of the Vox backend:
```text
.
├── manifest.json          # Single source of truth for all models
├── llm/
│   └── gemma4/            # Large Language Models (GGUF)
├── stt/
│   └── qwen3-asr/         # Speech-to-Text (ONNX)
│       └── tokenizer/     # STT Tokenizer configs
├── tts/
│   ├── kokoro/            # English TTS (Kokoro ONNX)
│   └── piper_hi/          # Hindi TTS (Piper ONNX)
└── vad/
    └── ten_vad.onnx       # Voice Activity Detection (ONNX)
```
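As an illustration of this layout, the following Python sketch checks a local copy of the repository for the expected entries. The helper `missing_paths` and the `EXPECTED_LAYOUT` list are hypothetical: they were transcribed by hand from the tree and are not part of the Vox runtime, which may resolve paths differently.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical: (relative path, is_directory) pairs transcribed from the tree.
EXPECTED_LAYOUT = [
    ("manifest.json", False),
    ("llm/gemma4", True),
    ("stt/qwen3-asr", True),
    ("stt/qwen3-asr/tokenizer", True),
    ("tts/kokoro", True),
    ("tts/piper_hi", True),
    ("vad/ten_vad.onnx", False),
]


def missing_paths(root: str | Path) -> list[str]:
    """Return expected relative paths that are absent (or the wrong kind) under root."""
    base = Path(root)
    missing = []
    for rel, is_dir in EXPECTED_LAYOUT:
        target = base / rel
        # Directories must be directories and files must be regular files.
        present = target.is_dir() if is_dir else target.is_file()
        if not present:
            missing.append(rel)
    return missing
```

Run from the repository root, `missing_paths(".")` returns an empty list when every listed directory and file is present.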
## Manifest
The `manifest.json` file in the root directory provides metadata for automated management, including:
- Relative file paths
- Exact byte sizes
- SHA256 hashes for integrity verification
- Archive markers for compressed assets (e.g., `espeak-ng-data`)
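The following Python sketch shows one way to use the size and hash fields for an integrity check. It is not the Vox runtime's implementation. The schema is an assumption: a top-level `files` list whose entries carry `path`, `size` and `sha256` keys. Archive entries such as `espeak-ng-data` are treated like any other file rather than extracted. The sketch compares byte sizes first so that files already known to be wrong are never hashed.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path

# ASSUMPTION: manifest.json has a top-level "files" list whose entries carry
# "path", "size" and "sha256" keys. Adjust the key names to the real schema.


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB weights are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: str | Path) -> list[tuple[str, str]]:
    """Check each manifest entry under root; return (relative_path, problem) pairs."""
    base = Path(root)
    manifest = json.loads((base / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        rel = entry["path"]
        target = base / rel
        if not target.is_file():
            problems.append((rel, "missing"))
            continue
        # Cheap size comparison first; only hash files whose size already matches.
        if target.stat().st_size != entry["size"]:
            problems.append((rel, "size mismatch"))
            continue
        if sha256_of(target) != entry["sha256"].lower():
            problems.append((rel, "sha256 mismatch"))
    return problems
```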
## Usage
These models are intended to be downloaded and managed by the Vox application runtime. For manual use, install [Git LFS](https://git-lfs.github.com/) before cloning. Without it, the large model weights are checked out as small text pointer files instead of the actual binaries.
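A clone made without Git LFS contains small text pointer files in place of the model weights. The hypothetical helpers below detect such files by checking for the `version https://git-lfs.github.com/spec/v1` line that Git LFS pointer files begin with. Limiting the scan to `.onnx` and `.gguf` files is an assumption based on the formats listed in the directory tree.

```python
from __future__ import annotations

from pathlib import Path

# Version line that Git LFS pointer files start with (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str | Path) -> bool:
    """Return True if path looks like an un-fetched Git LFS pointer, not real data."""
    with Path(path).open("rb") as fh:
        head = fh.read(1024)  # pointer files are tiny; only the head is needed
    return head.startswith(LFS_POINTER_PREFIX) and b"\noid sha256:" in head


def find_lfs_pointers(root: str | Path, suffixes=(".onnx", ".gguf")) -> list[Path]:
    """List model files under root that are still LFS pointers (hypothetical helper)."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and is_lfs_pointer(p)
    )
```

If any pointers are reported, running `git lfs install` followed by `git lfs pull` inside the clone should replace them with the actual files.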