---
license: apache-2.0
tags:
- voice
- stt
- tts
- llm
- vox
- real-time
- edge-inference
library_name: generic
---

# Vox Models

This repository is the official model host for **Vox**, a real-time, local-first voice-to-voice system. It holds the models for each stage of the pipeline: Voice Activity Detection (VAD), Speech-to-Text (STT), the Large Language Model (LLM), and Text-to-Speech (TTS).

## Directory Structure

The repository layout mirrors the directory structure the Vox backend expects at runtime:

```text
.
├── manifest.json            # Single source of truth for all models
├── llm/
│   └── gemma4/             # Large Language Models (GGUF)
├── stt/
│   └── qwen3-asr/          # Speech-to-Text (ONNX)
│       └── tokenizer/      # STT Tokenizer configs
├── tts/
│   ├── kokoro/             # English TTS (Kokoro ONNX)
│   └── piper_hi/           # Hindi TTS (Piper ONNX)
└── vad/
    └── ten_vad.onnx        # Voice Activity Detection (ONNX)
```
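As a quick sanity check after a manual download, the layout above can be compared against a local copy. The following is a minimal sketch, not part of Vox itself: the `EXPECTED_PATHS` list is copied from the tree above, and the `missing_paths` helper is a hypothetical name introduced only for illustration.

```python
from pathlib import Path

# Paths taken from the directory tree in this README.
EXPECTED_PATHS = [
    "manifest.json",
    "llm/gemma4",
    "stt/qwen3-asr/tokenizer",
    "tts/kokoro",
    "tts/piper_hi",
    "vad/ten_vad.onnx",
]


def missing_paths(root: Path) -> list[str]:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Check the current directory; an empty list means the layout is complete.
    print(missing_paths(Path(".")))
```

An empty result only confirms that the files and directories exist; it says nothing about whether their contents are intact.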

## Manifest

The `manifest.json` file in the root directory provides per-file metadata for automated download and verification, including:
- Relative file paths
- Exact byte sizes
- SHA256 hashes for integrity verification
- Archive markers for compressed assets (e.g., `espeak-ng-data`)
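The size and hash fields make it possible to verify a manual download. The sketch below shows one way this could look. The manifest schema is not documented here, so the top-level `files` key and the per-entry `path`, `size`, and `sha256` keys are assumptions, as are the helper names; adjust them to match the actual `manifest.json`.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_entry(root: Path, entry: dict) -> list[str]:
    """Return a list of problems for one manifest entry (empty means OK)."""
    # Assumed keys: "path", "size", "sha256". The real manifest may differ.
    target = root / entry["path"]
    if not target.is_file():
        return [f"missing: {entry['path']}"]
    problems = []
    if target.stat().st_size != entry["size"]:
        # A size mismatch is cheap to detect, so the hash is skipped in that case.
        problems.append(f"size mismatch: {entry['path']}")
    elif sha256_of(target) != entry["sha256"].lower():
        problems.append(f"sha256 mismatch: {entry['path']}")
    return problems


def verify_manifest(root: Path) -> list[str]:
    """Check every entry listed in root/manifest.json against the files on disk."""
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:  # assumed top-level key
        problems.extend(verify_entry(root, entry))
    return problems


if __name__ == "__main__":
    for problem in verify_manifest(Path(".")):
        print(problem)
```

Comparing the byte size before hashing catches truncated downloads and un-fetched Git LFS pointer files without reading the whole file.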

## Usage

These models are intended to be downloaded and managed by the Vox application runtime. For manual use, install [Git LFS](https://git-lfs.github.com/) before cloning; without it, the large model weights are checked out as small pointer files instead of the actual data.