---
license: apache-2.0
tags:
- voice
- stt
- tts
- llm
- vox
- real-time
- edge-inference
library_name: generic
---

# Vox Models

This repository is the official model host for **Vox**, a real-time, local-first voice-to-voice system. It contains specialized models for Voice Activity Detection (VAD), Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS).

## Directory Structure

The structure of this repository exactly mirrors the runtime expectations of the Vox backend:

```text
.
├── manifest.json          # Single source of truth for all models
├── llm/
│   └── gemma4/            # Large Language Models (GGUF)
├── stt/
│   └── qwen3-asr/         # Speech-to-Text (ONNX)
│       └── tokenizer/     # STT Tokenizer configs
├── tts/
│   ├── kokoro/            # English TTS (Kokoro ONNX)
│   └── piper_hi/          # Hindi TTS (Piper ONNX)
└── vad/
    └── ten_vad.onnx       # Voice Activity Detection (ONNX)
```

## Manifest

The `manifest.json` file in the root directory provides metadata for automated management, including:

- Relative file paths
- Exact byte sizes
- SHA256 hashes for integrity verification
- Archive markers for compressed assets (e.g., `espeak-ng-data`)

## Usage

These models are intended to be downloaded and managed by the Vox application runtime. For manual use, ensure you have [Git LFS](https://git-lfs.github.com/) installed to correctly retrieve the large model weights.
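After a manual download, the byte sizes and SHA256 hashes in `manifest.json` can be used to check that every file arrived intact. The following is a minimal sketch, not the Vox runtime's own logic: it assumes a hypothetical manifest layout of `{"files": [{"path": ..., "size": ..., "sha256": ...}]}`, and the helper names `verify_manifest` and `sha256_of` are illustrative. Archive entries (such as `espeak-ng-data`) are not handled here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB GGUF/ONNX weights never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: str) -> list[str]:
    """Return a list of problems; an empty list means every listed file matched.

    ASSUMPTION: manifest.json has the hypothetical shape
    {"files": [{"path": ..., "size": ..., "sha256": ...}, ...]}.
    The actual Vox manifest schema may differ.
    """
    root_path = Path(root)
    manifest = json.loads((root_path / "manifest.json").read_text(encoding="utf-8"))
    problems: list[str] = []
    for entry in manifest.get("files", []):
        file_path = root_path / entry["path"]  # paths are relative to the repo root
        if not file_path.is_file():
            problems.append(f"missing: {entry['path']}")
            continue
        # Cheap size check first; only hash files whose size already matches.
        size = file_path.stat().st_size
        if size != entry["size"]:
            problems.append(f"size mismatch: {entry['path']} ({size} != {entry['size']})")
            continue
        if sha256_of(file_path) != entry["sha256"].lower():
            problems.append(f"sha256 mismatch: {entry['path']}")
    return problems
```

Calling `verify_manifest(".")` from the repository root returns an empty list when everything matches. Checking size before hashing keeps the common failure cheap: a clone made without Git LFS contains small pointer files in place of the weights, so those would typically show up as size mismatches without reading any large file.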