---
license: apache-2.0
tags:
- voice
- stt
- tts
- llm
- vox
- real-time
- edge-inference
library_name: generic
---

# Vox Models

This repository serves as the official model host for **Vox**, a real-time, local-first voice-to-voice system. It contains specialized models for Voice Activity Detection (VAD), Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS).

## Directory Structure

The layout of this repository mirrors the directory structure that the Vox backend expects at runtime:

```text
.
├── manifest.json           # Single source of truth for all models
├── llm/
│   └── gemma4/             # Large Language Models (GGUF)
├── stt/
│   └── qwen3-asr/          # Speech-to-Text (ONNX)
│       └── tokenizer/      # STT Tokenizer configs
├── tts/
│   ├── kokoro/             # English TTS (Kokoro ONNX)
│   └── piper_hi/           # Hindi TTS (Piper ONNX)
└── vad/
    └── ten_vad.onnx        # Voice Activity Detection (ONNX)
```
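Because the backend relies on this exact layout, a quick pre-flight check can catch an incomplete download before Vox starts. The sketch below is illustrative rather than part of Vox. `EXPECTED_LAYOUT` is transcribed by hand from the tree above, and `missing_paths` is a hypothetical helper. The check confirms only that each entry exists with the right kind (directory or file); it does not inspect the contents.

```python
from __future__ import annotations

from pathlib import Path

# Hand-copied from the directory tree above; a trailing "/" marks a directory.
EXPECTED_LAYOUT = [
    "manifest.json",
    "llm/gemma4/",
    "stt/qwen3-asr/",
    "stt/qwen3-asr/tokenizer/",
    "tts/kokoro/",
    "tts/piper_hi/",
    "vad/ten_vad.onnx",
]


def missing_paths(root: str | Path) -> list[str]:
    """Return the expected layout entries that are absent under `root`."""
    root = Path(root)
    missing = []
    for entry in EXPECTED_LAYOUT:
        target = root / entry.rstrip("/")
        # Directories must exist as directories and files as regular files.
        ok = target.is_dir() if entry.endswith("/") else target.is_file()
        if not ok:
            missing.append(entry)
    return missing
```

For example, `missing_paths("path/to/vox-models")` returns an empty list when every entry in the tree is present.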

## Manifest

The `manifest.json` file in the root directory provides metadata for automated management, including:
- Relative file paths
- Exact byte sizes
- SHA256 hashes for integrity verification
- Archive markers for compressed assets (e.g., `espeak-ng-data`)
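Here is a minimal sketch of how the size and SHA256 fields can be used to verify a local copy. It assumes a hypothetical schema, `{"files": [{"path": ..., "size": ..., "sha256": ...}]}`, because the actual key names in `manifest.json` are not documented here, so adjust them to match the real file. Archive-marked entries are not handled in this sketch. The file size is compared first because that check is cheap, and the hash is computed in chunks so that multi-gigabyte weights are never loaded into memory all at once.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks to keep memory use flat for large weights."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: str | Path) -> list[str]:
    """Check each manifest entry's size and SHA256; return a list of problems.

    ASSUMPTION: the manifest uses the hypothetical shape
    {"files": [{"path": "...", "size": 123, "sha256": "..."}]}.
    """
    root = Path(root)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"{entry['path']}: missing")
            continue
        # Cheap check first: a size mismatch means the hash cannot match either.
        size = target.stat().st_size
        if size != entry["size"]:
            problems.append(f"{entry['path']}: size {size} != {entry['size']}")
            continue
        if sha256_of(target) != entry["sha256"].lower():
            problems.append(f"{entry['path']}: sha256 mismatch")
    return problems
```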

## Usage

These models are intended to be downloaded and managed by the Vox application runtime. For manual use, ensure you have [Git LFS](https://git-lfs.github.com/) installed to correctly retrieve the large model weights.
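If the repository is cloned without Git LFS, each large file is replaced by a small text pointer, and its first line is `version https://git-lfs.github.com/spec/v1`. The sketch below, a hypothetical helper that is not part of Vox, scans a checkout for such pointers so you can tell whether the real weights were retrieved. If it reports any files, running `git lfs pull` inside the clone should fetch the actual contents.

```python
from __future__ import annotations

from pathlib import Path

# First line of every Git LFS pointer file (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: str | Path) -> list[str]:
    """List files under `root` that are still LFS pointers rather than real data."""
    root = Path(root)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside Git's own metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path.relative_to(root).as_posix())
    return pointers
```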