---
license: cc-by-nc-sa-4.0
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
pipeline_tag: text-to-speech
library_name: transformers
language:
  - en
tags:
  - tts
  - qwen3-tts
  - voice-design
  - prompttts
  - vocence
  - bittensor
---



Inference uses **`qwen_tts.Qwen3TTSModel`**, loaded from the repo root via `from_pretrained(this_folder)`.

## Layout

| Path | Role |
|------|------|
| `config.json`, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped by the upstream model card) |
| `miner.py` | Vocence engine: `Miner`, `warmup()`, `generate_wav(instruction, text)` |
| `vocence_config.yaml` | Device, dtype, caps, language |
| `chute_config.yml` | Chutes image / GPU / scaling / TEE |
| `demo.py` | Optional local smoke test (if present) |

## Vocence API

Validators call your deployed chute with JSON shaped like:

```json
{
  "text": "Words to speak.",
  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
}
```

The miner passes **`text`** to `generate_voice_design(..., text=...)` and **`instruction`** to `instruct=...`; **`language`** comes from `default_language` in `vocence_config.yaml` (English by default).
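
The mapping above can be sketched as follows. This is a minimal illustration, not the code in `miner.py`: `build_generate_kwargs` and `parse_instruction` are hypothetical helper names, and the `"english"` default is an assumption taken from the direct-call example later in this README. The instruction string is forwarded unchanged; parsing it into fields is only useful for logging or sanity checks.

```python
from typing import Dict


def parse_instruction(instruction: str) -> Dict[str, str]:
    """Split 'key: value | key: value' into a dict; skip segments without a key and value."""
    fields: Dict[str, str] = {}
    for segment in instruction.split("|"):
        key, sep, value = segment.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def build_generate_kwargs(payload: dict, default_language: str = "english") -> dict:
    """Map a validator payload onto generate_voice_design keyword arguments (hypothetical helper)."""
    return {
        "text": payload["text"],
        "instruct": payload["instruction"],  # forwarded as-is, not rewritten
        "language": default_language,        # assumed to come from vocence_config.yaml
    }


payload = {
    "text": "Words to speak.",
    "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult "
                   "| emotion: neutral | tone: casual | accent: us",
}
kwargs = build_generate_kwargs(payload)
traits = parse_instruction(payload["instruction"])
```

A free-form instruction such as `"A calm, clear narrator."` contains no `key: value` pairs, so `parse_instruction` returns an empty dict for it while the string itself is still passed through as `instruct`.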

## Configure (`vocence_config.yaml`)

| Area | Keys |
|------|------|
| Runtime | `device_preference` (`cuda` / `cpu`), `dtype` (`bfloat16` / `float32`), `use_flash_attention_2`, `default_language` |
| Generation | `sample_rate` (e.g. 24000), `max_seconds` |
| Limits | `max_text_chars`, `max_instruction_chars` |

Warmup runs one short `generate_voice_design` with a **180 s** timeout.
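
As a rough sketch of how the generation and limit keys in the table could be applied before synthesis: the `Limits` class, the `check_request` function, and every numeric value below are illustrative assumptions, not the contents of `vocence_config.yaml` or `miner.py`. This README does not say whether over-long input is rejected or truncated; the sketch rejects it. Treating `max_seconds` as a cap on output duration is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Placeholder values for illustration; use the ones in vocence_config.yaml.
    max_text_chars: int = 500
    max_instruction_chars: int = 300
    sample_rate: int = 24000
    max_seconds: float = 20.0

    @property
    def max_samples(self) -> int:
        """Upper bound on output length implied by sample_rate and max_seconds."""
        return int(self.sample_rate * self.max_seconds)


def check_request(text: str, instruction: str, limits: Limits) -> None:
    """Raise ValueError when a request is empty or exceeds the configured caps."""
    if not text.strip():
        raise ValueError("text is empty")
    if len(text) > limits.max_text_chars:
        raise ValueError(f"text exceeds {limits.max_text_chars} characters")
    if len(instruction) > limits.max_instruction_chars:
        raise ValueError(
            f"instruction exceeds {limits.max_instruction_chars} characters"
        )


limits = Limits()
check_request("Words to speak.", "gender: male | pitch: mid", limits)  # passes silently
```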

## Local quick test

Install PyTorch (CUDA if available), then:

```bash
pip install "qwen-tts" pyyaml soundfile numpy
```

```python
from pathlib import Path
from miner import Miner

miner = Miner(Path("."))
miner.warmup()
wave, sr = miner.generate_wav(
    instruction="A calm, clear narrator, neutral US accent.",
    text="Hello — this is a short synthesis check.",
)
```
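
To listen to the result, the returned samples can be written to a WAV file. The sketch below uses only the standard library and assumes `wave` is a one-dimensional sequence of floats in the range [-1, 1] (mono); that shape and range are assumptions, so check what `generate_wav` actually returns. `write_wav_mono16` is a hypothetical helper, and a generated tone stands in for the miner output so the snippet runs without a GPU.

```python
import math
import os
import struct
import tempfile
import wave as wave_io  # aliased so it does not clash with the `wave` variable above
from typing import Iterable


def write_wav_mono16(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write float samples in [-1, 1] as 16-bit mono PCM; return the frame count."""
    frames = bytearray()
    count = 0
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
        count += 1
    with wave_io.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
    return count


# Stand-in signal: 0.1 s of a 440 Hz tone at 24 kHz.
# With the miner, pass the returned `wave` and `sr` instead.
sr = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
out_path = os.path.join(tempfile.gettempdir(), "vocence_check.wav")
n_frames = write_wav_mono16(out_path, tone, sr)
```

Since `soundfile` is already in the install line above, `soundfile.write(out_path, wave, sr)` is a shorter alternative when the samples are a NumPy array.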

Or load the model class directly from the transformers-style layout:

```python
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(".")  # or your HF repo id
wavs, sr = model.generate_voice_design(
    text="Hello fellas.",
    instruct="Cute voice.",
    language="english",
)
```

Replace `"."` with your HF repo id after upload, e.g. `"your-org/your-repo"`.

## Chutes / Vocence deploy

1. Push this layout to a Hugging Face **model** repo; pin a **commit SHA** for `VOCENCE_REVISION`.
2. Render the canonical Vocence chute script with `VOCENCE_REPO`, `VOCENCE_REVISION`, `VOCENCE_CHUTES_USER`, `VOCENCE_CHUTE_ID`.
3. Run `chutes build … --wait`, then `chutes deploy … --accept-fee`.
4. Commit on chain: `model_name`, `model_revision` (HF SHA), `chute_id` (UUID from Chutes).

Chute **name** must contain **`vocence`** (case-insensitive). See **`miner_sample/MINER_GUIDE.md`** in the Vocence repo.
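
Before committing on chain, the values from the steps above can be sanity-checked locally. This is a minimal sketch under stated assumptions: `check_commitment` is a hypothetical helper, not part of the Vocence or Chutes tooling, the example values are placeholders, and the revision check assumes a full 40-character hexadecimal Git commit SHA rather than a branch name or short hash.

```python
import re
import uuid
from typing import List

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_uuid(value: str) -> bool:
    """True when the string parses as a UUID (uuid.UUID is lenient about formatting)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def check_commitment(
    model_name: str, model_revision: str, chute_id: str, chute_name: str
) -> List[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems: List[str] = []
    if not model_name.strip():
        problems.append("model_name is empty")
    if FULL_SHA.fullmatch(model_revision) is None:
        problems.append("model_revision is not a full 40-character commit SHA")
    if not is_uuid(chute_id):
        problems.append("chute_id is not a UUID")
    if "vocence" not in chute_name.lower():
        problems.append("chute name does not contain 'vocence'")
    return problems


# Placeholder values for illustration only.
problems = check_commitment(
    model_name="your-org/your-repo",
    model_revision="0123456789abcdef0123456789abcdef01234567",
    chute_id="123e4567-e89b-12d3-a456-426614174000",
    chute_name="My-Vocence-TTS",
)
```

These checks only catch malformed values; they do not confirm that the revision exists on Hugging Face or that the chute is deployed.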

## Training / fine-tuning

Fine-tuning is done **outside** Chutes on your own GPU; export a full snapshot compatible with **`Qwen3TTSModel.from_pretrained(...)`**, then replace weights in this repo layout and push a new revision.

## License

**CC BY-NC-SA 4.0** — see the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.