aiseosae/minerTTS

	---
	license: cc-by-nc-sa-4.0
	base_model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
	pipeline_tag: text-to-speech
	library_name: transformers
	language:
	- en
	tags:
	- tts
	- qwen3-tts
	- voice-design
	- prompttts
	- vocence
	- bittensor
	---



	Inference uses `qwen_tts.Qwen3TTSModel`, loaded from the repo root via `from_pretrained(this_folder)`.

	## Layout

	| Path | Role |
	|------|------|
	| `config.json`, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped by the upstream model card) |
	| `miner.py` | Vocence engine: `Miner`, `warmup()`, `generate_wav(instruction, text)` |
	| `vocence_config.yaml` | Device, dtype, caps, language |
	| `chute_config.yml` | Chutes image / GPU / scaling / TEE |
	| `demo.py` | Optional local smoke test (if present) |

	## Vocence API

	Validators call your deployed chute with JSON shaped like:

	```json
	{
	  "text": "Words to speak.",
	  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
	}
	```

	The miner passes `text` to `generate_voice_design(..., text=...)` and `instruction` to its `instruct=...` argument. The `language` argument is taken from the config (`default_language`, English by default).
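
To make the request shape concrete, the sketch below builds the pipe-delimited `instruction` string from a dict of traits and parses it back. The helpers `build_instruction`, `parse_instruction`, and `build_payload` are hypothetical and not part of `miner.py`. The `key: value | key: value` format is inferred from the example request above; validators may also send free-form text.

```python
import json


def build_instruction(traits: dict[str, str]) -> str:
    """Join traits as 'key: value | key: value' (format assumed from the example request)."""
    return " | ".join(f"{key}: {value}" for key, value in traits.items())


def parse_instruction(instruction: str) -> dict[str, str]:
    """Split a pipe-delimited instruction into a dict; fragments without a colon are ignored."""
    traits = {}
    for part in instruction.split("|"):
        key, sep, value = part.partition(":")
        if sep:
            traits[key.strip()] = value.strip()
    return traits


def build_payload(text: str, traits: dict[str, str]) -> str:
    """Serialize the two fields shown in the request: 'text' and 'instruction'."""
    return json.dumps({"text": text, "instruction": build_instruction(traits)})


traits = {"gender": "male", "pitch": "mid", "speed": "normal", "accent": "us"}
payload = build_payload("Words to speak.", traits)
```

A helper like `parse_instruction` is only useful for logging or debugging, since the model receives the raw string as `instruct`.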

	## Configure (`vocence_config.yaml`)

	| Area | Keys |
	|------|------|
	| Runtime | `device_preference` (`cuda` / `cpu`), `dtype` (`bfloat16` / `float32`), `use_flash_attention_2`, `default_language` |
	| Generation | `sample_rate` (e.g. 24000), `max_seconds` |
	| Limits | `max_text_chars`, `max_instruction_chars` |
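
As a rough illustration of how these keys could be held in code and how the limits might be applied to a request, consider the sketch below. The `VocenceConfig` class, the flat key layout, the default values marked hypothetical, and the truncation behavior are all assumptions; the real values live in `vocence_config.yaml`, and `miner.py` may reject over-long input rather than truncate it.

```python
from dataclasses import dataclass, fields


@dataclass
class VocenceConfig:
    # Runtime
    device_preference: str = "cuda"
    dtype: str = "bfloat16"
    use_flash_attention_2: bool = True
    default_language: str = "English"
    # Generation
    sample_rate: int = 24000
    max_seconds: float = 30.0         # hypothetical default
    # Limits
    max_text_chars: int = 1000        # hypothetical default
    max_instruction_chars: int = 500  # hypothetical default

    @classmethod
    def from_dict(cls, raw: dict) -> "VocenceConfig":
        """Build a config from a parsed YAML mapping, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})

    def clamp(self, text: str, instruction: str) -> tuple[str, str]:
        """Truncate both fields to the configured character caps."""
        return text[: self.max_text_chars], instruction[: self.max_instruction_chars]


# In practice `raw` would come from yaml.safe_load() on vocence_config.yaml.
cfg = VocenceConfig.from_dict({"device_preference": "cpu", "dtype": "float32", "max_text_chars": 5})
text, instruction = cfg.clamp("Hello world", "gender: male")
```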

	`warmup()` runs one short `generate_voice_design` call with a 180 s timeout.
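
One way to bound a warmup call like this is to run it in a worker thread and wait on the result with a timeout. This is a minimal sketch under that assumption, not the code in `miner.py`; `run_with_timeout` and `fake_warmup` are hypothetical names, and `fake_warmup` stands in for the real model call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(fn, timeout_s: float):
    """Run fn() in a worker thread and raise TimeoutError if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"warmup did not finish within {timeout_s} s") from None
    finally:
        # Do not block on a stuck worker; the thread itself cannot be killed.
        pool.shutdown(wait=False)


def fake_warmup():
    """Stand-in for one short generate_voice_design call."""
    time.sleep(0.05)
    return "ok"


result = run_with_timeout(fake_warmup, timeout_s=180)
```

A thread-based timeout only stops the caller from waiting; a generation that hangs keeps running in the background, so a process-level restart may still be needed in that case.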

	## Local quick test

	Install PyTorch (the CUDA build if you have a GPU), then the remaining dependencies:

	```bash
	pip install "qwen-tts" pyyaml soundfile numpy
	```

	```python
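	from pathlib import Path
	from miner import Miner

	# Point the miner at the repo root, which holds the weights and vocence_config.yaml.
	miner = Miner(Path("."))
	miner.warmup()  # runs one short generation before the first real request
	# Returns the waveform and its sample rate.
	wave, sr = miner.generate_wav(
	    instruction="A calm, clear narrator, neutral US accent.",
	    text="Hello — this is a short synthesis check.",
	)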
	```

	Or load the model class directly from the transformers-style layout:

	```python
	from qwen_tts import Qwen3TTSModel

	model = Qwen3TTSModel.from_pretrained(".")  # or your HF repo id
	wavs, sr = model.generate_voice_design(
	    text="Hello fellas.",
	    instruct="Cute voice.",
	    language="english",
	)
	```

	Replace `"."` with your HF repo id after upload, e.g. `"your-org/your-repo"`.
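
Neither snippet above saves the audio. The sketch below writes samples to a 16-bit mono WAV file using only the standard library, so the result can be played back. It assumes the samples are a 1-D sequence of floats in [-1, 1] (for the second snippet that would be `wavs[0]`); `write_wav` is a hypothetical helper, and a generated sine tone stands in for model output.

```python
import math
import os
import struct
import tempfile
import wave as wavefile  # stdlib module, aliased so it does not clash with a `wave` samples variable


def write_wav(path: str, samples, sample_rate: int) -> float:
    """Write mono float samples in [-1, 1] as 16-bit PCM and return the duration in seconds."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against out-of-range values
        pcm += struct.pack("<h", int(clipped * 32767))
    with wavefile.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))
    return len(pcm) / 2 / sample_rate


# Stand-in for model output: 0.5 s of a 440 Hz tone at 24 kHz.
sr = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
out_path = os.path.join(tempfile.gettempdir(), "check.wav")
duration = write_wav(out_path, tone, sr)
```

Since `soundfile` is already in the install line, `soundfile.write(path, samples, sr)` is a shorter alternative when the samples are a NumPy array.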

	## Chutes / Vocence deploy

	1. Push this layout to a Hugging Face model repo; pin a commit SHA for `VOCENCE_REVISION`.
	2. Render the canonical Vocence chute script with `VOCENCE_REPO`, `VOCENCE_REVISION`, `VOCENCE_CHUTES_USER`, `VOCENCE_CHUTE_ID`.
	3. `chutes build … --wait` then `chutes deploy … --accept-fee`.
	4. Commit on chain: `model_name`, `model_revision` (HF SHA), `chute_id` (UUID from Chutes).

	Chute name must contain `vocence` (case-insensitive). See `miner_sample/MINER_GUIDE.md` in the Vocence repo.
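
Before committing on chain, it can help to sanity-check the fields from the steps above. The sketch below is a hypothetical pre-flight check, not part of the Vocence tooling. It assumes `model_name` has the form `owner/repo`, that `model_revision` is a full 40-character commit SHA, and that `chute_id` parses as a UUID. The name rule is the one stated above.

```python
import re
import uuid

SHA_RE = re.compile(r"^[0-9a-f]{40}$")  # assumed: full lowercase git commit SHA


def check_commit_fields(model_name: str, model_revision: str,
                        chute_id: str, chute_name: str) -> list[str]:
    """Return a list of problems; an empty list means the fields look plausible."""
    problems = []
    if model_name.count("/") != 1:
        problems.append("model_name should look like 'owner/repo'")
    if not SHA_RE.match(model_revision):
        problems.append("model_revision should be a full 40-character commit SHA")
    try:
        uuid.UUID(chute_id)
    except ValueError:
        problems.append("chute_id should be a UUID")
    if "vocence" not in chute_name.lower():
        problems.append("chute name must contain 'vocence'")
    return problems


problems = check_commit_fields(
    model_name="your-org/your-repo",
    model_revision="a" * 40,  # placeholder SHA
    chute_id="123e4567-e89b-12d3-a456-426614174000",  # placeholder UUID
    chute_name="my-Vocence-tts",
)
```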

	## Training / fine-tuning

	Fine-tuning happens outside Chutes, on your own GPU. Export a full snapshot that loads with `Qwen3TTSModel.from_pretrained(...)`, replace the weights in this repo layout, and push a new revision.

	## License

	CC BY-NC-SA 4.0 — see the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.