# Vocence PromptTTS Miner - Base Parler-TTS
This repository contains a Vocence-compliant miner using the base Parler-TTS Mini v1 model.
## Model Details
- Model: parler-tts/parler-tts-mini-v1 (880M parameters)
- Source: HuggingFace (loaded at runtime)
- Sample Rate: 44.1kHz
- Instruction Format: Vocence-compliant voice trait descriptions
- No fine-tuning: Uses pretrained base model
## Voice Traits Supported
The model follows Vocence instruction format:
- Gender: male, female, neutral
- Pitch: low, mid, high
- Speed: slow, normal, fast
- Age: child, young_adult, adult, senior
- Emotion: neutral, happy, sad, angry, calm, excited, serious, fearful
- Tone: warm, cold, friendly, formal, casual, authoritative
- Accent: us, uk, au, in, neutral, other
## Instruction Format

```
A {emotion} {age_group} {gender} voice with {pitch} pitch, speaking at {speed} speed in a {tone} manner with a {accent} accent.
```
### Example

```
A calm adult male voice with mid pitch, speaking at normal speed in a casual manner with a US accent.
```
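The template above can be filled programmatically from the supported trait values. A small sketch (the helper name `build_instruction` is hypothetical, not part of the miner API):

```python
# Hypothetical helper that renders one trait combination into the
# canonical Vocence instruction string shown above.
TEMPLATE = (
    "A {emotion} {age_group} {gender} voice with {pitch} pitch, "
    "speaking at {speed} speed in a {tone} manner with a {accent} accent."
)

def build_instruction(gender="male", pitch="mid", speed="normal",
                      age_group="adult", emotion="calm", tone="casual",
                      accent="neutral"):
    """Format the supported traits into an instruction sentence."""
    return TEMPLATE.format(gender=gender, pitch=pitch, speed=speed,
                           age_group=age_group, emotion=emotion,
                           tone=tone, accent=accent)

print(build_instruction(emotion="happy", accent="uk"))
```

Keeping the template in one place makes it easy to sweep trait combinations during evaluation.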
## Usage

The miner implements the Vocence contract:

```python
from pathlib import Path
from miner import Miner

# Initialize (model downloads from HuggingFace on first run)
miner = Miner(Path("./"))

# Warmup
miner.warmup()

# Generate
instruction = "A calm adult male voice with mid pitch, speaking at normal speed in a casual manner with a neutral accent."
text = "Hello world, this is a test of the Vocence PromptTTS system."
waveform, sample_rate = miner.generate_wav(instruction, text)
```
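The returned waveform can be written to disk for listening tests. A stdlib-only sketch (a short synthetic tone stands in for `generate_wav` output so the snippet runs standalone):

```python
# Persist a float32 [-1, 1] mono waveform as 16-bit PCM WAV (stdlib only).
import wave
import numpy as np

sample_rate = 44100
t = np.arange(int(sample_rate * 0.5)) / sample_rate
waveform = (0.3 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

def save_wav(path, waveform, sample_rate):
    """Clip to [-1, 1], convert to int16 PCM, and write a mono WAV file."""
    pcm = (np.clip(waveform, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)        # mono
        f.setsampwidth(2)        # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

save_wav("out.wav", waveform, sample_rate)
```

Libraries such as `soundfile` can write float WAVs directly, but the stdlib `wave` route avoids an extra dependency for quick checks.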
## Deployment

### Requirements
- GPU with 16GB+ VRAM
- Python 3.12+
- Dependencies: torch, transformers, parler-tts, etc. (see chute_config.yml)
### Quick Deploy to Chutes

1. Upload this repo to HuggingFace
2. Render the Vocence canonical wrapper template with your repo details
3. Build and deploy to Chutes
4. Register on Vocence subnet
See DEPLOYMENT_GUIDE.md for detailed steps.
## Local A/B Evaluation

Use `eval_ab.py` to compare prompt-conditioning strategies and quickly tune decoding:

```shell
python eval_ab.py --mode raw
python eval_ab.py --mode conditioned
python eval_ab.py --mode both
```
Outputs are written to `eval_outputs/`:

- per-case WAV files for listening tests
- `metrics.csv` with duration, RMS, clipping ratio, and silence ratio
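The four metrics can be computed directly from the waveform. A sketch of one plausible definition for each (the clipping threshold of `0.999` and the 20 ms / 0.01 RMS silence criterion are illustrative assumptions, not necessarily what `eval_ab.py` uses):

```python
import numpy as np

def audio_metrics(waveform, sample_rate):
    """Duration, RMS, clipping ratio, and silence ratio for a mono waveform."""
    duration = len(waveform) / sample_rate
    rms = float(np.sqrt(np.mean(waveform ** 2)))
    # Assumed: a sample "clips" when its magnitude is at or near full scale.
    clipping_ratio = float(np.mean(np.abs(waveform) >= 0.999))
    # Assumed: silence = fraction of 20 ms frames whose RMS is below 0.01.
    frame = max(1, int(sample_rate * 0.02))
    n = len(waveform) // frame
    frames = waveform[: n * frame].reshape(n, frame)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    silence_ratio = float(np.mean(frame_rms < 0.01))
    return {"duration": duration, "rms": rms,
            "clipping_ratio": clipping_ratio, "silence_ratio": silence_ratio}

# Example: one second of silence followed by one second of full-scale tone.
sr = 44100
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
wav = np.concatenate([np.zeros(sr, dtype=np.float32), tone])
print(audio_metrics(wav, sr))
```

A high silence ratio usually signals a failed generation, while a nonzero clipping ratio suggests the decoder output needs normalization before scoring.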
## Performance Characteristics

### Strengths
- ✅ High quality: Parler-TTS base model quality
- ✅ Proven reliability: Thoroughly tested base model
- ✅ Fast deployment: No training required
- ✅ Script accuracy: Good transcription quality
- ✅ Naturalness: Human-like speech
### Limitations
- ⚠️ No fine-tuning: Not optimized for specific voice characteristics
- ⚠️ Generic traits: May not hit all trait combinations perfectly
- ⚠️ Baseline performance: Competitive but not cutting-edge
## Expected Vocence Score
Estimated: 70-80%
- Script accuracy (30%): ~80-85% ⭐⭐⭐⭐
- Naturalness (15%): ~85-90% ⭐⭐⭐⭐
- Trait control (55%): ~65-75% ⭐⭐⭐
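The overall estimate follows from the category weights above; taking the mid-point of each quoted range as an illustrative point estimate:

```python
# Weighted Vocence score from the per-category estimates above,
# using the mid-point of each quoted range.
weights   = {"script_accuracy": 0.30, "naturalness": 0.15, "trait_control": 0.55}
midpoints = {"script_accuracy": 82.5, "naturalness": 87.5, "trait_control": 70.0}

score = sum(weights[k] * midpoints[k] for k in weights)
print(round(score, 1))  # → 76.4, inside the estimated 70-80% band
```

Because trait control carries 55% of the weight, improvements there move the total far more than gains in script accuracy or naturalness.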
To reach 90%+ score, consider:
- Fine-tuning on multi-speaker datasets
- Adding voice trait classification training
- Optimizing for edge cases
## API Contract

The miner provides:

- `__init__(path_hf_repo: Path)` - initialize with the repo path
- `warmup()` - optional warmup call
- `generate_wav(instruction: str, text: str) -> tuple[np.ndarray, int]` - generate audio
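A minimal skeleton of that contract, with a placeholder body (one second of silence) in place of the real Parler-TTS inference:

```python
from pathlib import Path
import numpy as np

SAMPLE_RATE = 44100  # contract: 44.1 kHz mono float32 in [-1, 1]

class Miner:
    """Skeleton of the Vocence miner contract (placeholder audio, no model)."""

    def __init__(self, path_hf_repo: Path):
        # The real miner loads parler-tts-mini-v1 from this path / HuggingFace.
        self.path = path_hf_repo

    def warmup(self) -> None:
        # Optional: the real miner can run one tiny generation here so the
        # first scored request doesn't pay model-load or compile latency.
        pass

    def generate_wav(self, instruction: str, text: str) -> tuple[np.ndarray, int]:
        # Placeholder output in the contract's format; real inference goes here.
        return np.zeros(SAMPLE_RATE, dtype=np.float32), SAMPLE_RATE
```

Any implementation that keeps these three signatures and the output format below stays drop-in compatible with the Vocence wrapper.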
Output format:
- Mono float32 numpy array
- 44.1kHz sample rate
- Values in range [-1, 1]
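The three output requirements above are cheap to check before returning audio; a small validator sketch (the function name is hypothetical):

```python
import numpy as np

def validate_output(waveform: np.ndarray, sample_rate: int) -> None:
    """Assert a miner result matches the output format described above."""
    assert isinstance(waveform, np.ndarray), "must be a numpy array"
    assert waveform.dtype == np.float32, "must be float32"
    assert waveform.ndim == 1, "must be mono (1-D)"
    assert sample_rate == 44100, "must be 44.1 kHz"
    assert np.all(np.abs(waveform) <= 1.0), "values must lie in [-1, 1]"

# A one-second silent buffer satisfies the contract.
validate_output(np.zeros(44100, dtype=np.float32), 44100)
```

Running such a check in `generate_wav` catches dtype and scaling bugs locally rather than as failed scores on the subnet.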
## License
Based on Parler-TTS (Apache 2.0 License)
## Support
For issues or questions:
- Parler-TTS: https://github.com/huggingface/parler-tts
- Vocence docs: Check subnet documentation