---
title: Haseeb's TTS
emoji: π
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.54.0
python_version: '3.10'
app_file: app.py
pinned: false
license: apache-2.0
thumbnail: >-
https://cdn-uploads.huggingface.co/production/uploads/652ac2e92aa5b27c77cba196/6Y7vGO0SQfVaCj9CYXzzf.png
---
# 🎧 Haseeb's TTS (Audiobook MP3 Generator)
Generate audiobook-style narration using **Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice** with a Streamlit UI built for long chapters.
## Why `qwen-tts` instead of `transformers.pipeline()`?
The model uses the `qwen3_tts` architecture. Some Transformers builds in hosted environments may not recognize it.
This Space uses Qwen's official **`qwen-tts`** package, which supports:
- `generate_custom_voice(text, language, speaker, instruct, ...)`
- `get_supported_speakers()` / `get_supported_languages()`
(As shown in Qwen's official Qwen3-TTS repo docs.)
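As a rough illustration of the call listed above, the sketch below wraps `generate_custom_voice` in a small helper. The helper name `narrate` is hypothetical, and the return shape (a list of waveforms plus a sample rate) is an assumption rather than something this README states, so check it against the package docs. The model object is passed in, so loading it is left out.

```python
from typing import Any, Tuple


def narrate(model: Any, text: str, language: str, speaker: str,
            instruct: str = "") -> Tuple[Any, int]:
    """Synthesize one piece of text and return (waveform, sample_rate).

    `model` is any object exposing generate_custom_voice(text, language,
    speaker, instruct), as listed in this README.
    """
    # ASSUMPTION: the call returns (list_of_waveforms, sample_rate).
    wavs, sample_rate = model.generate_custom_voice(
        text=text,
        language=language,
        speaker=speaker,
        instruct=instruct,
    )
    if len(wavs) == 0:
        raise RuntimeError("model returned no audio")
    # One input text is expected to yield one waveform; keep the first.
    return wavs[0], sample_rate
```

Passing the model in as a parameter keeps the helper easy to exercise with a stub object, without downloading the 1.7B checkpoint.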
## Features
- ✅ **MP3 output** (no ffmpeg needed)
- ✅ **Batch mode**: upload multiple `.txt` files → get multiple MP3s + **ZIP download**
- ✅ **Long chapters (10,000+ chars)** via chunking + stitching
- ✅ **Language Support** (dropdown; auto-populated from the model when possible)
- ✅ **Voices / Speakers** (auto-populated from the model when possible)
- ✅ **Instruction Control** (style/emotion/pacing)
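The "auto-populated when possible" behaviour for languages and speakers could look something like the sketch below. It is a minimal illustration, not the code in `app.py`: the helper name `supported_options` and the fallback values are hypothetical. It queries a method such as `get_supported_speakers()` and falls back to a fixed list if the method is missing, raises, or returns nothing.

```python
from typing import Any, List, Sequence


def supported_options(model: Any, method_name: str,
                      fallback: Sequence[str]) -> List[str]:
    """Return dropdown options from the model, or `fallback` if unavailable."""
    method = getattr(model, method_name, None)
    if not callable(method):
        return list(fallback)
    try:
        values = method()
        # Normalize to sorted, de-duplicated strings for a stable dropdown.
        options = sorted({str(v) for v in (values if values is not None else [])})
    except Exception:
        # Any failure while querying the model falls back to the static list.
        return list(fallback)
    return options or list(fallback)
```

In a Streamlit UI the result would typically feed a select box, for example `supported_options(model, "get_supported_languages", ["English"])`, where `"English"` is only a placeholder fallback.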
## How to use
### Single chapter
1. Paste text (or upload a single `.txt`)
2. Choose language, speaker, instruction
3. Click **Generate MP3**
### Batch mode
1. Switch to **Batch mode**
2. Upload multiple `.txt` files (each file = one chapter)
3. Click **Generate MP3s (Batch)**
4. Download the ZIP containing all MP3 outputs
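The ZIP step of batch mode can be done entirely in memory with the standard library. The sketch below assumes the MP3 bytes for each uploaded file are already available; the function name `build_zip` and the naming rule (reuse the `.txt` stem with an `.mp3` suffix) are illustrative choices, not necessarily what `app.py` does.

```python
import io
import zipfile
from pathlib import Path
from typing import Dict


def build_zip(mp3_by_source: Dict[str, bytes]) -> bytes:
    """Pack MP3 payloads into one ZIP, keyed by the source .txt filename."""
    buffer = io.BytesIO()
    # MP3 is already compressed, so store entries without recompressing.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for source_name, mp3_bytes in mp3_by_source.items():
            # "chapter_01.txt" becomes "chapter_01.mp3" inside the archive.
            archive.writestr(Path(source_name).stem + ".mp3", mp3_bytes)
    return buffer.getvalue()
```

The returned bytes can be handed directly to a download widget. Two uploads with the same stem would collide inside the archive, so a real implementation may want to de-duplicate names first.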
## Tips for audiobooks
- Chunk size: **1200–1800 chars** is usually stable for long narration.
- Silence between chunks: **200–350 ms** reduces audible joins.
- If memory is tight, reduce:
  - chunk size
  - `max_new_tokens`
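To make the chunk-size and silence tips concrete, here is a minimal sketch of sentence-based chunking and of stitching with a silent gap. It is an illustration under stated assumptions, not the code in `app.py`: the names `chunk_text` and `stitch` are hypothetical, sentences are split on `.`, `!`, and `?` only, and waveforms are plain lists of float samples to keep the example dependency-free.

```python
import re
from typing import List, Sequence


def chunk_text(text: str, max_chars: int = 1500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def stitch(waveforms: Sequence[Sequence[float]], sample_rate: int,
           silence_ms: int = 250) -> List[float]:
    """Concatenate waveforms, inserting silence_ms of zeros between them."""
    gap = [0.0] * int(sample_rate * silence_ms / 1000)
    stitched: List[float] = []
    for index, waveform in enumerate(waveforms):
        if index > 0:
            stitched.extend(gap)
        stitched.extend(waveform)
    return stitched
```

Each chunk would be synthesized separately and the resulting waveforms passed to `stitch`. With NumPy arrays the same join can be expressed with `np.concatenate`. Lowering `max_chars` shortens each generation call, which is the memory lever mentioned above.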
## Files
- `app.py`: Streamlit UI + batch mode + MP3 encoding + chunking/stitching
- `requirements.txt`: dependencies