---
title: Haseeb's TTS
emoji: 🚀
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.54.0
python_version: '3.10'
app_file: app.py
pinned: false
license: apache-2.0
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/652ac2e92aa5b27c77cba196/6Y7vGO0SQfVaCj9CYXzzf.png
---

# 🎧 Haseeb's TTS (Audiobook MP3 Generator)

Generate audiobook-style narration using **Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice** with a Streamlit UI built for long chapters.

## Why `qwen-tts` instead of `transformers.pipeline()`?
The model uses the `qwen3_tts` architecture. Some Transformers builds in hosted environments may not recognize it.
This Space uses Qwen's official **`qwen-tts`** package, which supports:
- `generate_custom_voice(text, language, speaker, instruct, ...)`
- `get_supported_speakers()` / `get_supported_languages()`  

(As shown in Qwen's official Qwen3-TTS repo docs.)

## Features
- ✅ **MP3 output** (no ffmpeg needed)
- ✅ **Batch mode**: upload multiple `.txt` files → get multiple MP3s + **ZIP download**
- ✅ **Long chapters (10,000+ chars)** via chunking + stitching
- ✅ **Language Support** (dropdown; auto-populated from the model when possible)
- ✅ **Voices / Speakers** (auto-populated from the model when possible)
- ✅ **Instruction Control** (style/emotion/pacing)
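
The chunking step for long chapters can be pictured roughly as follows. This is a minimal sketch, not the code in `app.py`: the function name `chunk_text`, the `max_chars` parameter, and the sentence-boundary rule are all assumptions made for illustration.

```python
import re


def chunk_text(text: str, max_chars: int = 1500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper for illustration; the Space's real chunker may differ.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio segments joined in order.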

## How to use

### Single chapter
1. Paste text (or upload a single `.txt`)
2. Choose language, speaker, instruction
3. Click **Generate MP3**

### Batch mode
1. Switch to **Batch mode**
2. Upload multiple `.txt` files (each file = one chapter)
3. Click **Generate MP3s (Batch)**
4. Download the ZIP containing all MP3 outputs
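
Bundling the per-chapter MP3s into one download can be done entirely in memory with the standard library. The sketch below assumes each chapter has already been encoded to MP3 bytes; `build_zip` and the file names are hypothetical and not taken from `app.py`.

```python
import io
import zipfile


def build_zip(mp3_files: dict[str, bytes]) -> bytes:
    """Pack {file name: MP3 bytes} into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    # MP3 data is already compressed, so store the entries without deflating.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in mp3_files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Example with placeholder bytes standing in for real MP3 data.
payload = build_zip({"chapter_01.mp3": b"abc", "chapter_02.mp3": b"defg"})
```

In a Streamlit app, the returned bytes could be passed as the `data` argument of `st.download_button`.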

## Tips for audiobooks
- Chunk size: **1200–1800 chars** is usually stable for long narration.
- Silence between chunks: **200–350 ms** reduces audible joins.
- If memory is tight, reduce:
  - chunk size
  - `max_new_tokens`
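
To make the silence setting concrete: a gap given in milliseconds becomes a run of zero-valued samples whose length depends on the sample rate. The sketch below uses plain Python lists to stay dependency-free; the app itself more likely works on NumPy arrays, and `stitch_with_silence`, its parameters, and the 24000 Hz rate in the example are assumptions for illustration.

```python
def stitch_with_silence(
    chunks: list[list[float]], sample_rate: int, silence_ms: int = 250
) -> list[float]:
    """Concatenate audio chunks, inserting silence_ms of zeros between them."""
    # Number of zero samples that corresponds to the requested gap.
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    stitched: list[float] = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            stitched.extend(gap)  # silence only between chunks, not at the ends
        stitched.extend(chunk)
    return stitched


# Two short placeholder chunks at an assumed 24000 Hz sample rate:
# a 250 ms gap adds 24000 * 250 // 1000 = 6000 samples between them.
audio = stitch_with_silence([[0.1] * 10, [0.2] * 20], sample_rate=24000)
```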

## Files
- `app.py`: Streamlit UI + batch mode + MP3 encoding + chunking/stitching
- `requirements.txt`: dependencies