---
title: Haseeb's TTS
emoji: 🚀
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.54.0
python_version: '3.10'
app_file: app.py
pinned: false
license: apache-2.0
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/652ac2e92aa5b27c77cba196/6Y7vGO0SQfVaCj9CYXzzf.png
---

# 🎧 Haseeb's TTS (Audiobook MP3 Generator)

Generate audiobook-style narration using **Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice** with a Streamlit UI built for long chapters.

## Why `qwen-tts` instead of `transformers.pipeline()`?
The model uses the `qwen3_tts` architecture, which some Transformers builds in hosted environments may not recognize.
This Space therefore uses Qwen’s official **`qwen-tts`** package, which supports:
- `generate_custom_voice(text, language, speaker, instruct, ...)`
- `get_supported_speakers()` / `get_supported_languages()`  

(As shown in Qwen’s official Qwen3-TTS repo docs.)
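
The two discovery methods above are what make auto-populated dropdowns possible. Below is a minimal sketch of how such a lookup could fall back gracefully when a model does not expose them. The helper name `list_options` and the fallback values are illustrative assumptions, not taken from `app.py`; the sketch also assumes each method takes no arguments and returns an iterable of strings.

```python
from typing import Any, Iterable, List


def list_options(model: Any, method_name: str, fallback: Iterable[str]) -> List[str]:
    """Return sorted option names from `model.<method_name>()`, or the fallback.

    Hypothetical helper: assumes the method takes no arguments and returns
    an iterable of strings. Any failure falls back to the supplied defaults.
    """
    method = getattr(model, method_name, None)
    if callable(method):
        try:
            # De-duplicate and sort so the dropdown order is stable.
            options = sorted({str(item) for item in method()})
            if options:
                return options
        except Exception:
            pass  # the model could not report its options; use the fallback
    return list(fallback)
```

With a loaded model object, `list_options(model, "get_supported_speakers", ["default"])` would feed the speaker dropdown, and the same call with `"get_supported_languages"` would feed the language dropdown.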

## Features
- ✅ **MP3 output** (no ffmpeg needed)
- ✅ **Batch mode**: upload multiple `.txt` files → get multiple MP3s + **ZIP download**
- ✅ **Long chapters (10,000+ chars)** via chunking + stitching
- ✅ **Language Support** (dropdown; auto-populated from the model when possible)
- ✅ **Voices / Speakers** (auto-populated from the model when possible)
- ✅ **Instruction Control** (style/emotion/pacing)
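
As a rough illustration of the chunking half of the long-chapter feature, the sketch below splits text at sentence boundaries and keeps every chunk under a character limit. The function name `chunk_text`, the default of 1500 characters, and the sentence-splitting rule are assumptions made for this example; the actual logic in `app.py` may differ.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 1500) -> List[str]:
    """Split text into chunks of at most `max_chars`, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after ., !, or ? followed by whitespace; the punctuation is kept.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [part.strip() for part in parts if part.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the resulting audio joined in order.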

## How to use

### Single chapter
1. Paste text (or upload a single `.txt`)
2. Choose language, speaker, instruction
3. Click **Generate MP3**

### Batch mode
1. Switch to **Batch mode**
2. Upload multiple `.txt` files (each file = one chapter)
3. Click **Generate MP3s (Batch)**
4. Download the ZIP containing all MP3 outputs
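
The ZIP step can be sketched with the standard library alone. This is an illustrative assumption about how the bundling might work, not the code in `app.py`; the function name `zip_mp3s` and the naming rule (same stem as the uploaded `.txt`, with an `.mp3` extension) are hypothetical.

```python
import io
import zipfile
from pathlib import Path
from typing import Dict


def zip_mp3s(mp3_by_source: Dict[str, bytes]) -> bytes:
    """Bundle MP3 byte strings into one in-memory ZIP archive.

    Keys are the uploaded file names (e.g. "chapter_01.txt"); each entry is
    stored under the same stem with an .mp3 extension.
    """
    buffer = io.BytesIO()
    # MP3 data is already compressed, so entries are stored without recompression.
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
        for source_name, mp3_bytes in mp3_by_source.items():
            archive.writestr(f"{Path(source_name).stem}.mp3", mp3_bytes)
    return buffer.getvalue()
```

The returned bytes could be handed to Streamlit's `st.download_button` as the `data` argument.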

## Tips for audiobooks
- Chunk size: **1200–1800 chars** is usually stable for long narration.
- Silence between chunks: **200–350 ms** reduces audible joins.
- If memory is tight, reduce:
  - chunk size
  - `max_new_tokens`
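
To make the silence tip concrete, here is a minimal sketch of stitching chunk audio with a fixed gap between chunks. It uses plain Python lists of mono samples to stay dependency-free; a real app would more likely work on NumPy arrays. The function name `stitch_chunks` and the 250 ms default are assumptions for illustration, and the sample rate should be whatever the model actually returns.

```python
from typing import List, Sequence


def stitch_chunks(
    chunks: Sequence[Sequence[float]], sample_rate: int, gap_ms: int = 250
) -> List[float]:
    """Concatenate mono audio chunks with `gap_ms` of silence between them."""
    gap_samples = int(sample_rate * gap_ms / 1000)
    silence = [0.0] * gap_samples
    stitched: List[float] = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            stitched.extend(silence)  # silence goes between chunks, not at the ends
        stitched.extend(chunk)
    return stitched
```

At a sample rate of 24,000 Hz (an example value), a 250 ms gap is 6,000 zero samples, so ten chunks add 2.25 seconds of silence to a chapter.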

## Files
- `app.py` — Streamlit UI + batch mode + MP3 encoding + chunking/stitching
- `requirements.txt` — dependencies