# TransformerPrime: Text-to-Audio (TTA) Pipeline
## Project Overview
**TransformerPrime** is a high-performance, GPU-accelerated text-to-audio generation suite built on top of the Hugging Face `transformers` ecosystem. It is specifically optimized for NVIDIA RTX 40/50 series and datacenter GPUs (A100/H100/B200), targeting low-latency inference and efficient VRAM management (<10 GB for 1B-parameter models).
### Core Technologies
- **Runtime:** Python 3.10+, PyTorch 2.5.0+
- **Backbone:** HF `transformers` (v4.57+), `accelerate`
- **Optimization:** `bitsandbytes` (4-bit/8-bit quantization), FlashAttention-2
- **Interface:** Gradio (Web UI), CLI (argparse)
- **Audio:** `soundfile`, `numpy`
### Architecture
The project is structured as a modular pipeline wrapper:
- `src/text_to_audio/pipeline.py`: Core logic for model loading, inference, memory profiling, and streaming-style chunking.
- `src/text_to_audio/__init__.py`: Public API surface (`build_pipeline`, `TextToAudioPipeline`).
- `demo.py`: Unified entry point for the Gradio web interface and CLI operations.
- `tests/`: Unit tests for pipeline configuration and logic (mocking model downloads).
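
The public API surface implies a usage pattern along these lines. This is a sketch only — the real signatures in `src/text_to_audio/__init__.py` are not shown in this document, so the class body here is a hypothetical stand-in:

```python
# Hypothetical shape of the public API; the actual signatures may differ.
class TextToAudioPipeline:
    def __init__(self, model_id: str):
        self.model_id = model_id

    def generate(self, text: str) -> list[float]:
        # The real implementation runs the model; this stub returns silence.
        return [0.0] * 10

def build_pipeline(model_id: str = "csm-1b") -> TextToAudioPipeline:
    """Factory mirroring the documented `build_pipeline()` entry point."""
    return TextToAudioPipeline(model_id)

pipe = build_pipeline()
samples = pipe.generate("Hello from TransformerPrime.")
print(len(samples))  # 10
```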
---
## Building and Running
### Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Optional: install bitsandbytes for quantization support
pip install bitsandbytes
```
### Execution
- **Gradio Web UI (Default):**
```bash
python demo.py --model csm-1b --quantize
```
- **CLI Mode:**
```bash
python demo.py --cli --text "Hello from TransformerPrime." --output output.wav --quantize
```
### Testing
```bash
# Run unit tests from the root directory
PYTHONPATH=. pytest tests/
```
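The tests mock model downloads, so they run without network access or a GPU. A minimal, self-contained sketch of that pattern with `unittest.mock` — the `ModelLoader`/`build` names are illustrative, not the project's real test helpers:

```python
from unittest import mock

class ModelLoader:
    """Hypothetical stand-in for the real Hub loader."""
    def load(self, model_id: str):
        raise RuntimeError("would download weights from the Hub")

def build(loader: ModelLoader, model_id: str) -> dict:
    """Build a minimal pipeline dict from whatever the loader returns."""
    return {"model": loader.load(model_id), "id": model_id}

# Replace the loader with a Mock so no download is attempted.
fake = mock.Mock(spec=ModelLoader)
fake.load.return_value = "fake-model"
pipe = build(fake, "csm-1b")
print(pipe["model"])  # fake-model
```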
---
## Development Conventions
### TransformerPrime Persona
When extending this codebase, adhere to the **TransformerPrime** persona (defined in `.cursor/rules/TransformerPrime.mdc`):
- **Precision:** Never hallucinate config values or method signatures.
- **Modern Standards:** Favor `flash_attention_2` over eager implementations and `bfloat16` over `float16`.
- **Performance First:** Always consider VRAM footprint and Real-Time Factor (RTF). Use `generate_with_profile()` to validate changes.
### Coding Style
- **Type Safety:** Use Python type hints and `from __future__ import annotations`.
- **Configuration:** Use `dataclasses` (e.g., `PipelineConfig`) for structured parameters.
- **Device Management:** Use `accelerate` or `torch.cuda.is_available()` to handle device placement automatically (`device_map="auto"`).
- **Quantization:** Support `bitsandbytes` for 4-bit (`nf4`) and 8-bit loading to ensure compatibility with consumer GPUs.
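
A minimal sketch of how these conventions combine in a config dataclass. The field names are assumptions for illustration, not the actual `PipelineConfig` definition:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Structured pipeline parameters (field names are illustrative)."""
    model_id: str = "csm-1b"
    dtype: str = "bfloat16"                        # prefer bfloat16 over float16
    attn_implementation: str = "flash_attention_2" # prefer over eager attention
    device_map: str = "auto"                       # let accelerate place modules
    load_in_4bit: bool = False                     # bitsandbytes quantization
    quant_type: str = "nf4"

cfg = PipelineConfig(load_in_4bit=True)
print(cfg.dtype, cfg.quant_type)  # bfloat16 nf4
```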
### Key Symbols
- `build_pipeline()`: Primary factory for creating pipeline instances.
- `TextToAudioPipeline.generate_with_profile()`: Returns both audio and performance metrics (VRAM, RTF).
- `TextToAudioPipeline.stream_chunks()`: Generator for processing long audio outputs in fixed-duration slices.
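
The chunking idea behind `stream_chunks()` can be sketched as a plain generator over a waveform array. This illustrates the technique (fixed-duration slicing), not the actual implementation:

```python
import numpy as np

def stream_chunks(audio: np.ndarray, sample_rate: int, chunk_seconds: float = 1.0):
    """Yield fixed-duration slices of a 1-D waveform (illustrative sketch)."""
    step = int(sample_rate * chunk_seconds)
    for start in range(0, len(audio), step):
        yield audio[start:start + step]

# 3.5 s of silence at 16 kHz -> 4 chunks, the last one partial (0.5 s).
audio = np.zeros(int(16_000 * 3.5), dtype=np.float32)
chunks = list(stream_chunks(audio, 16_000))
print(len(chunks))  # 4
```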
---
## Future Roadmap (TODO)
- [ ] Add support for Kokoro-82M and Qwen3-TTS backends.
- [ ] Implement speculative decoding for faster inference on large TTA models.
- [ ] Add real-time streaming playback in the Gradio UI.