TransformerPrime: Text-to-Audio (TTA) Pipeline
Project Overview
TransformerPrime is a high-performance, GPU-accelerated text-to-audio generation suite built on top of the Hugging Face transformers ecosystem. It is specifically optimized for NVIDIA RTX 40/50 series and datacenter GPUs (A100/H100/B200), targeting low-latency inference and efficient VRAM management (<10 GB for 1B-parameter models).
Core Technologies
- Runtime: Python 3.10+, PyTorch 2.5.0+
- Backbone: HF `transformers` (v4.57+), `accelerate`
- Optimization: `bitsandbytes` (4-bit/8-bit quantization), FlashAttention-2
- Interface: Gradio (Web UI), CLI (argparse)
- Audio: `soundfile`, `numpy`
Architecture
The project is structured as a modular pipeline wrapper:
- `src/text_to_audio/pipeline.py`: Core logic for model loading, inference, memory profiling, and streaming-style chunking.
- `src/text_to_audio/__init__.py`: Public API surface (`build_pipeline`, `TextToAudioPipeline`).
- `demo.py`: Unified entry point for the Gradio web interface and CLI operations.
- `tests/`: Unit tests for pipeline configuration and logic (mocking model downloads).
Building and Running
Setup
```bash
# Install dependencies
pip install -r requirements.txt

# Optional: ensure bitsandbytes is installed for quantization support
pip install bitsandbytes
```
Execution
- Gradio Web UI (Default): `python demo.py --model csm-1b --quantize`
- CLI Mode: `python demo.py --cli --text "Hello from TransformerPrime." --output output.wav --quantize`
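The flags above imply a small argparse surface in `demo.py`. A minimal sketch of a parser that accepts those exact flags is shown below; the real script may define additional options and defaults, so treat the field names and defaults here as assumptions.

```python
import argparse

# Hypothetical reconstruction of demo.py's CLI flags, based only on the
# commands documented above; the actual parser may differ.
parser = argparse.ArgumentParser(prog="demo.py")
parser.add_argument("--model", default="csm-1b", help="model backend to load")
parser.add_argument("--cli", action="store_true",
                    help="run a single generation and exit instead of launching Gradio")
parser.add_argument("--text", help="prompt to synthesize (CLI mode)")
parser.add_argument("--output", default="output.wav", help="destination WAV file")
parser.add_argument("--quantize", action="store_true",
                    help="load weights with bitsandbytes quantization")

# Parse the documented CLI invocation (minus the prompt text, shortened here).
args = parser.parse_args(["--cli", "--text", "Hello", "--quantize"])
print(args.cli, args.model, args.output)
```

Because `--cli` and `--quantize` are `store_true` flags, omitting them yields the default Gradio Web UI path.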
Testing
```bash
# Run unit tests from the root directory
PYTHONPATH=. pytest tests/
```
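The tests mock model downloads so they run without fetching checkpoints. A self-contained sketch of that pattern is below; `Loader`, `load_model`, and `build_pipeline` here are illustrative stand-ins, not the project's actual symbols.

```python
import unittest
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for the expensive model loader; real tests would
# patch the project's loader so no weights are downloaded during CI.
class Loader:
    @staticmethod
    def load_model(model_id: str):
        raise RuntimeError("would download gigabytes of weights")

def build_pipeline(model_id: str) -> dict:
    model = Loader.load_model(model_id)
    return {"model": model, "model_id": model_id}

class TestBuildPipeline(unittest.TestCase):
    @patch.object(Loader, "load_model")
    def test_build_without_download(self, mock_load):
        mock_load.return_value = MagicMock(name="fake_model")
        pipe = build_pipeline("csm-1b")
        self.assertEqual(pipe["model_id"], "csm-1b")
        mock_load.assert_called_once_with("csm-1b")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBuildPipeline)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("ok" if result.wasSuccessful() else "failed")
```

Patching the loader keeps the test fast and hermetic while still exercising the pipeline-construction logic.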
Development Conventions
TransformerPrime Persona
When extending this codebase, adhere to the TransformerPrime persona (defined in .cursor/rules/TransformerPrime.mdc):
- Precision: Never hallucinate config values or method signatures.
- Modern Standards: Favor `flash_attention_2` over eager implementations and `bfloat16` over `float16`.
- Performance First: Always consider VRAM footprint and Real-Time Factor (RTF). Use `generate_with_profile()` to validate changes.
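To make the RTF metric concrete: RTF is wall-clock generation time divided by the duration of the audio produced (lower is better; below 1.0 means faster than real time). The sketch below wraps an arbitrary generate function with that measurement; the real `generate_with_profile()` signature and metric names are assumptions, and the VRAM reading is stubbed since it requires a CUDA device.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GenerationProfile:
    rtf: float           # wall-clock seconds per second of generated audio
    peak_vram_gb: float  # stubbed here; on GPU use torch.cuda.max_memory_allocated()

def generate_with_profile(
    generate_fn: Callable[[str], List[float]],
    text: str,
    sample_rate: int,
) -> Tuple[List[float], GenerationProfile]:
    """Hypothetical sketch: time a generate() call and compute its RTF."""
    start = time.perf_counter()
    samples = generate_fn(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    rtf = elapsed / audio_seconds if audio_seconds else float("inf")
    return samples, GenerationProfile(rtf=rtf, peak_vram_gb=0.0)

# Toy generator: one second of silence at 24 kHz.
samples, profile = generate_with_profile(lambda t: [0.0] * 24_000, "hi", 24_000)
print(len(samples), profile.rtf < 1.0)
```

A change that raises RTF above 1.0 on the target GPU would break real-time streaming, which is why the persona requires profiling before merging.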
Coding Style
- Type Safety: Use Python type hints and `from __future__ import annotations`.
- Configuration: Use `dataclasses` (e.g., `PipelineConfig`) for structured parameters.
- Device Management: Use `accelerate` or `torch.cuda.is_available()` to handle device placement automatically (`device_map="auto"`).
- Quantization: Support `bitsandbytes` for 4-bit (nf4) and 8-bit loading to ensure compatibility with consumer GPUs.
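These conventions can be combined in one structured config. The sketch below is illustrative only: the project's actual `PipelineConfig` fields are not documented here, so every field name is an assumption, and the heavy `torch`/`transformers` calls are represented as plain kwargs rather than imported.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Hypothetical config mirroring the conventions above (fields assumed)."""
    model_id: str = "csm-1b"
    dtype: str = "bfloat16"                        # prefer bfloat16 over float16
    attn_implementation: str = "flash_attention_2" # prefer over eager
    quantize_4bit: bool = False                    # bitsandbytes nf4 when True
    device_map: str = "auto"                       # let accelerate place weights

    def to_load_kwargs(self) -> dict:
        """Translate the config into from_pretrained-style kwargs."""
        kwargs = {
            "device_map": self.device_map,
            "attn_implementation": self.attn_implementation,
        }
        if self.quantize_4bit:
            kwargs["load_in_4bit"] = True  # bitsandbytes 4-bit path
        return kwargs

cfg = PipelineConfig(quantize_4bit=True)
print(cfg.to_load_kwargs())
```

Keeping all load-time decisions in one dataclass makes them type-checked and trivially serializable for tests.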
Key Symbols
- `build_pipeline()`: Primary factory for creating pipeline instances.
- `TextToAudioPipeline.generate_with_profile()`: Returns both audio and performance metrics (VRAM, RTF).
- `TextToAudioPipeline.stream_chunks()`: Generator for processing long audio outputs in fixed-duration slices.
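The fixed-duration slicing behind `stream_chunks()` can be sketched in a few lines. This is a standalone illustration, not the method's actual implementation: the real generator likely operates on `numpy` arrays and its exact signature is unknown.

```python
from typing import Iterator, List

def stream_chunks(
    samples: List[float], sample_rate: int, chunk_seconds: float
) -> Iterator[List[float]]:
    """Yield fixed-duration slices of an audio buffer (illustrative sketch)."""
    step = int(sample_rate * chunk_seconds)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]

audio = [0.0] * 48_000  # 2 s of silence at 24 kHz
chunks = list(stream_chunks(audio, sample_rate=24_000, chunk_seconds=0.5))
print(len(chunks), len(chunks[0]))  # 4 chunks of 12_000 samples each
```

Slicing by duration rather than sample count keeps the chunking logic independent of the model's output sample rate.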
Future Roadmap (TODO)
- Add support for Kokoro-82M and Qwen3-TTS backends.
- Implement speculative decoding for faster inference on large TTA models.
- Add real-time streaming playback in the Gradio UI.