TransformerPrime: Text-to-Audio (TTA) Pipeline

Project Overview

TransformerPrime is a high-performance, GPU-accelerated text-to-audio generation suite built on top of the Hugging Face transformers ecosystem. It is specifically optimized for NVIDIA RTX 40/50 series and datacenter GPUs (A100/H100/B200), targeting low-latency inference and efficient VRAM management (<10 GB for 1B-parameter models).

Core Technologies

  • Runtime: Python 3.10+, PyTorch 2.5.0+
  • Backbone: HF transformers (v4.57+), accelerate
  • Optimization: bitsandbytes (4-bit/8-bit quantization), FlashAttention-2
  • Interface: Gradio (Web UI), CLI (argparse)
  • Audio: soundfile, numpy

Architecture

The project is structured as a modular pipeline wrapper:

  • src/text_to_audio/pipeline.py: Core logic for model loading, inference, memory profiling, and streaming-style chunking.
  • src/text_to_audio/__init__.py: Public API surface (build_pipeline, TextToAudioPipeline).
  • demo.py: Unified entry point for the Gradio web interface and CLI operations.
  • tests/: Unit tests for pipeline configuration and logic (mocking model downloads).

Building and Running

Setup

# Install dependencies
pip install -r requirements.txt

# Optional: Ensure bitsandbytes is installed for quantization support
pip install bitsandbytes
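
Because quantization is optional, code paths that depend on it should degrade gracefully when bitsandbytes is absent. A minimal availability check (the `HAS_BNB` flag name is illustrative, not part of the codebase):

```python
import importlib.util

# Detect optional bitsandbytes support without importing it eagerly.
HAS_BNB = importlib.util.find_spec("bitsandbytes") is not None

if not HAS_BNB:
    print("bitsandbytes not found; quantized loading will be disabled.")
```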

Execution

  • Gradio Web UI (Default):
    python demo.py --model csm-1b --quantize
    
  • CLI Mode:
    python demo.py --cli --text "Hello from TransformerPrime." --output output.wav --quantize
    

Testing

# Run unit tests from the root directory
PYTHONPATH=. pytest tests/

Development Conventions

TransformerPrime Persona

When extending this codebase, adhere to the TransformerPrime persona (defined in .cursor/rules/TransformerPrime.mdc):

  • Precision: Never hallucinate config values or method signatures.
  • Modern Standards: Favor flash_attention_2 over eager implementations and bfloat16 over float16.
  • Performance First: Always consider VRAM footprint and Real-Time Factor (RTF). Use generate_with_profile() to validate changes.
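
Real-Time Factor here means generation wall-clock time divided by the duration of the audio produced (RTF < 1.0 is faster than real time). A hedged sketch of how such a metric could be computed; the helper name and return shape are assumptions, not the actual `generate_with_profile()` implementation:

```python
import time
import numpy as np

def profile_generation(generate_fn, sample_rate: int) -> tuple[np.ndarray, float]:
    """Run `generate_fn`, returning the audio and its Real-Time Factor."""
    start = time.perf_counter()
    audio = generate_fn()                      # 1-D float waveform
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / sample_rate   # duration of the generated audio
    rtf = elapsed / audio_seconds              # < 1.0 means faster than real time
    return audio, rtf

# Illustrative stand-in for a model call: one second of silence at 24 kHz.
audio, rtf = profile_generation(lambda: np.zeros(24_000, dtype=np.float32), 24_000)
```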

Coding Style

  • Type Safety: Use Python type hints and from __future__ import annotations.
  • Configuration: Use dataclasses (e.g., PipelineConfig) for structured parameters.
  • Device Management: Prefer accelerate with device_map="auto" for automatic placement; fall back to explicit torch.cuda.is_available() checks when managing devices manually.
  • Quantization: Support bitsandbytes for 4-bit (nf4) and 8-bit loading to ensure compatibility with consumer GPUs.
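
Taken together, these conventions might look like the following configuration sketch. The field names beyond those mentioned above are illustrative assumptions, not the actual PipelineConfig definition:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Structured pipeline parameters (illustrative sketch, not the real class)."""
    model_id: str = "csm-1b"
    dtype: str = "bfloat16"                        # favored over float16
    attn_implementation: str = "flash_attention_2" # favored over eager
    device_map: str = "auto"                       # let accelerate place weights
    load_in_4bit: bool = False                     # bitsandbytes nf4 quantization
    load_in_8bit: bool = False

    def __post_init__(self) -> None:
        # 4-bit and 8-bit loading are mutually exclusive.
        if self.load_in_4bit and self.load_in_8bit:
            raise ValueError("Choose either 4-bit or 8-bit quantization, not both.")

cfg = PipelineConfig(load_in_4bit=True)
```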

Key Symbols

  • build_pipeline(): Primary factory for creating pipeline instances.
  • TextToAudioPipeline.generate_with_profile(): Returns both audio and performance metrics (VRAM, RTF).
  • TextToAudioPipeline.stream_chunks(): Generator for processing long audio outputs in fixed-duration slices.
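
The chunking behind stream_chunks() amounts to slicing the waveform into fixed-duration windows. A minimal, self-contained sketch of that idea (the function below is an illustration, not the actual implementation):

```python
from typing import Iterator
import numpy as np

def iter_audio_chunks(audio: np.ndarray, sample_rate: int,
                      chunk_seconds: float = 1.0) -> Iterator[np.ndarray]:
    """Yield fixed-duration slices of a 1-D waveform; the last may be shorter."""
    step = int(sample_rate * chunk_seconds)
    for start in range(0, len(audio), step):
        yield audio[start:start + step]

# 2.5 s of audio at 16 kHz -> three chunks: 1 s, 1 s, 0.5 s.
chunks = list(iter_audio_chunks(np.zeros(40_000, dtype=np.float32), 16_000))
```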

Future Roadmap (TODO)

  • Add support for Kokoro-82M and Qwen3-TTS backends.
  • Implement speculative decoding for faster inference on large TTA models.
  • Add real-time streaming playback in the Gradio UI.