Firworks/Step-Audio-R1-nvfp4

8-bit precision

compressed-tensors

Firworks committed on Nov 30, 2025

Commit f5a3fdd (verified) · 1 Parent(s): 130497c

Update README.md

Files changed (1)

README.md +0 -8

@@ -18,23 +18,15 @@ This is a quantized NVFP4 (W4A4) version of Step-Audio-R1, an open-weights Audio
 The original BF16 model requires ~67 GB VRAM.
 Step-Audio-R1 combines:
 A high-capacity audio encoder
 A projection layer that maps audio features into the transformer
 A language backbone for reasoning and text generation
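
The three components listed above can be sketched as a toy data flow. This is only an illustration of the general encoder, projection, backbone pattern, not Step-Audio-R1's actual code: every name and size below (`fake_audio_encoder`, `project`, `AUDIO_DIM`, `HIDDEN_DIM`, and so on) is hypothetical, the encoder is a random stand-in, and joining audio tokens to text embeddings by simple concatenation is an assumption.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only to keep the example small.
N_FRAMES, AUDIO_DIM, HIDDEN_DIM, N_TEXT = 6, 4, 8, 3


def fake_audio_encoder(n_frames):
    """Stand-in for the audio encoder: one AUDIO_DIM feature vector per frame."""
    return [[random.gauss(0.0, 1.0) for _ in range(AUDIO_DIM)] for _ in range(n_frames)]


# Projection weights mapping AUDIO_DIM features to the backbone's HIDDEN_DIM.
W = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_DIM)] for _ in range(AUDIO_DIM)]


def project(features):
    """Linear projection: each audio feature vector becomes a HIDDEN_DIM vector."""
    return [
        [sum(f[i] * W[i][j] for i in range(AUDIO_DIM)) for j in range(HIDDEN_DIM)]
        for f in features
    ]


audio_features = fake_audio_encoder(N_FRAMES)   # N_FRAMES x AUDIO_DIM
audio_tokens = project(audio_features)          # N_FRAMES x HIDDEN_DIM

# Placeholder text embeddings that already live in the backbone's hidden space.
text_embeddings = [[0.0] * HIDDEN_DIM for _ in range(N_TEXT)]

# Assumed combination step: the backbone sees audio tokens followed by text tokens.
backbone_input = audio_tokens + text_embeddings
print(len(backbone_input), len(backbone_input[0]))
```

The point of the projection step is only that the encoder's feature width and the backbone's hidden width need not match; the backbone then produces text, never audio.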
 The model is designed for:
 Speech transcription and interpretation
 Emotional / tonal analysis
 Speaker characteristics
 Music and sound-scene understanding
 High-quality step-by-step reasoning about audio inputs
 It does not generate audio; it produces text based on audio input.
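
As a rough back-of-the-envelope check on the ~67 GB BF16 figure above, the sketch below scales it to NVFP4. It rests on stated assumptions rather than measurements: the whole 67 GB is treated as weights (no activations or KV cache), every weight is assumed to be quantized (in practice some layers are often left in higher precision), and NVFP4 is counted as 4-bit values plus one 8-bit scale per block of 16 values. The helper name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(bf16_gb: float, bits_per_weight: float) -> float:
    """Scale a BF16 weight footprint (16 bits per weight) to another bit width."""
    return bf16_gb * bits_per_weight / 16.0


BF16_GB = 67.0  # figure quoted in the README

# Assumption: 4-bit values plus one 8-bit scale shared by each block of 16 values.
NVFP4_BITS_PER_WEIGHT = 4 + 8 / 16  # 4.5 effective bits

# Assumption: all 67 GB are BF16 weights at 2 bytes each.
approx_params_billions = BF16_GB / 2

nvfp4_gb = weight_footprint_gb(BF16_GB, NVFP4_BITS_PER_WEIGHT)

print(f"~{approx_params_billions:.1f}B parameters implied by the BF16 figure")
print(f"~{nvfp4_gb:.1f} GB of weights if everything were stored as NVFP4")
```

Under these assumptions the result is a lower bound on the quantized checkpoint's weight memory; the real footprint depends on which layers were actually quantized.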
