Update README.md

# 📘 About This Model

This is a quantized NVFP4 (W4A4) version of Step-Audio-R1, an open-weights audio-based multimodal model for audio understanding and reasoning.
The original BF16 model requires ~67 GB of VRAM.

Step-Audio-R1 combines:

A high-capacity audio encoder

A projection layer that maps audio features into the transformer

A language backbone for reasoning and text generation

The model is designed for:

The model supports:

✘ It does not synthesize audio
✘ It does not require pre-burned waveforms — any user-provided audio file works

Check the original model card for more information about this model.

# Running the model with vLLM in Docker

It requires a specific vLLM container released by the model authors.
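Once that container is up, vLLM exposes an OpenAI-compatible HTTP API, so the model can be queried like any chat endpoint. The sketch below builds a chat-completions payload carrying an audio clip as a base64 data URL under the `audio_url` content type, which is how vLLM's OpenAI-compatible server accepts audio for audio-capable models. The model id here is a placeholder assumption — use the id the running server actually reports (e.g. via its `/v1/models` route).

```python
import base64
from pathlib import Path

def build_audio_chat_request(audio_path: str, question: str, model: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload with inline audio.

    The audio file is embedded as a base64 data URL; vLLM's OpenAI-compatible
    server accepts this under the "audio_url" content type for audio models.
    """
    audio_b64 = base64.b64encode(Path(audio_path).read_bytes()).decode("ascii")
    return {
        "model": model,  # placeholder: use the model id the server reports
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "audio_url",
                        "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 512,
    }
```

The resulting dict can be POSTed to `http://localhost:8000/v1/chat/completions` with any HTTP client (or passed through the official `openai` client pointed at that base URL); the port depends on how the container was started.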