Update README.md

# 📘 About This Model

This is a quantized NVFP4 (W4A4) version of Step-Audio-R1, an open-weights audio-based multimodal model for audio understanding and reasoning.
The original BF16 model requires ~67 GB of VRAM.

Step-Audio-R1 combines:

A high-capacity audio encoder

A projection layer that maps audio features into the transformer

A language backbone for reasoning and text generation

The model is designed for:

The model supports:

✘ It does not synthesize audio
✘ It does not require pre-burned waveforms — any user-provided audio file works

Check the original model card for more information about this model.

# Running the model with vLLM in Docker

It requires a specific vLLM container released by the model authors.
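Once that container is up, vLLM exposes an OpenAI-compatible HTTP API, so the model can be queried like any chat endpoint. The sketch below builds a chat-completions payload carrying an audio clip as a base64 data URL under the `audio_url` content type, which is how vLLM's OpenAI-compatible server accepts audio for audio-capable models. The model id here is a placeholder assumption — use the id the running server actually reports (e.g. via its `/v1/models` route).

```python
import base64
from pathlib import Path

def build_audio_chat_request(audio_path: str, question: str, model: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload with inline audio.

    The audio file is embedded as a base64 data URL; vLLM's OpenAI-compatible
    server accepts this under the "audio_url" content type for audio models.
    """
    audio_b64 = base64.b64encode(Path(audio_path).read_bytes()).decode("ascii")
    return {
        "model": model,  # placeholder: use the model id the server reports
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "audio_url",
                        "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 512,
    }
```

The resulting dict can be POSTed to `http://localhost:8000/v1/chat/completions` with any HTTP client (or passed through the official `openai` client pointed at that base URL); the port depends on how the container was started.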