Firworks committed on
Commit 906942f · verified · 1 parent: 92d9d71

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED
@@ -14,7 +14,7 @@ base_model:
 
 # 📘 About This Model
 
-This is a quantized NVFP4 (W4A4) version of Step-Audio-R1, an open-weights Qwen2-Audio–based multimodal model for audio understanding and reasoning.
+This is a quantized NVFP4 (W4A4) version of Step-Audio-R1, an open-weights Audio–based multimodal model for audio understanding and reasoning.
 The original BF16 model requires ~67 GB VRAM.
 
 Step-Audio-R1 combines:
@@ -23,7 +23,7 @@ A high-capacity audio encoder
 
 A projection layer that maps audio features into the transformer
 
-A Qwen2 language backbone for reasoning and text generation
+A language backbone for reasoning and text generation
 
 The model is designed for:
 
@@ -62,7 +62,7 @@ The model supports:
 ✘ It does not synthesize audio
 ✘ It does not require pre-burned waveforms — any user-provided audio file works
 
-Check the original model card for information about this model for more info.
+Check the original model card for more information about this model.
 
 # Running the model with VLLM in Docker
 It requires a specific vllm container released by the model authors.
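The README's Docker note refers to a specific vLLM container released by the model authors, which is not named in this diff. As a rough sketch only, the general pattern for serving a model with vLLM inside a Docker container looks like the following; the image name, model path, and port are placeholders, not the authors' actual values:

```shell
# Hypothetical sketch -- the real container image released by the
# Step-Audio-R1 authors is not stated in this diff, so the image tag,
# model directory, and port below are placeholder assumptions.
docker run --rm --gpus all \
  -p 8000:8000 \
  -v "$HOME/models/Step-Audio-R1-NVFP4:/model" \
  vllm-custom-image:latest \
  vllm serve /model --host 0.0.0.0 --port 8000
```

Once the container is up, the model is typically reachable through vLLM's OpenAI-compatible HTTP API on the published port.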