Firworks committed f5a3fdd · verified
1 Parent(s): 130497c

Update README.md

Files changed (1)
  1. README.md +0 -8
README.md CHANGED
@@ -18,23 +18,15 @@ This is a quantized NVFP4 (W4A4) version of Step-Audio-R1, an open-weights Audio
  The original BF16 model requires ~67 GB VRAM.
 
  Step-Audio-R1 combines:
-
  A high-capacity audio encoder
-
  A projection layer that maps audio features into the transformer
-
  A language backbone for reasoning and text generation
 
  The model is designed for:
-
  Speech transcription and interpretation
-
  Emotional / tonal analysis
-
  Speaker characteristics
-
  Music and sound-scene understanding
-
  High-quality step-by-step reasoning about audio inputs
 
  It does not generate audio; it produces text based on audio input.
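
For a rough sense of what the README's "~67 GB VRAM" BF16 figure implies for an NVFP4 checkpoint, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement of this repository. It assumes the 67 GB is dominated by weights, that GB means 10^9 bytes, and that every weight is quantized to NVFP4 (4-bit values with one 8-bit scale per block of 16 elements). In practice some layers, such as the audio encoder, may be kept at higher precision, so the real footprint is likely larger. The function names are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope weight-memory estimate. All figures are approximations
# under the assumptions stated above, not measurements of the actual checkpoint.

BF16_BYTES_PER_PARAM = 2.0
NVFP4_BLOCK_SIZE = 16  # elements that share one 8-bit block scale
# 4 bits per value plus the block scale amortized over the block.
NVFP4_BITS_PER_PARAM = 4 + 8 / NVFP4_BLOCK_SIZE  # 4.5 bits


def estimate_params(bf16_gb: float) -> float:
    """Parameter count implied by a BF16 footprint, assuming weights dominate."""
    return bf16_gb * 1e9 / BF16_BYTES_PER_PARAM


def estimate_nvfp4_gb(bf16_gb: float) -> float:
    """Approximate weight footprint in GB if every parameter were stored as NVFP4."""
    params = estimate_params(bf16_gb)
    return params * NVFP4_BITS_PER_PARAM / 8 / 1e9


if __name__ == "__main__":
    bf16_gb = 67.0  # figure quoted in the README
    print(f"implied parameters: {estimate_params(bf16_gb) / 1e9:.1f} B")
    print(f"NVFP4 weight estimate: {estimate_nvfp4_gb(bf16_gb):.1f} GB")
```

The estimate covers weights only; activations, KV cache, and any unquantized modules add to it at inference time.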