AlexWortega committed on
Commit ea02e16 · verified · 1 Parent(s): 86a7cc2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +99 -11
README.md CHANGED
@@ -15,17 +15,7 @@ library_name: transformers
15
 
16
  # Borealis-5B-IT
17
 
18
- ## Benchmarks
19
-
20
-
21
- | Split | WER | CER | Samples |
22
- |--------------------------|--------|--------|---------|
23
- | Russian_LibriSpeech | 6.63% | 3.49% | 1000 |
24
- | Common_Voice_Corpus_22.0 | 8.88% | 5.04% | 1000 |
25
- | Tone_Webinars | 56.87% | 52.47% | 1000 |
26
- | Tone_Books | 6.03% | 3.75% | 1000 |
27
- | Tone_Speak | 4.63% | 3.38% | 700 |
28
- | Sova_RuDevices | 17.28% | 8.03% | 1000 |
29
 
30
  ## Model Description
31
 
@@ -161,6 +151,104 @@ Audio Input (16kHz)
161
  Text Output
162
  ```
163
 
164
  ## Limitations
165
 
166
  - Optimized for audio up to 30 seconds
 
15
 
16
  # Borealis-5B-IT
17
 
18
+ Borealis is an audio-language model that combines a Whisper encoder with the Qwen3-4B LLM for speech understanding and instruction-following tasks.
19
 
20
  ## Model Description
21
 
 
151
  Text Output
152
  ```
153
 
154
+ ## vLLM Support
155
+
156
+ vLLM can speed up generation with Borealis's text backbone (Qwen3-4B). Because Borealis uses custom audio processing (Whisper encoder + adapter), audio is still handled by the HuggingFace model; the two options below cover text-only and hybrid use.
157
+
158
+ ### Install vLLM
159
+
160
+ ```bash
161
+ pip install "vllm>=0.6.0"
162
+ ```
163
+
164
+ ### Option 1: Text-only with vLLM (Qwen3-4B backbone)
165
+
166
+ If you've already processed audio to text (e.g., via ASR), you can use vLLM directly with the Qwen3 backbone:
167
+
168
+ ```python
169
+ from vllm import LLM, SamplingParams
170
+
171
+ llm = LLM(
172
+     model="Qwen/Qwen3-4B",
173
+     dtype="bfloat16",
174
+     gpu_memory_utilization=0.8,
175
+ )
176
+
177
+ prompt = """<|im_start|>system
178
+ You are a helpful voice assistant.<|im_end|>
179
+ <|im_start|>user
180
+ [Transcribed text from audio goes here]<|im_end|>
181
+ <|im_start|>assistant
182
+ """
183
+
184
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
185
+ outputs = llm.generate([prompt], sampling_params)
186
+ print(outputs[0].outputs[0].text)
187
+ ```
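
The hand-written prompt in Option 1 follows the ChatML layout (`<|im_start|>role ... <|im_end|>`). As a minimal sketch, a small helper can assemble the same string from a list of messages. `build_chatml_prompt` is a hypothetical name introduced here for illustration; it is not part of vLLM or Borealis. In practice the tokenizer's `apply_chat_template` method in `transformers` is the usual way to produce this string, and it may add model-specific details that this sketch omits.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Assemble a ChatML-style prompt string (hypothetical helper, illustration only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "[Transcribed text from audio goes here]"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

Building the prompt from a message list keeps the special tokens in one place, which makes it harder to drop an `<|im_end|>` when the conversation grows.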
188
+
189
+ ### Option 2: Hybrid Inference (HF Audio + vLLM Text)
190
+
191
+ For faster follow-up generation, use the HuggingFace Borealis model to transcribe the audio and vLLM to generate text from the transcription:
192
+
193
+ ```python
194
+ import torch
195
+ import torchaudio
196
+ from transformers import AutoModel
197
+ from vllm import LLM, SamplingParams
198
+
199
+ # Step 1: Load Borealis for audio encoding
200
+ borealis = AutoModel.from_pretrained(
201
+     "Vikhrmodels/Borealis-5b-it",
202
+     trust_remote_code=True,
203
+     device="cuda"
204
+ )
205
+ borealis.eval()
206
+
207
+ # Step 2: Load vLLM for text generation
208
+ vllm_model = LLM(
209
+     model="Qwen/Qwen3-4B",
209
+     dtype="bfloat16",
210
+     gpu_memory_utilization=0.5,
212
+ )
213
+
214
+ # Step 3: Encode audio with Borealis
215
+ audio, sr = torchaudio.load("audio.wav")
216
+ if sr != 16000:
217
+     audio = torchaudio.functional.resample(audio, sr, 16000)
218
+ audio = audio.squeeze()
219
+
220
+ with torch.inference_mode():
221
+     # Get audio transcription/understanding from Borealis
222
+     output_ids = borealis.generate(
223
+         audio=audio,
224
+         user_prompt="Transcribe: <|start_of_audio|><|end_of_audio|>",
225
+         system_prompt="You are a speech recognition assistant.",
226
+         max_new_tokens=128,
227
+     )
228
+     transcription = borealis.decode(output_ids[0])
229
+
230
+ # Step 4: Use vLLM for fast follow-up generation
231
+ prompt = f"""<|im_start|>system
232
+ You are a helpful assistant.<|im_end|>
233
+ <|im_start|>user
234
+ Based on this audio transcription: "{transcription}"
235
+ Please provide a detailed summary.<|im_end|>
236
+ <|im_start|>assistant
237
+ """
238
+
239
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
240
+ outputs = vllm_model.generate([prompt], sampling_params)
241
+ print(outputs[0].outputs[0].text)
242
+ ```
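
In Option 2 both models share one GPU, which may be why `gpu_memory_utilization` drops from 0.8 to 0.5 there: vLLM treats that value as the fraction of GPU memory it is allowed to use. The sketch below only does the arithmetic. The 24 GB card is an assumed example size, not a stated requirement, and `vllm_budget_gb` is a hypothetical helper.

```python
def vllm_budget_gb(total_gpu_gb, gpu_memory_utilization):
    """Return (GB vLLM may claim, GB left for everything else). Illustration only."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    reserved = total_gpu_gb * gpu_memory_utilization
    return reserved, total_gpu_gb - reserved


TOTAL_GB = 24.0  # assumed card size, for illustration only
for fraction in (0.8, 0.5):
    reserved, remaining = vllm_budget_gb(TOTAL_GB, fraction)
    print(f"utilization={fraction:.1f}: vLLM up to {reserved:.1f} GB, {remaining:.1f} GB left")
```

Whatever is left after vLLM's share has to hold the Borealis weights and activations, so the right fraction depends on the actual card.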
243
+
244
+ ### Benchmark Results
245
+
246
+ | Method | Throughput | Notes |
247
+ |--------|------------|-------|
248
+ | Native HF (Borealis) | 32.6 tok/s | Full audio-to-text pipeline |
249
+ | vLLM (Qwen3-4B) | 201.4 tok/s | Text-only, 6.18x faster than native HF |
250
+ | Hybrid | ~150 tok/s | Audio encoding + vLLM text gen |
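
The speedup figure in the table can be reproduced by dividing each throughput by the native HF baseline. This is a small arithmetic check using only the numbers reported above. The hybrid value is listed as "~150", so its ratio is approximate.

```python
# Throughput figures copied from the table above (tokens per second).
throughput = {
    "Native HF (Borealis)": 32.6,
    "vLLM (Qwen3-4B)": 201.4,
    "Hybrid": 150.0,  # reported as "~150", so treat the ratio as approximate
}

baseline = throughput["Native HF (Borealis)"]
speedup = {name: tok_s / baseline for name, tok_s in throughput.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x relative to native HF")
```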
251
+
252
  ## Limitations
253
 
254
  - Optimized for audio up to 30 seconds