AlexWortega committed on
Commit
d4e16f8
·
verified ·
1 Parent(s): 8c1339a

Upload vllm_borealis/README.md with huggingface_hub

  1. vllm_borealis/README.md +101 -0
vllm_borealis/README.md ADDED
@@ -0,0 +1,101 @@
+ # vLLM Plugin for Borealis
+
+ A vLLM plugin that enables inference with the Borealis audio-language model.
+
+ ## Installation
+
+ ```bash
+ pip install -e .
+ ```
+
+ ## Usage
+
+ After installation, the Borealis model is registered with vLLM automatically.
+
+ ```python
+ import librosa
+ from vllm import LLM, SamplingParams
+
+ # Load model
+ llm = LLM(
+     model="Vikhrmodels/Borealis-5b-it",
+     trust_remote_code=True,
+     dtype="bfloat16",
+     limit_mm_per_prompt={"audio": 1},  # at most one audio clip per prompt
+ )
+
+ # Load audio, resampled to the 16 kHz the model expects
+ audio, sr = librosa.load("audio.wav", sr=16000)
+
+ # Create prompt with audio placeholder
+ prompt = "<|AUDIO|>Transcribe this audio."
+
+ # Inference
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
+ outputs = llm.generate(
+     {
+         "prompt": prompt,
+         "multi_modal_data": {"audio": audio},
+     },
+     sampling_params=sampling_params,
+ )
+
+ # Print the text of the first completion for the single request
+ print(outputs[0].outputs[0].text)
+ ```
+
+ ### With Chat Template
+
+ ```python
+ import librosa
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(
+     model="Vikhrmodels/Borealis-5b-it",
+     trust_remote_code=True,
+     dtype="bfloat16",
+     limit_mm_per_prompt={"audio": 1},
+ )
+
+ audio, sr = librosa.load("audio.wav", sr=16000)
+
+ messages = [
+     {"role": "system", "content": "You are a helpful voice assistant."},
+     {"role": "user", "content": "<|AUDIO|>What is being said in this audio?"},
+ ]
+
+ # Apply chat template
+ prompt = llm.get_tokenizer().apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
+ outputs = llm.generate(
+     {
+         "prompt": prompt,
+         "multi_modal_data": {"audio": audio},
+     },
+     sampling_params=sampling_params,
+ )
+
+ print(outputs[0].outputs[0].text)
+ ```
+
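Both examples above load audio at 16 kHz and pass a single waveform per prompt. The following is a hedged sketch of a pre-flight check for waveforms that come from somewhere other than `librosa.load(..., sr=16000)`. The function `prepare_audio` and the 30-second window are assumptions for illustration (the window is inferred from the "375 tokens per 30s audio" figure in this README); the plugin may handle long or multi-channel audio differently.

```python
import numpy as np

TARGET_SR = 16_000   # sample rate the README says the model expects
MAX_SECONDS = 30.0   # assumed window length, not confirmed by the plugin


def prepare_audio(audio: np.ndarray, sr: int) -> list[np.ndarray]:
    """Validate a waveform and split it into mono float32 chunks of at most 30 s."""
    if sr != TARGET_SR:
        raise ValueError(f"expected {TARGET_SR} Hz audio, got {sr} Hz; resample first")
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Assumes (channels, samples), the layout librosa.load(mono=False) returns.
        audio = audio.mean(axis=0)
    if audio.ndim != 1 or audio.size == 0:
        raise ValueError("expected a non-empty 1-D or (channels, samples) array")
    window = int(TARGET_SR * MAX_SECONDS)
    # Slice into consecutive windows; the last chunk may be shorter.
    return [audio[start:start + window] for start in range(0, audio.size, window)]


if __name__ == "__main__":
    chunks = prepare_audio(np.zeros(TARGET_SR * 65), TARGET_SR)
    print([c.size / TARGET_SR for c in chunks])
```

Each chunk could then be sent as its own request, with `"multi_modal_data": {"audio": chunk}`, since the examples set `limit_mm_per_prompt={"audio": 1}`.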
+ ## Architecture
+
+ Borealis combines:
+ - **Whisper Large V3** encoder for audio processing (1280-dimensional features, 1500 frames per 30 s window)
+ - **Qwen3-4B** LLM for text generation (hidden size 2560)
+ - **Audio Adapter** that downsamples the encoder output by 4x and projects it into the LLM embedding space (375 tokens per 30 s of audio)
+
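The 375-token figure follows from the frame counts listed above. A small worked calculation, assuming Whisper's standard front end (16 kHz input, a mel hop of 160 samples, and a stride-2 convolution in the encoder stem); only the 4x adapter factor comes from this README:

```python
SAMPLE_RATE = 16_000     # Hz, as used in the usage examples
WINDOW_SECONDS = 30      # Whisper's fixed input window
MEL_HOP = 160            # samples between mel-spectrogram frames in Whisper
ENCODER_STRIDE = 2       # Whisper's convolutional stem halves the frame count
ADAPTER_DOWNSAMPLE = 4   # adapter factor stated in this README


def audio_tokens_per_window() -> int:
    """Number of audio tokens the LLM sees for one 30 s window."""
    samples = SAMPLE_RATE * WINDOW_SECONDS          # 480,000 samples
    mel_frames = samples // MEL_HOP                 # 3,000 mel frames
    encoder_frames = mel_frames // ENCODER_STRIDE   # 1,500 encoder frames
    return encoder_frames // ADAPTER_DOWNSAMPLE     # 375 audio tokens


if __name__ == "__main__":
    print(audio_tokens_per_window())  # prints 375
```

This is useful when budgeting context length: each 30 s window costs 375 prompt tokens on top of the text.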
+ ## Model
+
+ - HuggingFace: [Vikhrmodels/Borealis-5b-it](https://huggingface.co/Vikhrmodels/Borealis-5b-it)
+
+ ## Requirements
+
+ - vLLM >= 0.12.0
+ - transformers
+ - torch
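The version floor above can be checked before loading the model. This is a minimal sketch using only the standard library; the helpers `parse_release`, `meets_minimum`, and `vllm_is_recent_enough` are hypothetical names, and the parser is deliberately simple: it compares only the leading numeric components, so a pre-release such as `0.12.0rc1` is treated like `0.12.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Extract leading numeric components, e.g. '0.12.0rc1' -> (0, 12, 0)."""
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.12.0") -> bool:
    """Compare numerically, padding with zeros so '0.12' equals '0.12.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def vllm_is_recent_enough(minimum: str = "0.12.0") -> bool:
    """Return False if vLLM is missing or older than the minimum."""
    try:
        return meets_minimum(version("vllm"), minimum)
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("vLLM OK" if vllm_is_recent_enough() else "vLLM missing or older than 0.12.0")
```

A numeric comparison matters here because a plain string comparison would rank `0.9.2` above `0.12.0`. For full PEP 440 handling, `packaging.version.Version` is a more robust choice if the `packaging` library is available.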