massaindustries committed
Commit 59b2353 · verified · 1 Parent(s): 229e943

Update model card: remove specific-LLM references, clarify variant purpose

Files changed (1)
README.md +28 -0
README.md CHANGED
@@ -87,6 +87,34 @@ print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).strip())
 # Output: hard
 ```
 
+## Usage (vLLM)
+
+```python
+from vllm import LLM, SamplingParams
+from vllm.lora.request import LoRARequest
+
+llm = LLM(
+    model="Qwen/Qwen3.5-0.8B",
+    enable_lora=True,
+    max_lora_rank=32,
+    dtype="bfloat16",
+)
+sp = SamplingParams(temperature=0, max_tokens=3)
+
+system = """You are a query difficulty classifier for an LLM routing system.
+Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
+Respond with ONLY one word: easy, medium, or hard."""
+prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Explain the rendering equation from radiometric first principles<|im_end|>\n<|im_start|>assistant\n"
+
+out = llm.generate(
+    [prompt],
+    sp,
+    lora_request=LoRARequest("brick-complexity-2-max", 1, "regolo/brick-complexity-2-max"),
+)
+print(out[0].outputs[0].text.strip())
+# Output: hard
+```
+
 ## About Brick
 
 [Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.
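
The added `## Usage (vLLM)` section classifies one prompt at a time. Because `llm.generate` accepts a list of prompts, the same setup extends naturally to batches; a minimal sketch, reusing `llm`, `sp`, and `system` from the snippet above (the example queries here are illustrative, not from the model card):

```python
queries = [
    "What is the capital of France?",
    "Explain the rendering equation from radiometric first principles",
]
prompts = [
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\nClassify: {q}<|im_end|>\n"
    f"<|im_start|>assistant\n"
    for q in queries
]

# One call batches all prompts; vLLM returns outputs in input order.
outs = llm.generate(
    prompts,
    sp,
    lora_request=LoRARequest("brick-complexity-2-max", 1, "regolo/brick-complexity-2-max"),
)
for q, o in zip(queries, outs):
    print(f"{q!r} -> {o.outputs[0].text.strip()}")
```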
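
The About Brick paragraph describes routing queries across model pools by difficulty. A minimal sketch of that idea built on this classifier's output; the pool names and the `route` helper are hypothetical illustrations, not part of the Brick codebase:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Hypothetical label -> pool mapping; real Brick pools will differ.
POOLS = {"easy": "pool-small", "medium": "pool-medium", "hard": "pool-large"}

SYSTEM = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""

SP = SamplingParams(temperature=0, max_tokens=3)
ADAPTER = LoRARequest("brick-complexity-2-max", 1, "regolo/brick-complexity-2-max")


def route(llm: LLM, query: str) -> str:
    """Classify `query` with the Brick adapter and return the target pool name."""
    prompt = (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\nClassify: {query}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    out = llm.generate([prompt], SP, lora_request=ADAPTER)
    label = out[0].outputs[0].text.strip().lower()
    # An unexpected label falls back to the strongest pool instead of failing.
    return POOLS.get(label, POOLS["hard"])


llm = LLM(model="Qwen/Qwen3.5-0.8B", enable_lora=True, max_lora_rank=32, dtype="bfloat16")
print(route(llm, "What is 2 + 2?"))  # expected: pool-small
```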