---
library_name: peft
license: cc-by-nc-4.0
language:
  - en
tags:
  - peft
  - safetensors
  - lora
  - complexity-classification
  - llm-routing
  - query-difficulty
  - brick
  - text-classification
  - semantic-router
  - inference-optimization
  - cost-reduction
  - reasoning-budget
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
---

<div align="center">

# Brick Complexity Classifier v2: `max`

</div>

## What is this?

Brick Complexity Classifier v2 is a family of small LoRA adapters that label each incoming prompt as **`easy`**, **`medium`**, or **`hard`**, so a router can send it to the right tier of a model pool. The two variants optimize for different goals:

- **`eco`**: optimized for **cost**. Biases predictions toward `easy` so most traffic stays on the cheap tier. Use when cost per query matters more than squeezing out the last accuracy point.
- **`max`**: optimized for **routing accuracy**. Gives the sharpest easy/medium/hard split, so hard queries reliably reach the strongest tier and easy ones stay cheap. Use when answer quality is paramount.

<div align="center">

Maximum-accuracy variant tuned to classify query complexity as precisely as possible. Prioritizes routing quality over cost.

**[Regolo.ai](https://regolo.ai) | [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1)**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Base Model](https://img.shields.io/badge/Base-Qwen3.5--0.8B-blue)](https://huggingface.co/Qwen/Qwen3.5-0.8B)

</div>
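
As a rough illustration of how a router might consume these labels, the sketch below maps each label to a pool tier. The tier names are hypothetical placeholders, and falling back to the strongest tier for unexpected labels is an assumption that favors answer quality over cost; this is not the routing logic Brick itself uses.

```python
# Hypothetical tier names; replace them with the models in your own pool.
TIER_BY_LABEL = {
    "easy": "small-model",
    "medium": "mid-model",
    "hard": "large-model",
}


def pick_tier(label: str) -> str:
    """Return the pool tier for a classifier label.

    Unknown labels fall back to the strongest tier (an assumption that
    favors answer quality over cost).
    """
    return TIER_BY_LABEL.get(label.strip().lower(), TIER_BY_LABEL["hard"])


if __name__ == "__main__":
    for label in ("easy", "medium", "hard"):
        print(label, "->", pick_tier(label))
```

A cost-sensitive deployment could instead fall back to the `medium` tier; the choice depends on how expensive a misrouted hard query is.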

---

## Model Details

| Property | Value |
|---|---|
| **Variant** | `max` |
| **Target** | Best classification accuracy, maximum routing quality |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Adapter type** | LoRA (r=32, α=32, dropout=0.1) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **License** | CC BY-NC 4.0 |

## Available Formats

| Format | Link |
|---|---|
| LoRA adapter | [regolo/brick-complexity-2-max](https://huggingface.co/regolo/brick-complexity-2-max) |
| GGUF BF16 | [regolo/brick-complexity-2-max-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-2-max-BF16-GGUF) |
| GGUF Q8_0 | [regolo/brick-complexity-2-max-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-2-max-Q8_0-GGUF) |
| GGUF Q4_K_M | [regolo/brick-complexity-2-max-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-2-max-Q4_K_M-GGUF) |

## Usage (PEFT)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B")
model = PeftModel.from_pretrained(base, "regolo/brick-complexity-2-max").eval()

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Design a distributed consensus algorithm<|im_end|>\n<|im_start|>assistant\n"
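inputs = tok(prompt, return_tensors="pt")  # input_ids plus attention_mask
with torch.no_grad():
    # Greedy decoding; the label is a single word, so a few new tokens suffice
    out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip())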
# Output: hard
```
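
The prompt in the example above is assembled by hand for a single query. A minimal sketch of a reusable builder is shown here; `SYSTEM_PROMPT` and `build_prompt` are hypothetical names, while the system text, the `Classify:` prefix, and the `<|im_start|>` / `<|im_end|>` markers are copied from the example.

```python
# System prompt text copied from the usage example.
SYSTEM_PROMPT = (
    "You are a query difficulty classifier for an LLM routing system.\n"
    "Classify each query as easy, medium, or hard based on the cognitive depth "
    "and domain expertise required to answer correctly.\n"
    "Respond with ONLY one word: easy, medium, or hard."
)


def build_prompt(query: str) -> str:
    """Wrap a user query in the chat format used by the usage example."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\nClassify: {query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_prompt("Design a distributed consensus algorithm"))
```

Keeping the template in one function helps ensure that every query is formatted the same way, whichever backend runs the adapter.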

## Usage (vLLM)

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3.5-0.8B",
    enable_lora=True,
    max_lora_rank=32,
    dtype="bfloat16",
)
sp = SamplingParams(temperature=0, max_tokens=3)

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Explain the rendering equation from radiometric first principles<|im_end|>\n<|im_start|>assistant\n"

out = llm.generate(
    [prompt],
    sp,
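    # Adapter name, unique positive integer ID, and adapter location
    lora_request=LoRARequest(
        lora_name="brick-complexity-2-max",
        lora_int_id=1,
        lora_path="regolo/brick-complexity-2-max",
    ),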
)
print(out[0].outputs[0].text.strip())
# Output: hard
```
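
Both usage examples print the raw generated text. Since up to three new tokens are generated, the output may in principle carry extra characters or an unexpected word, so a defensive parser can be useful before routing. The sketch below is one possible approach; `parse_label` is a hypothetical helper, and defaulting to `hard` when no label is found is an assumption, not documented behavior of the adapter.

```python
import re

LABELS = ("easy", "medium", "hard")
# Match the first standalone label word in the generated text.
_LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b")


def parse_label(text: str, default: str = "hard") -> str:
    """Extract the first easy/medium/hard label from generated text.

    Returns `default` when no label is present (an assumption: routing
    unknown cases to the strongest tier is the quality-preserving choice).
    """
    match = _LABEL_RE.search(text.lower())
    return match.group(1) if match else default


if __name__ == "__main__":
    for raw in ("hard", " Easy\n", "medium.", ""):
        print(repr(raw), "->", parse_label(raw))
```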

## About Brick

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

**[Website](https://regolo.ai) | [Docs](https://docs.regolo.ai) | [GitHub](https://github.com/regolo-ai) | [Discord](https://discord.gg/myuuVFcfJw)**