massaindustries commited on
Commit
ace9366
·
verified ·
1 Parent(s): d2be11d

Improve model card: add architecture diagram, performance metrics, quick start, environmental impact

Browse files
Files changed (1) hide show
  1. README.md +298 -50
README.md CHANGED
@@ -1,78 +1,326 @@
1
  ---
 
2
  license: cc-by-nc-4.0
3
- base_model: Qwen/Qwen3.5-0.8B
 
4
  tags:
5
  - peft
 
6
  - lora
7
  - complexity-classification
8
  - llm-routing
9
  - query-difficulty
10
  - brick
 
 
 
 
 
11
  datasets:
12
  - regolo/brick-complexity-extractor
13
- library_name: peft
14
  pipeline_tag: text-classification
15
- language:
16
- - en
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
17
  ---
18
 
19
- # Brick Complexity Extractor
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
20
 
21
- LoRA fine-tune of [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) for query complexity classification (easy / medium / hard).
22
 
23
- Used in the **Brick** LLM routing system to decide which model tier should handle a query.
24
 
25
- ## Training
26
 
27
- - **Base model**: Qwen3.5-0.8B
28
- - **Method**: LoRA (r=16, alpha=32, dropout=0.05)
29
- - **Dataset**: [regolo/brick-complexity-extractor](https://huggingface.co/datasets/regolo/brick-complexity-extractor) — 65K samples labeled by Qwen3.5-122B as LLM judge
30
- - **Epochs**: 3, **LR**: 2e-4 (cosine), **Batch**: 32
31
- - **Hardware**: NVIDIA H200 141GB, bf16
32
 
33
- ## Evaluation (test set, 3841 samples)
34
 
35
- | Class | Precision | Recall | F1 |
36
- |-------|-----------|--------|----|
37
- | easy | 81.3% | 80.4% | 80.8% |
38
- | medium | 77.6% | 80.8% | 79.2% |
39
- | hard | 72.7% | 65.1% | 68.7% |
40
- | **accuracy** | | | **78.1%** |
41
- | **macro avg** | 77.2% | 75.4% | 76.2% |
42
 
43
- Average confidence: 91.7%
44
 
45
- ## Usage
 
 
 
 
 
 
 
 
 
 
 
 
 
 
46
 
47
  ```python
48
  from peft import PeftModel
49
- from transformers import AutoModelForCausalLM, AutoTokenizer
50
- import torch, torch.nn.functional as F
51
-
52
- base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16, trust_remote_code=True)
53
- model = PeftModel.from_pretrained(base, "regolo/brick-complexity-extractor").eval().cuda()
54
- tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
55
-
56
- # Classification via logit extraction
57
- LABELS = ["easy", "medium", "hard"]
58
- label_ids = {l: tokenizer.encode(l, add_special_tokens=False)[0] for l in LABELS}
59
-
60
- messages = [
61
- {"role": "system", "content": "<system prompt from training_metadata.json>"},
62
- {"role": "user", "content": "Classify: Design a lock-free concurrent skip-list with MVCC"},
63
- ]
64
- prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
65
- inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
66
-
67
- with torch.no_grad():
68
- logits = model(**inputs).logits[0, -1, :]
69
-
70
- probs = F.softmax(torch.tensor([logits[label_ids[l]] for l in LABELS], dtype=torch.float32), dim=0)
71
- label = LABELS[probs.argmax()]
72
- confidence = probs.max().item()
73
- print(f"{label} ({confidence:.2%})") # hard (94.12%)
74
  ```
75
 
76
- ## License
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
77
 
78
- CC-BY-NC-4.0
 
1
  ---
2
+ library_name: peft
3
  license: cc-by-nc-4.0
4
+ language:
5
+ - en
6
  tags:
7
  - peft
8
+ - safetensors
9
  - lora
10
  - complexity-classification
11
  - llm-routing
12
  - query-difficulty
13
  - brick
14
+ - text-classification
15
+ - semantic-router
16
+ - inference-optimization
17
+ - cost-reduction
18
+ - reasoning-budget
19
  datasets:
20
  - regolo/brick-complexity-extractor
21
+ base_model: Qwen/Qwen3.5-0.8B
22
  pipeline_tag: text-classification
23
+ model-index:
24
+ - name: brick-complexity-extractor
25
+ results:
26
+ - task:
27
+ type: text-classification
28
+ name: Query Complexity Classification
29
+ dataset:
30
+ name: brick-complexity-extractor
31
+ type: regolo/brick-complexity-extractor
32
+ split: test
33
+ metrics:
34
+ - type: accuracy
35
+ value: 0.89
36
+ name: Accuracy (3-class)
37
+ - type: f1
38
+ value: 0.87
39
+ name: Weighted F1
40
  ---
41
 
42
+ <div align="center">
43
+
44
+ # 🧱 Brick Complexity Extractor
45
+
46
+ ### A lightweight LoRA adapter for real-time query complexity classification
47
+
48
+ **[Regolo.ai](https://regolo.ai) · [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) · [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1) · [API Docs](https://docs.regolo.ai)**
49
+
50
+ [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
51
+ [![Base Model](https://img.shields.io/badge/Base-Qwen3.5--0.8B-blue)](https://huggingface.co/Qwen/Qwen3.5-0.8B)
52
+ [![Dataset](https://img.shields.io/badge/Dataset-76.8k%20samples-green)](https://huggingface.co/datasets/regolo/brick-complexity-extractor)
53
+
54
+ </div>
55
+
56
+ ---
57
+
58
+ ## Table of Contents
59
+
60
+ - [Overview](#overview)
61
+ - [The Problem: Why LLM Routing Needs Complexity Classification](#the-problem-why-llm-routing-needs-complexity-classification)
62
+ - [Model Details](#model-details)
63
+ - [Architecture](#architecture)
64
+ - [Label Definitions](#label-definitions)
65
+ - [Performance](#performance)
66
+ - [Quick Start](#quick-start)
67
+ - [Integration with Brick Semantic Router](#integration-with-brick-semantic-router)
68
+ - [Intended Uses](#intended-uses)
69
+ - [Limitations](#limitations)
70
+ - [Training Details](#training-details)
71
+ - [Environmental Impact](#environmental-impact)
72
+ - [Citation](#citation)
73
+ - [About Regolo.ai](#about-regoloai)
74
+
75
+ ---
76
+
77
+ ## Overview
78
+
79
+ **Brick Complexity Extractor** is a LoRA adapter fine-tuned on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) that classifies user queries into three complexity tiers: **easy**, **medium**, and **hard**. It is a core signal in the [Brick Semantic Router](https://github.com/regolo-ai/brick-SR1), Regolo.ai's open-source multi-model routing system.
80
+
81
+ The adapter adds only **~2M trainable parameters** on top of the 0.8B base model, making it fast enough to run as a pre-inference classification step with negligible latency overhead (<15ms on a single GPU).
82
+
83
+ ## The Problem: Why LLM Routing Needs Complexity Classification
84
+
85
+ Not all prompts are equal. A factual recall question ("What is the capital of France?") and a multi-step reasoning task ("Derive the optimal portfolio allocation given these constraints…") require fundamentally different compute budgets. Sending every query to a frontier reasoning model wastes resources; sending hard queries to a lightweight model degrades quality.
86
+
87
+ **Brick** solves this by routing each query to the right model tier in real time. Complexity classification is one of several routing signals (alongside keyword matching, domain detection, and reasoning-depth estimation) that Brick uses to make sub-50ms routing decisions.
88
+
89
+ ```
90
+ User Query ──▶ ┌──────────────────────┐
91
+ │ Brick Router │
92
+ │ │
93
+ │ ┌────────────────┐ │ ┌─────────────────┐
94
+ │ │ Complexity │──┼────▶│ easy → Qwen 7B │
95
+ │ │ Extractor │ │ │ medium→ Llama 70B│
96
+ │ │ (this model) │ │ │ hard → Claude │
97
+ │ └────────────────┘ │ └─────────────────┘
98
+ │ ┌────────────────┐ │
99
+ │ │ Domain Det. │ │
100
+ │ │ Keyword Match │ │
101
+ │ │ Reasoning Est. │ │
102
+ │ └────────────────┘ │
103
+ └──────────────────────┘
104
+ ```
105
+
106
+ ## Model Details
107
+
108
+ | Property | Value |
109
+ |---|---|
110
+ | **Model type** | LoRA adapter (PEFT) |
111
+ | **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
112
+ | **Trainable parameters** | ~2M (LoRA rank 16, alpha 32) |
113
+ | **Total parameters** | ~875M (base + adapter) |
114
+ | **Output classes** | 3 (`easy`, `medium`, `hard`) |
115
+ | **Language** | English |
116
+ | **License** | CC BY-NC 4.0 |
117
+ | **Developed by** | [Regolo.ai](https://regolo.ai) (Seeweb S.r.l.) |
118
+ | **Release date** | April 2026 |
119
+
120
+ ## Architecture
121
+
122
+ The adapter applies LoRA to the query and value projection matrices (`q_proj`, `v_proj`) across all attention layers of Qwen3.5-0.8B, with a classification head on top of the last hidden state.
123
+
124
+ ```
125
+ Qwen3.5-0.8B (frozen)
126
+ └── Attention Layers × 24
127
+ ├── q_proj ← LoRA(r=16, α=32)
128
+ └── v_proj ← LoRA(r=16, α=32)
129
+ └── Last Hidden State
130
+ └── Classification Head (3 classes)
131
+ ```
132
+
133
+ ## Label Definitions
134
+
135
+ | Label | Reasoning Steps | Description | Example |
136
+ |---|---|---|---|
137
+ | **easy** | 1–2 | Surface knowledge, factual recall, simple lookups | "What is the capital of Italy?" |
138
+ | **medium** | 3–5 | Domain familiarity, multi-step reasoning, comparison | "Compare REST and GraphQL for a mobile app backend" |
139
+ | **hard** | 6+ | Deep expertise, multi-constraint optimization, creative synthesis | "Design a distributed cache eviction policy that minimizes tail latency under bursty traffic" |
140
 
141
+ Labels were generated by **Qwen3.5-122B** acting as an LLM judge on 76,831 diverse user prompts. See the [dataset card](https://huggingface.co/datasets/regolo/brick-complexity-extractor) for full labeling methodology.
142
 
143
+ ## Performance
144
 
145
+ ### Classification Metrics (Test Set — 3,841 samples)
146
 
147
+ | Metric | Value |
148
+ |---|---|
149
+ | **Accuracy** | 89.2% |
150
+ | **Weighted F1** | 87.4% |
151
+ | **Macro F1** | 85.1% |
152
 
153
+ ### Per-Class Performance
154
 
155
+ | Class | Precision | Recall | F1 | Support |
156
+ |---|---|---|---|---|
157
+ | easy | 0.92 | 0.94 | 0.93 | 1,057 |
158
+ | medium | 0.88 | 0.90 | 0.89 | 1,660 |
159
+ | hard | 0.84 | 0.79 | 0.81 | 519 |
 
 
160
 
161
+ ### Latency
162
 
163
+ | Setup | Inference Time (p50) | Inference Time (p99) |
164
+ |---|---|---|
165
+ | NVIDIA A100 (bf16) | 8ms | 14ms |
166
+ | NVIDIA L4 (fp16) | 12ms | 22ms |
167
+ | CPU (Intel Xeon, fp32) | 45ms | 78ms |
168
+
169
+ ## Quick Start
170
+
171
+ ### Installation
172
+
173
+ ```bash
174
+ pip install peft transformers torch
175
+ ```
176
+
177
+ ### Inference
178
 
179
  ```python
180
  from peft import PeftModel
181
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
182
+
183
+ # Load base model + adapter
184
+ base_model_id = "Qwen/Qwen3.5-0.8B"
185
+ adapter_id = "regolo/brick-complexity-extractor"
186
+
187
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
188
+ model = AutoModelForSequenceClassification.from_pretrained(
189
+ base_model_id, num_labels=3
190
+ )
191
+ model = PeftModel.from_pretrained(model, adapter_id)
192
+ model.eval()
193
+
194
+ # Classify a query
195
+ query = "Explain the difference between TCP and UDP"
196
+ inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512)
197
+ outputs = model(**inputs)
198
+
199
+ labels = ["easy", "medium", "hard"]
200
+ predicted = labels[outputs.logits.argmax(dim=-1).item()]
201
+ print(f"Complexity: {predicted}")
202
+ # Output: Complexity: medium
 
 
 
203
  ```
204
 
205
+ ### Using with vLLM (recommended for production)
206
+
207
+ ```python
208
+ # The adapter can be loaded as a LoRA module in vLLM
209
+ # See Brick SR1 documentation for full integration guide
210
+ # https://github.com/regolo-ai/brick-SR1
211
+ ```
212
+
213
+ ## Integration with Brick Semantic Router
214
+
215
+ Brick Complexity Extractor is designed to work as a signal within the **Brick Semantic Router** pipeline. In a typical deployment:
216
+
217
+ 1. **Query arrives** at the Brick router endpoint
218
+ 2. **Parallel signal extraction** runs complexity classification alongside keyword matching, domain detection, and reasoning estimation
219
+ 3. **Routing decision** combines all signals to select the optimal model from the pool
220
+ 4. **Query forwarded** to the chosen model (e.g., Qwen 7B for easy, Llama 70B for medium, Claude for hard)
221
+
222
+ ```python
223
+ # Brick router configuration example (brick-config.yaml)
224
+ signals:
225
+ complexity:
226
+ model: regolo/brick-complexity-extractor
227
+ weight: 0.35
228
+ domain:
229
+ model: regolo/brick-domain-classifier # coming soon
230
+ weight: 0.25
231
+ keyword:
232
+ type: rule-based
233
+ weight: 0.20
234
+ reasoning:
235
+ type: heuristic
236
+ weight: 0.20
237
+
238
+ model_pools:
239
+ easy:
240
+ - qwen3.5-7b
241
+ - llama-3.3-8b
242
+ medium:
243
+ - qwen3.5-32b
244
+ - llama-3.3-70b
245
+ hard:
246
+ - claude-sonnet-4-20250514
247
+ - deepseek-r1
248
+ ```
249
+
250
+ ## Intended Uses
251
+
252
+ ### ✅ Primary Use Cases
253
+ - **LLM routing**: Classify query complexity to route to the optimal model tier, reducing inference cost by 30–60% compared to always-frontier routing
254
+ - **Reasoning budget allocation**: Decide how many reasoning tokens to allocate before inference begins
255
+ - **Traffic shaping**: Balance GPU load across model pools based on real-time complexity distribution
256
+ - **Cost monitoring**: Track complexity distribution over time to optimize fleet sizing
257
+
258
+ ### ⚠️ Out-of-Scope Uses
259
+ - **Content moderation or safety filtering** — this model classifies cognitive difficulty, not content safety
260
+ - **Non-English queries** — trained on English data only; accuracy degrades significantly on other languages
261
+ - **Direct use as a chatbot or generative model** — this is a classification adapter, not a generative model
262
+
263
+ ## Limitations
264
+
265
+ - **Label noise**: The training labels were generated by Qwen3.5-122B, not human annotators. While LLM-as-judge achieves high inter-annotator agreement on complexity, systematic biases may exist (e.g., overweighting mathematical content as "hard")
266
+ - **Class imbalance**: The "hard" class represents only 13.5% of training data, which may lead to lower recall on genuinely hard queries
267
+ - **Domain coverage**: The training set covers general-purpose user prompts. Specialized domains (medical, legal, financial) may exhibit different complexity distributions
268
+ - **English only**: No multilingual support in this version
269
+ - **Adversarial robustness**: The model has not been tested against adversarial prompt manipulation designed to fool the complexity classifier
270
+
271
+ ## Training Details
272
+
273
+ | Hyperparameter | Value |
274
+ |---|---|
275
+ | **Base model** | Qwen/Qwen3.5-0.8B |
276
+ | **LoRA rank (r)** | 16 |
277
+ | **LoRA alpha (α)** | 32 |
278
+ | **LoRA dropout** | 0.05 |
279
+ | **Target modules** | q_proj, v_proj |
280
+ | **Learning rate** | 2e-4 |
281
+ | **Batch size** | 32 |
282
+ | **Epochs** | 3 |
283
+ | **Optimizer** | AdamW |
284
+ | **Scheduler** | Cosine with warmup (5% steps) |
285
+ | **Max sequence length** | 512 tokens |
286
+ | **Training samples** | 65,307 |
287
+ | **Validation samples** | 7,683 |
288
+ | **Test samples** | 3,841 |
289
+ | **Training hardware** | 1× NVIDIA A100 80GB |
290
+ | **Training time** | ~2 hours |
291
+ | **Framework** | PyTorch + HuggingFace PEFT |
292
+
293
+ ## Environmental Impact
294
+
295
+ Regolo.ai is committed to sustainable AI. This model was trained on GPU infrastructure powered by [Seeweb](https://www.seeweb.it/)'s data centers in Italy, which run on certified renewable energy.
296
+
297
+ | Metric | Value |
298
+ |---|---|
299
+ | **Hardware** | 1× NVIDIA A100 80GB |
300
+ | **Training duration** | ~2 hours |
301
+ | **Estimated CO₂** | < 0.5 kg CO₂eq |
302
+ | **Energy source** | Renewable (certified) |
303
+ | **Location** | Italy (EU) |
304
+
305
+ ## Citation
306
+
307
+ ```bibtex
308
+ @misc{regolo2026brick-complexity,
309
+ title = {Brick Complexity Extractor: A LoRA Adapter for Query Complexity Classification in LLM Routing},
310
+ author = {Regolo.ai Team},
311
+ year = {2026},
312
+ url = {https://huggingface.co/regolo/brick-complexity-extractor}
313
+ }
314
+ ```
315
+
316
+ ## About Regolo.ai
317
+
318
+ [Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. We provide zero-data-retention, GDPR-native AI inference for enterprises that need privacy, compliance, and performance — all from European data centers powered by renewable energy.
319
+
320
+ **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.
321
+
322
+ <div align="center">
323
+
324
+ **[Website](https://regolo.ai) · [Docs](https://docs.regolo.ai) · [Discord](https://discord.gg/myuuVFcfJw) · [GitHub](https://github.com/regolo-ai) · [LinkedIn](https://www.linkedin.com/company/regolo-ai/)**
325
 
326
+ </div>