---
library_name: peft
license: cc-by-nc-4.0
language:
  - en
tags:
  - peft
  - safetensors
  - lora
  - complexity-classification
  - llm-routing
  - query-difficulty
  - brick
  - text-classification
  - semantic-router
  - inference-optimization
  - cost-reduction
  - reasoning-budget
datasets:
  - regolo/brick-complexity-extractor
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
model-index:
  - name: brick-complexity-extractor
    results:
      - task:
          type: text-classification
          name: Query Complexity Classification
        dataset:
          name: brick-complexity-extractor
          type: regolo/brick-complexity-extractor
          split: test
        metrics:
          - type: accuracy
            value: 0.89
            name: Accuracy (3-class)
          - type: f1
            value: 0.87
            name: Weighted F1
---

<div align="center">

# 🧱 Brick Complexity Extractor

### A lightweight LoRA adapter for real-time query complexity classification

**[Regolo.ai](https://regolo.ai) · [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) · [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1) · [API Docs](https://docs.regolo.ai)**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Base Model](https://img.shields.io/badge/Base-Qwen3.5--0.8B-blue)](https://huggingface.co/Qwen/Qwen3.5-0.8B)
[![Dataset](https://img.shields.io/badge/Dataset-76.8k%20samples-green)](https://huggingface.co/datasets/regolo/brick-complexity-extractor)

</div>

---

## Table of Contents

- [Overview](#overview)
- [The Problem: Why LLM Routing Needs Complexity Classification](#the-problem-why-llm-routing-needs-complexity-classification)
- [Model Details](#model-details)
- [Architecture](#architecture)
- [Label Definitions](#label-definitions)
- [Performance](#performance)
- [Quick Start](#quick-start)
- [GGUF Quantized Models](#gguf-quantized-models)
- [Integration with Brick Semantic Router](#integration-with-brick-semantic-router)
- [Intended Uses](#intended-uses)
- [Limitations](#limitations)
- [Training Details](#training-details)
- [Environmental Impact](#environmental-impact)
- [Citation](#citation)
- [About Regolo.ai](#about-regoloai)

---

## Overview

**Brick Complexity Extractor** is a LoRA adapter fine-tuned on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) that classifies user queries into three complexity tiers: **easy**, **medium**, and **hard**. It is a core signal in the [Brick Semantic Router](https://github.com/regolo-ai/brick-SR1), Regolo.ai's open-source multi-model routing system.

The adapter adds only **~2M trainable parameters** on top of the 0.8B base model, making it fast enough to run as a pre-inference classification step with low latency overhead (p99 under 15 ms on an NVIDIA A100 in bf16; see [Performance](#performance)).

## The Problem: Why LLM Routing Needs Complexity Classification

Not all prompts are equal. A factual recall question ("What is the capital of France?") and a multi-step reasoning task ("Derive the optimal portfolio allocation given these constraints…") require fundamentally different compute budgets. Sending every query to a frontier reasoning model wastes resources; sending hard queries to a lightweight model degrades quality.

**Brick** solves this by routing each query to the right model tier in real time. Complexity classification is one of several routing signals (alongside keyword matching, domain detection, and reasoning-depth estimation) that Brick uses to make sub-50ms routing decisions.


<img src="https://cdn-uploads.huggingface.co/production/uploads/66e9a629df006ca4588b82bd/ZoRBcn8rD8sTEHdkiczOO.png" alt="brick_router" width="800">


## Model Details

| Property | Value |
|---|---|
| **Model type** | LoRA adapter (PEFT) |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Trainable parameters** | ~2M (LoRA rank 16, alpha 32) |
| **Total parameters** | ~875M (base + adapter) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **Language** | English |
| **License** | CC BY-NC 4.0 |
| **Developed by** | [Regolo.ai](https://regolo.ai) (Seeweb S.r.l.) |
| **Release date** | April 2026 |

## Architecture

The adapter applies LoRA to the query and value projection matrices (`q_proj`, `v_proj`) across all attention layers of Qwen3.5-0.8B, with a classification head on top of the last hidden state.

```
Qwen3.5-0.8B (frozen)
    └── Attention Layers × 24
         ├── q_proj ← LoRA(r=16, α=32)
         └── v_proj ← LoRA(r=16, α=32)
    └── Last Hidden State
         └── Classification Head (3 classes)
```
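
As a rough check on the ~2M trainable-parameter figure, the sketch below counts the LoRA weights for rank-16 adapters on `q_proj` and `v_proj` across 24 layers. The projection shapes (hidden size 1024, `q_proj` output 2048, `v_proj` output 1024) are illustrative assumptions, not values read from the Qwen3.5-0.8B config, and the classification head is not counted.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) next to a frozen weight."""
    return rank * (d_in + d_out)


# Assumed, illustrative shapes; read the real ones from the base model config.
HIDDEN = 1024
Q_OUT = 2048
V_OUT = 1024

LAYERS = 24  # from the architecture diagram above
RANK = 16
ALPHA = 32

per_layer = lora_params(HIDDEN, Q_OUT, RANK) + lora_params(HIDDEN, V_OUT, RANK)
total = per_layer * LAYERS
scaling = ALPHA / RANK  # LoRA scales the low-rank update B @ A by alpha / r

print(f"LoRA parameters: {total:,}")
print(f"Update scaling: {scaling}")
```

Under these assumed shapes the count lands close to 2M, and the alpha/rank ratio gives an update scaling of 2.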

## Label Definitions

| Label | Reasoning Steps | Description | Example |
|---|---|---|---|
| **easy** | 1–2 | Surface knowledge, factual recall, simple lookups | "What is the capital of Italy?" |
| **medium** | 3–5 | Domain familiarity, multi-step reasoning, comparison | "Compare REST and GraphQL for a mobile app backend" |
| **hard** | 6+ | Deep expertise, multi-constraint optimization, creative synthesis | "Design a distributed cache eviction policy that minimizes tail latency under bursty traffic" |

Labels were generated by **Qwen3.5-122B** acting as an LLM judge on 76,831 diverse user prompts. See the [dataset card](https://huggingface.co/datasets/regolo/brick-complexity-extractor) for full labeling methodology.
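
The step thresholds in the table can be read as a simple mapping from an estimated number of reasoning steps to a tier. The helper below is a hypothetical illustration of those boundaries only; the judge model assigns labels directly and does not necessarily count steps this way.

```python
def tier_for_steps(reasoning_steps: int) -> str:
    """Map an estimated reasoning-step count to a complexity tier."""
    if reasoning_steps < 1:
        raise ValueError("reasoning_steps must be at least 1")
    if reasoning_steps <= 2:
        return "easy"    # 1-2 steps
    if reasoning_steps <= 5:
        return "medium"  # 3-5 steps
    return "hard"        # 6 or more steps


for steps in (1, 4, 8):
    print(steps, tier_for_steps(steps))
```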

## Performance

### Classification Metrics (Test Set — 3,841 samples)

| Metric | Value |
|---|---|
| **Accuracy** | 89.2% |
| **Weighted F1** | 87.4% |
| **Macro F1** | 85.1% |

### Per-Class Performance

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| easy | 0.92 | 0.94 | 0.93 | 1,057 |
| medium | 0.88 | 0.90 | 0.89 | 1,660 |
| hard | 0.84 | 0.79 | 0.81 | 519 |
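
The F1 column is the harmonic mean of precision and recall. A minimal sketch that recomputes it from the per-class values in the table:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-class table
per_class = {
    "easy": (0.92, 0.94),
    "medium": (0.88, 0.90),
    "hard": (0.84, 0.79),
}

for label, (precision, recall) in per_class.items():
    print(f"{label}: F1 = {f1_score(precision, recall):.2f}")
```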

### Latency

| Setup | Inference Time (p50) | Inference Time (p99) |
|---|---|---|
| NVIDIA A100 (bf16) | 8ms | 14ms |
| NVIDIA L4 (fp16) | 12ms | 22ms |
| CPU (Intel Xeon, fp32) | 45ms | 78ms |
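
This card does not describe the measurement procedure, so the following is only a sketch of how p50 and p99 figures like these could be collected: time repeated calls after a warmup and take nearest-rank percentiles. `fake_classify` is a hypothetical stand-in for a real model call.

```python
import math
import time


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(fn, runs=200, warmup=20):
    """Call fn repeatedly and return per-call wall-clock times in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def fake_classify():
    # Hypothetical placeholder workload; replace with the real inference call.
    sum(range(1000))


timings = measure_latency_ms(fake_classify)
print(f"p50={percentile(timings, 50):.3f} ms  p99={percentile(timings, 99):.3f} ms")
```

When timing on a GPU, synchronize the device before reading the clock, since kernel launches are asynchronous.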

## Quick Start

### Installation

```bash
pip install peft transformers torch
```

### Inference

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load base model + adapter
base_model_id = "Qwen/Qwen3.5-0.8B"
adapter_id = "regolo/brick-complexity-extractor"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id, num_labels=3
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Classify a query
query = "Explain the difference between TCP and UDP"
inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)

labels = ["easy", "medium", "hard"]
predicted = labels[outputs.logits.argmax(dim=-1).item()]
print(f"Complexity: {predicted}")
# Output: Complexity: medium
```
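
A router may also want a confidence value alongside the predicted label. The sketch below applies a softmax to the three logits using only the standard library; the example logits are made up, and the label order is the one used in the inference snippet above.

```python
import math

LABELS = ["easy", "medium", "hard"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify_with_confidence(logits):
    """Return the top label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for illustration only
label, confidence = classify_with_confidence([0.3, 2.1, 0.4])
print(f"{label} ({confidence:.2f})")
```

With the real model, the input would be something like `outputs.logits[0].tolist()`.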

### Using with vLLM (recommended for production)

```python
# The adapter can be loaded as a LoRA module in vLLM
# See Brick SR1 documentation for full integration guide
# https://github.com/regolo-ai/brick-SR1
```

## GGUF Quantized Models

Pre-built GGUF files are available for inference with llama.cpp, Ollama, LM Studio, vLLM, and other GGUF-compatible runtimes. Each quantization is published as a separate model:

| Model | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [brick-complexity-extractor-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 | Full precision |
| [brick-complexity-extractor-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 | Recommended |
| [brick-complexity-extractor-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 | Best size/quality ratio |

See the [brick-complexity-extractor collection](https://huggingface.co/collections/regolo/brick-complexity-extractor-69dcc2dec2fe3b54a70b3415) for all available formats.

## Integration with Brick Semantic Router

Brick Complexity Extractor is designed to work as a signal within the **Brick Semantic Router** pipeline. In a typical deployment:

1. **Query arrives** at the Brick router endpoint
2. **Parallel signal extraction** runs complexity classification alongside keyword matching, domain detection, and reasoning estimation
3. **Routing decision** combines all signals to select the optimal model from the pool
4. **Query forwarded** to the chosen model (e.g., Qwen 7B for easy, Llama 70B for medium, Claude for hard)

```yaml
# Brick router configuration example (brick-config.yaml)
signals:
  complexity:
    model: regolo/brick-complexity-extractor
    weight: 0.35
  domain:
    model: regolo/brick-domain-classifier  # coming soon
    weight: 0.25
  keyword:
    type: rule-based
    weight: 0.20
  reasoning:
    type: heuristic
    weight: 0.20

model_pools:
  easy:
    - qwen3.5-7b
    - llama-3.3-8b
  medium:
    - qwen3.5-32b
    - llama-3.3-70b
  hard:
    - claude-sonnet-4-20250514
    - deepseek-r1
```
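
How Brick combines the signals internally is not specified here. As one plausible reading of the weights above, the sketch below assumes each signal emits a score per tier in [0, 1], takes the weighted sum, and picks the first model in the winning tier's pool. The `route` function and the example scores are hypothetical.

```python
WEIGHTS = {"complexity": 0.35, "domain": 0.25, "keyword": 0.20, "reasoning": 0.20}

MODEL_POOLS = {
    "easy": ["qwen3.5-7b", "llama-3.3-8b"],
    "medium": ["qwen3.5-32b", "llama-3.3-70b"],
    "hard": ["claude-sonnet-4-20250514", "deepseek-r1"],
}

TIERS = ("easy", "medium", "hard")


def route(signal_scores):
    """Combine per-signal tier scores with a weighted sum and choose a model."""
    combined = {tier: 0.0 for tier in TIERS}
    for name, scores in signal_scores.items():
        for tier in TIERS:
            combined[tier] += WEIGHTS[name] * scores.get(tier, 0.0)
    tier = max(TIERS, key=lambda t: combined[t])
    return tier, MODEL_POOLS[tier][0]  # assumed policy: first model in the pool


# Hypothetical signal outputs for a single query
example = {
    "complexity": {"easy": 0.1, "medium": 0.2, "hard": 0.7},
    "domain": {"easy": 0.2, "medium": 0.5, "hard": 0.3},
    "keyword": {"easy": 0.3, "medium": 0.4, "hard": 0.3},
    "reasoning": {"easy": 0.1, "medium": 0.3, "hard": 0.6},
}

tier, model = route(example)
print(tier, model)
```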

## Intended Uses

### ✅ Primary Use Cases
- **LLM routing**: Classify query complexity to route to the optimal model tier, reducing inference cost by 30–60% compared to always-frontier routing
- **Reasoning budget allocation**: Decide how many reasoning tokens to allocate before inference begins
- **Traffic shaping**: Balance GPU load across model pools based on real-time complexity distribution
- **Cost monitoring**: Track complexity distribution over time to optimize fleet sizing
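
To estimate what routing could save for a given workload, compare the blended cost per query against sending everything to the frontier tier. The relative costs and traffic mix below are placeholders, not measurements; substitute your own pricing and observed complexity distribution.

```python
# Hypothetical relative cost per query, with the frontier ("hard") tier at 1.0
RELATIVE_COST = {"easy": 0.2, "medium": 0.6, "hard": 1.0}

# Hypothetical share of traffic classified into each tier (must sum to 1)
TRAFFIC_MIX = {"easy": 0.3, "medium": 0.5, "hard": 0.2}


def routing_savings(cost, mix, frontier_tier="hard"):
    """Fractional saving of tiered routing versus always using the frontier tier."""
    blended = sum(mix[tier] * cost[tier] for tier in mix)
    return 1.0 - blended / cost[frontier_tier]


savings = routing_savings(RELATIVE_COST, TRAFFIC_MIX)
print(f"Estimated saving: {savings:.0%}")
```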

### ⚠️ Out-of-Scope Uses
- **Content moderation or safety filtering**: this model classifies cognitive difficulty, not content safety
- **Non-English queries**: the model was trained on English data only, and accuracy degrades significantly on other languages
- **Direct use as a chatbot or generative model**: this is a classification adapter, not a generative model

## Limitations

- **Label noise**: The training labels were generated by Qwen3.5-122B, not human annotators. While LLM-as-judge achieves high inter-annotator agreement on complexity, systematic biases may exist (e.g., overweighting mathematical content as "hard")
- **Class imbalance**: The "hard" class represents only 13.5% of training data, which may lead to lower recall on genuinely hard queries
- **Domain coverage**: The training set covers general-purpose user prompts. Specialized domains (medical, legal, financial) may exhibit different complexity distributions
- **English only**: No multilingual support in this version
- **Adversarial robustness**: The model has not been tested against adversarial prompt manipulation designed to fool the complexity classifier

## Training Details

| Hyperparameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3.5-0.8B |
| **LoRA rank (r)** | 16 |
| **LoRA alpha (α)** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | q_proj, v_proj |
| **Learning rate** | 2e-4 |
| **Batch size** | 32 |
| **Epochs** | 3 |
| **Optimizer** | AdamW |
| **Scheduler** | Cosine with warmup (5% steps) |
| **Max sequence length** | 512 tokens |
| **Training samples** | 65,307 |
| **Validation samples** | 7,683 |
| **Test samples** | 3,841 |
| **Training hardware** | 1× NVIDIA A100 80GB |
| **Training time** | ~2 hours |
| **Framework** | PyTorch + HuggingFace PEFT |

## Environmental Impact

Regolo.ai is committed to sustainable AI. This model was trained on GPU infrastructure powered by [Seeweb](https://www.seeweb.it/)'s data centers in Italy, which run on certified renewable energy.

| Metric | Value |
|---|---|
| **Hardware** | 1× NVIDIA A100 80GB |
| **Training duration** | ~2 hours |
| **Estimated CO₂** | < 0.5 kg CO₂eq |
| **Energy source** | Renewable (certified) |
| **Location** | Italy (EU) |

## Citation

```bibtex
@misc{regolo2026brick-complexity,
  title  = {Brick Complexity Extractor: A LoRA Adapter for Query Complexity Classification in LLM Routing},
  author = {Regolo.ai Team},
  year   = {2026},
  url    = {https://huggingface.co/regolo/brick-complexity-extractor}
}
```

## About Regolo.ai

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. We provide zero-data-retention, GDPR-native AI inference for enterprises that need privacy, compliance, and performance, all from European data centers powered by renewable energy.

**Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

<div align="center">

**[Website](https://regolo.ai) · [Docs](https://docs.regolo.ai) · [Discord](https://discord.gg/myuuVFcfJw) · [GitHub](https://github.com/regolo-ai) · [LinkedIn](https://www.linkedin.com/company/regolo-ai/)**

</div>