---
library_name: peft
license: cc-by-nc-4.0
language:
- en
tags:
- peft
- safetensors
- lora
- complexity-classification
- llm-routing
- query-difficulty
- brick
- text-classification
- semantic-router
- inference-optimization
- cost-reduction
- reasoning-budget
datasets:
- regolo/brick-complexity-extractor
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
model-index:
- name: brick-complexity-extractor
  results:
  - task:
      type: text-classification
      name: Query Complexity Classification
    dataset:
      name: brick-complexity-extractor
      type: regolo/brick-complexity-extractor
      split: test
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy (3-class)
    - type: f1
      value: 0.87
      name: Weighted F1
---

<div align="center">

# 🧱 Brick Complexity Extractor

### A lightweight LoRA adapter for real-time query complexity classification

**[Regolo.ai](https://regolo.ai) · [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) · [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1) · [API Docs](https://docs.regolo.ai)**

[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
[Base model: Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B)
[Dataset: regolo/brick-complexity-extractor](https://huggingface.co/datasets/regolo/brick-complexity-extractor)

</div>

---

## Table of Contents

- [Overview](#overview)
- [The Problem: Why LLM Routing Needs Complexity Classification](#the-problem-why-llm-routing-needs-complexity-classification)
- [Model Details](#model-details)
- [Architecture](#architecture)
- [Label Definitions](#label-definitions)
- [Performance](#performance)
- [Quick Start](#quick-start)
- [GGUF Quantized Models](#gguf-quantized-models)
- [Integration with Brick Semantic Router](#integration-with-brick-semantic-router)
- [Intended Uses](#intended-uses)
- [Limitations](#limitations)
- [Training Details](#training-details)
- [Environmental Impact](#environmental-impact)
- [Citation](#citation)
- [About Regolo.ai](#about-regoloai)

---

## Overview

**Brick Complexity Extractor** is a LoRA adapter fine-tuned on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) that classifies user queries into three complexity tiers: **easy**, **medium**, and **hard**. It is a core signal in the [Brick Semantic Router](https://github.com/regolo-ai/brick-SR1), Regolo.ai's open-source multi-model routing system.

The adapter adds only **~2M trainable parameters** on top of the 0.8B base model, making it fast enough to run as a pre-inference classification step with little added latency (under 15 ms at p99 on a single NVIDIA A100; see [Latency](#latency) for other hardware).

## The Problem: Why LLM Routing Needs Complexity Classification

Not all prompts are equal. A factual recall question ("What is the capital of France?") and a multi-step reasoning task ("Derive the optimal portfolio allocation given these constraints…") require fundamentally different compute budgets. Sending every query to a frontier reasoning model wastes resources; sending hard queries to a lightweight model degrades quality.

**Brick** solves this by routing each query to the right model tier in real time. Complexity classification is one of several routing signals (alongside keyword matching, domain detection, and reasoning-depth estimation) that Brick uses to make sub-50ms routing decisions.

<img src="https://cdn-uploads.huggingface.co/production/uploads/66e9a629df006ca4588b82bd/ZoRBcn8rD8sTEHdkiczOO.png" alt="brick_router" width="800">

## Model Details

| Property | Value |
|---|---|
| **Model type** | LoRA adapter (PEFT) |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Trainable parameters** | ~2M (LoRA rank 16, alpha 32) |
| **Total parameters** | ~875M (base + adapter) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **Language** | English |
| **License** | CC BY-NC 4.0 |
| **Developed by** | [Regolo.ai](https://regolo.ai) (Seeweb S.r.l.) |
| **Release date** | April 2026 |

## Architecture

The adapter applies LoRA to the query and value projection matrices (`q_proj`, `v_proj`) across all attention layers of Qwen3.5-0.8B, with a classification head on top of the last hidden state.

```
Qwen3.5-0.8B (frozen)
  └── Attention Layers × 24
        ├── q_proj  ← LoRA(r=16, α=32)
        └── v_proj  ← LoRA(r=16, α=32)
  └── Last Hidden State
        └── Classification Head (3 classes)
```
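
As a rough sanity check on the "~2M trainable parameters" figure, a LoRA pair on a linear layer of shape `d_in → d_out` adds `r · (d_in + d_out)` parameters. The sketch below applies that formula to `q_proj` and `v_proj` over 24 layers. The projection shapes are illustrative placeholders, not values read from the Qwen3.5-0.8B configuration, and the classification head is not counted.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_param_count(layers: int, modules: dict, rank: int) -> int:
    """Total LoRA parameters when the same modules are adapted in every layer."""
    per_layer = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in modules.values())
    return layers * per_layer


# ASSUMPTION: placeholder (d_in, d_out) shapes, not taken from the real model config.
example_modules = {"q_proj": (1024, 2048), "v_proj": (1024, 1024)}

total = adapter_param_count(layers=24, modules=example_modules, rank=16)
print(f"{total:,} LoRA parameters")  # prints: 1,966,080 LoRA parameters
```

To check the real adapter, replace the placeholder shapes with the dimensions from the base model's `config.json`.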

## Label Definitions

| Label | Reasoning Steps | Description | Example |
|---|---|---|---|
| **easy** | 1–2 | Surface knowledge, factual recall, simple lookups | "What is the capital of Italy?" |
| **medium** | 3–5 | Domain familiarity, multi-step reasoning, comparison | "Compare REST and GraphQL for a mobile app backend" |
| **hard** | 6+ | Deep expertise, multi-constraint optimization, creative synthesis | "Design a distributed cache eviction policy that minimizes tail latency under bursty traffic" |
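
The step ranges in the table can be written as a small lookup, which may help when hand-labeling extra examples against the same rubric. `label_from_steps` is a hypothetical helper for illustration only; the classifier predicts the label directly from the query text and does not count steps.

```python
def label_from_steps(reasoning_steps: int) -> str:
    """Map an estimated number of reasoning steps to a complexity label."""
    if reasoning_steps < 1:
        raise ValueError("reasoning_steps must be at least 1")
    if reasoning_steps <= 2:   # 1-2 steps
        return "easy"
    if reasoning_steps <= 5:   # 3-5 steps
        return "medium"
    return "hard"              # 6 or more steps


for steps in (1, 4, 8):
    print(steps, label_from_steps(steps))
```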

Labels were generated by **Qwen3.5-122B** acting as an LLM judge on 76,831 diverse user prompts. See the [dataset card](https://huggingface.co/datasets/regolo/brick-complexity-extractor) for full labeling methodology.

## Performance

### Classification Metrics (Test Set, 3,841 samples)

| Metric | Value |
|---|---|
| **Accuracy** | 89.2% |
| **Weighted F1** | 87.4% |
| **Macro F1** | 85.1% |

### Per-Class Performance

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| easy | 0.92 | 0.94 | 0.93 | 1,057 |
| medium | 0.88 | 0.90 | 0.89 | 1,660 |
| hard | 0.84 | 0.79 | 0.81 | 519 |
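
The F1 column follows from the precision and recall columns as their harmonic mean, `F1 = 2PR / (P + R)`. A minimal check using only the values in the table above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-class table.
per_class = {"easy": (0.92, 0.94), "medium": (0.88, 0.90), "hard": (0.84, 0.79)}

f1_by_class = {label: round(f1_score(p, r), 2) for label, (p, r) in per_class.items()}
print(f1_by_class)  # {'easy': 0.93, 'medium': 0.89, 'hard': 0.81}
```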

### Latency

| Setup | Inference Time (p50) | Inference Time (p99) |
|---|---|---|
| NVIDIA A100 (bf16) | 8 ms | 14 ms |
| NVIDIA L4 (fp16) | 12 ms | 22 ms |
| CPU (Intel Xeon, fp32) | 45 ms | 78 ms |
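
The card does not state how these percentiles were measured. One way to collect comparable p50/p99 numbers on your own hardware is sketched below; `measure_latency_ms` is a hypothetical helper, and the run and warm-up counts are arbitrary choices.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], runs: int = 200, warmup: int = 20) -> dict:
    """Time repeated calls to fn and return p50/p99 wall-clock latency in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": statistics.median(samples), "p99": cuts[98]}


# Stand-in workload; replace the lambda with a call that runs the classifier.
stats = measure_latency_ms(lambda: sum(range(10_000)))
print(f"p50={stats['p50']:.3f} ms, p99={stats['p99']:.3f} ms")
```

When timing on a GPU, the callable should wait for the device to finish (for example with `torch.cuda.synchronize()`) before returning, otherwise only the kernel launch is measured.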

## Quick Start

### Installation

```bash
pip install peft transformers torch
```

### Inference

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load base model + adapter
base_model_id = "Qwen/Qwen3.5-0.8B"
adapter_id = "regolo/brick-complexity-extractor"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id, num_labels=3
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Classify a query
query = "Explain the difference between TCP and UDP"
inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)

# Class index -> label name
labels = ["easy", "medium", "hard"]
predicted = labels[outputs.logits.argmax(dim=-1).item()]
print(f"Complexity: {predicted}")
# Output: Complexity: medium
```
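
The `argmax` in the snippet above discards how confident the classifier was. A router may prefer to escalate borderline queries by one tier instead of risking an under-powered model. The sketch below shows one such policy in plain Python; the `0.6` threshold and the escalate-by-one rule are assumptions for illustration and are not part of Brick.

```python
import math

LABELS = ["easy", "medium", "hard"]  # same index order as the inference snippet


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_fallback(logits, min_confidence=0.6):
    """Return (label, confidence); bump low-confidence predictions up one tier."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    # ASSUMPTION: when unsure, prefer the more capable tier.
    if confidence < min_confidence and best < len(LABELS) - 1:
        best += 1
    return LABELS[best], confidence


print(classify_with_fallback([5.0, 0.0, 0.0]))
print(classify_with_fallback([0.1, 0.0, 0.0]))
```

With the PyTorch model above, the input would be `outputs.logits[0].tolist()`.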

### Using with vLLM (recommended for production)

```python
# The adapter can be loaded as a LoRA module in vLLM
# See Brick SR1 documentation for full integration guide
# https://github.com/regolo-ai/brick-SR1
```

## GGUF Quantized Models

Pre-built GGUF files are available for inference with llama.cpp, Ollama, LM Studio, vLLM, and other GGUF-compatible runtimes. Each quantization is published as a separate model:

| Model | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [brick-complexity-extractor-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 | Unquantized (16-bit) |
| [brick-complexity-extractor-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 | Recommended |
| [brick-complexity-extractor-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 | Best size/quality ratio |

See the [brick-complexity-extractor collection](https://huggingface.co/collections/regolo/brick-complexity-extractor-69dcc2dec2fe3b54a70b3415) for all available formats.

## Integration with Brick Semantic Router

Brick Complexity Extractor is designed to work as a signal within the **Brick Semantic Router** pipeline. In a typical deployment:

1. **Query arrives** at the Brick router endpoint
2. **Parallel signal extraction** runs complexity classification alongside keyword matching, domain detection, and reasoning estimation
3. **Routing decision** combines all signals to select the optimal model from the pool
4. **Query forwarded** to the chosen model (e.g., Qwen 7B for easy, Llama 70B for medium, Claude for hard)

```yaml
# Brick router configuration example (brick-config.yaml)
signals:
  complexity:
    model: regolo/brick-complexity-extractor
    weight: 0.35
  domain:
    model: regolo/brick-domain-classifier  # coming soon
    weight: 0.25
  keyword:
    type: rule-based
    weight: 0.20
  reasoning:
    type: heuristic
    weight: 0.20

model_pools:
  easy:
    - qwen3.5-7b
    - llama-3.3-8b
  medium:
    - qwen3.5-32b
    - llama-3.3-70b
  hard:
    - claude-sonnet-4-20250514
    - deepseek-r1
```
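
The configuration above lists per-signal weights but not how the router combines them. One plausible reading is a weighted vote: each signal emits a score per tier, the scores are summed using the configured weights, and the highest-scoring tier selects the pool. The sketch below implements that reading. The combination rule, the function names, and the "first model in the pool" choice are assumptions, not Brick's documented behavior.

```python
# Weights and pools copied from the example configuration.
WEIGHTS = {"complexity": 0.35, "domain": 0.25, "keyword": 0.20, "reasoning": 0.20}
MODEL_POOLS = {
    "easy": ["qwen3.5-7b", "llama-3.3-8b"],
    "medium": ["qwen3.5-32b", "llama-3.3-70b"],
    "hard": ["claude-sonnet-4-20250514", "deepseek-r1"],
}
TIERS = ("easy", "medium", "hard")


def combine_signals(signal_scores: dict) -> str:
    """Weighted sum of per-tier scores from each signal; returns the winning tier."""
    totals = {tier: 0.0 for tier in TIERS}
    for name, scores in signal_scores.items():
        weight = WEIGHTS[name]
        for tier in TIERS:
            totals[tier] += weight * scores.get(tier, 0.0)
    return max(TIERS, key=lambda tier: totals[tier])


def route(signal_scores: dict) -> tuple:
    """Pick a tier, then (ASSUMPTION) the first model listed in that tier's pool."""
    tier = combine_signals(signal_scores)
    return tier, MODEL_POOLS[tier][0]


example = {
    "complexity": {"medium": 0.1, "hard": 0.9},
    "domain": {"medium": 1.0},
    "keyword": {"hard": 1.0},
    "reasoning": {"medium": 0.3, "hard": 0.7},
}
print(route(example))  # ('hard', 'claude-sonnet-4-20250514')
```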

## Intended Uses

### ✅ Primary Use Cases

- **LLM routing**: Classify query complexity to route to the optimal model tier, reducing inference cost by 30–60% compared to always-frontier routing
- **Reasoning budget allocation**: Decide how many reasoning tokens to allocate before inference begins
- **Traffic shaping**: Balance GPU load across model pools based on real-time complexity distribution
- **Cost monitoring**: Track complexity distribution over time to optimize fleet sizing
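
Two of the use cases above reduce to small lookups over the predicted label. The sketch below maps a label to a reasoning-token budget and summarizes the label mix over a window of traffic. The budget values are made-up placeholders to tune per deployment, and both helper names are hypothetical.

```python
from collections import Counter

TIERS = ("easy", "medium", "hard")

# ASSUMPTION: illustrative budgets only, not values recommended by the model authors.
REASONING_BUDGET_TOKENS = {"easy": 0, "medium": 1024, "hard": 8192}


def reasoning_budget(label: str) -> int:
    """Reasoning tokens to allow for a query with the given complexity label."""
    return REASONING_BUDGET_TOKENS[label]


def complexity_distribution(labels) -> dict:
    """Share of each tier in a window of predicted labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {tier: 0.0 for tier in TIERS}
    return {tier: counts.get(tier, 0) / total for tier in TIERS}


window = ["easy", "easy", "hard", "medium"]
print(reasoning_budget("hard"))
print(complexity_distribution(window))
```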

### ⚠️ Out-of-Scope Uses

- **Content moderation or safety filtering**: This model classifies cognitive difficulty, not content safety.
- **Non-English queries**: The model was trained on English data only; accuracy degrades significantly on other languages.
- **Direct use as a chatbot or generative model**: This is a classification adapter, not a generative model.

## Limitations

- **Label noise**: The training labels were generated by Qwen3.5-122B, not human annotators. While LLM-as-judge achieves high inter-annotator agreement on complexity, systematic biases may exist (e.g., overweighting mathematical content as "hard").
- **Class imbalance**: The "hard" class represents only 13.5% of training data, which may lead to lower recall on genuinely hard queries.
- **Domain coverage**: The training set covers general-purpose user prompts. Specialized domains (medical, legal, financial) may exhibit different complexity distributions.
- **English only**: No multilingual support in this version.
- **Adversarial robustness**: The model has not been tested against adversarial prompt manipulation designed to fool the complexity classifier.

## Training Details

| Hyperparameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3.5-0.8B |
| **LoRA rank (r)** | 16 |
| **LoRA alpha (α)** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | q_proj, v_proj |
| **Learning rate** | 2e-4 |
| **Batch size** | 32 |
| **Epochs** | 3 |
| **Optimizer** | AdamW |
| **Scheduler** | Cosine with warmup (5% steps) |
| **Max sequence length** | 512 tokens |
| **Training samples** | 65,307 |
| **Validation samples** | 7,683 |
| **Test samples** | 3,841 |
| **Training hardware** | 1× NVIDIA A100 80GB |
| **Training time** | ~2 hours |
| **Framework** | PyTorch + HuggingFace PEFT |
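
The table implies a concrete step count and learning-rate curve. The sketch below derives both from the listed values, assuming no gradient accumulation, a kept final partial batch, linear warmup, and cosine decay to zero. None of those details are stated in the card, so treat the numbers as an estimate.

```python
import math

# Values from the hyperparameter table.
TRAIN_SAMPLES = 65_307
BATCH_SIZE = 32
EPOCHS = 3
WARMUP_FRACTION = 0.05
PEAK_LR = 2e-4

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 2041
total_steps = steps_per_epoch * EPOCHS                   # 6123
warmup_steps = int(total_steps * WARMUP_FRACTION)        # 306


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```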

## Environmental Impact

Regolo.ai is committed to sustainable AI. This model was trained on GPU infrastructure powered by [Seeweb](https://www.seeweb.it/)'s data centers in Italy, which run on certified renewable energy.

| Metric | Value |
|---|---|
| **Hardware** | 1× NVIDIA A100 80GB |
| **Training duration** | ~2 hours |
| **Estimated CO₂** | < 0.5 kg CO₂eq |
| **Energy source** | Renewable (certified) |
| **Location** | Italy (EU) |

## Citation

```bibtex
@misc{regolo2026brick-complexity,
  title  = {Brick Complexity Extractor: A LoRA Adapter for Query Complexity Classification in LLM Routing},
  author = {Regolo.ai Team},
  year   = {2026},
  url    = {https://huggingface.co/regolo/brick-complexity-extractor}
}
```

## About Regolo.ai

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. We provide zero-data-retention, GDPR-native AI inference for enterprises that need privacy, compliance, and performance, all from European data centers powered by renewable energy.

**Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

<div align="center">

**[Website](https://regolo.ai) · [Docs](https://docs.regolo.ai) · [Discord](https://discord.gg/myuuVFcfJw) · [GitHub](https://github.com/regolo-ai) · [LinkedIn](https://www.linkedin.com/company/regolo-ai/)**

</div>
| |