---
library_name: peft
license: cc-by-nc-4.0
language:
- en
tags:
- peft
- safetensors
- lora
- complexity-classification
- llm-routing
- query-difficulty
- brick
- text-classification
- semantic-router
- inference-optimization
- cost-reduction
- reasoning-budget
datasets:
- regolo/brick-complexity-extractor
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
model-index:
- name: brick-complexity-extractor
  results:
  - task:
      type: text-classification
      name: Query Complexity Classification
    dataset:
      name: brick-complexity-extractor
      type: regolo/brick-complexity-extractor
      split: test
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy (3-class)
    - type: f1
      value: 0.87
      name: Weighted F1
---
# 🧱 Brick Complexity Extractor

### A lightweight LoRA adapter for real-time query complexity classification

**[Regolo.ai](https://regolo.ai) · [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) · [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1) · [API Docs](https://docs.regolo.ai)**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/) [![Base Model](https://img.shields.io/badge/Base-Qwen3.5--0.8B-blue)](https://huggingface.co/Qwen/Qwen3.5-0.8B) [![Dataset](https://img.shields.io/badge/Dataset-76.8k%20samples-green)](https://huggingface.co/datasets/regolo/brick-complexity-extractor)
---

## Table of Contents

- [Overview](#overview)
- [The Problem: Why LLM Routing Needs Complexity Classification](#the-problem-why-llm-routing-needs-complexity-classification)
- [Model Details](#model-details)
- [Architecture](#architecture)
- [Label Definitions](#label-definitions)
- [Performance](#performance)
- [Quick Start](#quick-start)
- [GGUF Quantized Models](#gguf-quantized-models)
- [Integration with Brick Semantic Router](#integration-with-brick-semantic-router)
- [Intended Uses](#intended-uses)
- [Limitations](#limitations)
- [Training Details](#training-details)
- [Environmental Impact](#environmental-impact)
- [Citation](#citation)
- [About Regolo.ai](#about-regoloai)

---

## Overview

**Brick Complexity Extractor** is a LoRA adapter fine-tuned on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) that classifies user queries into three complexity tiers: **easy**, **medium**, and **hard**. It is a core signal in the [Brick Semantic Router](https://github.com/regolo-ai/brick-SR1), Regolo.ai's open-source multi-model routing system.

The adapter adds only **~2M trainable parameters** on top of the 0.8B base model, making it fast enough to run as a pre-inference classification step with negligible latency overhead (<15ms on a single GPU).

## The Problem: Why LLM Routing Needs Complexity Classification

Not all prompts are equal. A factual recall question ("What is the capital of France?") and a multi-step reasoning task ("Derive the optimal portfolio allocation given these constraints…") require fundamentally different compute budgets. Sending every query to a frontier reasoning model wastes resources; sending hard queries to a lightweight model degrades quality.

**Brick** solves this by routing each query to the right model tier in real time. Complexity classification is one of several routing signals (alongside keyword matching, domain detection, and reasoning-depth estimation) that Brick uses to make sub-50ms routing decisions.
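As an illustration, the weighted-signal routing decision described above can be sketched in a few lines of Python. The signal names, weights, and tier thresholds below are hypothetical, chosen only to mirror the idea; they are not Brick's actual implementation:

```python
# Hypothetical sketch of complexity-aware routing: combine per-signal
# difficulty scores (each in [0, 1]) with weights, then map the combined
# score to a complexity tier. Not Brick's actual code.

def combine_signals(signals, weights):
    """Weighted average of per-signal difficulty scores."""
    total = sum(weights.values())
    return sum(signals[name] * weights[name] for name in signals) / total

def pick_tier(score):
    """Map a combined difficulty score to a complexity tier (thresholds illustrative)."""
    if score < 0.35:
        return "easy"
    if score < 0.70:
        return "medium"
    return "hard"

weights = {"complexity": 0.35, "domain": 0.25, "keyword": 0.20, "reasoning": 0.20}
signals = {"complexity": 0.9, "domain": 0.6, "keyword": 0.4, "reasoning": 0.8}
print(pick_tier(combine_signals(signals, weights)))  # combined score 0.705 -> "hard"
```

In a real deployment, each tier would then index into a pool of candidate models, with the cheapest adequate model chosen from that pool.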
## Model Details

| Property | Value |
|---|---|
| **Model type** | LoRA adapter (PEFT) |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Trainable parameters** | ~2M (LoRA rank 16, alpha 32) |
| **Total parameters** | ~875M (base + adapter) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **Language** | English |
| **License** | CC BY-NC 4.0 |
| **Developed by** | [Regolo.ai](https://regolo.ai) (Seeweb S.r.l.) |
| **Release date** | April 2026 |

## Architecture

The adapter applies LoRA to the query and value projection matrices (`q_proj`, `v_proj`) across all attention layers of Qwen3.5-0.8B, with a classification head on top of the last hidden state.

```
Qwen3.5-0.8B (frozen)
├── Attention Layers × 24
│   ├── q_proj ← LoRA(r=16, α=32)
│   └── v_proj ← LoRA(r=16, α=32)
└── Last Hidden State
    └── Classification Head (3 classes)
```

## Label Definitions

| Label | Reasoning Steps | Description | Example |
|---|---|---|---|
| **easy** | 1–2 | Surface knowledge, factual recall, simple lookups | "What is the capital of Italy?" |
| **medium** | 3–5 | Domain familiarity, multi-step reasoning, comparison | "Compare REST and GraphQL for a mobile app backend" |
| **hard** | 6+ | Deep expertise, multi-constraint optimization, creative synthesis | "Design a distributed cache eviction policy that minimizes tail latency under bursty traffic" |

Labels were generated by **Qwen3.5-122B** acting as an LLM judge on 76,831 diverse user prompts. See the [dataset card](https://huggingface.co/datasets/regolo/brick-complexity-extractor) for full labeling methodology.
## Performance

### Classification Metrics (Test Set — 3,841 samples)

| Metric | Value |
|---|---|
| **Accuracy** | 89.2% |
| **Weighted F1** | 87.4% |
| **Macro F1** | 85.1% |

### Per-Class Performance

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| easy | 0.92 | 0.94 | 0.93 | 1,057 |
| medium | 0.88 | 0.90 | 0.89 | 1,660 |
| hard | 0.84 | 0.79 | 0.81 | 519 |

### Latency

| Setup | Inference Time (p50) | Inference Time (p99) |
|---|---|---|
| NVIDIA A100 (bf16) | 8ms | 14ms |
| NVIDIA L4 (fp16) | 12ms | 22ms |
| CPU (Intel Xeon, fp32) | 45ms | 78ms |

## Quick Start

### Installation

```bash
pip install peft transformers torch
```

### Inference

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load base model + adapter
base_model_id = "Qwen/Qwen3.5-0.8B"
adapter_id = "regolo/brick-complexity-extractor"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=3,
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Classify a query
query = "Explain the difference between TCP and UDP"
inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

labels = ["easy", "medium", "hard"]
predicted = labels[outputs.logits.argmax(dim=-1).item()]
print(f"Complexity: {predicted}")
# Output: Complexity: medium
```

### Using with vLLM (recommended for production)

```python
# The adapter can be loaded as a LoRA module in vLLM.
# See the Brick SR1 documentation for the full integration guide:
# https://github.com/regolo-ai/brick-SR1
```

## GGUF Quantized Models

Pre-built GGUF files are available for inference with llama.cpp, Ollama, LM Studio, vLLM, and other GGUF-compatible runtimes.
Each quantization is published as a separate model:

| Model | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [brick-complexity-extractor-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 | Full precision |
| [brick-complexity-extractor-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 | Recommended |
| [brick-complexity-extractor-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 | Best size/quality ratio |

See the [brick-complexity-extractor collection](https://huggingface.co/collections/regolo/brick-complexity-extractor-69dcc2dec2fe3b54a70b3415) for all available formats.

## Integration with Brick Semantic Router

Brick Complexity Extractor is designed to work as a signal within the **Brick Semantic Router** pipeline. In a typical deployment:

1. **Query arrives** at the Brick router endpoint
2. **Parallel signal extraction** runs complexity classification alongside keyword matching, domain detection, and reasoning estimation
3. **Routing decision** combines all signals to select the optimal model from the pool
4.
**Query forwarded** to the chosen model (e.g., Qwen 7B for easy, Llama 70B for medium, Claude for hard)

```yaml
# Brick router configuration example (brick-config.yaml)
signals:
  complexity:
    model: regolo/brick-complexity-extractor
    weight: 0.35
  domain:
    model: regolo/brick-domain-classifier  # coming soon
    weight: 0.25
  keyword:
    type: rule-based
    weight: 0.20
  reasoning:
    type: heuristic
    weight: 0.20

model_pools:
  easy:
    - qwen3.5-7b
    - llama-3.3-8b
  medium:
    - qwen3.5-32b
    - llama-3.3-70b
  hard:
    - claude-sonnet-4-20250514
    - deepseek-r1
```

## Intended Uses

### ✅ Primary Use Cases

- **LLM routing**: Classify query complexity to route to the optimal model tier, reducing inference cost by 30–60% compared to always-frontier routing
- **Reasoning budget allocation**: Decide how many reasoning tokens to allocate before inference begins
- **Traffic shaping**: Balance GPU load across model pools based on real-time complexity distribution
- **Cost monitoring**: Track complexity distribution over time to optimize fleet sizing

### ⚠️ Out-of-Scope Uses

- **Content moderation or safety filtering**: this model classifies cognitive difficulty, not content safety
- **Non-English queries**: trained on English data only; accuracy degrades significantly on other languages
- **Direct use as a chatbot or generative model**: this is a classification adapter, not a generative model

## Limitations

- **Label noise**: The training labels were generated by Qwen3.5-122B, not human annotators. While LLM-as-judge achieves high inter-annotator agreement on complexity, systematic biases may exist (e.g., overweighting mathematical content as "hard")
- **Class imbalance**: The "hard" class represents only 13.5% of training data, which may lead to lower recall on genuinely hard queries
- **Domain coverage**: The training set covers general-purpose user prompts.
Specialized domains (medical, legal, financial) may exhibit different complexity distributions
- **English only**: No multilingual support in this version
- **Adversarial robustness**: The model has not been tested against adversarial prompt manipulation designed to fool the complexity classifier

## Training Details

| Hyperparameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3.5-0.8B |
| **LoRA rank (r)** | 16 |
| **LoRA alpha (α)** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | q_proj, v_proj |
| **Learning rate** | 2e-4 |
| **Batch size** | 32 |
| **Epochs** | 3 |
| **Optimizer** | AdamW |
| **Scheduler** | Cosine with warmup (5% of steps) |
| **Max sequence length** | 512 tokens |
| **Training samples** | 65,307 |
| **Validation samples** | 7,683 |
| **Test samples** | 3,841 |
| **Training hardware** | 1× NVIDIA A100 80GB |
| **Training time** | ~2 hours |
| **Framework** | PyTorch + Hugging Face PEFT |

## Environmental Impact

Regolo.ai is committed to sustainable AI. This model was trained on GPU infrastructure powered by [Seeweb](https://www.seeweb.it/)'s data centers in Italy, which run on certified renewable energy.

| Metric | Value |
|---|---|
| **Hardware** | 1× NVIDIA A100 80GB |
| **Training duration** | ~2 hours |
| **Estimated CO₂** | < 0.5 kg CO₂eq |
| **Energy source** | Renewable (certified) |
| **Location** | Italy (EU) |

## Citation

```bibtex
@misc{regolo2026brick-complexity,
  title  = {Brick Complexity Extractor: A LoRA Adapter for Query Complexity Classification in LLM Routing},
  author = {Regolo.ai Team},
  year   = {2026},
  url    = {https://huggingface.co/regolo/brick-complexity-extractor}
}
```

## About Regolo.ai

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. We provide zero-data-retention, GDPR-native AI inference for enterprises that need privacy, compliance, and performance, all from European data centers powered by renewable energy.
**Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.
**[Website](https://regolo.ai) · [Docs](https://docs.regolo.ai) · [Discord](https://discord.gg/myuuVFcfJw) · [GitHub](https://github.com/regolo-ai) · [LinkedIn](https://www.linkedin.com/company/regolo-ai/)**