---
library_name: peft
license: cc-by-nc-4.0
language:
- en
tags:
- peft
- safetensors
- lora
- complexity-classification
- llm-routing
- query-difficulty
- brick
- text-classification
- semantic-router
- inference-optimization
- cost-reduction
- reasoning-budget
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
---

<div align="center">

# Brick Complexity Classifier v2: `max`

</div>

## What is this?

Classifier v2 is a family of small adapters that score each incoming prompt as **`easy` / `medium` / `hard`**, so a router can send it to the right tier of a model pool. Two variants optimize for different goals:

- **`eco`**: optimized for **cost**. Biases predictions toward `easy` so most traffic stays on the cheap tier. Use when the cost-per-query bill matters more than squeezing out the last accuracy point.
- **`max`**: optimized for **routing accuracy**. Gives the sharpest easy/medium/hard split, so hard queries reliably reach the strongest tier and easy ones stay cheap. Use when answer quality is paramount.
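To make the routing step concrete, here is a minimal sketch of how a router might consume the predicted label. The tier names and the fallback policy are illustrative assumptions, not part of Brick's actual configuration:

```python
# Illustrative label-to-tier map -- the tier names are hypothetical.
TIER_FOR_LABEL = {
    "easy": "small-cheap-model",
    "medium": "mid-tier-model",
    "hard": "frontier-model",
}

def route(label: str, default: str = "mid-tier-model") -> str:
    """Map a predicted complexity label to a model tier.

    Unknown or malformed labels fall back to the middle tier
    instead of raising, so routing never blocks a query.
    """
    return TIER_FOR_LABEL.get(label.strip().lower(), default)

print(route("hard"))  # -> frontier-model
```

A real router would wrap this with per-tier clients; the point is only that the classifier's one-word output is the whole routing signal.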

<div align="center">

Maximum-accuracy variant tuned to classify query complexity as precisely as possible. Prioritizes routing quality over cost.

**[Regolo.ai](https://regolo.ai) | [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1)**

[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
[Base model: Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B)

</div>

---

## Model Details

| Property | Value |
|---|---|
| **Variant** | `max` |
| **Target** | Best classification accuracy, maximum routing quality |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Adapter type** | LoRA (r=32, α=32, dropout=0.1) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **License** | CC BY-NC 4.0 |
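For reference, the adapter hyperparameters above correspond to a PEFT `LoraConfig` along these lines. Only `r`, `lora_alpha`, and `lora_dropout` are stated on this card; everything else here (task type, default target modules) is an assumption, not the exact training configuration:

```python
from peft import LoraConfig

# Reconstructed from the table above; fields not listed there are assumptions.
config = LoraConfig(
    r=32,             # LoRA rank
    lora_alpha=32,    # scaling factor alpha
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
```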

## Available Formats

| Format | Link |
|---|---|
| LoRA adapter | [regolo/brick-complexity-2-max](https://huggingface.co/regolo/brick-complexity-2-max) |
| GGUF BF16 | [regolo/brick-complexity-2-max-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-2-max-BF16-GGUF) |
| GGUF Q8_0 | [regolo/brick-complexity-2-max-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-2-max-Q8_0-GGUF) |
| GGUF Q4_K_M | [regolo/brick-complexity-2-max-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-2-max-Q4_K_M-GGUF) |

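The GGUF builds can be served with any llama.cpp-compatible runtime. Below is a hedged sketch using `llama-cpp-python`; it assumes the Q4_K_M repo ships a merged-model `.gguf` file matching the glob (the actual filename may differ), so check the repo contents before relying on it:

```python
from llama_cpp import Llama

# Assumption: the GGUF repo contains a merged model file matching this glob.
llm = Llama.from_pretrained(
    repo_id="regolo/brick-complexity-2-max-Q4_K_M-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

system = (
    "You are a query difficulty classifier for an LLM routing system.\n"
    "Classify each query as easy, medium, or hard based on the cognitive depth "
    "and domain expertise required to answer correctly.\n"
    "Respond with ONLY one word: easy, medium, or hard."
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Classify: What is 2 + 2?"},
    ],
    max_tokens=3,
    temperature=0,
)
print(out["choices"][0]["message"]["content"].strip())
```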
## Usage (PEFT)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model, then attach the classifier adapter on top.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B")
model = PeftModel.from_pretrained(base, "regolo/brick-complexity-2-max").eval()

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Design a distributed consensus algorithm<|im_end|>\n<|im_start|>assistant\n"

# Pass the attention mask along with input_ids to avoid generation warnings.
inputs = tok(prompt, return_tensors="pt")
# Greedy decoding; the label is a single word, so 3 new tokens suffice.
out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip())
# Output: hard
```
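In production the decoded text may carry stray whitespace, punctuation, or an unexpected continuation, so it is worth normalizing before routing. This small helper is a suggestion, not part of the model repo; the `"medium"` fallback is a hypothetical conservative default:

```python
VALID_LABELS = ("easy", "medium", "hard")

def parse_label(text: str, fallback: str = "medium") -> str:
    """Extract the first valid complexity label from raw model output."""
    for token in text.strip().lower().split():
        token = token.strip(".,:;!\"'")
        if token in VALID_LABELS:
            return token
    # No valid label found: fall back rather than fail the request.
    return fallback

print(parse_label(" Hard\n"))  # -> hard
```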

## Usage (vLLM)

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3.5-0.8B",
    enable_lora=True,
    max_lora_rank=32,  # must cover the adapter's rank (r=32)
    dtype="bfloat16",
)
sp = SamplingParams(temperature=0, max_tokens=3)

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Explain the rendering equation from radiometric first principles<|im_end|>\n<|im_start|>assistant\n"

out = llm.generate(
    [prompt],
    sp,
    lora_request=LoRARequest("brick-complexity-2-max", 1, "regolo/brick-complexity-2-max"),
)
print(out[0].outputs[0].text.strip())
# Output: hard
```

## About Brick

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

**[Website](https://regolo.ai) | [Docs](https://docs.regolo.ai) | [GitHub](https://github.com/regolo-ai) | [Discord](https://discord.gg/myuuVFcfJw)**