---
library_name: peft
license: cc-by-nc-4.0
language:
- en
tags:
- peft
- safetensors
- lora
- complexity-classification
- llm-routing
- query-difficulty
- brick
- text-classification
- semantic-router
- inference-optimization
- cost-reduction
- reasoning-budget
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
---

<div align="center">

# Brick Complexity Classifier v2: `eco`

</div>

## What is this?

Brick Complexity Classifier v2 is a family of small adapters that label each incoming prompt as **`easy`**, **`medium`**, or **`hard`**, so a router can send it to the matching tier of a model pool. Two variants optimize for different goals:

- **`eco`**: optimized for **cost**. Biases predictions toward `easy` so most traffic stays on the cheap tier. Use it when the per-query cost matters more than the last point of accuracy.
- **`max`**: optimized for **routing accuracy**. Gives the sharpest easy/medium/hard split, so hard queries reliably reach the strongest tier and easy ones stay cheap. Use it when answer quality is paramount.

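To make the routing step concrete, here is a minimal sketch of how a router might map the classifier's label to a tier. The tier names (`small-model`, `mid-model`, `large-model`) and the `route` helper are hypothetical and not part of Brick. Falling back to `easy` for unknown labels is an assumption chosen to mirror the `eco` preference for the cheap tier.

```python
from typing import Dict

# Hypothetical tier names; substitute the models in your own pool.
TIER_FOR_LABEL: Dict[str, str] = {
    "easy": "small-model",
    "medium": "mid-model",
    "hard": "large-model",
}


def route(label: str, default_label: str = "easy") -> str:
    """Return the pool tier for a classifier label.

    Labels are normalized (whitespace stripped, lowercased). Unknown labels
    fall back to `default_label`, which is an assumption of this sketch.
    """
    key = label.strip().lower()
    return TIER_FOR_LABEL.get(key, TIER_FOR_LABEL[default_label])


print(route("hard"))      # large-model
print(route("Medium "))   # mid-model
print(route("unknown"))   # small-model
```

A quality-first deployment could pass `default_label="hard"` instead, so unrecognized outputs go to the strongest tier.
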
<div align="center">

Efficient variant tuned to route queries toward the cheap (`easy`) tier. It prioritizes token cost savings: when uncertain, it prefers the smaller model.

**[Regolo.ai](https://regolo.ai) | [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1)**

[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
[Base model: Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B)

</div>

---

## Model Details

| Property | Value |
|---|---|
| **Variant** | `eco` |
| **Target** | Cost savings: minimize routing to the expensive tier |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Adapter type** | LoRA (r=32, α=32, dropout=0.1) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **License** | CC BY-NC 4.0 |

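For readers unfamiliar with the LoRA hyperparameters in the table, the arithmetic below shows what r=32 and α=32 imply. It assumes standard LoRA scaling (α/r, not the rank-stabilized variant). The 1024×1024 layer size is purely illustrative and is not taken from the base model's configuration.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters added to one adapted weight matrix.

    B has shape (d_out, r) and A has shape (r, d_in).
    """
    return r * (d_in + d_out)


R, ALPHA = 32, 32          # values from the Model Details table
D_OUT = D_IN = 1024        # illustrative layer size (assumption)

print(lora_scaling(ALPHA, R))            # 1.0
print(lora_param_count(D_OUT, D_IN, R))  # 65536
print(D_OUT * D_IN)                      # 1048576 weights in the frozen matrix
```

With α equal to r, the low-rank update is added to the frozen weights unscaled. In this illustrative case the adapter adds about 6% as many parameters as the matrix it adapts.
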
## Available Formats

| Format | Link |
|---|---|
| LoRA adapter | [regolo/brick-complexity-2-eco](https://huggingface.co/regolo/brick-complexity-2-eco) |
| GGUF BF16 | [regolo/brick-complexity-2-eco-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-2-eco-BF16-GGUF) |
| GGUF Q8_0 | [regolo/brick-complexity-2-eco-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-2-eco-Q8_0-GGUF) |
| GGUF Q4_K_M | [regolo/brick-complexity-2-eco-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-2-eco-Q4_K_M-GGUF) |

## Usage (PEFT)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B")
model = PeftModel.from_pretrained(base, "regolo/brick-complexity-2-eco").eval()

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""

# ChatML-style prompt; the query to classify follows the "Classify: " prefix.
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Design a distributed consensus algorithm<|im_end|>\n<|im_start|>assistant\n"

# Tokenize on the model's device and pass the attention mask along with the input IDs.
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=3, do_sample=False)

# Decode only the newly generated tokens.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip())
# Output: hard
```

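Because the label arrives as free generated text, a router usually wants to normalize it before acting on it. The helper below is a minimal sketch of that step. `parse_label` and its `fallback` parameter are hypothetical names, not part of the adapter or of PEFT.

```python
import re
from typing import Optional

# Matches one of the three labels as a whole word.
_LABEL_RE = re.compile(r"\b(easy|medium|hard)\b")


def parse_label(generated: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the first valid label found in the generated text.

    Matching is case-insensitive. If no label is present, `fallback`
    is returned so the caller decides how to handle unparseable output.
    """
    match = _LABEL_RE.search(generated.lower())
    return match.group(1) if match else fallback


print(parse_label(" Hard\n"))              # hard
print(parse_label("medium."))              # medium
print(parse_label("???"))                  # None
print(parse_label("", fallback="easy"))    # easy
```

Returning `None` by default keeps the policy decision (retry, default to the cheap tier, or escalate) with the caller.
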
## Usage (vLLM)

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA support; max_lora_rank must cover the adapter's rank (r=32).
llm = LLM(
    model="Qwen/Qwen3.5-0.8B",
    enable_lora=True,
    max_lora_rank=32,
    dtype="bfloat16",
)

# Greedy decoding; three tokens are enough for a one-word label.
sp = SamplingParams(temperature=0, max_tokens=3)

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""

prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Explain the rendering equation from radiometric first principles<|im_end|>\n<|im_start|>assistant\n"

# Apply the adapter per request: (adapter name, integer ID, adapter path or repo).
out = llm.generate(
    [prompt],
    sp,
    lora_request=LoRARequest("brick-complexity-2-eco", 1, "regolo/brick-complexity-2-eco"),
)
print(out[0].outputs[0].text.strip())
# Output: hard
```

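Both usage examples build the same ChatML-style prompt by hand. When classifying many queries, it can help to factor that template into a helper. The sketch below does so; `build_prompt` and `build_prompts` are hypothetical helper names, while the system prompt and template are copied from the examples above.

```python
from typing import List

SYSTEM_PROMPT = (
    "You are a query difficulty classifier for an LLM routing system.\n"
    "Classify each query as easy, medium, or hard based on the cognitive depth "
    "and domain expertise required to answer correctly.\n"
    "Respond with ONLY one word: easy, medium, or hard."
)


def build_prompt(query: str) -> str:
    """Wrap one user query in the template used by the usage examples."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\nClassify: {query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def build_prompts(queries: List[str]) -> List[str]:
    """Build one prompt per query, preserving input order."""
    return [build_prompt(q) for q in queries]


prompts = build_prompts(["What is 2+2?", "Design a distributed consensus algorithm"])
print(len(prompts))
print(prompts[0])
```

The resulting list can be passed to `llm.generate` in place of `[prompt]`, and the outputs come back in the same order as the prompts.
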
## About Brick

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

**[Website](https://regolo.ai) | [Docs](https://docs.regolo.ai) | [GitHub](https://github.com/regolo-ai) | [Discord](https://discord.gg/myuuVFcfJw)**