Brick Complexity Extractor (Q8_0 GGUF)
Q8_0 quantized GGUF of regolo/brick-complexity-extractor
Model Details
| Property | Value |
|---|---|
| Quantization | Q8_0 |
| File | brick-complexity-extractor-Q8_0.gguf |
| Size | 775 MB |
| Bits per weight | 8.0 |
| Original model | regolo/brick-complexity-extractor |
| Base model | Qwen/Qwen3.5-0.8B |
| Output classes | 3 (easy, medium, hard) |
| License | CC BY-NC 4.0 |
8-bit integer quantization. Near-lossless quality at roughly half the size of the BF16 GGUF (775 MB vs. 1.5 GB). Recommended for most deployments.
This is a full merged model (base Qwen3.5-0.8B + LoRA adapter merged and quantized), so no separate adapter loading is needed.
All Available Quantizations
| Model | Quant | Size | BPW |
|---|---|---|---|
| BF16-GGUF | BF16 | 1.5 GB | 16.0 |
| Q8_0-GGUF | Q8_0 | 775 MB | 8.0 |
| Q4_K_M-GGUF | Q4_K_M | 494 MB | 5.5 |
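As a quick sanity check on the sizes in the table above, the following sketch computes how much smaller each file is than the BF16 GGUF. The sizes are copied from the table; treating 1.5 GB as 1500 MB (decimal units) is an assumption, and since the table values are rounded the results are approximate.

```python
# File sizes in MB, copied from the table above.
# Assumption: decimal units, so 1.5 GB is taken as 1500 MB.
SIZES_MB = {"BF16": 1500.0, "Q8_0": 775.0, "Q4_K_M": 494.0}


def reduction_vs_bf16(quant: str) -> float:
    """Return the fractional size reduction of `quant` relative to BF16."""
    return 1.0 - SIZES_MB[quant] / SIZES_MB["BF16"]


for name in SIZES_MB:
    print(f"{name}: {reduction_vs_bf16(name):.1%} smaller than BF16")
```

Under these assumptions Q8_0 comes out at roughly 48% smaller and Q4_K_M at roughly 67% smaller than BF16.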
Usage with llama.cpp
# Download
huggingface-cli download regolo/brick-complexity-extractor-Q8_0-GGUF \
brick-complexity-extractor-Q8_0.gguf --local-dir ./models
# Run inference
./llama-cli -m ./models/brick-complexity-extractor-Q8_0.gguf \
-p "<|im_start|>system
You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: What is the capital of France?<|im_end|>
<|im_start|>assistant
" \
-n 5 --temp 0
Usage with Ollama
cat > Modelfile <<'EOF'
FROM ./brick-complexity-extractor-Q8_0.gguf
SYSTEM """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
Classify: {{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0
PARAMETER num_predict 5
EOF
ollama create brick-complexity -f Modelfile
ollama run brick-complexity "Design a distributed consensus algorithm"
# Output: hard
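For programmatic use, the same Ollama model can be queried over Ollama's local HTTP API instead of the CLI. The sketch below is a minimal example, assuming the model was created as `brick-complexity` as shown above and that an Ollama server is listening on its default address (`http://localhost:11434`). The helper names `build_request` and `classify` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(query: str, model: str = "brick-complexity") -> urllib.request.Request:
    """Build a non-streaming generate request for the given query."""
    payload = {"model": model, "prompt": query, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(query: str) -> str:
    """Send the query to Ollama and return the generated text, stripped."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        body = json.load(resp)
    return body["response"].strip()
```

Calling `classify("Design a distributed consensus algorithm")` sends the raw query; the `Classify:` prefix and system prompt are applied by the Modelfile template, so they are not repeated here.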
Usage with vLLM
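from vllm import LLM, SamplingParams

# Note: vLLM's GGUF support is experimental. If loading by repo id fails,
# download the .gguf file and pass its local path as `model` instead.
llm = LLM(model="regolo/brick-complexity-extractor-Q8_0-GGUF")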
sampling_params = SamplingParams(temperature=0, max_tokens=5)
prompt = """<|im_start|>system
You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: Explain the rendering equation from radiometric first principles<|im_end|>
<|im_start|>assistant
"""
output = llm.generate([prompt], sampling_params)
print(output[0].outputs[0].text.strip())
# Output: hard
Note on GGUF Inference
The GGUF model classifies by generating text (it emits "easy", "medium", or "hard") rather than by the logit-based classification used by the original LoRA adapter. For production deployments requiring maximum accuracy, consider using the original LoRA adapter with the PEFT library.
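Because the label arrives as free text, callers may want to normalize it before acting on it. The following is a minimal sketch of one way to do that; `parse_label` and its `default` parameter are hypothetical helpers for illustration, not part of the model or of any runtime above.

```python
import re
from typing import Optional

# Match the first standalone occurrence of one of the three class names.
_LABEL_RE = re.compile(r"\b(easy|medium|hard)\b", re.IGNORECASE)


def parse_label(generated: str, default: Optional[str] = None) -> Optional[str]:
    """Extract 'easy', 'medium', or 'hard' from generated text.

    Returns `default` when no label is found, so the caller decides
    how to handle unexpected output.
    """
    match = _LABEL_RE.search(generated)
    return match.group(1).lower() if match else default
```

For example, `parse_label(" Hard\n")` returns `"hard"`, while output containing none of the three words returns the chosen default. Falling back to a conservative label such as `"hard"` is one option, but that choice is a design decision for the caller.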
About
Regolo.ai is the EU-sovereign LLM inference platform built on Seeweb infrastructure. Brick is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.
Evaluation results
- Accuracy (3-class) on the brick-complexity-extractor test set (self-reported): 0.890
- Weighted F1 on the brick-complexity-extractor test set (self-reported): 0.870
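For readers who want to reproduce these two metrics on their own labeled queries, the sketch below shows how accuracy and support-weighted F1 are conventionally computed. It is an illustration of the standard definitions, not the evaluation script used for the reported numbers, and the sample labels are made up.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Made-up labels, for illustration only.
y_true = ["easy", "easy", "medium", "hard"]
y_pred = ["easy", "medium", "medium", "hard"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```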