# UTN Student Chatbot – Finetuned Qwen3-0.6B

A domain-adapted chatbot for the University of Technology Nuremberg (UTN), built by finetuning Qwen3-0.6B on curated UTN-specific Q&A data using parameter-efficient methods.

## Available Adapters

| Adapter | Method | Trainable Params | Path |
|---|---|---|---|
| LoRA (recommended) | Low-Rank Adaptation (r=64, alpha=128) | 161M (21.4%) | models/utn-qwen3-lora |
| VeRA | Vector-based Random Matrix Adaptation (r=256) | 8M (1.1%) | models/utn-qwen3-vera |

## Evaluation Results

### Validation Set (17 examples)

| Metric | LoRA |
|---|---|
| ROUGE-1 | 0.5924 |
| ROUGE-2 | 0.4967 |
| ROUGE-L | 0.5687 |

### FAQ Benchmark (34 questions, with CRAG RAG pipeline)

| Metric | LoRA + CRAG |
|---|---|
| ROUGE-1 | 0.7096 |
| ROUGE-2 | 0.6124 |
| ROUGE-L | 0.6815 |
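For reference, ROUGE-1 is the F1 score of unigram overlap between a generated answer and the reference answer. A minimal pure-Python sketch of the metric (illustrative only; the evaluation here presumably used a standard ROUGE implementation):

```python
from collections import Counter

def rouge1_f1(reference: str, hypothesis: str) -> float:
    """F1 of unigram (whitespace-token) overlap, counted as multisets."""
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    # Overlap: count each token at most as often as it appears in both texts.
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("admission requires a bachelor degree",
                      "admission requires a degree"), 4))  # 0.8889
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern but count bigram overlap and longest-common-subsequence length, respectively.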

## Quick Start – LoRA (Recommended)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-0.6B"
adapter_repo = "saeedbenadeeb/UTN_LLMs_Chatbot"
adapter_path = "models/utn-qwen3-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant for the University of Technology Nuremberg (UTN)."},
    {"role": "user", "content": "What are the admission requirements for AI & Robotics?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
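The generation call above samples with `temperature=0.3` and `top_p=0.9`. As a reminder of what `top_p` (nucleus) filtering does, here is an illustrative pure-Python sketch (not the transformers implementation): keep the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, then renormalize before sampling.

```python
def top_p_filter(probs: dict[str, float], top_p: float = 0.9) -> dict[str, float]:
    """Keep the smallest high-probability token set with cumulative mass >= top_p."""
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}  # renormalize

# With top_p=0.9, the lowest-probability tail ("d") is cut off:
print(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, top_p=0.9))
```

A low temperature plus nucleus filtering keeps answers close to the finetuned distribution while avoiding fully greedy, repetitive decoding.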

## Quick Start – VeRA

```python
# Same as above, but reload the base model first (so the VeRA adapter is
# not stacked on top of the LoRA adapter), then change the adapter path:
adapter_path = "models/utn-qwen3-vera"

model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
```

## Training Details

- Base model: Qwen/Qwen3-0.6B
- Training data: 1,289 curated UTN Q&A pairs (scraped from utn.de, FAQs, module handbooks)
- Validation data: 17 held-out examples
- Trainer: TRL SFTTrainer
- Hardware: NVIDIA A40 (48 GB)
- LoRA config: r=64, alpha=128, dropout=0.05, target=all linear layers, lr=3e-4, 5 epochs
- VeRA config: r=256, d_initial=0.1, prng_key=42, target=all linear layers, lr=5e-4, 5 epochs
- Framework: PEFT 0.18.1, Transformers 5.2.0, PyTorch 2.6.0
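Under the hyperparameters listed above, the PEFT adapter configs might look roughly like this. This is a sketch, not the project's actual training script; in particular, the `target_modules="all-linear"` shorthand and the `task_type` value are assumptions about how "all linear layers" was specified.

```python
from peft import LoraConfig, VeraConfig

# Assumed reconstruction of the LoRA setup (r=64, alpha=128, dropout=0.05).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",  # assumption: PEFT shorthand for all linear layers
    task_type="CAUSAL_LM",
)

# Assumed reconstruction of the VeRA setup (r=256, d_initial=0.1, prng_key=42).
vera_config = VeraConfig(
    r=256,
    d_initial=0.1,
    projection_prng_key=42,
    target_modules="all-linear",  # assumption: VeRA requires same-shaped targets
    task_type="CAUSAL_LM",
)
```

Either config would then be passed to TRL's SFTTrainer via its `peft_config` argument.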

## Architecture

The full system uses a Corrective RAG (CRAG) pipeline:

  1. Hybrid retrieval: FAISS dense search (BGE-small-en-v1.5) + BM25 sparse search, merged via Reciprocal Rank Fusion
  2. Relevance grading: Score-based heuristic to verify retrieved documents answer the question
  3. Query rewriting: If documents are irrelevant, the query is rewritten and retrieval retried
  4. Generation: The finetuned Qwen3-0.6B + LoRA generates grounded answers from retrieved context
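Step 1's Reciprocal Rank Fusion gives each document a score of 1/(k + rank) in every ranked list it appears in and sums the scores across lists, so documents ranked well by both dense and sparse retrieval rise to the top. A minimal sketch (k=60 is the commonly used constant; function and variable names here are illustrative, not the project's actual code):

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists into one, scoring each doc 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc1", "doc2", "doc3"]   # e.g. FAISS dense results
sparse = ["doc2", "doc3", "doc1"]  # e.g. BM25 sparse results
print(reciprocal_rank_fusion([dense, sparse]))  # doc2 wins: ranked 2nd and 1st
```

Because RRF only uses ranks, it needs no tuning to reconcile the incomparable score scales of dense similarity and BM25.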

## Citation

```bibtex
@misc{utn-chatbot-2026,
  title={UTN Student Chatbot: Domain-Adapted Qwen3-0.6B with CRAG},
  author={Saeed Adeeb},
  year={2026},
  url={https://huggingface.co/saeedbenadeeb/UTN_LLMs_Chatbot}
}
```