saeedbenadeeb committed ea7db3b (verified) · 1 parent: 494a9a5

Upload README.md with huggingface_hub

Files changed (1): README.md (+138 −3)
---
license: mit
language:
- en
library_name: peft
base_model: Qwen/Qwen3-0.6B
tags:
- lora
- vera
- peft
- sft
- chatbot
- rag
- qwen3
- university
pipeline_tag: text-generation
---

# UTN Student Chatbot — Finetuned Qwen3-0.6B

A domain-adapted chatbot for the **University of Technology Nuremberg (UTN)**, built by finetuning [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on curated UTN-specific Q&A data using parameter-efficient methods.

## Available Adapters

| Adapter | Method | Trainable Params | Path |
|---------|--------|------------------|------|
| **LoRA** (recommended) | Low-Rank Adaptation (r=64, alpha=128) | 161M (21.4%) | `models/utn-qwen3-lora` |
| VeRA | Vector-based Random Matrix Adaptation (r=256) | 8M (1.1%) | `models/utn-qwen3-vera` |

## Evaluation Results

### Validation Set (17 examples)

| Metric | LoRA |
|--------|------|
| ROUGE-1 | 0.5924 |
| ROUGE-2 | 0.4967 |
| ROUGE-L | 0.5687 |

### FAQ Benchmark (34 questions, with CRAG RAG pipeline)

| Metric | LoRA + CRAG |
|--------|-------------|
| ROUGE-1 | 0.7096 |
| ROUGE-2 | 0.6124 |
| ROUGE-L | 0.6815 |
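For orientation, ROUGE-1 is the F-measure over unigram overlap between a generated answer and the reference. The scores above were presumably computed with a standard ROUGE package (e.g. `rouge_score`); the toy sketch below, which skips stemming and proper tokenization, just illustrates what the metric measures:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Illustrative unigram-overlap ROUGE-1 F1 (no stemming, whitespace tokens)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Clipped unigram overlap: each reference token can be matched at most once.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("UTN is in Nuremberg", "UTN is located in Nuremberg"), 3))  # → 0.889
```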
## Quick Start — LoRA (Recommended)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-0.6B"
adapter_repo = "saeedbenadeeb/UTN_LLMs_Chatbot"
adapter_path = "models/utn-qwen3-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant for the University of Technology Nuremberg (UTN)."},
    {"role": "user", "content": "What are the admission requirements for AI & Robotics?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

## Quick Start — VeRA

```python
# Same as above, but change the adapter path:
adapter_path = "models/utn-qwen3-vera"

model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
```
## Training Details

- **Base model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training data**: 1,289 curated UTN Q&A pairs (scraped from utn.de, FAQs, module handbooks)
- **Validation data**: 17 held-out examples
- **Trainer**: TRL SFTTrainer
- **Hardware**: NVIDIA A40 (48 GB)
- **LoRA config**: r=64, alpha=128, dropout=0.05, target=all linear layers, lr=3e-4, 5 epochs
- **VeRA config**: r=256, d_initial=0.1, prng_key=42, target=all linear layers, lr=5e-4, 5 epochs
- **Framework**: PEFT 0.18.1, Transformers 5.2.0, PyTorch 2.6.0
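The LoRA hyperparameters listed above could be expressed with a PEFT config roughly like the sketch below. This is a non-authoritative reconstruction, not the actual training script: `target_modules="all-linear"` is assumed as the mechanism for targeting all linear layers, and the optimizer/epoch settings live in the trainer, not here.

```python
from peft import LoraConfig

# Sketch of the LoRA setup listed above (r=64, alpha=128, dropout=0.05,
# all linear layers). The repo's real training code may differ in detail.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```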
## Architecture

The full system uses a **Corrective RAG (CRAG)** pipeline:

1. **Hybrid retrieval**: FAISS dense search (BGE-small-en-v1.5) + BM25 sparse search, merged via Reciprocal Rank Fusion
2. **Relevance grading**: Score-based heuristic to verify retrieved documents answer the question
3. **Query rewriting**: If documents are irrelevant, the query is rewritten and retrieval retried
4. **Generation**: The finetuned Qwen3-0.6B + LoRA generates grounded answers from retrieved context
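The Reciprocal Rank Fusion step in (1) can be sketched in a few lines. The doc ids below are hypothetical, and `k=60` is the constant commonly used in RRF, not necessarily what this pipeline sets:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; docs found by both retrievers rise to the top.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # e.g. FAISS (BGE embeddings) results
sparse = ["doc_b", "doc_d", "doc_a"]  # e.g. BM25 results
print(reciprocal_rank_fusion([dense, sparse]))  # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF only uses ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a common choice for hybrid retrieval.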
## Citation

```bibtex
@misc{utn-chatbot-2026,
  title={UTN Student Chatbot: Domain-Adapted Qwen3-0.6B with CRAG},
  author={Saeed Adeeb},
  year={2026},
  url={https://huggingface.co/saeedbenadeeb/UTN_LLMs_Chatbot}
}
```