---
license: mit
language:
- en
library_name: peft
base_model: Qwen/Qwen3-0.6B
tags:
- lora
- vera
- peft
- sft
- chatbot
- rag
- qwen3
- university
pipeline_tag: text-generation
---

# UTN Student Chatbot — Finetuned Qwen3-0.6B

A domain-adapted chatbot for the **University of Technology Nuremberg (UTN)**, built by finetuning [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on curated UTN-specific Q&A data using parameter-efficient methods.

## Available Adapters

| Adapter | Method | Trainable Params | Path |
|---------|--------|------------------|------|
| **LoRA** (recommended) | Low-Rank Adaptation (r=64, alpha=128) | 161M (21.4%) | `models/utn-qwen3-lora` |
| VeRA | Vector-based Random Matrix Adaptation (r=256) | 8M (1.1%) | `models/utn-qwen3-vera` |

## Evaluation Results

### Validation Set (17 examples)

| Metric | LoRA |
|--------|------|
| ROUGE-1 | 0.5924 |
| ROUGE-2 | 0.4967 |
| ROUGE-L | 0.5687 |

### FAQ Benchmark (34 questions, with CRAG pipeline)

| Metric | LoRA + CRAG |
|--------|-------------|
| ROUGE-1 | 0.7096 |
| ROUGE-2 | 0.6124 |
| ROUGE-L | 0.6815 |

## Quick Start — LoRA (Recommended)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-0.6B"
adapter_repo = "saeedbenadeeb/UTN_LLMs_Chatbot"
adapter_path = "models/utn-qwen3-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant for the University of Technology Nuremberg (UTN)."},
    {"role": "user", "content": "What are the admission requirements for AI & Robotics?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
    )
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

## Quick Start — VeRA

```python
# Same as the LoRA quick start, but load the base model fresh
# (so the two adapters don't stack) and use the VeRA adapter path:
adapter_path = "models/utn-qwen3-vera"
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
```

## Training Details

- **Base model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training data**: 1,289 curated UTN Q&A pairs (scraped from utn.de, FAQs, module handbooks)
- **Validation data**: 17 held-out examples
- **Trainer**: TRL `SFTTrainer`
- **Hardware**: NVIDIA A40 (48 GB)
- **LoRA config**: r=64, alpha=128, dropout=0.05, target=all linear layers, lr=3e-4, 5 epochs
- **VeRA config**: r=256, d_initial=0.1, prng_key=42, target=all linear layers, lr=5e-4, 5 epochs
- **Framework**: PEFT 0.18.1, Transformers 5.2.0, PyTorch 2.6.0

## Architecture

The full system uses a **Corrective RAG (CRAG)** pipeline:

1. **Hybrid retrieval**: FAISS dense search (BGE-small-en-v1.5) + BM25 sparse search, merged via Reciprocal Rank Fusion
2. **Relevance grading**: Score-based heuristic to verify retrieved documents answer the question
3. **Query rewriting**: If documents are irrelevant, the query is rewritten and retrieval retried
4. **Generation**: The finetuned Qwen3-0.6B + LoRA generates grounded answers from retrieved context

## Citation

```bibtex
@misc{utn-chatbot-2026,
  title={UTN Student Chatbot: Domain-Adapted Qwen3-0.6B with CRAG},
  author={Saeed Adeeb},
  year={2026},
  url={https://huggingface.co/saeedbenadeeb/UTN_LLMs_Chatbot}
}
```
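## Appendix: Reciprocal Rank Fusion Sketch

The Reciprocal Rank Fusion step in the hybrid-retrieval stage above can be sketched as follows. This is a minimal illustration, not this repository's actual implementation: the document IDs are hypothetical, and `k=60` is the constant commonly used for RRF, assumed here.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first rankings by summing 1/(k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. FAISS/BGE results (hypothetical)
sparse_ranking = ["doc_a", "doc_d", "doc_b"]  # e.g. BM25 results (hypothetical)
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
# → ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

A document ranked highly by both retrievers accumulates the largest fused score, which is why RRF works without any score normalization between the dense and sparse retrievers.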