---
license: mit
language:
- en
library_name: peft
base_model: Qwen/Qwen3-0.6B
tags:
- lora
- vera
- peft
- sft
- chatbot
- rag
- qwen3
- university
pipeline_tag: text-generation
---
# UTN Student Chatbot — Finetuned Qwen3-0.6B
A domain-adapted chatbot for the **University of Technology Nuremberg (UTN)**, built by finetuning [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on curated UTN-specific Q&A data using parameter-efficient methods.
## Available Adapters
| Adapter | Method | Trainable Params | Path |
|---------|--------|-----------------|------|
| **LoRA** (recommended) | Low-Rank Adaptation (r=64, alpha=128) | 161M (21.4%) | `models/utn-qwen3-lora` |
| VeRA | Vector-based Random Matrix Adaptation (r=256) | 8M (1.1%) | `models/utn-qwen3-vera` |
## Evaluation Results
### Validation Set (17 examples)
| Metric | LoRA |
|--------|------|
| ROUGE-1 | 0.5924 |
| ROUGE-2 | 0.4967 |
| ROUGE-L | 0.5687 |
### FAQ Benchmark (34 questions, answered via the CRAG pipeline)
| Metric | LoRA + CRAG |
|--------|-------------|
| ROUGE-1 | 0.7096 |
| ROUGE-2 | 0.6124 |
| ROUGE-L | 0.6815 |
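The ROUGE-1 numbers above are unigram-overlap F1 scores between generated and reference answers. As a minimal illustration of what the metric measures (this is a from-scratch sketch, not the evaluation harness used for the tables above, which would normally be a standard ROUGE implementation):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference string."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 4 of 4 candidate unigrams match, 4 of 5 reference unigrams match:
print(round(rouge1_f1("UTN is in Nuremberg", "UTN is located in Nuremberg"), 4))  # 0.8889
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern over bigrams and longest common subsequences, respectively.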
## Quick Start — LoRA (Recommended)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-0.6B"
adapter_repo = "saeedbenadeeb/UTN_LLMs_Chatbot"
adapter_path = "models/utn-qwen3-lora"

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
model.eval()

# Build a chat prompt with Qwen3's chat template (thinking mode disabled).
messages = [
    {"role": "system", "content": "You are a helpful assistant for the University of Technology Nuremberg (UTN)."},
    {"role": "user", "content": "What are the admission requirements for AI & Robotics?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
## Quick Start — VeRA
```python
# Same as above, but change the adapter path:
adapter_path = "models/utn-qwen3-vera"
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
```
## Training Details
- **Base model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training data**: 1,289 curated UTN Q&A pairs (scraped from utn.de, FAQs, module handbooks)
- **Validation data**: 17 held-out examples
- **Trainer**: TRL SFTTrainer
- **Hardware**: NVIDIA A40 (48 GB)
- **LoRA config**: r=64, alpha=128, dropout=0.05, target=all linear layers, lr=3e-4, 5 epochs
- **VeRA config**: r=256, d_initial=0.1, prng_key=42, target=all linear layers, lr=5e-4, 5 epochs
- **Framework**: PEFT 0.18.1, Transformers 5.2.0, PyTorch 2.6.0
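The LoRA hyperparameters listed above map onto a PEFT `LoraConfig` roughly as follows. This is an illustrative reconstruction from the stated values, not the exact config file used for training:

```python
from peft import LoraConfig

# Hypothetical reconstruction of the LoRA setup from the hyperparameters above.
lora_config = LoraConfig(
    r=64,                          # rank of the low-rank update matrices
    lora_alpha=128,                # scaling factor (alpha)
    lora_dropout=0.05,
    target_modules="all-linear",   # adapt all linear layers
    task_type="CAUSAL_LM",
)
```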
## Architecture
The full system uses a **Corrective RAG (CRAG)** pipeline:
1. **Hybrid retrieval**: FAISS dense search (BGE-small-en-v1.5) + BM25 sparse search, merged via Reciprocal Rank Fusion
2. **Relevance grading**: Score-based heuristic that checks whether the retrieved documents actually answer the question
3. **Query rewriting**: If documents are irrelevant, the query is rewritten and retrieval retried
4. **Generation**: The finetuned Qwen3-0.6B + LoRA generates grounded answers from retrieved context
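The Reciprocal Rank Fusion step (1) combines the dense and sparse rankings by summing each document's reciprocal-rank scores across both lists. A minimal sketch, where the function name and the conventional smoothing constant `k=60` are illustrative and not taken from this repository:

```python
def rrf_merge(dense_ranked, sparse_ranked, k=60):
    """Merge two ranked lists of doc IDs via Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked highly in both lists accumulate the largest totals.
    """
    scores = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(merged)  # ['d2', 'd1', 'd4', 'd3'] — d2 ranks high in both lists
```

Documents found by only one retriever still survive the merge, just with lower fused scores, which is why RRF is a common choice for hybrid dense+sparse retrieval.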
## Citation
```bibtex
@misc{utn-chatbot-2026,
  title={UTN Student Chatbot: Domain-Adapted Qwen3-0.6B with CRAG},
  author={Saeed Adeeb},
  year={2026},
  url={https://huggingface.co/saeedbenadeeb/UTN_LLMs_Chatbot}
}
```