---
license: mit
language:
  - en
library_name: peft
base_model: Qwen/Qwen3-0.6B
tags:
  - lora
  - vera
  - peft
  - sft
  - chatbot
  - rag
  - qwen3
  - university
pipeline_tag: text-generation
---

# UTN Student Chatbot — Finetuned Qwen3-0.6B

A domain-adapted chatbot for the **University of Technology Nuremberg (UTN)**, built by finetuning [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on curated UTN-specific Q&A data using parameter-efficient methods.

## Available Adapters

| Adapter | Method | Trainable Params | Path |
|---------|--------|-----------------|------|
| **LoRA** (recommended) | Low-Rank Adaptation (r=64, alpha=128) | 161M (21.4%) | `models/utn-qwen3-lora` |
| VeRA | Vector-based Random Matrix Adaptation (r=256) | 8M (1.1%) | `models/utn-qwen3-vera` |

## Evaluation Results

### Validation Set (17 examples)

| Metric | LoRA |
|--------|------|
| ROUGE-1 | 0.5924 |
| ROUGE-2 | 0.4967 |
| ROUGE-L | 0.5687 |

### FAQ Benchmark (34 questions, evaluated with the Corrective RAG (CRAG) pipeline)

| Metric | LoRA + CRAG |
|--------|-------------|
| ROUGE-1 | 0.7096 |
| ROUGE-2 | 0.6124 |
| ROUGE-L | 0.6815 |
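
The ROUGE-L scores above measure overlap via the longest common subsequence (LCS) between the model's answer and the reference. Real evaluations typically use the `rouge-score` or `evaluate` packages; this minimal sketch only illustrates what the metric computes:

```python
def rouge_l(reference: str, candidate: str) -> float:
    """LCS-based F1 between two whitespace-tokenized strings (illustrative only)."""
    ref, cand = reference.split(), candidate.split()
    # Dynamic-programming table for LCS length between the token sequences
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r_tok in enumerate(ref, 1):
        for j, c_tok in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r_tok == c_tok else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# "the cat sat on the mat" vs "the cat is on the mat": LCS = 5 of 6 tokens
score = rouge_l("the cat sat on the mat", "the cat is on the mat")  # → 5/6 ≈ 0.833
```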

## Quick Start — LoRA (Recommended)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-0.6B"
adapter_repo = "saeedbenadeeb/UTN_LLMs_Chatbot"
adapter_path = "models/utn-qwen3-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant for the University of Technology Nuremberg (UTN)."},
    {"role": "user", "content": "What are the admission requirements for AI & Robotics?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

## Quick Start — VeRA

```python
# Identical to the LoRA Quick Start, but load the VeRA adapter
# onto a freshly loaded base model instead:
adapter_path = "models/utn-qwen3-vera"

model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
```

## Training Details

- **Base model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training data**: 1,289 curated UTN Q&A pairs (scraped from utn.de, FAQs, module handbooks)
- **Validation data**: 17 held-out examples
- **Trainer**: TRL SFTTrainer
- **Hardware**: NVIDIA A40 (48 GB)
- **LoRA config**: r=64, alpha=128, dropout=0.05, target=all linear layers, lr=3e-4, 5 epochs
- **VeRA config**: r=256, d_initial=0.1, prng_key=42, target=all linear layers, lr=5e-4, 5 epochs
- **Framework**: PEFT 0.18.1, Transformers 5.2.0, PyTorch 2.6.0
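
As a rough sketch, the hyperparameters listed above map onto PEFT config objects like the following. This is an assumption-laden reconstruction, not the project's actual training script; in particular, the `"all-linear"` shorthand and `task_type` are illustrative choices:

```python
from peft import LoraConfig, VeraConfig

# Sketch of the LoRA setup described above (r=64, alpha=128, dropout=0.05).
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",  # PEFT shorthand for every linear layer
    task_type="CAUSAL_LM",
)

# Sketch of the VeRA setup (r=256, d_initial=0.1, prng_key=42). Note that
# VeRA shares frozen random projections across layers, so PEFT may constrain
# which layers can be targeted together.
vera_cfg = VeraConfig(
    r=256,
    d_initial=0.1,
    projection_prng_key=42,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```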

## Architecture

The full system uses a **Corrective RAG (CRAG)** pipeline:

1. **Hybrid retrieval**: FAISS dense search (BGE-small-en-v1.5) + BM25 sparse search, merged via Reciprocal Rank Fusion
2. **Relevance grading**: A score-based heuristic verifies that the retrieved documents actually answer the question
3. **Query rewriting**: If the documents are judged irrelevant, the query is rewritten and retrieval is retried
4. **Generation**: The finetuned Qwen3-0.6B + LoRA generates grounded answers from retrieved context
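
The Reciprocal Rank Fusion step in point 1 can be sketched as follows. The function and document ids are hypothetical, not the project's actual code; `k=60` is the constant from the original RRF formulation:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking.

    Each document accumulates 1 / (k + rank) for every list it appears in,
    so documents ranked highly by multiple retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # e.g. FAISS dense order
sparse = ["doc_b", "doc_c", "doc_a"]  # e.g. BM25 sparse order
fused = reciprocal_rank_fusion([dense, sparse])
# doc_b is top-ranked: it places 2nd and 1st, beating doc_a's 1st and 3rd
```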

## Citation

```bibtex
@misc{utn-chatbot-2026,
  title={UTN Student Chatbot: Domain-Adapted Qwen3-0.6B with CRAG},
  author={Saeed Adeeb},
  year={2026},
  url={https://huggingface.co/saeedbenadeeb/UTN_LLMs_Chatbot}
}
```