---
license: mit
language:
- en
library_name: peft
base_model: Qwen/Qwen3-0.6B
tags:
- lora
- vera
- peft
- sft
- chatbot
- rag
- qwen3
- university
pipeline_tag: text-generation
---
# UTN Student Chatbot — Finetuned Qwen3-0.6B
A domain-adapted chatbot for the **University of Technology Nuremberg (UTN)**, built by finetuning [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on curated UTN-specific Q&A data using parameter-efficient methods.
## Available Adapters
| Adapter | Method | Trainable Params | Path |
|---------|--------|-----------------|------|
| **LoRA** (recommended) | Low-Rank Adaptation (r=64, alpha=128) | 161M (21.4%) | `models/utn-qwen3-lora` |
| VeRA | Vector-based Random Matrix Adaptation (r=256) | 8M (1.1%) | `models/utn-qwen3-vera` |
## Evaluation Results
### Validation Set (17 examples)
| Metric | LoRA |
|--------|------|
| ROUGE-1 | 0.5924 |
| ROUGE-2 | 0.4967 |
| ROUGE-L | 0.5687 |
### FAQ Benchmark (34 questions, answered via the CRAG pipeline)
| Metric | LoRA + CRAG |
|--------|-------------|
| ROUGE-1 | 0.7096 |
| ROUGE-2 | 0.6124 |
| ROUGE-L | 0.6815 |
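The ROUGE-1 numbers above are unigram-overlap F1 scores between generated and reference answers. As a minimal illustration of what the metric measures (this is a from-scratch sketch, not the evaluation harness used for the tables above, which would normally be a standard ROUGE implementation):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference string."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 4 of 4 candidate unigrams match, 4 of 5 reference unigrams match:
print(round(rouge1_f1("UTN is in Nuremberg", "UTN is located in Nuremberg"), 4))  # 0.8889
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern over bigrams and longest common subsequences, respectively.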
## Quick Start — LoRA (Recommended)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-0.6B"
adapter_repo = "saeedbenadeeb/UTN_LLMs_Chatbot"
adapter_path = "models/utn-qwen3-lora"

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
model.eval()

# Build a chat prompt with Qwen3's chat template (thinking mode disabled).
messages = [
    {"role": "system", "content": "You are a helpful assistant for the University of Technology Nuremberg (UTN)."},
    {"role": "user", "content": "What are the admission requirements for AI & Robotics?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
## Quick Start — VeRA
```python
# Same as above, but change the adapter path:
adapter_path = "models/utn-qwen3-vera"
model = PeftModel.from_pretrained(
    model,
    adapter_repo,
    subfolder=adapter_path,
)
```
## Training Details
- **Base model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training data**: 1,289 curated UTN Q&A pairs (scraped from utn.de, FAQs, module handbooks)
- **Validation data**: 17 held-out examples
- **Trainer**: TRL SFTTrainer
- **Hardware**: NVIDIA A40 (48 GB)
- **LoRA config**: r=64, alpha=128, dropout=0.05, target=all linear layers, lr=3e-4, 5 epochs
- **VeRA config**: r=256, d_initial=0.1, prng_key=42, target=all linear layers, lr=5e-4, 5 epochs
- **Framework**: PEFT 0.18.1, Transformers 5.2.0, PyTorch 2.6.0
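The LoRA hyperparameters listed above map onto a PEFT `LoraConfig` roughly as follows. This is an illustrative reconstruction from the stated values, not the exact config file used for training:

```python
from peft import LoraConfig

# Hypothetical reconstruction of the LoRA setup from the hyperparameters above.
lora_config = LoraConfig(
    r=64,                          # rank of the low-rank update matrices
    lora_alpha=128,                # scaling factor (alpha)
    lora_dropout=0.05,
    target_modules="all-linear",   # adapt all linear layers
    task_type="CAUSAL_LM",
)
```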
## Architecture
The full system uses a **Corrective RAG (CRAG)** pipeline:
1. **Hybrid retrieval**: FAISS dense search (BGE-small-en-v1.5) + BM25 sparse search, merged via Reciprocal Rank Fusion
2. **Relevance grading**: Score-based heuristic that checks whether the retrieved documents actually answer the question
3. **Query rewriting**: If documents are irrelevant, the query is rewritten and retrieval retried
4. **Generation**: The finetuned Qwen3-0.6B + LoRA generates grounded answers from retrieved context
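The Reciprocal Rank Fusion step (1) combines the dense and sparse rankings by summing each document's reciprocal-rank scores across both lists. A minimal sketch, where the function name and the conventional smoothing constant `k=60` are illustrative and not taken from this repository:

```python
def rrf_merge(dense_ranked, sparse_ranked, k=60):
    """Merge two ranked lists of doc IDs via Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked highly in both lists accumulate the largest totals.
    """
    scores = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(merged)  # ['d2', 'd1', 'd4', 'd3'] — d2 ranks high in both lists
```

Documents found by only one retriever still survive the merge, just with lower fused scores, which is why RRF is a common choice for hybrid dense+sparse retrieval.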
## Citation
```bibtex
@misc{utn-chatbot-2026,
  title={UTN Student Chatbot: Domain-Adapted Qwen3-0.6B with CRAG},
  author={Saeed Adeeb},
  year={2026},
  url={https://huggingface.co/saeedbenadeeb/UTN_LLMs_Chatbot}
}
```