# NLP and RAG Expert — Fine-Tuned with QLoRA

A TinyLlama 1.1B model fine-tuned on NLP and RAG domain knowledge using QLoRA (4-bit quantisation + LoRA adapters) on a Google Colab T4 GPU.

Built by: Rohith Kumar Reddipogula, MSc Data Science — University of Europe for Applied Sciences, Berlin
## What This Model Knows
Fine-tuned to answer expert questions about:
- Retrieval-Augmented Generation (RAG) systems
- BM25 vs dense retrieval differences
- FAISS vector databases and indexing
- LoRA and QLoRA fine-tuning techniques
- LLM evaluation frameworks (RAGAS, MRR, Recall@K)
- Semantic search and dense embeddings
- Hybrid retrieval strategies
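The evaluation metrics named above (MRR, Recall@K) are straightforward to compute by hand. A minimal sketch in plain Python — the function names are illustrative, not taken from RAGAS or any library:

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the first relevant result, 0.0 if it never appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy ranking: the relevant doc "d2" shows up at rank 2
print(reciprocal_rank(["d9", "d2", "d5"], "d2"))         # 0.5
print(recall_at_k(["d9", "d2", "d5"], ["d2", "d7"], 3))  # 0.5
```

Averaging `reciprocal_rank` over a query set gives MRR; Recall@10 over a query set is the figure reported for the Hybrid RAG project below.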
## Fine-Tuning Details
| Parameter | Value |
|---|---|
| Base model | TinyLlama 1.1B Chat |
| Fine-tuning method | QLoRA (4-bit NF4 quantisation) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Training examples | 10 instruction pairs |
| Epochs | 3 |
| Learning rate | 2e-4 |
| Hardware | NVIDIA T4 GPU (Google Colab free tier) |
| Library | Unsloth (reported ~2x faster than standard Hugging Face Transformers) |
| Parameters trained | ~0.089% of total parameters |
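The tiny trainable fraction follows from LoRA arithmetic: an adapter of rank r on a d_out × d_in weight matrix adds only r·(d_in + d_out) trainable parameters instead of d_out·d_in. A back-of-the-envelope sketch, using TinyLlama's 2048 hidden size as an illustrative dimension (the per-matrix fraction below differs from the overall ~0.089% figure, which is taken over all model parameters, most of which carry no adapter):

```python
def lora_params(d_in, d_out, r):
    # LoRA learns a low-rank update B @ A for a frozen W (d_out x d_in),
    # where A is (r x d_in) and B is (d_out x r)
    return r * (d_in + d_out)

d = 2048          # TinyLlama hidden size
r = 16            # LoRA rank from the table above
full = d * d      # params in one full square projection matrix
lora = lora_params(d, d, r)
print(full, lora, f"{lora / full:.2%}")  # 4194304 65536 1.56%
```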
## My Observations
Training only 0.089% of parameters using QLoRA achieved domain adaptation while preserving the base model's general capabilities.
With 10 high-quality training examples on a 1.1B parameter model, the experiment demonstrated the complete QLoRA workflow end-to-end. Larger datasets (1000+ examples) and bigger models (7B+) would produce stronger domain specialisation.
This experiment was intentionally kept small to fit within free Colab T4 GPU memory limits — the same QLoRA technique scales directly to 7B-70B models in production environments.
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model + fine-tuned adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
)
model = PeftModel.from_pretrained(base_model, "Rohith2026/nlp-rag-expert")
tokenizer = AutoTokenizer.from_pretrained("Rohith2026/nlp-rag-expert")

# Ask a question
prompt = "### Instruction:\nWhat is RAG?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
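The hard-coded prompt follows the Alpaca-style instruction template; a small helper makes it reusable for any question (the helper name is my own — only the template string matches the example above):

```python
def build_prompt(instruction: str) -> str:
    """Format a question in the Alpaca-style template the model was tuned on."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(build_prompt("What is RAG?"))
```

Keeping inference prompts in exactly the format seen during fine-tuning matters: instruction-tuned adapters tend to degrade when the template drifts.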
## Related Projects
| Project | Description | Live Demo |
|---|---|---|
| Hybrid RAG System | BM25 + dense embeddings, 93% Recall@10, 8.84M passages | Live Demo |
| AI Agent System | ReAct agent with 3 tools — web search, calculator, RAG | Live Demo |
| LLM Fine-Tuning (this) | QLoRA fine-tuning on NLP/RAG domain | Model |
## Author
Rohith Kumar Reddipogula
- LinkedIn: linkedin.com/in/rohith-kumar-reddipogula-a6692030b
- GitHub: github.com/RohithkumarReddipogula
- Email: rohithkumar336699@gmail.com
- HuggingFace: huggingface.co/Rohith2026
## Model Tree

Base model for Rohith2026/nlp-rag-expert: unsloth/tinyllama-chat-bnb-4bit