# NLP and RAG Expert — Fine-Tuned with QLoRA

A TinyLlama 1.1B model fine-tuned on NLP and RAG domain knowledge using QLoRA (4-bit quantisation + LoRA adapters) on a Google Colab T4 GPU.

Built by: Rohith Kumar Reddipogula, MSc Data Science — University of Europe for Applied Sciences, Berlin
## What This Model Knows
Fine-tuned to answer expert questions about:
- Retrieval-Augmented Generation (RAG) systems
- BM25 vs dense retrieval differences
- FAISS vector databases and indexing
- LoRA and QLoRA fine-tuning techniques
- LLM evaluation frameworks (RAGAS, MRR, Recall@K)
- Semantic search and dense embeddings
- Hybrid retrieval strategies
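The evaluation metrics named above (MRR, Recall@K) are straightforward to compute by hand. A minimal sketch in plain Python — the function names are illustrative, not taken from RAGAS or any library:

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the first relevant result, 0.0 if it never appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy ranking: the relevant doc "d2" shows up at rank 2
print(reciprocal_rank(["d9", "d2", "d5"], "d2"))         # 0.5
print(recall_at_k(["d9", "d2", "d5"], ["d2", "d7"], 3))  # 0.5
```

Averaging `reciprocal_rank` over a query set gives MRR; Recall@10 over a query set is the figure reported for the Hybrid RAG project below.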
## Fine-Tuning Details
| Parameter | Value |
|---|---|
| Base model | TinyLlama 1.1B Chat |
| Fine-tuning method | QLoRA (4-bit NF4 quantisation) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Training examples | 10 instruction pairs |
| Epochs | 3 |
| Learning rate | 2e-4 |
| Hardware | NVIDIA T4 GPU (Google Colab free tier) |
| Library | Unsloth (reported ~2x faster than standard Hugging Face Transformers) |
| Parameters trained | ~0.089% of total parameters |
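The tiny trainable fraction follows from LoRA arithmetic: an adapter of rank r on a d_out × d_in weight matrix adds only r·(d_in + d_out) trainable parameters instead of d_out·d_in. A back-of-the-envelope sketch, using TinyLlama's 2048 hidden size as an illustrative dimension (the per-matrix fraction below differs from the overall ~0.089% figure, which is taken over all model parameters, most of which carry no adapter):

```python
def lora_params(d_in, d_out, r):
    # LoRA learns a low-rank update B @ A for a frozen W (d_out x d_in),
    # where A is (r x d_in) and B is (d_out x r)
    return r * (d_in + d_out)

d = 2048          # TinyLlama hidden size
r = 16            # LoRA rank from the table above
full = d * d      # params in one full square projection matrix
lora = lora_params(d, d, r)
print(full, lora, f"{lora / full:.2%}")  # 4194304 65536 1.56%
```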
## My Observations
Training only 0.089% of parameters using QLoRA achieved domain adaptation while preserving the base model's general capabilities.
With 10 high-quality training examples on a 1.1B parameter model, the experiment demonstrated the complete QLoRA workflow end-to-end. Larger datasets (1000+ examples) and bigger models (7B+) would produce stronger domain specialisation.
This experiment was intentionally kept small to fit within free Colab T4 GPU memory limits — the same QLoRA technique scales directly to 7B-70B models in production environments.
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model + fine-tuned adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
)
model = PeftModel.from_pretrained(base_model, "Rohith2026/nlp-rag-expert")
tokenizer = AutoTokenizer.from_pretrained("Rohith2026/nlp-rag-expert")

# Ask a question
prompt = "### Instruction:\nWhat is RAG?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
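The hard-coded prompt follows the Alpaca-style instruction template; a small helper makes it reusable for any question (the helper name is my own — only the template string matches the example above):

```python
def build_prompt(instruction: str) -> str:
    """Format a question in the Alpaca-style template the model was tuned on."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(build_prompt("What is RAG?"))
```

Keeping inference prompts in exactly the format seen during fine-tuning matters: instruction-tuned adapters tend to degrade when the template drifts.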
## Related Projects
| Project | Description | Live Demo |
|---|---|---|
| Hybrid RAG System | BM25 + dense embeddings, 93% Recall@10, 8.84M passages | Live Demo |
| AI Agent System | ReAct agent with 3 tools — web search, calculator, RAG | Live Demo |
| LLM Fine-Tuning (this) | QLoRA fine-tuning on NLP/RAG domain | Model |
## Author
Rohith Kumar Reddipogula
- LinkedIn: linkedin.com/in/rohith-kumar-reddipogula-a6692030b
- GitHub: github.com/RohithkumarReddipogula
- Email: rohithkumar336699@gmail.com
- HuggingFace: huggingface.co/Rohith2026
## Model Tree

Base model for Rohith2026/nlp-rag-expert: unsloth/tinyllama-chat-bnb-4bit