NLP and RAG Expert — Fine-Tuned with QLoRA

A TinyLlama 1.1B model fine-tuned on NLP and RAG domain knowledge using QLoRA (4-bit quantisation + LoRA adapters) on Google Colab T4 GPU.

Built by: Rohith Kumar Reddipogula, MSc Data Science, University of Europe for Applied Sciences, Berlin

What This Model Knows

Fine-tuned to answer expert questions about:

  • Retrieval-Augmented Generation (RAG) systems
  • BM25 vs dense retrieval differences
  • FAISS vector databases and indexing
  • LoRA and QLoRA fine-tuning techniques
  • LLM evaluation frameworks (RAGAS, MRR, Recall@K)
  • Semantic search and dense embeddings
  • Hybrid retrieval strategies
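
The retrieval metrics named above (MRR, Recall@K) are straightforward to compute from ranked result lists. A minimal sketch in plain Python — the document IDs and relevance labels are invented for illustration:

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy evaluation set: (ranked retrieval results, relevant doc IDs) per query
queries = [
    (["d3", "d1", "d7"], {"d1"}),   # first relevant hit at rank 2
    (["d2", "d5", "d9"], {"d9"}),   # first relevant hit at rank 3
]

# MRR averages the reciprocal ranks over all queries: (1/2 + 1/3) / 2
mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)
print(f"MRR = {mrr:.4f}")
print(f"Recall@2 (query 1) = {recall_at_k(['d3', 'd1', 'd7'], {'d1'}, 2)}")
```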

Fine-Tuning Details

Parameter           Value
------------------  -----------------------------------------------
Base model          TinyLlama 1.1B Chat
Fine-tuning method  QLoRA (4-bit NF4 quantisation)
LoRA rank           16
LoRA alpha          16
Training examples   10 instruction pairs
Epochs              3
Learning rate       2e-4
Hardware            NVIDIA T4 GPU (Google Colab free tier)
Library             Unsloth (~2x faster than standard Hugging Face)
Parameters trained  ~0.089% of total parameters
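
In plain Hugging Face terms (without Unsloth), the 4-bit NF4 + LoRA setup in the table roughly corresponds to the config sketch below. The `target_modules` list is my assumption — Unsloth's defaults may target a different set of projection layers:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantisation of the frozen base weights, as in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter config matching the table: rank 16, alpha 16
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
```

Passing `bnb_config` as `quantization_config` to `AutoModelForCausalLM.from_pretrained` and wrapping the model with `peft.get_peft_model(model, lora_config)` reproduces the general pattern: the quantised base stays frozen while only the small LoRA matrices train.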

My Observations

Training only ~0.089% of the model's parameters with QLoRA achieved domain adaptation while preserving the base model's general capabilities.

With 10 high-quality training examples on a 1.1B-parameter model, the experiment demonstrated the complete QLoRA workflow end-to-end. Larger datasets (1,000+ examples) and bigger models (7B+) would produce stronger domain specialisation.

This experiment was intentionally kept small to fit within free Colab T4 GPU memory limits — the same QLoRA technique scales directly to 7B-70B models in production environments.

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model + fine-tuned adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
)
model = PeftModel.from_pretrained(base_model, "Rohith2026/nlp-rag-expert")
tokenizer = AutoTokenizer.from_pretrained("Rohith2026/nlp-rag-expert")

# Ask a question
prompt = "### Instruction:\nWhat is RAG?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
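
The prompt template in the snippet above is Alpaca-style. A small helper (the function names are my own, not part of the model repo) for building prompts and stripping the echoed prompt from the decoded output:

```python
def build_prompt(instruction: str) -> str:
    """Wrap a question in the Alpaca-style template used at training time."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

def extract_response(decoded: str) -> str:
    """Keep only the text generated after the '### Response:' marker."""
    return decoded.split("### Response:\n", 1)[-1].strip()

prompt = build_prompt("What is RAG?")
# Causal LMs echo the prompt; decoded output looks like prompt + answer
decoded = prompt + "RAG combines retrieval with generation."
print(extract_response(decoded))
```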

Related Projects

Project                 Description                                             Link
----------------------  ------------------------------------------------------  ---------
Hybrid RAG System       BM25 + dense embeddings, 93% Recall@10, 8.84M passages  Live Demo
AI Agent System         ReAct agent with 3 tools: web search, calculator, RAG   Live Demo
LLM Fine-Tuning (this)  QLoRA fine-tuning on the NLP/RAG domain                 Model
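
Hybrid BM25 + dense retrieval, as in the first project above, is commonly combined via reciprocal rank fusion (RRF). A minimal sketch with made-up ranked lists — not the project's actual implementation:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["d1", "d4", "d2"]    # lexical ranking (toy data)
dense_top = ["d4", "d3", "d1"]   # embedding ranking (toy data)
print(rrf_fuse([bm25_top, dense_top]))  # d4 first: near the top of both lists
```

The constant `k` (60 is the value commonly used in the RRF literature) damps the influence of top ranks so that a document appearing in both lists outranks one that is first in only one list.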

Author

Rohith Kumar Reddipogula
