π€ Qwen2.5-7B Fine-tuned for RAG using RAFT Approach
A powerful LoRA-fine-tuned Qwen2.5-7B model using RAFT (Retrieval-Augmented Fine-Tuning) for RAG tasks
π Model Overview
This model is a LoRA (Low-Rank Adaptation) fine-tuned version of unsloth/Qwen2.5-7B using the RAFT (Retrieval-Augmented Fine-Tuning) approach on the MS MARCO v1.1 dataset. The model excels at answering questions based on retrieved passages and is optimized for RAG (Retrieval-Augmented Generation) workflows.
π§ Model Details
| Attribute | Value |
|---|---|
| Base Model | unsloth/Qwen2.5-7B |
| Fine-tuning Method | LoRA + RAFT |
| Dataset | MS MARCO v1.1 (75k samples) |
| Training Steps | 9,128 |
| Task | Retrieval-Augmented Generation |
| Language | English |
| Model Size | ~7B parameters (24K context) |
| Adapter Size | ~10-50MB |
π Quick Start
Installation
pip install transformers peft torch accelerate
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
base_model_id = "unsloth/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype=torch.float16,
device_map="auto"
)
# Load LoRA adapters
model = PeftModel.from_pretrained(model, "DangIT02/qwen-msmarco-rag")
# Generate response
def generate_answer(question, context=""):
if context:
prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
else:
prompt = f"Question: {question}\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_length=512,
do_sample=True,
temperature=0.7,
top_p=0.9,
pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
return response.split("Answer:")[-1].strip()
# Example usage
question = "What is machine learning?"
context = "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data."
answer = generate_answer(question, context)
print(answer)
RAG Pipeline Integration
def rag_pipeline(question, knowledge_base):
"""
Complete RAG pipeline with retrieval and generation
"""
# Step 1: Retrieval (add your retrieval method)
retrieved_passages = retrieve_passages(question, knowledge_base)
# Step 2: Generation using fine-tuned model
context = "\n\n".join([f"Document {i+1}:\n{p}" for i, p in enumerate(retrieved_passages)])
return generate_answer(question, context)
# Example RAG usage
def retrieve_passages(query, knowledge_base):
# Implement your retrieval (vector search, BM25, etc.)
# This is just a placeholder
return ["Retrieved passage 1...", "Retrieved passage 2..."]
question = "How does photosynthesis work?"
answer = rag_pipeline(question, your_knowledge_base)
print(answer)
π Use Cases
- π RAG Systems: Perfect for retrieval-augmented generation pipelines
- π Document QA: Answer questions based on retrieved documents
- π¬ Knowledge-based Chat: Build chatbots with external knowledge integration
- π Enterprise Search: Intelligent search with contextual answers
- π Educational AI: Learning assistants with document retrieval capabilities
π― Performance & Capabilities
This model has been specifically optimized using RAFT approach for:
- β RAG Integration: Seamless integration into RAG pipelines
- β Context-aware Generation: Excellent understanding of retrieved passages
- β Query-passage Alignment: Strong alignment between questions and relevant context
- β Multi-document Processing: Effective handling of multiple retrieved passages
- β Domain Adaptability: Robust performance across different knowledge domains
βοΈ Training Configuration
Base Model: unsloth/Qwen2.5-7B
Dataset: microsoft/ms_marco (v1.1) - RAFT format
Method: LoRA + RAFT Fine-tuning
Training Steps: 9,128
Hardware: RTX 5090 32GB
Model Config:
- Max Sequence Length: 24,576 tokens (24K)
- Data Type: bfloat16
- Load in 4bit: False
LoRA Config:
- Rank: 64
- Alpha: 128
- Dropout: 0.1
- Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- Use RSLoRA: True
- Gradient Checkpointing: unsloth
Training Config:
- Epochs: 2
- Batch Size: 2
- Gradient Accumulation: 8 (Effective batch: 16)
- Learning Rate: 1e-4
- Weight Decay: 0.01
- Warmup Ratio: 0.1
- Scheduler: cosine
- Optimizer: adamw_torch
- Dataset Size: 75,000 samples
β Model Performance & Capabilities
Optimized for MS MARCO Reading Comprehension Tasks:
- π Excellent Information Retrieval: Outstanding at extracting specific facts from provided passages
- π Multi-passage Processing: Handles multiple documents and selects relevant information effectively
- π― Context-based QA: Accurately answers questions when relevant context is provided
- π§ Factual Extraction: Precise extraction of names, dates, statistics, and specific details
- β‘ RAG Integration: Seamless integration into retrieval-augmented generation pipelines
Demonstrated Strengths:
- Passage Comprehension: 9/10 accuracy on reading comprehension tasks
- Information Extraction: 8.5/10 on factual data retrieval
- Multi-document Analysis: 8/10 on handling multiple context sources
- Query-passage Alignment: Strong understanding of question-context relationships
β οΈ Task Scope & Appropriate Use
Primary Use Case: Document-based Question Answering
This model has been specifically fine-tuned for MS MARCO-style reading comprehension and performs best when:
β Ideal scenarios:
- Questions have answers within provided context
- Factual information extraction from documents
- Reading comprehension with clear passages
- Multi-document information synthesis
- Enterprise knowledge base queries
Outside Training Scope:
β Not optimized for:
- Mathematical computations (use specialized math models)
- Open-domain generation without context (use general LLMs)
- Creative writing tasks (use creative-focused models)
- Code generation (use code-specific models)
- Scientific calculations (use domain-specific tools)
π― Performance Expectations
When used appropriately (RAG/Reading Comprehension):
- High accuracy factual extraction
- Reliable information synthesis
- Consistent context-aware responses
- Minimal hallucination with relevant context
When used outside scope:
- May produce suboptimal results for non-reading tasks
- Not designed for mathematical or computational reasoning
- Best combined with specialized tools for multi-modal tasks
π‘ Integration Recommendations
Production RAG Pipeline:
def optimal_usage(query, documents):
# 1. Ensure documents contain relevant information
relevant_docs = filter_relevant_documents(query, documents)
# 2. Use model for what it's trained for
if has_factual_content(relevant_docs):
answer = model.generate(query, relevant_docs)
return answer
else:
return "No relevant information found in provided documents"
Task-specific Routing:
- Reading comprehension β This model β
- Mathematical queries β Route to math-specialized models
- Creative tasks β Route to general-purpose LLMs
- Code questions β Route to code-specialized models
π Citation
@misc{qwen-raft-rag-2025,
author = {DangIT02},
title = {Qwen2.5-7B Fine-tuned for RAG using RAFT Approach},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/DangIT02/qwen-msmarco-rag}
}
π Contact & Support
- Author: DangIT02
- Issues: Please report issues on the model repository
- Base Model: unsloth/Qwen2.5-7B
Trained for 12 hours on RTX 5090 with blood, sweat, and GPU rental bills πΈ
- Downloads last month
- 1