PyTorch-RAG LoRA adapter

LoRA adapter for Qwen/Qwen2.5-Coder-7B-Instruct, fine-tuned for a retrieval-augmented QA system over PyTorch documentation and StackOverflow.

Part of an HSE university project on RAG over a private knowledge base.

Training summary

  • Base model: Qwen/Qwen2.5-Coder-7B-Instruct
  • Method: LoRA (PEFT), bf16, RAG-aware SFT
  • Rank / alpha / dropout: 16 / 32 / 0.05
  • Target modules: down_proj, gate_proj, k_proj, o_proj, q_proj, up_proj, v_proj
  • Trainable params: ~40M (0.53% of 7.66B)
  • Dataset: ~1.8k StackOverflow PyTorch Q&A pairs, each enriched with top-k retrieved documentation chunks as context, plus ~15% adversarial "cannot answer from context" examples.
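
The trainable-parameter figure can be sanity-checked by hand: a LoRA pair on a linear layer with `in` inputs and `out` outputs adds `rank * (in + out)` weights. The sketch below does that arithmetic for the seven target modules. The layer dimensions are assumptions taken from the published Qwen2.5-7B configuration, not from this card, so treat the result as a rough check rather than an authoritative count.

```python
# Assumed dimensions from the published Qwen2.5-7B config (not stated in this card).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
RANK = 16  # from the training summary

head_dim = HIDDEN // NUM_HEADS      # per-head width
kv_dim = NUM_KV_HEADS * head_dim    # grouped-query attention shrinks k/v outputs

# (in_features, out_features) of each targeted linear layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, kv_dim),
    "v_proj": (HIDDEN, kv_dim),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in) and B (out x rank); no bias terms by default.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")  # 40,370,176
```

Under these assumptions the total is about 40.4M, which matches the ~40M reported above.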

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct", dtype="bfloat16", device_map="auto"
)
model = PeftModel.from_pretrained(base, "akeakeaki/pytorch-rag-lora-synth-v2")
tok = AutoTokenizer.from_pretrained("akeakeaki/pytorch-rag-lora-synth-v2")

messages = [
    {"role": "system", "content": "You are an expert PyTorch assistant. "
      "Answer using ONLY the provided Context."},
    {"role": "user", "content": "Context:\n<retrieved chunks>\n\nQuestion: <q>"},
]
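# return_dict=True returns input_ids and attention_mask together, regardless of library defaults
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))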
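
The `<retrieved chunks>` and `<q>` placeholders in the user message have to be filled in by the retrieval pipeline. A minimal sketch of that step follows. `build_messages` is a hypothetical helper, and the blank-line separator between chunks is an assumption: the card does not say how chunks were joined during training.

```python
SYSTEM_PROMPT = (
    "You are an expert PyTorch assistant. "
    "Answer using ONLY the provided Context."
)


def build_messages(chunks, question, separator="\n\n"):
    """Assemble chat messages in the 'Context: ... Question: ...' layout.

    `separator` is an assumption; adjust it to match the training data.
    """
    context = separator.join(chunk.strip() for chunk in chunks)
    user = f"Context:\n{context}\n\nQuestion: {question.strip()}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


messages = build_messages(
    ["torch.no_grad() disables gradient tracking.", "Use it during inference."],
    "How do I turn off autograd for evaluation?",
)
```

The resulting `messages` list can be passed to `tok.apply_chat_template` exactly as in the snippet above.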

Serving with vLLM

vllm serve Qwen/Qwen2.5-Coder-7B-Instruct \
    --enable-lora \
    --lora-modules pytorch-rag=akeakeaki/pytorch-rag-lora-synth-v2 \
    --max-lora-rank 16
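
Once the server is up, the adapter is addressed by the name given in `--lora-modules` (`pytorch-rag`) through vLLM's OpenAI-compatible API. The sketch below uses only the standard library. It assumes the server listens on the default `http://localhost:8000`; `build_payload` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: vLLM is running locally on its default port.
BASE_URL = "http://localhost:8000/v1"


def build_payload(context, question, max_tokens=400):
    """Build a chat-completions request body targeting the LoRA adapter."""
    return {
        "model": "pytorch-rag",  # adapter name registered via --lora-modules
        "messages": [
            {
                "role": "system",
                "content": "You are an expert PyTorch assistant. "
                "Answer using ONLY the provided Context.",
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            },
        ],
        "max_tokens": max_tokens,
    }


def ask(context, question):
    """POST the request and return the generated answer text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(context, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Sending `"model": "Qwen/Qwen2.5-Coder-7B-Instruct"` instead should route the request to the base model without the adapter, which is convenient for side-by-side comparisons.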

Note on results

This is a research artifact. In our evaluation the v1 adapter did not outperform the base model on the RAG task: its composite RAG score dropped relative to the base-vanilla baseline, most likely because of a stylistic shift toward terse StackOverflow answers and the limited amount of training data. See the project report for the full analysis. Use this adapter as a baseline or starting point.
