---
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- lora
- qlora
- peft
- qwen2.5
- mcp
- edge-ai
- offline-rag
---

# EdgeAI Docs Qwen2.5 Coder 7B Instruct (LoRA Adapter)

This repository contains a **LoRA adapter** (not full model weights) trained for an offline Edge AI + MCP documentation assistant workflow.

Base model:

- `Qwen/Qwen2.5-Coder-7B-Instruct`

## Intended use

- Use this adapter with a local RAG pipeline.
- Keep retrieval output as the factual source.
- Use the adapter for response behavior: format, citation style, and grounded answering.

## Training summary

- Train examples: `115`
- Eval examples: `13`
- Max steps: `30`
- Precision/load strategy: `QLoRA 4-bit (NF4), bf16 compute`
- Final eval loss: `0.0641`
- Device: `cuda` (8 GB VRAM class local GPU profile)

## Files

- `adapter_model.safetensors`: trained LoRA adapter weights
- `adapter_config.json`: PEFT adapter config
- `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`: tokenizer/chat formatting assets
- `run_summary.json`, `trainer_train_metrics.json`, `training_args.bin`: training metadata/artifacts

## Quick start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_repo = "eoinedge/EdgeAI-Docs-Qwen2.5-Coder-7B-Instruct"

# 4-bit NF4 quantization with bf16 compute, matching the training setup,
# so the 7B base fits on an 8 GB VRAM class GPU.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_repo)
tokenizer = AutoTokenizer.from_pretrained(base_model)
```

## Notes

- This adapter is optimized for docs-assistant behavior, not as a standalone factual memory.
- For best results, pair with MCP tools + document retrieval context.
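
## Example: grounded prompting

The Quick start only loads the model. As a minimal sketch of the grounded-answering pattern described under Intended use, the snippet below assembles retrieved documentation chunks into a chat prompt that keeps retrieval output as the factual source. The helper name `build_grounded_messages`, the `[doc N]` citation convention, and the system instruction are illustrative assumptions, not artifacts of this repository.

```python
def build_grounded_messages(question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Build chat messages that confine the model to retrieved context."""
    # Number each chunk so the model can cite it as [doc N].
    context = "\n\n".join(
        f"[doc {i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the provided documentation context. "
                "Cite sources as [doc N]. If the context is insufficient, say so."
            ),
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


# With the model and tokenizer from the Quick start loaded, generation
# would then look roughly like this (requires a CUDA-capable GPU):
#
# messages = build_grounded_messages(question, retrieved_chunks)
# text = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
# inputs = tokenizer(text, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=512)
```

The generation lines are commented out because they depend on the heavyweight model objects from the Quick start; the message-building helper itself is plain Python and works with any retriever.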