---
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- lora
- qlora
- peft
- qwen2.5
- mcp
- edge-ai
- offline-rag
---

# EdgeAI Docs Qwen2.5 Coder 7B Instruct (LoRA Adapter)

This repository contains a **LoRA adapter** (not full model weights) trained for an offline Edge AI + MCP documentation assistant workflow.

Base model:

- `Qwen/Qwen2.5-Coder-7B-Instruct`

## Intended use

- Use this adapter with a local RAG pipeline.
- Keep retrieval output as the factual source.
- Use the adapter for response behavior: format, citation style, and grounded answering.

## Training summary

- Train examples: `115`
- Eval examples: `13`
- Max steps: `30`
- Precision/load strategy: `QLoRA 4-bit (NF4), bf16 compute`
- Final eval loss: `0.0641`
- Device: `cuda` (8 GB VRAM class local GPU profile)

## Files

- `adapter_model.safetensors`: trained LoRA adapter weights
- `adapter_config.json`: PEFT adapter config
- `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`: tokenizer/chat formatting assets
- `run_summary.json`, `trainer_train_metrics.json`, `training_args.bin`: training metadata/artifacts

## Quick start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_repo = "eoinedge/EdgeAI-Docs-Qwen2.5-Coder-7B-Instruct"

# 4-bit NF4 quantization with bf16 compute, matching the training setup,
# so the 7B base fits on an 8 GB VRAM class GPU.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_repo)
tokenizer = AutoTokenizer.from_pretrained(base_model)
```

## Notes

- This adapter is optimized for docs-assistant behavior, not as a standalone factual memory.
- For best results, pair with MCP tools + document retrieval context.
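
## Example: grounded prompting

The Quick start only loads the model. As a minimal sketch of the grounded-answering pattern described under Intended use, the snippet below assembles retrieved documentation chunks into a chat prompt that keeps retrieval output as the factual source. The helper name `build_grounded_messages`, the `[doc N]` citation convention, and the system instruction are illustrative assumptions, not artifacts of this repository.

```python
def build_grounded_messages(question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Build chat messages that confine the model to retrieved context."""
    # Number each chunk so the model can cite it as [doc N].
    context = "\n\n".join(
        f"[doc {i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the provided documentation context. "
                "Cite sources as [doc N]. If the context is insufficient, say so."
            ),
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


# With the model and tokenizer from the Quick start loaded, generation
# would then look roughly like this (requires a CUDA-capable GPU):
#
# messages = build_grounded_messages(question, retrieved_chunks)
# text = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
# inputs = tokenizer(text, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=512)
```

The generation lines are commented out because they depend on the heavyweight model objects from the Quick start; the message-building helper itself is plain Python and works with any retriever.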