archit11/qwen2.5-coder-3b-hyperswitch-track-a-lora

LoRA adapter trained for repository-specific extended pretraining on hyperswitch source code.

Model details

  • Base model: Qwen/Qwen2.5-Coder-3B
  • Fine-tuning method: LoRA (r=16)
  • Training corpus: https://huggingface.co/datasets/archit11/hyperswitch-code-corpus-track-a
  • Split strategy: file-level train/validation/test split
  • Sequence curriculum: [768, 1024, 1536]
  • Effective learning rate: 0.001
  • Batch size: 1
  • Gradient accumulation: 8
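The hyperparameters above can be sketched as a PEFT configuration. Only the rank (r=16) is stated in this card; `lora_alpha`, `lora_dropout`, and `target_modules` below are assumptions based on common Qwen2.5 LoRA setups, not values confirmed by the training run.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter configuration.
# r=16 is stated in the card; everything else is an assumption.
lora_config = LoraConfig(
    r=16,                       # stated in the card
    lora_alpha=32,              # assumed, not stated
    lora_dropout=0.05,          # assumed, not stated
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```

With batch size 1 and gradient accumulation 8, the effective batch size per optimizer step is 8 sequences.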

Evaluation summary

  • Baseline perplexity (primary): 2.2832
  • Post-training perplexity (primary): 1.5429
  • Perplexity reduction: 0.7403 (32.42%)
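The reduction figures follow directly from the two perplexity numbers; a quick check:

```python
# Verify the reported perplexity reduction from the card's numbers.
baseline = 2.2832  # baseline perplexity (primary)
post = 1.5429      # post-training perplexity (primary)

absolute_reduction = baseline - post
relative_reduction = absolute_reduction / baseline * 100

print(round(absolute_reduction, 4))  # 0.7403
print(round(relative_reduction, 2))  # 32.42
```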

Usage

This repository contains the LoRA adapter weights and tokenizer artifacts; load them with PEFT on top of the base model.

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "Qwen/Qwen2.5-Coder-3B"
adapter = "archit11/qwen2.5-coder-3b-hyperswitch-track-a-lora"

# Load the tokenizer saved with the adapter, then attach the LoRA
# weights to the base model via PEFT.
tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Example completion on Rust-style source (hyperswitch is a Rust codebase):
inputs = tokenizer("fn main() {", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))