# CodeK v3: Qwen2.5-Coder-7B LoRA

A LoRA adapter fine-tuned on CodeK, a synthetic dataset of Python programming tasks written in the style of Andrej Karpathy's open-source code. The model is trained to reason carefully about code: explaining implementations, diagnosing bugs, contrasting correct vs. incorrect versions, and generating multi-hypothesis debugging chains.

Best checkpoint: checkpoint-800 (eval loss: 0.5888)


## Model Details

| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Adapter type | LoRA (rank 16, alpha 32, RSLoRA) |
| Target modules | q/k/v/o proj, gate/up/down proj |
| Training tokens | response-only (prompt tokens masked) |
| Best checkpoint | checkpoint-800 |
| Eval loss | 0.5888 |
| Training hardware | NVIDIA A100 80GB SXM4 |
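
The "response-only" row means prompt tokens are excluded from the loss. A minimal sketch of the mechanics (illustrative only; the actual run used the TRL/Unsloth stack listed under Framework Versions):

```python
# Minimal sketch of response-only loss masking: prompt tokens get label
# -100, which PyTorch cross-entropy ignores, so gradients come only from
# the assistant response.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

prompt = "Find the bug in this function:\n\ndef f(x):\n    return x + 1\n"
response = "There is no bug; f returns its argument incremented by one."

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]

input_ids = torch.tensor(prompt_ids + response_ids)
labels = input_ids.clone()
labels[: len(prompt_ids)] = -100  # loss is computed on response tokens only
```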

## Training Data

The CodeK v3 dataset combines v2 (398 seeds) and v3 (161 seeds) augmentation pipelines for a total of 559 unique Python tasks across 9 categories:

- Data structures, algorithms, graphs, dynamic programming
- Numerical methods, parsing, concurrency, bit manipulation, compression

Each seed is augmented across up to 5 passes:

| Pass | Type | Description |
|---|---|---|
| Pass 1 | Reasoning | Step-by-step explanation of the correct implementation |
| Pass 2 | Debugging | Single-line surgical bug + model diagnosis (via Codex, 100% coverage) |
| Pass 3 | Contrast | Correct vs. incorrect comparison with explanation |
| Pass 4 | Research loop | Multi-turn investigation of the implementation |
| Pass 5 | Multi-hypothesis | Competing bug hypotheses, ranked by plausibility |
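
To make Pass 2's "single-line surgical bug" concrete, here is a hypothetical example of the kind of edit it trains against (not an actual dataset item):

```python
# Hypothetical Pass 2-style surgical bug (illustrative, not from the
# dataset): exactly one token sequence changes, and the model must name
# both the function and the nature of the edit.
def find_max(nums):
    best = nums[0]
    for i in range(1, len(nums) - 1):  # BUG: `- 1` skips the last element;
        best = max(best, nums[i])      # the correct loop is range(1, len(nums))
    return best
```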

- Training split: 6,757 pairs (504 seed-level train tasks)
- Validation split: 728 pairs (55 seed-level held-out tasks; zero task overlap with train)
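
Zero overlap is enforced at the seed level, not the pair level. A minimal sketch of that split logic, assuming each pair carries a `seed_id` field (the field name is an assumption, not the published schema):

```python
# Minimal sketch of a seed-level split: held-out seeds contribute *all* of
# their augmented pairs to validation, so no task appears on both sides.
import random

def seed_level_split(pairs, val_fraction=0.1, rng_seed=0):
    seeds = sorted({p["seed_id"] for p in pairs})
    random.Random(rng_seed).shuffle(seeds)
    val_seeds = set(seeds[: max(1, int(len(seeds) * val_fraction))])
    train = [p for p in pairs if p["seed_id"] not in val_seeds]
    val = [p for p in pairs if p["seed_id"] in val_seeds]
    return train, val
```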

## Key improvements over the v2 model

- Seed-level val split: the validation set has no task overlap with training, so eval loss is meaningful
- Response-only loss: prompt tokens are masked; the model is trained only on assistant responses
- Pass 5: multi-hypothesis bug-reasoning signal (new in v3)
- Pass 2 via Codex: 100% Pass 2 coverage with sharper change_token annotations
- change_token field: targets the change_hit failure mode from the v1/v2 evals (illustrated below)
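
As a rough illustration of where change_token fits, a Pass 2 record might look like this (all field names and values are hypothetical, not the published schema):

```python
# Hypothetical Pass 2 record (field names/values are assumptions):
# change_token pins down the exact edit so change_hit can check whether a
# diagnosis names it, not merely the right function.
record = {
    "seed_id": "algorithms/binary_search",
    "pass": 2,
    "buggy_function": "binary_search",
    "change_token": "lo = mid",  # surgical edit; correct code is `lo = mid + 1`
}

def change_hit(diagnosis: str, change_token: str) -> bool:
    return change_token in diagnosis

assert change_hit(
    "In binary_search, `lo = mid` never advances; it should be `lo = mid + 1`.",
    record["change_token"],
)
```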

## Evaluation

Ground-truth Pass 2 eval on 50 held-out v1 seeds (the same seeds are used across all versions, so the comparison is apples-to-apples). A prediction passes only if it correctly identifies both the function containing the bug and the nature of the change.
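
A minimal sketch of that criterion, assuming simple substring matching over the hypothetical buggy_function and change_token fields from the record above (the actual grader may be stricter):

```python
# Sketch of the eval criterion (matching logic is an assumption): a
# prediction passes only if it names the buggy function AND the change.
def pass_at_1(examples):
    hits = sum(
        1
        for ex in examples
        if ex["buggy_function"] in ex["prediction"]
        and ex["change_token"] in ex["prediction"]
    )
    return hits / len(examples)
```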

| Version | Dataset | LoRA Pass@1 | Base Pass@1 |
|---|---|---|---|
| v0 | 201 seeds, 4 passes | 58% | 64% |
| v1 | 398 seeds, 4 passes | 60% | 62% |
| v3 | 559 seeds, 5 passes | pending | pending |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "mechramc/codek-qwen2.5-coder-7b-lora-v3"

# Load the base model in bf16 and attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    {"role": "system", "content": "You are a Python debugging expert. When shown code with a bug, identify the exact location and nature of the bug. Be precise and concise."},
    {"role": "user", "content": "The following Python code has a subtle bug. Find it.\n\n```python\ndef binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n    while lo <= hi:\n        mid = (lo + hi) // 2\n        if arr[mid] == target:\n            return mid\n        elif arr[mid] < target:\n            lo = mid\n        else:\n            hi = mid - 1\n    return -1\n```"}
]

# Greedy decoding; print only the newly generated tokens.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
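
For deployment without the peft wrapper, the adapter can also be merged into the base weights (standard peft API; the output directory name below is arbitrary):

```python
# Optional: fold the LoRA deltas into the base weights so inference no
# longer needs peft at all.
merged = model.merge_and_unload()
merged.save_pretrained("codek-qwen2.5-coder-7b-v3-merged")
tokenizer.save_pretrained("codek-qwen2.5-coder-7b-v3-merged")
```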

## Framework Versions

- PEFT: 0.18.1
- TRL: 0.24.0
- Transformers: 5.5.0
- PyTorch: 2.6.0
- Unsloth: 2026.4.1
- CUDA: 12.4