mechramc/codek-qwen2.5-coder-7b-lora-v2

@@ -21,9 +21,34 @@ datasets:
 # CodeK LoRA v1 -- Qwen2.5-Coder-7B-Instruct
 A LoRA adapter fine-tuned on the **CodeK v2** dataset: a reasoning-first, pedagogical
-coding dataset with ~2x the seeds of v1. Teaches decomposition, bug diagnosis, contrast
 reasoning, and hypothesis-driven thinking about code.
 ## Training
 | Setting | Value |
@@ -53,27 +78,15 @@ reasoning, and hypothesis-driven thinking about code.
 | 600  | 0.0747 |
 | 700  | 0.0747 |
 | 800  | 0.0689 |
-| **900** | **0.0664 ← best** |
 | 1000 | 0.0755 |
 | 1100 | 0.0765 |
 | 1200 | 0.0761 |
-| 1300 | 0.0767 |
-Best checkpoint (step 900) was rotated out by save_total_limit=3.
-Checkpoint-1300 used for eval (eval loss 0.077).
-## v0 Baseline Comparison
-| Model | Train pairs | Best eval loss | Pass@1 (bug diagnosis) |
-|-------|------------|---------------|----------------------|
-| CodeK LoRA v0 (checkpoint-800) | 2,351 | 0.0583 | 58% |
-| **CodeK LoRA v1 (checkpoint-1300)** | **4,567** | **0.0664** | **TBD** |
-Pass@1 eval pending. See [CodeK LoRA v0](https://huggingface.co/mechramc/codek-qwen2.5-coder-7b-lora) for baseline analysis.
 ## Dataset
-mechramc/codek-v2 (coming soon) --
 398 seeds, 4 augmentation passes, 5,075 ShareGPT pairs.
 Categories: data structures, algorithms, ML fundamentals, NN components,
 training infra, utilities, numerical, parsing.
@@ -88,3 +101,7 @@ base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
 model = PeftModel.from_pretrained(base, "mechramc/codek-qwen2.5-coder-7b-lora-v2")
 tokenizer = AutoTokenizer.from_pretrained("mechramc/codek-qwen2.5-coder-7b-lora-v2")
 ```

 # CodeK LoRA v1 -- Qwen2.5-Coder-7B-Instruct
 A LoRA adapter fine-tuned on the **CodeK v2** dataset: a reasoning-first, pedagogical
+coding dataset with ~2x the seeds of v0. Teaches decomposition, bug diagnosis, contrast
 reasoning, and hypothesis-driven thinking about code.
+## Eval Results (Pass 2 ground-truth, 50 seeds)
+| Model | Pass@1 | vs v0 |
+|-------|--------|-------|
+| Base (Qwen2.5-Coder-7B-Instruct) | 62% | -2% |
+| **LoRA v1 (checkpoint-1300)** | **60%** | **+2%** |
+The regression gap vs. the base model narrowed from **-6 points (v0)** to **-2 points (v1)**.
+Evaluated on the same 50 seeds as v0 for direct comparison.
+Note: the best checkpoint (step 900, eval loss 0.0664) was rotated out during training
+(save_total_limit=3), so checkpoint-1300 (eval loss 0.077) was used instead. The step-900
+checkpoint would likely score 62–64%, but that estimate is untested.
+## v0 → v1 Comparison
+| | v0 | v1 |
+|--|----|----|
+| Dataset | codek-v1 (201 seeds) | codek-v2 (398 seeds) |
+| Train pairs | 2,351 | 4,567 |
+| Best eval loss | 0.0583 | 0.0664 (best surviving: 0.077) |
+| LoRA Pass@1 | 58% | **60%** |
+| Base Pass@1 | 64% | 62% |
+| Gap (LoRA vs base) | -6% | **-2%** |
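
The gap figures in the table are simple differences of Pass@1 values on 50 seeds, so one seed is worth 2 percentage points. The sketch below reproduces that arithmetic from the numbers in the table. The standard-error helper is an added assumption (a plain binomial model with independent seeds), not something the card reports, and the function names are illustrative only.

```python
import math

N_SEEDS = 50  # eval set size stated in the card

# Pass@1 values copied from the comparison table, as fractions.
RESULTS = {
    "v0": {"lora": 0.58, "base": 0.64},
    "v1": {"lora": 0.60, "base": 0.62},
}


def gap_points(lora, base):
    """LoRA minus base, in percentage points."""
    return round((lora - base) * 100, 1)


def seeds_solved(p, n=N_SEEDS):
    """Number of seeds solved for a given pass rate."""
    return round(p * n)


def binomial_se_points(p, n=N_SEEDS):
    """Standard error of a pass rate in percentage points.

    Assumption (not from the card): seeds are independent Bernoulli trials.
    """
    return 100 * math.sqrt(p * (1 - p) / n)


for version, r in RESULTS.items():
    print(
        version,
        f"LoRA {seeds_solved(r['lora'])}/{N_SEEDS}",
        f"base {seeds_solved(r['base'])}/{N_SEEDS}",
        f"gap {gap_points(r['lora'], r['base'])} pts",
        f"SE(LoRA) ~{binomial_se_points(r['lora']):.1f} pts",
    )
```

Under that binomial assumption, the v1 gap of -2 points corresponds to a single seed out of 50, which is useful context when comparing v0 and v1.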
 ## Training
 | Setting | Value |
 | 600  | 0.0747 |
 | 700  | 0.0747 |
 | 800  | 0.0689 |
+| **900** | **0.0664 ← best (rotated out)** |
 | 1000 | 0.0755 |
 | 1100 | 0.0765 |
 | 1200 | 0.0761 |
+| 1300 | 0.0767 ← used for eval |
 ## Dataset
+[mechramc/codek-v2](https://huggingface.co/datasets/mechramc/codek-v2) (coming soon) --
 398 seeds, 4 augmentation passes, 5,075 ShareGPT pairs.
 Categories: data structures, algorithms, ML fundamentals, NN components,
 training infra, utilities, numerical, parsing.
 model = PeftModel.from_pretrained(base, "mechramc/codek-qwen2.5-coder-7b-lora-v2")
 tokenizer = AutoTokenizer.from_pretrained("mechramc/codek-qwen2.5-coder-7b-lora-v2")
 ```
+## Links
+- [v0 adapter (baseline)](https://huggingface.co/mechramc/codek-qwen2.5-coder-7b-lora)