mechramc committed
Commit e40488c · verified · 1 Parent(s): 26b23e6

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +78 -63
README.md CHANGED
@@ -1,63 +1,78 @@
- ---
- base_model: unsloth/Qwen2.5-Coder-7B-Instruct
- library_name: peft
- model_name: codek-lora
- tags:
- - base_model:adapter:unsloth/Qwen2.5-Coder-7B-Instruct
- - lora
- - sft
- - transformers
- - trl
- - unsloth
- licence: license
- pipeline_tag: text-generation
- ---
-
- # Model Card for codek-lora
-
- This model is a fine-tuned version of [unsloth/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/unsloth/Qwen2.5-Coder-7B-Instruct).
- It has been trained using [TRL](https://github.com/huggingface/trl).
-
- ## Quick start
-
- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
-
- ## Training procedure
-
-
-
-
- This model was trained with SFT.
-
- ### Framework versions
-
- - PEFT 0.18.1
- - TRL: 0.23.1
- - Transformers: 4.57.6
- - Pytorch: 2.10.0+cu128
- - Datasets: 4.3.0
- - Tokenizers: 0.22.2
-
- ## Citations
-
-
-
- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
- title = {{TRL: Transformer Reinforcement Learning}},
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
- year = 2020,
- journal = {GitHub repository},
- publisher = {GitHub},
- howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```
+ ---
+ base_model: Qwen/Qwen2.5-Coder-7B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - lora
+ - peft
+ - qwen2.5
+ - qwen2.5-coder
+ - code
+ - reasoning
+ - pedagogy
+ - fine-tuned
+ language:
+ - en
+ license: apache-2.0
+ datasets:
+ - mechramc/codek-v1
+ ---
+
+ # CodeK LoRA -- Qwen2.5-Coder-7B-Instruct
+
+ A LoRA adapter fine-tuned on the **CodeK v1** dataset: a reasoning-first, pedagogical coding dataset in Karpathy's register.
+
+ ## What it does
+
+ Teaches the model to reason like an experienced engineer -- not just write code, but:
+ - Decompose problems before implementing
+ - Explain *why* a solution is the right one
+ - Identify and diagnose subtle bugs
+ - Contrast clean vs clunky implementations with precision
+ - Apply a hypothesis->experiment->evaluate loop to code
+
+ ## Training
+
+ | Setting | Value |
+ |---------|-------|
+ | Base model | `Qwen/Qwen2.5-Coder-7B-Instruct` |
+ | Method | LoRA (RS-LoRA) |
+ | Rank | 16, Alpha 32 |
+ | Dropout | 0.05 |
+ | Epochs | 3 |
+ | Batch size | 2 (effective 8 with grad accum) |
+ | Learning rate | 2e-4 |
+ | Train pairs | 2,351 |
+ | Val pairs | 262 |
+ | Final eval loss | 0.0600 |
+ | Hardware | RunPod A100 80GB |
+ | Training time | 59 minutes |
+
+ ## Dataset
+
+ [mechramc/codek-v1](https://huggingface.co/datasets/mechramc/codek-v1) -- 201 seeds, 2,613 ShareGPT pairs across 4 augmentation passes:
+
+ - **Pass 1** -- Reasoning decomposition (intuition, plan, why this solution, alternatives)
+ - **Pass 2** -- Debugging (introduce + diagnose a subtle one-line bug)
+ - **Pass 3** -- Contrast (clunky vs clean, with specific explanation)
+ - **Pass 4** -- Research loop (hypothesis, minimal test, success metric, simplicity check, abandon condition)
+
+ ## Usage
+
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
+ model = PeftModel.from_pretrained(base, "mechramc/codek-qwen2.5-coder-7b-lora")
+ tokenizer = AutoTokenizer.from_pretrained("mechramc/codek-qwen2.5-coder-7b-lora")
+ ```
+
+ Or with Unsloth (2-5x faster inference):
+ ```python
+ from unsloth import FastLanguageModel
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     "mechramc/codek-qwen2.5-coder-7b-lora",
+     max_seq_length=4096,
+ )
+ ```
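
Note on the Training and Dataset sections added in this commit: several of the reported figures can be cross-checked with plain arithmetic. The sketch below is not part of the README. It only reuses numbers stated there, plus two labeled assumptions: that training ran on a single GPU (the card lists one "RunPod A100 80GB"), and that "RS-LoRA" means the usual rank-stabilized scaling of `alpha / sqrt(rank)` rather than `alpha / rank`. The step counts are estimates under those assumptions, not logged values.

```python
import math


def derived_quantities() -> dict:
    """Cross-check figures from the README's Training and Dataset sections."""
    # Values copied from the README added in this commit.
    per_device_batch = 2
    effective_batch = 8
    epochs = 3
    train_pairs = 2351
    val_pairs = 262
    seeds = 201
    lora_rank = 16
    lora_alpha = 32

    # Assumption (not stated in the README): a single GPU was used.
    num_devices = 1

    # Gradient accumulation needed to reach the stated effective batch size.
    grad_accum_steps = effective_batch // (per_device_batch * num_devices)

    # Train + val should match the 2,613 pairs reported for the dataset.
    total_pairs = train_pairs + val_pairs

    # Estimated optimizer steps, assuming no sequence packing and that the
    # final partial batch of each epoch still triggers an update.
    steps_per_epoch = math.ceil(train_pairs / effective_batch)
    total_steps = steps_per_epoch * epochs

    return {
        "grad_accum_steps": grad_accum_steps,
        "total_pairs": total_pairs,
        "pairs_per_seed": total_pairs / seeds,
        "val_fraction": val_pairs / total_pairs,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Classic LoRA scales the low-rank update by alpha / rank.
        "standard_lora_scale": lora_alpha / lora_rank,
        # Rank-stabilized LoRA scales by alpha / sqrt(rank) instead.
        "rslora_scale": lora_alpha / math.sqrt(lora_rank),
    }


if __name__ == "__main__":
    for name, value in derived_quantities().items():
        print(f"{name}: {value}")
```

Under these assumptions the train and validation counts sum to the dataset total, and the rank-stabilized scale is four times the classic one at rank 16, which is worth keeping in mind when comparing the 2e-4 learning rate against non-RS-LoRA runs.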