aadityabuilds committed on
Commit 8f19b4b · verified · 1 Parent(s): 238a8e5

Update model card with KernelBook post-training description

Files changed (1)
  1. README.md +52 -55
README.md CHANGED
@@ -1,75 +1,72 @@
  ---
  library_name: transformers
  tags:
- - generated_from_trainer
- model-index:
- - name: qwen2-5-coder-7b-kernelbook-sdft
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # qwen2-5-coder-7b-kernelbook-sdft

- This model was trained from scratch on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0272

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-06
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 8
- - total_eval_batch_size: 4
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 0.03
- - num_epochs: 1.0

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.0646 | 0.0756 | 100 | 0.0782 |
- | 0.0499 | 0.1512 | 200 | 0.0564 |
- | 0.0459 | 0.2268 | 300 | 0.0496 |
- | 0.0402 | 0.3025 | 400 | 0.0421 |
- | 0.0386 | 0.3781 | 500 | 0.0394 |
- | 0.0417 | 0.4537 | 600 | 0.0357 |
- | 0.0290 | 0.5293 | 700 | 0.0332 |
- | 0.0266 | 0.6049 | 800 | 0.0310 |
- | 0.0242 | 0.6805 | 900 | 0.0298 |
- | 0.0194 | 0.7561 | 1000 | 0.0287 |
- | 0.0222 | 0.8318 | 1100 | 0.0277 |
- | 0.0213 | 0.9074 | 1200 | 0.0273 |
- | 0.0255 | 0.9830 | 1300 | 0.0272 |
- | 0.0190 | 1.0 | 1323 | 0.0272 |


- ### Framework versions
-
- - Transformers 5.9.0
- - Pytorch 2.11.0+cu128
- - Datasets 4.8.5
- - Tokenizers 0.22.2
  ---
  library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-Coder-7B-Instruct
  tags:
+ - triton
+ - kernelbook
+ - code-generation
+ - self-distillation
+ - sdft
+ - text-generation
+ datasets:
+ - custom
+ language:
+ - en
+ pipeline_tag: text-generation
  ---

+ # Qwen2.5-Coder-7B KernelBook SDFT

+ **Self-Distillation Fine-Tuning (SDFT)** checkpoint of [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct), post-trained on the **KernelBook** Triton kernel dataset.

+ ## Method

+ This model was trained with **SDFT** (self-distillation fine-tuning): the student sees the user prompt plus privileged reference context (the target Triton implementation) and learns to reproduce the reference completion via forced-completion distillation (cross-entropy + KL on completion tokens). Training used a custom `KernelBookSDFTTrainer` on top of `transformers.Trainer` with DeepSpeed ZeRO-3.
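
Reviewer note: the "cross-entropy + KL on completion tokens" objective named in the Method paragraph could look roughly like the sketch below. It is a dependency-free illustration and not the `KernelBookSDFTTrainer` code, which this commit does not show. The function name `sdft_loss`, the `kl_weight` parameter, the KL direction (teacher to student), and the equal default weighting are all assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sdft_loss(student_logits, teacher_logits, target_ids, completion_mask, kl_weight=1.0):
    """Mean of (cross-entropy + kl_weight * KL) over completion tokens only.

    Assumptions (not confirmed by the model card): KL is computed as
    KL(teacher || student), and kl_weight defaults to 1.0.
    """
    total, count = 0.0, 0
    for s, t, y, keep in zip(student_logits, teacher_logits, target_ids, completion_mask):
        if not keep:  # prompt and context tokens do not contribute to the loss
            continue
        s_logp = log_softmax(s)
        t_logp = log_softmax(t)
        ce = -s_logp[y]  # cross-entropy against the reference completion token
        kl = sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))
        total += ce + kl_weight * kl
        count += 1
    return total / count if count else 0.0
```

Masking by `completion_mask` is what restricts the loss to completion tokens; a real trainer would do the same thing on batched tensors rather than Python lists.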

+ ## Dataset

+ - **KernelBook** PyTorch module prompts paired with reference Triton kernels
+ - Deduplicated, filtered to completions ≤4096 tokens, repo-stratified 80/10/10 split
+ - **1 training epoch** on the KernelBook train split
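
Reviewer note: the preprocessing listed in the Dataset bullets (deduplicate, drop completions over 4096 tokens, split 80/10/10 by repository) might be implemented along these lines. This is a sketch under stated assumptions: the record fields `repo`, `prompt`, and `completion`, the helper names, and the whitespace token counter are hypothetical, and "repo-stratified" is read here as "every example from a given repository lands in the same split", which is only one possible interpretation.

```python
import hashlib


def split_for_repo(repo):
    """Deterministically map a repository name to a split (about 80/10/10 of repos)."""
    bucket = int(hashlib.sha256(repo.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < 80:
        return "train"
    if bucket < 90:
        return "validation"
    return "test"


def prepare(examples, count_tokens=lambda text: len(text.split()), max_tokens=4096):
    """Deduplicate, filter long completions, and split by repository.

    count_tokens is a stand-in; a real pipeline would presumably count
    tokens with the model's own tokenizer.
    """
    seen = set()
    splits = {"train": [], "validation": [], "test": []}
    for ex in examples:
        key = (ex["prompt"], ex["completion"])
        if key in seen:  # exact-duplicate removal
            continue
        seen.add(key)
        if count_tokens(ex["completion"]) > max_tokens:
            continue
        splits[split_for_repo(ex["repo"])].append(ex)
    return splits
```

Hashing the repository name keeps the assignment stable across runs and prevents near-identical kernels from one repository from leaking between train and test.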

+ ## Intended use

+ Generate Triton GPU kernels from PyTorch-style module descriptions. Best for KernelBook-style conversion prompts; not evaluated as a general-purpose chat or reasoning model.

+ ## Quick start

+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer

+ model_id = "aadityabuilds/qwen2-5-coder-7b-kernelbook-sdft"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
+ )

+ messages = [
+     {
+         "role": "user",
+         "content": "Convert the following PyTorch code to an equivalent Triton kernel...",
+     }
+ ]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=1200, do_sample=False)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1] :], skip_special_tokens=True))
+ ```

+ ## Training summary

+ | Setting | Value |
+ |---------|-------|
+ | Base model | Qwen2.5-Coder-7B-Instruct |
+ | Method | SDFT (forced-completion distillation) |
+ | Epochs | 1 |
+ | Hardware | H100 (Modal) |
+ | Parallelism | DeepSpeed ZeRO-3, bf16 |
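
Reviewer note: the "DeepSpeed ZeRO-3, bf16" row corresponds to a DeepSpeed configuration of roughly the following shape. The values below are illustrative assumptions; the configuration actually used for this checkpoint is not part of the commit.

```python
import json

# Hypothetical ZeRO-3 + bf16 settings; not the published training config.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" lets the Hugging Face Trainer fill these in from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}

config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

With `transformers`, a dict like this (or a path to the equivalent JSON file) can be passed through the `deepspeed` argument of `TrainingArguments`.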
69
 
70
+ ## Limitations
71
 
72
+ Specialized for KernelBook Triton codegen. May show reduced performance on general coding, math, and knowledge benchmarks compared to the base instruct model.