archit11 committed on
Commit fba88f9 · verified · 1 Parent(s): faa4ba9

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +26 -48
README.md CHANGED
@@ -1,61 +1,39 @@
  ---
  base_model: Qwen/Qwen2.5-Coder-1.5B
- library_name: peft
- model_name: track_b_sft
  tags:
- - base_model:adapter:Qwen/Qwen2.5-Coder-1.5B
- - lora
- - sft
- - transformers
- - trl
- licence: license
- pipeline_tag: text-generation
  ---

- # Model Card for track_b_sft

- This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
-
- ## Training procedure
-
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/dumbal/huggingface/runs/xwrn72zo)

- This model was trained with SFT.
-
- ### Framework versions
-
- - PEFT 0.18.1
- - TRL: 0.28.0
- - Transformers: 5.2.0
- - Pytorch: 2.9.1
- - Datasets: 4.5.0
- - Tokenizers: 0.22.2
-
- ## Citations

- Cite TRL as:
-
- ```bibtex
- @software{vonwerra2020trl,
-   title = {{TRL: Transformers Reinforcement Learning}},
-   author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
-   license = {Apache-2.0},
-   url = {https://github.com/huggingface/trl},
-   year = {2020}
- }
- ```
 
  ---
  base_model: Qwen/Qwen2.5-Coder-1.5B
  tags:
+ - lora
+ - sft
+ - code
+ - python
+ - instruction-tuning
+ license: apache-2.0
  ---

+ # Track B SFT – Qwen2.5-Coder-1.5B + LoRA

+ Fine-tuned on ~250 synthetic coding instruction pairs generated from the [verl](https://github.com/volcengine/verl) corpus.

+ ## Results

+ | Metric | Baseline | Post-SFT | Δ |
+ |--------|----------|----------|---|
+ | pass@1 | 0.565 | **0.804** | +0.239 |
+ | pass@3 | 0.783 | 0.848 | +0.065 |
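Pass@k figures like those in the table are conventionally computed with the unbiased estimator over per-problem sample outcomes. The sketch below is illustrative only — the commit does not record which evaluation harness produced the numbers:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n samples (c correct), passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 5 samples per problem, 2 of them correct
print(round(pass_at_k(5, 2, 1), 3))  # 0.4
print(round(pass_at_k(5, 2, 3), 3))  # 0.9
```

Averaging `pass_at_k` over all problems yields the table's pass@1 and pass@3 columns.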

+ ## Training

+ - **Base model:** `Qwen/Qwen2.5-Coder-1.5B`
+ - **Method:** LoRA (r=16, alpha=32)
+ - **Data:** `archit11/track_b_sft` (~257 train examples)
+ - **Epochs:** 3, **LR:** 2e-4, **Hardware:** T4 GPU
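The bullets above map onto a standard PEFT adapter configuration; a minimal sketch under those hyperparameters (target modules and dropout are assumptions — the commit does not record them):

```python
from peft import LoraConfig

# r and lora_alpha come from the Training section above; target_modules
# and lora_dropout are illustrative guesses for Qwen2.5-Coder.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```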

+ ## Usage

+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B")
+ model = PeftModel.from_pretrained(base, "archit11/track_b_sft_model").merge_and_unload()
+ tokenizer = AutoTokenizer.from_pretrained("archit11/track_b_sft_model")
+ ```