usama10 committed · 4f224ef · verified · 1 parent: c2991e4

Upload README.md with huggingface_hub

---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- code
- code-generation
- sft
- lora
- qwen
- programming
datasets:
- TokenBender/code_instructions_122k_alpaca_style
pipeline_tag: text-generation
model-index:
- name: qwen-7b-code-instruct
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: Code Instructions 122K
      type: TokenBender/code_instructions_122k_alpaca_style
      split: train
    metrics:
    - type: loss
      value: 0.507
      name: Final Training Loss
---

# Qwen2.5-7B Code Instruct

A **Qwen2.5-7B-Instruct** model fine-tuned with **SFT + LoRA** on [122K code instructions](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style) covering 40+ programming languages. The model is trained to generate clean, correct code from natural language descriptions.
 
## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Method** | SFT with LoRA (r=32, alpha=64) |
| **Quantization** | None (full bf16) |
| **Dataset** | [TokenBender/code_instructions_122k_alpaca_style](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style) |
| **Training examples** | 119,519 |
| **Hardware** | NVIDIA RTX 5090 (32 GB VRAM) |
| **Training time** | ~3.3 hours |
| **Epochs** | 1 |
| **Effective batch size** | 16 (4 per device × 4 gradient accumulation) |
| **Learning rate** | 2e-5 (cosine schedule, 100 warmup steps) |
| **Max sequence length** | 1,024 tokens |
| **Precision** | bf16 |
| **Framework** | TRL 0.29.1 + Transformers 5.3.0 |
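
The learning-rate schedule in the table can be sketched in a few lines: linear warmup to the peak rate, then cosine decay to zero. The total step count (~7,470) is an assumption derived from 119,519 examples at an effective batch size of 16 for one epoch; the other numbers come straight from the table.

```python
import math

def lr_at_step(step: int, peak_lr: float = 2e-5,
               warmup_steps: int = 100, total_steps: int = 7470) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0.

    total_steps (~7470) is an assumption: 119,519 examples / effective
    batch size 16 for 1 epoch; peak_lr and warmup_steps match the table.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay

print(lr_at_step(100))   # peak of the schedule
print(lr_at_step(7470))  # decayed to ~0 at the end of training
```

This is only a reference for what "2e-5, cosine, 100 warmup steps" means; in training, TRL/Transformers produce the same shape via `lr_scheduler_type="cosine"` and `warmup_steps=100`.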
 
## Performance

| Metric | Value |
|--------|-------|
| **Starting loss** | 2.10 |
| **Final loss** | **0.46** |
| **Loss reduction** | 78% |
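
As a sanity check, the quoted reduction follows directly from the two loss values:

```python
start_loss, final_loss = 2.10, 0.46  # values from the table above

reduction = (start_loss - final_loss) / start_loss
print(f"{reduction:.0%}")  # the ~78% quoted above
```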

## Training Curves

![Training Metrics](code_training_metrics_plots.png)

- **Training Loss**: Sharp drop from 2.1 to ~0.5 within the first 200 steps, then continued gradual improvement
- **Learning Rate**: Cosine decay from 2e-5 to 0
- **Gradient Norm**: Stable around 1.0 throughout training
 
## Languages Covered

The training dataset spans 40+ programming languages, including Python, JavaScript, Java, C++, C#, Go, Rust, TypeScript, SQL, Ruby, PHP, Swift, Kotlin, R, and Bash.
 
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "usama10/qwen-7b-code-instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "system", "content": "You are an expert programmer. Given a programming task, write clean, correct, and well-commented code."},
    {"role": "user", "content": "Write a Python function that finds the longest common subsequence of two strings."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
 
## Dataset

The [code_instructions_122k_alpaca_style](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style) dataset contains 122K instruction-output pairs in Alpaca format. Each example has:

- **instruction**: A natural language description of the coding task
- **input**: Optional context or additional information
- **output**: The expected code solution

Examples range from simple utility functions to complex algorithms, data structures, and system design patterns.
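
Before tokenization, the three fields are typically flattened into a single prompt/response pair. A minimal sketch of that step; the field names match the dataset description, but the exact joining template used for this model is not documented here, so this one is a hypothetical choice:

```python
def to_prompt_pair(example: dict) -> dict:
    """Flatten one Alpaca-style record into a prompt/response pair.

    The instruction/input/output field names come from the dataset card;
    the joining template itself is an illustrative assumption.
    """
    prompt = example["instruction"]
    if example.get("input"):  # 'input' is optional and often empty
        prompt += "\n\n" + example["input"]
    return {"prompt": prompt, "response": example["output"]}

pair = to_prompt_pair({
    "instruction": "Write a function that reverses a string.",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
})
```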

## Limitations

- Trained for 1 epoch; more epochs could improve code quality
- The 1,024-token max length means very long code solutions may be truncated during training
- Code correctness is not verified during training (no execution-based feedback)
- Performance varies across languages; Python and JavaScript likely have the most training signal
- The LoRA adapter requires the base Qwen2.5-7B-Instruct model for inference