usama10 committed · 4f224ef · verified · 1 parent: c2991e4

Upload README.md with huggingface_hub

---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- code
- code-generation
- sft
- lora
- qwen
- programming
datasets:
- TokenBender/code_instructions_122k_alpaca_style
pipeline_tag: text-generation
model-index:
- name: qwen-7b-code-instruct
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: Code Instructions 122K
      type: TokenBender/code_instructions_122k_alpaca_style
      split: train
    metrics:
    - type: loss
      value: 0.507
      name: Final Training Loss
---

# Qwen2.5-7B Code Instruct

A **Qwen2.5-7B-Instruct** model fine-tuned with **SFT + LoRA** on [122K code instructions](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style) covering 40+ programming languages. The model is trained to generate clean, correct code from natural language descriptions.
 
## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Method** | SFT with LoRA (r=32, alpha=64) |
| **Quantization** | None (full bf16) |
| **Dataset** | [TokenBender/code_instructions_122k_alpaca_style](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style) |
| **Training examples** | 119,519 |
| **Hardware** | NVIDIA RTX 5090 (32 GB VRAM) |
| **Training time** | ~3.3 hours |
| **Epochs** | 1 |
| **Effective batch size** | 16 (4 per device × 4 gradient accumulation) |
| **Learning rate** | 2e-5 (cosine schedule, 100 warmup steps) |
| **Max sequence length** | 1,024 tokens |
| **Precision** | bf16 |
| **Framework** | TRL 0.29.1 + Transformers 5.3.0 |
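
The learning-rate schedule in the table can be sketched in a few lines: linear warmup to the peak rate, then cosine decay to zero. The total step count (~7,470) is an assumption derived from 119,519 examples at an effective batch size of 16 for one epoch; the other numbers come straight from the table.

```python
import math

def lr_at_step(step: int, peak_lr: float = 2e-5,
               warmup_steps: int = 100, total_steps: int = 7470) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0.

    total_steps (~7470) is an assumption: 119,519 examples / effective
    batch size 16 for 1 epoch; peak_lr and warmup_steps match the table.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay

print(lr_at_step(100))   # peak of the schedule
print(lr_at_step(7470))  # decayed to ~0 at the end of training
```

This is only a reference for what "2e-5, cosine, 100 warmup steps" means; in training, TRL/Transformers produce the same shape via `lr_scheduler_type="cosine"` and `warmup_steps=100`.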
 
## Performance

| Metric | Value |
|--------|-------|
| **Starting loss** | 2.10 |
| **Final loss** | **0.46** |
| **Loss reduction** | 78% |
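
As a sanity check, the quoted reduction follows directly from the two loss values:

```python
start_loss, final_loss = 2.10, 0.46  # values from the table above

reduction = (start_loss - final_loss) / start_loss
print(f"{reduction:.0%}")  # the ~78% quoted above
```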

## Training Curves

![Training Metrics](code_training_metrics_plots.png)

- **Training Loss**: Sharp drop from 2.1 to ~0.5 within the first 200 steps, then continued gradual improvement
- **Learning Rate**: Cosine decay from 2e-5 to 0
- **Gradient Norm**: Stable around 1.0 throughout training
 
## Languages Covered

The training dataset spans 40+ programming languages, including Python, JavaScript, Java, C++, C#, Go, Rust, TypeScript, SQL, Ruby, PHP, Swift, Kotlin, R, and Bash.
 
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "usama10/qwen-7b-code-instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "system", "content": "You are an expert programmer. Given a programming task, write clean, correct, and well-commented code."},
    {"role": "user", "content": "Write a Python function that finds the longest common subsequence of two strings."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
 
## Dataset

The [code_instructions_122k_alpaca_style](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style) dataset contains 122K instruction-output pairs in Alpaca format. Each example has:

- **instruction**: A natural language description of the coding task
- **input**: Optional context or additional information
- **output**: The expected code solution

Examples range from simple utility functions to complex algorithms, data structures, and system design patterns.
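
Before tokenization, the three fields are typically flattened into a single prompt/response pair. A minimal sketch of that step; the field names match the dataset description, but the exact joining template used for this model is not documented here, so this one is a hypothetical choice:

```python
def to_prompt_pair(example: dict) -> dict:
    """Flatten one Alpaca-style record into a prompt/response pair.

    The instruction/input/output field names come from the dataset card;
    the joining template itself is an illustrative assumption.
    """
    prompt = example["instruction"]
    if example.get("input"):  # 'input' is optional and often empty
        prompt += "\n\n" + example["input"]
    return {"prompt": prompt, "response": example["output"]}

pair = to_prompt_pair({
    "instruction": "Write a function that reverses a string.",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
})
```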

## Limitations

- Trained for 1 epoch; more epochs could improve code quality
- The 1,024-token max length means very long code solutions may be truncated during training
- Code correctness is not verified during training (no execution-based feedback)
- Performance varies across languages; Python and JavaScript likely have the most training signal
- The LoRA adapter requires the base Qwen2.5-7B-Instruct model for inference