---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
datasets:
- iamtarun/python_code_instructions_18k_alpaca
language:
- ar
- en
pipeline_tag: text-generation
tags:
- llama-factory
- lora
- qwen2
- python
- arabic
- code
- instruction-tuning
- fine-tuned
---
# 🐍 Python Assistant (Arabic)
A fine-tuned version of **Qwen2.5-1.5B-Instruct** that answers Python programming questions in **Arabic**, with structured JSON output. Fine-tuned using LoRA via LLaMA-Factory.
---
## Model Details
- **Developed by:** jana-ashraf-ai
- **Base Model:** [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
- **Model type:** Causal Language Model (text-generation)
- **Language(s):** Arabic (answers) + English (questions)
- **License:** Apache 2.0
- **Fine-tuning method:** QLoRA (LoRA rank=32) via LLaMA-Factory
---
## What does this model do?
Given a Python programming question in English, the model returns a structured JSON answer **in Arabic**, explaining the solution step by step.
---
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jana-ashraf-ai/python-assistant"

# Load the tokenizer and the model in half precision on the available device
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

system_prompt = """You are a Python expert assistant.
Answer the user's Python question in Arabic following the Output Schema.
Do not add any introduction or conclusion."""

question = "How do I reverse a list in Python?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question},
]

# Render the chat template, generate, and decode the Arabic JSON answer
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
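Since the model is trained to reply with a JSON object, downstream code usually wants to parse it. Model output can include stray text or code fences around the JSON, so a small best-effort parser helps; `extract_json` below is an illustrative sketch, not a helper shipped with this repository, and it assumes the reply contains a single top-level JSON object.

```python
import json


def extract_json(reply: str):
    """Best-effort parse of a model reply expected to contain one JSON object.

    Locates the outermost braces first, so leading prose or ```json fences
    around the object are tolerated. Returns None if parsing fails.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
```

For example, `extract_json(tokenizer.decode(outputs[0], skip_special_tokens=True))` would return a `dict` when the model emitted valid JSON, and `None` otherwise, letting callers fall back to the raw text.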
---
## Training Details
| Parameter | Value |
|-----------|-------|
| Base model | Qwen2.5-1.5B-Instruct |
| Fine-tuning method | LoRA (QLoRA) |
| LoRA rank | 32 |
| LoRA target | all |
| Training samples | 1,000 |
| Epochs | 3 |
| Learning rate | 1e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.1 |
| Batch size | 1 (grad accum = 8) |
| Precision | fp16 |
| Quantization | 4-bit (nf4) |
| Framework | LLaMA-Factory |
| Hardware | Google Colab T4 GPU |
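For reference, the table above corresponds roughly to a LLaMA-Factory SFT config along these lines. This is a hypothetical sketch: the dataset name, template, and output path are placeholders, not the files actually used for this model.

```yaml
# Hypothetical LLaMA-Factory config mirroring the training table above.
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 32
lora_target: all
quantization_bit: 4            # QLoRA (nf4)
dataset: python_arabic_1k      # placeholder dataset name
template: qwen
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
output_dir: saves/qwen2.5-1.5b/lora/sft   # placeholder path
```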
---
## Training Data
Fine-tuned on a curated subset (1,000 samples) from [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca).
The answers were annotated and structured using GPT to produce Arabic explanations in a JSON schema format.
**Train / Val split:** 90% / 10%
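A 90% / 10% split of the 1,000 samples can be sketched as follows. This is a generic illustration (seeded shuffle then slice), not the exact splitting code used during training.

```python
import random


def train_val_split(samples, val_ratio=0.1, seed=42):
    """Shuffle indices with a fixed seed, then split into train/val parts."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(samples) * val_ratio)
    val = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, val
```

With 1,000 samples and `val_ratio=0.1`, this yields 900 training and 100 validation examples.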
---
## Limitations
- The model is optimized for Python questions only.
- Answers are in Arabic; the model is not suited to English-only use cases.
- The small model size (1.5B parameters) may struggle with very complex programming problems.
- Output quality depends on the question being clear and specific.