---
language:
- en
- code
tags:
- python
- text-generation
- qwen
- qlora
- custom-finetune
- code
- ollama
datasets:
- iamtarun/python_code_instructions_18k_alpaca
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
---

# 🤖 Qwen2.5-Coder-1.5B-python-MyTune

**Fine-tuned with ❤️ by Karim**

Welcome to **Qwen2.5-Coder-1.5B-python-MyTune**! This is a fine-tuned version of `Qwen/Qwen2.5-Coder-1.5B-Instruct`, trained to follow algorithmic instructions and generate clean, efficient **Python** code.

## 📌 Model Overview

The model was trained with **QLoRA** (Quantized Low-Rank Adaptation): the base weights are loaded in 4-bit precision and kept frozen, and only small low-rank adapter matrices are trained. This keeps the number of trainable parameters low, so the model can specialize in Python code generation while largely preserving the reasoning capabilities of the base weights.

- **Base Model:** Qwen/Qwen2.5-Coder-1.5B-Instruct
- **Language:** English / Python
- **Training Method:** PEFT / QLoRA Integration
- **Precision:** Mixed Precision (4-bit Base + float16 Adapters)
- **Compute:** Google Colab T4 GPU (16GB VRAM)
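
To make the parameter-efficiency point concrete, here is a rough back-of-the-envelope sketch of how many weights a LoRA adapter adds to a single linear layer. The layer size and rank below are illustrative assumptions, not the settings used to train this model.

```python
def lora_added_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights added by one LoRA adapter.

    The adapter is two matrices: A with shape (rank, d_in) and
    B with shape (d_out, rank), so it adds rank * (d_in + d_out) weights.
    """
    return rank * (d_in + d_out)


def frozen_params(d_out: int, d_in: int) -> int:
    """Weights in the original (frozen) linear layer, ignoring bias."""
    return d_out * d_in


# Hypothetical values chosen only for illustration.
d_model = 1536
rank = 16

added = lora_added_params(d_model, d_model, rank)
frozen = frozen_params(d_model, d_model)

print(f"adapter weights: {added:,}")
print(f"frozen weights:  {frozen:,}")
print(f"adapter / frozen: {added / frozen:.2%}")
```

Under these assumed numbers the adapter is a small fraction of the layer it modifies, which is why this kind of training fits on a single T4.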

## 📊 Training Data

The model was fine-tuned on a carefully curated subset of the [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) dataset. This dataset provides high-quality Python coding instructions, algorithmic challenges, and their corresponding structured solutions.
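
As a sketch of how an Alpaca-style record could be mapped onto the ChatML format used elsewhere in this card: the function below assumes the usual `instruction`, `input`, and `output` fields. The exact preprocessing used for this fine-tune is not documented here, so treat `alpaca_to_chatml` and the sample record as hypothetical.

```python
def alpaca_to_chatml(record: dict) -> str:
    """Render one Alpaca-style record as a single ChatML training example."""
    instruction = record["instruction"].strip()
    extra = (record.get("input") or "").strip()
    # Append the optional input to the instruction when it is present.
    user_turn = f"{instruction}\n\n{extra}" if extra else instruction
    answer = record["output"].strip()
    return (
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>\n"
    )


# Made-up record for illustration; it is not taken from the dataset.
sample = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "",
    "output": "def square(x):\n    return x * x",
}

print(alpaca_to_chatml(sample))
```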

## 🎯 Intended Use

This model is designed to assist software engineers, data scientists, and quantitative analysts with:
- Generating Python scripts from natural language prompts.
- Solving complex algorithmic problems.
- Writing data engineering and mathematical logic code.

---

## 🚀 Quick Start: How to Use

You can run this model with **Ollama** for local inference, or load it with the standard Hugging Face `transformers` library, either locally or on a cloud server.

### Option A: Local Deployment via Ollama (Recommended for Speed)

Once the weights are downloaded, Ollama runs the model entirely on your local machine, with no internet connection required.

**Step 1: Download the Model Files**
First, download the safetensors weights to a local directory:
```bash
pip install -U huggingface_hub
huggingface-cli download karim0010/Qwen2.5-Coder-1.5B-python-MyTune --local-dir ./my_qwen_model
```

**Step 2: Create a `Modelfile`**
In the directory that contains `my_qwen_model` (the one where you ran the download command), create a file named `Modelfile` (no extension) and paste the following ChatML configuration:

```dockerfile
FROM ./my_qwen_model

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.3
PARAMETER top_p 0.9
```

**Step 3: Create and Run**
Register the model with Ollama and start chatting:

```bash
ollama create karim-coder -f ./Modelfile
ollama run karim-coder
```

*Now you can ask it to write Python code right in your terminal!*
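
Beyond the interactive terminal, the model created above can also be called from Python through Ollama's local HTTP API. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model was created as `karim-coder`; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default address of a local Ollama server (an assumption; adjust if changed).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "karim-coder") -> urllib.request.Request:
    """Build a non-streaming generation request for the Ollama API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "karim-coder", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


# Example (requires a running Ollama server):
# print(generate("Write a Python function that reverses a string."))
```

Because `stream` is set to `False`, the server returns one JSON object, and the generated text is read from its `response` field.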

---

### Option B: Python Inference (Hugging Face Transformers)

If you prefer integrating the model directly into your Python pipeline, use the following code.

**1. Install Dependencies**

```bash
pip install "transformers>=4.37.0" torch accelerate
```

**2. Inference Script**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the repository
model_id = "karim0010/Qwen2.5-Coder-1.5B-python-MyTune"

# Load the tokenizer and model. Qwen2.5 is supported natively by recent
# transformers releases, so trust_remote_code is not needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Prepare the prompt using the ChatML template
instruction = "Write a complete and clean Python function to calculate the Fibonacci sequence up to a given number 'n'."
prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

# Tokenize inputs
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate code
print("Generating code...")
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    temperature=0.3,  # A low temperature keeps code output more deterministic
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode and print the result
response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
print("\n--- Output ---")
print(response.strip())
```
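
Instruction-tuned models often wrap their answer in a Markdown code fence with some surrounding commentary. As a minimal sketch, assuming the response may or may not contain such fences, the hypothetical helpers below pull out the code and check that it at least parses as Python before you use it.

```python
import ast
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example readable

# Matches a fenced block with an optional "python" or "py" language tag.
FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code(response: str) -> str:
    """Return the code inside fenced blocks, or the whole text if none exist."""
    blocks = FENCE_RE.findall(response)
    if blocks:
        return "\n\n".join(block.strip() for block in blocks)
    return response.strip()


def is_valid_python(code: str) -> bool:
    """Check that the code parses; this does not prove it is correct."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


# Made-up response text for illustration.
sample = (
    f"Here is the function:\n{FENCE}python\n"
    f"def add(a, b):\n    return a + b\n{FENCE}\nHope this helps."
)
code = extract_code(sample)
print(code)
print("parses:", is_valid_python(code))
```

A syntax check only catches malformed output; generated code should still be reviewed and tested before it is run on anything important.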