---
language:
- en
- code
tags:
- python
- text-generation
- qwen
- qlora
- custom-finetune
- code
- ollama
datasets:
- iamtarun/python_code_instructions_18k_alpaca
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
---
# 🤖 Qwen2.5-Coder-1.5B-python-MyTune
**Fine-tuned with ❤️ by Karim**
Welcome to **Qwen2.5-Coder-1.5B-python-MyTune**! This is a fine-tuned version of `Qwen/Qwen2.5-Coder-1.5B-Instruct`, trained to follow algorithmic instructions and generate clean, efficient **Python** code.
## 📌 Model Overview
The model was trained with **QLoRA** (Quantized Low-Rank Adaptation): the base weights are quantized to 4-bit and kept frozen, and only small low-rank adapter matrices are trained. Because the base weights do not change during training, this approach is parameter-efficient and helps the model pick up Python-focused behavior while retaining the reasoning capabilities of the base model.
- **Base Model:** Qwen/Qwen2.5-Coder-1.5B-Instruct
- **Language:** English / Python
- **Training Method:** PEFT / QLoRA Integration
- **Precision:** Mixed Precision (4-bit Base + float16 Adapters)
- **Compute:** Google Colab T4 GPU (16GB VRAM)
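
To make the parameter-efficiency point concrete, the sketch below counts the trainable weights a LoRA adapter adds to a single linear layer. The layer shape and rank are illustrative assumptions, not the settings used to train this model.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns two low-rank matrices, A (rank x d_in) and B (d_out x rank),
    while the original weight matrix stays frozen.
    """
    return rank * d_in + d_out * rank


# Illustrative values only (assumptions, not this model's training config).
d_in, d_out, rank = 1536, 1536, 16

full = d_in * d_out                        # weights in the frozen layer
lora = lora_param_count(d_in, d_out, rank)  # weights in the adapter
ratio = lora / full

print(f"full layer weights  : {full:,}")
print(f"LoRA adapter weights: {lora:,}")
print(f"adapter / full      : {ratio:.2%}")
```

Under these assumed values the adapter holds about 2% as many weights as the layer it modifies, which is why training fits on a single T4.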
## 📊 Training Data
The model was fine-tuned on a carefully curated subset of the [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) dataset. This dataset provides high-quality Python coding instructions, algorithmic challenges, and their corresponding structured solutions.
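
The exact preprocessing used for this fine-tune is not documented in this card. As a minimal sketch, assuming Alpaca-style records with `instruction`, `input`, and `output` fields and the same ChatML markers used in the inference examples, one record could be rendered into a training string like this (the helper name `alpaca_to_chatml` and the sample record are hypothetical):

```python
def alpaca_to_chatml(record: dict) -> str:
    """Render one Alpaca-style record as a ChatML training string."""
    user_turn = record["instruction"].strip()
    # Alpaca records may carry optional context in a separate "input" field.
    extra = (record.get("input") or "").strip()
    if extra:
        user_turn = f"{user_turn}\n\n{extra}"
    return (
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        f"<|im_start|>assistant\n{record['output'].strip()}<|im_end|>\n"
    )


# Made-up record for illustration; not taken from the dataset.
example = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "",
    "output": "def square(x):\n    return x * x",
}

text = alpaca_to_chatml(example)
print(text)
```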
## 🎯 Intended Use
This model is designed to assist software engineers, data scientists, and quantitative analysts with:
- Generating Python scripts from natural language prompts.
- Solving complex algorithmic problems.
- Writing data engineering and mathematical logic code.
---
## 🚀 Quick Start: How to Use
You can run this model locally or on a cloud server, either through the standard Hugging Face `transformers` library or through **Ollama** for local inference.
### Option A: Local Deployment via Ollama (Recommended for Speed)
Once the files are downloaded, Ollama runs the model entirely on your local machine, with no internet connection required.
**Step 1: Download the Model Files**
First, download the safetensors weights to a local directory:
```bash
pip install -U huggingface_hub
huggingface-cli download karim0010/Qwen2.5-Coder-1.5B-python-MyTune --local-dir ./my_qwen_model
```
**Step 2: Create a `Modelfile`**
In the directory where you ran the download command (the parent of `my_qwen_model`), create a file named `Modelfile` (no extension) and paste the following ChatML configuration:
```dockerfile
FROM ./my_qwen_model
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.3
PARAMETER top_p 0.9
```
**Step 3: Build and Run**
Build the model in Ollama and start chatting:
```bash
ollama create karim-coder -f ./Modelfile
ollama run karim-coder
```
*Now you can ask it to write Python code right in your terminal!*
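
If you would rather call the Ollama model from a script than from the terminal, the sketch below posts a prompt to Ollama's local REST endpoint using only the standard library. It assumes the Ollama server is running on its default port (11434) and that the model was created under the name `karim-coder` as in Step 3; the helper names `build_chat_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "karim-coder") -> urllib.request.Request:
    """Build a POST request carrying a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `print(ask("Write a Python function that reverses a string."))` should print the model's reply.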
---
### Option B: Python Inference (Hugging Face Transformers)
If you prefer integrating the model directly into your Python pipeline, use the following code.
**1. Install Dependencies**
```bash
pip install transformers torch accelerate
```
**2. Inference Script**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define the repository
model_id = "karim0010/Qwen2.5-Coder-1.5B-python-MyTune"
# Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
# Prepare the prompt using the ChatML template
instruction = "Write a complete and clean Python function to calculate the Fibonacci sequence up to a given number 'n'."
prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"
# Tokenize inputs
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generate code
print("Generating code...")
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    temperature=0.3,  # Low temperature is recommended for accurate coding
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode and print the result
response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
print("\n--- Output ---")
print(response.strip())
```
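
Instruction-tuned models often wrap generated code in a Markdown fence with some explanatory text around it. Assuming this model does the same, the following sketch pulls the first fenced block out of a decoded response and checks that it parses as Python. The helper names and the sample string are hypothetical, and the fence marker is built at runtime only so that this example can itself sit inside a fenced block.

```python
import ast
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built without typing it literally
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(response: str) -> str:
    """Return the first fenced code block, or the whole text if there is none."""
    match = CODE_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def is_valid_python(code: str) -> bool:
    """Check syntax only; the code is parsed, never executed."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


# Made-up response text for illustration.
sample = (
    f"Here is the function:\n{FENCE}python\n"
    "def fib(n):\n    return n\n"
    f"{FENCE}\nHope it helps."
)

code = extract_python(sample)
print(code)
print(is_valid_python(code))
```

A syntax check like this only catches malformed output; it says nothing about whether the generated code is correct, so review or test it before use.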