Upload tokenizer

a60a942 verified 9 days ago

4.58 kB

	---
	language:
	- en
	- code
	tags:
	- python
	- text-generation
	- qwen
	- qlora
	- custom-finetune
	- code
	- ollama
	datasets:
	- iamtarun/python_code_instructions_18k_alpaca
	base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
	---

	# 🤖 Qwen2.5-Coder-1.5B-python-MyTune

	Fine-tuned with ❤️ by Karim

	Welcome to Qwen2.5-Coder-1.5B-python-MyTune! This is a highly optimized, fine-tuned version of `Qwen/Qwen2.5-Coder-1.5B-Instruct`, specifically engineered to understand complex algorithmic instructions and generate clean, efficient, and highly accurate Python code.

	## 📌 Model Overview

	The training architecture utilized the QLoRA (Quantized Low-Rank Adaptation) method. This approach ensures high parameter efficiency, allowing the model to acquire advanced coding skills while preserving the robust logical reasoning capabilities of the original base weights.

	- Base Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
	- Language: English / Python
	- Training Method: PEFT / QLoRA Integration
	- Precision: Mixed Precision (4-bit Base + float16 Adapters)
	- Compute: Google Colab T4 GPU (16GB VRAM)

	## 📊 Training Data

	The model was fine-tuned on a carefully curated subset of the [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) dataset. This dataset provides high-quality Python coding instructions, algorithmic challenges, and their corresponding structured solutions.

	## 🎯 Intended Use

	This model is designed to assist software engineers, data scientists, and quantitative analysts with:
	- Generating Python scripts from natural language prompts.
	- Solving complex algorithmic problems.
	- Writing data engineering and mathematical logic code.

	---

	## 🚀 Quick Start: How to Use

	You can easily load and run this model locally or on a cloud server using either the standard Hugging Face `transformers` library, or deploy it instantly using Ollama for local inference.

	### Option A: Local Deployment via Ollama (Recommended for Speed)

	Run this model entirely on your local machine without internet connection using Ollama!

	Step 1: Download the Model Files
	First, download the safetensors weights to a local directory:
	```bash
	pip install -U huggingface_hub
	huggingface-cli download karim0010/Qwen2.5-Coder-1.5B-python-MyTune --local-dir ./my_qwen_model

	```

	Step 2: Create a `Modelfile`
	In the same folder, create a file named `Modelfile` (no extension) and paste the following ChatML configuration:

	```dockerfile
	FROM ./my_qwen_model

	TEMPLATE """{{ if .System }}<\|im_start\|>system
	{{ .System }}<\|im_end\|>
	{{ end }}{{ if .Prompt }}<\|im_start\|>user
	{{ .Prompt }}<\|im_end\|>
	{{ end }}<\|im_start\|>assistant
	"""

	PARAMETER stop "<\|im_start\|>"
	PARAMETER stop "<\|im_end\|>"
	PARAMETER temperature 0.3
	PARAMETER top_p 0.9

	```

	Step 3: Compile and Run
	Build the model in Ollama and start chatting:

	```bash
	ollama create karim-coder -f ./Modelfile
	ollama run karim-coder

	```

	Now you can ask it to write Python code right in your terminal!

	---

	### Option B: Python Inference (Hugging Face Transformers)

	If you prefer integrating the model directly into your Python pipeline, use the following code.

	1. Install Dependencies

	```bash
	pip install transformers torch accelerate

	```

	2. Inference Script

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Define the repository
	model_id = "karim0010/Qwen2.5-Coder-1.5B-python-MyTune"

	# Load Tokenizer and Model
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	torch_dtype=torch.float16,
	device_map="auto",
	trust_remote_code=True
	)

	# Prepare the prompt using the ChatML template
	instruction = "Write a complete and clean Python function to calculate the Fibonacci sequence up to a given number 'n'."
	prompt = f"<\|im_start\|>user\n{instruction}<\|im_end\|>\n<\|im_start\|>assistant\n"

	# Tokenize inputs
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	# Generate code
	print("Generating code...")
	outputs = model.generate(
	inputs["input_ids"],
	attention_mask=inputs["attention_mask"],
	max_new_tokens=256,
	temperature=0.3, # Low temperature is recommended for accurate coding
	top_p=0.9,
	do_sample=True,
	pad_token_id=tokenizer.eos_token_id
	)

	# Decode and print the result
	response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
	print("\n--- Output ---")
	print(response.strip())

	```