---
language:
- en
- code
tags:
- python
- text-generation
- qwen
- qlora
- custom-finetune
- code
- ollama
datasets:
- iamtarun/python_code_instructions_18k_alpaca
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
---

# 🤖 Qwen2.5-Coder-1.5B-python-MyTune

**Fine-tuned with ❤️ by Karim**

Welcome to **Qwen2.5-Coder-1.5B-python-MyTune**! This is a fine-tuned version of `Qwen/Qwen2.5-Coder-1.5B-Instruct`, trained to follow complex algorithmic instructions and generate clean, efficient **Python** code.

## 📌 Model Overview

The model was trained with **QLoRA** (Quantized Low-Rank Adaptation): the base weights are quantized to 4-bit and kept frozen, and only small low-rank adapter matrices are trained. Because only a small fraction of the parameters is updated, the model can pick up additional coding skills while largely preserving the reasoning capabilities of the original base weights.

- **Base Model:** Qwen/Qwen2.5-Coder-1.5B-Instruct
- **Language:** English / Python
- **Training Method:** PEFT / QLoRA
- **Precision:** Mixed Precision (4-bit Base + float16 Adapters)
- **Compute:** Google Colab T4 GPU (16 GB VRAM)

## 📊 Training Data

The model was fine-tuned on a curated subset of the [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) dataset. This dataset provides Python coding instructions and algorithmic challenges paired with structured solutions.

## 🎯 Intended Use

This model is designed to assist software engineers, data scientists, and quantitative analysts with:

- Generating Python scripts from natural language prompts.
- Solving complex algorithmic problems.
- Writing data engineering and mathematical logic code.

---

## 🚀 Quick Start: How to Use

You can run this model with **Ollama** for local inference, or load it with the standard Hugging Face `transformers` library, either locally or on a cloud server.

### Option A: Local Deployment via Ollama (Recommended for Speed)

Once the model files are downloaded, Ollama runs the model entirely on your local machine, with no internet connection required.

**Step 1: Download the Model Files**

First, download the safetensors weights to a local directory:

```bash
pip install -U huggingface_hub
huggingface-cli download karim0010/Qwen2.5-Coder-1.5B-python-MyTune --local-dir ./my_qwen_model
```

**Step 2: Create a `Modelfile`**

In the directory where you ran the download (the one that contains `my_qwen_model`), create a file named `Modelfile` (no extension) and paste the following ChatML configuration:

```dockerfile
FROM ./my_qwen_model

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.3
PARAMETER top_p 0.9
```

**Step 3: Build and Run**

Build the model in Ollama and start chatting:

```bash
ollama create karim-coder -f ./Modelfile
ollama run karim-coder
```

*Now you can ask it to write Python code right in your terminal!*

---

### Option B: Python Inference (Hugging Face Transformers)

If you prefer to integrate the model directly into your Python pipeline, use the following code.

**1. Install Dependencies**

```bash
pip install transformers torch accelerate
```

**2. Inference Script**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the repository
model_id = "karim0010/Qwen2.5-Coder-1.5B-python-MyTune"

# Load the tokenizer and the model (float16 weights, placed automatically on the available device)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare the prompt using the ChatML template
instruction = "Write a complete and clean Python function to calculate the Fibonacci sequence up to a given number 'n'."
prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate code
print("Generating code...")
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    temperature=0.3,  # Low temperature is recommended for accurate coding
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens (skip the prompt) and print the result
response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
print("\n--- Output ---")
print(response.strip())
```
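
**3. Optional: Building the ChatML Prompt with a Helper**

Both options rely on the same ChatML layout: an optional system block, a user block, and an open assistant turn. The sketch below shows one way to assemble that string in plain Python. The function name `build_chatml_prompt` and its `system` parameter are hypothetical and not part of this repository; they only mirror the Ollama `TEMPLATE` from Option A.

```python
def build_chatml_prompt(instruction, system=None):
    """Assemble a ChatML prompt string for a single user turn.

    Hypothetical helper (not shipped with the model): it mirrors the
    Ollama TEMPLATE, emitting the system block only when one is given.
    """
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{instruction}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the answer.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Example: a prompt without a system message
prompt = build_chatml_prompt("Write a Python function that reverses a string.")
print(prompt)
```

The returned string can be passed to `tokenizer(...)` in place of the hand-written `prompt` in the inference script.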