| --- |
| language: |
| - en |
| - code |
| tags: |
| - python |
| - text-generation |
| - qwen |
| - qlora |
| - custom-finetune |
| - code |
| - ollama |
| datasets: |
| - iamtarun/python_code_instructions_18k_alpaca |
| base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct |
| --- |
| |
| # Qwen2.5-Coder-1.5B-python-MyTune |
|
|
| **Fine-tuned with ❤️ by Karim** |
|
|
| Welcome to **Qwen2.5-Coder-1.5B-python-MyTune**! This is a fine-tuned version of `Qwen/Qwen2.5-Coder-1.5B-Instruct`, trained to follow algorithmic instructions and generate clean, efficient **Python** code. |
|
|
| ## Model Overview |
|
|
| The model was trained with **QLoRA** (Quantized Low-Rank Adaptation): the base weights are quantized to 4-bit and kept frozen, and only small low-rank adapter matrices are trained. This keeps the number of trainable parameters low, which makes fine-tuning feasible on a single T4 GPU, and is intended to add coding skill while preserving the reasoning capabilities of the original base weights. |
|
|
| - **Base Model:** Qwen/Qwen2.5-Coder-1.5B-Instruct |
| - **Language:** English / Python |
| - **Training Method:** PEFT / QLoRA Integration |
| - **Precision:** Mixed Precision (4-bit Base + float16 Adapters) |
| - **Compute:** Google Colab T4 GPU (16GB VRAM) |
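
To make the parameter-efficiency point concrete, here is a rough back-of-the-envelope sketch of how many trainable weights a LoRA adapter adds to a single linear layer. The rank of 16 and the 1536 x 1536 layer shape are illustrative assumptions, not values confirmed by this card.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one d_out x d_in linear layer.

    LoRA learns the weight update as B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank); the original matrix stays frozen.
    """
    return rank * d_in + d_out * rank


# Illustrative values only: the layer shape and rank are assumptions.
d_in, d_out, rank = 1536, 1536, 16
frozen = d_in * d_out
adapter = lora_param_count(d_in, d_out, rank)
print(f"frozen weights:  {frozen:,}")
print(f"adapter weights: {adapter:,} ({adapter / frozen:.2%} of the layer)")
```

Because the adapter size grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, the trainable share stays small even for wide layers.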
|
|
| ## Training Data |
|
|
| The model was fine-tuned on a carefully curated subset of the [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) dataset. This dataset provides high-quality Python coding instructions, algorithmic challenges, and their corresponding structured solutions. |
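
For illustration, one way an Alpaca-style record could be rendered into the ChatML format used by the base model is sketched below. The field names `instruction`, `input`, and `output` follow the usual Alpaca layout. The exact preprocessing used for this fine-tune is not documented here, so treat `to_chatml` as a hypothetical helper.

```python
def to_chatml(record: dict) -> str:
    """Render one Alpaca-style record as a ChatML string (hypothetical helper)."""
    user_turn = record["instruction"].strip()
    extra = (record.get("input") or "").strip()
    if extra:
        # Alpaca records may carry optional context in the "input" field.
        user_turn = f"{user_turn}\n\n{extra}"
    return (
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        f"<|im_start|>assistant\n{record['output'].strip()}<|im_end|>\n"
    )


example = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "",
    "output": "def square(x):\n    return x * x",
}
print(to_chatml(example))
```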
|
|
| ## Intended Use |
|
|
| This model is designed to assist software engineers, data scientists, and quantitative analysts with: |
| - Generating Python scripts from natural language prompts. |
| - Solving complex algorithmic problems. |
| - Writing data engineering and mathematical logic code. |
|
|
| --- |
|
|
| ## Quick Start: How to Use |
|
|
| You can run this model locally or on a cloud server in two ways: deploy it with **Ollama** for local inference, or load it with the standard Hugging Face `transformers` library. |
|
|
| ### Option A: Local Deployment via Ollama (Recommended for Speed) |
|
|
| After the one-time download in Step 1, the model runs entirely on your local machine through Ollama, with no internet connection required. |
|
|
| **Step 1: Download the Model Files** |
| First, download the safetensors weights to a local directory: |
| ```bash |
| pip install -U huggingface_hub |
| huggingface-cli download karim0010/Qwen2.5-Coder-1.5B-python-MyTune --local-dir ./my_qwen_model |
| |
| ``` |
|
|
| **Step 2: Create a `Modelfile`** |
| In the directory that contains `my_qwen_model` (the one where you ran the download command), create a file named `Modelfile` (no extension) and paste the following ChatML configuration: |
|
|
| ```dockerfile |
| FROM ./my_qwen_model |
| |
| TEMPLATE """{{ if .System }}<|im_start|>system |
| {{ .System }}<|im_end|> |
| {{ end }}{{ if .Prompt }}<|im_start|>user |
| {{ .Prompt }}<|im_end|> |
| {{ end }}<|im_start|>assistant |
| """ |
| |
| PARAMETER stop "<|im_start|>" |
| PARAMETER stop "<|im_end|>" |
| PARAMETER temperature 0.3 |
| PARAMETER top_p 0.9 |
| |
| ``` |
|
|
| **Step 3: Compile and Run** |
| Build the model in Ollama and start chatting: |
|
|
| ```bash |
| ollama create karim-coder -f ./Modelfile |
| ollama run karim-coder |
| |
| ``` |
|
|
| *Now you can ask it to write Python code right in your terminal!* |
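
If you would rather call the local model from a script than from the interactive prompt, Ollama also serves an HTTP API. The sketch below assumes Ollama's default address `http://localhost:11434`, the `karim-coder` name from Step 3, and a running Ollama server. `build_payload` and `generate` are illustrative helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(prompt: str, model: str = "karim-coder") -> dict:
    """Request body for a single, non-streamed completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "karim-coder") -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


# Example call (requires a running Ollama server):
# print(generate("Write a Python function that reverses a string."))
```

Setting `"stream": False` asks for one JSON object with the full answer instead of a stream of partial chunks, which keeps the client code short.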
|
|
| --- |
|
|
| ### Option B: Python Inference (Hugging Face Transformers) |
|
|
| If you prefer integrating the model directly into your Python pipeline, use the following code. |
|
|
| **1. Install Dependencies** |
|
|
| ```bash |
| pip install transformers torch accelerate |
| |
| ``` |
|
|
| **2. Inference Script** |
|
|
| ```python |
| import torch |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # Define the repository |
| model_id = "karim0010/Qwen2.5-Coder-1.5B-python-MyTune" |
| |
| # Load Tokenizer and Model |
| tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True) |
| model = AutoModelForCausalLM.from_pretrained( |
|     model_id, |
|     torch_dtype=torch.float16, |
|     device_map="auto", |
|     trust_remote_code=True |
| ) |
| |
| # Prepare the prompt using the ChatML template |
| instruction = "Write a complete and clean Python function to calculate the Fibonacci sequence up to a given number 'n'." |
| prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n" |
| |
| # Tokenize inputs |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| |
| # Generate code |
| print("Generating code...") |
| outputs = model.generate( |
|     inputs["input_ids"], |
|     attention_mask=inputs["attention_mask"], |
|     max_new_tokens=256, |
|     temperature=0.3,  # Low temperature is recommended for accurate coding |
|     top_p=0.9, |
|     do_sample=True, |
|     pad_token_id=tokenizer.eos_token_id |
| ) |
| |
| # Decode and print the result |
| response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True) |
| print("\n--- Output ---") |
| print(response.strip()) |
| |
| ``` |
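
Instruction-tuned code models often wrap their answer in a Markdown code fence with some explanation around it. Assuming that is the case for the `response` string produced by the script, a small hypothetical helper such as `extract_code` can pull out just the Python source. It falls back to the full text when no fence is found.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built from single backticks
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the first fenced code block in `response`, or the whole text if none."""
    match = CODE_BLOCK.search(response)
    return match.group(1).strip() if match else response.strip()


# A made-up reply, used only to demonstrate the helper.
sample = (
    f"Here is the function:\n{FENCE}python\n"
    "def add(a, b):\n    return a + b\n"
    f"{FENCE}\nHope this helps."
)
print(extract_code(sample))
```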
|
|