---
library_name: transformers
tags:
- qwen
- code
- text-generation
- fine-tuned
---

# Model Card for qwen2.5-coder-ft

This model is a fine-tuned and merged version of Qwen2.5-Coder-1.5B-Instruct, specialized in Python programming and precise code generation.

## Model Details

### Model Description

This model was fine-tuned with Low-Rank Adaptation (LoRA), and the adapter was then merged into the base model and saved as full 16-bit weights. It is tuned to behave as a code-only assistant that returns programming solutions with minimal conversational text.

- **Developed by:** Soulama Haicanama Ismael
- **Model type:** Causal Language Model (Transformer Architecture)
- **Language(s) (NLP):** English, Python
- **License:** Apache 2.0 (inherited from Qwen base model)
- **Finetuned from model:** Qwen/Qwen2.5-Coder-1.5B-Instruct

### Model Sources

- **Repository:** SOULAMA/qwen2.5-coder-ft

## Uses

### Direct Use

This model is intended for direct code generation and for answering programming questions. It expects prompts in the ChatML chat-template format, with a system prompt that instructs it to return only Python code.
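Downstream scripts often need only the code from a reply. The sketch below is one way to pull it out, assuming the model may wrap its answer in a Markdown fence (bare code is passed through unchanged). `extract_python_code` is a hypothetical helper written for illustration, not part of this repository.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example easy to embed

# Matches a fence optionally tagged "python" or "py" and captures its body.
_FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)

def extract_python_code(response: str) -> str:
    """Return the body of the first fenced block, or the stripped text if there is none."""
    match = _FENCE_RE.search(response)
    if match:
        return match.group(1).strip()
    return response.strip()

# Hypothetical model reply used only to exercise the helper.
reply = FENCE + "python\ndef add(c, d):\n    return c + d\n" + FENCE
print(extract_python_code(reply))
```

The fallback branch matters because a code-only system prompt can lead the model to skip the fence entirely.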

### Out-of-Scope Use

The model should not be used for general non-coding tasks such as creative writing, open-ended chat, or translation. Fine-tuning focused on code structure and programming vocabulary, so quality on other tasks may be degraded.

## Bias, Risks, and Limitations

Because of its small size (1.5B parameters), the model can fall into repetitive output loops if stopping criteria are not configured explicitly at inference time. Generation scripts should pass the `<|im_end|>` stop token as `eos_token_id` so that generation ends at the close of the assistant turn.
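As a second line of defense, decoded text can be cut at the first stop marker. This is a minimal sketch in plain Python; `truncate_at_stop` is a hypothetical helper, and the default marker list is an assumption based on the ChatML tokens used in this card.

```python
STOP_MARKERS = ("<|im_end|>", "<|im_start|>")

def truncate_at_stop(text: str, stop_markers=STOP_MARKERS) -> str:
    """Cut `text` at the earliest occurrence of any stop marker."""
    cut = len(text)
    for marker in stop_markers:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()

# Hypothetical raw output that ran past the end of the assistant turn.
raw = "def add(c, d):\n    return c + d\n<|im_end|>\n<|im_start|>user\nmore text"
print(truncate_at_stop(raw))
```

This only has an effect when the output is decoded with `skip_special_tokens=False`; otherwise the markers are already removed from the text.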

### Recommendations

Use a low generation temperature (0.2 or lower) and clear, self-contained system instructions to keep the generated code consistent across runs.
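To illustrate why a low temperature makes outputs more repeatable, the sketch below applies temperature scaling to a toy set of logits. The logit values are made up for illustration and do not come from this model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

toy_logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens

for temperature in (1.0, 0.2):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

With these toy values, the top token's probability rises from roughly 0.55 at temperature 1.0 to roughly 0.92 at temperature 0.2, so sampling picks the same token far more often.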

## How to Get Started with the Model

Use the code below to get started with the model. It builds a ChatML prompt and sets an explicit stop token for generation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "SOULAMA/qwen2.5-coder-ft"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # places the model on a GPU when one is available
)

question = "Write a Python function that takes two values c and d and returns c+d."

def build_prompt(question: str) -> str:
    # ChatML prompt. The system prompt is in French; in English it reads:
    # "You are a programming expert. Write only the Python code that solves the problem."
    return (
        "<|im_start|>system\n"
        "Tu es un expert en programmation. Écris uniquement le code Python qui résout le problème.\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{question}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt(question)

# The prompt already ends with the assistant header, so plain tokenization is enough.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,  # unpack input_ids and attention_mask
        max_new_tokens=256,
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.1,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
new_tokens = output_ids[0][prompt_length:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## Training Details

### Training Data

The model was trained on a custom instruction dataset containing coding exercises, software engineering questions, and structured Python scripts.

### Training Procedure

#### Preprocessing

Prompts were structured in the Qwen ChatML format, split into `<|im_start|>system`, `<|im_start|>user`, and `<|im_start|>assistant` segments, to stay consistent with the base model's instruct template.
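The layout described above can be sketched as a small formatter. `to_chatml` is a hypothetical helper written for illustration; the exact whitespace used when building the training set is not documented here, so treat the newline placement as an assumption.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

example = to_chatml([
    {"role": "system", "content": "Write only the Python code that solves the problem."},
    {"role": "user", "content": "Write a function that returns c + d."},
])
print(example)
```

In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the same kind of prompt from the template stored with the tokenizer, which avoids hand-written formatting mistakes.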

#### Training Hyperparameters

* **Training regime:** PEFT (LoRA), followed by `merge_and_unload()` to fold the adapter into the base weights, saved in float16.
* **Base model precision:** 4-bit quantized (BitsAndBytes) during training.
* **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
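A setup along these lines could look like the sketch below. Only the target modules, the 4-bit loading, and the float16 merge come from this card; the rank, alpha, and quantization type are hypothetical placeholders, and the function names are illustrative. Imports are kept inside the functions so the snippet can be read without the training stack installed.

```python
# Module names listed in this card.
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

def build_training_configs(rank=16, alpha=32):
    """Return (quantization config, LoRA config). rank and alpha are hypothetical values."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumption: not stated in this card
        bnb_4bit_compute_dtype=torch.float16,  # the T4 lacks native bfloat16 support
    )
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        target_modules=TARGET_MODULES,
        task_type="CAUSAL_LM",
    )
    return quant_config, lora_config

def merge_adapter(base_id, adapter_dir, output_dir):
    """Load the base model in float16, fold the adapter in, and save merged weights."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(output_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
```

Reloading the base model in float16 before merging, rather than merging into the 4-bit weights, is one common approach and is assumed here; the card does not say which route was taken.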

#### Speeds, Sizes, Times

* **Checkpoint size:** ~3.09 GB (Full Safetensors model)
* **Adaptation layer size:** ~73.9 MB (LoRA Weights)
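These sizes can be sanity-checked with some arithmetic. The sketch below counts LoRA parameters for the seven target modules, assuming the published Qwen2.5-1.5B dimensions, a LoRA rank of 16, and float32 adapter storage. None of these three assumptions is stated in this card.

```python
# Assumed Qwen2.5-1.5B dimensions (from the published base-model config, not from this card).
HIDDEN = 1536
INTERMEDIATE = 8960
NUM_LAYERS = 28
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128

# (input features, output features) of each targeted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_parameter_count(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to every targeted projection."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in PROJECTIONS.values())
    return per_layer * NUM_LAYERS

params = lora_parameter_count(rank=16)  # rank 16 is an assumption
size_mb = params * 4 / 1e6              # float32 storage: 4 bytes per parameter
print(params, round(size_mb, 2))
```

Under these assumptions the estimate comes to about 73.9 MB, which is consistent with the adapter size listed above, although it does not prove the rank that was used. Similarly, roughly 1.54 billion parameters at 2 bytes each come to about 3.1 GB, in line with the float16 checkpoint size.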

## Technical Specifications

### Model Architecture and Objective

Based on the dense, decoder-only Qwen2.5-Coder Transformer architecture, which uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). The training objective is causal language modeling (next-token prediction).
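One practical effect of GQA is a smaller key/value cache at inference time. The sketch below compares cache sizes for a single sequence, assuming the published Qwen2.5-1.5B layout (28 layers, 12 query heads, 2 key/value heads, head dimension 128) and float16 cache entries. These figures are assumptions and do not come from this card.

```python
def kv_cache_bytes(seq_len, num_kv_heads, head_dim=128, num_layers=28, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    keys_and_values = 2  # one tensor for keys, one for values
    return keys_and_values * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# GQA: 2 key/value heads shared across the 12 query heads.
gqa_bytes = kv_cache_bytes(seq_len=4096, num_kv_heads=2)
# Hypothetical multi-head baseline: one key/value head per query head.
mha_bytes = kv_cache_bytes(seq_len=4096, num_kv_heads=12)

print(gqa_bytes / 2**20, mha_bytes / 2**20)  # sizes in MiB
```

At 4,096 tokens this works out to 112 MiB with GQA versus 672 MiB for the multi-head baseline, a sixfold reduction that helps on memory-limited GPUs.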

### Compute Infrastructure

#### Hardware

* **GPU Type:** 1 x NVIDIA Tesla T4 (via Google Colab)

#### Software

* **Libraries:** PyTorch, Transformers, PEFT, BitsAndBytes, TRL.

## Model Card Authors

Soulama Haicanama Ismael

## Model Card Contact

[More Information Needed]