---
library_name: transformers
tags:
- qwen
- code
- text-generation
- fine-tuned
---

# Model Card for qwen2.5-coder-ft

This model is a fine-tuned and merged version of Qwen2.5-Coder-1.5B-Instruct, specialized in Python programming and precise code generation.

## Model Details

### Model Description

This model was fine-tuned with Low-Rank Adaptation (LoRA), and the adapter was then merged into the base weights at full 16-bit precision. It is optimized to act as a strict code assistant, delivering accurate programming solutions while minimizing conversational overhead.

- **Developed by:** Soulama Haicanama Ismael
- **Model type:** Causal Language Model (Transformer Architecture)
- **Language(s) (NLP):** English, Python
- **License:** Apache 2.0 (inherited from Qwen base model)
- **Finetuned from model:** Qwen/Qwen2.5-Coder-1.5B-Instruct

### Model Sources

- **Repository:** SOULAMA/qwen2.5-coder-ft

## Uses

### Direct Use

This model is intended for direct code generation and for answering programming questions. It is designed to be used with a chat template and a specific system prompt that asks for Python code only, so that the output can be isolated as a Python code block.

### Out-of-Scope Use

The model should not be used for generic non-coding tasks (such as writing creative essays, general chat, or translation), because fine-tuning shifted its weights toward code structure and programming vocabulary.

## Bias, Risks, and Limitations

At 1.5B parameters, the model can fall into repetition loops if the stopping criteria are not configured explicitly during inference. Generation scripts must treat `<|im_end|>` as a stop token so that output ends with the assistant turn.

### Recommendations

Use a low generation temperature (0.2 or below) and provide clear, standalone system instructions to keep code output as close to deterministic as possible.
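The stop-token handling described under Bias, Risks, and Limitations can also be backed up by a post-processing step on the decoded text. The following is a minimal sketch, not part of the author's pipeline: `clean_completion`, `STOP_TOKEN`, and `CODE_BLOCK` are hypothetical names introduced here, and the sketch assumes the raw completion may still contain `<|im_end|>` followed by repeated turns.

```python
import re

STOP_TOKEN = "<|im_end|>"  # ChatML end-of-turn marker named in this card

# Matches a fenced block, optionally tagged "python" or "py".
# `{3} is used so this example contains no literal code fence.
CODE_BLOCK = re.compile(r"`{3}(?:python|py)?[ \t]*\n(.*?)`{3}", re.DOTALL)


def clean_completion(raw: str, stop_token: str = STOP_TOKEN) -> str:
    """Cut text at the first stop token, then return the first fenced block if any."""
    text = raw.split(stop_token, 1)[0]
    match = CODE_BLOCK.search(text)
    return (match.group(1) if match else text).strip()


# Hypothetical raw output: a fenced answer followed by a runaway extra turn.
fence = "`" * 3
raw = (
    f"{fence}python\ndef add(c, d):\n    return c + d\n{fence}\n"
    "<|im_end|>\n<|im_start|>user\nrepeat"
)
print(clean_completion(raw))
```

This only trims text after generation; it does not replace configuring the stop token in `generate()`, which also saves the compute spent on the runaway tokens.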

## How to Get Started with the Model

Use the code below to get started with the model using proper generation boundaries:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "SOULAMA/qwen2.5-coder-ft"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

question = "Write a Python function that takes two values c and d and returns c+d."


def build_prompt(question: str) -> str:
    """Return a ChatML prompt that ends with the open assistant turn."""
    return (
        "<|im_start|>system\n"
        # System prompt kept in French as written. In English: "You are a
        # programming expert. Write only the Python code that solves the problem."
        "Tu es un expert en programmation. Écris uniquement le code Python qui résout le problème.\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{question}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt(question)
# The prompt already ends with the assistant header, so the tokenizer only
# needs to convert it to tensors on the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop at the ChatML end-of-turn token.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,  # temperature only applies when sampling is enabled
        temperature=0.1,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=im_end_id,
    )

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was trained on a custom instruction dataset containing coding exercises, software engineering questions, and structured Python scripts.

### Training Procedure

#### Preprocessing

Prompts were structured using the Qwen ChatML format, split into `<|im_start|>system`, `<|im_start|>user`, and `<|im_start|>assistant` segments so that training inputs match the chat template of the original instruct model.

#### Training Hyperparameters

* **Training regime:** PEFT (LoRA), followed by `merge_and_unload()` to fold the adapter into the full weight matrices in float16 precision.
* **Base model precision:** 4-bit quantized base model during training (BitsAndBytes).
* **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
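
As a rough plausibility check on the target modules listed above, the number of LoRA parameters can be estimated from the layer shapes. This is a sketch under stated assumptions: the dimensions below are taken from the published Qwen2.5-1.5B configuration rather than from this card, the LoRA rank is not stated anywhere in this card, and `lora_param_count` and `adapter_megabytes` are hypothetical helper names.

```python
# Assumed dimensions from the published Qwen2.5-1.5B configuration
# (not stated in this card).
HIDDEN = 1536
INTERMEDIATE = 8960
LAYERS = 28
KV_DIM = 2 * 128  # 2 key/value heads x head dim 128 (grouped-query attention)

# (in_features, out_features) for each target module listed in this card
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for every adapted matrix."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGET_SHAPES.values())
    return per_layer * LAYERS


def adapter_megabytes(rank: int, bytes_per_param: int = 4) -> float:
    """Adapter size in decimal megabytes, assuming float32 storage by default."""
    return lora_param_count(rank) * bytes_per_param / 1e6


# The ranks tried here are guesses, since the card does not report the rank.
for rank in (8, 16, 32):
    print(rank, lora_param_count(rank), round(adapter_megabytes(rank), 1))
```

Under these assumptions, rank 16 stored in float32 comes to about 73.9 MB, which is consistent with the adapter size reported under Speeds, Sizes, Times. That is an inference from file size only, not a confirmed hyperparameter.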

#### Speeds, Sizes, Times

* **Checkpoint size:** ~3.09 GB (full Safetensors model)
* **Adaptation layer size:** ~73.9 MB (LoRA weights)

## Technical Specifications

### Model Architecture and Objective

Based on the Qwen2.5-Coder dense architecture with Grouped-Query Attention (GQA) and RoPE (Rotary Position Embedding), optimized for dense source code token sequences.

### Compute Infrastructure

#### Hardware

* **GPU Type:** 1 x NVIDIA Tesla T4 (via Google Colab)

#### Software

* **Libraries:** PyTorch, Transformers, PEFT, BitsAndBytes, TRL.

## Model Card Authors

Soulama Haicanama Ismael

## Model Card Contact

[More Information Needed]