| --- |
| license: mit |
| language: |
| - en |
| tags: |
| - code-llm |
| - qwen |
| - sft |
| - dpo |
| - peft |
| - lora |
| metrics: |
| - accuracy |
| base_model: |
| - Qwen/Qwen2.5-Coder-7B |
| library_name: peft |
| --- |
| |
| # Code-Centric-Align: A Post-Training Pipeline for Code LLMs (LoRA Adapter) |
|
|
| **Notice:** This repository provides a **LoRA Adapter** trained via QLoRA. It is designed to be loaded on top of the base model `Qwen/Qwen2.5-Coder-7B`. |
|
|
| This project presents a systematic study of the post-training engineering pipeline for code-specific large language models. It establishes a "diagnosable and iterative" framework covering the full lifecycle from data engineering to deployment. |
|
|
| ## π Quick Start (Inference Example) |
|
|
| To use this LoRA adapter, you need to load the base model first and then attach the PEFT adapter. Ensure you have the required libraries installed: |
| ```bash |
| pip install transformers peft torch |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| from peft import PeftModel |
| import torch |
| |
| base_model_id = "Qwen/Qwen2.5-Coder-7B" |
| adapter_id = "abcsk123/Code-Centric-Align" |
| |
| # 1. Load Tokenizer |
| tokenizer = AutoTokenizer.from_pretrained(base_model_id) |
| |
| # 2. Load Base Model |
| base_model = AutoModelForCausalLM.from_pretrained( |
| base_model_id, |
| torch_dtype=torch.bfloat16, |
| device_map="auto" |
| ) |
| |
| # 3. Attach LoRA Adapter |
| model = PeftModel.from_pretrained(base_model, adapter_id) |
| |
| # 4. Generate Code |
| prompt = "def binary_search(arr, target):" |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| |
| outputs = model.generate(**inputs, max_new_tokens=100) |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |
| (Note: If your adapter files are located inside a specific checkpoint folder, e.g., checkpoint-4675, please add the argument subfolder="checkpoint-4675" to PeftModel.from_pretrained()) |
| |
| |
| ## π οΈ Core Workflow |
| - Data Engineering: Implemented streaming collection, three-layer quality filtering, and MinHashLSH-based fuzzy deduplication. |
| - Instruction Evolution: Utilized DeepSeek APIs for Evol-Instruct difficulty enhancement and diversity expansion. |
| - Supervised Fine-Tuning (SFT): Applied QLoRA with a custom Instruction Masking strategy (QwenDataCollator) to ensure the model only learns from assistant responses. |
| - Rejection Sampling (RFT): Developed a high-throughput engine using vLLM for 10-path sampling, verified through a multi-process safe execution sandbox. |
| - Preference Alignment (DPO): Investigated Direct Preference Optimization, identifying critical failure modes such as length bias and low-quality negative samples. |
| - Quantization & Deployment: Performed 4-bit activation-aware quantization (AutoAWQ) and deployed the model via a vLLM OpenAI-compatible API. |
| |
| ## π Experimental Results (HumanEval Pass@1) |
| The project tracked performance gains and losses across multiple iterations: |
| - Base Model: 0.628 |
| - **SFT v3 (released): 0.671 (+6.8%)** β achieved through precise loss calculation and data cleaning. |
| - DPO Merged: 0.280 β highlighting the extreme sensitivity of code models to preference data quality. |
| |
| ## β οΈ Status & Roadmap |
| This project is actively under development. Currently, the DPO alignment exhibits performance regression (Pass@1 < 0.628) due to preference data sensitivity. We are investigating advanced filtering and reward modeling to resolve this. Optimized weights will be uploaded as soon as the alignment bottleneck is cleared. |