---
tags:
- qwen
- sft
- dpo
- peft
- lora
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-Coder-7B
library_name: peft
---

# Code-Centric-Align: A Post-Training Pipeline for Code LLMs (LoRA Adapter)

**Notice:** This repository provides a **LoRA Adapter** trained via QLoRA. It is designed to be loaded on top of the base model `Qwen/Qwen2.5-Coder-7B`.

This project presents a systematic study of the post-training engineering pipeline for code-specific large language models. It establishes a "diagnosable and iterative" framework covering the full lifecycle from data engineering to deployment.

## 🚀 Quick Start (Inference Example)

To use this LoRA adapter, you need to load the base model first and then attach the PEFT adapter. Ensure you have the required libraries installed:

```bash
pip install transformers peft torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-Coder-7B"
adapter_id = "abcsk123/Code-Centric-Align"

# 1. Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 2. Load Base Model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# 3. Attach LoRA Adapter
model = PeftModel.from_pretrained(base_model, adapter_id)

# 4. Generate Code
prompt = "def binary_search(arr, target):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

*Note: if your adapter files are located inside a specific checkpoint folder, e.g. `checkpoint-4675`, add the argument `subfolder="checkpoint-4675"` to `PeftModel.from_pretrained()`.*
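
For example, continuing from the snippet above and assuming the checkpoint folder named in the note:

```python
# Load the adapter weights from the checkpoint-4675/ subfolder of the repo
model = PeftModel.from_pretrained(base_model, adapter_id, subfolder="checkpoint-4675")
```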
## 🛠️ Core Workflow

**Data Engineering:** Implemented streaming collection, three-layer quality filtering, and MinHashLSH-based fuzzy deduplication.
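
To make the deduplication step concrete, here is a minimal sketch using the `datasketch` library; the 5-gram shingling, `num_perm=128`, and 0.8 threshold are illustrative assumptions, not the project's recorded settings:

```python
from datasketch import MinHash, MinHashLSH

def signature(code: str, num_perm: int = 128) -> MinHash:
    """MinHash signature over character 5-gram shingles."""
    m = MinHash(num_perm=num_perm)
    for i in range(max(len(code) - 4, 1)):
        m.update(code[i:i + 5].encode("utf-8"))
    return m

samples = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b \n",  # near-duplicate (trailing space)
    "def mul(a, b):\n    return a * b\n",
]

# Keep a sample only if nothing already kept is ~80% Jaccard-similar
lsh = MinHashLSH(threshold=0.8, num_perm=128)
kept = []
for idx, sample in enumerate(samples):
    sig = signature(sample)
    if not lsh.query(sig):  # no near-duplicate among kept samples
        lsh.insert(f"doc-{idx}", sig)
        kept.append(sample)

print(f"kept {len(kept)} of {len(samples)} samples")
```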

**Instruction Evolution:** Utilized DeepSeek APIs for Evol-Instruct difficulty enhancement and diversity expansion.
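
Each evolution step reduces to one chat-completion call per seed instruction. A hedged sketch against DeepSeek's OpenAI-compatible endpoint (the prompt wording and the `deepseek-chat` model choice are assumptions, not the project's exact setup):

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API; key is a placeholder
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

EVOLVE_PROMPT = (
    "Rewrite the following coding instruction to be more challenging, "
    "e.g. by adding constraints or edge cases, while keeping it solvable:\n\n"
    "{instruction}"
)

def evolve(instruction: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": EVOLVE_PROMPT.format(instruction=instruction)}],
        temperature=0.7,
    )
    return resp.choices[0].message.content

print(evolve("Write a function that reverses a string."))
```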

**Supervised Fine-Tuning (SFT):** Applied QLoRA with a custom Instruction Masking strategy (`QwenDataCollator`) to ensure the model only learns from assistant responses.
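
The core of instruction masking is setting prompt-token labels to `-100` so the cross-entropy loss covers only assistant tokens. A simplified collator in the spirit of `QwenDataCollator` (not the project's actual implementation; the pre-tokenized batch format is an assumption):

```python
import torch

IGNORE_INDEX = -100  # positions with this label are skipped by the loss

class MaskedSFTCollator:
    """Pad a batch and mask out everything before the assistant response."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, batch):
        input_ids, labels = [], []
        for ex in batch:  # ex = {"prompt_ids": [...], "response_ids": [...]}
            ids = ex["prompt_ids"] + ex["response_ids"]
            lbl = [IGNORE_INDEX] * len(ex["prompt_ids"]) + ex["response_ids"]
            input_ids.append(torch.tensor(ids))
            labels.append(torch.tensor(lbl))
        input_ids = torch.nn.utils.rnn.pad_sequence(
            input_ids, batch_first=True, padding_value=self.tokenizer.pad_token_id
        )
        labels = torch.nn.utils.rnn.pad_sequence(
            labels, batch_first=True, padding_value=IGNORE_INDEX
        )
        return {
            "input_ids": input_ids,
            "labels": labels,
            "attention_mask": input_ids.ne(self.tokenizer.pad_token_id),
        }
```

With `transformers.Trainer`, such a collator would be passed via `data_collator=...`.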

**Rejection Sampling (RFT):** Developed a high-throughput engine using vLLM for 10-path sampling, verified through a multi-process safe execution sandbox.
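
A minimal sketch of the 10-path sampling loop with vLLM; the model path and sampling hyperparameters are illustrative, and a placeholder check stands in for the real sandbox verification:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-Coder-7B")  # or the SFT checkpoint
params = SamplingParams(n=10, temperature=0.8, top_p=0.95, max_tokens=512)

prompts = ["Write a Python function that checks whether a string is a palindrome."]
for request in llm.generate(prompts, params):
    # Each request carries n=10 candidate completions
    candidates = [c.text for c in request.outputs]
    # In the real pipeline each candidate is executed against unit tests
    # inside a multi-process sandbox; here we only keep non-empty ones.
    accepted = [c for c in candidates if c.strip()]
    print(f"{len(accepted)}/10 candidates accepted")
```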

**Preference Alignment (DPO):** Investigated Direct Preference Optimization, identifying critical failure modes such as length bias and low-quality negative samples.
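
The DPO stage can be sketched with TRL's `DPOTrainer` (assuming a recent TRL version; the dataset file and hyperparameters here are placeholders, not the settings behind the numbers below):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B")

# Expected columns: "prompt", "chosen", "rejected"
dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

config = DPOConfig(output_dir="dpo-out", beta=0.1, per_device_train_batch_size=2)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```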

**Quantization & Deployment:** Performed 4-bit activation-aware quantization (AutoAWQ) and deployed the model via a vLLM OpenAI-compatible API.
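
The quantization step follows AutoAWQ's standard recipe; in this sketch the paths are placeholders and the quantization config is the library's commonly used default, not necessarily the project's exact settings:

```python
from awq import AutoAWQForCausalLM  # pip install autoawq
from transformers import AutoTokenizer

model_path = "path/to/merged-sft-model"  # merged full-precision checkpoint
quant_path = "path/to/awq-4bit-model"

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# 4-bit AWQ with group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
# Serve with: python -m vllm.entrypoints.openai.api_server --model <quant_path> --quantization awq
```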
## 📊 Experimental Results (HumanEval Pass@1)

The project tracked performance gains and losses across multiple iterations:

| Stage | Pass@1 | Notes |
| --- | --- | --- |
| Base Model | 0.628 | Baseline. |
| SFT v3 (released) | 0.671 (+6.8%) | Achieved through precise loss calculation and data cleaning. |
| DPO Merged | 0.280 | Highlights the extreme sensitivity of code models to preference data quality. |
## ⚠️ Status & Roadmap

This project is actively under development. Currently, the DPO alignment exhibits performance regression (Pass@1 < 0.628) due to preference data sensitivity. We are investigating advanced filtering and reward modeling to resolve this. Optimized weights will be uploaded as soon as the alignment bottleneck is cleared.