	---
	license: mit
	language:
	- en
	tags:
	- code-llm
	- qwen
	- sft
	- dpo
	- peft
	- lora
	metrics:
	- accuracy
	base_model:
	- Qwen/Qwen2.5-Coder-7B
	library_name: peft
	---

	# Code-Centric-Align: A Post-Training Pipeline for Code LLMs (LoRA Adapter)

	Notice: This repository contains only a LoRA adapter trained via QLoRA, not a full model. It must be loaded on top of the base model `Qwen/Qwen2.5-Coder-7B`.

	This project presents a systematic study of the post-training engineering pipeline for code-specific large language models. It establishes a "diagnosable and iterative" framework covering the full lifecycle from data engineering to deployment.

	## 🚀 Quick Start (Inference Example)

	To use this LoRA adapter, load the base model first and then attach the PEFT adapter. Install the required libraries (`accelerate` is needed for `device_map="auto"`):
	```bash
	pip install transformers peft torch accelerate
	```
	Then load the base model, attach the adapter, and generate:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	base_model_id = "Qwen/Qwen2.5-Coder-7B"
	adapter_id = "abcsk123/Code-Centric-Align"

	# 1. Load Tokenizer
	tokenizer = AutoTokenizer.from_pretrained(base_model_id)

	# 2. Load Base Model
	base_model = AutoModelForCausalLM.from_pretrained(
	    base_model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# 3. Attach LoRA Adapter
	model = PeftModel.from_pretrained(base_model, adapter_id)

	# 4. Generate Code
	prompt = "def binary_search(arr, target):"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=100)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
	Note: If the adapter files are located inside a specific checkpoint folder (e.g., `checkpoint-4675`), pass `subfolder="checkpoint-4675"` to `PeftModel.from_pretrained()`.


	## 🛠️ Core Workflow
	- Data Engineering: Implemented streaming collection, three-layer quality filtering, and MinHashLSH-based fuzzy deduplication.
	- Instruction Evolution: Utilized DeepSeek APIs for Evol-Instruct difficulty enhancement and diversity expansion.
	- Supervised Fine-Tuning (SFT): Applied QLoRA with a custom Instruction Masking strategy (QwenDataCollator) to ensure the model only learns from assistant responses.
	- Rejection Sampling (RFT): Developed a high-throughput engine using vLLM for 10-path sampling, verified through a multi-process safe execution sandbox.
	- Preference Alignment (DPO): Investigated Direct Preference Optimization, identifying critical failure modes such as length bias and low-quality negative samples.
	- Quantization & Deployment: Performed 4-bit activation-aware quantization (AutoAWQ) and deployed the model via a vLLM OpenAI-compatible API.
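
The instruction-masking idea in the SFT step can be illustrated with a minimal sketch. This is not the project's `QwenDataCollator`; the function name `mask_non_assistant_tokens`, the span format, and the toy token ids are all hypothetical. It relies only on the convention that a label of `-100` is ignored by PyTorch's cross-entropy loss, so only tokens inside assistant spans contribute to training.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_non_assistant_tokens(
    input_ids: List[int],
    assistant_spans: List[Tuple[int, int]],
) -> List[int]:
    """Return labels equal to input_ids inside assistant spans, IGNORE_INDEX elsewhere.

    assistant_spans holds half-open (start, end) token index ranges (hypothetical format).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        if not 0 <= start <= end <= len(input_ids):
            raise ValueError(f"span ({start}, {end}) is out of range")
        labels[start:end] = input_ids[start:end]
    return labels


# Toy example: tokens 0-3 are the prompt, tokens 4-6 are the assistant reply.
toy_input_ids = [11, 12, 13, 14, 21, 22, 23]
toy_labels = mask_non_assistant_tokens(toy_input_ids, [(4, 7)])
print(toy_labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

In a real collator the assistant spans would typically be located from the chat template's role markers after tokenization; how the project does this is not described here.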

	## 📈 Experimental Results (HumanEval Pass@1)
	The project tracked performance gains and losses across multiple iterations:
	- Base Model: 0.628
	- SFT v3 (released): 0.671 (+6.8% relative to the base model), achieved through precise loss calculation and data cleaning.
	- DPO Merged: 0.280, a regression well below the base model that highlights the extreme sensitivity of code models to preference data quality.
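
For reference, Pass@k scores on HumanEval are commonly computed with the unbiased estimator `1 - C(n-c, k) / C(n, k)`, where `n` samples are drawn per problem and `c` of them pass the tests. The sketch below assumes that estimator; the README does not state which harness or how many samples per problem were used, and the per-problem counts in `toy_results` are invented for illustration. The last lines recompute the relative gain of SFT v3 over the base model from the two scores listed above.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts for four problems; not the project's data.
toy_results = [(10, 10), (10, 5), (10, 0), (10, 1)]
toy_score = mean_pass_at_k(toy_results, k=1)
print(f"pass@1 = {toy_score:.3f}")

# Relative gain of SFT v3 (0.671) over the base model (0.628).
relative_gain = (0.671 - 0.628) / 0.628
print(f"relative gain = {relative_gain:.1%}")
```

With `k=1` the estimator reduces to `c / n` per problem, so with a single greedy sample per problem Pass@1 is simply the fraction of problems solved.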

	## ⚠️ Status & Roadmap
	This project is under active development. Currently, the DPO-aligned model regresses on HumanEval (Pass@1 of 0.280, versus 0.628 for the base model), which we attribute to its sensitivity to preference data quality. We are investigating advanced filtering and reward modeling to resolve this. Optimized weights will be uploaded once the alignment issue is resolved.
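
One simple diagnostic for the length bias mentioned in the workflow is to check how often the chosen response in a preference pair is longer than the rejected one. The sketch below is only an illustration of that idea, not the project's filtering code: the `chosen` and `rejected` field names follow a common preference-data convention and are an assumption, the toy pairs are invented, and whitespace word counts are a rough stand-in for token counts.

```python
from statistics import mean
from typing import Dict, List


def length_bias_report(pairs: List[Dict[str, str]]) -> Dict[str, float]:
    """Summarize how chosen and rejected responses differ in length.

    Length is counted in whitespace-separated words, a rough proxy for tokens.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    chosen_lens = [len(p["chosen"].split()) for p in pairs]
    rejected_lens = [len(p["rejected"].split()) for p in pairs]
    longer = sum(c > r for c, r in zip(chosen_lens, rejected_lens))
    return {
        "chosen_longer_rate": longer / len(pairs),
        "mean_chosen_len": mean(chosen_lens),
        "mean_rejected_len": mean(rejected_lens),
    }


# Invented toy pairs, for illustration only.
toy_pairs = [
    {"chosen": "return a + b", "rejected": "pass"},
    {"chosen": "return sorted(xs)", "rejected": "xs.sort() ; return None"},
    {"chosen": "if n < 2 : return n", "rejected": "return n"},
    {"chosen": "return x * 2", "rejected": "return x"},
]
report = length_bias_report(toy_pairs)
print(report)
```

A `chosen_longer_rate` far from 0.5 suggests that the preference signal may be confounded with response length, in which case length-matched pair selection is one possible mitigation to try.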