abcsk123 committed on
Commit 385acc6 · verified · 1 Parent(s): 40ae030

Update README.md

Files changed (1):
  1. README.md +51 -12
README.md CHANGED
@@ -7,30 +7,69 @@ tags:
  - qwen
  - sft
  - dpo
  metrics:
  - accuracy
  base_model:
  - Qwen/Qwen2.5-Coder-7B
- library_name: transformers
  ---

- # Code-Centric-Align: A Post-Training Pipeline for Code LLMs
-
- This project presents a systematic study of the post-training engineering pipeline for code-specific large language models, using **Qwen2.5-Coder-7B** as the base model. It establishes a "diagnosable and iterative" framework covering the full lifecycle from data engineering to deployment.

  ## 🛠️ Core Workflow
- * **Data Engineering**: Implemented streaming collection, three-layer quality filtering, and MinHashLSH-based fuzzy deduplication.
- * **Instruction Evolution**: Utilized DeepSeek APIs for Evol-Instruct difficulty enhancement and diversity expansion.
- * **Supervised Fine-Tuning (SFT)**: Applied QLoRA with a custom **Instruction Masking** strategy (QwenDataCollator) to ensure the model only learns from assistant responses.
- * **Rejection Sampling (RFT)**: Developed a high-throughput engine using vLLM for 10-path sampling, verified through a multi-process safe execution sandbox.
- * **Preference Alignment (DPO)**: Investigated Direct Preference Optimization, identifying critical failure modes such as length bias and low-quality negative samples.
- * **Quantization & Deployment**: Performed 4-bit activation-aware quantization (AutoAWQ) and deployed the model via a vLLM OpenAI-compatible API.

  ## 📈 Experimental Results (HumanEval Pass@1)
  The project tracked performance gains and losses across multiple iterations:
- * **Base Model**: 0.628
- * **SFT v3 (released)**: **0.671 (+6.8%)**, achieved through precise loss calculation and data cleaning.
- * **DPO Merged**: 0.280, highlighting the extreme sensitivity of code models to preference data quality.

  ## ⚠️ Status & Roadmap
  This project is actively under development. Currently, the DPO alignment exhibits performance regression (Pass@1 < 0.628) due to preference data sensitivity. We are investigating advanced filtering and reward modeling to resolve this. Optimized weights will be uploaded as soon as the alignment bottleneck is cleared.
7
  - qwen
  - sft
  - dpo
+ - peft
+ - lora
  metrics:
  - accuracy
  base_model:
  - Qwen/Qwen2.5-Coder-7B
+ library_name: peft
  ---

+ # Code-Centric-Align: A Post-Training Pipeline for Code LLMs (LoRA Adapter)
+
+ **Notice:** This repository provides a **LoRA adapter** trained via QLoRA, designed to be loaded on top of the base model `Qwen/Qwen2.5-Coder-7B`.
+
+ This project presents a systematic study of the post-training engineering pipeline for code-specific large language models. It establishes a "diagnosable and iterative" framework covering the full lifecycle from data engineering to deployment.
+
+ ## 🚀 Quick Start (Inference Example)
+
+ To use this LoRA adapter, load the base model first and then attach the PEFT adapter. Make sure the required libraries are installed:
+ ```bash
+ pip install transformers peft torch
+ ```
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ import torch
+
+ base_model_id = "Qwen/Qwen2.5-Coder-7B"
+ adapter_id = "abcsk123/Code-Centric-Align"
+
+ # 1. Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+
+ # 2. Load the base model
+ base_model = AutoModelForCausalLM.from_pretrained(
+     base_model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+
+ # 3. Attach the LoRA adapter
+ model = PeftModel.from_pretrained(base_model, adapter_id)
+
+ # 4. Generate code
+ prompt = "def binary_search(arr, target):"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ (Note: if your adapter files are located inside a specific checkpoint folder, e.g. `checkpoint-4675`, pass `subfolder="checkpoint-4675"` to `PeftModel.from_pretrained`.)

  ## 🛠️ Core Workflow
+ * **Data Engineering**: Implemented streaming collection, three-layer quality filtering, and MinHashLSH-based fuzzy deduplication.
+ * **Instruction Evolution**: Utilized DeepSeek APIs for Evol-Instruct difficulty enhancement and diversity expansion.
+ * **Supervised Fine-Tuning (SFT)**: Applied QLoRA with a custom **Instruction Masking** strategy (QwenDataCollator) to ensure the model only learns from assistant responses.
+ * **Rejection Sampling (RFT)**: Developed a high-throughput engine using vLLM for 10-path sampling, verified through a multi-process safe execution sandbox.
+ * **Preference Alignment (DPO)**: Investigated Direct Preference Optimization, identifying critical failure modes such as length bias and low-quality negative samples.
+ * **Quantization & Deployment**: Performed 4-bit activation-aware quantization (AutoAWQ) and deployed the model via a vLLM OpenAI-compatible API.
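The MinHashLSH-based fuzzy deduplication in the workflow above can be illustrated with a minimal pure-Python MinHash sketch. This is a toy, assuming character trigrams as features (the actual pipeline presumably uses a dedicated library such as `datasketch`; all names here are illustrative):

```python
import hashlib

def shingles(text, n=3):
    """Character n-grams that serve as a document's feature set."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def minhash_signature(features, num_perm=64):
    """One minimum per seeded hash function approximates one random permutation."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{f}".encode()).digest()[:8], "big")
            for f in features)
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates true Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

near_dup_a = minhash_signature(shingles("def add(a, b):\n    return a + b\n"))
near_dup_b = minhash_signature(shingles("def add(a, b):\n    return a+b\n"))
unrelated = minhash_signature(shingles("SELECT name FROM users WHERE id = 1;"))

print(estimated_jaccard(near_dup_a, near_dup_b))  # high -> flag as fuzzy duplicate
print(estimated_jaccard(near_dup_a, unrelated))   # low  -> keep both
```

In a real LSH setup the signatures are further split into bands so that only likely duplicates are ever compared pairwise, which is what makes deduplication scale to streamed corpora.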
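The instruction-masking strategy in the SFT step boils down to setting prompt-token labels to `-100` so that cross-entropy loss is computed only over assistant tokens. A minimal, tokenizer-free sketch (the real QwenDataCollator works on Qwen chat-template token IDs; the toy IDs below are illustrative):

```python
IGNORE_INDEX = -100  # positions with this label are ignored by cross-entropy loss

def build_labels(input_ids, prompt_len):
    """Copy input_ids, but mask every prompt position so that only
    assistant-response tokens contribute to the training loss."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

# Toy example: 4 prompt tokens followed by 3 assistant-response tokens.
input_ids = [101, 202, 303, 404, 11, 22, 33]
labels = build_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 11, 22, 33]
```

Without this masking, the model also fits the instruction text itself, which dilutes the gradient signal from the responses it is actually meant to learn.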
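The rejection-sampling (RFT) loop above follows a simple pattern: sample several candidate completions per prompt, execute each against unit tests, and keep only the verified ones. A minimal sketch with a stand-in "sandbox" (the actual engine uses vLLM for 10-path sampling and a multi-process sandbox with timeouts; these candidates are hand-written stand-ins):

```python
def passes_tests(candidate_src, test_src):
    """Toy verifier: run the candidate plus its tests in an isolated namespace.
    (A production sandbox would use separate processes, resource limits, and timeouts.)"""
    ns = {}
    try:
        exec(candidate_src, ns)
        exec(test_src, ns)
        return True
    except Exception:
        return False

def rejection_sample(candidates, test_src):
    """Keep only candidates whose code actually passes the unit tests."""
    return [c for c in candidates if passes_tests(c, test_src)]

# Pretend these came from multi-path sampling for "implement square(x)".
candidates = [
    "def square(x): return x * x",   # correct
    "def square(x): return x + x",   # wrong
    "def square(x): return x ** 3",  # wrong
]
tests = "assert square(3) == 9 and square(-2) == 4"
print(rejection_sample(candidates, tests))  # ['def square(x): return x * x']
```

The surviving completions then form a verified fine-tuning set, which is what makes execution-based filtering valuable for code models.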
 
  ## 📈 Experimental Results (HumanEval Pass@1)
  The project tracked performance gains and losses across multiple iterations:
+ * **Base Model**: 0.628
+ * **SFT v3 (released)**: **0.671 (+6.8%)**, achieved through precise loss calculation and data cleaning.
+ * **DPO Merged**: 0.280, highlighting the extreme sensitivity of code models to preference data quality.
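Pass@1 figures like those above are conventionally computed with the unbiased pass@k estimator from the HumanEval paper, 1 - C(n-c, k) / C(n, k), averaged over problems; for a single sample per problem it reduces to the fraction of problems solved. A small sketch (the numbers here are made up for illustration, not from this project's runs):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c of them correct), passes."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem, pass@1 is just the solved fraction:
print(pass_at_k(n=10, c=5, k=1))  # 0.5
# More samples raise the chance that at least one passes:
print(pass_at_k(n=10, c=5, k=3))
```

Using the unbiased estimator rather than raw best-of-k accuracy keeps scores comparable across different sampling budgets.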
 
  ## ⚠️ Status & Roadmap
  This project is actively under development. Currently, the DPO alignment exhibits performance regression (Pass@1 < 0.628) due to preference data sensitivity. We are investigating advanced filtering and reward modeling to resolve this. Optimized weights will be uploaded as soon as the alignment bottleneck is cleared.
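The preference-data sensitivity described above is visible in the DPO objective itself, -log σ(β [(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]): a mislabeled pair flips the sign of the margin, and a length-correlated margin biases the gradient toward longer outputs. A minimal loss computation with toy sequence log-probabilities (not taken from any real model):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair,
    given summed sequence log-probs under the policy and the frozen reference."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy already prefers the chosen answer relative to the reference -> small loss.
good = dpo_loss(pi_chosen=-10.0, pi_rejected=-20.0,
                ref_chosen=-12.0, ref_rejected=-18.0)
# A mislabeled pair (the "rejected" answer is actually better) -> large loss,
# so the gradient actively pushes the policy toward the bad completion.
bad = dpo_loss(pi_chosen=-20.0, pi_rejected=-10.0,
               ref_chosen=-18.0, ref_rejected=-12.0)
print(good < bad)  # True
```

Because every pair contributes this way, a small fraction of noisy negatives can dominate training, which is consistent with the 0.280 Pass@1 regression reported above.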