Commit 684bae5 (verified) · zeekay committed · Parent(s): 427fa01

Add training/README.md

Files changed (1): training/README.md (added, +92 -0)

# Zen4 Ultra Training

QLoRA fine-tuning for Kimi K2.5 (1.04T MoE) with MoE gate/router unfreezing.

## Why Not Standard Abliteration?

Standard linear abliteration **does not work** on Kimi K2.5's MoE architecture.
See [hamsaOmar/Kimi-K2.5-abliterated](https://huggingface.co/hamsaOmar/Kimi-K2.5-abliterated) for the research.

**Root cause**: Refusal in MoE models is encoded in **expert routing** (which of the 384 experts fire),
not just in the residual stream. Projecting the refusal direction out of the residual stream has
zero behavioral effect even though the direction is correctly identified (50.7% variance explained, cosine similarity 0.88).

## Our Approach: QLoRA + Gate Unfreeze

Instead of activation engineering, we use QLoRA fine-tuning with one key innovation:

| Component | Method | Why |
|-----------|--------|-----|
| Attention | LoRA (`q/kv/o_proj`) | Modifies how the model processes safety-relevant context |
| Shared experts | LoRA (`gate/up/down_proj`) | Modifies the always-active expert computations |
| **Router/gate** | **Direct unfreeze** | **Modifies which experts are selected** (the actual refusal mechanism) |

The router gate is an `nn.Parameter` (not an `nn.Linear`), so LoRA cannot target it.
We unfreeze it directly, allowing backpropagation to modify expert routing.
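
The freeze-then-unfreeze logic can be sketched on a toy module (the class and parameter names here are illustrative, not Kimi K2.5's real ones; the actual selection lives in `train_zen4_ultra.py`):

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Minimal stand-in for an MoE layer whose router weight is a raw
    nn.Parameter rather than an nn.Linear (so PEFT/LoRA cannot wrap it)."""
    def __init__(self, hidden=64, n_experts=8):
        super().__init__()
        # Router/gate: a bare weight matrix, scored as hidden_states @ gate.T
        self.gate = nn.Parameter(torch.randn(n_experts, hidden) * 0.02)
        self.experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_experts))

def unfreeze_gates(model: nn.Module) -> list:
    """Freeze everything, then re-enable grads only on router gate weights."""
    for p in model.parameters():
        p.requires_grad = False
    trainable = []
    for name, p in model.named_parameters():
        # Match the bare router weight exactly; a substring test like
        # "gate" in name would also catch expert gate_proj weights.
        if name.split(".")[-1] == "gate":
            p.requires_grad = True
            trainable.append(name)
    return trainable

model = ToyMoELayer()
print(unfreeze_gates(model))  # → ['gate']
```

After this, only the router weight receives gradient updates, so training can reshape which experts fire without touching the frozen expert weights.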

## Quick Start

```bash
# Install deps
pip install -r requirements.txt

# Generate compliance + identity data
python generate_compliance_data.py --output data/compliance.jsonl

# Train with SFT (recommended first)
torchrun --nproc_per_node 4 train_zen4_ultra.py \
  --mode sft \
  --dataset data/compliance.jsonl \
  --lora-rank 32 \
  --epochs 2 \
  --lr 2e-5

# Or train with a HuggingFace uncensored dataset
torchrun --nproc_per_node 4 train_zen4_ultra.py \
  --mode sft \
  --dataset cognitivecomputations/dolphin-r1

# DPO mode (preference optimization)
torchrun --nproc_per_node 4 train_zen4_ultra.py \
  --mode dpo \
  --dataset argilla/ultrafeedback-binarized-preferences

# Upload adapters
python merge_and_upload.py --lora ./output/zen4-ultra-lora --repo zenlm/zen4-ultra --adapters-only
```
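
The exact schema of `data/compliance.jsonl` is defined by `generate_compliance_data.py`; assuming a standard chat-messages SFT layout (the field names below are an assumption, not confirmed by this README), one line of the file might look like:

```python
import json

# Hypothetical record layout for one JSONL line; check
# generate_compliance_data.py for the actual field names.
record = {
    "messages": [
        {"role": "user", "content": "Who trained you?"},
        {"role": "assistant", "content": "I am Zen4 Ultra, trained by zenlm."},
    ]
}

line = json.dumps(record)             # one JSON object per line, no trailing comma
parsed = json.loads(line)             # round-trips cleanly
print(parsed["messages"][1]["role"])  # → assistant
```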

## HuggingFace Space

Deploy `app.py` as a Gradio Space with 4x A100 80GB for cloud training.

## Hardware Requirements

- **Minimum**: 4x A100 80GB (320GB VRAM total)
- **Recommended**: 8x H200 141GB (1,128GB VRAM total)
- **Training time**: ~4-8 hours for 1 epoch on ~10K examples
- **Output**: LoRA adapters (~100-500MB)

## Architecture Reference (Kimi K2.5)

```
DeepseekV3ForCausalLM:
  Layers: 61
  Hidden size: 7168
  Experts: 384 routed (top-8) + 1 shared
  MoE intermediate size: 2048
  Attention: compressed KV (kv_lora_rank=512, q_lora_rank=1536)
  Context: 256K tokens
  Total params: 1.04T
  Active params: ~32B per token
```
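
As a back-of-the-envelope check on these numbers, the routed experts alone account for almost all of the 1.04T parameters (this estimate uses only the config values above and ignores attention, the shared expert, and embeddings):

```python
layers, hidden, moe_inter = 61, 7168, 2048
n_routed, top_k = 384, 8

# Each expert has gate/up/down projections: 3 * hidden * moe_intermediate weights.
per_expert = 3 * hidden * moe_inter            # ~44M params per expert

routed_total = layers * n_routed * per_expert  # ~1.03T: the bulk of the 1.04T total
routed_active = layers * top_k * per_expert    # ~21.5B of the ~32B active per token

print(f"{routed_total / 1e12:.2f}T routed total, {routed_active / 1e9:.1f}B routed active")
# → 1.03T routed total, 21.5B routed active
```

The gap between ~21.5B routed-active and ~32B total-active is roughly the shared expert, attention, and embedding parameters that fire on every token.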

## Files

| File | Description |
|------|-------------|
| `train_zen4_ultra.py` | Main training script (SFT + DPO) |
| `merge_and_upload.py` | Merge LoRA into the base model and upload |
| `generate_compliance_data.py` | Generate compliance training data |
| `app.py` | HuggingFace Spaces Gradio app |
| `requirements.txt` | Python dependencies |
| `data/train.jsonl` | Identity training data (736 examples) |