Add training/README.md
training/README.md (ADDED, +92 -0)
# Zen4 Ultra Training

QLoRA fine-tuning for Kimi K2.5 (1.04T MoE) with MoE gate/router unfreezing.

## Why Not Standard Abliteration?

Standard linear abliteration **does not work** on Kimi K2.5's MoE architecture.
See [hamsaOmar/Kimi-K2.5-abliterated](https://huggingface.co/hamsaOmar/Kimi-K2.5-abliterated) for the research.

**Root cause**: refusal in MoE models is encoded in **expert routing** (which of the 384 experts fire),
not just in the residual stream. Projecting the refusal direction out of the residual stream has
zero behavioral effect, even though the direction itself is identified correctly (50.7% variance explained, cosine similarity 0.88).
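For reference, standard abliteration removes a learned refusal direction from the hidden state by orthogonal projection, h' = h - (h·r̂)r̂. A minimal PyTorch sketch of that projection (tensor names and shapes are illustrative, not taken from this repo):

```python
import torch

def ablate_direction(hidden: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along `refusal_dir`: h' = h - (h . r_hat) r_hat."""
    r = refusal_dir / refusal_dir.norm()   # unit refusal direction r_hat
    proj = (hidden @ r).unsqueeze(-1) * r  # component of each hidden state along r_hat
    return hidden - proj                   # residual stream with the direction projected out

h = torch.randn(4, 7168)   # batch of hidden states (Kimi K2.5 hidden size)
r = torch.randn(7168)      # learned refusal direction
h_ablated = ablate_direction(h, r)
```

On a dense model this suppresses refusals; on Kimi K2.5 it does not, because expert selection, not the residual direction, carries the behavior.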
## Our Approach: QLoRA + Gate Unfreeze

Instead of activation engineering, we use QLoRA fine-tuning with one key addition:

| Component | Method | Why |
|-----------|--------|-----|
| Attention | LoRA (q/kv/o_proj) | Modify how the model processes safety-relevant context |
| Shared experts | LoRA (gate/up/down_proj) | Modify the always-active expert computations |
| **Router/gate** | **Direct unfreeze** | **Modify which experts are selected** (the actual refusal mechanism) |

The router gate is an `nn.Parameter` (not an `nn.Linear`), so LoRA cannot target it.
We unfreeze it directly, letting backpropagation modify expert routing.
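A minimal sketch of the freeze/unfreeze logic. The parameter name pattern (`...mlp.gate.weight`) follows DeepseekV3-style checkpoints and is an assumption here; check the actual module names in your checkpoint before relying on it:

```python
import torch.nn as nn

def unfreeze_router_gates(model: nn.Module) -> list[str]:
    """Freeze all parameters, then re-enable grads on MoE router gate weights.

    LoRA adapters (attached separately, e.g. via peft) cover the linear layers;
    the raw nn.Parameter router gates must be unfrozen by name.
    """
    for p in model.parameters():
        p.requires_grad = False
    unfrozen = []
    for name, p in model.named_parameters():
        # DeepseekV3-style routers expose the gate as e.g. "model.layers.N.mlp.gate.weight"
        if ".mlp.gate." in name or name.endswith("mlp.gate.weight"):
            p.requires_grad = True
            unfrozen.append(name)
    return unfrozen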
## Quick Start

```bash
# Install deps
pip install -r requirements.txt

# Generate compliance + identity data
python generate_compliance_data.py --output data/compliance.jsonl

# Train with SFT (recommended first)
torchrun --nproc_per_node 4 train_zen4_ultra.py \
    --mode sft \
    --dataset data/compliance.jsonl \
    --lora-rank 32 \
    --epochs 2 \
    --lr 2e-5

# Or train with a HuggingFace uncensored dataset
torchrun --nproc_per_node 4 train_zen4_ultra.py \
    --mode sft \
    --dataset cognitivecomputations/dolphin-r1

# DPO mode (preference optimization)
torchrun --nproc_per_node 4 train_zen4_ultra.py \
    --mode dpo \
    --dataset argilla/ultrafeedback-binarized-preferences

# Upload adapters
python merge_and_upload.py --lora ./output/zen4-ultra-lora --repo zenlm/zen4-ultra --adapters-only
```
## HuggingFace Space

Deploy `app.py` as a Gradio Space with 4x A100 80GB for cloud training.

## Hardware Requirements

- **Minimum**: 4x A100 80GB (320GB VRAM total)
- **Recommended**: 8x H200 141GB (1,128GB VRAM total)
- **Training time**: ~4-8 hours for 1 epoch on ~10K examples
- **Output**: LoRA adapters (~100-500MB)

## Architecture Reference (Kimi K2.5)

```
DeepseekV3ForCausalLM:
  Layers: 61
  Hidden: 7168
  Experts: 384 routed (top-8) + 1 shared
  MoE intermediate: 2048
  Attention: compressed KV (kv_lora_rank=512, q_lora_rank=1536)
  Context: 256K tokens
  Total params: 1.04T
  Active params: ~32B per token
```
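As a rough sanity check on the active-parameter figure, the MoE expert weights alone account for most of it (back-of-envelope only; attention, embeddings, and norms make up the remainder):

```python
hidden, inter = 7168, 2048       # hidden size, MoE intermediate size
layers = 61
per_expert = 3 * hidden * inter  # gate_proj + up_proj + down_proj weight matrices
active_experts = 8 + 1           # top-8 routed + 1 always-active shared expert
moe_active = layers * active_experts * per_expert
print(f"{moe_active / 1e9:.1f}B")  # ~24.2B active from MoE experts alone
```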
## Files

| File | Description |
|------|-------------|
| `train_zen4_ultra.py` | Main training script (SFT + DPO) |
| `merge_and_upload.py` | Merge LoRA into the base model and upload |
| `generate_compliance_data.py` | Generate compliance training data |
| `app.py` | HuggingFace Spaces Gradio app |
| `requirements.txt` | Python dependencies |
| `data/train.jsonl` | Identity training data (736 examples) |