---
datasets:
- TeichAI/gpt-5.1-codex-max-1000x
base_model:
- unsloth/gpt-oss-20b
---
# gpt-oss-20b-Coding-Distill

This project uses Unsloth for fine-tuning. All training data is converted to the OpenAI Harmony format before training, but there may be cases where model output does not conform to the Harmony specification.
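To illustrate what the conversion step produces, here is a minimal sketch of rendering a chat transcript into Harmony-style text. The special-token names follow the published Harmony format for gpt-oss; in practice the `openai-harmony` library or the model's built-in chat template should do this rendering, and this standalone function is only an assumption-laden approximation.

```python
# Minimal sketch of Harmony-style rendering (illustrative only; use the
# openai-harmony library or the model's chat template in real training).

def render_harmony(messages):
    """Render a list of {role, content} dicts as a Harmony-style string."""
    parts = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Assistant turns carry a channel tag; training targets
            # typically go on the "final" channel.
            parts.append(
                f"<|start|>assistant<|channel|>final<|message|>{msg['content']}<|end|>"
            )
        else:
            parts.append(
                f"<|start|>{msg['role']}<|message|>{msg['content']}<|end|>"
            )
    return "".join(parts)

example = [
    {"role": "user", "content": "Write a hello-world in Python."},
    {"role": "assistant", "content": "print('hello, world')"},
]
print(render_harmony(example))
```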
## Overview

This project implements a multi-phase fine-tuning pipeline for the GPT-OSS-20B model, leveraging conversation data from several state-of-the-art AI models to create a balanced, high-performance language model optimized for:

- **Advanced Coding** (via GPT-5.2-codex-max)
- **Complex Reasoning** (via Claude 4.5 Opus and GPT-5.2 high reasoning)
- **Balanced General Intelligence** (via Claude 4.5 Sonnet)
**Why This Approach?**

Traditional fine-tuning often suffers from:

- **Catastrophic forgetting** when training on sequential datasets
- **Imbalanced capabilities** from single-source training
- **Style inconsistencies** across different task types

Our multi-phase approach addresses these challenges systematically with strategic layer freezing, replay buffers, and EWC (Elastic Weight Consolidation) regularization.
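The EWC term can be sketched as a quadratic penalty that discourages parameters from drifting away from their previous-phase values, weighted by per-parameter Fisher information. This is an illustrative implementation under assumed names (`old_params`, `fisher`, `lam`), not the project's actual training code.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=0.4):
    """Quadratic EWC penalty keeping `model` close to `old_params`.

    fisher maps parameter names to per-element Fisher estimates;
    lam is the regularization strength (value here is arbitrary).
    """
    loss = torch.tensor(0.0)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam / 2 * loss

# Toy usage with a single linear layer standing in for the LLM.
model = torch.nn.Linear(4, 4)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

# Immediately after snapshotting the old weights, the penalty is zero.
print(ewc_penalty(model, old_params, fisher).item())  # → 0.0
```

During a later phase this penalty would simply be added to the task loss before `backward()`.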
## Architecture

```text
GPT-OSS-20B Base Model
│
├─── Phase 1: Foundation Training
│    ├─ Data: GPT-5.2-codex-max (1000) + Claude 4.5 Opus (250) + Claude 4.5 Sonnet (250)
│    ├─ Layers: MLP + Attention
│    └─ Goal: Establish coding + reasoning foundation
│
├─── Phase 1.5: Knowledge Consolidation (Week 5)
│    ├─ Data: Mixed replay of Phase 1 data
│    ├─ Layers: MLP + Attention
│    └─ Goal: Prevent early forgetting
│
├─── Phase 2: Specialization Training (Weeks 6-8)
│    ├─ Data: Claude Sonnet (250) + GPT-5.2 high (250) + Replay (150)
│    ├─ Layers: MLP + Adapter
│    └─ Goal: Integrate balanced reasoning + maintain coding
│
└─── Phase 2.5: Gradual Unfreezing (Week 8, Optional)
     ├─ Data: Full mixed dataset
     ├─ Layers: Upper Attention layers + MLP + Adapter
     └─ Goal: Fine-tune attention patterns if needed
```
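The per-phase freezing policy above can be expressed as a simple predicate over parameter names. The substrings (`mlp`, `attn`, `adapter`) and the layer-index position in the name follow common Hugging Face checkpoint conventions and are assumptions; the actual gpt-oss parameter names may differ.

```python
def is_trainable(param_name, phase, num_layers=24, unfreeze_top=4):
    """Decide whether a parameter is trained in a given phase (sketch)."""
    if phase in ("1", "1.5"):
        # Foundation / consolidation: MLP + attention everywhere.
        return "mlp" in param_name or "attn" in param_name
    if phase == "2":
        # Specialization: attention frozen, MLP + adapters trainable.
        return "mlp" in param_name or "adapter" in param_name
    if phase == "2.5":
        # Gradual unfreezing: additionally open the top attention layers.
        if "mlp" in param_name or "adapter" in param_name:
            return True
        if "attn" in param_name:
            # Assumes names like "model.layers.18.attn.q_proj.weight".
            layer = int(param_name.split(".")[2])
            return layer >= num_layers - unfreeze_top
    return False

print(is_trainable("model.layers.3.mlp.up_proj.weight", "2"))   # → True
print(is_trainable("model.layers.3.attn.q_proj.weight", "2"))   # → False
```

In real code this predicate would set `param.requires_grad` while iterating `model.named_parameters()`.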
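The Phase 2 mixture (250 + 250 + 150 replay examples from the table above) can be assembled with a straightforward sampling step. The sampling scheme shown is an assumption for illustration, not the project's exact recipe.

```python
import random

def build_phase2_dataset(sonnet, gpt52_high, phase1_pool, seed=0):
    """Mix new specialization data with a replay buffer from Phase 1."""
    rng = random.Random(seed)
    replay = rng.sample(phase1_pool, 150)   # replay buffer from Phase 1
    mixed = sonnet[:250] + gpt52_high[:250] + replay
    rng.shuffle(mixed)                      # interleave the three sources
    return mixed

# Toy usage with placeholder records.
pool = [{"src": "phase1", "id": i} for i in range(1500)]
sonnet = [{"src": "sonnet", "id": i} for i in range(250)]
gpt = [{"src": "gpt5.2", "id": i} for i in range(250)]
ds = build_phase2_dataset(sonnet, gpt, pool)
print(len(ds))  # → 650
```

Shuffling the combined set (rather than training sources sequentially) is what keeps the replay examples interleaved with the new data, which is the point of the replay buffer.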