midorin-Linux committed on
Commit dfbd05c · verified · 1 Parent(s): 2d4deab

Update README.md

Files changed (1)
  1. README.md +44 -1
README.md CHANGED
@@ -7,4 +7,47 @@ datasets:
 - TeichAI/gpt-5.1-codex-max-1000x
 base_model:
 - unsloth/gpt-oss-20b
 ---
# gpt-oss-20b-Coding-Distill

This project uses Unsloth for fine-tuning. All training data is converted to the OpenAI Harmony format before training, but there may be cases where the output format doesn't conform to the OpenAI Harmony specification.
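
As a rough illustration of what the Harmony conversion produces, here is a minimal string-rendering sketch. It is not the official renderer: the special tokens (`<|start|>`, `<|message|>`, `<|end|>`) follow the published Harmony format, but `render_harmony` and the sample transcript are hypothetical; real pipelines should use the official `openai-harmony` package, which also handles channels and tool messages.

```python
# Minimal sketch: render a chat transcript as Harmony-style training text.
# The special tokens follow the OpenAI Harmony format; this omits channels,
# tool calls, and other details the official renderer handles.

def render_harmony(messages: list[dict]) -> str:
    """Render [{'role': ..., 'content': ...}, ...] as one training string."""
    parts = []
    for m in messages:
        parts.append(f"<|start|>{m['role']}<|message|>{m['content']}<|end|>")
    return "".join(parts)

sample = [
    {"role": "user", "content": "Write a hello-world in C."},
    {"role": "assistant", "content": "#include <stdio.h> ..."},
]
text = render_harmony(sample)
```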

## Overview

This project implements a multi-phase fine-tuning pipeline for the GPT-OSS-20B model, leveraging conversation data from multiple state-of-the-art AI models to create a balanced, high-performance language model optimized for:

- **Advanced Coding** (via GPT-5.2-codex-max)
- **Complex Reasoning** (via Claude 4.5 Opus and GPT-5.2 high reasoning)
- **Balanced General Intelligence** (via Claude 4.5 Sonnet)

**Why This Approach?**

Traditional fine-tuning often suffers from:

- **Catastrophic forgetting** when training on sequential datasets
- **Imbalanced capabilities** from single-source training
- **Style inconsistencies** across different task types

Our multi-phase approach addresses these challenges systematically with strategic layer freezing, replay buffers, and EWC regularization.
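
The EWC (Elastic Weight Consolidation) term mentioned above is a quadratic penalty on parameter drift, weighted by each parameter's Fisher information from the previous phase. A minimal sketch in plain Python (`ewc_penalty` and the numeric values are illustrative, not part of Unsloth):

```python
# Minimal sketch of the EWC regularizer: penalize movement of parameters
# that were important (high Fisher information) in the previous phase.
# L_EWC = (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2

def ewc_penalty(theta, theta_star, fisher, lam=0.4):
    """theta: current params, theta_star: params after the previous phase,
    fisher: per-parameter Fisher information, lam: penalty strength."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher)
    )

# A parameter with a high Fisher weight pays more for drifting:
penalty = ewc_penalty(theta=[1.2, 0.5], theta_star=[1.0, 0.5], fisher=[10.0, 0.1])
```

During later phases this penalty is added to the task loss, so updates stay close to weights that mattered for earlier capabilities.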

## Architecture

```text
GPT-OSS-20B Base Model
│
├─── Phase 1: Foundation Training
│    ├─ Data: GPT-5.2-codex-max (1000) + Claude 4.5 Opus (250) + Claude 4.5 Sonnet (250)
│    ├─ Layers: MLP + Attention
│    └─ Goal: Establish coding + reasoning foundation
│
├─── Phase 1.5: Knowledge Consolidation (Week 5)
│    ├─ Data: Mixed replay of Phase 1 data
│    ├─ Layers: MLP + Attention
│    └─ Goal: Prevent early forgetting
│
├─── Phase 2: Specialization Training (Weeks 6-8)
│    ├─ Data: Claude Sonnet (250) + GPT-5.2 high (250) + Replay (150)
│    ├─ Layers: MLP + Adapter
│    └─ Goal: Integrate balanced reasoning + maintain coding
│
└─── Phase 2.5: Gradual Unfreezing (Week 8, Optional)
     ├─ Data: Full mixed dataset
     ├─ Layers: Upper Attention layers + MLP + Adapter
     └─ Goal: Fine-tune attention patterns if needed
```
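
The per-phase "Layers" rows above imply a freezing schedule: only some parameter groups train in each phase. A minimal sketch of selecting trainable groups by name substring (the patterns `"mlp"`, `"self_attn"`, `"adapter"` and the phase table are illustrative assumptions; actual module names depend on the GPT-OSS checkpoint):

```python
# Minimal sketch of a per-phase freezing schedule: a parameter is trainable
# only if its name matches one of the phase's patterns. Names and patterns
# are illustrative; real GPT-OSS module names may differ.

PHASE_TRAINABLE = {
    "phase1": ("mlp", "self_attn"),  # MLP + Attention
    "phase2": ("mlp", "adapter"),    # MLP + Adapter, attention frozen
}

def trainable_mask(param_names, phase):
    patterns = PHASE_TRAINABLE[phase]
    return {name: any(p in name for p in patterns) for name in param_names}

names = [
    "layers.0.self_attn.q_proj",
    "layers.0.mlp.up_proj",
    "layers.0.adapter.down",
]
mask = trainable_mask(names, "phase2")
```

With a framework like PyTorch, each parameter's `requires_grad` flag would then be set from this mask before building the optimizer for the phase.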