---
datasets:
- TeichAI/gpt-5.1-codex-max-1000x
base_model:
- unsloth/gpt-oss-20b
---
# gpt-oss-20b-Coding-Distill

This project uses Unsloth for fine-tuning. All training data is converted to the OpenAI Harmony format before training, but there may be cases where model output does not conform to the Harmony specification.
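To illustrate what the conversion step produces, here is a minimal sketch of rendering a chat transcript into Harmony-style text. The special-token names follow the published Harmony format for gpt-oss; in practice the `openai-harmony` library or the model's built-in chat template should do this rendering, and this standalone function is only an assumption-laden approximation.

```python
# Minimal sketch of Harmony-style rendering (illustrative only; use the
# openai-harmony library or the model's chat template in real training).

def render_harmony(messages):
    """Render a list of {role, content} dicts as a Harmony-style string."""
    parts = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Assistant turns carry a channel tag; training targets
            # typically go on the "final" channel.
            parts.append(
                f"<|start|>assistant<|channel|>final<|message|>{msg['content']}<|end|>"
            )
        else:
            parts.append(
                f"<|start|>{msg['role']}<|message|>{msg['content']}<|end|>"
            )
    return "".join(parts)

example = [
    {"role": "user", "content": "Write a hello-world in Python."},
    {"role": "assistant", "content": "print('hello, world')"},
]
print(render_harmony(example))
```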
## Overview

This project implements a multi-phase fine-tuning pipeline for the GPT-OSS-20B model, leveraging conversation data from several state-of-the-art AI models to create a balanced, high-performance language model optimized for:

- **Advanced Coding** (via GPT-5.2-codex-max)
- **Complex Reasoning** (via Claude 4.5 Opus and GPT-5.2 high reasoning)
- **Balanced General Intelligence** (via Claude 4.5 Sonnet)
**Why This Approach?**

Traditional fine-tuning often suffers from:

- **Catastrophic forgetting** when training on sequential datasets
- **Imbalanced capabilities** from single-source training
- **Style inconsistencies** across different task types

Our multi-phase approach addresses these challenges systematically with strategic layer freezing, replay buffers, and EWC (Elastic Weight Consolidation) regularization.
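The EWC term can be sketched as a quadratic penalty that discourages parameters from drifting away from their previous-phase values, weighted by per-parameter Fisher information. This is an illustrative implementation under assumed names (`old_params`, `fisher`, `lam`), not the project's actual training code.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=0.4):
    """Quadratic EWC penalty keeping `model` close to `old_params`.

    fisher maps parameter names to per-element Fisher estimates;
    lam is the regularization strength (value here is arbitrary).
    """
    loss = torch.tensor(0.0)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam / 2 * loss

# Toy usage with a single linear layer standing in for the LLM.
model = torch.nn.Linear(4, 4)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

# Immediately after snapshotting the old weights, the penalty is zero.
print(ewc_penalty(model, old_params, fisher).item())  # → 0.0
```

During a later phase this penalty would simply be added to the task loss before `backward()`.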
## Architecture

```text
GPT-OSS-20B Base Model
│
├─── Phase 1: Foundation Training
│    ├─ Data: GPT-5.2-codex-max (1000) + Claude 4.5 Opus (250) + Claude 4.5 Sonnet (250)
│    ├─ Layers: MLP + Attention
│    └─ Goal: Establish coding + reasoning foundation
│
├─── Phase 1.5: Knowledge Consolidation (Week 5)
│    ├─ Data: Mixed replay of Phase 1 data
│    ├─ Layers: MLP + Attention
│    └─ Goal: Prevent early forgetting
│
├─── Phase 2: Specialization Training (Weeks 6-8)
│    ├─ Data: Claude Sonnet (250) + GPT-5.2 high (250) + Replay (150)
│    ├─ Layers: MLP + Adapter
│    └─ Goal: Integrate balanced reasoning + maintain coding
│
└─── Phase 2.5: Gradual Unfreezing (Week 8, Optional)
     ├─ Data: Full mixed dataset
     ├─ Layers: Upper Attention layers + MLP + Adapter
     └─ Goal: Fine-tune attention patterns if needed
```
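The per-phase freezing policy above can be expressed as a simple predicate over parameter names. The substrings (`mlp`, `attn`, `adapter`) and the layer-index position in the name follow common Hugging Face checkpoint conventions and are assumptions; the actual gpt-oss parameter names may differ.

```python
def is_trainable(param_name, phase, num_layers=24, unfreeze_top=4):
    """Decide whether a parameter is trained in a given phase (sketch)."""
    if phase in ("1", "1.5"):
        # Foundation / consolidation: MLP + attention everywhere.
        return "mlp" in param_name or "attn" in param_name
    if phase == "2":
        # Specialization: attention frozen, MLP + adapters trainable.
        return "mlp" in param_name or "adapter" in param_name
    if phase == "2.5":
        # Gradual unfreezing: additionally open the top attention layers.
        if "mlp" in param_name or "adapter" in param_name:
            return True
        if "attn" in param_name:
            # Assumes names like "model.layers.18.attn.q_proj.weight".
            layer = int(param_name.split(".")[2])
            return layer >= num_layers - unfreeze_top
    return False

print(is_trainable("model.layers.3.mlp.up_proj.weight", "2"))   # → True
print(is_trainable("model.layers.3.attn.q_proj.weight", "2"))   # → False
```

In real code this predicate would set `param.requires_grad` while iterating `model.named_parameters()`.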
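The Phase 2 mixture (250 + 250 + 150 replay examples from the table above) can be assembled with a straightforward sampling step. The sampling scheme shown is an assumption for illustration, not the project's exact recipe.

```python
import random

def build_phase2_dataset(sonnet, gpt52_high, phase1_pool, seed=0):
    """Mix new specialization data with a replay buffer from Phase 1."""
    rng = random.Random(seed)
    replay = rng.sample(phase1_pool, 150)   # replay buffer from Phase 1
    mixed = sonnet[:250] + gpt52_high[:250] + replay
    rng.shuffle(mixed)                      # interleave the three sources
    return mixed

# Toy usage with placeholder records.
pool = [{"src": "phase1", "id": i} for i in range(1500)]
sonnet = [{"src": "sonnet", "id": i} for i in range(250)]
gpt = [{"src": "gpt5.2", "id": i} for i in range(250)]
ds = build_phase2_dataset(sonnet, gpt, pool)
print(len(ds))  # → 650
```

Shuffling the combined set (rather than training sources sequentially) is what keeps the replay examples interleaved with the new data, which is the point of the replay buffer.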