midorin-Linux committed
Commit
6d26e10
·
verified ·
1 Parent(s): 939bd3a

Update README.md

Files changed (1)
  1. README.md +0 -29
README.md CHANGED
@@ -38,32 +38,3 @@ Traditional fine-tuning often suffers from:
  - **Catastrophic forgetting** when training on sequential datasets
  - **Imbalanced capabilities** from single-source training
  - **Style inconsistencies** across different task types
-
- Our multi-phase approach with strategic layer freezing, replay buffers, and EWC regularization addresses these challenges systematically.
-
- ## Architecture
- ```text
- GPT-OSS-20B Base Model
- │
- ├─── Phase 1: Foundation Training
- │ ├─ Data: GPT-5.2-codex-max (1000) + Claude 4.5 Opus (250) + Claude 4.5 Sonnet (250)
- │ ├─ Layers: MLP + Attention
- │ └─ Goal: Establish coding + reasoning foundation
- │
- ├─── Phase 1.5: Knowledge Consolidation
- │ ├─ Data: Mixed replay of Phase 1 data
- │ ├─ Layers: MLP + Attention
- │ └─ Goal: Prevent early forgetting
- │
- ├─── Phase 2: Specialization Training
- │ ├─ Data: Claude Sonnet (250) + GPT-5.2 high (250) + Replay (150)
- │ ├─ Layers: MLP + Adapter
- │ └─ Goal: Integrate balanced reasoning + maintain coding
- │
- └─── Phase 2.5: Gradual Unfreezing
- ├─ Data: Full mixed dataset
- ├─ Layers: Upper Attention layers + MLP + Adapter
- └─ Goal: Fine-tune attention patterns if needed
- ```
-
-
 
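The removed section names three mechanisms: strategic layer freezing, replay buffers, and EWC regularization. As a rough illustration of how the first and last of these are commonly wired together, here is a minimal PyTorch-style sketch, assuming a model whose parameter names contain `mlp` and `attn`; the function names, the `fisher`/`anchor` dictionaries, and the `lam` weight are illustrative assumptions, not the actual training code behind this repo.

```python
# Illustrative sketch only (not this repo's training code): phase-style layer
# freezing plus an EWC penalty, as described in the removed README section.
import torch


def freeze_for_phase(model: torch.nn.Module, train_attention: bool) -> None:
    """Freeze all parameters, then unfreeze MLP (and optionally attention) blocks."""
    for name, param in model.named_parameters():
        param.requires_grad = "mlp" in name or (train_attention and "attn" in name)


def ewc_penalty(model: torch.nn.Module, fisher: dict, anchor: dict, lam: float = 0.4):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_anchor_i)^2,
    where F_i is a diagonal Fisher estimate from the previous phase and
    theta_anchor_i are the previous phase's parameter values."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - anchor[name]) ** 2).sum()
    return 0.5 * lam * penalty


# Usage in a later phase (hypothetical): loss = task_loss + ewc_penalty(model, fisher, anchor)
```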