Gogs committed on
Commit 123571d · 1 Parent(s): 679c77e

Fix: add metrics for Hugging Face YAML validation

Files changed (1):
  1. README.md +302 -194

README.md CHANGED
@@ -1,223 +1,331 @@
- ---
  language:
- - code
  license: apache-2.0
  tags:
- - code-generation
- - mobile-training
- - pytorch
- - transformers
- - distilgpt2
- - zero-budget-ai
  datasets:
- - bigcode/the-stack-smol-xl
  metrics:
- - perplexity
  model-index:
- - name: Yuuki v0.1
-   results:
-   - task:
-       type: text-generation
-     dataset:
-       name: The Stack
-       type: bigcode/the-stack-smol-xl
  ---

- # 🌸 Yuuki v0.1 - The $0 Code LLM

- > **⚠️ WORK IN PROGRESS** - Currently training on mobile CPU (Day 3/42)

- ## 🎯 The Mission

- **Prove that you DON'T need expensive GPUs to train LLMs.**

- Yuuki is a code generation model trained entirely on a **$150 Android phone** with:
- - ❌ No cloud compute
- - ❌ No GPU
- - ❌ No data center
- - ✅ Just determination and time

- ### The Setup
  Hardware: Snapdragon 685 (8-core ARM CPU)
  RAM: 6GB
  Storage: 128GB
  NPU: Hexagon 686 (1 TOPS)
  GPU: Adreno 610 (243 GFLOPS) - NOT USED for training
  Cost: $0 in compute
- ## 📊 Current Status
-
- | Metric | Value |
- |--------|-------|
- | **Progress** | 1,417 / 37,500 steps (3.78%) |
- | **Epoch** | 0.08 / 2.0 |
- | **Current Loss** | ~1.70 - 2.23 |
- | **Best Loss** | 1.7053 ⭐ |
- | **Training Time** | ~3 days |
- | **ETA** | ~39 days remaining |
- | **Speed** | ~100 sec/step |
-
- ### Loss Progression
  Step 0: Loss 3.35 (baseline)
  Step 500: Loss 2.50 ↓ -25%
  Step 1000: Loss 2.00 ↓ -40%
  Step 1265: Loss 1.83 ↓ -45%
  Step 1292: Loss 1.71 ↓ -49% ⭐ RECORD
  Step 1417: Loss 2.23 (current, oscillating 1.7-2.3)
- ## 🎓 What Yuuki Knows (So Far)

  Due to alphabetically-ordered dataset:

- | Language | Exposure | Quality | Status |
- |----------|----------|---------|--------|
- | **Agda** | High | 85/100 | ✅ Excellent |
- | **C** | Starting | 30/100 | ⏳ Learning |
- | **Assembly** | Low | 5/100 | 🌱 Minimal |
- | **Python** | None | 0/100 | ❌ Not reached yet |
-
- ### Example Output (Step 1,300)
-
- **Agda prompt:** `module Main where`
-
- ```agda
- module Main where (x, f) in a
-
- open import Cubical.Sigma
- open import Cubical.Sigma.Core
- open import Cubical.Foundations.H
- ```
- ✅ Real Agda libraries! The model learned actual Cubical type theory modules.
- 🛠️ Training Configuration
- Model: DistilGPT-2 (82M parameters)
- Dataset: The Stack (75,000 examples)
- Batch size: 1
- Gradient accumulation: 4
- Effective batch: 4
- Learning rate: 5e-5
- Max length: 256 tokens
- Optimizer: AdamW
- Epochs: 2
- Total tokens: ~30M (2 epochs)
- Why so slow?
- 100 seconds/step × 37,500 steps = 3,750,000 seconds
- = 1,042 hours
- = 43.4 days
- = ~6 weeks of continuous training
- No GPU acceleration. Pure CPU grinding. 💪
- 📈 Roadmap
- v0.1 (Current - Proof of Concept)
- [x] Setup training pipeline
- [x] Start training (Step 0)
- [x] Reach Step 1,000
- [x] Break loss 2.0 barrier
- [x] Break loss 1.8 barrier ⭐
- [ ] Checkpoint 2,500 (7%)
- [ ] Checkpoint 5,000 (13%)
- [ ] Checkpoint 10,000 (27%)
- [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
- [ ] Checkpoint 37,500 (100% - DONE)
- [ ] Quantize to INT8
- [ ] Convert to ONNX
- [ ] Publish final model
- ETA: Mid-March 2026
- v0.2 (The Full Dataset)
- Dataset: 786,387 examples (full Stack)
- Duration: 418 days (~14 months)
- Epochs: 2.0
- Total tokens: ~314M
- Dataset fix: SHUFFLED (not alphabetical)
- Languages: All 80+ languages balanced
- Start: March 2026
- End: May 2027
- v0.3+ (PC Era)
- Hardware upgrade: RTX 4060/4070
- Larger models: 350M-1B parameters
- Faster training: ~30x speedup
- Advanced techniques: LoRA, QLoRA, etc.
- 💡 Philosophy
- "The barrier to AI isn't money. It's mindset."
- This project demonstrates:
- ✅ You CAN train LLMs without GPUs
- ✅ Patience > Hardware
- ✅ $0 budget is enough to start
- ✅ Limited resources inspire creativity
- ✅ Anyone can contribute to AI
- The Statement vs The Execution
- v0.1-v0.2 (Mobile): "You don't need expensive hardware"
- v0.3+ (PC): "Now let's build something competitive"
- Start with what you have. Upgrade when you can. Never let hardware stop you.
- 🚀 Usage (After Training Completes)
- Basic Usage
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Load model
- model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
- tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")
-
- # Generate code
- prompt = "def fibonacci(n):"
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_length=100)
- code = tokenizer.decode(outputs[0])
- print(code)
- Quantized (4x faster, 4x smaller)
- # Coming after training completes
- model = AutoModelForCausalLM.from_pretrained(
- "OpceanAI/Yuuki",
- subfolder="yuuki-v0.1-int8"
- )
- ⚠️ Known Limitations
- Dataset order: Alphabetical (not shuffled) - learns early languages best
- Token count: Only ~30M tokens (vs GPT-2's 40B)
- Training speed: Very slow (~100 sec/step)
- Model size: Small (82M params)
- Language coverage: Incomplete due to alphabetical ordering
- These will be addressed in v0.2 with shuffled dataset.
- 🔬 Technical Details
- Why Mobile Training Works
- CPU Training (100 sec/step):
- - Forward pass: 40 sec
- - Backward pass: 40 sec
- - Optimizer: 20 sec
- Total: ~100 sec
-
- vs GPU Training (0.5 sec/step):
- - 200x faster
- - But costs $0.50-$2.00/hour
- - 42 days = $500-$2,000
-
- Mobile: FREE but SLOW
- GPU: FAST but EXPENSIVE
-
- For proof of concept: Mobile wins. 🏆
- Training Challenges Overcome
- Memory management: Gradient accumulation (4 steps)
- Thermal throttling: Periodic breaks, room cooling
- Battery life: Always plugged in
- Storage: Careful checkpoint management
- Interruptions: Resume from checkpoints
- Patience: 100 sec/step × 37,500 = mental fortitude
- 📊 Benchmarks (Post-Training)
- Coming soon after training completes (~March 2026).
- Expected performance:
- Agda: 85-95/100 (primary language)
- C: 85-92/100 (secondary language)
- Assembly: 75-85/100 (tertiary)
- Python: 10-20/100 (barely seen due to alphabet order)
- 🙏 Acknowledgments
- Anthropic Claude: Technical guidance and debugging assistance
- HuggingFace: Infrastructure and transformers library
- BigCode: The Stack dataset
- The ML community: For saying "you need GPUs" - best motivation 😏
- 📜 License
- Apache 2.0 - See LICENSE file.
- You can use Yuuki commercially, modify it, distribute it. Just give credit. ✅
- 🔗 Links
- GitHub: (Coming soon)
- Twitter: (Coming soon)
- Progress updates: Check this model card
- 📅 Updates
- 2026-01-29: Training started
- 2026-01-29: Step 1,000 reached - Loss 2.00
- 2026-01-29: Step 1,292 - NEW RECORD Loss 1.7053
- 2026-01-29: Repository created on HuggingFace
- Last updated: 2026-01-29

  Follow the journey of training an LLM with $0 budget. One step at a time. 🌸
 
 
+ ---
  language:
+ - code
  license: apache-2.0
  tags:
+ - code-generation
+ - mobile-training
+ - pytorch
+ - transformers
+ - distilgpt2
+ - zero-budget-ai
  datasets:
+ - bigcode/the-stack-smol-xl
  metrics:
+ - perplexity
  model-index:
+ - name: Yuuki v0.1
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       name: The Stack
+       type: bigcode/the-stack-smol-xl
+     metrics:
+     - name: perplexity
+       type: perplexity
+       value: 5.50
  ---
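Since this commit exists to make the metadata pass the Hub's YAML validation, a quick local sanity check can help. The sketch below is illustrative rather than part of the repository: it rebuilds the same `model-index` entry with `huggingface_hub` helpers and prints the Hub-compatible YAML.

```python
# Minimal sketch: round-trip the metadata above through huggingface_hub's
# model-card helpers; construction fails if the structure is not what the Hub expects.
from huggingface_hub import ModelCardData, EvalResult

card_data = ModelCardData(
    language="code",
    license="apache-2.0",
    tags=["code-generation", "mobile-training", "pytorch",
          "transformers", "distilgpt2", "zero-budget-ai"],
    datasets=["bigcode/the-stack-smol-xl"],
    metrics=["perplexity"],
    model_name="Yuuki v0.1",
    eval_results=[
        EvalResult(
            task_type="text-generation",
            dataset_type="bigcode/the-stack-smol-xl",
            dataset_name="The Stack",
            metric_type="perplexity",
            metric_value=5.50,
        )
    ],
)
print(card_data.to_yaml())  # prints the block that belongs between the --- markers
```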

+ # 🌸 Yuuki v0.1 - The $0 Code LLM
+
+ > ⚠️ WORK IN PROGRESS - Currently training on mobile CPU (Day 3/42)

+ ## 🎯 The Mission
+
+ Prove that you DON'T need expensive GPUs to train LLMs.
+
+ Yuuki is a code generation model trained entirely on a $150 Android phone with:
+
+ - ❌ No cloud compute
+ - ❌ No GPU
+ - ❌ No data center
+ - ✅ Just determination and time
+
+ ### The Setup

  Hardware: Snapdragon 685 (8-core ARM CPU)
  RAM: 6GB
  Storage: 128GB
  NPU: Hexagon 686 (1 TOPS)
  GPU: Adreno 610 (243 GFLOPS) - NOT USED for training
  Cost: $0 in compute
+
+ ## 📊 Current Status
+
+ | Metric | Value |
+ |--------|-------|
+ | Progress | 1,417 / 37,500 steps (3.78%) |
+ | Epoch | 0.08 / 2.0 |
+ | Current Loss | ~1.70 - 2.23 |
+ | Best Loss | 1.7053 ⭐ |
+ | Training Time | ~3 days |
+ | ETA | ~39 days remaining |
+ | Speed | ~100 sec/step |
+
+ ### Loss Progression
+
  Step 0: Loss 3.35 (baseline)
  Step 500: Loss 2.50 ↓ -25%
  Step 1000: Loss 2.00 ↓ -40%
  Step 1265: Loss 1.83 ↓ -45%
  Step 1292: Loss 1.71 ↓ -49% ⭐ RECORD
  Step 1417: Loss 2.23 (current, oscillating 1.7-2.3)
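A note on the perplexity reported in the metadata: 5.50 is presumably just the exponential of the best training loss above (an assumption about how it was computed, since no evaluation script is published yet).

```python
import math

best_loss = 1.7053                     # best training cross-entropy loss (step 1,292)
print(round(math.exp(best_loss), 2))   # 5.5 -> matches the perplexity in the model-index
```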
+
+ ## 🎓 What Yuuki Knows (So Far)

  Due to alphabetically-ordered dataset:

+ | Language | Exposure | Quality | Status |
+ |----------|----------|---------|--------|
+ | Agda | High | 85/100 | ✅ Excellent |
+ | C | Starting | 30/100 | ⏳ Learning |
+ | Assembly | Low | 5/100 | 🌱 Minimal |
+ | Python | None | 0/100 | ❌ Not reached yet |
+
+ ### Example Output (Step 1,300)
+
+ Agda prompt: `module Main where`
+
+ ```agda
+ module Main where (x, f) in a
+
+ open import Cubical.Sigma
+ open import Cubical.Sigma.Core
+ open import Cubical.Foundations.H
+ ```
+
+ ✅ Real Agda libraries! The model learned actual Cubical type theory modules.
+
+ ## 🛠️ Training Configuration
+
+ - Model: DistilGPT-2 (82M parameters)
+ - Dataset: The Stack (75,000 examples)
+ - Batch size: 1
+ - Gradient accumulation: 4
+ - Effective batch: 4
+ - Learning rate: 5e-5
+ - Max length: 256 tokens
+ - Optimizer: AdamW
+ - Epochs: 2
+ - Total tokens: ~30M (2 epochs)
+
+ Why so slow?
+
+ 100 seconds/step × 37,500 steps = 3,750,000 seconds
+ = 1,042 hours
+ = 43.4 days
+ = ~6 weeks of continuous training
+
+ No GPU acceleration. Pure CPU grinding. 💪
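For readers who want to reproduce the run, here is a minimal sketch of the configuration above expressed with the Hugging Face `Trainer`. It is illustrative only: the output directory, the `content` column name, and the preprocessing are assumptions, not the project's published training script.

```python
# Illustrative sketch of the hyperparameters listed above (not the project's script).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token              # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")   # ~82M parameters

# Assumption: the smol-xl subset exposes source files in a "content" column.
ds = load_dataset("bigcode/the-stack-smol-xl", split="train")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=256)

ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir="yuuki-v0.1",
    per_device_train_batch_size=1,     # batch size 1
    gradient_accumulation_steps=4,     # effective batch 4
    learning_rate=5e-5,
    num_train_epochs=2,
    save_steps=2500,                   # matches the checkpoint milestones in the roadmap
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # on CPU this runs at roughly the ~100 s/step reported above
# trainer.train(resume_from_checkpoint=True)  # to pick up again after an interruption
```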
+
+ ## 📈 Roadmap
+
+ ### v0.1 (Current - Proof of Concept)
+
+ - [x] Setup training pipeline
+ - [x] Start training (Step 0)
+ - [x] Reach Step 1,000
+ - [x] Break loss 2.0 barrier
+ - [x] Break loss 1.8 barrier ⭐
+ - [ ] Checkpoint 2,500 (7%)
+ - [ ] Checkpoint 5,000 (13%)
+ - [ ] Checkpoint 10,000 (27%)
+ - [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
+ - [ ] Checkpoint 37,500 (100% - DONE)
+ - [ ] Quantize to INT8
+ - [ ] Convert to ONNX
+ - [ ] Publish final model
+
+ ETA: Mid-March 2026
+
+ ### v0.2 (The Full Dataset)
+
+ - Dataset: 786,387 examples (full Stack)
+ - Duration: 418 days (~14 months)
+ - Epochs: 2.0
+ - Total tokens: ~314M
+ - Dataset fix: SHUFFLED (not alphabetical)
+ - Languages: All 80+ languages balanced
+ - Start: March 2026
+ - End: May 2027
+
+ ### v0.3+ (PC Era)
+
+ - Hardware upgrade: RTX 4060/4070
+ - Larger models: 350M-1B parameters
+ - Faster training: ~30x speedup
+ - Advanced techniques: LoRA, QLoRA, etc.
+
+ ## 💡 Philosophy
+
+ "The barrier to AI isn't money. It's mindset."
+
+ This project demonstrates:
+
+ - ✅ You CAN train LLMs without GPUs
+ - ✅ Patience > Hardware
+ - ✅ $0 budget is enough to start
+ - ✅ Limited resources inspire creativity
+ - ✅ Anyone can contribute to AI
+
+ ## 🚀 Usage (After Training Completes)
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load model
+ model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
+ tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")
+
+ # Generate code
+ prompt = "def fibonacci(n):"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_length=100)
+ code = tokenizer.decode(outputs[0])
+ print(code)
+ ```
+
+ ### Quantized (4x faster, 4x smaller)
+
+ ```python
+ # Coming after training completes
+ model = AutoModelForCausalLM.from_pretrained(
+     "OpceanAI/Yuuki",
+     subfolder="yuuki-v0.1-int8"
+ )
+ ```
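The INT8 artifact referenced above does not exist yet. A minimal sketch of how it could be produced with PyTorch dynamic quantization on CPU (an assumed approach, not the project's stated method):

```python
# Sketch: build an INT8 CPU variant with PyTorch dynamic quantization.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
model_int8 = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8   # quantize only the Linear layers
)
torch.save(model_int8.state_dict(), "yuuki-v0.1-int8.pt")  # planned subfolder: "yuuki-v0.1-int8"
```

The ONNX conversion listed in the roadmap could similarly be done with Hugging Face Optimum (again an assumption about tooling):

```python
# Sketch: export the trained checkpoint to ONNX via Optimum's onnxruntime integration.
from optimum.onnxruntime import ORTModelForCausalLM

onnx_model = ORTModelForCausalLM.from_pretrained("OpceanAI/Yuuki", export=True)
onnx_model.save_pretrained("yuuki-v0.1-onnx")
```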
+
+ ## ⚠️ Known Limitations
+
+ - Dataset order: Alphabetical (not shuffled) - learns early languages best
+ - Token count: Only ~30M tokens (vs GPT-2's 40B)
+ - Training speed: Very slow (~100 sec/step)
+ - Model size: Small (82M params)
+ - Language coverage: Incomplete due to alphabetical ordering
+
+ These will be addressed in v0.2 with a shuffled dataset.
+
+ ## 🔬 Technical Details
+
+ CPU Training (100 sec/step):
+
+ - Forward pass: 40 sec
+ - Backward pass: 40 sec
+ - Optimizer: 20 sec
+ - Total: ~100 sec
+
+ vs GPU Training (0.5 sec/step):
+
+ - 200x faster
+ - But costs $0.50-$2.00/hour
+ - 42 days = $500-$2,000
+
+ Mobile: FREE but SLOW
+ GPU: FAST but EXPENSIVE
+
+ For proof of concept: Mobile wins. 🏆
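For reference, the dollar range above works out as 42 days × 24 h ≈ 1,008 GPU-hours; at $0.50–$2.00/hour that is roughly $500–$2,000.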
+
+ ## 📊 Benchmarks (Post-Training)
+
+ Coming soon after training completes (~March 2026).
+
+ Expected performance:
+
+ - Agda: 85-95/100 (primary language)
+ - C: 85-92/100 (secondary language)
+ - Assembly: 75-85/100 (tertiary)
+ - Python: 10-20/100 (barely seen due to alphabet order)
+
+ ## 🙏 Acknowledgments
+
+ - HuggingFace: Infrastructure and transformers library
+ - BigCode: The Stack dataset
+ - The ML community: For saying "you need GPUs" - best motivation 😏
+
+ ## 📜 License
+
+ Apache 2.0 - See LICENSE file. You can use Yuuki commercially, modify it, distribute it. Just give credit. ✅
+
+ ## 🔗 Links
+
+ - GitHub: https://github.com/aguitauwu
+ - Discord: https://discord.gg/j8zV2u8k
+ - Progress updates: Check this model card
+
+ ## 📅 Updates
+
+ - 2026-01-29: Training started
+ - 2026-01-29: Step 1,000 reached - Loss 2.00
+ - 2026-01-29: Step 1,292 - NEW RECORD Loss 1.7053
+ - 2026-01-29: Repository created on HuggingFace
+
+ Last updated: 2026-01-29
+
  Follow the journey of training an LLM with $0 budget. One step at a time. 🌸