---
license: apache-2.0
datasets:
  - bigcode/the-stack
language:
  - es
  - en
base_model:
  - openai-community/gpt2
pipeline_tag: text-generation
tags:
  - code
new_version: OpceanAI/Yuuki-the-best-model
library_name: transformers
---

## ⚠️ Notice on Current Model Scope

Please note that **Yuuki**, in its current state, represents **approximately 3.7%** of the total training planned for **version v0.1**. At this stage, Yuuki should be considered an **early and incomplete snapshot** of the model.

The full **v0.1 release**, which will include the remaining training stages, additional refinements, and stabilization, will arrive at a later date. As such, assessments of performance, behavior, or capability based on the current version of Yuuki **do not reflect** the final characteristics of the v0.1 model.

Further updates will be provided as development progresses.

# 🌸 Yuuki v0.1 - The $0 Code LLM

> ⚠️ **WORK IN PROGRESS** - Currently training on a mobile CPU (Day 3/42)

## 🎯 The Mission

Prove that you DON'T need expensive GPUs to train LLMs. Yuuki is a code generation model trained entirely on a $150 Android phone with:

- ❌ No cloud compute
- ❌ No GPU
- ❌ No data center
- ✅ Just determination and time

## The Setup

- Hardware: Snapdragon 685 (8-core ARM CPU)
- RAM: 6GB
- Storage: 128GB
- NPU: Hexagon 686 (1 TOPS)
- GPU: Adreno 610 (243 GFLOPS) - NOT used for training
- Cost: $0 in compute

## 📊 Current Status

| Metric | Value |
| --- | --- |
| Progress | 1,417 / 37,500 steps (3.78%) |
| Epoch | 0.08 / 2.0 |
| Current loss | ~1.70-2.23 |
| Best loss | 1.7053 ⭐ |
| Training time | ~3 days |
| ETA | ~39 days remaining |
| Speed | ~100 sec/step |

### Loss Progression

| Step | Loss | Change |
| --- | --- | --- |
| 0 | 3.35 | (baseline) |
| 500 | 2.50 | ↓ 25% |
| 1,000 | 2.00 | ↓ 40% |
| 1,265 | 1.83 | ↓ 45% |
| 1,292 | 1.71 | ↓ 49% ⭐ RECORD |
| 1,417 | 2.23 | current; oscillating between 1.7 and 2.3 |

## 🎓 What Yuuki Knows (So Far)

Due to the alphabetically ordered dataset:

| Language | Exposure | Quality | Status |
| --- | --- | --- | --- |
| Agda | High | 85/100 | ✅ Excellent |
| C | Starting | 30/100 | ⏳ Learning |
| Assembly | Low | 5/100 | 🌱 Minimal |
| Python | None | 0/100 | ❌ Not reached yet |

### Example Output (Step 1,300)

Agda prompt: `module Main where`

```agda
module Main where
(x, f) in a
open import Cubical.Sigma
open import Cubical.Sigma.Core
open import Cubical.Foundations.H
```

✅ Real Agda libraries! The model learned actual Cubical type theory modules.

## 🛠️ Training Configuration

- Model: DistilGPT-2 (82M parameters)
- Dataset: The Stack (75,000 examples)
- Batch size: 1
- Gradient accumulation: 4
- Effective batch: 4
- Learning rate: 5e-5
- Max length: 256 tokens
- Optimizer: AdamW
- Epochs: 2
- Total tokens: ~30M (2 epochs)

### Why so slow?

```
100 sec/step × 37,500 steps = 3,750,000 seconds
                            ≈ 1,042 hours
                            ≈ 43.4 days
                            ≈ 6 weeks of continuous training
```

No GPU acceleration. Pure CPU grinding. 💪
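For reference, here is a minimal sketch of what the configuration above looks like with the `transformers` Trainer. This is not the actual training script; the dataset slice, the `content` column name, and the output directory are assumptions based on The Stack's published schema.

```python
# Minimal sketch, not the actual training script. Assumes The Stack's
# "content" column and a naive 75,000-example slice of the dataset.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")  # ~82M parameters

# In practice you would pre-download or stream a slice rather than pull
# the full dataset onto a phone.
dataset = load_dataset("bigcode/the-stack", split="train[:75000]")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="yuuki-v0.1",
    per_device_train_batch_size=1,   # fits in 6 GB of RAM
    gradient_accumulation_steps=4,   # effective batch size of 4
    learning_rate=5e-5,
    num_train_epochs=2,              # 75,000 / 4 steps × 2 epochs = 37,500 steps
    optim="adamw_torch",             # AdamW, as listed above
    use_cpu=True,                    # no GPU on the phone
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```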
## 📈 Roadmap

### v0.1 (Current - Proof of Concept)

- [x] Set up training pipeline
- [x] Start training (Step 0)
- [x] Reach Step 1,000
- [x] Break the loss 2.0 barrier
- [x] Break the loss 1.8 barrier ⭐
- [ ] Checkpoint 2,500 (7%)
- [ ] Checkpoint 5,000 (13%)
- [ ] Checkpoint 10,000 (27%)
- [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
- [ ] Checkpoint 37,500 (100% - DONE)
- [ ] Quantize to INT8 (see the sketch at the end of this card)
- [ ] Convert to ONNX
- [ ] Publish final model

ETA: mid-March 2026

### v0.2 (The Full Dataset)

- Dataset: 786,387 examples (the full Stack)
- Duration: 418 days (~14 months)
- Epochs: 2.0
- Total tokens: ~314M
- Dataset fix: SHUFFLED (not alphabetical)
- Languages: all 80+ languages, balanced
- Start: March 2026
- End: May 2027

### v0.3+ (PC Era)

- Hardware upgrade: RTX 4060/4070
- Larger models: 350M-1B parameters
- Faster training: ~30x speedup
- Advanced techniques: LoRA, QLoRA, etc.

## 💡 Philosophy

> "The barrier to AI isn't money. It's mindset."

This project demonstrates:

- ✅ You CAN train LLMs without GPUs
- ✅ Patience > Hardware
- ✅ A $0 budget is enough to start
- ✅ Limited resources inspire creativity
- ✅ Anyone can contribute to AI

## 🚀 Usage (After Training Completes)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model
model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")

# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
code = tokenizer.decode(outputs[0])
print(code)
```

### Quantized (4x faster, 4x smaller)

```python
# Coming after training completes
model = AutoModelForCausalLM.from_pretrained(
    "OpceanAI/Yuuki",
    subfolder="yuuki-v0.1-int8"
)
```

## ⚠️ Known Limitations

- Dataset order: alphabetical (not shuffled) - the model learns the earliest languages best
- Token count: only ~30M tokens (vs. GPT-2's 40B)
- Training speed: very slow (~100 sec/step)
- Model size: small (82M params)
- Language coverage: incomplete due to alphabetical ordering

These will be addressed in v0.2 with a shuffled dataset.

## 🔬 Technical Details

CPU training (~100 sec/step):

- Forward pass: 40 sec
- Backward pass: 40 sec
- Optimizer step: 20 sec
- Total: ~100 sec

vs. GPU training (0.5 sec/step):

- 200x faster
- But costs $0.50-$2.00/hour
- 42 days = $500-$2,000

Mobile: FREE but SLOW. GPU: FAST but EXPENSIVE.

For a proof of concept: mobile wins. 🏆

## 📊 Benchmarks (Post-Training)

Coming soon after training completes (~March 2026). Expected performance:

- Agda: 85-95/100 (primary language)
- C: 85-92/100 (secondary language)
- Assembly: 75-85/100 (tertiary)
- Python: 10-20/100 (barely seen due to alphabetical order)

## 🙏 Acknowledgments

- HuggingFace: infrastructure and the `transformers` library
- BigCode: The Stack dataset
- The ML community: for saying "you need GPUs" - the best motivation 😏

## 📜 License

Apache 2.0 - see the LICENSE file. You can use Yuuki commercially, modify it, and distribute it. Just give credit. ✅

## 🔗 Links

- GitHub: https://github.com/aguitauwu
- Discord: https://discord.gg/j8zV2u8k
- Progress updates: check this model card

## 📅 Updates

- 2026-01-29: Training started
- 2026-01-29: Step 1,000 reached - Loss 2.00
- 2026-01-29: Step 1,292 - NEW RECORD Loss 1.7053
- 2026-01-29: Repository created on HuggingFace

Last updated: 2026-01-29

Follow the journey of training an LLM with a $0 budget. One step at a time. 🌸
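## Appendix: INT8 Quantization Sketch

To make the "Quantize to INT8" roadmap item concrete, here is a hypothetical sketch using PyTorch's dynamic quantization. This is an assumption about one possible approach, not the published pipeline, and the `OpceanAI/Yuuki` weights it loads will not exist until training completes.

```python
# Hypothetical sketch of the planned INT8 step; the actual pipeline
# behind yuuki-v0.1-int8 is not published yet.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")

# Swap every Linear layer for a dynamically quantized INT8 version.
# Weights are stored as int8 and dequantized on the fly, which shrinks
# the checkpoint roughly 4x and speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

torch.save(quantized.state_dict(), "yuuki-v0.1-int8.pt")
```

Dynamic quantization only affects inference; training still runs in full precision, which is why this step sits at the end of the v0.1 roadmap.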