---
license: apache-2.0
datasets:
- bigcode/the-stack
language:
- es
- en
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
tags:
- code
new_version: OpceanAI/Yuuki-the-best-model
library_name: transformers
---
## ⚠️ Notice on Current Model Scope
Please note that **Yuuki**, in its current state, represents **approximately 3.7%** of the total training planned for **v0.1**.
At this stage, Yuuki should be considered an **early and incomplete snapshot** of the model. The full **v0.1 release**, which will include the remaining training stages, additional refinements, and stabilization, will follow at a later date.
As such, performance, behavior, or capability assessments based on the current version of Yuuki **do not reflect** the final characteristics of the v0.1 model.
Further updates will be provided as development progresses.
# 🌸 Yuuki v0.1 - The $0 Code LLM
> ⚠️ WORK IN PROGRESS - Currently training on mobile CPU (Day 3/42)
## 🎯 The Mission
Prove that you DON'T need expensive GPUs to train LLMs.
Yuuki is a code generation model trained entirely on a $150 Android phone with:
- ❌ No cloud compute
- ❌ No GPU
- ❌ No data center
- ✅ Just determination and time
### The Setup

- Hardware: Snapdragon 685 (8-core ARM CPU)
- RAM: 6 GB
- Storage: 128 GB
- NPU: Hexagon 686 (1 TOPS)
- GPU: Adreno 610 (243 GFLOPS) - NOT USED for training
- Cost: $0 in compute
## 📊 Current Status
| Metric | Value |
|---|---|
| Progress | 1,417 / 37,500 steps (3.78%) |
| Epoch | 0.08 / 2.0 |
| Current Loss | ~1.70 - 2.23 |
| Best Loss | 1.7053 ⭐ |
| Training Time | ~3 days |
| ETA | ~39 days remaining |
| Speed | ~100 sec/step |
### Loss Progression
- Step 0: Loss 3.35 (baseline)
- Step 500: Loss 2.50 (↓ 25%)
- Step 1,000: Loss 2.00 (↓ 40%)
- Step 1,265: Loss 1.83 (↓ 45%)
- Step 1,292: Loss 1.71 (↓ 49%) ⭐ RECORD
- Step 1,417: Loss 2.23 (current; oscillating between 1.7 and 2.3)
## 🎓 What Yuuki Knows (So Far)
Due to the alphabetically ordered dataset:

| Language | Exposure | Quality | Status |
|---|---|---|---|
| Agda | High | 85/100 | ✅ Excellent |
| C | Starting | 30/100 | ⏳ Learning |
| Assembly | Low | 5/100 | 🌱 Minimal |
| Python | None | 0/100 | ❌ Not reached yet |
### Example Output (Step 1,300)
Agda prompt: `module Main where`

Output:

```
module Main where (x, f) in a
open import Cubical.Sigma
open import Cubical.Sigma.Core
open import Cubical.Foundations.H
```
✅ Real Agda libraries! The model learned actual Cubical type theory modules.
## 🛠️ Training Configuration
- Model: DistilGPT-2 (82M parameters)
- Dataset: The Stack (75,000 examples)
- Batch size: 1
- Gradient accumulation: 4
- Effective batch: 4
- Learning rate: 5e-5
- Max length: 256 tokens
- Optimizer: AdamW
- Epochs: 2
- Total tokens: ~30M (2 epochs)
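
The training script itself isn't published in this repo, so purely as an illustration, here is a minimal sketch of how the configuration above maps onto the Hugging Face `Trainer` API. The streaming slice, the `content` column name, and the output directory are assumptions made for this sketch, not the actual setup:

```python
# Minimal sketch of the configuration above, assuming the Hugging Face Trainer.
# NOT the actual training script; dataset handling is simplified for illustration.
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token                    # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")   # ~82M parameters

# First 75,000 files of The Stack, taken in on-disk order (hence the
# roughly alphabetical language exposure described earlier).
stream = load_dataset("bigcode/the-stack", split="train", streaming=True)
train_ds = Dataset.from_list(list(stream.take(75_000)))
train_ds = train_ds.map(
    lambda batch: tokenizer(batch["content"], truncation=True, max_length=256),
    batched=True,
    remove_columns=train_ds.column_names,
)

args = TrainingArguments(
    output_dir="yuuki-v0.1",           # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,     # effective batch size 4
    learning_rate=5e-5,
    num_train_epochs=2,
    optim="adamw_torch",
    logging_steps=50,
    save_steps=500,
)

# With no GPU visible, Trainer falls back to plain CPU training.
Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

With batch size 1 and gradient accumulation 4, 75,000 examples × 2 epochs works out to the 37,500 optimizer steps quoted above.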
### Why so slow?
100 seconds/step × 37,500 steps = 3,750,000 seconds ≈ 1,042 hours ≈ 43.4 days, i.e. roughly six weeks of continuous training.
No GPU acceleration. Pure CPU grinding. 💪
## 📈 Roadmap
### v0.1 (Current - Proof of Concept)
- [x] Set up training pipeline
- [x] Start training (Step 0)
- [x] Reach Step 1,000
- [x] Break loss 2.0 barrier
- [x] Break loss 1.8 barrier ⭐
- [ ] Checkpoint 2,500 (7%)
- [ ] Checkpoint 5,000 (13%)
- [ ] Checkpoint 10,000 (27%)
- [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
- [ ] Checkpoint 37,500 (100% - DONE)
- [ ] Quantize to INT8
- [ ] Convert to ONNX
- [ ] Publish final model
ETA: Mid-March 2026
### v0.2 (The Full Dataset)
- Dataset: 786,387 examples (full Stack)
- Duration: 418 days (~14 months)
- Epochs: 2.0
- Total tokens: ~314M
- Dataset fix: SHUFFLED (not alphabetical)
- Languages: All 80+ languages balanced
- Start: March 2026
- End: May 2027
### v0.3+ (PC Era)
- Hardware upgrade: RTX 4060/4070
- Larger models: 350M-1B parameters
- Faster training: ~30x speedup
- Advanced techniques: LoRA, QLoRA, etc. (see the sketch below)
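
As a preview of the "advanced techniques" item, here is a minimal LoRA sketch using the `peft` library. The base model, rank, and target modules are illustrative placeholders rather than a committed v0.3 plan:

```python
# Illustrative LoRA setup for the v0.3+ era (hyperparameters are placeholders).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("distilgpt2")

config = LoraConfig(
    r=8,                          # rank of the low-rank adapter matrices
    lora_alpha=16,
    target_modules=["c_attn"],    # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()    # only the small adapters are updated
```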
## 💡 Philosophy
"The barrier to AI isn't money. It's mindset."
This project demonstrates:

- ✅ You CAN train LLMs without GPUs
- ✅ Patience > Hardware
- ✅ $0 budget is enough to start
- ✅ Limited resources inspire creativity
- ✅ Anyone can contribute to AI
## 🚀 Usage (After Training Completes)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")

# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
code = tokenizer.decode(outputs[0])
print(code)
```
### Quantized (4x faster, 4x smaller)
Coming after training completes:

```python
model = AutoModelForCausalLM.from_pretrained(
    "OpceanAI/Yuuki",
    subfolder="yuuki-v0.1-int8",
)
```
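
The INT8 checkpoint doesn't exist yet. One possible toolchain for the roadmap's "Convert to ONNX" and "Quantize to INT8" steps (an assumption for illustration, not the committed method) is the `optimum` exporter plus ONNX Runtime dynamic quantization:

```python
# Sketch of one possible ONNX export + INT8 quantization pipeline.
# Assumes `optimum` and `onnxruntime` are installed; the exported file name
# ("model.onnx") can differ between optimum versions.
import subprocess

from onnxruntime.quantization import QuantType, quantize_dynamic

# 1. Export the (future) trained checkpoint to ONNX.
subprocess.run(
    ["optimum-cli", "export", "onnx", "--model", "OpceanAI/Yuuki", "yuuki-onnx/"],
    check=True,
)

# 2. Dynamically quantize the exported weights to INT8.
quantize_dynamic(
    "yuuki-onnx/model.onnx",
    "yuuki-onnx/model-int8.onnx",
    weight_type=QuantType.QInt8,
)
```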
## ⚠️ Known Limitations
- Dataset order: Alphabetical (not shuffled) - learns early languages best
- Token count: Only ~30M tokens (vs the ~40 GB of web text used to train GPT-2)
- Training speed: Very slow (~100 sec/step)
- Model size: Small (82M params)
- Language coverage: Incomplete due to alphabetical ordering
These will be addressed in v0.2 with a shuffled dataset.
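
For the v0.2 run, the shuffle can happen on the streaming dataset itself, so the full Stack never has to be stored on the phone. A minimal sketch follows; the buffer size and seed are illustrative values, not a committed configuration:

```python
# Sketch of the v0.2 "shuffled dataset" fix: an approximate on-the-fly shuffle
# of the streamed Stack, instead of reading files in alphabetical order.
from datasets import load_dataset

stream = load_dataset("bigcode/the-stack", split="train", streaming=True)
shuffled = stream.shuffle(seed=42, buffer_size=10_000)  # shard + buffer shuffle

# Peek at a few examples to confirm languages are now interleaved.
for example in shuffled.take(3):
    print(example["lang"], len(example["content"]))
```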
## 🔬 Technical Details
CPU training (~100 sec/step):

- Forward pass: ~40 sec
- Backward pass: ~40 sec
- Optimizer step: ~20 sec
- Total: ~100 sec
vs GPU training (~0.5 sec/step):

- ~200x faster
- but costs $0.50-$2.00/hour
- 42 days of GPU rental = $500-$2,000
- Mobile: FREE but SLOW
- GPU: FAST but EXPENSIVE

For proof of concept: Mobile wins. 🏆
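
The per-phase split above can be checked with simple wall-clock timing. Here is a minimal sketch using plain PyTorch on CPU with a dummy 256-token batch; the absolute numbers will of course depend on the machine:

```python
# Sketch: timing the forward / backward / optimizer phases of one training step
# on CPU, with a dummy batch matching the config (batch size 1, 256 tokens).
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("module Main where\n" * 64, return_tensors="pt",
                  truncation=True, max_length=256)
batch["labels"] = batch["input_ids"].clone()

t0 = time.perf_counter()
loss = model(**batch).loss          # forward pass
t1 = time.perf_counter()
loss.backward()                     # backward pass
t2 = time.perf_counter()
optimizer.step()                    # optimizer update
optimizer.zero_grad()
t3 = time.perf_counter()

print(f"forward:   {t1 - t0:.1f}s")
print(f"backward:  {t2 - t1:.1f}s")
print(f"optimizer: {t3 - t2:.1f}s")
```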
## 📊 Benchmarks (Post-Training)
Coming soon after training completes (~March 2026).
Expected performance:

- Agda: 85-95/100 (primary language)
- C: 85-92/100 (secondary language)
- Assembly: 75-85/100 (tertiary)
- Python: 10-20/100 (barely seen due to alphabetical ordering)
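
Until formal benchmarks exist, a quick proxy is per-language perplexity on a handful of held-out snippets. A rough sketch is below; the sample snippets are placeholders, and the model ID assumes the final published checkpoint:

```python
# Rough per-language perplexity check as a stand-in for real benchmarks.
# The snippets are placeholders; a proper evaluation would use held-out
# files from The Stack for each language.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpceanAI/Yuuki"   # final checkpoint, once training completes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

samples = {
    "Agda": "module Main where\nopen import Cubical.Foundations.Prelude\n",
    "Python": "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)\n",
}

for lang, code in samples.items():
    inputs = tokenizer(code, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{lang}: perplexity ≈ {math.exp(loss.item()):.1f}")
```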
πŸ™ Acknowledgments
- HuggingFace: Infrastructure and the transformers library
- BigCode: The Stack dataset
- The ML community: For saying "you need GPUs" - best motivation 😏
## 📜 License
Apache 2.0 - See LICENSE file. You can use Yuuki commercially, modify it, and distribute it. Just give credit. ✅
## 🔗 Links
- GitHub: https://github.com/aguitauwu
- Discord: https://discord.gg/j8zV2u8k
- Progress updates: Check this model card
## 📅 Updates
- 2026-01-29: Training started
- 2026-01-29: Step 1,000 reached - Loss 2.00
- 2026-01-29: Step 1,292 - NEW RECORD Loss 1.7053
- 2026-01-29: Repository created on HuggingFace

Last updated: 2026-01-29
Follow the journey of training an LLM on a $0 budget. One step at a time. 🌸