---
license: apache-2.0
datasets:
- bigcode/the-stack
language:
- es
- en
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
tags:
- code
new_version: OpceanAI/Yuuki-the-best-model
library_name: transformers
---

## ⚠️ Notice on Current Model Scope

Please note that **Yuuki**, in its current state, represents **approximately 3.7%** of the total training planned for **version v0.1**.

At this stage, Yuuki should be considered an **early and incomplete snapshot** of the model. The full **v0.1** model, which will include the remaining training stages, additional refinements, and stabilization, will be published at a later date.

As such, performance, behavior, or capability assessments based on the current version of Yuuki **do not reflect** the final characteristics of the v0.1 model.

Further updates will be provided as development progresses.

# 💸 Yuuki v0.1 - The $0 Code LLM

> ⚠️ WORK IN PROGRESS - Currently training on mobile CPU (Day 3/42)

## 🎯 The Mission

Prove that you DON'T need expensive GPUs to train LLMs.

Yuuki is a code generation model trained entirely on a $150 Android phone with:

- ❌ No cloud compute
- ❌ No GPU
- ❌ No data center
- ✅ Just determination and time

## The Setup

- Hardware: Snapdragon 685 (8-core ARM CPU)
- RAM: 6GB
- Storage: 128GB
- NPU: Hexagon 686 (1 TOPS)
- GPU: Adreno 610 (243 GFLOPS) - NOT USED for training
- Cost: $0 in compute

## 📊 Current Status

| Metric | Value |
|---|---|
| Progress | 1,417 / 37,500 steps (3.78%) |
| Epoch | 0.08 / 2.0 |
| Current Loss | ~1.70 - 2.23 |
| Best Loss | 1.7053 ✅ |
| Training Time | ~3 days |
| ETA | ~39 days remaining |
| Speed | ~100 sec/step |

### Loss Progression

| Step | Loss | Change |
|---|---|---|
| 0 | 3.35 | (baseline) |
| 500 | 2.50 | ↓ 25% |
| 1,000 | 2.00 | ↓ 40% |
| 1,265 | 1.83 | ↓ 45% |
| 1,292 | 1.71 | ↓ 49% (RECORD) |
| 1,417 | 2.23 | (current, oscillating 1.7-2.3) |

## 📚 What Yuuki Knows (So Far)

Because the dataset is processed in alphabetical order:

| Language | Exposure | Quality | Status |
|---|---|---|---|
| Agda | High | 85/100 | ✅ Excellent |
| C | Starting | 30/100 | ⏳ Learning |
| Assembly | Low | 5/100 | 🌱 Minimal |
| Python | None | 0/100 | ❌ Not reached yet |

### Example Output (Step 1,300)

Agda prompt: `module Main where`

    module Main where (x, f) in a

    open import Cubical.Sigma
    open import Cubical.Sigma.Core
    open import Cubical.Foundations.H

✅ Real Agda libraries! The model learned actual Cubical type theory modules.

## 🛠️ Training Configuration

- Model: DistilGPT-2 (82M parameters)
- Dataset: The Stack (75,000 examples)
- Batch size: 1
- Gradient accumulation: 4
- Effective batch: 4
- Learning rate: 5e-5
- Max length: 256 tokens
- Optimizer: AdamW
- Epochs: 2
- Total tokens: ~30M (2 epochs)
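
For reference, here is a minimal sketch of what this configuration could look like with the `transformers` `Trainer` API. The actual training script is not published, so the output path, dataset handling, and the `tokenized_dataset` name below are assumptions, not Yuuki's real code:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Base model from the configuration above (82M-parameter DistilGPT-2)
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 family has no pad token

args = TrainingArguments(
    output_dir="yuuki-v0.1",          # hypothetical path
    per_device_train_batch_size=1,    # one example at a time in 6GB RAM
    gradient_accumulation_steps=4,    # effective batch size of 4
    learning_rate=5e-5,
    num_train_epochs=2,
    use_cpu=True,                     # no GPU on the phone
)

# trainer = Trainer(model=model, args=args, train_dataset=tokenized_dataset)
# trainer.train()
```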

### Why so slow?

    100 seconds/step × 37,500 steps = 3,750,000 seconds
    = 1,042 hours
    = 43.4 days
    = ~6 weeks of continuous training

No GPU acceleration. Pure CPU grinding. 💪

## 📍 Roadmap

### v0.1 (Current - Proof of Concept)

- [x] Setup training pipeline
- [x] Start training (Step 0)
- [x] Reach Step 1,000
- [x] Break loss 2.0 barrier
- [x] Break loss 1.8 barrier ✅
- [ ] Checkpoint 2,500 (7%)
- [ ] Checkpoint 5,000 (13%)
- [ ] Checkpoint 10,000 (27%)
- [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
- [ ] Checkpoint 37,500 (100% - DONE)
- [ ] Quantize to INT8 (one possible route is sketched below)
- [ ] Convert to ONNX (see the same sketch)
- [ ] Publish final model

ETA: Mid-March 2026
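
Neither of the last two items is implemented yet. As a hedged illustration only, here is one way they could be done with `torch` dynamic quantization and the `optimum` exporter; none of this is Yuuki's actual pipeline:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")

# INT8: dynamic quantization of the linear layers, aimed at CPU inference
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# ONNX: one option is the optimum library's exporter (an assumption, not
# the project's confirmed tool):
# from optimum.onnxruntime import ORTModelForCausalLM
# onnx_model = ORTModelForCausalLM.from_pretrained("OpceanAI/Yuuki", export=True)
```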

### v0.2 (The Full Dataset)

- Dataset: 786,387 examples (full Stack)
- Duration: 418 days (~14 months)
- Epochs: 2.0
- Total tokens: ~314M
- Dataset fix: SHUFFLED (not alphabetical)
- Languages: All 80+ languages balanced
- Start: March 2026
- End: May 2027

### v0.3+ (PC Era)

- Hardware upgrade: RTX 4060/4070
- Larger models: 350M-1B parameters
- Faster training: ~30x speedup
- Advanced techniques: LoRA, QLoRA, etc.

## 💡 Philosophy

> "The barrier to AI isn't money. It's mindset."

This project demonstrates:

- ✅ You CAN train LLMs without GPUs
- ✅ Patience > Hardware
- ✅ $0 budget is enough to start
- ✅ Limited resources inspire creativity
- ✅ Anyone can contribute to AI

## 🚀 Usage (After Training Completes)

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load model
    model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
    tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")

    # Generate code
    prompt = "def fibonacci(n):"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=100)
    code = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(code)

### Quantized (4x faster, 4x smaller)

Coming after training completes:

    model = AutoModelForCausalLM.from_pretrained(
        "OpceanAI/Yuuki",
        subfolder="yuuki-v0.1-int8"
    )

## ⚠️ Known Limitations

- Dataset order: Alphabetical (not shuffled) - learns early languages best
- Token count: Only ~30M tokens (vs GPT-2's 40B)
- Training speed: Very slow (~100 sec/step)
- Model size: Small (82M params)
- Language coverage: Incomplete due to alphabetical ordering

These will be addressed in v0.2 with a shuffled dataset.

## 🔬 Technical Details

CPU Training (100 sec/step):

- Forward pass: 40 sec
- Backward pass: 40 sec
- Optimizer: 20 sec
- Total: ~100 sec
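
As a hedged illustration of where that time goes, a micro-benchmark along these lines could reproduce the breakdown; the model name and prompt are assumptions, and real numbers will vary with the phone's load:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A single causal-LM training example: labels are the inputs themselves
batch = tokenizer("def main():", return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()

t0 = time.perf_counter()
loss = model(**batch).loss      # forward pass
t1 = time.perf_counter()
loss.backward()                 # backward pass
t2 = time.perf_counter()
optimizer.step()                # optimizer update
optimizer.zero_grad()
t3 = time.perf_counter()

print(f"forward {t1 - t0:.2f}s, backward {t2 - t1:.2f}s, optimizer {t3 - t2:.2f}s")
```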

vs GPU Training (0.5 sec/step):

- 200x faster
- But costs $0.50-$2.00/hour
- 42 days = $500-$2,000

Mobile: FREE but SLOW. GPU: FAST but EXPENSIVE.

For a proof of concept: Mobile wins. 🏆

## 📈 Benchmarks (Post-Training)

Coming soon after training completes (~March 2026). Expected performance:

- Agda: 85-95/100 (primary language)
- C: 85-92/100 (secondary language)
- Assembly: 75-85/100 (tertiary)
- Python: 10-20/100 (barely seen due to alphabetical order)

## 🙏 Acknowledgments

- HuggingFace: Infrastructure and the transformers library
- BigCode: The Stack dataset
- The ML community: For saying "you need GPUs" - best motivation 😄

## 📄 License

Apache 2.0 - See the LICENSE file. You can use Yuuki commercially, modify it, and distribute it. Just give credit. ✅

## 🔗 Links

- GitHub: https://github.com/aguitauwu
- Discord: https://discord.gg/j8zV2u8k
- Progress updates: Check this model card

## 📅 Updates

- 2026-01-29: Training started
- 2026-01-29: Step 1,000 reached - Loss 2.00
- 2026-01-29: Step 1,292 - NEW RECORD Loss 1.7053
- 2026-01-29: Repository created on HuggingFace

Last updated: 2026-01-29

Follow the journey of training an LLM with $0 budget. One step at a time. 💸