Gogs committed
Commit ff5af85 · 1 parent: 343a2ad
🌸 Initial Yuuki v0.1 setup - Training in progress (Step 1,417)
- LICENSE +17 -0
- NOTICE +23 -0
- README.md +223 -3
- config.json +45 -0
- merges.txt +0 -0
- special_tokens_map.json +6 -0
- tokenizer.json +0 -0
- tokenizer_config.json +21 -0
- vocab.json +0 -0
LICENSE
ADDED
@@ -0,0 +1,17 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

Copyright 2026 OpceanAI

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
NOTICE
ADDED
@@ -0,0 +1,23 @@
Yuuki - Mobile-Trained Code Language Model
Copyright 2026 OpceanAI

This product includes a language model trained entirely on a mobile device
(Qualcomm Snapdragon 685) over 42 days with zero GPU budget.

Training Details:
- Base model: DistilGPT-2 (82M parameters)
- Training period: January-March 2026
- Hardware: Android device (Snapdragon 685, 6GB RAM)
- Dataset: The Stack (75,000 examples for v0.1)
- Total cost: $0 in cloud/GPU compute

Third-party Components:
- Transformers library by HuggingFace (Apache 2.0)
- PyTorch (BSD-3-Clause)
- The Stack dataset by BigCode (BigCode OpenRAIL-M)
- DistilGPT-2 base model (Apache 2.0)

Special Thanks:
- My Snapdragon 685
- HuggingFace for infrastructure
- The ML community
README.md
CHANGED
@@ -1,3 +1,223 @@
---
language:
- code
license: apache-2.0
tags:
- code-generation
- mobile-training
- pytorch
- transformers
- distilgpt2
- zero-budget-ai
datasets:
- bigcode/the-stack-smol-xl
metrics:
- perplexity
model-index:
- name: Yuuki v0.1
  results:
  - task:
      type: text-generation
    dataset:
      name: The Stack
      type: bigcode/the-stack-smol-xl
---

# 🌸 Yuuki v0.1 - The $0 Code LLM

> **⚠️ WORK IN PROGRESS** - Currently training on mobile CPU (Day 3/42)

## 🎯 The Mission

**Prove that you DON'T need expensive GPUs to train LLMs.**

Yuuki is a code generation model trained entirely on a **$150 Android phone** with:

- ❌ No cloud compute
- ❌ No GPU
- ❌ No data center
- ✅ Just determination and time

### The Setup

- Hardware: Snapdragon 685 (8-core ARM CPU)
- RAM: 6GB
- Storage: 128GB
- NPU: Hexagon 686 (1 TOPS)
- GPU: Adreno 610 (243 GFLOPS) - NOT used for training
- Cost: $0 in compute

## 📊 Current Status

| Metric | Value |
|--------|-------|
| **Progress** | 1,417 / 37,500 steps (3.78%) |
| **Epoch** | 0.08 / 2.0 |
| **Current Loss** | ~1.70 - 2.23 |
| **Best Loss** | 1.7053 ⭐ |
| **Training Time** | ~3 days |
| **ETA** | ~39 days remaining |
| **Speed** | ~100 sec/step |

### Loss Progression

- Step 0: Loss 3.35 (baseline)
- Step 500: Loss 2.50 (↓ 25%)
- Step 1,000: Loss 2.00 (↓ 40%)
- Step 1,265: Loss 1.83 (↓ 45%)
- Step 1,292: Loss 1.71 (↓ 49%) ⭐ RECORD
- Step 1,417: Loss 2.23 (current, oscillating between 1.7 and 2.3)

## 🎓 What Yuuki Knows (So Far)

Because the dataset is processed in alphabetical order:

| Language | Exposure | Quality | Status |
|----------|----------|---------|--------|
| **Agda** | High | 85/100 | ✅ Excellent |
| **C** | Starting | 30/100 | ⏳ Learning |
| **Assembly** | Low | 5/100 | 🌱 Minimal |
| **Python** | None | 0/100 | ❌ Not reached yet |

### Example Output (Step 1,300)

**Agda prompt:** `module Main where`

```agda
module Main where (x, f) in a

open import Cubical.Sigma
open import Cubical.Sigma.Core
open import Cubical.Foundations.H
```

✅ Real Agda libraries! The model has learned actual Cubical type theory module names.

## 🛠️ Training Configuration

- Model: DistilGPT-2 (82M parameters)
- Dataset: The Stack (75,000 examples)
- Batch size: 1
- Gradient accumulation: 4
- Effective batch: 4
- Learning rate: 5e-5
- Max length: 256 tokens
- Optimizer: AdamW
- Epochs: 2
- Total tokens: ~30M (2 epochs)
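For readers who want to try something similar, here is a minimal sketch of how the configuration above might map onto the HuggingFace `Trainer`. It is an illustration, not the actual mobile training script: the dataset loading call, the `content` field name, and the save/logging cadence are assumptions.

```python
# Hypothetical sketch of the configuration above using the HuggingFace Trainer.
# Not the actual training script; dataset field names and cadence are assumed.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token                    # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")   # 82M parameters

# 75,000 examples from The Stack; "content" holds the source code (assumed field name).
raw = load_dataset("bigcode/the-stack-smol-xl", split="train").select(range(75_000))

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=256)

train_data = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="yuuki-v0.1",
    per_device_train_batch_size=1,    # all that fits in 6GB RAM
    gradient_accumulation_steps=4,    # effective batch size 4
    learning_rate=5e-5,
    num_train_epochs=2,
    save_steps=2500,                  # checkpoints roughly as in the roadmap
    logging_steps=50,
    use_cpu=True,                     # no GPU involved
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The `Trainer` defaults to AdamW, so the optimizer listed above comes for free.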

**Why so slow?**

100 seconds/step × 37,500 steps = 3,750,000 seconds = 1,042 hours = 43.4 days ≈ 6 weeks of continuous training.

No GPU acceleration. Pure CPU grinding. 💪

## 📈 Roadmap

### v0.1 (Current - Proof of Concept)

- [x] Set up training pipeline
- [x] Start training (Step 0)
- [x] Reach Step 1,000
- [x] Break loss 2.0 barrier
- [x] Break loss 1.8 barrier ⭐
- [ ] Checkpoint 2,500 (7%)
- [ ] Checkpoint 5,000 (13%)
- [ ] Checkpoint 10,000 (27%)
- [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
- [ ] Checkpoint 37,500 (100% - DONE)
- [ ] Quantize to INT8
- [ ] Convert to ONNX
- [ ] Publish final model

ETA: Mid-March 2026

### v0.2 (The Full Dataset)

- Dataset: 786,387 examples (full Stack)
- Duration: 418 days (~14 months)
- Epochs: 2.0
- Total tokens: ~314M
- Dataset fix: SHUFFLED (not alphabetical)
- Languages: all 80+ languages, balanced
- Start: March 2026
- End: May 2027

### v0.3+ (PC Era)

- Hardware upgrade: RTX 4060/4070
- Larger models: 350M-1B parameters
- Faster training: ~30x speedup
- Advanced techniques: LoRA, QLoRA, etc.

## 💡 Philosophy

> "The barrier to AI isn't money. It's mindset."

This project demonstrates:

- ✅ You CAN train LLMs without GPUs
- ✅ Patience > Hardware
- ✅ A $0 budget is enough to start
- ✅ Limited resources inspire creativity
- ✅ Anyone can contribute to AI

### The Statement vs The Execution

- v0.1-v0.2 (Mobile): "You don't need expensive hardware"
- v0.3+ (PC): "Now let's build something competitive"

Start with what you have. Upgrade when you can. Never let hardware stop you.

## 🚀 Usage (After Training Completes)

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")

# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
code = tokenizer.decode(outputs[0])
print(code)
```

### Quantized (4x faster, 4x smaller)

```python
# Coming after training completes
model = AutoModelForCausalLM.from_pretrained(
    "OpceanAI/Yuuki",
    subfolder="yuuki-v0.1-int8"
)
```
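The INT8 folder is not published yet. One plausible route to it, sketched here under assumptions (exact output paths and file names may differ from the final artifact), is to export the checkpoint to ONNX with `optimum` and apply ONNX Runtime dynamic quantization - the two remaining roadmap items.

```python
# Hypothetical sketch of the "Convert to ONNX" + "Quantize to INT8" roadmap items.
# Output directories and file names are assumptions, not the published layout.
from optimum.onnxruntime import ORTModelForCausalLM
from onnxruntime.quantization import QuantType, quantize_dynamic

# Export the trained PyTorch checkpoint to an ONNX graph.
ort_model = ORTModelForCausalLM.from_pretrained("OpceanAI/Yuuki", export=True)
ort_model.save_pretrained("yuuki-v0.1-onnx")

# Dynamic INT8 quantization: weights stored as int8, activations quantized at
# runtime - roughly 4x smaller and noticeably faster on CPU.
quantize_dynamic(
    "yuuki-v0.1-onnx/model.onnx",      # file name as written by optimum (assumed)
    "yuuki-v0.1-int8/model.onnx",
    weight_type=QuantType.QInt8,
)
```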

## ⚠️ Known Limitations

- Dataset order: alphabetical (not shuffled) - the model learns early languages best
- Token count: only ~30M tokens (vs GPT-2's 40B)
- Training speed: very slow (~100 sec/step)
- Model size: small (82M params)
- Language coverage: incomplete due to the alphabetical ordering

These will be addressed in v0.2 with a shuffled dataset.
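In practice the v0.2 fix is essentially one line with the `datasets` library; a minimal sketch, assuming the same split and field names as in the configuration sketch above:

```python
# Hypothetical sketch of the v0.2 fix: shuffle so training no longer walks the
# languages in alphabetical order (Agda first, Python near the end).
from datasets import load_dataset

ds = load_dataset("bigcode/the-stack-smol-xl", split="train")
ds = ds.shuffle(seed=42)            # mixes all languages from step 0
print(ds[0]["content"][:200])       # peek at a random-language example ("content" assumed)
```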

## 🔬 Technical Details

### Why Mobile Training Works

CPU training (~100 sec/step):

- Forward pass: 40 sec
- Backward pass: 40 sec
- Optimizer: 20 sec
- Total: ~100 sec

vs GPU training (~0.5 sec/step):

- 200x faster
- but costs $0.50-$2.00/hour
- 42 days = $500-$2,000

Mobile: FREE but SLOW. GPU: FAST but EXPENSIVE.

For a proof of concept, mobile wins. 🏆

### Training Challenges Overcome

- Memory management: gradient accumulation (4 steps)
- Thermal throttling: periodic breaks, room cooling
- Battery life: always plugged in
- Storage: careful checkpoint management
- Interruptions: resume from checkpoints
- Patience: 100 sec/step × 37,500 steps = mental fortitude
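To make the memory point concrete, here is a minimal sketch of gradient accumulation in a plain PyTorch loop. `train_data`, `tokenizer`, and `model` are assumed to be the objects from the Trainer sketch in the configuration section; the real script may simply rely on the Trainer's built-in `gradient_accumulation_steps`.

```python
# Minimal sketch (assumed loop, not the actual script) of the gradient-accumulation
# workaround: four micro-batches of size 1 are accumulated before one optimizer
# step, so peak memory stays at batch-size-1 while the effective batch size is 4.
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import DataCollatorForLanguageModeling

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)   # adds labels for causal LM
dataloader = DataLoader(train_data, batch_size=1, collate_fn=collator)

accum_steps = 4
optimizer = AdamW(model.parameters(), lr=5e-5)

model.train()
for step, batch in enumerate(dataloader):          # micro-batch of a single example
    loss = model(**batch).loss / accum_steps       # scale so accumulated grads average
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()                           # one "real" step every 4 micro-batches
        optimizer.zero_grad()
```

For interruptions, the `Trainer` route is simpler: `trainer.train(resume_from_checkpoint=True)` continues from the most recent checkpoint in `output_dir`.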

## 📊 Benchmarks (Post-Training)

Coming soon, after training completes (~March 2026).

Expected performance:

- Agda: 85-95/100 (primary language)
- C: 85-92/100 (secondary language)
- Assembly: 75-85/100 (tertiary)
- Python: 10-20/100 (barely seen due to the alphabetical ordering)
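Since the card lists perplexity as its metric, here is a minimal sketch of how it could be measured once the weights are published. The evaluation snippet is made up for illustration, not a benchmark set.

```python
# Hypothetical sketch of the perplexity metric named in the card metadata:
# perplexity = exp(mean per-token cross-entropy) on a held-out snippet.
# Runnable only once the final weights are published under OpceanAI/Yuuki.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")
model.eval()

sample = "module Main where\n\nopen import Cubical.Foundations.Prelude\n"  # held-out Agda snippet (made up)
enc = tokenizer(sample, return_tensors="pt")

with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss   # mean cross-entropy over tokens

print(f"perplexity = {math.exp(loss.item()):.2f}")
```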

## 🙏 Acknowledgments

- Anthropic Claude: technical guidance and debugging assistance
- HuggingFace: infrastructure and the transformers library
- BigCode: The Stack dataset
- The ML community: for saying "you need GPUs" - the best motivation 😏

## 📜 License

Apache 2.0 - see the LICENSE file.

You can use Yuuki commercially, modify it, and distribute it. Just give credit. ✅

## 🔗 Links

- GitHub: (coming soon)
- Twitter: (coming soon)
- Progress updates: check this model card

## 📅 Updates

- 2026-01-29: Training started
- 2026-01-29: Step 1,000 reached - Loss 2.00
- 2026-01-29: Step 1,292 - NEW RECORD Loss 1.7053
- 2026-01-29: Repository created on HuggingFace

Last updated: 2026-01-29

*Follow the journey of training an LLM with a $0 budget. One step at a time.* 🌸
config.json
ADDED
@@ -0,0 +1,45 @@
{
  "_num_labels": 1,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "dtype": "float32",
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 6,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.57.3",
  "use_cache": true,
  "vocab_size": 50257
}
merges.txt
ADDED
The diff for this file is too large to render.
special_tokens_map.json
ADDED
@@ -0,0 +1,6 @@
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,21 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
vocab.json
ADDED
The diff for this file is too large to render.