OLMo3-190M-zh-nano
This is the L04 Nano model from the beginner-level AI LLM R&D bootcamp (llm001): 26M parameters, trained for only 20 steps as a test run.
Model configuration (a proportionally scaled-down version of Full)
| Parameter | Full (190M) | Nano (26M) |
|---|---|---|
| hidden_size | 768 | 192 |
| num_layers | 12 | 6 |
| num_heads | 12 | 3 |
| intermediate_size | 3072 | 768 |
| vocab_size | 48000 | 48000 |
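Apart from the shared vocabulary, the Nano dimensions are scaled so that a few ratios stay fixed. The sketch below uses only the values from the table above. The `ModelDims` class and its property names are hypothetical and do not come from any library. It checks the per-head dimension and the MLP expansion ratio of both configurations:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDims:
    """Hypothetical container for the dimensions listed in the table."""
    hidden_size: int
    num_layers: int
    num_heads: int
    intermediate_size: int
    vocab_size: int

    @property
    def head_dim(self) -> int:
        # Each attention head receives an equal slice of the hidden state.
        return self.hidden_size // self.num_heads

    @property
    def mlp_ratio(self) -> float:
        # Width of the feed-forward layer relative to the hidden size.
        return self.intermediate_size / self.hidden_size


FULL = ModelDims(hidden_size=768, num_layers=12, num_heads=12,
                 intermediate_size=3072, vocab_size=48000)
NANO = ModelDims(hidden_size=192, num_layers=6, num_heads=3,
                 intermediate_size=768, vocab_size=48000)

for name, dims in [("Full", FULL), ("Nano", NANO)]:
    print(f"{name}: head_dim={dims.head_dim}, mlp_ratio={dims.mlp_ratio}")
```

Both configurations keep a head dimension of 64 and a 4× MLP expansion. Relative to Full, Nano is 4× narrower and 2× shallower.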
Training configuration
- Data: cmz1024/llm101-olmo3-zh-demo-data (500M tokens)
- Training: H100, max_steps=20, bs=32×4=128, lr=1e-3, bf16
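The step count and batch size determine how much of the 500M-token dataset a 20-step run actually touches. The estimate below rests on two assumptions. First, the card does not state the sequence length, so `SEQ_LEN = 1024` is a hypothetical value. Second, the `×4` factor is treated as a plain multiplier on the per-device batch, because it could mean either GPUs or gradient-accumulation steps. The helper `tokens_seen` is illustrative only:

```python
def tokens_seen(max_steps: int, micro_batch: int, multiplier: int, seq_len: int) -> int:
    """Tokens processed by a run: steps x effective batch x sequence length."""
    effective_batch = micro_batch * multiplier  # 32 x 4 = 128 sequences per optimizer step
    return max_steps * effective_batch * seq_len


DATASET_TOKENS = 500_000_000  # size of the demo dataset, as stated in the card
SEQ_LEN = 1024  # hypothetical: the card does not state the context length

seen = tokens_seen(max_steps=20, micro_batch=32, multiplier=4, seq_len=SEQ_LEN)
print(f"tokens seen: {seen:,} ({seen / DATASET_TOKENS:.2%} of the dataset)")
```

With a 1024-token sequence length, the run covers roughly 2.6M tokens, about 0.5% of the data. This is consistent with it being a test run rather than full training.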
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Nano checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("cmz1024/olmo3-190m-zh-nano")
tok = AutoTokenizer.from_pretrained("cmz1024/olmo3-190m-zh-nano")
```