OLMo3-190M-zh-nano

Nano model for lesson L04 of the Zero-Foundation AI Large Model R&D Bootcamp (llm001): 26M parameters, trained for 20 steps as a test run.

Model configuration (a proportionally scaled-down version of Full)

Parameter          Full (190M)  Nano (26M)
hidden_size        768          192
num_layers         12           6
num_heads          12           3
intermediate_size  3072         768
vocab_size         48000        48000
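
As a rough sanity check on the table, the sketch below estimates the weight-matrix parameter count of each configuration. It rests on assumptions the card does not state: full multi-head attention with four hidden_size × hidden_size projections, a gated (SwiGLU-style) MLP with three projections, separate input and output embedding matrices, and no biases; normalization weights are ignored. The helper name estimate_params is hypothetical. Under different assumptions (for example tied embeddings) the totals change noticeably, so the result need not match the sizes quoted in the table header.

```python
def estimate_params(hidden_size, num_layers, intermediate_size, vocab_size,
                    tied_embeddings=False):
    """Approximate weight-matrix parameter count of a decoder-only transformer.

    Assumptions (not confirmed by the model card): 4 attention projections of
    size hidden x hidden, a gated MLP with 3 projections, no biases, and norm
    weights ignored.
    """
    attention = 4 * hidden_size * hidden_size
    mlp = 3 * hidden_size * intermediate_size
    embeddings = vocab_size * hidden_size * (1 if tied_embeddings else 2)
    return embeddings + num_layers * (attention + mlp)


# Values copied from the configuration table.
full = estimate_params(hidden_size=768, num_layers=12,
                       intermediate_size=3072, vocab_size=48000)
nano = estimate_params(hidden_size=192, num_layers=6,
                       intermediate_size=768, vocab_size=48000)

print(f"Full: {full:,}")  # Full: 186,974,208
print(f"Nano: {nano:,}")  # Nano: 21,970,944
```

Because vocab_size is not scaled down, the embedding matrices account for most of the Nano estimate under these assumptions.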

Training configuration

  • Data: cmz1024/llm101-olmo3-zh-demo-data (500M tokens)
  • Training: H100, max_steps=20, bs=32×4=128, lr=1e-3, bf16
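
The batch arithmetic in the training line can be sketched as follows. The card does not say what the two factors in bs=32×4 stand for; the sketch assumes a per-step batch of 32 multiplied by 4 (for example gradient accumulation steps). The sequence length is not given either, so the value of 1024 tokens used below is purely hypothetical and serves only to illustrate how small a 20-step run is next to a 500M-token dataset.

```python
PER_STEP_BATCH = 32   # first factor in "bs=32×4"
MULTIPLIER = 4        # second factor; assumed to be gradient accumulation (not stated)
MAX_STEPS = 20
DATASET_TOKENS = 500_000_000  # "500M tokens" from the data line

effective_batch = PER_STEP_BATCH * MULTIPLIER
sequences_seen = effective_batch * MAX_STEPS


def tokens_seen(seq_len):
    """Tokens consumed over the whole run for a given sequence length."""
    return sequences_seen * seq_len


print(effective_batch)   # 128
print(sequences_seen)    # 2560

# Hypothetical sequence length, NOT taken from the model card.
hypothetical_seq_len = 1024
fraction = tokens_seen(hypothetical_seq_len) / DATASET_TOKENS
print(tokens_seen(hypothetical_seq_len))  # 2621440
print(f"{fraction:.2%}")                  # 0.52%
```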

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("cmz1024/olmo3-190m-zh-nano")
tok = AutoTokenizer.from_pretrained("cmz1024/olmo3-190m-zh-nano")
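
Once the model and tokenizer are loaded, a short greedy generation can be sketched as below. The helper names generate_text and main, the prompt, and the max_new_tokens value are illustrative and not part of the model card. This assumes the tokenizer output can be passed directly to generate. Since the checkpoint was trained for only 20 steps, the output is not expected to be meaningful text.

```python
def generate_text(model, tok, prompt, max_new_tokens=32):
    """Greedy-decode a continuation of `prompt` and return it as a string."""
    # Tokenize to PyTorch tensors, as expected by `generate`.
    inputs = tok(prompt, return_tensors="pt")
    # do_sample=False selects greedy decoding, so the result is deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # `generate` returns a batch; decode the first (and only) sequence.
    return tok.decode(output_ids[0], skip_special_tokens=True)


def main():
    # Imported here so that defining the helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "cmz1024/olmo3-190m-zh-nano"  # repository id from the usage snippet
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    tok = AutoTokenizer.from_pretrained(repo_id)
    print(generate_text(model, tok, "今天天气"))  # illustrative prompt
```

Calling main() downloads the checkpoint and prints one continuation.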

Safetensors checkpoint: 22M params, BF16 tensors.