OLMo3-190M-zh-full

The L04 Full model for the beginner-level AI large-model R&D bootcamp (llm001): 190M parameters, trained for one full epoch.

Model configuration

hidden_size: 768, num_layers: 12, num_heads: 12, intermediate_size: 3072
vocab_size: 48000, sliding_window: 4096
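As a sanity check on the 190M figure, the configuration above can be turned into a rough parameter count. This is a sketch under assumptions the card does not state: full multi-head attention with four hidden_size × hidden_size projections, a gated (SwiGLU-style) MLP with three projections, bias-free linear layers, untied input and output embeddings, and normalization weights ignored.

```python
# Rough parameter count from the listed configuration.
# Assumptions (not stated in the card): full multi-head attention,
# gated MLP with three projections, no biases, untied embeddings,
# and normalization weights left out because they are tiny.
hidden_size = 768
num_layers = 12
intermediate_size = 3072
vocab_size = 48000

attention = 4 * hidden_size * hidden_size      # q, k, v, o projections
mlp = 3 * hidden_size * intermediate_size      # gate, up, down projections
per_layer = attention + mlp
embeddings = 2 * vocab_size * hidden_size      # input embedding + output head

total = num_layers * per_layer + embeddings
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate lands close to the stated 190M; a different MLP layout or tied embeddings would change the result.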

Training configuration

Data: cmz1024/llm101-olmo3-zh-demo-data
Training: H100, num_train_epochs=1.0, bs=16×8=128, lr=5e-4→5e-5, bf16
Training set: 1,586,532 sequences of 2048 tokens each, about 3.25B training tokens
Final metrics: train_loss≈3.91, eval_loss≈3.41
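The training numbers above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the final partial batch is kept when counting optimizer steps and that eval_loss is a mean per-token cross-entropy in nats, so its exponential is a perplexity; neither is confirmed by the card.

```python
import math

# Figures copied from the training configuration.
num_sequences = 1_586_532
seq_len = 2048
batch_size = 16 * 8          # effective batch size of 128 sequences
eval_loss = 3.41

total_tokens = num_sequences * seq_len
tokens_per_step = batch_size * seq_len

# Assumption: the last, smaller batch still counts as one step.
steps = math.ceil(num_sequences / batch_size)

# Assumption: eval_loss is mean per-token cross-entropy in nats.
perplexity = math.exp(eval_loss)

print(f"total tokens:    {total_tokens:,}")
print(f"tokens per step: {tokens_per_step:,}")
print(f"steps per epoch: {steps:,}")
print(f"eval perplexity: {perplexity:.1f}")
```

The token total agrees with the "about 3.25B" figure quoted in the card.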

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download (or load from the local cache) the weights and the tokenizer.
model = AutoModelForCausalLM.from_pretrained("Nwna/olmo3-190m-zh-full")
tok = AutoTokenizer.from_pretrained("Nwna/olmo3-190m-zh-full")
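Once the model and tokenizer are loaded, text can be sampled with `generate`. The helper below is a minimal sketch, not part of the original card: the function name `generate`, the sampling settings (`top_p=0.9`, `temperature=0.8`), and the default length are illustrative assumptions, and it presumes an installed `transformers` version that supports this architecture.

```python
def generate(prompt, max_new_tokens=64):
    """Sample a continuation of `prompt` from Nwna/olmo3-190m-zh-full.

    Hypothetical helper for illustration; sampling values are arbitrary.
    """
    # Imported inside the function so defining it does not load the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "Nwna/olmo3-190m-zh-full"
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    model.eval()

    # Tokenize the prompt into PyTorch tensors.
    inputs = tok(prompt, return_tensors="pt")

    # Sample without tracking gradients, since this is inference only.
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
        )

    # The returned sequence contains the prompt followed by the new tokens.
    return tok.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate("今天天气"))` would download the weights on first use and print the prompt followed by a sampled continuation.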

Safetensors: model size 0.2B params, tensor type F32
