OLMo3-190M-zh-full

This is an intermediate training checkpoint of the OLMo3-190M Chinese pretraining model; it replaces the earlier 20-step smoke-test version.

Model status

  • checkpoint: checkpoint-1100
  • Note: this is an intermediate state saved during training, not a final model produced at the normal end of a full epoch.
  • Intended use: intermediate progress checks, continued training, or a staged model that has been trained substantially longer than the 20-step test version.

Model configuration

  • hidden_size: 768
  • num_layers: 12
  • num_heads: 12
  • intermediate_size: 3072
  • vocab_size: 48000
  • sliding_window: 4096
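
For reference, these values can be read back from the published configuration. A minimal sketch, assuming the standard transformers AutoConfig interface; the exact attribute names depend on the config class and may differ slightly from the shorthand above (e.g. num_hidden_layers rather than num_layers):

from transformers import AutoConfig

# Load the published configuration from the Hub
config = AutoConfig.from_pretrained("complexly/olmo3-190m-zh-full")

# Attribute names assumed to follow the usual transformers conventions
print(config.hidden_size)          # expected: 768
print(config.num_hidden_layers)    # expected: 12
print(config.num_attention_heads)  # expected: 12
print(config.vocab_size)           # expected: 48000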

Training configuration

  • Data: cmz1024/llm101-olmo3-zh-demo-data
  • checkpoint step: 1100
  • per_device_train_batch_size: 24
  • gradient_accumulation_steps: 5
  • effective batch per GPU: 120
  • learning_rate: 5.0e-4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.02
  • bf16: true
  • gradient_checkpointing: false
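
The effective batch per GPU follows from per_device_train_batch_size × gradient_accumulation_steps = 24 × 5 = 120. These hyperparameters map directly onto Hugging Face TrainingArguments fields; the sketch below is illustrative only, not the actual training script used for this run, and output_dir is a placeholder:

from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above; output_dir is a placeholder
args = TrainingArguments(
    output_dir="olmo3-190m-zh-full",
    per_device_train_batch_size=24,
    gradient_accumulation_steps=5,  # effective batch per GPU: 24 * 5 = 120
    learning_rate=5.0e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.02,
    bf16=True,
    gradient_checkpointing=False,
)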

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "complexly/olmo3-190m-zh-full"

tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
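
A short generation example built on the objects loaded above; the prompt and decoding settings are illustrative only:

# Generate a short continuation from a Chinese prompt (illustrative settings)
inputs = tok("今天天气", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tok.decode(outputs[0], skip_special_tokens=True))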

Note

This model comes from checkpoint-1100, saved before training was interrupted. For serious evaluation or continued training (see the sketch below), also consult training_config_olmo3_full.yaml in the repository.
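
A minimal continued-training sketch reusing the model, tok, and args objects from the snippets above. The "text" column name and the local checkpoint-1100 directory (including optimizer/scheduler state) are assumptions, not guaranteed by this repository:

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer

# Pretraining corpus referenced above; a "text" column is assumed
ds = load_dataset("cmz1024/llm101-olmo3-zh-demo-data", split="train")
tokenized = ds.map(
    lambda batch: tok(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=ds.column_names,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)

# Resume optimizer/scheduler state from a local copy of checkpoint-1100
trainer.train(resume_from_checkpoint="checkpoint-1100")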
