OLMo3-190M-zh-full

This is an intermediate training checkpoint of the OLMo3-190M Chinese pretraining model; it replaces the earlier 20-step smoke-test version.

Model status

  • checkpoint: checkpoint-1100
  • Note: this is an intermediate state saved during training, not a final model produced at the normal end of a full epoch.
  • Intended use: intermediate progress checks, continued training, or a staged model that has been trained substantially longer than the 20-step test version.

Model configuration

  • hidden_size: 768
  • num_layers: 12
  • num_heads: 12
  • intermediate_size: 3072
  • vocab_size: 48000
  • sliding_window: 4096
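
For reference, these values can be read back from the published configuration. A minimal sketch, assuming the standard transformers AutoConfig interface; the exact attribute names depend on the config class and may differ slightly from the shorthand above (e.g. num_hidden_layers rather than num_layers):

from transformers import AutoConfig

# Load the published configuration from the Hub
config = AutoConfig.from_pretrained("complexly/olmo3-190m-zh-full")

# Attribute names assumed to follow the usual transformers conventions
print(config.hidden_size)          # expected: 768
print(config.num_hidden_layers)    # expected: 12
print(config.num_attention_heads)  # expected: 12
print(config.vocab_size)           # expected: 48000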

Training configuration

  • Data: cmz1024/llm101-olmo3-zh-demo-data
  • checkpoint step: 1100
  • per_device_train_batch_size: 24
  • gradient_accumulation_steps: 5
  • effective batch per GPU: 120
  • learning_rate: 5.0e-4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.02
  • bf16: true
  • gradient_checkpointing: false
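
The effective batch per GPU follows from per_device_train_batch_size × gradient_accumulation_steps = 24 × 5 = 120. These hyperparameters map directly onto Hugging Face TrainingArguments fields; the sketch below is illustrative only, not the actual training script used for this run, and output_dir is a placeholder:

from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above; output_dir is a placeholder
args = TrainingArguments(
    output_dir="olmo3-190m-zh-full",
    per_device_train_batch_size=24,
    gradient_accumulation_steps=5,  # effective batch per GPU: 24 * 5 = 120
    learning_rate=5.0e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.02,
    bf16=True,
    gradient_checkpointing=False,
)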

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "complexly/olmo3-190m-zh-full"

tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
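
A short generation example built on the objects loaded above; the prompt and decoding settings are illustrative only:

# Generate a short continuation from a Chinese prompt (illustrative settings)
inputs = tok("今天天气", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tok.decode(outputs[0], skip_special_tokens=True))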

Note

This model comes from checkpoint-1100, saved before training was interrupted. For serious evaluation or continued training (see the sketch below), also consult training_config_olmo3_full.yaml in the repository.
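
A minimal continued-training sketch reusing the model, tok, and args objects from the snippets above. The "text" column name and the local checkpoint-1100 directory (including optimizer/scheduler state) are assumptions, not guaranteed by this repository:

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer

# Pretraining corpus referenced above; a "text" column is assumed
ds = load_dataset("cmz1024/llm101-olmo3-zh-demo-data", split="train")
tokenized = ds.map(
    lambda batch: tok(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=ds.column_names,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)

# Resume optimizer/scheduler state from a local copy of checkpoint-1100
trainer.train(resume_from_checkpoint="checkpoint-1100")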
