OLMo3-190M-zh-full
This is an intermediate training checkpoint of the OLMo3-190M Chinese pretraining model. It replaces the earlier 20-step smoke-test version.
Model status
- checkpoint: checkpoint-1100
- Note: this is an intermediate state saved during training, not a final model produced by a cleanly completed epoch.
- Intended use: intermediate quality checks, resuming training, or as a staged model trained substantially longer than the 20-step test version.
Model configuration
- hidden_size: 768
- num_layers: 12
- num_heads: 12
- intermediate_size: 3072
- vocab_size: 48000
- sliding_window: 4096
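As a sanity check on the "190M" in the name, a back-of-envelope parameter count can be derived from the config above. The sketch below assumes untied input/output embeddings and a gated (SwiGLU-style) MLP with three projection matrices, as is typical for the OLMo family; norms and biases are ignored. These assumptions are not stated in the card itself.

```python
# Rough parameter count from the listed config.
# Assumptions (not confirmed by the card): untied embeddings, gated MLP
# with three projections (gate, up, down); norms and biases ignored.
hidden_size = 768
num_layers = 12
intermediate_size = 3072
vocab_size = 48000

embed = vocab_size * hidden_size                       # input embedding
lm_head = vocab_size * hidden_size                     # output projection (if untied)
attn_per_layer = 4 * hidden_size * hidden_size         # Q, K, V, O projections
mlp_per_layer = 3 * hidden_size * intermediate_size    # gate, up, down projections

total = embed + lm_head + num_layers * (attn_per_layer + mlp_per_layer)
print(f"{total / 1e6:.1f}M parameters")  # roughly 187M, consistent with the "190M" name
```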
Training configuration
- data: cmz1024/llm101-olmo3-zh-demo-data
- checkpoint step: 1100
- per_device_train_batch_size: 24
- gradient_accumulation_steps: 5
- effective batch size per GPU: 120 (24 × 5)
- learning_rate: 5.0e-4
- lr_scheduler_type: cosine
- warmup_ratio: 0.02
- bf16: true
- gradient_checkpointing: false
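The schedule implied by `lr_scheduler_type: cosine` and `warmup_ratio: 0.02` (in Hugging Face Trainer semantics) can be sketched as below. The total step count is an assumption for illustration; the card only states that this checkpoint was saved at step 1100.

```python
import math

# Sketch of cosine decay with linear warmup, as configured above.
# total_steps is an illustrative assumption; the card does not state it.
base_lr = 5.0e-4
total_steps = 10_000
warmup_steps = int(0.02 * total_steps)  # 200 with these numbers

def lr_at(step: int) -> float:
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup from 0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

print(lr_at(0), lr_at(warmup_steps), lr_at(1100), lr_at(total_steps))
```

Under these assumptions, step 1100 sits just past warmup, so the checkpoint was saved while the learning rate was still near its peak.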
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "complexly/olmo3-190m-zh-full"
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
```
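A minimal generation call on the loaded checkpoint might look like the sketch below. The prompt and sampling settings are illustrative only, and since this is an intermediate checkpoint, output quality will be limited.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "complexly/olmo3-190m-zh-full"
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Illustrative Chinese prompt and sampling settings; adjust to taste.
inputs = tok("今天天气", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )
print(tok.decode(out[0], skip_special_tokens=True))
```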
Notes
This model comes from checkpoint-1100, saved before training was interrupted. For serious evaluation or continued training, also consult training_config_olmo3_full.yaml in this repository.