---
base_model: cmz1024/olmo3-190m-zh-full
license: apache-2.0
language:
- zh
tags:
- llm001
- olmo3
- chinese
- pretrained
---

# OLMo3-190M-zh-full

This is the L04 Full model from llm001, an AI large-language-model R&D bootcamp for complete beginners. It has 190M parameters and was trained for one complete epoch. After full training, the model reaches a training loss of 3.521 and an eval loss of 3.450.

## Model configuration

- hidden_size: 768, num_layers: 12, num_heads: 12, intermediate_size: 3072
- vocab_size: 48000, sliding_window: 4096

## Training configuration

- Data: cmz1024/llm101-olmo3-zh-demo-data (500M tokens), re-tokenized with the tokenizer from 42ailab/OLMo3-190M-zh
- Training: A800, max_steps=-1 (training length set by the single epoch), effective batch size 24×5=120, lr=5e-4, bf16

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained weights and the matching tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("complexly/olmo3-190m-zh-full")
tok = AutoTokenizer.from_pretrained("complexly/olmo3-190m-zh-full")
```
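The 190M parameter figure can be roughly reproduced from the model configuration. The sketch below is an estimate, not the model's exact layout. It assumes Llama/OLMo-style blocks with no bias terms, a SwiGLU MLP (gate, up and down projections), and full multi-head attention. With full attention the projection sizes do not depend on `num_heads`. It also assumes an output head that is not tied to the input embedding. RMSNorm weights are ignored because they add only tens of thousands of parameters. The helper `estimate_params` is hypothetical.

```python
# Rough parameter-count estimate for the configuration listed in this card.
# Assumptions (not confirmed by the card): no bias terms, SwiGLU MLP,
# full multi-head attention (num_kv_heads == num_heads), RMSNorm weights ignored.

def estimate_params(hidden: int, layers: int, intermediate: int, vocab: int,
                    tied_embeddings: bool = False) -> int:
    embed = vocab * hidden              # input token embedding matrix
    attn = 4 * hidden * hidden          # q, k, v and output projections
    mlp = 3 * hidden * intermediate     # gate, up and down projections
    blocks = layers * (attn + mlp)      # all transformer blocks
    lm_head = 0 if tied_embeddings else vocab * hidden  # separate output head
    return embed + blocks + lm_head

untied = estimate_params(768, 12, 3072, 48000)
tied = estimate_params(768, 12, 3072, 48000, tied_embeddings=True)
print(f"untied: {untied / 1e6:.1f}M, tied: {tied / 1e6:.1f}M")
```

Under these assumptions, the untied variant comes to about 187.0M, close to the stated 190M. Tying the embeddings would give about 150.1M. The 48000-token vocabulary accounts for roughly 37M parameters per embedding matrix, a large share of a model this small.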
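The reported training loss (3.521) and eval loss (3.450) can also be read as perplexities. This short sketch assumes the losses are mean per-token cross-entropies in nats, as produced by a standard PyTorch cross-entropy loss. Under that assumption, perplexity is simply `exp(loss)`.

```python
import math

# Losses reported in this card; assumed to be mean per-token cross-entropy in nats.
losses = {"train": 3.521, "eval": 3.450}

# Perplexity is the exponential of the mean natural-log cross-entropy.
perplexity = {split: math.exp(loss) for split, loss in losses.items()}
for split, ppl in perplexity.items():
    print(f"{split}: loss={losses[split]:.3f}, perplexity~{ppl:.1f}")
```

This gives a perplexity of roughly 33.8 on the training data and roughly 31.5 on the eval data. Because perplexity is measured per token, it depends on the 48000-token vocabulary. It is not directly comparable with models that use a different tokenizer.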