---

base_model: cmz1024/olmo3-190m-zh-full
license: apache-2.0
language:
- zh
tags:
- llm001
- olmo3
- chinese
- pretrained
---


# OLMo3-190M-zh-full

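This is the L04 Full model (190M parameters, trained for one full epoch) for the beginner-level AI large-model R&D bootcamp (llm001). The full training run reached a training loss of 3.521 and an eval loss of 3.450.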
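
For intuition, the two loss values can be converted to perplexity. The sketch below assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Loss values as reported in this card; the unit (nats) is an assumption.
train_ppl = perplexity(3.521)
eval_ppl = perplexity(3.450)

print(f"train perplexity: {train_ppl:.1f}")
print(f"eval perplexity:  {eval_ppl:.1f}")
```

Under that assumption, the eval loss corresponds to a perplexity of roughly 31.5 over the 48000-token vocabulary.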

## Model configuration

- hidden_size: 768, num_layers: 12, num_heads: 12, intermediate_size: 3072
- vocab_size: 48000, sliding_window: 4096
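
As a sanity check on the "190M" figure, the configuration values above can be turned into a rough parameter count. This is a back-of-the-envelope sketch, not the model's actual accounting: it assumes four square attention projections (query, key, value, output) with no grouped-query sharing, a gated MLP with three projection matrices, no bias terms, and an output head that is not tied to the input embedding. Normalization weights are ignored. The function name `estimate_params` is made up for this example.

```python
def estimate_params(hidden_size: int, num_layers: int, intermediate_size: int,
                    vocab_size: int, tied_embeddings: bool = False) -> int:
    """Rough weight count for a decoder-only transformer (assumptions in the text above)."""
    # Attention: q, k, v and output projections, each hidden_size x hidden_size.
    attn = 4 * hidden_size * hidden_size
    # Gated MLP (assumed): gate, up and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Input embedding table, plus a separate output head unless tied.
    embed = vocab_size * hidden_size
    lm_head = 0 if tied_embeddings else embed
    return num_layers * (attn + mlp) + embed + lm_head


total = estimate_params(hidden_size=768, num_layers=12,
                        intermediate_size=3072, vocab_size=48000)
print(f"{total / 1e6:.1f}M parameters")
```

With these assumptions the estimate comes out at about 187M, close to the nominal 190M. Roughly 74M of that sits in the embedding table and output head, which is typical for a small model with a 48000-token vocabulary.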

## Training configuration

- Data: cmz1024/llm101-olmo3-zh-demo-data (500M tokens), but re-tokenized with the tokenizer from 42ailab/OLMo3-190M-zh
- Training: A800, max_steps=-1, bs=24×5=120, lr=5e-4, bf16
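
The number of optimizer steps in one epoch follows from the token count and the effective batch size of 120 sequences, but it also depends on the training sequence length, which this card does not state. The sketch below therefore takes the sequence length as a parameter; the value 1024 is a purely hypothetical placeholder, and the helper name `steps_per_epoch` is made up for this example.

```python
def steps_per_epoch(total_tokens: int, batch_size: int, seq_len: int) -> int:
    """Number of optimizer steps needed to see every token once."""
    tokens_per_step = batch_size * seq_len
    # Integer ceiling division: a final partial batch still costs one step.
    return -(-total_tokens // tokens_per_step)


if __name__ == "__main__":
    # HYPOTHETICAL: the card does not give the training sequence length.
    hypothetical_seq_len = 1024
    print(steps_per_epoch(total_tokens=500_000_000, batch_size=120,
                          seq_len=hypothetical_seq_len))
```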



## Usage



```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained weights and the matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("complexly/olmo3-190m-zh-full")
tok = AutoTokenizer.from_pretrained("complexly/olmo3-190m-zh-full")
```
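
Once the model and tokenizer load, text can be generated with the standard `generate` method. The following is a minimal sketch rather than a recommended setup: the helper name `generate_text`, the prompt, and the decoding settings (greedy decoding, 50 new tokens) are illustrative assumptions. Because this is a pretrained base model, it continues the prompt rather than following instructions.

```python
def generate_text(prompt: str,
                  model_id: str = "complexly/olmo3-190m-zh-full",
                  max_new_tokens: int = 50) -> str:
    """Greedily continue `prompt` with the pretrained model."""
    # Imported lazily so that defining this helper does not load the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding; an illustrative choice
        )
    # The returned sequence contains the prompt followed by the continuation.
    return tok.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Example prompt (arbitrary); running this downloads the model from the Hub.
    print(generate_text("今天天气"))
```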