---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🩵 Rain-100M — Model Card

**Rain-100M** is an experimental language model trained from scratch, based on the **Qwen3 architecture**.

## 🧠 Training & Data
|
|
|
|
|
* **Training corpus**: `HuggingFaceFW/fineweb-edu`
* **Total tokens**: ~**3B**
* **Language**: English only
* **Tokenizer**: newly trained **16k BPE** (optimized for small/compact models)
* **Max sequence length**: 4096
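
Training a fresh 16k BPE vocabulary follows the standard byte-pair-encoding recipe: start from characters and repeatedly merge the most frequent adjacent pair. A toy, pure-Python sketch of that loop (illustrative only — `train_bpe`, the sample words, and the merge count are made up here, not the actual tokenizer-training code):

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent pair.

    `words` maps whitespace-split words to their corpus counts. A real
    tokenizer (e.g. a 16k vocabulary over fineweb-edu) applies the same
    idea at scale, typically over bytes rather than characters.
    """
    # Represent each word as a tuple of symbols (initially characters).
    vocab = {tuple(w): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

print(train_bpe({"low": 5, "lower": 2, "lowest": 3}, 2))
```

With these toy counts the first two merges fuse `l+o` and then `lo+w`, since "low" dominates the corpus.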
|
|
|
|
|
**Sample training metrics**:

```text
train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
```
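
The near-zero learning rate in the log above is what the tail of a cosine-decay schedule looks like. A minimal sketch of such a schedule (the schedule shape, `peak_lr`, and step counts are assumptions for illustration, not the actual training recipe):

```python
import math

def cosine_lr(step, total_steps, peak_lr, min_lr=0.0, warmup_steps=0):
    """Cosine-decay learning-rate schedule with optional linear warmup."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    # Cosine curve from peak_lr down to min_lr as progress goes 0 -> 1.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Midway through training the rate sits at half the peak; at the final
# step it has decayed to min_lr, matching the tiny logged value above.
print(cosine_lr(5_000, 10_000, peak_lr=3e-4))
print(cosine_lr(10_000, 10_000, peak_lr=3e-4))
```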
|
|
|
|
|
## 🏗️ Architecture (Qwen3-style)

* **Parameters**: ~100M
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12
* **MLP dimension**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: `transformers`
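
As a sanity check, the listed shapes are consistent with the ~100M budget. A back-of-the-envelope count (the gated-MLP layout, tied embeddings, and absence of grouped-query attention are assumptions here — the card does not state them — and small RMSNorm terms are ignored):

```python
def approx_param_count(vocab=16_384, d_model=768, n_layers=12,
                       d_mlp=2048, tied_embeddings=True):
    """Rough parameter count for the Qwen3-style config listed above.

    Assumes full multi-head attention (no grouped-query sharing) and a
    gated SwiGLU-style MLP with three projection matrices; treat the
    result as an estimate, not the exact checkpoint size.
    """
    embed = vocab * d_model * (1 if tied_embeddings else 2)
    attn = 4 * d_model * d_model   # Q, K, V, O projections per layer
    mlp = 3 * d_model * d_mlp      # gate, up, down projections per layer
    return embed + n_layers * (attn + mlp)

print(f"{approx_param_count() / 1e6:.1f}M")
```

With a 16k vocabulary this lands at roughly 97.5M parameters, i.e. the ~100M the card advertises; the small vocabulary is what keeps the embedding table from dominating the budget.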
|
|
|
|
|
## ⚠️ Limitations

* Trained only on English data; capabilities in other languages are weak or absent.
* Not suitable as a general-purpose chat model or for safety-critical use cases.
* No system-level alignment or safety fine-tuning has been applied.
|
|
|
|
|
## 📄 License

This model is released under **Apache-2.0**. When using it, please also comply with the licenses of the `fineweb-edu` dataset and the `transformers` / Qwen3-related components.
|
|
|
|
|