---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# 🩵 Rain-100M — Model Card
Rain-100M is an experimental language model trained from scratch on the Qwen3 architecture.
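## 🚀 Quickstart

A minimal generation sketch with transformers; the repo id below is a placeholder, not a confirmed location for the weights.

```python
# Minimal generation sketch. "your-username/Rain-100M" is a placeholder
# repo id (assumption); point it at wherever the weights are hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/Rain-100M"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The water cycle begins when"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since no alignment fine-tuning has been applied, treat this as a base model: prompt it with text to continue rather than chat-style instructions.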
## 🧠 Training & Data

- Training corpus: HuggingFaceFW/fineweb-edu
- Total tokens: ~3B
- Language: English only
- Tokenizer: newly trained 16k BPE, optimized for small/compact models (training sketch below)
- Max sequence length: 4096
Sample training metrics:

- train/grad_norm: 0.6640625
- train/learning_rate: 2.171853813e-11
- train/loss: 3.4459
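A 16k BPE tokenizer of the kind described above could be trained with the tokenizers library roughly as follows. The corpus file, byte-level pre-tokenization, and special token are assumptions, not the confirmed Rain-100M recipe.

```python
# Sketch: training a 16k-vocab BPE tokenizer with the `tokenizers` library.
# The corpus file, byte-level pre-tokenization, and the special token are
# assumptions; the actual Rain-100M recipe is not published here.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=16_000,                 # the "16k" vocabulary
    special_tokens=["<|endoftext|>"],  # assumed special token
)
# fineweb_edu_sample.txt is a hypothetical plain-text dump of the corpus.
tokenizer.train(files=["fineweb_edu_sample.txt"], trainer=trainer)
tokenizer.save("rain-100m-tokenizer.json")
```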
## 🏗️ Architecture (Qwen3-style)

- Parameters: ~100M
- Layers: 12 Transformer layers
- Hidden size: 768
- Attention heads: 12
- MLP dimension: 2048
- Activation: SiLU
- Weight dtype: bfloat16
- RMSNorm eps: 1e-6
- RoPE θ: 10000
- Inference framework: transformers
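As a sanity check, these numbers map onto a transformers Qwen3Config roughly as sketched below (Qwen3 support requires a recent transformers release). vocab_size, head_dim, num_key_value_heads, and tied embeddings are assumptions not stated in this card.

```python
# Sketch: instantiating a Qwen3-style model with the hyperparameters above.
# vocab_size, head_dim, num_key_value_heads, and tie_word_embeddings are
# assumptions, not values confirmed by this card.
import torch
from transformers import Qwen3Config, Qwen3ForCausalLM

config = Qwen3Config(
    vocab_size=16_000,         # assumed: matches the 16k BPE tokenizer
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    num_key_value_heads=12,    # assumed: plain multi-head attention
    head_dim=64,               # assumed: hidden_size / num_attention_heads
    intermediate_size=2048,
    hidden_act="silu",
    max_position_embeddings=4096,
    rms_norm_eps=1e-6,
    rope_theta=10000.0,
    tie_word_embeddings=True,  # assumed: common at the ~100M scale
)
model = Qwen3ForCausalLM(config).to(torch.bfloat16)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```

Under these assumptions the budget is roughly 16,000 × 768 ≈ 12.3M tied embedding parameters plus ≈ 7.1M per layer × 12 layers ≈ 97M, consistent with the ~100M figure.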
## ⚠️ Limitations

- Trained only on English data; capabilities in other languages are weak or absent.
- Not suitable as a general-purpose chat model or for safety-critical use cases.
- No system-level alignment or safety fine-tuning has been applied.
## 📄 License

The model is released under the Apache-2.0 license. When using it locally, please also comply with the licenses of the fineweb-edu dataset and of the transformers / Qwen3-related components.