---
license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb-edu
language:
  - en
pipeline_tag: text-generation
library_name: transformers
---

# 🩵 Rain-100M — Model Card

Rain-100M is an experimental ~100M-parameter language model trained from scratch using the Qwen3 architecture.
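
The card does not include a usage snippet, so here is a minimal generation sketch with `transformers`. The repo id `raincandy-u/Rain-100M` is an assumption inferred from this card's location, not stated in the text; adjust it to the actual checkpoint path.

```python
# Minimal inference sketch; the repo id below is an assumption, not
# confirmed by the card text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "raincandy-u/Rain-100M"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# This is a base model with no chat tuning, so prompt it as a text continuation.
inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```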

## 🧠 Training & Data

- Training corpus: HuggingFaceFW/fineweb-edu
- Total tokens: ~3B
- Language: English only
- Tokenizer: newly trained 16k BPE, optimized for small/compact models (see the sketch after this list)
- Max sequence length: 4096
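
The card does not include the tokenizer training recipe. As a rough illustration, a 16k byte-level BPE tokenizer can be trained with the `tokenizers` library as below; the corpus path, the exact vocab size (16,000 vs. 16,384), and the special tokens are all assumptions.

```python
# Illustrative sketch only: trains a 16k byte-level BPE tokenizer.
# The corpus file and special tokens are placeholders, not the card's
# actual training setup.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=16_000,                 # "16k BPE" from the card; exact size assumed
    special_tokens=["<|endoftext|>"],  # assumed special token
)
tokenizer.train(files=["fineweb_edu_sample.txt"], trainer=trainer)  # placeholder path
tokenizer.save("rain-100m-tokenizer.json")
```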

Sample training metrics:

```
train/grad_norm:     0.6640625
train/learning_rate: 2.171853813e-11
train/loss:          3.4459
```

## 🏗️ Architecture (Qwen3-style)

- Parameters: ~100M
- Layers: 12 Transformer layers
- Hidden size: 768
- Attention heads: 12
- MLP dimension: 2048
- Activation: SiLU
- Weight dtype: bfloat16
- RMSNorm eps: 1e-6
- RoPE θ: 10000
- Inference framework: transformers
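
Assuming a `transformers` release that ships the Qwen3 classes, the numbers above map roughly onto a `Qwen3Config` as sketched below. The `vocab_size`, the KV-head count, and weight tying are assumptions the card does not state.

```python
# Rough Qwen3Config mapping of the hyperparameters listed above.
# vocab_size and num_key_value_heads are assumptions.
from transformers import Qwen3Config, Qwen3ForCausalLM

config = Qwen3Config(
    vocab_size=16_000,        # assumed from the 16k BPE tokenizer
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    num_key_value_heads=12,   # assumption: plain multi-head attention, no GQA sharing
    head_dim=64,              # 768 / 12
    intermediate_size=2048,
    hidden_act="silu",
    max_position_embeddings=4096,
    rms_norm_eps=1e-6,
    rope_theta=10_000.0,
)
model = Qwen3ForCausalLM(config)  # random init; count should land near the ~100M figure
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```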

## ⚠️ Limitations

- Trained only on English data; capability in other languages is weak or absent.
- Not suitable as a general-purpose chat model or for safety-critical use cases.
- No system-level alignment or safety fine-tuning has been applied.

## 📄 License

This model is released under the Apache-2.0 license. When using it, please also comply with the licenses of the fineweb-edu dataset and the transformers / Qwen3-related components.

