---
license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb-edu
language:
  - en
pipeline_tag: text-generation
library_name: transformers
---

# 🩵 Rain-100M — Model Card

**Rain-100M** is an experimental language model trained from scratch on the **Qwen3 architecture**.

## 🧠 Training & Data

* **Training corpus**: `HuggingFaceFW/fineweb-edu`
* **Total tokens**: ~**3B**
* **Language**: English only
* **Tokenizer**: newly trained **16k BPE** (optimized for small/compact models)
* **Max sequence length**: 4096

**Sample training metrics**:

```text
train/grad_norm:     0.6640625
train/learning_rate: 2.171853813e-11
train/loss:          3.4459
```

## 🏗️ Architecture (Qwen3-style)

* **Parameters**: ~100M
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12
* **MLP dimension**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: `transformers` (a config sketch and a quick-start example are included at the end of this card)

## ⚠️ Limitations

* Trained on English data only; the model has weak or no capability in other languages and is not suitable as a general-purpose chat model or for safety-critical use cases.
* No system-level alignment or safety fine-tuning has been applied.

## 📄 License

This model is released under **Apache-2.0**. When using it, please also comply with the licenses of the `HuggingFaceFW/fineweb-edu` dataset and the `transformers` / Qwen3-related components.
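
## 🔧 Config Sketch

For reference, the numbers in the architecture section map onto a `transformers` Qwen3 config roughly as follows. This is an illustrative sketch, not the shipped `config.json`: the vocabulary size (16,384, inferred from the 16k tokenizer), the number of key/value heads, the head dimension, and weight tying are assumptions not stated in this card.

```python
from transformers import Qwen3Config

# Illustrative config matching the numbers in this card.
# vocab_size, num_key_value_heads, head_dim, and tie_word_embeddings
# are assumptions, not values read from the released config.json.
config = Qwen3Config(
    vocab_size=16_384,             # assumed from the "16k BPE" tokenizer
    hidden_size=768,               # hidden size
    num_hidden_layers=12,          # Transformer layers
    num_attention_heads=12,        # attention heads
    num_key_value_heads=12,        # assumption: full MHA, no grouped-query attention
    head_dim=64,                   # assumption: hidden_size / num_attention_heads
    intermediate_size=2048,        # MLP dimension
    hidden_act="silu",             # SiLU activation
    max_position_embeddings=4096,  # max sequence length
    rms_norm_eps=1e-6,             # RMSNorm eps
    rope_theta=10_000.0,           # RoPE θ
    tie_word_embeddings=True,      # assumption: tied embeddings, common at ~100M scale
)
```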
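
## 🚀 Quick Start

A minimal text-generation sketch with `transformers`. The repository id below is a placeholder, not the real model path; substitute the actual repo id or a local checkpoint directory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/Rain-100M"  # placeholder: replace with the real repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Rain-100M is a base model with no chat fine-tuning, so prompt it
# with plain text to continue rather than with a chat template.
prompt = "The water cycle begins when"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the training corpus is English-only educational web text, expository English prompts like the one above are the most representative use.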