---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🩵 Rain-100M — Model Card

**Rain-100M** is an experimental language model trained from scratch, based on the **Qwen3 architecture**.

## 🧠 Training & Data
|
|
|
|
|
* **Training corpus**: `HuggingFaceFW/fineweb-edu`
* **Total tokens**: ~**3B**
* **Language**: English only
* **Tokenizer**: newly trained **16k BPE** (optimized for small/compact models)
* **Max sequence length**: 4096
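
Training a fresh 16k BPE vocabulary follows the standard byte-pair-encoding recipe: start from characters and repeatedly merge the most frequent adjacent pair. A toy, pure-Python sketch of that loop (illustrative only — `train_bpe`, the sample words, and the merge count are made up here, not the actual tokenizer-training code):

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent pair.

    `words` maps whitespace-split words to their corpus counts. A real
    tokenizer (e.g. a 16k vocabulary over fineweb-edu) applies the same
    idea at scale, typically over bytes rather than characters.
    """
    # Represent each word as a tuple of symbols (initially characters).
    vocab = {tuple(w): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

print(train_bpe({"low": 5, "lower": 2, "lowest": 3}, 2))
```

With these toy counts the first two merges fuse `l+o` and then `lo+w`, since "low" dominates the corpus.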
|
|
|
|
|
**Sample training metrics**:

```text
train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
```
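
The near-zero learning rate in the log above is what the tail of a cosine-decay schedule looks like. A minimal sketch of such a schedule (the schedule shape, `peak_lr`, and step counts are assumptions for illustration, not the actual training recipe):

```python
import math

def cosine_lr(step, total_steps, peak_lr, min_lr=0.0, warmup_steps=0):
    """Cosine-decay learning-rate schedule with optional linear warmup."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    # Cosine curve from peak_lr down to min_lr as progress goes 0 -> 1.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Midway through training the rate sits at half the peak; at the final
# step it has decayed to min_lr, matching the tiny logged value above.
print(cosine_lr(5_000, 10_000, peak_lr=3e-4))
print(cosine_lr(10_000, 10_000, peak_lr=3e-4))
```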
|
|
|
|
|
## 🏗️ Architecture (Qwen3-style)

* **Parameters**: ~100M
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12
* **MLP dimension**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: `transformers`
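
As a sanity check, the listed shapes are consistent with the ~100M budget. A back-of-the-envelope count (the gated-MLP layout, tied embeddings, and absence of grouped-query attention are assumptions here — the card does not state them — and small RMSNorm terms are ignored):

```python
def approx_param_count(vocab=16_384, d_model=768, n_layers=12,
                       d_mlp=2048, tied_embeddings=True):
    """Rough parameter count for the Qwen3-style config listed above.

    Assumes full multi-head attention (no grouped-query sharing) and a
    gated SwiGLU-style MLP with three projection matrices; treat the
    result as an estimate, not the exact checkpoint size.
    """
    embed = vocab * d_model * (1 if tied_embeddings else 2)
    attn = 4 * d_model * d_model   # Q, K, V, O projections per layer
    mlp = 3 * d_model * d_mlp      # gate, up, down projections per layer
    return embed + n_layers * (attn + mlp)

print(f"{approx_param_count() / 1e6:.1f}M")
```

With a 16k vocabulary this lands at roughly 97.5M parameters, i.e. the ~100M the card advertises; the small vocabulary is what keeps the embedding table from dominating the budget.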
|
|
|
|
|
## ⚠️ Limitations

* Trained only on English data; capabilities in other languages are weak or absent.
* Not suitable as a general-purpose chat model or for safety-critical use cases.
* No system-level alignment or safety fine-tuning has been applied.
|
|
|
|
|
## 📄 License

This model is released under **Apache-2.0**. When using it, please also comply with the licenses of the `fineweb-edu` dataset and the `transformers` / Qwen3-related components.
|
|
|
|
|