---
license: mit
language:
- en
- ja
---
# Model Card: Rize-0.5-tiny
## Overview
Rize is a causal language model built for pretraining research, experimental development, and general text generation.
It uses a decoder-only Transformer architecture with Mixture-of-Experts (MoE) layers.
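
To make the MoE idea concrete, here is a minimal sketch of per-token top-k expert routing with an optional always-on shared expert. It is an illustration only: this card does not describe the router, so the softmax-over-selected-logits weighting, the function names `top_k_route` and `moe_forward`, and the toy experts are all assumptions rather than the model's actual implementation.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token and normalize their weights.

    Assumption: softmax is taken over the selected logits only; real routers differ.
    """
    # Indices of the k largest logits, highest first (ties keep original order).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k, shared_expert=None):
    """Weighted sum of the selected experts' outputs, plus an optional shared expert."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    if shared_expert is not None:
        # The shared expert runs for every token, independent of the router.
        out = [o + s for o, s in zip(out, shared_expert(x))]
    return out


# Toy usage: five hypothetical experts that scale the input by 0..4.
toy_experts = [lambda x, s=s: [s * v for v in x] for s in range(5)]
toy_logits = [0.1, 2.0, -1.0, 2.0, 0.5]
print(moe_forward([1.0, 2.0], toy_experts, toy_logits, k=2))
```

Only the selected experts are evaluated for a given token, which is why an MoE model's active parameter count per token is smaller than its total parameter count.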
## Model Size and Architecture
The tiny variant has approximately **4 billion total parameters**, of which approximately **1 billion are active per token**.
Key architecture details:
- decoder-only Transformer
- 19 hidden layers
- hidden size of 1536
- 12 attention heads
- 64 routed experts
- top-4 expert routing per token
- 1 shared expert
- vocabulary size of 163,840
- maximum context length of 8,192 tokens
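
A few quantities follow directly from the numbers listed above. The short sketch below works them out. The even split of the hidden size across heads is an assumption (the card does not state the head dimension), and the embedding count covers a single vocabulary-by-hidden matrix only, since the card does not say whether input and output embeddings are tied.

```python
# Values copied from the architecture list in this card.
HIDDEN_SIZE = 1536
NUM_HEADS = 12
VOCAB_SIZE = 163_840
NUM_ROUTED_EXPERTS = 64
TOP_K = 4
NUM_SHARED_EXPERTS = 1

# Assumption: attention heads split the hidden size evenly.
head_dim = HIDDEN_SIZE // NUM_HEADS

# Parameters in one token-embedding matrix of shape (vocab, hidden).
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

# Experts evaluated per token: the routed top-k plus the shared expert.
active_experts = TOP_K + NUM_SHARED_EXPERTS
total_experts = NUM_ROUTED_EXPERTS + NUM_SHARED_EXPERTS

print(head_dim)                             # 128
print(embedding_params)                     # 251658240
print(f"{active_experts}/{total_experts}")  # 5/65
```

Because only 5 of the 65 experts in an MoE layer run for any given token, the active parameter count per token is well below the total.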
## Intended Use
This model is intended for:
- language modeling research
- evaluation of training settings and architectures
- general text generation benchmarks
This model is not intended to be used as a source of factual truth or professional advice.
## Training
The model is trained with autoregressive next-token prediction on text data.
Because it is a research model, its weights and behavior may change across checkpoints, runs, and configurations.
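
As an illustration of the next-token prediction objective, the sketch below computes the mean cross-entropy loss for one sequence in plain Python. The function name `next_token_loss` and the list-based inputs are hypothetical and chosen for readability; the card does not describe the actual training code.

```python
import math


def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    logits: one list of vocabulary scores per position.
    token_ids: the token sequence, same length as logits.
    """
    losses = []
    # Position t predicts token t+1, so the final position has no target.
    for scores, target in zip(logits[:-1], token_ids[1:]):
        m = max(scores)
        # Log of the softmax normalizer, computed stably.
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the correct next token.
        losses.append(log_z - scores[target])
    return sum(losses) / len(losses)


# Toy usage: uniform scores over a 4-token vocabulary give a loss of log(4).
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(next_token_loss(uniform_logits, [0, 1, 2]))
```

A model that guesses uniformly over the vocabulary scores log(V) under this loss, so lower values indicate better prediction of the next token.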
## Capabilities
- text continuation
- general question answering
- instruction-style response generation
- multilingual text handling, depending on training data
## Limitations
- may generate incorrect or misleading information
- may reflect biases in training data
- may produce unsafe, harmful, or inappropriate text
- performance may vary across languages and domains
- not optimized for high-stakes decisions
## Safety and Responsible Use
Users should review outputs before any real-world use.
The model should not be used on its own for:
- medical advice
- legal advice
- financial advice
- safety-critical decisions
- sensitive personal decisions
Human oversight is required.
## Disclaimer
This model is provided for research and experimental purposes only.
The FA Research Team makes no guarantees regarding accuracy, completeness, reliability, safety, or fitness for a particular purpose.
Use of this model and its outputs is at the user’s own risk.
## Contact
FA Research Team