---
license: mit
language:
- en
- ja
---
# Model Card
## Overview
Rize is a causal language model for pretraining research and general text generation.
It uses a Transformer decoder architecture with Mixture-of-Experts (MoE) layers.
The model is designed for research and experimental development.

## Model Size and Architecture
This tiny model has about **4 billion total parameters** and about **1 billion active parameters per token**.

Main architecture points:
- decoder-only Transformer
- 19 hidden layers
- hidden size of 1536
- 12 attention heads
- 64 routed experts
- top-4 expert routing per token
- 1 shared expert
- vocabulary size of 163,840
- maximum context length of 8,192 tokens
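
The routing settings listed above (64 routed experts, top-4 selection, 1 shared expert) can be illustrated with a minimal sketch. This is not the model's implementation: the softmax-then-renormalize gating, the toy experts, and all function names (`route_token`, `moe_layer`, `make_toy_expert`) are assumptions chosen for illustration only.

```python
import math
import random

NUM_ROUTED_EXPERTS = 64  # from the architecture list
TOP_K = 4                # from the architecture list


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k routed experts.

    Assumption: softmax over all routed experts, then renormalize the
    selected weights so they sum to 1. The real gating rule may differ.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_layer(token, router_logits, routed_experts, shared_expert):
    """Add the weighted routed-expert outputs to the shared-expert output."""
    output = list(shared_expert(token))  # the shared expert runs for every token
    for index, weight in route_token(router_logits):
        expert_out = routed_experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


def make_toy_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda vector: [scale * v for v in vector]


rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
routed_experts = [make_toy_expert(0.01 * i) for i in range(NUM_ROUTED_EXPERTS)]
shared_expert = make_toy_expert(1.0)

token = [1.0, -2.0, 0.5]
selection = route_token(router_logits)
mixed = moe_layer(token, router_logits, routed_experts, shared_expert)
print(selection)
print(mixed)
```

With these settings, 5 of the 65 expert networks (4 routed plus 1 shared) run for each token, which is consistent with the active parameter count being well below the total.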

## Intended Use
This model is intended for:
- language modeling research
- evaluation of training settings and architectures
- general text generation benchmarks

This model is not intended to be used as a source of factual truth or professional advice.

## Training
The model is trained with autoregressive next-token prediction on text data.
It is developed as a research model and may change across checkpoints, runs, and configurations.
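
As an illustration of the next-token prediction objective, the sketch below computes the mean cross-entropy of predicting each token from the position before it. It is a generic formulation, not the training code for this model; the function name `next_token_loss` and the toy inputs are assumptions.

```python
import math


def next_token_loss(logits_per_position, token_ids):
    """Mean cross-entropy of predicting token t+1 from position t.

    logits_per_position[t] holds one score per vocabulary entry for the
    token that follows position t. The last position has no target, so
    its logits are ignored.
    """
    targets = token_ids[1:]  # shift left: position t predicts token t+1
    losses = []
    for logits, target in zip(logits_per_position, targets):
        peak = max(logits)
        # log-sum-exp gives the log of the softmax normalizer, computed stably
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_norm - logits[target])
    return sum(losses) / len(losses)


# Toy example with a 4-entry vocabulary and uniform (uninformative) logits.
toy_tokens = [0, 1, 2]
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in toy_tokens]
print(next_token_loss(toy_logits, toy_tokens))
```

With uniform logits the loss equals the natural log of the vocabulary size, which is the value an untrained model would be expected to start near.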

## Capabilities
- text continuation
- general question answering
- instruction-style response generation
- multilingual text handling, depending on training data

## Limitations
- may generate incorrect or misleading information
- may reflect biases in training data
- may produce unsafe, harmful, or inappropriate text
- performance may vary across languages and domains
- not optimized for high-stakes decisions

## Safety and Responsible Use
Users should review outputs before any real-world use.
The model should not be used on its own for:
- medical advice
- legal advice
- financial advice
- safety-critical decisions
- sensitive personal decisions

Human oversight is required.

## Disclaimer
This model is provided for research and experimental purposes only.
The FA Research Team makes no guarantees regarding accuracy, completeness, reliability, safety, or fitness for a particular purpose.
Use of this model and its outputs is at the user’s own risk.

## Contact
FA Research Team