---
license: mit
language:
  - en
  - ja
---

# Rize-0.5-tiny Model Card

## Overview

Rize is a causal language model for pretraining research and general text generation. It uses a decoder-only Transformer architecture with Mixture-of-Experts (MoE) layers and is intended for research and experimental development.

## Model Size and Architecture

This tiny variant has about 4 billion total parameters, of which about 1 billion are active per token through MoE routing.

Main architecture points:

- decoder-only Transformer
- 19 hidden layers
- hidden size of 1536
- 12 attention heads
- 64 routed experts
- top-4 expert routing per token
- 1 shared expert
- vocabulary size of 163,840
- maximum context length of 8,192 tokens
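The top-4-of-64 routing above can be sketched as follows. This is a generic illustration of top-k MoE gating, not the model's actual implementation; the function and variable names are invented for this example:

```python
import numpy as np

NUM_ROUTED_EXPERTS = 64  # routed experts (per the card)
TOP_K = 4                # routed experts activated per token
# One shared expert additionally runs for every token.

def route_token(router_logits, k=TOP_K):
    """Pick the top-k routed experts for one token and renormalize their gates."""
    top = np.argsort(router_logits)[-k:][::-1]         # indices of the k largest logits
    z = router_logits[top] - router_logits[top].max()  # numerically stable softmax over the chosen k
    gates = np.exp(z) / np.exp(z).sum()
    return top, gates

rng = np.random.default_rng(0)
experts, gates = route_token(rng.normal(size=NUM_ROUTED_EXPERTS))
# Per token, only 4 routed experts plus the shared expert compute, which is how
# a model with ~4B total parameters can activate only ~1B parameters per token.
```

Because each token touches only a small, renormalized subset of experts, compute per token scales with the active parameters rather than the total parameter count.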

## Intended Use

This model is intended for:

- language modeling research
- evaluation of training settings and architectures
- general text generation benchmarks

This model is not intended to be used as a source of factual truth or professional advice.

## Training

The model is trained with autoregressive next-token prediction on text data. It is developed as a research model and may change across checkpoints, runs, and configurations.
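The next-token objective can be illustrated with a minimal NumPy sketch. This shows the standard autoregressive cross-entropy loss in general, not the project's training code; the shapes and names here are invented for illustration:

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Mean cross-entropy for next-token prediction: position t predicts token t+1."""
    # logits: (seq_len, vocab_size) model outputs; tokens: (seq_len,) input ids
    preds = logits[:-1]     # predictions at positions 0..T-2
    targets = tokens[1:]    # each position's target is the following token
    # numerically stable log-softmax over the vocabulary
    z = preds - preds.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
loss = next_token_loss(rng.normal(size=(8, 16)), rng.integers(0, 16, size=8))
```

With random logits over a vocabulary of size V, this loss sits near log V; training pushes it down by concentrating probability on the observed next token.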

## Capabilities

- text continuation
- general question answering
- instruction-style response generation
- multilingual text handling, depending on training data

## Limitations

- may generate incorrect or misleading information
- may reflect biases in training data
- may produce unsafe, harmful, or inappropriate text
- performance may vary across languages and domains
- not optimized for high-stakes decisions

## Safety and Responsible Use

Users should review outputs before any real-world use. The model should not be used on its own for:

- medical advice
- legal advice
- financial advice
- safety-critical decisions
- sensitive personal decisions

Human oversight is required.

## Disclaimer

This model is provided for research and experimental purposes only. The FA Research Team makes no guarantees regarding accuracy, completeness, reliability, safety, or fitness for a particular purpose. Use of this model and its outputs is at the user’s own risk.

## Contact

FA Research Team