---
license: mit
language:
  - en
  - ja
---

# Rize-0.5-tiny Model Card

## Overview

Rize is a causal language model for pretraining research and general text generation. It uses a decoder-only Transformer architecture with Mixture-of-Experts (MoE) layers and is intended for research and experimental development.

## Model Size and Architecture

This tiny variant has about 4 billion total parameters, of which about 1 billion are active per token through MoE routing.

Main architecture points:

- decoder-only Transformer
- 19 hidden layers
- hidden size of 1536
- 12 attention heads
- 64 routed experts
- top-4 expert routing per token
- 1 shared expert
- vocabulary size of 163,840
- maximum context length of 8,192 tokens
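The top-4-of-64 routing above can be sketched as follows. This is a generic illustration of top-k MoE gating, not the model's actual implementation; the function and variable names are invented for this example:

```python
import numpy as np

NUM_ROUTED_EXPERTS = 64  # routed experts (per the card)
TOP_K = 4                # routed experts activated per token
# One shared expert additionally runs for every token.

def route_token(router_logits, k=TOP_K):
    """Pick the top-k routed experts for one token and renormalize their gates."""
    top = np.argsort(router_logits)[-k:][::-1]         # indices of the k largest logits
    z = router_logits[top] - router_logits[top].max()  # numerically stable softmax over the chosen k
    gates = np.exp(z) / np.exp(z).sum()
    return top, gates

rng = np.random.default_rng(0)
experts, gates = route_token(rng.normal(size=NUM_ROUTED_EXPERTS))
# Per token, only 4 routed experts plus the shared expert compute, which is how
# a model with ~4B total parameters can activate only ~1B parameters per token.
```

Because each token touches only a small, renormalized subset of experts, compute per token scales with the active parameters rather than the total parameter count.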

## Intended Use

This model is intended for:

- language modeling research
- evaluation of training settings and architectures
- general text generation benchmarks

This model is not intended to be used as a source of factual truth or professional advice.

## Training

The model is trained with autoregressive next-token prediction on text data. It is developed as a research model and may change across checkpoints, runs, and configurations.
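The next-token objective can be illustrated with a minimal NumPy sketch. This shows the standard autoregressive cross-entropy loss in general, not the project's training code; the shapes and names here are invented for illustration:

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Mean cross-entropy for next-token prediction: position t predicts token t+1."""
    # logits: (seq_len, vocab_size) model outputs; tokens: (seq_len,) input ids
    preds = logits[:-1]     # predictions at positions 0..T-2
    targets = tokens[1:]    # each position's target is the following token
    # numerically stable log-softmax over the vocabulary
    z = preds - preds.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
loss = next_token_loss(rng.normal(size=(8, 16)), rng.integers(0, 16, size=8))
```

With random logits over a vocabulary of size V, this loss sits near log V; training pushes it down by concentrating probability on the observed next token.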

## Capabilities

- text continuation
- general question answering
- instruction-style response generation
- multilingual text handling, depending on training data

## Limitations

- may generate incorrect or misleading information
- may reflect biases in training data
- may produce unsafe, harmful, or inappropriate text
- performance may vary across languages and domains
- not optimized for high-stakes decisions

## Safety and Responsible Use

Users should review outputs before any real-world use. The model should not be used on its own for:

- medical advice
- legal advice
- financial advice
- safety-critical decisions
- sensitive personal decisions

Human oversight is required.

## Disclaimer

This model is provided for research and experimental purposes only. The FA Research Team makes no guarantees regarding accuracy, completeness, reliability, safety, or fitness for a particular purpose. Use of this model and its outputs is at the user’s own risk.

## Contact

FA Research Team