Metis-1.3 Base

Metis-1.3 Base is a 201M-parameter English-first decoder model from the Metis family. It uses a hybrid Mamba2 plus attention backbone and was trained as the base model for later chat and reasoning post-training.

Model summary

  • Family: Metis
  • Stage: Base pretraining checkpoint exported for Hugging Face
  • Parameters: 201,490,560
  • Architecture: Mamba2-attention hybrid decoder
  • Context length: 4096 tokens
  • Vocabulary size: 8192 tokens
  • Dtype: bfloat16 weights
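
As a rough size check, the parameter count and dtype above imply the raw weight footprint. The sketch below is plain arithmetic on those two figures; it ignores file headers, activations, and any runtime state, so treat it as a lower bound rather than a measured number.

```python
# Figures taken from the model summary; everything else is arithmetic.
N_PARAMS = 201_490_560
BYTES_PER_PARAM = 2  # bfloat16 stores each weight in 16 bits


def weight_bytes(n_params: int = N_PARAMS, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Raw weight storage in bytes, excluding headers and runtime buffers."""
    return n_params * bytes_per_param


size = weight_bytes()
print(f"{size:,} bytes = {size / 1e6:.1f} MB = {size / 2**20:.1f} MiB")
# prints: 402,981,120 bytes = 403.0 MB = 384.3 MiB
```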

Architecture

Metis-1.3 uses 28 decoder blocks:

  • 21 Mamba2 blocks
  • 7 attention blocks at layers 3, 7, 11, 15, 19, 23, and 27
  • Grouped-query attention with 18 query heads and 6 KV heads
  • Tied input and output embeddings
  • RMSNorm
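
The block layout above can be sketched in a few lines. This is an illustration, not the model's actual code: it assumes the listed layer indices are 0-based (so every fourth block is attention), and the query-to-KV mapping uses the conventional contiguous grouping for grouped-query attention, which the card does not confirm. The function names are hypothetical.

```python
N_LAYERS = 28
ATTENTION_LAYERS = {3, 7, 11, 15, 19, 23, 27}  # assumed 0-based indices
N_QUERY_HEADS = 18
N_KV_HEADS = 6


def block_types(n_layers: int = N_LAYERS, attention_layers=ATTENTION_LAYERS) -> list[str]:
    """Return the block type for each decoder layer, in order."""
    return ["attention" if i in attention_layers else "mamba2" for i in range(n_layers)]


def kv_head_for_query_head(q: int, n_q: int = N_QUERY_HEADS, n_kv: int = N_KV_HEADS) -> int:
    """Map a query head to its shared KV head (assumed contiguous grouping)."""
    group_size = n_q // n_kv  # 18 // 6 = 3 query heads share each KV head
    return q // group_size


layout = block_types()
print(layout.count("mamba2"), layout.count("attention"))  # prints: 21 7
print([kv_head_for_query_head(q) for q in range(N_QUERY_HEADS)])
# prints: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
```

With 6 KV heads instead of 18, each attention block caches one third of the keys and values that full multi-head attention would.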

This release is intended as the raw pretrained base for further instruction tuning and reasoning tuning.

Training

  • Pretraining target: 12B train tokens
  • Tokenizer: custom tokenizer with an 8192-token vocabulary, trained on an 8M-document sample
  • Intended data mix: English-first web, educational, math, and code-heavy mixture
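
For scale, the token target can be put next to the parameter count and context length. The sketch below is arithmetic only; it assumes the 12B target was met in full and that training sequences were packed to the full 4096-token context, neither of which the card states.

```python
TRAIN_TOKENS = 12_000_000_000  # pretraining target from the list above
N_PARAMS = 201_490_560
CONTEXT_LENGTH = 4096

# Training tokens per model parameter.
tokens_per_param = TRAIN_TOKENS / N_PARAMS

# Number of complete context-length sequences (assumes full packing).
full_sequences = TRAIN_TOKENS // CONTEXT_LENGTH

print(f"{tokens_per_param:.1f} tokens per parameter")  # prints: 59.6 tokens per parameter
print(f"{full_sequences:,} full sequences")            # prints: 2,929,687 full sequences
```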

Intended use

This checkpoint is mainly useful for:

  • research and experimentation on small hybrid Mamba models
  • base-model comparison against the chat and think variants
  • continued fine-tuning or alignment work

This is a base checkpoint and has not been tuned for conversation. For direct assistant use, the chat or think releases are likely better starting points.

Limitations

  • This is a small model and will make factual, reasoning, and instruction-following mistakes.
  • The eval setup used in this project is lightweight and should be treated as a smoke test rather than a comprehensive benchmark.
  • English is the intended primary language.

Files

  • model.safetensors
  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
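
One way to sanity-check a downloaded copy is to count parameters directly from the safetensors header, which needs only the standard library. This is a minimal sketch: the helper name `count_parameters` is hypothetical, and the local path is an assumption. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype and shape.

```python
import json
import os
import struct
from math import prod


def count_parameters(path: str) -> int:
    """Sum the element counts of all tensors listed in a safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    return sum(
        prod(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    )


path = "model.safetensors"  # assumed local path to the downloaded file
if os.path.exists(path):
    print(f"{count_parameters(path):,} parameters stored")
```

Because the input and output embeddings are tied, the stored count can differ from the reported 201,490,560 if the exporter wrote the shared matrix under two names.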

License

This release inherits the licensing and attribution obligations of the upstream training data sources used in the Metis pipeline. Review dataset licenses and usage constraints before production use.