Metis-1.3 Base

Metis-1.3 Base is a 201M-parameter, English-first decoder model from the Metis family. It uses a hybrid Mamba2-attention backbone and serves as the base checkpoint for the later chat and reasoning post-training stages.

Model summary

  • Family: Metis
  • Stage: Base pretraining checkpoint exported for Hugging Face
  • Parameters: 201,490,560
  • Architecture: Mamba2-attention hybrid decoder
  • Context length: 4096
  • Vocabulary size: 8192
  • Dtype: bfloat16 weights (a rough memory estimate follows this list)
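As a quick sanity check on the figures above, the bfloat16 weights alone come to roughly 0.4 GB. The short calculation below is illustrative only and ignores KV cache, Mamba state, activations, and any float32 buffers:

    # Approximate size of the bf16 weights, from the parameter count above.
    params = 201_490_560        # parameter count listed in the summary
    bytes_per_param = 2         # bfloat16 uses 2 bytes per parameter
    print(f"{params * bytes_per_param / 1e6:.0f} MB")  # ~403 MB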

Architecture

Metis-1.3 uses 28 decoder blocks:

  • 21 Mamba2 blocks
  • 7 attention blocks at layers 3, 7, 11, 15, 19, 23, and 27 (see the sketch after this list)
  • Grouped-query attention with 18 query heads and 6 KV heads
  • Tied input and output embeddings
  • RMSNorm
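The attention blocks fall on every fourth layer index starting at 3. The small sketch below reconstructs that layout from the counts above; the labels and list are illustrative and not taken from the released code:

    # Reconstruct the per-layer block types implied by the counts above.
    n_layers = 28
    layer_types = ["attention" if i % 4 == 3 else "mamba2" for i in range(n_layers)]
    assert layer_types.count("mamba2") == 21
    assert layer_types.count("attention") == 7
    print([i for i, t in enumerate(layer_types) if t == "attention"])
    # -> [3, 7, 11, 15, 19, 23, 27]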

This release is intended as the raw pretrained base for further instruction tuning and reasoning tuning.

Training

  • Pretraining target: 12B train tokens
  • Tokenizer: custom 8k tokenizer trained on an 8M-document sample (a training sketch follows this list)
  • Intended data mix: English-first web, educational, math, and code-heavy mixture
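The card does not specify the tokenizer algorithm. Assuming a byte-level BPE tokenizer built with the Hugging Face tokenizers library (the special tokens and document iterator below are placeholders), a minimal training sketch might look like this:

    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    # Placeholder corpus iterator; in practice this would stream the 8M-document sample.
    def iter_documents():
        yield "example document text"

    tokenizer = Tokenizer(models.BPE())
    tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
    trainer = trainers.BpeTrainer(
        vocab_size=8192,                          # matches the 8k vocabulary above
        special_tokens=["<pad>", "<s>", "</s>"],  # placeholder special tokens
    )
    tokenizer.train_from_iterator(iter_documents(), trainer=trainer)
    tokenizer.save("tokenizer.json")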

Intended use

This checkpoint is mainly useful for:

  • research and experimentation on small hybrid Mamba models
  • base-model comparison against the chat and think variants
  • continued fine-tuning or alignment work

As a raw base model, it is not tuned for conversational use. For direct assistant use, the chat or think releases are better starting points.
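A minimal loading and generation sketch, assuming standard transformers usage: the repository id is a placeholder, and because the architecture is a custom Mamba2-attention hybrid, trust_remote_code=True (or a custom model class) may be required.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "your-namespace/Metis-1.3-Base"  # placeholder, replace with the actual repo id
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,   # weights are shipped in bf16
        trust_remote_code=True,       # may be needed for the hybrid architecture
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))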

Limitations

  • This is a small model and will still make factual, reasoning, and instruction-following mistakes.
  • The eval setup used in this project is lightweight and should be treated as a smoke test rather than a comprehensive benchmark.
  • English is the intended primary language.

Files

  • model.safetensors
  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json

License

This release inherits the licensing and attribution obligations of the upstream training data sources used in the Metis pipeline. Review dataset licenses and usage constraints before production use.
