# Metis-1.3 Base

Metis-1.3 Base is a 201M-parameter, English-first decoder model from the Metis family. It uses a hybrid Mamba2/attention backbone and serves as the base model for the later chat and reasoning post-training stages.
## Model summary
- Family: Metis
- Stage: Base pretraining checkpoint exported for Hugging Face
- Parameters: 201,490,560
- Architecture: Mamba2-attention hybrid decoder
- Context length: 4096
- Vocabulary size: 8192
- Dtype: bfloat16 weights
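The parameter count and dtype above make the expected weight-file size easy to sanity-check after download. A minimal sketch (the arithmetic uses only the figures listed here; actual file size will differ slightly due to safetensors metadata):

```python
# Estimate the on-disk weight size from the card's figures:
# 201,490,560 parameters stored as bfloat16 (2 bytes each).
n_params = 201_490_560
bytes_per_param = 2  # bfloat16

size_bytes = n_params * bytes_per_param
size_mib = size_bytes / 2**20

print(f"{size_bytes} bytes ~ {size_mib:.0f} MiB")  # 402981120 bytes ~ 384 MiB
```

If `model.safetensors` is far from this estimate, the download is likely truncated or a different dtype was exported.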
## Architecture
Metis-1.3 uses 28 decoder blocks:
- 21 Mamba2 blocks
- 7 attention blocks at layers 3, 7, 11, 15, 19, 23, and 27
- Grouped-query attention with 18 query heads and 6 KV heads
- Tied input and output embeddings
- RMSNorm
This release is intended as the raw pretrained base for further instruction tuning and reasoning tuning.
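The attention indices listed above follow a regular pattern: one attention block every fourth layer, starting at index 3, with Mamba2 blocks everywhere else. A minimal sketch of that layout and of the GQA head grouping (the every-fourth-layer rule is inferred from the listed indices, not read from the released config):

```python
# Reconstruct the per-layer block types from the pattern described above:
# 28 decoder blocks, attention at layers 3, 7, 11, 15, 19, 23, 27,
# Mamba2 everywhere else.
NUM_LAYERS = 28
ATTN_EVERY = 4  # attention wherever i % 4 == 3

layer_types = [
    "attention" if i % ATTN_EVERY == ATTN_EVERY - 1 else "mamba2"
    for i in range(NUM_LAYERS)
]

attn_layers = [i for i, t in enumerate(layer_types) if t == "attention"]
print(attn_layers)                  # [3, 7, 11, 15, 19, 23, 27]
print(layer_types.count("mamba2"))  # 21

# Grouped-query attention: 18 query heads share 6 KV heads,
# so each KV head serves a group of 3 query heads.
n_q_heads, n_kv_heads = 18, 6
print(n_q_heads // n_kv_heads)      # 3
```

The 3:1 query-to-KV ratio shrinks the KV cache to a third of full multi-head attention, which matters at a 4096-token context on a model this small.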
## Training

- Pretraining target: 12B training tokens
- Tokenizer: custom 8k tokenizer trained on an 8M-document sample
- Intended data mix: English-first web, educational, math, and code-heavy mixture
## Intended use
This checkpoint is mainly useful for:
- research and experimentation on small hybrid Mamba models
- base-model comparison against the chat and think variants
- continued fine-tuning or alignment work
As a raw base model, it is not tuned for conversation. For direct assistant use, the chat or think releases are likely better starting points.
## Limitations
- This is a small model and will still make factual, reasoning, and instruction-following mistakes.
- The eval setup used in this project is lightweight and should be treated as a smoke test rather than a comprehensive benchmark.
- English is the intended primary language.
## Files

- model.safetensors
- config.json
- generation_config.json
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
## License
This release inherits the licensing and attribution obligations of the upstream training data sources used in the Metis pipeline. Review dataset licenses and usage constraints before production use.