# Metis-1.3 Base

Metis-1.3 Base is a 201M-parameter English-first decoder model from the Metis family. It uses a hybrid Mamba2 plus attention backbone and was trained as the base model for later chat and reasoning post-training.

## Model summary

- Family: Metis
- Stage: Base pretraining checkpoint exported for Hugging Face
- Parameters: 201,490,560
- Architecture: Mamba2-attention hybrid decoder
- Context length: 4096
- Vocabulary size: 8192
- Dtype: bfloat16 weights

## Architecture

Metis-1.3 uses 28 decoder blocks:

- 21 Mamba2 blocks
- 7 attention blocks at layers 3, 7, 11, 15, 19, 23, and 27
- Grouped-query attention with 18 query heads and 6 KV heads
- Tied input and output embeddings
- RMSNorm

This release is intended as the raw pretrained base for further instruction tuning and reasoning tuning.

## Training

- Pretraining target: 12B train tokens
- Tokenizer: custom 8k tokenizer trained on an 8M-document sample
- Intended data mix: English-first web, educational, math, and code-heavy mixture

## Intended use

This checkpoint is mainly useful for:

- research and experimentation on small hybrid Mamba models
- base-model comparison against the chat and think variants
- continued fine-tuning or alignment work

It is not the most user-friendly conversational variant. For direct assistant use, the chat or think releases are likely better starting points.

## Limitations

- This is a small model and will still make factual, reasoning, and instruction-following mistakes.
- The eval setup used in this project is lightweight and should be treated as a smoke test rather than a comprehensive benchmark.
- English is the intended primary language.

## Files

- `model.safetensors`
- `config.json`
- `generation_config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`

## License

This release inherits the licensing and attribution obligations of the upstream training data sources used in the Metis pipeline. Review dataset licenses and usage constraints before production use.
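
## Layer layout sketch

The block counts and head counts given in the Architecture section can be checked with a few lines of Python. This is a minimal sketch, not code from the Metis pipeline: the names `build_layer_types` and `kv_head_for_query_head` are hypothetical, the layer indices are assumed to be zero-based (so layer 27 is the last of the 28 blocks), and the contiguous grouping of query heads onto KV heads is the common grouped-query attention convention, assumed here rather than confirmed by the release.

```python
# Illustrative only: names and conventions below are assumptions,
# not taken from the Metis codebase or its config.json.
NUM_LAYERS = 28
ATTENTION_LAYERS = (3, 7, 11, 15, 19, 23, 27)  # as listed under Architecture (assumed zero-based)
NUM_QUERY_HEADS = 18
NUM_KV_HEADS = 6


def build_layer_types(num_layers=NUM_LAYERS, attention_layers=ATTENTION_LAYERS):
    """Return one block type per layer: "attention" or "mamba2"."""
    attention = set(attention_layers)
    return ["attention" if i in attention else "mamba2" for i in range(num_layers)]


def kv_head_for_query_head(query_head,
                           num_query_heads=NUM_QUERY_HEADS,
                           num_kv_heads=NUM_KV_HEADS):
    """Map a query head index to the KV head it shares (contiguous grouping assumed)."""
    group_size = num_query_heads // num_kv_heads  # 18 // 6 = 3 query heads per KV head
    return query_head // group_size


layer_types = build_layer_types()
# Prints the Mamba2 and attention block counts: 21 7
print(layer_types.count("mamba2"), layer_types.count("attention"))
# Prints the KV head index used by each of the 18 query heads
print([kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)])
```

Under these assumptions, every fourth block is an attention block, and each KV head serves three query heads, which cuts the KV cache of the attention layers to a third of what 18 full KV heads would need.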