| # Metis-1.3 Base |
|
|
| Metis-1.3 Base is a 201M-parameter, English-first decoder model from the Metis family. It uses a hybrid Mamba2-attention backbone and serves as the base checkpoint for later chat and reasoning post-training. |
|
|
| ## Model summary |
|
|
| - Family: Metis |
| - Stage: Base pretraining checkpoint exported for Hugging Face |
| - Parameters: 201,490,560 |
| - Architecture: Mamba2-attention hybrid decoder |
| - Context length: 4096 tokens |
| - Vocabulary size: 8192 |
| - Dtype: bfloat16 weights |
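
As a rough sizing aid, the raw weight footprint follows from the parameter count and dtype listed above. The sketch below assumes 2 bytes per bfloat16 parameter and ignores file metadata, activations, and any runtime state; the helper name `weight_bytes` is illustrative and not part of the release.

```python
PARAMS = 201_490_560   # parameter count from the model summary
BYTES_PER_BF16 = 2     # bfloat16 stores each value in 16 bits


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Return the raw size of the weights in bytes (no metadata, no activations)."""
    return num_params * bytes_per_param


size = weight_bytes(PARAMS)
# Report both decimal megabytes and binary mebibytes.
print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
```

The on-disk size of `model.safetensors` may differ slightly, since the file also carries a header and the exact tensor layout is not described here.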
|
|
| ## Architecture |
|
|
| Metis-1.3 uses 28 decoder blocks: |
|
|
| - 21 Mamba2 blocks |
| - 7 attention blocks at layers 3, 7, 11, 15, 19, 23, and 27 |
| - Grouped-query attention with 18 query heads and 6 KV heads |
| - Tied input and output embeddings |
| - RMSNorm |
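
The block layout described above can be sketched in a few lines. This is an illustrative reconstruction, not the project's code: it assumes the listed layer numbers are zero-based indices, and the names `block_types` and `ATTENTION_LAYERS` are hypothetical.

```python
NUM_LAYERS = 28
ATTENTION_LAYERS = frozenset({3, 7, 11, 15, 19, 23, 27})  # assumed zero-based
NUM_QUERY_HEADS = 18
NUM_KV_HEADS = 6


def block_types(num_layers: int = NUM_LAYERS,
                attention_layers: frozenset = ATTENTION_LAYERS) -> list:
    """Label each decoder block as 'attention' or 'mamba2' by its index."""
    return ["attention" if i in attention_layers else "mamba2"
            for i in range(num_layers)]


layout = block_types()
# With grouped-query attention, each KV head is shared by a group of query heads.
queries_per_kv = NUM_QUERY_HEADS // NUM_KV_HEADS

print(layout.count("mamba2"), layout.count("attention"), queries_per_kv)
# prints: 21 7 3
```

Under the zero-based assumption, every fourth block is an attention block, and each of the 6 KV heads serves 3 query heads.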
|
|
| This release is intended as the raw pretrained base for further instruction and reasoning tuning. |
|
|
| ## Training |
|
|
| - Pretraining target: 12B training tokens |
| - Tokenizer: custom 8k tokenizer trained on an 8M-document sample |
| - Intended data mix: English-first mixture of web, educational, math, and code-heavy data |
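
To put the token budget in perspective, the arithmetic below relates it to the parameter count and context length stated in this card. It is a back-of-the-envelope sketch that assumes "12B" means exactly 12,000,000,000 tokens, that the target was reached, and that sequences are packed to the full context length; none of these is confirmed here.

```python
TRAIN_TOKENS = 12_000_000_000   # assumed exact value of the 12B target
PARAMS = 201_490_560            # parameter count from the model summary
CONTEXT_LENGTH = 4096           # tokens per sequence at full context

# Training tokens seen per model parameter.
tokens_per_param = TRAIN_TOKENS / PARAMS

# Number of complete full-context sequences that fit in the budget.
full_sequences = TRAIN_TOKENS // CONTEXT_LENGTH

print(f"{tokens_per_param:.1f} tokens per parameter")
print(f"{full_sequences:,} full-context sequences")
```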
|
|
| ## Intended use |
|
|
| This checkpoint is mainly useful for: |
|
|
| - research and experimentation on small hybrid Mamba models |
| - base-model comparison against the chat and think variants |
| - continued fine-tuning or alignment work |
|
|
| This is a raw base model and is not tuned for conversation. For direct assistant use, the chat or think releases are likely better starting points. |
|
|
| ## Limitations |
|
|
| - This is a small model and will make factual, reasoning, and instruction-following mistakes. |
| - The eval setup used in this project is lightweight and should be treated as a smoke test rather than a comprehensive benchmark. |
| - English is the intended primary language. |
|
|
| ## Files |
|
|
| - `model.safetensors` |
| - `config.json` |
| - `generation_config.json` |
| - `tokenizer.json` |
| - `tokenizer_config.json` |
| - `special_tokens_map.json` |
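
After downloading the checkpoint, a quick completeness check against the file list above can catch a partial download. The sketch below uses only the standard library; the function name `missing_files` and the directory argument are illustrative and not part of the release.

```python
from pathlib import Path

# File names exactly as listed in this card.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_files(checkpoint_dir) -> list:
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files` on a local checkpoint directory returns an empty list when all six files are present. It checks only that the files exist, not that their contents are valid.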
|
|
| ## License |
|
|
| This release inherits the licensing and attribution obligations of the upstream training data sources used in the Metis pipeline. Review dataset licenses and usage constraints before production use. |
|
|