| --- |
| language: |
| - en |
| library_name: pytorch |
| tags: |
| - metis |
| - lernex |
| - causal-lm |
| - base-model |
| - education |
| - reasoning |
| - mixture-of-recursion |
| - custom-code |
| pipeline_tag: text-generation |
| base_model: [] |
| --- |
| |
| # Metis-1.4 Base |
|
|
| **The model that never quit.** |
|
|
| Metis-1.4 Base is a compact ~504M-parameter research language model from **Lernex**, built as a step toward the Metis line of efficient learning, reasoning, and tutoring models. |
|
|
| This upload replaces the earlier experimental Metis-1.4 base artifact with the corrected base export. The earlier run trained with an incorrect objective; this revision comes from the repaired pipeline, which uses standard next-token prediction, the optimized H100 dense pretraining path, and continued pretraining with sequence-level static MoR.
|
|
| ## What This Release Is |
|
|
| This is the **base checkpoint**. It is not the final chat or thinking model. |
|
|
| Use it as: |
|
|
| - a research base for continued training and post-training experiments |
| - a compact model for studying Lernex's Metis architecture direction |
| - a foundation checkpoint for the Metis-1.4 chat and thinking releases |
|
|
| The post-training stages (Chat SFT, Reasoning SFT, reward, Chat DPO, and Think DPO) remain part of the full Metis-1.4 pipeline and have not been applied to this checkpoint.
|
|
| ## Architecture |
|
|
| Metis-1.4 Base uses a custom Metis MoR (mixture-of-recursion) decoder stack:
|
|
| | Field | Value | |
| |---|---:| |
| | Parameters | ~503.8M | |
| | Context length | 1024 tokens | |
| | Layers | 19 shared transformer layers | |
| | Hidden size | 1536 | |
| | Attention heads | 24 | |
| | KV heads | 8 | |
| | Head dim | 64 | |
| | Vocab size | 16,384 | |
| | Activation | SwiGLU | |
| | Weight dtype | BF16 export | |
| | MoR max depth | 3 | |
| | Effective max layer count | 57 (19 × 3) | |
|
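The derived quantities in the table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the class and field names are hypothetical and are not taken from `config.json`, and it assumes that the 8 KV heads imply grouped-query attention and that a recursion depth of `d` means the shared 19-layer stack is applied `d` times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetisShape:
    """Values copied from the architecture table; field names are illustrative."""

    hidden_size: int = 1536
    num_attention_heads: int = 24
    num_kv_heads: int = 8
    head_dim: int = 64
    num_shared_layers: int = 19
    mor_max_depth: int = 3

    def queries_per_kv_head(self) -> int:
        # Assumption: grouped-query attention, several query heads per KV head.
        return self.num_attention_heads // self.num_kv_heads

    def kv_width(self) -> int:
        # Width of the key (or value) projection output for one token.
        return self.num_kv_heads * self.head_dim

    def effective_layers(self, depth: int) -> int:
        # Assumption: depth d reuses the shared layer stack d times.
        if not 1 <= depth <= self.mor_max_depth:
            raise ValueError(f"depth must be in 1..{self.mor_max_depth}")
        return self.num_shared_layers * depth


shape = MetisShape()
# Query heads times head dim should reproduce the hidden size.
assert shape.num_attention_heads * shape.head_dim == shape.hidden_size
print(shape.queries_per_kv_head())                      # 3
print(shape.kv_width())                                 # 512
print([shape.effective_layers(d) for d in (1, 2, 3)])   # [19, 38, 57]
```

Under these assumptions, the parameter count stays that of 19 layers while compute per token scales with the chosen depth, up to the 57 effective layers listed above.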
|
| ## Training Notes |
|
|
| The current base was trained with: |
|
|
| - repaired **next-token prediction** objective |
| - optimized H100 pretraining stack |
| - fused dense transformer path improvements |
| - static dense base pretraining |
| - sequence-level static MoR during continued pretraining |
| - exported BF16 weights in `safetensors` |
|
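For readers unfamiliar with the objective named in the first bullet, here is a minimal pure-Python sketch of a standard next-token prediction loss. It is not Lernex's training code; the function name and the list-of-lists logits layout are assumptions chosen to keep the example dependency-free.

```python
import math
from typing import Sequence


def next_token_loss(
    logits: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> float:
    """Mean cross-entropy (in nats) of predicting token t+1 from position t."""
    if len(logits) != len(token_ids):
        raise ValueError("need one row of logits per input token")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a target")

    total = 0.0
    # Position t predicts token t+1, so the last position has no target.
    for row, target in zip(logits[:-1], token_ids[1:]):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]  # negative log-probability of target
    return total / (len(token_ids) - 1)


# Uniform logits over a 4-token vocabulary give a loss of ln(4).
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(round(next_token_loss(uniform, [0, 1, 2]), 4))  # 1.3863
```

The key detail is the one-position shift between inputs and targets; a real training loop would compute the same quantity on batched tensors.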
|
| The final continued-pretraining (CPT) checkpoint ended with validation loss around `2.4341` and perplexity around `11.41` on the continued-pretraining validation split. This number is not directly comparable to instruction-following or benchmark performance; it is primarily a training-health metric for the base/CPT mixture.
|
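The two reported numbers are related by the usual definition of perplexity as the exponential of the mean per-token loss. The short check below assumes the loss is a natural-log cross-entropy averaged over tokens; the function name is illustrative.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy in nats."""
    return math.exp(mean_loss_nats)


print(f"{perplexity(2.4341):.2f}")  # 11.41

# For scale: uniform guessing over the 16,384-token vocabulary
# would give a loss of ln(16384) nats.
print(f"{math.log(16384):.2f}")  # 9.70
```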
|
| ## Files |
|
|
| - `model.safetensors` - exported base weights |
| - `config.json` - Metis architecture/config metadata |
| - `generation_config.json` - basic generation defaults |
| - `tokenizer.json` - tokenizer |
| - `tokenizer_config.json` - tokenizer metadata |
| - `special_tokens_map.json` - tokenizer special-token mapping
|
|
| ## Important Compatibility Note |
|
|
| Metis-1.4 uses a custom architecture: `metis_mor_transformer` / `MetisMoRLMHeadModel`. |
|
|
| This repository contains the weights and config, but loading the model requires the Metis runtime/model code from Lernex's training stack, or an adapter that implements the same architecture. It is not yet a drop-in checkpoint for vanilla Transformers.
|
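Even without the Metis runtime, the exported weights can be inspected, because a `safetensors` file starts with an 8-byte little-endian header length followed by a JSON header describing each tensor. The sketch below uses only the standard library; the helper names are hypothetical, and nothing here is specific to Metis (the tensor names inside this checkpoint are not documented in this card).

```python
import json
import struct
from math import prod
from pathlib import Path


def read_safetensors_header(path) -> dict:
    """Return the per-tensor JSON header without loading any tensor data."""
    with Path(path).open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def count_parameters(header: dict) -> int:
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(entry["shape"]) for entry in header.values())


def summarize(path) -> None:
    """Print each tensor's name, dtype, and shape, then the total count."""
    header = read_safetensors_header(path)
    for name, entry in sorted(header.items()):
        print(f"{name}: {entry['dtype']} {entry['shape']}")
    print(f"total parameters: {count_parameters(header):,}")
```

Calling `summarize("model.safetensors")` on a local copy lists the stored tensors. If each parameter is stored once, the total should land near the ~503.8M figure in the architecture table, though tied or duplicated tensors in the export could shift it.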
|
| ## Status |
|
|
| This is a research release from an active training run. The base is being shared early so others can inspect and experiment with the corrected model artifact while the post-training pipeline continues. |
|
|
| ## Intended Use |
|
|
| Metis-1.4 Base is intended for research, evaluation, and downstream training. It is not instruction tuned and should not be treated as an aligned assistant. For interactive use, prefer the post-trained Metis-1.4 chat/think checkpoints once released. |
|
|
| ## About Lernex |
|
|
| Lernex is building learning systems that adapt around the learner: tutoring, practice, explanations, memory, and model research shaped around education. Metis-1.4 is a pivotal step in the Metis research line toward a compact, efficient model stack that can be trained, inspected, deployed, and improved end to end. |
|
|