README.md · Lernex/Metis-1.4-base at main

GiuliannoV

Refresh Metis-1.4 base corrected release

c9e1571 verified 24 days ago

	---
	language:
	- en
	library_name: pytorch
	tags:
	- metis
	- lernex
	- causal-lm
	- base-model
	- education
	- reasoning
	- mixture-of-recursion
	- custom-code
	pipeline_tag: text-generation
	base_model: []
	---

	# Metis-1.4 Base

	The model that never quit.

	Metis-1.4 Base is a compact ~504M-parameter research language model from Lernex, built as a step toward the Metis line of efficient learning, reasoning, and tutoring models.

	This upload replaces the earlier experimental Metis-1.4 base artifact with the corrected base export. The earlier run was trained with an incorrect objective; this revision comes from the repaired pipeline, which uses standard next-token prediction, the optimized H100 dense pretraining path, and sequence-level static MoR during continued pretraining.

	## What This Release Is

	This is the base checkpoint. It is not the final chat or thinking model.

	Use it as:

	- a research base for continued training and post-training experiments
	- a compact model for studying Lernex's Metis architecture direction
	- a foundation checkpoint for the Metis-1.4 chat and thinking releases

	The post-training stages (Chat SFT, Reasoning SFT, reward, Chat DPO, and Think DPO) remain part of the full Metis-1.4 pipeline and are not included in this checkpoint.

	## Architecture

	Metis-1.4 Base uses a custom Metis MoR decoder stack:

	| Field | Value |
	|---|---:|
	| Parameters | ~503.8M |
	| Context length | 1024 tokens |
	| Layers | 19 shared transformer layers |
	| Hidden size | 1536 |
	| Attention heads | 24 |
	| KV heads | 8 |
	| Head dim | 64 |
	| Vocab size | 16,384 |
	| Activation | SwiGLU |
	| Weight dtype | BF16 export |
	| MoR max depth | 3 |
	| Effective max layer count | 57 |
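
The derived rows in the table can be cross-checked with a few lines of arithmetic. The sketch below only uses the numbers listed above. The variable names are illustrative and are not taken from `config.json`. Reading 24 attention heads with 8 KV heads as grouped-query attention is an assumption, and so is the 2-bytes-per-parameter estimate for the BF16 export.

```python
# Values copied from the architecture table; names are illustrative only.
hidden_size = 1536
num_attention_heads = 24
num_kv_heads = 8
head_dim = 64
num_shared_layers = 19
mor_max_depth = 3
vocab_size = 16_384
approx_params = 503.8e6  # "~503.8M" from the table, so only approximate

# Head dim times the number of attention heads recovers the hidden size.
assert head_dim * num_attention_heads == hidden_size

# Assumption: fewer KV heads than attention heads means grouped-query
# attention, with this many query heads sharing each KV head.
queries_per_kv_head = num_attention_heads // num_kv_heads

# Applying the 19 shared layers up to 3 times gives the effective maximum.
effective_max_layers = num_shared_layers * mor_max_depth

# Size of one vocab-by-hidden embedding matrix. Whether the input and
# output embeddings are tied is not stated in this README.
embedding_params = vocab_size * hidden_size

# Rough weight size at 2 bytes per BF16 parameter.
approx_weight_gb = approx_params * 2 / 1e9

print(f"queries per KV head:  {queries_per_kv_head}")
print(f"effective max layers: {effective_max_layers}")
print(f"embedding parameters: {embedding_params:,}")
print(f"approx. BF16 weights: {approx_weight_gb:.2f} GB")
```

The weight-sharing is why the parameter count stays near 504M while the effective depth can reach 57 layer applications.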

	## Training Notes

	The current base was trained with:

	- repaired next-token prediction objective
	- optimized H100 pretraining stack
	- fused dense transformer path improvements
	- static dense base pretraining
	- sequence-level static MoR during continued pretraining
	- exported BF16 weights in `safetensors`

	The final continued-pretraining (CPT) checkpoint ended with validation loss around `2.4341` and perplexity around `11.41` on the continued-pretraining validation split. This number is not directly comparable to instruction-following or benchmark results; it is primarily a training-health metric for the base/CPT mixture.
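
The two numbers above are related by a simple exponential. The sketch below assumes the reported validation loss is the mean per-token cross-entropy in nats, which is the usual convention but is not stated explicitly in this README.

```python
import math

# Reported validation loss, assumed to be mean cross-entropy in nats per token.
val_loss = 2.4341

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(val_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the result rounds to the reported `11.41`.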

	## Files

	- `model.safetensors` - exported base weights
	- `config.json` - Metis architecture/config metadata
	- `generation_config.json` - basic generation defaults
	- `tokenizer.json` - tokenizer
	- `tokenizer_config.json` - tokenizer metadata
	- `special_tokens_map.json` - tokenizer special token ids

	## Important Compatibility Note

	Metis-1.4 uses a custom architecture: `metis_mor_transformer` / `MetisMoRLMHeadModel`.

	This repository contains the weights and config, but loading the model requires the Metis runtime/model code from Lernex's training stack or an adapter that implements the same architecture. It is not yet a drop-in checkpoint for vanilla Transformers.
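
Running the model needs the Metis runtime, but the tensor names, shapes, and dtypes in `model.safetensors` can still be inspected with the Python standard library alone, because a safetensors file begins with an 8-byte little-endian header length followed by a JSON header. The sketch below is a minimal reader under that format. The helper names `read_safetensors_header` and `summarize` are hypothetical, and the tensor names it finds depend on the actual export.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Return (total element count, sorted list of dtypes) for all tensors."""
    total = sum(prod(info["shape"]) for info in header.values())
    dtypes = sorted({info["dtype"] for info in header.values()})
    return total, dtypes


# Example usage, assuming model.safetensors has been downloaded locally:
#   total, dtypes = summarize(read_safetensors_header("model.safetensors"))
#   print(f"{total / 1e6:.1f}M parameters, dtypes: {dtypes}")
```

This only reads metadata, so it is a quick way to compare the stored tensors against the parameter count and BF16 dtype listed in the architecture table before setting up the full runtime.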

	## Status

	This is a research release from an active training run. The base is being shared early so others can inspect and experiment with the corrected model artifact while the post-training pipeline continues.

	## Intended Use

	Metis-1.4 Base is intended for research, evaluation, and downstream training. It is not instruction tuned and should not be treated as an aligned assistant. For interactive use, prefer the post-trained Metis-1.4 chat/think checkpoints once released.

	## About Lernex

	Lernex is building learning systems that adapt around the learner: tutoring, practice, explanations, memory, and model research shaped around education. Metis-1.4 is a pivotal step in the Metis research line toward a compact, efficient model stack that can be trained, inspected, deployed, and improved end to end.