metadata
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- from-scratch
- raw-checkpoint
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
AlterEgo-373M - raw checkpoint
This repository holds the original AlterEgo checkpoint, in the model's own from-scratch architecture - i.e. the model exactly as it was trained, before any format conversion.
- Want to just use the model? Use the Hugging Face /
transformers-native version instead:jbomdev/AlterEgo. It's a numerically-lossless conversion of this checkpoint toLlamaForCausalLM(verified, max logit difference ~1e-6), and works out of the box withtransformers, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card. - Want the raw weights / the original architecture? That's this repo. The checkpoint is a PyTorch state dict saved under the
"model"key. Load and run it with the model definition and inference code in the training repo: github.com/J-bom/AlterEgo.
In short: alterego_raw is the original; jbomdev/AlterEgo is the converted, ready-to-use version.
Model summary
A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the main model card for architecture, training curves, hyperparameters, evaluation, and limitations.