---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- from-scratch
- raw-checkpoint
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
---

# AlterEgo-373M - raw checkpoint

This repository holds the **original** AlterEgo checkpoint in the model's **own from-scratch architecture**: the weights exactly as they were trained, before any format conversion.

- **Want to just use the model?** Use the `transformers`-native version instead: **[`jbomdev/AlterEgo`](https://huggingface.co/jbomdev/AlterEgo)**. It's a **numerically lossless conversion** of this checkpoint to `LlamaForCausalLM` (verified to floating-point tolerance: max logit difference ~1e-6), and it works out of the box with `transformers`, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card.
- **Want the raw weights / the original architecture?** That's this repo. The checkpoint is a PyTorch file whose `"model"` key holds the state dict. Load and run it with the model definition and inference code in the training repo: **[github.com/J-bom/AlterEgo](https://github.com/J-bom/AlterEgo)**.
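
For a quick look at the raw weights without the training repo, the state dict can be pulled out of the checkpoint directly. The snippet below is a minimal sketch, not the project's own loader: the helper names `load_raw_state_dict` and `count_parameters` and the filename in the usage comment are hypothetical, and it assumes the file contains only tensors and plain Python containers (if it stores anything else, `weights_only=True` will refuse to load it). Running the model still requires the model definition from the training repo.

```python
import torch


def load_raw_state_dict(checkpoint_path):
    """Return the weights stored under the "model" key of a raw checkpoint.

    Hypothetical helper for illustration; the training repo has its own loader.
    """
    # map_location="cpu" avoids needing a GPU just to inspect the weights.
    # weights_only=True restricts unpickling to tensors and plain containers.
    checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=True)
    return checkpoint["model"]


def count_parameters(state_dict):
    """Sum the number of elements across all tensors in a state dict."""
    return sum(tensor.numel() for tensor in state_dict.values())


# Usage (the filename is an assumption; use the file actually in this repo):
#   state_dict = load_raw_state_dict("checkpoint.pt")
#   for name, tensor in state_dict.items():
#       print(name, tuple(tensor.shape))
#   print(count_parameters(state_dict))
```

Note that the raw total can differ from the headline parameter count if the state dict includes non-trainable buffers or tied weights stored more than once.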

In short: **`alterego_raw` is the original; [`jbomdev/AlterEgo`](https://huggingface.co/jbomdev/AlterEgo) is the converted, ready-to-use version.**

## Model summary

A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the [main model card](https://huggingface.co/jbomdev/AlterEgo) for architecture, training curves, hyperparameters, evaluation, and limitations.