---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- from-scratch
- raw-checkpoint
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
---

# AlterEgo-373M - raw checkpoint

This repository holds the **original** AlterEgo checkpoint, in the model's **own from-scratch architecture** - i.e. the model exactly as it was trained, before any format conversion.

- **Want to just use the model?** Use the Hugging Face / `transformers`-native version instead: **[`jbomdev/AlterEgo`](https://huggingface.co/jbomdev/AlterEgo)**. It's a **numerically-lossless conversion** of this checkpoint to `LlamaForCausalLM` (verified, max logit difference ~1e-6), and works out of the box with `transformers`, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card.
- **Want the raw weights / the original architecture?** That's this repo. The checkpoint is a PyTorch state dict saved under the `"model"` key. Load and run it with the model definition and inference code in the training repo: **[github.com/J-bom/AlterEgo](https://github.com/J-bom/AlterEgo)**.

In short: **`alterego_raw` is the original; [`jbomdev/AlterEgo`](https://huggingface.co/jbomdev/AlterEgo) is the converted, ready-to-use version.**

## Model summary

A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the [main model card](https://huggingface.co/jbomdev/AlterEgo) for architecture, training curves, hyperparameters, evaluation, and limitations.
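
## Inspecting the raw checkpoint

As noted above, the raw checkpoint is a PyTorch state dict saved under the `"model"` key. The following is a minimal sketch of how one might pull that state dict out and sanity-check its size before wiring it into the model definition from the training repo. The helper names (`extract_state_dict`, `count_parameters`, `load_raw_checkpoint`) are hypothetical and not part of the AlterEgo codebase, and the checkpoint filename is not stated on this card, so the path is left as a parameter. Whether `weights_only=True` works depends on what else was saved alongside the weights, which is an assumption here.

```python
def extract_state_dict(checkpoint):
    """Return the state dict stored under the "model" key of a checkpoint.

    Hypothetical helper: only the "model" key is documented on this card;
    any other keys in the checkpoint are ignored.
    """
    if not isinstance(checkpoint, dict) or "model" not in checkpoint:
        raise KeyError('expected a dict checkpoint with a "model" key')
    return checkpoint["model"]


def count_parameters(state_dict):
    """Sum the element counts of all tensors in a state dict."""
    return sum(tensor.numel() for tensor in state_dict.values())


def load_raw_checkpoint(path):
    """Load a checkpoint file on CPU and return its "model" state dict.

    `path` is whatever the checkpoint file in this repo is called
    (not specified here). torch is imported lazily so the helpers above
    can be used without it.
    """
    import torch

    # weights_only=True restricts unpickling to tensors and simple containers.
    # Assumption: the checkpoint holds nothing that this mode rejects.
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    return extract_state_dict(checkpoint)
```

With these helpers, `count_parameters(load_raw_checkpoint(path))` should land near the 373M figure quoted in the model summary, though tied or non-parameter buffers could shift the exact count. The resulting state dict would then be passed to `load_state_dict` on the model class defined in the training repo.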