Update README.md

5fd0c78 verified 8 days ago

1.65 kB

license: apache-2.0
language:
  - en
pipeline_tag: text-generation
tags:
  - from-scratch
  - raw-checkpoint
datasets:
  - HuggingFaceFW/fineweb-edu
  - HuggingFaceH4/ultrachat_200k

AlterEgo-373M - raw checkpoint

This repository holds the original AlterEgo checkpoint, in the model's own from-scratch architecture - i.e. the model exactly as it was trained, before any format conversion.

Want to just use the model? Use the Hugging Face / transformers-native version instead: jbomdev/AlterEgo. It's a numerically-lossless conversion of this checkpoint to LlamaForCausalLM (verified, max logit difference ~1e-6), and works out of the box with transformers, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card.
Want the raw weights / the original architecture? That's this repo. The checkpoint is a PyTorch state dict saved under the "model" key. Load and run it with the model definition and inference code in the training repo: github.com/J-bom/AlterEgo.

In short: alterego_raw is the original; jbomdev/AlterEgo is the converted, ready-to-use version.

Model summary

A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the main model card for architecture, training curves, hyperparameters, evaluation, and limitations.