jbomdev
/

AlterEgo_raw

Text Generation

Model card Files Files and versions

AlterEgo_raw / README.md

jbomdev's picture

Update README.md

5fd0c78 verified 10 days ago

|

History Blame Contribute Delete

1.65 kB

	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	tags:
	- from-scratch
	- raw-checkpoint
	datasets:
	- HuggingFaceFW/fineweb-edu
	- HuggingFaceH4/ultrachat_200k
	---

	# AlterEgo-373M - raw checkpoint

	This repository holds the original AlterEgo checkpoint, in the model's own from-scratch architecture - i.e. the model exactly as it was trained, before any format conversion.

	- Want to just use the model? Use the Hugging Face / `transformers`-native version instead: [`jbomdev/AlterEgo`](https://huggingface.co/jbomdev/AlterEgo). It's a numerically-lossless conversion of this checkpoint to `LlamaForCausalLM` (verified, max logit difference ~1e-6), and works out of the box with `transformers`, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card.
	- Want the raw weights / the original architecture? That's this repo. The checkpoint is a PyTorch state dict saved under the `"model"` key. Load and run it with the model definition and inference code in the training repo: [github.com/J-bom/AlterEgo](https://github.com/J-bom/AlterEgo).

	In short: `alterego_raw` is the original; [`jbomdev/AlterEgo`](https://huggingface.co/jbomdev/AlterEgo) is the converted, ready-to-use version.

	## Model summary

	A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the [main model card](https://huggingface.co/jbomdev/AlterEgo) for architecture, training curves, hyperparameters, evaluation, and limitations.