Instructions for using jbomdev/AlterEgo with libraries and local apps.

Libraries

How to use jbomdev/AlterEgo with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jbomdev/AlterEgo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jbomdev/AlterEgo")
model = AutoModelForCausalLM.from_pretrained("jbomdev/AlterEgo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use jbomdev/AlterEgo with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jbomdev/AlterEgo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jbomdev/AlterEgo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
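If you would rather call the server from Python without extra dependencies, here is a minimal sketch that uses only the standard library. It assumes the server from the previous step is running and returns the standard OpenAI chat-completions response shape. `build_chat_request` and `chat` are illustrative helpers, not part of vLLM or SGLang.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str,
                       temperature: float = 0.7, max_tokens: int = 200) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_message: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Usage: `print(chat("http://localhost:8000", "jbomdev/AlterEgo", "What is the capital of France?"))`. For the SGLang server below, use port 30000 instead. The temperature and token limit default to the recommended settings listed further down.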

Use Docker

```bash
docker model run hf.co/jbomdev/AlterEgo
```

SGLang

How to use jbomdev/AlterEgo with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jbomdev/AlterEgo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jbomdev/AlterEgo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jbomdev/AlterEgo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jbomdev/AlterEgo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use jbomdev/AlterEgo with Docker Model Runner:
```
docker model run hf.co/jbomdev/AlterEgo
```


---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- causal-lm
- from-scratch
- llama
- grouped-query-attention
- rope
- swiglu
- chatml
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
model-index:
- name: AlterEgo-373M
  results:
  - task: {type: text-generation}
    dataset: {name: lambada_openai, type: lambada_openai}
    metrics: [{type: acc, value: 0.3161}]
  - task: {type: text-generation}
    dataset: {name: hellaswag, type: hellaswag}
    metrics: [{type: acc_norm, value: 0.38}]
  - task: {type: text-generation}
    dataset: {name: arc_easy, type: arc_easy}
    metrics: [{type: acc_norm, value: 0.5269}]
  - task: {type: text-generation}
    dataset: {name: arc_challenge, type: arc_challenge}
    metrics: [{type: acc_norm, value: 0.273}]
  - task: {type: text-generation}
    dataset: {name: piqa, type: piqa}
    metrics: [{type: acc_norm, value: 0.6567}]
  - task: {type: text-generation}
    dataset: {name: winogrande, type: winogrande}
    metrics: [{type: acc, value: 0.513}]
  - task: {type: text-generation}
    dataset: {name: openbookqa, type: openbookqa}
    metrics: [{type: acc_norm, value: 0.322}]
  - task: {type: text-generation}
    dataset: {name: sciq, type: sciq}
    metrics: [{type: acc_norm, value: 0.722}]
  - task: {type: text-generation}
    dataset: {name: boolq, type: boolq}
    metrics: [{type: acc, value: 0.6177}]
---

<div align="center">

# 🧠 AlterEgo-373M

A 373-million-parameter language model designed, trained, and served entirely from scratch.

[![Code](https://img.shields.io/badge/GitHub-AlterEgo%20(training)-181717?logo=github)](https://github.com/J-bom/AlterEgo)
[![Platform](https://img.shields.io/badge/GitHub-LLME%20(platform)-181717?logo=github)](https://github.com/J-bom/LLME)
[![Architecture](https://img.shields.io/badge/arch-Llama--style-blue)]()
[![Params](https://img.shields.io/badge/params-373M-green)]()
[![support](https://img.shields.io/badge/Also%20supports-GGUF-orange)](https://huggingface.co/jbomdev/AlterEgo-GGUF)

</div>

---

## Introduction

AlterEgo is a small, decoder-only language model built from the ground up; it is not a fine-tune of an existing model. Every part was written from scratch: the transformer architecture, the training loop, the tokenizer wiring, and the KV-cached inference engine. It was pretrained on ~10B tokens of high-quality educational web text and then instruction-tuned for chat.

It is the model at the heart of [LLME](https://github.com/J-bom/LLME), a self-hosted, end-to-end encrypted LLM platform (think LM Studio / Open WebUI / Ollama, also built from scratch). LLME can serve AlterEgo alongside `llama.cpp` GGUF models and the Gemini API; AlterEgo is the "house" model the platform was designed around.

This repository contains the model. The training and architecture code lives in the [AlterEgo repo](https://github.com/J-bom/AlterEgo); the serving platform lives in the [LLME repo](https://github.com/J-bom/LLME).

> Two formats are published. This repo is the Hugging Face `LlamaForCausalLM` conversion, for drop-in use with `transformers`, vLLM, and GGUF tooling. The original checkpoint, in AlterEgo's own from-scratch architecture exactly as trained, is published separately as [`jbomdev/AlterEgo_raw`](https://huggingface.co/jbomdev/AlterEgo_raw). This version is a numerically lossless conversion of it (verified: maximum logit difference ~1e-6).

> What it is and isn't. AlterEgo is a research / learning artifact: a demonstration of the full modern LLM pipeline (architecture → pretraining → SFT → serving) at a scale one person can train on a single GPU. It is not a production assistant and won't compete with billion-parameter models. See [Limitations](#limitations).

## Architecture

A modern Llama-style decoder; thanks to that design, it loads as a standard `LlamaForCausalLM`.

| Component | Choice |
|---|---|
| Type | Decoder-only transformer (autoregressive) |
| Parameters | ~373M (input/output embeddings tied) |
| Layers | 24 |
| Model dimension | 1024 |
| Attention | Grouped-Query Attention: 16 query heads / 4 KV heads (head dim 64) |
| Positional encoding | Rotary embeddings (RoPE), θ = 10,000 |
| Normalization | RMSNorm (pre-norm) |
| Feed-forward | SwiGLU, hidden dim 2816 |
| Context length | 2048 |
| Vocabulary | 100,352 |
| Tokenizer | `cl100k_base` (tiktoken) extended with ChatML special tokens |
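As a sanity check, the parameter count can be reproduced from the dimensions in the table. The back-of-the-envelope sketch below assumes bias-free projections (the `LlamaForCausalLM` default) and 2-byte bf16 values for the KV-cache estimate. It also shows how much memory the 4 KV heads of GQA save at the 2048-token context compared with full multi-head attention.

```python
# Hyperparameters copied from the architecture table above.
VOCAB = 100_352
D_MODEL = 1024
N_LAYERS = 24
N_HEADS = 16
N_KV_HEADS = 4
HEAD_DIM = D_MODEL // N_HEADS  # 64
FFN_HIDDEN = 2816
CONTEXT = 2048


def count_params(tied_embeddings: bool = True) -> int:
    """Count weights, assuming a bias-free Llama-style block."""
    kv_dim = N_KV_HEADS * HEAD_DIM                       # 256: K and V use fewer heads (GQA)
    attn = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * kv_dim  # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * D_MODEL * FFN_HIDDEN                       # SwiGLU: gate_proj, up_proj, down_proj
    norms = 2 * D_MODEL                                  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = VOCAB * D_MODEL
    lm_head = 0 if tied_embeddings else VOCAB * D_MODEL  # tied: output layer reuses the embedding
    final_norm = D_MODEL
    return embed + N_LAYERS * per_layer + final_norm + lm_head


def kv_cache_bytes(n_kv_heads: int, seq_len: int = CONTEXT, bytes_per_value: int = 2) -> int:
    """Size of the K and V caches for one sequence (bf16 = 2 bytes per value)."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * seq_len * bytes_per_value


total = count_params()
print(f"parameters: {total:,} (~{total / 1e6:.0f}M)")
print(f"KV cache @ 2048 tokens, GQA (4 KV heads): {kv_cache_bytes(N_KV_HEADS) / 2**20:.0f} MiB")
print(f"KV cache @ 2048 tokens, full MHA (16 heads): {kv_cache_bytes(N_HEADS) / 2**20:.0f} MiB")
```

Under these assumptions the count comes to 373,343,232, which is consistent with the ~373M in the table. The tied embedding matrix alone accounts for about 103M of that. The KV cache for a full context is 48 MiB with 4 KV heads, versus 192 MiB with 16.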

## Training

AlterEgo was trained in two stages on a single NVIDIA RTX 4090.

### Stage 1 - Pretraining

Pretrained on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (HuggingFaceFW), a quality-filtered educational subset of CommonCrawl.

![Pretraining loss](assets/pretraining_loss.png)

![Training dynamics](assets/training_dynamics.png)

The gradient norm settles at ~0.26 and the loss curve is smooth and cosine-shaped. Both indicate stable training with no divergence.

### Stage 2 - Supervised fine-tuning

Instruction-tuned on [UltraChat-200K](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) (HuggingFaceH4), formatted as multi-turn ChatML.

![SFT loss](assets/sft_loss.png)

### Hyperparameters

| | Pretraining | SFT |
|---|---|---|
| Dataset | FineWeb-Edu | UltraChat-200K |
| Tokens / steps | ~10B / 19,073 | ~64M / 244 |
| Global batch | 524,288 tokens (micro 2 × 2048 × 128 grad-accum) | same scheme |
| Optimizer | AdamW (β = 0.9, 0.95; ε = 1e-8; fused) | same |
| Weight decay | 0.1 (decoupled; excluded from norms/biases) | same |
| LR schedule | linear warmup (1,900 steps) → cosine decay | cosine |
| Peak / min LR | 3e-4 → 3e-5 | low (fine-tune range) |
| Grad clipping | global-norm 1.0 | 1.0 |
| Precision | bfloat16 autocast | bfloat16 |
| Throughput / wall-clock | ~32k tok/s · ~86 GPU-h (3.6 days) | ~39k tok/s · ~28 min |
| Other | `torch.compile`, gradient checkpointing, FlashAttention (SDPA) | same |
| Final loss (train / val) | 2.94 / 2.89 | 1.83 / 1.81 |
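The pretraining schedule and batch arithmetic in the table can be sketched as follows. The exact warmup and decay convention in the AlterEgo training code is not stated here. This sketch assumes a common nanoGPT-style schedule that peaks at the end of warmup and reaches the minimum LR exactly at the last step. `lr_at` is a hypothetical helper, not the author's code.

```python
import math

# Pretraining values from the hyperparameter table above.
PEAK_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 1_900
TOTAL_STEPS = 19_073
TOKENS_PER_STEP = 2 * 2048 * 128  # micro-batch x sequence length x grad-accum steps


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR at TOTAL_STEPS (assumed convention)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    # cos goes 1 -> -1 as progress goes 0 -> 1, so the LR goes PEAK_LR -> MIN_LR
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


print(f"tokens per optimizer step: {TOKENS_PER_STEP:,}")
print(f"total pretraining tokens: {TOKENS_PER_STEP * TOTAL_STEPS / 1e9:.2f}B")
for s in (0, 950, 1_899, 1_900, 10_000, 19_073):
    print(f"step {s:>6}: lr = {lr_at(s):.2e}")
```

The batch arithmetic checks out: 524,288 tokens per step × 19,073 steps ≈ 10.0B tokens, which matches the ~10B figure.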

## Evaluation

Benchmarked with [EleutherAI's lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (0-shot).

| Benchmark | Metric | AlterEgo-373M | Random |
|---|---|---|---|
| lambada_openai | acc | 31.6% | ~0% |
| hellaswag | acc_norm | 38.0% | 25% |
| arc_easy | acc_norm | 52.7% | 25% |
| arc_challenge | acc_norm | 27.3% | 25% |
| piqa | acc_norm | 65.7% | 50% |
| winogrande | acc | 51.3% | 50% |
| openbookqa | acc_norm | 32.2% | 25% |
| sciq | acc_norm | 72.2% | 25% |
| boolq | acc | 61.8% | 50% |

For a 373M model trained on ~10B tokens, these results are solid. It is clearly above chance on science and commonsense (SciQ, PIQA, ARC-easy, HellaSwag) and on next-word prediction (LAMBADA, perplexity 62.3). As expected, it stays near chance on the hardest reasoning sets (ARC-challenge, WinoGrande). BoolQ is well above random guessing but only at about the majority-class baseline (roughly 62% of BoolQ answers are "yes"), so it says little about reading comprehension.
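To make "above chance" concrete, the margin of each score over the random baseline can be computed directly from the evaluation table. This is plain arithmetic on the published numbers, and it treats LAMBADA's "~0%" baseline as exactly 0. `margin_over_chance` is an illustrative helper.

```python
from typing import Dict, List, Tuple

# Scores and random-guess baselines from the evaluation table (percent).
RESULTS: Dict[str, Tuple[float, float]] = {
    "lambada_openai": (31.6, 0.0),  # "~0%" baseline treated as 0
    "hellaswag": (38.0, 25.0),
    "arc_easy": (52.7, 25.0),
    "arc_challenge": (27.3, 25.0),
    "piqa": (65.7, 50.0),
    "winogrande": (51.3, 50.0),
    "openbookqa": (32.2, 25.0),
    "sciq": (72.2, 25.0),
    "boolq": (61.8, 50.0),
}


def margin_over_chance(results: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Return (benchmark, score - baseline) pairs, largest margin first."""
    margins = [(name, round(score - base, 1)) for name, (score, base) in results.items()]
    return sorted(margins, key=lambda item: item[1], reverse=True)


for name, delta in margin_over_chance(RESULTS):
    print(f"{name:<15} {delta:+5.1f} pts")
```

When the benchmarks are sorted this way, SciQ leads at +47.2 points. WinoGrande and ARC-challenge come last, both less than 3 points above chance.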

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("jbomdev/AlterEgo")
model = AutoModelForCausalLM.from_pretrained("jbomdev/AlterEgo", torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content":
        "You are Alter Ego, a small AI built from scratch. You're casual and direct. "
        "You're not great with facts, math, or current events - when you don't know "
        "something, just say so. You're better at chatting than at answering questions."},
    {"role": "user", "content": "Tell me something interesting about the ocean."},
]
# return_dict=True also yields the attention mask; move everything to the model's device
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=1.0,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
```


### Recommended generation settings

AlterEgo was tuned with these defaults and is served with them in LLME:

| Parameter | Value |
|---|---|
| `temperature` | 0.7 |
| `top_k` | 50 |
| `top_p` | 1.0 |
| `repetition_penalty` | 1.1 |
| `max_new_tokens` | 200 |

Lower the temperature toward 0.3–0.5 for steadier, more focused replies. Generation stops on `<|im_end|>` or `<|endoftext|>`.
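Here is a simplified, pure-Python sketch of one sampling step that shows how these settings shape the next-token distribution. It assumes roughly the order used by Hugging Face `transformers`: repetition penalty, then temperature, then top-k. It is not the library's code, and `next_token_probs` and `sample_next_token` are hypothetical helpers. With `top_p` = 1.0 the nucleus filter does nothing, so the sketch leaves it out.

```python
import math
import random
from typing import List, Optional


def next_token_probs(logits: List[float], generated: List[int], temperature: float = 0.7,
                     top_k: int = 50, repetition_penalty: float = 1.1) -> List[float]:
    """Turn raw logits into sampling probabilities (simplified sketch)."""
    scores = list(logits)
    # Repetition penalty: push already-generated tokens toward lower scores.
    for tok in set(generated):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    scores = [s / temperature for s in scores]
    # Top-k: keep only tokens scoring at least the k-th highest score.
    k = min(top_k, len(scores))
    cutoff = sorted(scores, reverse=True)[k - 1]
    scores = [s if s >= cutoff else float("-inf") for s in scores]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(probs: List[float], rng: Optional[random.Random] = None) -> int:
    """Draw one token id according to the given probabilities."""
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a 100k-token vocabulary, `top_k=50` removes almost every candidate at each step, and `repetition_penalty=1.1` lowers the score of tokens already generated without banning them.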

### Chat format

AlterEgo uses ChatML:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
```
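This layout can be rendered by hand, for example when you send a raw completion request or debug a prompt. The tokenizer's bundled chat template (`apply_chat_template`) remains authoritative. This sketch assumes the single-newline layout shown above, and `to_chatml` is a hypothetical helper.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the ChatML layout."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

To check it against the real template, compare it with `tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)`.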

### Run it locally (GGUF)

Feel free to use my pre-made GGUF files and quantizations from the [AlterEgo-GGUF page](https://huggingface.co/jbomdev/AlterEgo-GGUF), or run the model with [Ollama](https://ollama.com/jbomdev/alterego).

Because this is a standard Llama-format checkpoint, you can also convert it to GGUF yourself for Ollama / LM Studio / llama.cpp:

```bash
python llama.cpp/convert_hf_to_gguf.py ./AlterEgo --outfile alterego-f16.gguf --outtype f16
```
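The conversion command expects `./AlterEgo` to be a local copy of this repository. Here is a minimal sketch of fetching it and running the same command from Python. It assumes `huggingface_hub` is installed and that llama.cpp is cloned into the working directory with its Python requirements installed. `build_convert_cmd` and `download_and_convert` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List


def build_convert_cmd(model_dir: str, outfile: str, outtype: str = "f16",
                      script: str = "llama.cpp/convert_hf_to_gguf.py") -> List[str]:
    """Assemble the same conversion command as the bash example above."""
    return [sys.executable, script, model_dir, "--outfile", outfile, "--outtype", outtype]


def download_and_convert(local_dir: str = "./AlterEgo", outfile: str = "alterego-f16.gguf") -> None:
    """Download the checkpoint, then convert it to an f16 GGUF file."""
    # Imported here so build_convert_cmd works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id="jbomdev/AlterEgo", local_dir=local_dir)
    subprocess.run(build_convert_cmd(local_dir, outfile), check=True)
```

Calling `download_and_convert()` produces `alterego-f16.gguf`, which llama.cpp's quantization tools can then shrink further.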





## Limitations

AlterEgo is a 373M-parameter model trained on a modest token budget, and it behaves like one:

- Capability - it can be factually wrong, repeat itself, and lose coherence on long or complex prompts. By its own (default) system prompt, it is "better at chatting than at answering questions."
- Language - English only.
- Safety - it is not safety- or preference-tuned (no RLHF/DPO). It can produce incorrect, biased, or undesirable content and must not be deployed in user-facing settings without additional safeguards.
- Bias - it inherits biases from FineWeb-Edu (web text) and UltraChat.

## License

Released under the Apache 2.0 license. Training data is governed by the respective licenses of FineWeb-Edu and UltraChat-200K.

## Citation

```bibtex
@misc{alterego2026,
  title  = {AlterEgo: A 373M language model trained from scratch},
  author = {J-bom},
  year   = {2026},
  url    = {https://github.com/J-bom/AlterEgo}
}
```

Credits - datasets: FineWeb-Edu (HuggingFaceFW), UltraChat-200K (HuggingFaceH4). Architecture follows the modern Llama-style design (RoPE, GQA, SwiGLU, RMSNorm); implementation, training, and serving by the author.